Quantcast
Channel: Baeldung
Viewing all articles
Browse latest Browse all 4535

N+1 Problem in Hibernate and Spring Data JPA

$
0
0

1. Overview

Spring JPA and Hibernate provide a powerful tool for seamless database communication. However, because clients delegate more control to the frameworks, the resulting generated queries might be far from optimal.

In this tutorial, we’ll review a common N+1 problem while using Spring JPA and Hibernate. We’ll check different situations that might cause the problem.

2. Social Media Platform

To visualize the issue better, we need to outline the relationships between entities. Let’s take a simple social network platform as an example. There’ll be only Users and Posts:

We’re using Iterable in the diagrams, and we’ll provide concrete implementations for each example: List or Set.

To test the number of requests, we’ll use a dedicated library instead of checking the logs. However, we’ll refer to logs to better understand the structure of the requests.

The fetch type of relationships is assumed as default if not mentioned explicitly in each example. All to-one relationships have eager fetch and to-many – lazy. Also, the code examples use Lombok to reduce the noise in the code.

3. N+1 Problem

The N+1 problem is the situation when, for a single request, for example, fetching Users, we make additional requests for each User to get their information. Although this problem often is connected to lazy loading, it’s not always the case.

We can get this issue with any type of relationship. However, it usually arises from many-to-many or one-to-many relationships.

3.1. Lazy Fetch

First of all, let’s see how lazy loading might cause the N+1 problem. We’ll consider the following example:

@Entity
public class User {
    @Id
    private Long id;
    private String username;
    private String email;
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author")
    protected List<Post> posts;
    // constructors, getters, setters, etc.
}

Users have one-to-many relationships with the Posts. This means that each User has multiple Posts. We didn’t explicitly identify the fetch strategy for the fields. The strategy is inferred from the annotations. As was mentioned previously, @OneToMany has lazy fetch by default:

@Target({METHOD, FIELD}) 
@Retention(RUNTIME)
public @interface OneToMany {
    Class targetEntity() default void.class;
    CascadeType[] cascade() default {};
    FetchType fetch() default FetchType.LAZY;
    String mappedBy() default "";
    boolean orphanRemoval() default false;
}

If we’re trying to get all the Users, lazy fetch won’t pull more information than we accessed:

@Test
void givenLazyListBasedUser_WhenFetchingAllUsers_ThenIssueOneRequests() {
    getUserService().findAll();
    assertSelectCount(1);
}

Thus, to get all Users, we’ll issue a single request. Let’s try to access Posts. Hibernate will issue an additional request because the information wasn’t fetched beforehand. For a single User, it means two requests overall:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenLazyListBasedUser_WhenFetchingOneUser_ThenIssueTwoRequest(Long id) {
    getUserService().getUserByIdWithPredicate(id, user -> !user.getPosts().isEmpty());
    assertSelectCount(2);
}

The getUserByIdWithPredicate(Long, Predicate) method filters the Users, but its main goal in the tests is to trigger the loading. We’ll have 1+1 requests, but if we scale it, we’ll get the N+1 problem:

@Test
void givenLazyListBasedUser_WhenFetchingAllUsersCheckingPosts_ThenIssueNPlusOneRequests() {
    int numberOfRequests = getUserService().countNumberOfRequestsWithFunction(users -> {
        List<List<Post>> usersWithPosts = users.stream()
          .map(User::getPosts)
          .filter(List::isEmpty)
          .toList();
        return users.size();
    });
    assertSelectCount(numberOfRequests + 1);
}

We should be careful about lazy fetch. In some cases, lazy loading makes sense to reduce the data we get from a database. However, if we’re accessing lazily-fetched information in most cases, we might increase the volume of requests. To make the best judgment, we must investigate the access patterns.

3.2. Eager Fetch

In most cases, eager loading can help us with the N+1 problem. However, the result depends on the relationships between our entities. Let’s consider a similar User class but with an explicitly set eager fetch:

@Entity
public class User {
    @Id
    private Long id;
    private String username;
    private String email;
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", fetch = FetchType.EAGER)
    private List<Post> posts;
    // constructors, getters, setters, etc.
}

If we fetch a single user, the fetch type will force Hibernate to load all the data in a single request:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedUser_WhenFetchingOneUser_ThenIssueOneRequest(Long id) {
    getUserService().getUserById(id);
    assertSelectCount(1);
}

At the same time, the situation with fetching all users changes. We’ll get N+1 straight away whether we want to use the Posts or not:

@Test
void givenEagerListBasedUser_WhenFetchingAllUsers_ThenIssueNPlusOneRequests() {
    List<User> users = getUserService().findAll();
    assertSelectCount(users.size() + 1);
}

Although eager fetch changed how Hibernate pulls the data, it’s hard to call it a successful optimization.

4. Multiple Collections

Let’s introduce Groups in our initial domain:

The Group contains a List of Users:

@Entity
public class Group {
    @Id
    private Long id;
    private String name;
    @ManyToMany
    private List<User> members;
    // constructors, getters, setters, etc.
}

4.1. Lazy Fetch

This relationship would generally behave similarly to the previous examples with lazy fetch. We’ll get a new request for each access to lazily pulled information.

Thus, unless we access users directly, we’ll have a single request:

@Test
void givenLazyListBasedGroup_whenFetchingAllGroups_thenIssueOneRequest() {
    groupService.findAll();
    assertSelectCount( 1);
}
@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenLazyListBasedGroup_whenFetchingAllGroups_thenIssueOneRequest(Long groupId) {
    Optional<Group> group = groupService.findById(groupId);
    assertThat(group).isPresent();
    assertSelectCount(1);
}

However, it would create the N+1 problem if we try to access each User in a group:

@Test
void givenLazyListBasedGroup_whenFilteringGroups_thenIssueNPlusOneRequests() {
    int numberOfRequests = groupService.countNumberOfRequestsWithFunction(groups -> {
        groups.stream()
          .map(Group::getMembers)
          .flatMap(Collection::stream)
          .collect(Collectors.toSet());
        return groups.size();
    });
    assertSelectCount(numberOfRequests + 1);
}

The countNumberOfRequestsWithFunction(ToIntFunction) method counts the requests and also triggers lazy loading.

4.2. Eager Fetch

Let’s check the behavior with eager fetch. While requesting a single group, we’ll get the following result:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerListBasedGroup_whenFetchingAllGroups_thenIssueNPlusOneRequests(Long groupId) {
    Optional<Group> group = groupService.findById(groupId);
    assertThat(group).isPresent();
    assertSelectCount(1 + group.get().getMembers().size());
}

It’s reasonable, as we need to get the information for each User eagerly. At the same time, when we get all groups, the number of requests jumps significantly:

@Test
void givenEagerListBasedGroup_whenFetchingAllGroups_thenIssueNPlusMPlusOneRequests() {
    List<Group> groups = groupService.findAll();
    Set<User> users = groups.stream().map(Group::getMembers).flatMap(List::stream).collect(Collectors.toSet());
    assertSelectCount(groups.size() + users.size() + 1);
}

We need to get the information about Users, and then, for each User, we fetch their Posts. Technically, we have an N+M+1 situation. Thus, neither lazy nor eager fetch entirely resolved the problem.

4.3. Using Set

Let’s approach this situation differently. Let’s replace Lists with Sets. We’ll be using eager fetch, as lazy Sets and List behave similarly:

@Entity
public class Group {
    @Id
    private Long id;
    private String name;
    @ManyToMany(fetch = FetchType.EAGER)
    private Set<User> members;
    // constructors, getters, setters, etc.
}
@Entity
public class User {
    @Id
    private Long id;
    private String username;
    private String email;
    @OneToMany(cascade = CascadeType.ALL, mappedBy = "author", fetch = FetchType.EAGER)
    protected Set<Post> posts;
    // constructors, getters, setters, etc.
}
@Entity
public class Post {
    @Id
    private Long id;
    @Lob
    private String content;
    @ManyToOne
    private User author;
    // constructors, getters, setters, etc.
}

Let’s run similar tests to see if this makes any difference:

@ParameterizedTest
@ValueSource(longs = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10})
void givenEagerSetBasedGroup_whenFetchingAllGroups_thenCreateCartesianProductInOneQuery(Long groupId) {
    groupService.findById(groupId);
    assertSelectCount(1);
}

We resolved the N+1 problem while getting a single Group. Hibernate fetched Users and their Posts in one request. Also, getting all Groups has decreased the number of requests, but it’s still N+1:

@Test
void givenEagerSetBasedGroup_whenFetchingAllGroups_thenIssueNPlusOneRequests() {
    List<Group> groups = groupService.findAll();
    assertSelectCount(groups.size() + 1);
}

Although we partially solved the problem, we created another one. Hibernates uses several JOINs, creating the Cartesian product:

SELECT g.id, g.name, gm.interest_group_id,
       u.id, u.username, u.email,
       p.id, p.author_id, p.content
FROM group g
         LEFT JOIN (group_members gm JOIN user u ON u.id = gm.members_id)
                   ON g.id = gm.interest_group_id
         LEFT JOIN post p ON u.id = p.author_id
WHERE g.id = ?

The query might become overly complex and, with many dependencies between objects, pull a huge chunk of a database.

Due to the nature of Sets, Hibernate can ensure that all the duplicates in the result set are from the Cartesian product. This is not possible with lists, so data should be fetched in separate requests to maintain its integrity when using lists.

Most relationships align with the Set invariants. It makes little sense to allow Users to have several identical Posts. At the same time, we can provide a fetch mode explicitly instead of relying on default behavior.

5. Tradeoffs

Picking a fetch type might help reduce the number of requests in simple cases. However, using simple annotations, we have limited control over query generation. Also, it’s done transparently, and small changes in the domain model might create a dramatic impact.

The best way to address the issue is to observe the system’s behavior and identify the access patterns. Creating separate methods, SQL and JPQL queries can help tailor them for each case. Also, we can use fetch mode to hint Hibernate about how we load related entities.

Adding simple tests can help with unintended changes in the model. This way, we can ensure that new relationships won’t create the Cartesian product or N+1 problem.

6. Conclusion

While eager fetch type can mitigate some simple issues with additional queries, it might cause other issues. It’s necessary to test the application to ensure its performance.

Different combinations of fetch types and relationships can often produce an unexpected result. That’s why it’s better to cover crucial parts with tests.

As usual, all the code from this tutorial is available over on GitHub.

       

Viewing all articles
Browse latest Browse all 4535

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>