1. Introduction
You’ve probably read some of the complaints about bad Hibernate performance or maybe you’ve struggled with some of them yourself. I have been using Hibernate for more than 15 years now and I have run into more than enough of these issues.
Over the years, I’ve learned that these problems can be avoided and that you can find a lot of them in your log file. In this post, I want to show you how you can find and fix 3 of them.
If you like this post, make sure to read to the end of it to download a free cheat sheet with even more Hibernate performance tuning tips.
2. Find and Fix Performance Problems
2.1. Log SQL Statements in Production
The first performance issue is extremely easy to spot and often ignored. It’s the logging of SQL statements in a production environment.
Writing some log statements doesn’t sound like a big deal, and there are a lot of applications out there which do exactly that. But it is extremely inefficient, especially via System.out.println as Hibernate does it if you set the show_sql parameter in your Hibernate configuration to true:
Hibernate: select order0_.id as id1_2_, order0_.orderNumber as orderNum2_2_, order0_.version as version3_2_ from purchaseOrder order0_ Hibernate: select items0_.order_id as order_id4_0_0_, items0_.id as id1_0_0_, items0_.id as id1_0_1_, items0_.order_id as order_id4_0_1_, items0_.product_id as product_5_0_1_, items0_.quantity as quantity2_0_1_, items0_.version as version3_0_1_ from OrderItem items0_ where items0_.order_id=? Hibernate: select items0_.order_id as order_id4_0_0_, items0_.id as id1_0_0_, items0_.id as id1_0_1_, items0_.order_id as order_id4_0_1_, items0_.product_id as product_5_0_1_, items0_.quantity as quantity2_0_1_, items0_.version as version3_0_1_ from OrderItem items0_ where items0_.order_id=? Hibernate: select items0_.order_id as order_id4_0_0_, items0_.id as id1_0_0_, items0_.id as id1_0_1_, items0_.order_id as order_id4_0_1_, items0_.product_id as product_5_0_1_, items0_.quantity as quantity2_0_1_, items0_.version as version3_0_1_ from OrderItem items0_ where items0_.order_id=?
In one of my projects, I improved the performance by 20% within a few minutes by setting show_sql to false. That’s the kind of achievement you like to report in the next stand-up meeting
It’s pretty obvious how you can fix this performance issue. Just open your configuration (e.g. your persistence.xml file) and set the show_sql parameter to false. You don’t need this information in production anyways.
But you might need them during development. If you don’t, you use 2 different Hibernate configurations (which you shouldn’t) you deactivated the SQL statement logging there as well. The solution for that is to use 2 different log configurations for development and production which are optimized for the specific requirements of the runtime environment.
Development Configuration
The development configuration should provide as many useful information as possible so that you can see how Hibernate interacts with the database. You should therefore at least log the generated SQL statements in your development configuration. You can do this by activating DEBUG message for the org.hibernate.SQL category. If you also want to see the values of your bind parameters, you have to set the log level of org.hibernate.type.descriptor.sql to TRACE:
log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender.stdout.Target=System.out log4j.appender.stdout.layout=org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c] - %m%n log4j.rootLogger=info, stdout # basic log level for all messages log4j.logger.org.hibernate=info # SQL statements and parameters log4j.logger.org.hibernate.SQL=debug log4j.logger.org.hibernate.type.descriptor.sql=trace
The following code snippet shows some example log messages which Hibernate writes with this log configuration. As you can see, you get detailed information about the executed SQL query and all set and retrieved parameter values:
23:03:22,246 DEBUG SQL:92 - select order0_.id as id1_2_, order0_.orderNumber as orderNum2_2_, order0_.version as version3_2_ from purchaseOrder order0_ where order0_.id=1 23:03:22,254 TRACE BasicExtractor:61 - extracted value ([id1_2_] : [BIGINT]) - [1] 23:03:22,261 TRACE BasicExtractor:61 - extracted value ([orderNum2_2_] : [VARCHAR]) - [order1] 23:03:22,263 TRACE BasicExtractor:61 - extracted value ([version3_2_] : [INTEGER]) - [0]
Hibernate provides you with a lot more internal information about a Session if you activate the Hibernate statistics. You can do this by setting the system property hibernate.generate_statistics to true.
But please, only activate the statistics on your development or test environment. Collecting all these information slows down your application and you might create your performance problems yourself if you activate it this in production.
You can see some example statistics in the following code snippet:
23:04:12,123 INFO StatisticalLoggingSessionEventListener:258 - Session Metrics { 23793 nanoseconds spent acquiring 1 JDBC connections; 0 nanoseconds spent releasing 0 JDBC connections; 394686 nanoseconds spent preparing 4 JDBC statements; 2528603 nanoseconds spent executing 4 JDBC statements; 0 nanoseconds spent executing 0 JDBC batches; 0 nanoseconds spent performing 0 L2C puts; 0 nanoseconds spent performing 0 L2C hits; 0 nanoseconds spent performing 0 L2C misses; 9700599 nanoseconds spent executing 1 flushes (flushing a total of 9 entities and 3 collections); 42921 nanoseconds spent executing 1 partial-flushes (flushing a total of 0 entities and 0 collections) }
I regularly use these statistics in my daily work to find performance issues before they occur in production and I could write several posts just about that. So let’s just focus on the most important ones.
The lines 2 to 5 show you how many JDBC connections and statements Hibernate used during this session and how much time it spent on it. You should always have a look at these values and compare them with your expectations.
If there are a lot more statements than you expected, you most likely have the most common performance problem, the n+1 select issue. You can find it in nearly all applications, and it might create huge performance issues on a bigger database. I explain this issue in more details in the next section.
The lines 7 to 9 show how Hibernate interacted with the 2nd level cache. This is one of Hibernate’s 3 caches, and it stores entities in a session independent way. If you use the 2nd level in your application, you should always monitor these statistics to see if Hibernate gets the entities from there.
Production Configuration
The production configuration should be optimized for performance and avoid any messages that are not urgently required. In general, that means that you should only log error messages. If you use Log4j, you can achieve that with the following configuration:
If you use Log4j, you can achieve that with the following configuration:
log4j.appender.stdout=org.apache.log4j.ConsoleAppender log4j.appender.stdout.Target=System.out log4j.appender.stdout.layout=org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p [%c] - %m%n log4j.rootLogger=info, stdout # basic log level for all messages log4j.logger.org.hibernate=error
2.2. N+1 Select Issue
As I already explained, the n+1 select issue is the most common performance problem. A lot of developers blame the OR-Mapping concept for this issue, and they’re not completely wrong. But you can easily avoid it if you understand how Hibernate treats lazily fetched relationships. The developer is, therefore, to blame as well because it’s his responsibility to avoid these kinds of issues. So let me explain first why this issue exists and then show you an easy way to prevent it. If you are already familiar with the n+1 select issues, you can jump directly to the solution.
Hibernate provides a very convenient mapping for relationships between entities. You just need an attribute with the type of the related entity and a few annotations to define it:
@Entity @Table(name = "purchaseOrder") public class Order implements Serializable { @OneToMany(mappedBy = "order", fetch = FetchType.LAZY) private Set<OrderItem> items = new HashSet<OrderItem>(); ... }
When you now load an Order entity from the database, you just need to call the getItems() method to get all items of this order. Hibernate hides the required database queries to get the related OrderItem entities from the database.
When you started with Hibernate, you probably learned that you should use FetchType.LAZY for most of the relationships and that it’s the default for to-many relationships. This tells Hibernate to only fetch the related entities if you use the attribute which maps the relationship. Fetching only the data you need is a good thing in general, but it also requires Hibernate to execute an additional query to initialize each relationship. This can result in a huge number of queries, if you work on a list of entities, like I do in the following code snippet:
List<Order> orders = em.createQuery("SELECT o FROM Order o").getResultList(); for (Order order : orders) { log.info("Order: " + order.getOrderNumber()); log.info("Number of items: " + order.getItems().size()); }
You probably wouldn’t expect that this few lines of code can create hundreds or even thousands of database queries. But it does if you use FetchType.LAZY for the relationship to the OrderItem entity:
22:47:30,065 DEBUG SQL:92 - select order0_.id as id1_2_, order0_.orderNumber as orderNum2_2_, order0_.version as version3_2_ from purchaseOrder order0_ 22:47:30,136 INFO NamedEntityGraphTest:58 - Order: order1 22:47:30,140 DEBUG SQL:92 - select items0_.order_id as order_id4_0_0_, items0_.id as id1_0_0_, items0_.id as id1_0_1_, items0_.order_id as order_id4_0_1_, items0_.product_id as product_5_0_1_, items0_.quantity as quantity2_0_1_, items0_.version as version3_0_1_ from OrderItem items0_ where items0_.order_id=? 22:47:30,171 INFO NamedEntityGraphTest:59 - Number of items: 2 22:47:30,171 INFO NamedEntityGraphTest:58 - Order: order2 22:47:30,172 DEBUG SQL:92 - select items0_.order_id as order_id4_0_0_, items0_.id as id1_0_0_, items0_.id as id1_0_1_, items0_.order_id as order_id4_0_1_, items0_.product_id as product_5_0_1_, items0_.quantity as quantity2_0_1_, items0_.version as version3_0_1_ from OrderItem items0_ where items0_.order_id=? 22:47:30,174 INFO NamedEntityGraphTest:59 - Number of items: 2 22:47:30,174 INFO NamedEntityGraphTest:58 - Order: order3 22:47:30,174 DEBUG SQL:92 - select items0_.order_id as order_id4_0_0_, items0_.id as id1_0_0_, items0_.id as id1_0_1_, items0_.order_id as order_id4_0_1_, items0_.product_id as product_5_0_1_, items0_.quantity as quantity2_0_1_, items0_.version as version3_0_1_ from OrderItem items0_ where items0_.order_id=? 22:47:30,176 INFO NamedEntityGraphTest:59 - Number of items: 2
Hibernate performs one query to get all Order entities and an additional one for each of the n Order entities to initialize the orderItem relationship. So you now know why this kind of issue is called n+1 select issue and why it can create huge performance problems.
What makes it even worse is, that you often don’t recognize it on a small test database, if you haven’t checked your Hibernate statistics. The code snippet requires only a few dozen queries if the test database doesn’t contain a lot of orders. But that will be completely different if you use your productive database which contains several thousand of them.
I said earlier that you can easily avoid these issues. And that’s true. You just have to initialize the orderItem relationship when you select the Order entities from the database.
But please, only do that, if you use the relationship in your business code and don’t use FetchType.EAGER to always fetch the related entities. That just replaces your n+1 issue with another performance problem.
Initialize a Relationships with a @NamedEntityGraph
There are several different options to initialize relationships. I prefer to use a @NamedEntityGraph which is is one of my favorite features introduced in JPA 2.1. It provides a query independent way to specify a graph of entities which Hibernate shall fetch from the database. In following code snippet, you can see an example of a simple graph that lets Hibernate eagerly fetch the items attribute of an entity:
@Entity @Table(name = "purchase_order") @NamedEntityGraph( name = "graph.Order.items", attributeNodes = @NamedAttributeNode("items")) public class Order implements Serializable { ... }
There isn’t much you need to do to define an entity graph with a @NamedEntityGraph annotation. You just have to provide a unique name for the graph and one @NamedAttributeNode annotation for each attribute Hibernate shall fetch eagerly. In this example, it’s only the items attribute which maps the relationship between an Order and several OrderItem entities.
Now you can use the entity graph to control the fetching behaviour or a specific query. You, therefore, have to instantiate an EntityGraph based on the @NamedEntityGraph definition and provide it as a hint to the EntityManager.find() method or your query. I do this in the following code snippet where I select the Order entity with id 1 from the database:
EntityGraph graph = this.em.getEntityGraph("graph.Order.items"); Map hints = new HashMap(); hints.put("javax.persistence.fetchgraph", graph); return this.em.find(Order.class, 1L, hints);
Hibernate uses this information to create one SQL statement which gets the attributes of the Order entity and the attributes of the entity graph from the database:
17:34:51,310 DEBUG [org.hibernate.loader.plan.build.spi.LoadPlanTreePrinter] (pool-2-thread-1) LoadPlan(entity=blog.thoughts.on.java.jpa21.entity.graph.model.Order) - Returns - EntityReturnImpl( entity=blog.thoughts.on.java.jpa21.entity.graph.model.Order, querySpaceUid=<gen:0>, path=blog.thoughts.on.java.jpa21.entity.graph.model.Order) - CollectionAttributeFetchImpl( collection=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items, querySpaceUid=<gen:1>, path=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items) - (collection element) CollectionFetchableElementEntityGraph( entity=blog.thoughts.on.java.jpa21.entity.graph.model.OrderItem, querySpaceUid=<gen:2>, path=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items.<elements>) - EntityAttributeFetchImpl(entity=blog.thoughts.on.java.jpa21.entity.graph.model.Product, querySpaceUid=<gen:3>, path=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items.<elements>.product) - QuerySpaces - EntityQuerySpaceImpl(uid=<gen:0>, entity=blog.thoughts.on.java.jpa21.entity.graph.model.Order) - SQL table alias mapping - order0_ - alias suffix - 0_ - suffixed key columns - {id1_2_0_} - JOIN (JoinDefinedByMetadata(items)) : <gen:0> -> <gen:1> - CollectionQuerySpaceImpl(uid=<gen:1>, collection=blog.thoughts.on.java.jpa21.entity.graph.model.Order.items) - SQL table alias mapping - items1_ - alias suffix - 1_ - suffixed key columns - {order_id4_2_1_} - entity-element alias suffix - 2_ - 2_entity-element suffixed key columns - id1_0_2_ - JOIN (JoinDefinedByMetadata(elements)) : <gen:1> -> <gen:2> - EntityQuerySpaceImpl(uid=<gen:2>, entity=blog.thoughts.on.java.jpa21.entity.graph.model.OrderItem) - SQL table alias mapping - items1_ - alias suffix - 2_ - suffixed key columns - {id1_0_2_} - JOIN (JoinDefinedByMetadata(product)) : <gen:2> -> <gen:3> - EntityQuerySpaceImpl(uid=<gen:3>, entity=blog.thoughts.on.java.jpa21.entity.graph.model.Product) - SQL table alias mapping - product2_ - alias suffix - 3_ - suffixed key columns - {id1_1_3_} 17:34:51,311 DEBUG [org.hibernate.loader.entity.plan.EntityLoader] (pool-2-thread-1) Static select for entity blog.thoughts.on.java.jpa21.entity.graph.model.Order [NONE:-1]: select order0_.id as id1_2_0_, order0_.orderNumber as orderNum2_2_0_, order0_.version as version3_2_0_, items1_.order_id as order_id4_2_1_, items1_.id as id1_0_1_, items1_.id as id1_0_2_, items1_.order_id as order_id4_0_2_, items1_.product_id as product_5_0_2_, items1_.quantity as quantity2_0_2_, items1_.version as version3_0_2_, product2_.id as id1_1_3_, product2_.name as name2_1_3_, product2_.version as version3_1_3_ from purchase_order order0_ left outer join OrderItem items1_ on order0_.id=items1_.order_id left outer join Product product2_ on items1_.product_id=product2_.id where order0_.id=?
Initializing only one relationship is good enough for a blog post but in a real project, you will most likely want to build more complex graphs. So let’s do that.
You can, of course, provide an array of @NamedAttributeNode annotations to fetch multiple attributes of the same entity and you can use @NamedSubGraph to define the fetching behaviour for an additional level of entities. I use that in the following code snippet to fetch not only all related OrderItem entities but also the Product entity for each OrderItem:
@Entity @Table(name = "purchase_order") @NamedEntityGraph( name = "graph.Order.items", attributeNodes = @NamedAttributeNode(value = "items", subgraph = "items"), subgraphs = @NamedSubgraph(name = "items", attributeNodes = @NamedAttributeNode("product"))) public class Order implements Serializable { ... }
As you can see, the definition of a @NamedSubGraph is very similar to the definition of a @NamedEntityGraph. You can then reference this subgraph in a @NamedAttributeNode annotation to define the fetching behaviour for this specific attribute.
The combination of these annotations allows you to define complex entity graphs which you can use to initialize all relationships you use in your use case and avoid n+1 select issues. If you want to specify your entity graph dynamically at runtime, you can do this also via a Java API.
2.3. Update Entities One by One
Updating entities one by one feels very natural if you think in an object oriented way. You just get the entities you want to update and call a few setter methods to change their attributes like you do it with any other object.
This approach works fine if you only change a few entities. But it gets very inefficient when you work with a list of entities and is the third performance issues you can easily spot in your log file. You just have to look for a bunch SQL UPDATE statements that look completely the same, as you can see in the following log file:
22:58:05,829 DEBUG SQL:92 - select product0_.id as id1_1_, product0_.name as name2_1_, product0_.price as price3_1_, product0_.version as version4_1_ from Product product0_ 22:58:05,883 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=? 22:58:05,889 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=? 22:58:05,891 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=? 22:58:05,893 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=? 22:58:05,900 DEBUG SQL:92 - update Product set name=?, price=?, version=? where id=? and version=?
The relational representation of the database records is a much better fit for these use cases than the object oriented one. With SQL, you could just write one SQL statement that updates all the records you want to change.
You can do the same with Hibernate if you use JPQL, native SQL or the CriteriaUpdate API. All 3 of very similar, so let’s use JPQL in this example.
You can define a JPQL UPDATE statement in a similar way as you know it from SQL. You just define which entity you want to update, how to change the values of its attributes and limit the affected entities in the WHERE statement.
You can see an example of it in the following code snippet where I increase the price of all products by 10%:
em.createQuery("UPDATE Product p SET p.price = p.price*0.1").executeUpdate();
Hibernate creates an SQL UPDATE statement based on the JPQL statement and sends it to the database which performs the update operation.
It’s pretty obvious that this approach is a lot faster if you have to update a huge number entities. But it also has a drawback. Hibernate doesn’t know which entities are affected by the update operation and doesn’t update its 1st level cache. You should, therefore, make sure not to read and update an entity with a JPQL statement within the same Hibernate Session or you have to detach it to remove it from the cache.
3. Summary and Performance Cheat Sheet
Within this post, I’ve shown you 3 Hibernate performance issues which you can find in your log files.
2 of them were caused by a huge number of SQL statements. This a common reason for performance issues, if you work with Hibernate. Hibernate hides the database access behind its API and that makes it often difficult to guess the actual number of SQL statements. You should therefore always check the executed SQL statements when you make a change to your persistence tier.
If you like to learn more Hibernate performance tuning, make sure to download my cheat sheet with 7 Hibernate Performance Tuning Tips.