1. Overview
Going out to the database is expensive. We may be able to improve performance and consistency by batching multiple inserts into one.
In this tutorial, we’ll look at how to do this with Spring Data JPA.
2. Spring JPA Repository
First, we’ll need a simple entity. Let’s call it Customer:
@Entity
public class Customer {

    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;

    private String firstName;
    private String lastName;

    // constructor, getters, setters
}
And then, we need our repository:
public interface CustomerRepository extends CrudRepository<Customer, Long> { }
This exposes a saveAll method for us, which persists a whole collection of entities in one call and, once Hibernate is configured for it, batches several inserts into one.
So, let’s leverage that in a controller:
@RestController
public class CustomerController {

    @Autowired
    CustomerRepository customerRepository;

    @PostMapping("/customers")
    public ResponseEntity<String> insertCustomers() {
        Customer c1 = new Customer("James", "Gosling");
        Customer c2 = new Customer("Doug", "Lea");
        Customer c3 = new Customer("Martin", "Fowler");
        Customer c4 = new Customer("Brian", "Goetz");
        List<Customer> customers = Arrays.asList(c1, c2, c3, c4);

        // persist the whole list with a single repository call
        customerRepository.saveAll(customers);

        // created() takes a java.net.URI and returns a builder, so we finish with build()
        return ResponseEntity.created(URI.create("/customers")).build();
    }

    // ... @GetMapping to read customers
}
3. Testing Our Endpoint
Testing our code is simple with MockMvc:
@Autowired
private MockMvc mockMvc;

@Test
public void whenInsertingCustomers_thenCustomersAreCreated() throws Exception {
    this.mockMvc.perform(post("/customers"))
      .andExpect(status().isCreated());
}
4. Are We Sure We’re Batching?
Actually, there is just a bit more configuration to do. Let’s do a quick demo to illustrate the difference.
First, let’s add the following property to application.properties to see some statistics:
spring.jpa.properties.hibernate.generate_statistics=true
At this point, if we run the test, we’ll see stats like the following:
11232586 nanoseconds spent preparing 4 JDBC statements;
4076610 nanoseconds spent executing 4 JDBC statements;
0 nanoseconds spent executing 0 JDBC batches;
So, we created four customers, which is great, but note that none of them were inside a batch.
The reason is that batching isn’t switched on by default in some cases.
In our case, it’s because we are using id auto-generation. So, by default, saveAll does each insert separately.
So, let’s switch it on:
spring.jpa.properties.hibernate.jdbc.batch_size=4
spring.jpa.properties.hibernate.order_inserts=true
The first property tells Hibernate to collect inserts in batches of four. The order_inserts property tells Hibernate to take the time to group inserts by entity, creating larger batches.
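To see why grouping inserts by entity produces larger batches, here is a minimal sketch in plain Java. It is a toy model rather than Hibernate's actual implementation: it assumes a batch ends whenever the entity type changes or the batch size is reached, and it orders inserts by simple name sorting, whereas Hibernate also has to respect dependencies between entities. The Address entity and the class and method names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of JDBC insert batching; not Hibernate's actual implementation. */
final class InsertBatchingSketch {

    /**
     * Counts the batches needed for a sequence of pending inserts, where each
     * element names the entity type being inserted. A new batch starts when the
     * entity type (and so the SQL statement) changes or the batch is full.
     */
    static int countBatches(List<String> entityTypes, int batchSize) {
        int batches = 0;
        int currentSize = 0;
        String currentType = null;
        for (String type : entityTypes) {
            boolean sameStatement = type.equals(currentType);
            if (!sameStatement || currentSize == batchSize) {
                batches++;            // open a new batch
                currentSize = 0;
                currentType = type;
            }
            currentSize++;
        }
        return batches;
    }

    /** Mimics the effect of order_inserts by grouping pending inserts by entity type. */
    static List<String> orderByEntity(List<String> entityTypes) {
        List<String> sorted = new ArrayList<>(entityTypes);
        sorted.sort(Comparator.naturalOrder()); // stable sort: order within a type is kept
        return sorted;
    }

    public static void main(String[] args) {
        // "Address" is a hypothetical second entity, used only for illustration
        List<String> pending = Arrays.asList("Customer", "Address", "Customer", "Address");
        System.out.println(countBatches(pending, 4));                // prints 4
        System.out.println(countBatches(orderByEntity(pending), 4)); // prints 2
    }
}
```

In this model, interleaved inserts for two entity types need four batches, while the same inserts grouped by entity need only two. With our four Customer inserts and a batch size of four, the model gives a single batch.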
So, the second time we run our test, we’ll see the inserts were batched:
16577314 nanoseconds spent preparing 4 JDBC statements;
2207548 nanoseconds spent executing 4 JDBC statements;
2003005 nanoseconds spent executing 1 JDBC batches;
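If we'd rather not read these numbers by eye, a small helper can pull the batch count out of the statistics text. The sketch below is a hypothetical utility, not part of Hibernate or Spring, and it assumes the log line has exactly the wording shown in this tutorial, which may differ between Hibernate versions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that reads the batch count from Hibernate's statistics text. */
final class BatchStatsParser {

    // Assumes the wording shown in this tutorial's log output
    private static final Pattern BATCHES =
            Pattern.compile("spent executing (\\d+) JDBC batches");

    /** Returns the number of executed JDBC batches, or -1 if the line is absent. */
    static int executedBatches(String statisticsLog) {
        Matcher m = BATCHES.matcher(statisticsLog);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String before = "0 nanoseconds spent executing 0 JDBC batches;";
        String after = "2003005 nanoseconds spent executing 1 JDBC batches;";
        System.out.println(executedBatches(before)); // prints 0
        System.out.println(executedBatches(after));  // prints 1
    }
}
```

Comparing the two runs this way makes the change explicit: zero batches before the configuration, one batch after it.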
We can apply the same approach to deletes and updates (remembering that Hibernate also has an order_updates property).
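As a rough sketch of what that combined configuration could look like, the helper below collects the three Hibernate settings in a map and prints them in application.properties form. The class and method names are hypothetical. The keys are the Hibernate property names, which Spring Boot passes through when they are prefixed with spring.jpa.properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that gathers the batching settings discussed in this tutorial. */
final class BatchingProperties {

    /** Builds the Hibernate properties for batching both inserts and updates. */
    static Map<String, String> batchingProperties(int batchSize) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hibernate.jdbc.batch_size", String.valueOf(batchSize));
        props.put("hibernate.order_inserts", "true"); // group inserts by entity
        props.put("hibernate.order_updates", "true"); // group updates by entity
        return props;
    }

    public static void main(String[] args) {
        // Print each setting as it would appear in application.properties
        batchingProperties(4).forEach((key, value) ->
                System.out.println("spring.jpa.properties." + key + "=" + value));
    }
}
```

Whether updates and deletes actually end up batched still depends on the mapping and the database, so it's worth checking the statistics again after changing these settings.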
5. Conclusion
With the ability to batch inserts, we can see some performance gains.
We, of course, need to be aware that batching is automatically disabled in some cases, and we should check and plan for this before we ship.
Make sure to check out all these code snippets over on GitHub.