Hibernate could not initialize proxy – no Session

1. Overview

While working with Hibernate, we might encounter an error that says: org.hibernate.LazyInitializationException: could not initialize proxy – no Session.

In this quick tutorial, we’ll take a closer look at the root cause of the error and learn how to avoid it.

2. Understanding the Error

Access to a lazy-loaded object outside of the context of an open Hibernate session will result in this exception.

It's important to understand what Session, Lazy Initialization, and Proxy Object are, and how they come together in the Hibernate framework.

  • Session is a persistence context that represents a conversation between an application and the database
  • Lazy Loading means that the object will not be loaded into the Session context until it is accessed in code
  • Hibernate creates a dynamic Proxy Object subclass that will hit the database only when we first use the object

The error means that we're trying to fetch a lazy-loaded object from the database through its proxy, but the Hibernate Session is already closed.

3. Example for LazyInitializationException

Let's see the exception in a concrete scenario.

We want to create a simple User object with associated roles. Let's use JUnit to demonstrate the LazyInitializationException error.

3.1. Hibernate Utility Class

First, let's define a HibernateUtil class to create a SessionFactory with configuration.

We’ll use the in-memory HSQLDB database.
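
The exact setup isn't important for reproducing the error; a minimal sketch of such a utility class (assuming a hibernate.cfg.xml that holds the HSQLDB connection settings) could look like this:

public class HibernateUtil {

    private static final SessionFactory SESSION_FACTORY = buildSessionFactory();

    private static SessionFactory buildSessionFactory() {
        // reads hibernate.cfg.xml (HSQLDB settings) and registers our entities
        Configuration configuration = new Configuration().configure();
        configuration.addAnnotatedClass(User.class);
        configuration.addAnnotatedClass(Role.class);
        return configuration.buildSessionFactory();
    }

    public static SessionFactory getSessionFactory() {
        return SESSION_FACTORY;
    }
}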

3.2. Entities

Here's our User entity:

@Entity
@Table(name = "user")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private int id;

    @Column(name = "first_name")
    private String firstName;
    
    @Column(name = "last_name")
    private String lastName;
    
    @OneToMany
    private Set<Role> roles;
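    // constructors, addRole(Role), and getters/setters omitted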
    
}

And the associated Role entity:

@Entity
@Table(name = "role")
public class Role {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id")
    private int id;

    @Column(name = "role_name")
    private String roleName;
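    // constructor and getters/setters omitted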
}

As we can see, there is a one-to-many relationship between User and Role.

3.3. Creating User with Roles

Next, let's create two Role objects:

Role admin = new Role("Admin");
Role dba = new Role("DBA");

Then, we create a User with the roles:

User user = new User("Bob", "Smith");
user.addRole(admin);
user.addRole(dba);

Finally, we can open a session and persist the objects:

Session session = sessionFactory.openSession();
session.beginTransaction();
user.getRoles().forEach(role -> session.save(role));
session.save(user);
session.getTransaction().commit();
session.close();
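
The tests below call a createUserWithRoles() helper; a sketch that simply bundles the steps above and returns the (now detached) User might look like this:

private User createUserWithRoles() {
    Role admin = new Role("Admin");
    Role dba = new Role("DBA");

    User user = new User("Bob", "Smith");
    user.addRole(admin);
    user.addRole(dba);

    Session session = sessionFactory.openSession();
    session.beginTransaction();
    user.getRoles().forEach(role -> session.save(role));
    session.save(user);
    session.getTransaction().commit();
    session.close();

    return user;
}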

3.4. Fetching Roles

In the first scenario, we'll see how to fetch user roles in a proper way:

@Test
public void whenAccessUserRolesInsideSession_thenSuccess() {

    User detachedUser = createUserWithRoles();

    Session session = sessionFactory.openSession();
    session.beginTransaction();
		
    User persistentUser = session.find(User.class, detachedUser.getId());
		
    Assert.assertEquals(2, persistentUser.getRoles().size());
		
    session.getTransaction().commit();
    session.close();
}

Here, we access the roles collection inside the session; therefore, there's no error.

3.5. Fetching Roles Failure

In the second scenario, we'll call the getRoles method outside the session:

@Test
public void whenAccessUserRolesOutsideSession_thenThrownException() {
		
    User detachedUser = createUserWithRoles();

    Session session = sessionFactory.openSession();
    session.beginTransaction();
		
    User persistentUser = session.find(User.class, detachedUser.getId());
		
    session.getTransaction().commit();
    session.close();

    thrown.expect(LazyInitializationException.class);
    System.out.println(persistentUser.getRoles().size());
}
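
Note that thrown refers to JUnit 4's ExpectedException rule, which is declared on the test class (not shown here).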

In that case, we try to access the roles after the session was closed, and, as a result, the code throws a LazyInitializationException.

4. How to Avoid the Error

Let's take a look at four different solutions to overcome the error.

4.1. Open Session in Upper Layer

The best practice is to open a session in the persistence layer, for example by using the DAO pattern.

However, we can also open the session in upper layers to access the associated objects in a safe manner. For example, we can open the session in the view layer.

As a result, we’ll see an increase in response time, which will affect the performance of the application.

This solution is an anti-pattern in terms of the Separation of Concerns principle. In addition, it can cause data integrity violations and long-running transactions.

4.2. Turning On enable_lazy_load_no_trans Property

This Hibernate property is used to declare a global policy for lazy-loaded object fetching.

By default, this property is false. Turning it on means that each access to an associated lazy-loaded entity will be wrapped in a new session running in a new transaction:

<property name="hibernate.enable_lazy_load_no_trans" value="true"/>

Using this property to avoid the LazyInitializationException error is not recommended, since it will slow down our application's performance. This is because we'll end up with an N+1 problem: simply put, one SELECT for the User and N additional SELECTs to fetch the roles of each user.

This approach is not efficient and also considered an anti-pattern.

4.3. Using the FetchType.EAGER Strategy

We can use this strategy along with a @OneToMany annotation, for example:

@OneToMany(fetch = FetchType.EAGER)
@JoinColumn(name = "user_id")
private Set<Role> roles;

This is a compromise that suits a particular situation: when we need to fetch the associated collection in most of our use cases.

In that situation, it's much easier to declare the EAGER fetch type than to explicitly fetch the collection in most of the different business flows.

4.4. Using Join Fetching

We can use a JOIN FETCH directive in JPQL to fetch the associated collection on-demand, for example:

SELECT u FROM User u JOIN FETCH u.roles

Or we can use the Hibernate Criteria API:

Criteria criteria = session.createCriteria(User.class);
criteria.setFetchMode("roles", FetchMode.EAGER);

Here, we specify the associated collection that should be fetched from the database along with the User object on the same round trip. Using this query improves the efficiency of iteration since it eliminates the need for retrieving the associated objects separately.

This is the most efficient and fine-grained solution to avoid the LazyInitializationException error.

5. Conclusion

In this article, we saw how to deal with the org.hibernate.LazyInitializationException: could not initialize proxy – no Session error.

We explored different approaches along with performance issues. It's important to use a simple and efficient solution to avoid affecting performance.

Finally, we saw how the join-fetching approach is a good way to avoid the error.

As always, the code is available over on GitHub.


The <init> and <clinit> Methods in the JVM

1. Overview

The JVM uses two distinct methods to initialize object instances and classes.

In this quick article, we're going to see how the compiler and runtime use the <init> and <clinit> methods for initialization purposes.

2. Instance Initialization Methods

Let's start with a straightforward object allocation and assignment:

Object obj = new Object();

If we compile this snippet and take a look at its bytecode via javap -c, we'll see something like:

0: new           #2      // class java/lang/Object
3: dup
4: invokespecial #1      // Method java/lang/Object."<init>":()V
7: astore_1

To initialize the object, the JVM calls a special method named <init>. In JVM jargon, this method is an instance initialization method. A method is an instance initialization method if and only if:

  • It is defined in a class
  • Its name is <init>
  • It returns void

Each class can have zero or more instance initialization methods. These methods usually correspond to constructors in JVM-based programming languages such as Java or Kotlin.

2.1. Constructors and Instance Initializer Blocks

To better understand how the Java compiler translates constructors to <init>, let's consider another example:

public class Person {
    
    private String firstName = "Foo"; // <init>
    private String lastName = "Bar"; // <init>
    
    // <init>
    {
        System.out.println("Initializing...");
    }

    // <init>
    public Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
    
    // <init>
    public Person() {
    }
}

This is the bytecode for this class:

public Person(java.lang.String, java.lang.String);
  Code:
     0: aload_0
     1: invokespecial #1       // Method java/lang/Object."<init>":()V
     4: aload_0
     5: ldc           #7       // String Foo
     7: putfield      #9       // Field firstName:Ljava/lang/String;
    10: aload_0
    11: ldc           #15      // String Bar
    13: putfield      #17      // Field lastName:Ljava/lang/String;
    16: getstatic     #20      // Field java/lang/System.out:Ljava/io/PrintStream;
    19: ldc           #26      // String Initializing...
    21: invokevirtual #28      // Method java/io/PrintStream.println:(Ljava/lang/String;)V
    24: aload_0
    25: aload_1
    26: putfield      #9       // Field firstName:Ljava/lang/String;
    29: aload_0
    30: aload_2
    31: putfield      #17      // Field lastName:Ljava/lang/String;
    34: return

Even though the constructor and the initializer blocks are separate in Java, they are in the same instance initialization method at the bytecode level. As a matter of fact, this <init> method:

  • First, it calls the superclass constructor and initializes the firstName and lastName fields (index 0 through 13)
  • Then, it prints something to the console as part of the instance initializer block (index 16 through 21)
  • And finally, it updates the instance variables with the constructor arguments (index 24 through 31)

If we create a Person as follows:

Person person = new Person("Brian", "Goetz");

Then this translates to the following bytecode:

0: new           #7        // class Person
3: dup
4: ldc           #9        // String Brian
6: ldc           #11       // String Goetz
8: invokespecial #13       // Method Person."<init>":(Ljava/lang/String;Ljava/lang/String;)V
11: astore_1

This time, the JVM calls another <init> method with a signature corresponding to the Java constructor.

The key takeaway here is that the constructors and other instance initializers are equivalent to the <init> method in the JVM world.

3. Class Initialization Methods

In Java, static initializer blocks are useful when we're going to initialize something at the class level:

public class Person {

    private static final Logger LOGGER = LoggerFactory.getLogger(Person.class); // <clinit>

    // <clinit>
    static {
        System.out.println("Static Initializing...");
    }

    // omitted
}

When we compile the preceding code, the compiler translates the static block to a class initialization method at the bytecode level.

Put simply, a method is a class initialization one if and only if:

  • Its name is <clinit>
  • It returns void

Therefore, the only way to generate a <clinit> method in Java is to use static fields and static block initializers.

The JVM invokes <clinit> the first time we use the corresponding class, as the sketch below illustrates. Therefore, the <clinit> invocation happens at runtime, and we can't see the invocation at the bytecode level.
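
To see this lazy class initialization in action, here's a small demo (not part of the original example) where the static initializer of a nested class runs only at its first use:

public class ClinitDemo {

    static class Lazy {
        // <clinit> of Lazy runs the static block and then initializes VALUE
        static {
            System.out.println("<clinit> of Lazy is running");
        }
        static final int VALUE = compute();

        static int compute() {
            return 42;
        }
    }

    public static void main(String[] args) {
        System.out.println("Before first use of Lazy");
        // the first access to Lazy.VALUE triggers Lazy's <clinit> right here
        System.out.println(Lazy.VALUE);
    }
}

Running it prints "Before first use of Lazy" before the static initializer message, confirming that <clinit> runs lazily.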

4. Conclusion

In this quick article, we saw the difference between <init> and <clinit> methods in the JVM. The <init> method is used to initialize object instances.  Also, the JVM invokes the <clinit> method to initialize a class whenever necessary.

To better understand how initialization works in the JVM, it's highly recommended to read the JVM specification.

Using Kafka MockProducer

1. Overview

Kafka is a message processing system built around a distributed messaging queue. It provides a Java library so that applications can write data to, or read data from, a Kafka topic.

Now, since most of the business domain logic is validated through unit tests, applications generally mock all I/O operations in JUnit. Kafka also provides a MockProducer to mock a producer application.

In this tutorial, we'll first implement a Kafka producer application. Later, we'll implement a unit test to verify common producer operations with MockProducer.

2. Maven Dependencies

Before we implement a producer application, we'll add a Maven dependency for kafka-clients:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.5.0</version>
</dependency>

3. MockProducer

The kafka-clients library provides the Java APIs for publishing and consuming messages in Kafka. Producer applications can use these APIs to send key-value records to a Kafka topic:

public class KafkaProducer {

    private final Producer<String, String> producer;

    public KafkaProducer(Producer<String, String> producer) {
        this.producer = producer;
    }

    public Future<RecordMetadata> send(String key, String value) {
        ProducerRecord<String, String> record = new ProducerRecord<>("topic_sports_news", key, value);
        return producer.send(record);
    }
}

Any Kafka producer must implement the Producer interface in the client's library. Kafka also provides a KafkaProducer class, which is a concrete implementation that performs the I/O operations towards a Kafka broker.

Furthermore, Kafka provides a MockProducer that implements the same Producer interface and mocks all I/O operations implemented in the KafkaProducer:

@Test
void givenKeyValue_whenSend_thenVerifyHistory() {

    MockProducer<String, String> mockProducer = new MockProducer<>(true, new StringSerializer(), new StringSerializer());

    kafkaProducer = new KafkaProducer(mockProducer);
    Future<RecordMetadata> recordMetadataFuture = kafkaProducer.send("soccer", 
      "{\"site\" : \"baeldung\"}");

    assertTrue(mockProducer.history().size() == 1);
}

Although such I/O operations can also be mocked with Mockito, MockProducer gives us access to a lot of features that we would otherwise need to implement on top of our mock. One such feature is the history() method. MockProducer caches the records for which send() is called, thereby allowing us to validate the publish behavior of the producer.

Moreover, we can also validate the metadata like topic name, partition, record key, or value:

assertTrue(mockProducer.history().get(0).key().equalsIgnoreCase("soccer"));
assertTrue(recordMetadataFuture.get().partition() == 0);

4. Mocking a Kafka Cluster

In our mocked tests so far, we've assumed a topic with just one partition. However, for achieving maximum concurrency between producer and consumer threads, Kafka topics are usually split into multiple partitions.

This allows producers to write data into multiple partitions. This is usually achieved by partitioning the records based on key and mapping specific keys to a particular partition:

public class EvenOddPartitioner extends DefaultPartitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes, Object value, 
      byte[] valueBytes, Cluster cluster) {
        if (((String)key).length() % 2 == 0) {
            return 0;
        }
        return 1;
    }
}

Because of this, all even-length keys will be published to partition “0” and, likewise, odd-length keys to partition “1”.

MockProducer enables us to validate such partition assignment algorithms by mocking the Kafka cluster with multiple partitions:

@Test
void givenKeyValue_whenSendWithPartitioning_thenVerifyPartitionNumber() 
  throws ExecutionException, InterruptedException {
    PartitionInfo partitionInfo0 = new PartitionInfo(TOPIC_NAME, 0, null, null, null);
    PartitionInfo partitionInfo1 = new PartitionInfo(TOPIC_NAME, 1, null, null, null);
    List<PartitionInfo> list = new ArrayList<>();
    list.add(partitionInfo0);
    list.add(partitionInfo1);

    Cluster cluster = new Cluster("kafkab", new ArrayList<Node>(), list, emptySet(), emptySet());
    this.mockProducer = new MockProducer<>(cluster, true, new EvenOddPartitioner(), 
      new StringSerializer(), new StringSerializer());

    kafkaProducer = new KafkaProducer(mockProducer);
    Future<RecordMetadata> recordMetadataFuture = kafkaProducer.send("partition", 
      "{\"site\" : \"baeldung\"}");

    assertTrue(recordMetadataFuture.get().partition() == 1);
}

We mocked a Cluster with two partitions, 0 and 1. We can then verify that EvenOddPartitioner publishes the record to partition 1.

5. Mocking Errors with MockProducer

So far, we've only mocked the producer to send a record to a Kafka topic successfully. But, what happens if there's an exception when writing a record?

Applications usually handle such exceptions by retrying or throwing the exception to the client.

MockProducer allows us to mock exceptions during send() so that we can validate the exception-handling code:

@Test
void givenKeyValue_whenSend_thenReturnException() {
    MockProducer<String, String> mockProducer = new MockProducer<>(false, 
      new StringSerializer(), new StringSerializer());

    kafkaProducer = new KafkaProducer(mockProducer);
    Future<RecordMetadata> record = kafkaProducer.send("site", "{\"site\" : \"baeldung\"}");
    RuntimeException e = new RuntimeException();
    mockProducer.errorNext(e);

    try {
        record.get();
    } catch (ExecutionException | InterruptedException ex) {
        assertEquals(e, ex.getCause());
    }
    assertTrue(record.isDone());
}

There are two notable things in this code.

First, we called the MockProducer constructor with autoComplete set to false. This tells MockProducer not to complete send() calls on its own, but to wait until we explicitly complete or fail them.

Second, we call mockProducer.errorNext(e) so that MockProducer completes the last send() call with an exception.

6. Mocking Transactional Writes with MockProducer

Kafka 0.11 introduced transactions between Kafka brokers, producers, and consumers. This enabled end-to-end exactly-once message delivery semantics in Kafka. In short, it means that transactional producers can only publish records to a broker with a two-phase commit protocol.

MockProducer also supports transactional writes and allows us to verify this behavior:

@Test
void givenKeyValue_whenSendWithTxn_thenSendOnlyOnTxnCommit() {
    MockProducer<String, String> mockProducer = new MockProducer<>(true, 
      new StringSerializer(), new StringSerializer());

    kafkaProducer = new KafkaProducer(mockProducer);
    kafkaProducer.initTransaction();
    kafkaProducer.beginTransaction();
    Future<RecordMetadata> record = kafkaProducer.send("data", "{\"site\" : \"baeldung\"}");

    assertTrue(mockProducer.history().isEmpty());
    kafkaProducer.commitTransaction();
    assertTrue(mockProducer.history().size() == 1);
}

Since MockProducer also supports the same APIs as the concrete KafkaProducer, it only updates the history once we commit the transaction. Such mocking behavior can help applications validate that commitTransaction() is invoked for every transaction.
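
Note that the KafkaProducer wrapper shown earlier only exposes send(); for this test to compile, it would also need to delegate the transactional operations to the underlying Producer, roughly like this:

public void initTransaction() {
    producer.initTransactions();
}

public void beginTransaction() {
    producer.beginTransaction();
}

public void commitTransaction() {
    producer.commitTransaction();
}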

7. Conclusion

In this article, we looked at the MockProducer class of the kafka-clients library. We discussed that MockProducer implements the same hierarchy as the concrete KafkaProducer and, therefore, we can mock all I/O operations with a Kafka broker.

We also discussed some complex mocking scenarios and were able to test exceptions, partitioning, and transactions with the MockProducer.

As always, all code examples are available over on GitHub.

Spring @Import Annotation

1. Overview

In this tutorial, we'll learn how to use the Spring @Import annotation while clarifying how it's different from @ComponentScan.

2. Configuration and Beans

Before understanding the @Import annotation, we need to know what a Spring Bean is and have a basic working knowledge of the @Configuration annotation.

Both topics are out of this tutorial's scope. Still, we can learn about them in our Spring Bean article and in the Spring documentation.

Let's assume that we already have prepared three beans – Bird, Cat, and Dog – each with its own configuration class.
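
The exact contents of these classes aren't important here; each one can be a minimal configuration with a single @Bean method. For example, BirdConfig might look roughly like this:

@Configuration
class BirdConfig {

    @Bean
    Bird bird() {
        return new Bird();
    }
}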

Then, we can provide our context with these Config classes:

@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = { BirdConfig.class, CatConfig.class, DogConfig.class })
class ConfigUnitTest {

    @Autowired
    ApplicationContext context;

    @Test
    void givenImportedBeans_whenGettingEach_shallFindIt() {
        assertThatBeanExists("dog", Dog.class);
        assertThatBeanExists("cat", Cat.class);
        assertThatBeanExists("bird", Bird.class);
    }

    private void assertThatBeanExists(String beanName, Class<?> beanClass) {
        Assertions.assertTrue(context.containsBean(beanName));
        Assertions.assertNotNull(context.getBean(beanClass));
    }
}

3. Grouping Configurations with @Import

There's no problem with declaring all the configurations this way. But imagine the trouble of controlling dozens of configuration classes spread across different sources. There should be a better way.

The @Import annotation offers a solution through its ability to group configuration classes:

@Configuration
@Import({ DogConfig.class, CatConfig.class })
class MammalConfiguration {
}

Now, we just need to remember the mammals:

@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = { MammalConfiguration.class })
class ConfigUnitTest {

    @Autowired
    ApplicationContext context;

    @Test
    void givenImportedBeans_whenGettingEach_shallFindOnlyTheImportedBeans() {
        assertThatBeanExists("dog", Dog.class);
        assertThatBeanExists("cat", Cat.class);

        Assertions.assertFalse(context.containsBean("bird"));
    }

    private void assertThatBeanExists(String beanName, Class<?> beanClass) {
        Assertions.assertTrue(context.containsBean(beanName));
        Assertions.assertNotNull(context.getBean(beanClass));
    }
}

We'll probably forget about our Bird soon, so let's create one more group that includes all the animal configuration classes:

@Configuration
@Import({ MammalConfiguration.class, BirdConfig.class })
class AnimalConfiguration {
}

Finally, no one was left behind, and we just need to remember one class:

@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = { AnimalConfiguration.class })
class AnimalConfigUnitTest {
    // same test validating that all beans are available in the context
}

4. @Import vs @ComponentScan

Before proceeding with more @Import examples, let's pause briefly and compare it to @ComponentScan.

4.1. Similarities

Both annotations can accept any @Component or @Configuration class.

Let's add a new @Component using @Import:

@Configuration
@Import(Bug.class)
class BugConfig {
}

@Component(value = "bug")
class Bug {
}

Now, the Bug bean is available just like any other bean.
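
We can verify this with a test in the same style as the earlier ones (a sketch, assuming the same test setup):

@ExtendWith(SpringExtension.class)
@ContextConfiguration(classes = { BugConfig.class })
class BugConfigUnitTest {

    @Autowired
    ApplicationContext context;

    @Test
    void givenImportedComponent_whenGettingIt_shallFindIt() {
        Assertions.assertTrue(context.containsBean("bug"));
        Assertions.assertNotNull(context.getBean(Bug.class));
    }
}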

4.2. Conceptual Difference

Simply put, we can reach the same result with both annotations. So, is there any difference between them?

To answer this question, let's remember that Spring generally promotes the convention-over-configuration approach. 

Making an analogy with our annotations, @ComponentScan is more like convention, while @Import looks like configuration.

4.3. What Happens in Real Applications

Typically, we start our applications using @ComponentScan in a root package so it can find all components for us. If we're using Spring Boot, then @SpringBootApplication already includes @ComponentScan, and we're good to go. This shows the power of convention.

Now, let's imagine that our application is growing a lot, and we need to deal with beans from many different places: components, different package structures, and modules built by ourselves and third parties.

In this case, adding everything into the context risks conflicts over which bean to use. Besides that, we may get a slow start-up time.

On the other hand, we don't want to write an @Import for each new component because doing so is counterproductive.

Take our animals, for instance. We could indeed hide the imports from the context declaration, but we still need to remember the @Import for each Config class.

4.4. Working Together

We can aim for the best of both worlds. Let's picture a package dedicated to our animals. It could also be a component or a module; the same idea applies.

Then we can have one @ComponentScan just for our animal package:

package com.baeldung.importannotation.animal;

// imports...

@Configuration
@ComponentScan
public class AnimalScanConfiguration {
}

And an @Import to keep control over what we'll add to the context:

package com.baeldung.importannotation.zoo;

// imports...

@Configuration
@Import(AnimalScanConfiguration.class)
class ZooApplication {
}

Finally, any new bean added to the animal package will be automatically found by our context. And we still have explicit control over the configurations we are using.

5. Conclusion

In this quick tutorial, we learned how to use @Import to organize our configurations.

We also learned that @Import is very similar to @ComponentScan, except for the fact that @Import has an explicit approach while @ComponentScan uses an implicit one.

Also, we looked at possible difficulties controlling our configurations in real applications and how to deal with these by combining both annotations.

As usual, the complete code is available over on GitHub.

Retrying Failed Requests with Spring Cloud Netflix Ribbon

1. Overview

Spring Cloud provides client-side load balancing through the use of Netflix Ribbon. Ribbon's load balancing mechanism can be supplemented with retries.

In this tutorial, we're going to explore this retry mechanism.

First, we'll see why it's important that our applications need to be built with this feature in mind. Then, we'll build and configure an application with Spring Cloud Netflix Ribbon to demonstrate the mechanism.

2. Motivation

In a cloud-based application, it's a common practice for a service to make requests to other services. But in such a dynamic and volatile environment, networks could fail or services could be temporarily unavailable.

We want to handle failures in a graceful manner and recover quickly. In many cases, these issues are short-lived. If we repeated the same request shortly after the failure occurred, maybe it would succeed.

This practice helps us to improve the application's resilience, which is one of the key aspects of a reliable cloud application.

Nevertheless, we need to keep an eye on retries since they can also lead to bad situations. For example, they can increase latency which might not be desirable.

3. Setup

In order to experiment with the retry mechanism, we need two Spring Boot services. First, we'll create a weather-service that will display today's weather information through a REST endpoint.

Second, we'll define a client service that will consume the weather endpoint.

3.1. The Weather Service

Let's build a very simple weather service that will fail sometimes with a 503 HTTP status code (Service Unavailable). We'll simulate this intermittent failure by having the service succeed only when the number of calls is a multiple of a configurable successful.call.divisor property:

@Value("${successful.call.divisor}")
private int divisor;
private int nrOfCalls = 0;

@GetMapping("/weather")
public ResponseEntity<String> weather() {
    LOGGER.info("Providing today's weather information");
    if (isServiceUnavailable()) {
        return new ResponseEntity<>(HttpStatus.SERVICE_UNAVAILABLE);
    }
    LOGGER.info("Today's a sunny day");
    return new ResponseEntity<>("Today's a sunny day", HttpStatus.OK);
}

private boolean isServiceUnavailable() {
    return ++nrOfCalls % divisor != 0;
}

Also, to help us observe the number of retries made to the service, we have a message logger inside the handler.

Later on, we're going to configure the client service to trigger the retry mechanism when the weather service is temporarily unavailable.

3.2. The Client Service

Our second service will use Spring Cloud Netflix Ribbon.

First, let's define the Ribbon client configuration:

@Configuration
@RibbonClient(name = "weather-service", configuration = RibbonConfiguration.class)
public class WeatherClientRibbonConfiguration {

    @LoadBalanced
    @Bean
    RestTemplate getRestTemplate() {
        return new RestTemplate();
    }

}

Our HTTP client, the RestTemplate, is annotated with @LoadBalanced, which means we want it to be load-balanced with Ribbon.

We'll now add a ping mechanism to determine the service's availability, and also a round-robin load balancing strategy:

public class RibbonConfiguration {
 
    @Bean
    public IPing ribbonPing() {
        return new PingUrl();
    }
 
    @Bean
    public IRule ribbonRule() {
        return new RoundRobinRule();
    }
}

Next, we need to turn off Eureka from the Ribbon client since we're not using service discovery. Instead, we're using a manually defined list of weather-service instances available for load balancing.

So, let's add all of this to the application.yml file:

weather-service:
    ribbon:
        eureka:
            enabled: false
        listOfServers: http://localhost:8081, http://localhost:8082

Finally, let's build a controller and make it call the backend service:

@RestController
public class MyRestController {

    @Autowired
    private RestTemplate restTemplate;

    @RequestMapping("/client/weather")
    public String weather() {
        String result = this.restTemplate.getForObject("http://weather-service/weather", String.class);
        return "Weather Service Response: " + result;
    }
}

4. Enabling the Retry Mechanism

4.1. Configuring application.yml properties

We need to put weather service properties in our client application's application.yml file:

weather-service:
  ribbon:
    MaxAutoRetries: 3
    MaxAutoRetriesNextServer: 1
    retryableStatusCodes: 503, 408
    OkToRetryOnAllOperations: true

The above configuration uses the standard Ribbon properties we need to define to enable retries:

  • MaxAutoRetries – the number of times a failed request is retried on the same server (default 0)
  • MaxAutoRetriesNextServer – the number of additional servers to try, excluding the first one (default 0)
  • retryableStatusCodes – the list of HTTP status codes to retry
  • OkToRetryOnAllOperations – when this property is set to true, all types of HTTP requests are retried, not just GET ones (the default behavior)

We're going to retry a failed request when the client service receives a 503 (service unavailable) or 408 (request timeout) response code.

4.2. Required Dependencies

Spring Cloud Netflix Ribbon leverages Spring Retry to retry failed requests.

We have to make sure the dependency is on the classpath. Otherwise, the failed requests won't be retried. We can omit the version since it's managed by Spring Boot:

<dependency>
    <groupId>org.springframework.retry</groupId>
    <artifactId>spring-retry</artifactId>
</dependency>

4.3. Retry Logic in Practice

Finally, let's see the retry logic in practice.

For this, we need two instances of our weather service, which we'll run on ports 8081 and 8082. Of course, these instances should match the listOfServers list defined in the previous section.

Moreover, we need to configure the successful.call.divisor property on each instance to make sure our simulated services fail at different times:

successful.call.divisor = 5 // instance 1
successful.call.divisor = 2 // instance 2

Next, let's also run the client service on port 8080 and call:

http://localhost:8080/client/weather

Let's take a look at the weather-service's console:

weather service instance 1:
    Providing today's weather information
    Providing today's weather information
    Providing today's weather information
    Providing today's weather information

weather service instance 2:
    Providing today's weather information
    Today's a sunny day

So, after several attempts (4 on instance 1 and 2 on instance 2) we've got a valid response.

5. Backoff Policy Configuration

When a network experiences more traffic than it can handle, congestion occurs. In order to alleviate it, we can set up a backoff policy.

By default, there is no delay between the retry attempts. Under the hood, Spring Cloud Ribbon uses Spring Retry's NoBackOffPolicy object, which does nothing.

However, we can override the default behavior by extending the RibbonLoadBalancedRetryFactory class:

@Component
public class CustomRibbonLoadBalancedRetryFactory 
  extends RibbonLoadBalancedRetryFactory {

    public CustomRibbonLoadBalancedRetryFactory(
      SpringClientFactory clientFactory) {
        super(clientFactory);
    }

    @Override
    public BackOffPolicy createBackOffPolicy(String service) {
        FixedBackOffPolicy fixedBackOffPolicy = new FixedBackOffPolicy();
        fixedBackOffPolicy.setBackOffPeriod(2000);
        return fixedBackOffPolicy;
    }
}

The FixedBackOffPolicy class provides a fixed delay between retry attempts. If we don't set a backoff period, the default is 1 second.

Alternatively, we can set up an ExponentialBackOffPolicy or an ExponentialRandomBackOffPolicy:

@Override
public BackOffPolicy createBackOffPolicy(String service) {
    ExponentialBackOffPolicy exponentialBackOffPolicy = 
      new ExponentialBackOffPolicy();
    exponentialBackOffPolicy.setInitialInterval(1000);
    exponentialBackOffPolicy.setMultiplier(2); 
    exponentialBackOffPolicy.setMaxInterval(10000);
    return exponentialBackOffPolicy;
}

Here, the initial delay between the attempts is 1 second. Then, the delay is doubled for each subsequent attempt without exceeding 10 seconds: 1000 ms, 2000 ms, 4000 ms, 8000 ms, 10000 ms, 10000 ms…

Additionally, the ExponentialRandomBackOffPolicy adds a random value to each sleeping period without exceeding the next value. So, it may yield 1500 ms, 3400 ms, 6200 ms, 9800 ms, 10000 ms, 10000 ms…

Choosing one or the other depends on how much traffic we have and how many different client services there are. From fixed to random, these strategies help us achieve a better spread of traffic spikes, which also means fewer retries. For example, with many clients, a random factor helps avoid several clients hitting the service at the same time while retrying.

6. Conclusion

In this article, we learned how to retry failed requests in our Spring Cloud applications using Spring Cloud Netflix Ribbon. We also discussed the benefits this mechanism provides.

Next, we demonstrated how the retry logic works through a REST application backed by two Spring Boot services. Spring Cloud Netflix Ribbon makes that possible by leveraging the Spring Retry library.

Finally, we saw how to configure different types of delays between the retry attempts.

As always, the source code for this tutorial is available over on GitHub.

Inject Arrays and Lists From Spring Properties Files

1. Overview

In this quick tutorial, we're going to learn how to inject values into an array or List from a Spring properties file.

2. Default Behavior

We're going to start with a simple application.properties file:

arrayOfStrings=Baeldung,dot,com

Let's see how Spring behaves when we set our variable type to String[]:

@Value("${arrayOfStrings}")
private String[] arrayOfStrings;
@Test
void whenContextIsInitialized_thenInjectedArrayContainsExpectedValues() {
    assertArrayEquals(new String[] {"Baeldung", "dot", "com"}, arrayOfStrings);
}

We can see that Spring correctly assumes our delimiter is a comma and initializes the array accordingly.

We should also note that, by default, injecting an array works correctly only when we have comma-separated values.

3. Injecting Lists

If we try to inject a List in the same way, we'll get a surprising result:

@Value("${arrayOfStrings}")
private List<String> unexpectedListOfStrings;
@Test
void whenContextIsInitialized_thenInjectedListContainsUnexpectedValues() {
    assertEquals(Collections.singletonList("Baeldung,dot,com"), unexpectedListOfStrings);
}

Our List contains a single element, which is equal to the value we set in our properties file.

In order to properly inject a List, we need to use a special syntax called Spring Expression Language (SpEL):

@Value("#{'${arrayOfStrings}'.split(',')}")
private List<String> listOfStrings;
@Test
void whenContextIsInitialized_thenInjectedListContainsExpectedValues() {
    assertEquals(Arrays.asList("Baeldung", "dot", "com"), listOfStrings);
}

We can see that our expression starts with # instead of the $ that we're used to with @Value.

We should also note that we're invoking a split method, which makes the expression a bit more complex than a usual injection.

If we'd like to keep our expression a bit simpler, we can declare our property in a special format:

listOfStrings={'Baeldung','dot','com'}

Spring will recognize this format, and we'll be able to inject our List using a somewhat simpler expression:

@Value("#{${listOfStrings}}")
private List<String> listOfStringsV2;
@Test
void whenContextIsInitialized_thenInjectedListV2ContainsExpectedValues() {
    assertEquals(Arrays.asList("Baeldung", "dot", "com"), listOfStringsV2);
}

4. Using Custom Delimiters

Let's create a similar property, but this time, we're going to use a different delimiter:

listOfStringsWithCustomDelimiter=Baeldung;dot;com

As we've seen when injecting Lists, we can use a special expression where we can specify our desired delimiter:

@Value("#{'${listOfStringsWithCustomDelimiter}'.split(';')}")
private List<String> listOfStringsWithCustomDelimiter;
@Test
void whenContextIsInitialized_thenInjectedListWithCustomDelimiterContainsExpectedValues() {
    assertEquals(Arrays.asList("Baeldung", "dot", "com"), listOfStringsWithCustomDelimiter);
}

5. Injecting Other Types

Let's take a look at the following properties:

listOfBooleans=false,false,true
listOfIntegers=1,2,3,4
listOfCharacters=a,b,c

We can see that Spring supports basic types out-of-the-box, so we don't need to do any special parsing:

@Value("#{'${listOfBooleans}'.split(',')}")
private List<Boolean> listOfBooleans;

@Value("#{'${listOfIntegers}'.split(',')}")
private List<Integer> listOfIntegers;

@Value("#{'${listOfCharacters}'.split(',')}")
private List<Character> listOfCharacters;
@Test
void whenContextIsInitialized_thenInjectedListOfBasicTypesContainsExpectedValues() {
    assertEquals(Arrays.asList(false, false, true), listOfBooleans);
    assertEquals(Arrays.asList(1, 2, 3, 4), listOfIntegers);
    assertEquals(Arrays.asList('a', 'b', 'c'), listOfCharacters);
}

This is only supported via SpEL, so we can't inject an array in the same way.

6. Reading Properties Programmatically

In order to read properties programmatically, we first need to get the instance of our Environment object:

@Autowired
private Environment environment;

Then, we can simply use the getProperty method to read any property by specifying its key and expected type:

@Test
void whenReadingFromSpringEnvironment_thenPropertiesHaveExpectedValues() {
    String[] arrayOfStrings = environment.getProperty("arrayOfStrings", String[].class);
    List<String> listOfStrings = (List<String>)environment.getProperty("arrayOfStrings", List.class);

    assertArrayEquals(new String[] {"Baeldung", "dot", "com"}, arrayOfStrings);
    assertEquals(Arrays.asList("Baeldung", "dot", "com"), listOfStrings);
}

7. Conclusion

In this quick tutorial, we've learned how to easily inject arrays and Lists through quick and practical examples.

As always, the code is available over on GitHub.

The “Cannot find symbol” Compilation Error

1. Overview

In this tutorial, we'll review what compilation errors are, and then specifically explain what the “cannot find symbol” error is and how it's caused.

2. Compile Time Errors

During compilation, the compiler analyzes and verifies the code for numerous things: reference types, type casts, and method declarations, to name a few. This part of the compilation process is important since this is where we'll get a compilation error.

Basically, there are three types of compile-time errors:

  • We can have syntax errors. One of the most common mistakes any programmer can make is forgetting to put the semicolon at the end of the statement; some others are forgetting imports, mismatching parentheses, or omitting the return statement
  • Next, there are type-checking errors. This is a process of verifying type safety in our code. With this check, we're making sure that we have consistent types of expressions. For example, if we define a variable of type int, we should never assign a double or String value to it
  • Meanwhile, there is a possibility that the compiler crashes. This is very rare but it can happen. In this case, it's good to know that our code might not be a problem, but that it's rather an external issue

3. The “cannot find symbol” Error

The “cannot find symbol” error comes up mainly when we try to use a variable that is not defined or declared in our program.

When our code compiles, the compiler needs to verify all the identifiers we have. The error "cannot find symbol" means we're referring to something that the compiler doesn't know about.

3.1. What Can Cause “cannot find symbol” Error?

Really, there's only one cause: The compiler couldn't find the definition of a variable we're trying to reference.

But, there are many reasons why this happens. To help us understand why, let's remind ourselves what Java code consists of.

Our Java source code consists of:

  • Keywords: true, false, class, while
  • Literals: numbers and text
  • Operators and other non-alphanumeric tokens: -, /, +, =, {
  • Identifiers: main, Reader, i, toString, etc.
  • Comments and whitespace

4. Misspelling

The most common issues are all spelling-related. If we recall that all Java identifiers are case-sensitive, we can see that:

  • StringBiulder
  • stringBuilder
  • String_Builder

would all be different ways to incorrectly refer to the StringBuilder class.

5. Instance Scope

This error can also be caused when using something that was declared outside of the scope of the class.

For example, let's say we have an Article class that calls a generateId method:

public class Article {
    private int length;
    private long id;

    public Article(int length) {
        this.length = length;
        this.id = generateId();
    }
}

But, we declare the generateId method in a separate class:

public class IdGenerator {
    public long generateId() {
        Random random = new Random();
        return random.nextInt();
    }
}

With this setup, the compiler will give a "cannot find symbol" error for generateId in the Article constructor. The reason is that this syntax implies that the generateId method is declared in Article.
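
When we compile the class, javac reports something roughly like the following (the exact formatting varies by compiler version):

Article.java:7: error: cannot find symbol
        this.id = generateId();
                  ^
  symbol:   method generateId()
  location: class Article
1 error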

Like in all mature languages, there's more than one way to address this issue. But, one way would be to construct IdGenerator in the Article class and then call the method:

public class Article {
    private int length;
    private long id;

    public Article(int length) {
        this.length = length;
        this.id = new IdGenerator().generateId();
    }
}

6. Undefined Variables

Sometimes we forget to declare the variable. As we can see from the snippet below, we're trying to manipulate the variable we haven't declared, in this case, text:

public class Article {
    private int length;

    // ...

    public void setText(String newText) {
        this.text = newText; // text variable was never defined
    }
}

We solve this problem by declaring the variable text of type String:

public class Article {
    private int length;
    private String text;
    // ...

    public void setText(String newText) {
        this.text = newText;
    }
}

7. Variable Scope

When we use a variable outside of the scope in which it was declared, it'll cause an error during compilation. This typically happens when we work with loops.

Variables inside the loop aren't accessible outside the loop:

public boolean findLetterB(String text) {
    for (int i=0; i < text.length(); i++) {
        Character character = text.charAt(i);
        if (String.valueOf(character).equals("b")) {
            return true;
        }
        return false;
    }

    if (character == "a") {  // <-- error!
        ...
    }
}

The if statement should go inside of the for loop if we need to examine characters more:

public boolean findLetterB(String text) {
    for (int i = 0; i < text.length(); i++) {
        Character character = text.charAt(i);
        if (String.valueOf(character).equals("b")) {
            return true;
        } else if (String.valueOf(character).equals("a")) {
            ...
        }
    }
    return false;
}

8. Invalid Use of Methods or Fields

The “cannot find symbol” error will also occur if we use a field as a method or vice versa:

public class Article {
    private int length;
    private long id;
    private List<String> texts;

    public Article(int length) {
        this.length = length;
    }
    // getters and setters
}

Now, if we try to refer to the Article's texts field as if it were a method:

Article article = new Article(300);
List<String> texts = article.texts();

then we'd see the error.

That's because the compiler is looking for a method called texts, which doesn't exist.

Actually, there's a getter method that we can use instead:

Article article = new Article(300);
List<String> texts = article.getTexts();

Mistakenly operating on the collection rather than on one of its elements is also an issue:

for (String text : texts) {
    String firstLetter = texts.charAt(0); // it should be text.charAt(0)
}

And so is forgetting the new keyword as in:

String s = String(); // should be 'new String()'

9. Package and Class Imports

Another problem is forgetting to import the class or package. For example, using a List object without importing java.util.List:

// missing import statement: 
// import java.util.List

public class Article {
    private int length;
    private long id;
    private List<String> texts; // <-- error!
    public Article(int length) {
        this.length = length;
    }
}

This code would not compile since the program doesn't know what List is.

10. Wrong Imports

Importing the wrong type, due to IDE completion or auto-correction, is also a common issue.

Think of the situation when we want to use dates in Java. Many times, we could import the wrong Date class, which doesn't provide the same methods and functionality as other date classes that we might need:

Date date = new Date();
int year, month, day;

To get the year, month, or day from a java.util.Date, we also need to import the Calendar class and extract the information from there.

Simply invoking getDay(), getMonth(), or getYear() on java.util.Date isn't the way to go, since these methods are deprecated:

...
date.getDay();
date.getMonth();
date.getYear();

Instead, we use the Calendar object:

...
Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("Europe/Paris"));
cal.setTime(date);
year = cal.get(Calendar.YEAR);
month = cal.get(Calendar.MONTH);
day = cal.get(Calendar.DAY_OF_MONTH);

However, if we import the LocalDate class instead, we don't need additional code to get the information we need:

...
LocalDate localDate = date.toInstant().atZone(ZoneId.systemDefault()).toLocalDate();
year = localDate.getYear();
month = localDate.getMonthValue();
day = localDate.getDayOfMonth();

11. Conclusion

Compilers work on a fixed set of rules that are language-specific. If our code doesn't follow these rules, the compiler can't perform the conversion process, which results in a compilation error. When we face the "Cannot find symbol" compilation error, the key is to identify the cause.

From the error message, we can find the line of code where the error occurs and which element is wrong. Knowing the most common issues that cause this error will make solving it quick and easy.

What is [Ljava.lang.Object;?

1. Overview

In this tutorial, we'll learn what [Ljava.lang.Object; means and how to print the actual values of an object array.

2. Java Object Class

In Java, if we want to print a value directly from an object, the first thing that we could try is to call its toString method:

Object[] arrayOfObjects = { "John", 2, true };
assertTrue(arrayOfObjects.toString().startsWith("[Ljava.lang.Object;"));

If we run the test, it will be successful, but usually, it's not a very useful result.

What we want to do is print the values inside the array. Instead, we get [Ljava.lang.Object; followed by a hash code – the default toString result, as implemented in Object.class:

getClass().getName() + '@' + Integer.toHexString(hashCode())

When we get the class name directly from the object, we're getting the JVM's internal name for its type; that's why we have extra characters like [ and L – they represent an array type and a class name, respectively.
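
For example, printing a few array class names shows this encoding directly:

Object[] objects = new Object[0];
int[] ints = new int[0];
String[][] strings = new String[0][0];

System.out.println(objects.getClass().getName()); // [Ljava.lang.Object;
System.out.println(ints.getClass().getName());    // [I
System.out.println(strings.getClass().getName()); // [[Ljava.lang.String;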

3. Printing Meaningful Values

To be able to print the result correctly, we can use some classes from the java.util package.

3.1. Arrays

For example, we can use two of the methods in the Arrays class to deal with the conversion.

With one-dimensional arrays, we can use the toString method:

Object[] arrayOfObjects = { "John", 2, true };
assertEquals(Arrays.toString(arrayOfObjects), "[John, 2, true]");

For deeper arrays, we have the deepToString method:

Object[] innerArray = { "We", "Are", "Inside" };
Object[] arrayOfObjects = { "John", 2, innerArray };
assertEquals(Arrays.deepToString(arrayOfObjects), "[John, 2, [We, Are, Inside]]");

3.2. Streaming

One of the significant new features in JDK 8 is the introduction of Java streams, which provide classes for processing sequences of elements:

Object[] arrayOfObjects = { "John", 2, true };
List<String> listOfString = Stream.of(arrayOfObjects)
  .map(Object::toString)
  .collect(Collectors.toList());
assertEquals(listOfString.toString(), "[John, 2, true]");

First, we've created a stream using the helper method of. Then we've converted all the objects inside the array to Strings using map, and finally collected them into a List using collect so we can print the values.

4. Conclusion

In this tutorial, we've seen how we can print meaningful information from an array and avoid the default [Ljava.lang.Object;.

We can always find the source code for this article over on GitHub.


HTTP Server with Netty

1. Overview

In this tutorial, we're going to implement a simple upper-casing server over HTTP with Netty, an asynchronous framework that gives us the flexibility to develop network applications in Java.

2. Server Bootstrapping

Before we start, we should be aware of the basic concepts of Netty, such as channel, handler, encoder, and decoder.

Here we'll jump straight into bootstrapping the server, which is mostly the same as a simple protocol server:

public class HttpServer {

    private int port;
    private static Logger logger = LoggerFactory.getLogger(HttpServer.class);

    // constructor

    // main method, same as simple protocol server

    public void run() throws Exception {
        ...
        ServerBootstrap b = new ServerBootstrap();
        b.group(bossGroup, workerGroup)
          .channel(NioServerSocketChannel.class)
          .handler(new LoggingHandler(LogLevel.INFO))
          .childHandler(new ChannelInitializer() {
            @Override
            protected void initChannel(SocketChannel ch) throws Exception {
                ChannelPipeline p = ch.pipeline();
                p.addLast(new HttpRequestDecoder());
                p.addLast(new HttpResponseEncoder());
                p.addLast(new CustomHttpServerHandler());
            }
          });
        ...
    }
}

Here, only the childHandler differs according to the protocol we want to implement, which is HTTP in our case.

We're adding three handlers to the server's pipeline:

  1. Netty's HttpRequestDecoder – for deserialization
  2. Netty's HttpResponseEncoder – for serialization
  3. Our own CustomHttpServerHandler – for defining our server's behavior

Let's look at the last handler in detail next.

3. CustomHttpServerHandler

Our custom handler's job is to process inbound data and send a response.

Let's break it down to understand its working.

3.1. Structure of the Handler

CustomHttpServerHandler extends Netty's abstract SimpleChannelInboundHandler and implements its lifecycle methods:

public class CustomHttpServerHandler extends SimpleChannelInboundHandler {
    private HttpRequest request;
    StringBuilder responseData = new StringBuilder();

    @Override
    public void channelReadComplete(ChannelHandlerContext ctx) {
        ctx.flush();
    }

    @Override
    protected void channelRead0(ChannelHandlerContext ctx, Object msg) {
       // implementation to follow
    }

    @Override
    public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
        cause.printStackTrace();
        ctx.close();
    }
}

As the method name suggests, channelReadComplete flushes the handler context after the last message in the channel has been consumed so that it's available for the next incoming message. The method exceptionCaught is for handling exceptions if any.

So far, all we've seen is boilerplate code.

Now let's get on with the interesting stuff, the implementation of channelRead0.

3.2. Reading the Channel

Our use case is simple: the server will transform the request body and query parameters, if any, to uppercase. A word of caution here on reflecting request data in the response – we are doing this only for demonstration purposes, to understand how we can use Netty to implement an HTTP server.

Here, we'll consume the message or request, and set up its response as recommended by the protocol (note that RequestUtils is something we'll write in just a moment):

if (msg instanceof HttpRequest) {
    HttpRequest request = this.request = (HttpRequest) msg;

    if (HttpUtil.is100ContinueExpected(request)) {
        writeResponse(ctx);
    }
    responseData.setLength(0);            
    responseData.append(RequestUtils.formatParams(request));
}
responseData.append(RequestUtils.evaluateDecoderResult(request));

if (msg instanceof HttpContent) {
    HttpContent httpContent = (HttpContent) msg;
    responseData.append(RequestUtils.formatBody(httpContent));
    responseData.append(RequestUtils.evaluateDecoderResult(request));

    if (msg instanceof LastHttpContent) {
        LastHttpContent trailer = (LastHttpContent) msg;
        responseData.append(RequestUtils.prepareLastResponse(request, trailer));
        writeResponse(ctx, trailer, responseData);
    }
}

As we can see, when our channel receives an HttpRequest, it first checks if the request expects a 100 Continue status. In that case, we immediately write back with an empty response with a status of CONTINUE:

private void writeResponse(ChannelHandlerContext ctx) {
    FullHttpResponse response = new DefaultFullHttpResponse(HTTP_1_1, CONTINUE, 
      Unpooled.EMPTY_BUFFER);
    ctx.write(response);
}

After that, the handler initializes a string to be sent as a response and adds the request's query parameters to it, converted to uppercase.

Let's now define the method formatParams and place it in a RequestUtils helper class to do that:

StringBuilder formatParams(HttpRequest request) {
    StringBuilder responseData = new StringBuilder();
    QueryStringDecoder queryStringDecoder = new QueryStringDecoder(request.uri());
    Map<String, List<String>> params = queryStringDecoder.parameters();
    if (!params.isEmpty()) {
        for (Entry<String, List<String>> p : params.entrySet()) {
            String key = p.getKey();
            List<String> vals = p.getValue();
            for (String val : vals) {
                responseData.append("Parameter: ").append(key.toUpperCase()).append(" = ")
                  .append(val.toUpperCase()).append("\r\n");
            }
        }
        responseData.append("\r\n");
    }
    return responseData;
}

Next, on receiving an HttpContent, we take the request body and convert it to upper case:

StringBuilder formatBody(HttpContent httpContent) {
    StringBuilder responseData = new StringBuilder();
    ByteBuf content = httpContent.content();
    if (content.isReadable()) {
        responseData.append(content.toString(CharsetUtil.UTF_8).toUpperCase())
          .append("\r\n");
    }
    return responseData;
}

Also, if the received HttpContent is a LastHttpContent, we add a goodbye message and trailing headers, if any:

StringBuilder prepareLastResponse(HttpRequest request, LastHttpContent trailer) {
    StringBuilder responseData = new StringBuilder();
    responseData.append("Good Bye!\r\n");

    if (!trailer.trailingHeaders().isEmpty()) {
        responseData.append("\r\n");
        for (CharSequence name : trailer.trailingHeaders().names()) {
            for (CharSequence value : trailer.trailingHeaders().getAll(name)) {
                responseData.append("P.S. Trailing Header: ");
                responseData.append(name).append(" = ").append(value).append("\r\n");
            }
        }
        responseData.append("\r\n");
    }
    return responseData;
}

3.3. Writing the Response

Now that our data to be sent is ready, we can write the response to the ChannelHandlerContext:

private void writeResponse(ChannelHandlerContext ctx, LastHttpContent trailer,
  StringBuilder responseData) {
    boolean keepAlive = HttpUtil.isKeepAlive(request);
    FullHttpResponse httpResponse = new DefaultFullHttpResponse(HTTP_1_1, 
      ((HttpObject) trailer).decoderResult().isSuccess() ? OK : BAD_REQUEST,
      Unpooled.copiedBuffer(responseData.toString(), CharsetUtil.UTF_8));
    
    httpResponse.headers().set(HttpHeaderNames.CONTENT_TYPE, "text/plain; charset=UTF-8");

    if (keepAlive) {
        httpResponse.headers().setInt(HttpHeaderNames.CONTENT_LENGTH, 
          httpResponse.content().readableBytes());
        httpResponse.headers().set(HttpHeaderNames.CONNECTION, 
          HttpHeaderValues.KEEP_ALIVE);
    }
    ctx.write(httpResponse);

    if (!keepAlive) {
        ctx.writeAndFlush(Unpooled.EMPTY_BUFFER).addListener(ChannelFutureListener.CLOSE);
    }
}

In this method, we create a FullHttpResponse with the HTTP/1.1 version, containing the data we prepared earlier.

If the request is to be kept alive, or in other words, if the connection is not to be closed, we set the response's Connection header to keep-alive along with the Content-Length header. Otherwise, we close the connection.

4. Testing the Server

To test our server, let's send some cURL commands and look at the responses.

Of course, we need to start the server by running the class HttpServer before this.

4.1. GET Request

Let's first invoke the server, providing a query parameter with the request:

curl http://127.0.0.1:8080?param1=one

As a response, we get:

Parameter: PARAM1 = ONE

Good Bye!

We can also hit http://127.0.0.1:8080?param1=one from any browser to see the same result.

4.2. POST Request

As our second test, let's send a POST with body sample content:

curl -d "sample content" -X POST http://127.0.0.1:8080

Here's the response:

SAMPLE CONTENT
Good Bye!

This time, since our request contained a body, the server sent it back in uppercase.

5. Conclusion

In this tutorial, we saw how to implement the HTTP protocol, particularly an HTTP server, using Netty.

HTTP/2 in Netty demonstrates a client-server implementation of the HTTP/2 protocol.

As always, source code is available over on GitHub.

The Constructor Return Type in Java

1. Overview

In this quick tutorial, we're going to focus on the return type for a constructor in Java.

First, we'll get familiar with how object initialization works in Java and the JVM. Then, we'll dig deeper to see how object initialization and assignment work under-the-hood.

2. Instance Initialization

Let's start with an empty class:

public class Color {}

Here, we're going to create an instance from this class and assign it to some variable:

Color color = new Color();

After compiling this simple Java snippet, let's take a peek at its bytecode via the javap -c command:
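
Assuming the snippet lives in a class named Main (the class name here is just for illustration), we can compile and disassemble it with:

$ javac Main.java
$ javap -c Main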

0: new           #7                  // class Color
3: dup
4: invokespecial #9                  // Method Color."<init>":()V
7: astore_1

When we instantiate an object in Java, the JVM performs the following operations:

  1. First, it finds a place in its process space for the new object.
  2. Then, the JVM performs the system initialization process. In this step, it creates the object in its default state. The new opcode in the bytecode is actually responsible for this step.
  3. Finally, it initializes the object with the constructor and other initializer blocks. In this case, the invokespecial opcode calls the constructor.

As shown above, the method signature for the default constructor is:

Method Color."<init>":()V

The <init> is the name of instance initialization methods in the JVM. In this case, the <init> is a function that:

  • takes nothing as the input (empty parentheses after the method name)
  • returns nothing (V stands for void)

Therefore, as far as the JVM is concerned, the return type of a constructor in Java is void.

Taking another look at our simple assignment:

Color color = new Color();

Now that we know the constructor returns void, let's see how the assignment works.

3. How Assignment Works

The JVM is a stack-based virtual machine. Each thread has its own stack, which consists of stack frames. Put simply, each stack frame corresponds to a method call: the JVM creates a new frame when a method is invoked and destroys it when the method finishes.

Each stack frame uses an array to store local variables and an operand stack to store partial results. Given that, let's take another look at the bytecode:

0: new           #7                // class Color
3: dup
4: invokespecial #9               // Method Color."<init>":()V
7: astore_1

Here's how the assignment works:

  • The new instruction creates an instance of Color and pushes its reference onto the operand stack
  • The dup opcode duplicates the last item on the operand stack
  • The invokespecial takes the duplicated reference and consumes it for initialization. After this, only the original reference remains on the operand stack
  • The astore_1 stores the original reference to index 1 of the local variables array. The prefix “a” means that the item to be stored is an object reference, and the “1” is the array index

From now on, the second item (index 1) in the local variables array is a reference to the newly created object. Therefore, we don't lose the reference, and the assignment actually works — even when the constructor returns nothing!

4. Conclusion

In this quick tutorial, we learned how the JVM creates and initializes our class instances. Moreover, we saw how instance initialization works under-the-hood.

For an even more detailed understanding of the JVM, it's always a good idea to check out its specification.

Java Weekly, Issue 337

1. Spring and Java

>> Fault-tolerant and reliable messaging with Kafka and Spring Boot [arnoldgalovics.com]

A comprehensive example showcasing how to use Kafka for DLQ processing, retries, and manual commits.

>> Distributed Cache with Hazelcast and Spring [reflectoring.io]

Why caching is important and how it fits into modern software architecture, using Hazelcast as an example. Cool stuff.

>> Why do the Gradle/Maven wrappers matter? [andresalmiray.com]

Use wrappers and have one less thing to worry about.

Also worth reading:

Webinars and presentations:

2. Technical

>> Writing my first game in Unity [beust.com]

It's always refreshing to see how to start working with a technology that we don't touch on a daily basis.

Also worth reading:

3. Musings

>> K-19 or How to not run a team [blog.scottlogic.com]

Lessons learned while building a Soviet submarine 🙂

Also worth reading:

4. Comics

And my favorite Dilberts of the week:

>> Hate Edits [dilbert.com]

>> Shocking Fake Video [dilbert.com]

>> Disbanding Task Force [dilbert.com]

5. Pick of the Week

>> How to be more productive by working less [markmanson.net]

Circular View Path Error

1. Introduction

In this tutorial, we'll look at how we get and resolve Circular View Path errors in a Spring MVC application.

2. Dependencies

To demonstrate this, let's create a simple Spring Boot web project. First, we need to add the Spring Boot web starter dependency in our Maven project file:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>

3. Reproducing the Problem

Then, let's create a simple Spring Boot application with one Controller that resolves to one path:

@Controller
public class CircularViewPathController {

    @GetMapping("/path")
    public String path() {
        return "path";
    }
}

The return value is the view name that will produce the response data. In our case, the return value is path, which is associated with the path.html template:

<html>
<head>
    <title>path.html</title>
</head>
<body>
    <p>path.html</p>
</body>
</html>

After we start the server, we can reproduce the error by making a GET request to http://localhost:8080/path. The result will be the Circular View Path error:

{"timestamp":"2020-05-22T11:47:42.173+0000","status":500,"error":"Internal Server Error",
"message":"Circular view path [path]: would dispatch back to the current handler URL [/path] 
again. Check your ViewResolver setup! (Hint: This may be the result of an unspecified view, 
due to default view name generation.)","path":"/path"}

4. Solutions

By default, the Spring MVC framework resolves view names to InternalResourceView instances, which forward the request internally. As a result, if the @GetMapping value is the same as the view name, the forward dispatches back to the same handler, and the request fails with the Circular View Path error.

One possible solution would be to rename the view and change the return value in the controller method:

@Controller
public class CircularViewPathController {
  @GetMapping("/path")
  public String path() {
    return "path2";
  }
}

If we don't want to rename the view and change the return value of the controller method, another solution is to choose a different view technology for the project.

For the most common cases, we can choose the Thymeleaf Java template engine. Let's add the spring-boot-starter-thymeleaf dependency to the project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-thymeleaf</artifactId>
</dependency>

After rebuilding the project we can run it again, and the request is successful. In this case, Thymeleaf replaces the InternalResourceView class.
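
Alternatively, if the endpoint only needs to return plain content rather than render a template, we can sidestep view resolution entirely with @ResponseBody. The following is a minimal sketch of that approach, not part of the original example:

@Controller
public class CircularViewPathController {

    @GetMapping("/path")
    @ResponseBody
    public String path() {
        // the returned String is written directly to the response body, so no view is resolved
        return "path";
    }
}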

5. Conclusion

In this tutorial, we looked at the Circular View path error, why it happens, and how to resolve the issue. As always, the full source code of the article is available over on GitHub.

View Bytecode of a Class File in Java

1. Overview

Bytecode analysis is a common practice among Java developers for many reasons, like finding problems with code, code profiling, and searching classes with specific annotations.

In this article, we'll explore ways to view the bytecode of a class file in Java.

2. What Is the Bytecode?

Bytecode is the intermediate representation of a Java program, allowing a JVM to translate a program into machine-level assembly instructions.

When a Java program is compiled, bytecode is generated in the form of a .class file. This .class file contains instructions that the CPU can't execute directly and relies on a JVM to interpret them.

3. Using javap

The JDK ships with the javap command-line tool, which displays information about the fields, constructors, and methods of a class file.

Based on the options used, it can disassemble a class and show the instructions that comprise the Java bytecode.

3.1. javap

Let's use the javap command to view the bytecode of the familiar Object class:

$ javap java.lang.Object

The output of the command will show the bare-minimum construct of the Object class:

public class java.lang.Object {
  public java.lang.Object();
  public final native java.lang.Class<?> getClass();
  public native int hashCode();
  public boolean equals(java.lang.Object);
  protected native java.lang.Object clone() throws java.lang.CloneNotSupportedException;
  public java.lang.String toString();
  public final native void notify();
  public final native void notifyAll();
  public final native void wait(long) throws java.lang.InterruptedException;
  public final void wait(long, int) throws java.lang.InterruptedException;
  public final void wait() throws java.lang.InterruptedException;
  protected void finalize() throws java.lang.Throwable;
  static {};
}

By default, the bytecode output will not contain fields/methods with a private access modifier.

3.2. javap -p

To view all members, including private fields and methods, we can add the -p argument:
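
$ javap -p java.lang.Object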

public class java.lang.Object {
  public java.lang.Object();
  private static native void registerNatives();
  public final native java.lang.Class<?> getClass();
  public native int hashCode();
  public boolean equals(java.lang.Object);
  protected native java.lang.Object clone() throws java.lang.CloneNotSupportedException;
  // ...
}

Here, we can observe that the private method registerNatives is also shown in the bytecode of the Object class.

3.3. javap -v

Similarly, we can use the -v argument to view verbose information like stack size and arguments for methods of the Object class:
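
$ javap -v java.lang.Object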

Classfile jar:file:/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/lib/rt.jar!/java/lang/Object.class
  Last modified Mar 15, 2017; size 1497 bytes
  MD5 checksum 5916745820b5eb3e5647da3b6cc6ef65
  Compiled from "Object.java"
public class java.lang.Object
  minor version: 0
  major version: 52
  flags: ACC_PUBLIC, ACC_SUPER
Constant pool:
   #1 = Class              #49            // java/lang/StringBuilder
   // ...
{
  public java.lang.Object();
    descriptor: ()V
    flags: ACC_PUBLIC
    Code:
      stack=0, locals=1, args_size=1
         0: return
      LineNumberTable:
        line 37: 0

  public final native java.lang.Class<?> getClass();
    descriptor: ()Ljava/lang/Class;
    flags: ACC_PUBLIC, ACC_FINAL, ACC_NATIVE
    Signature: #26                          // ()Ljava/lang/Class<*>;

  // ...
}
SourceFile: "Object.java"

3.4. javap -c

Also, the javap command allows disassembling the whole Java class by using the -c argument:
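
$ javap -c java.lang.Object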

Compiled from "Object.java"
public class java.lang.Object {
  public java.lang.Object();
    Code:
       0: return
  public boolean equals(java.lang.Object);
    Code:
       0: aload_0
       1: aload_1
       2: if_acmpne     9
       5: iconst_1
       6: goto          10
       9: iconst_0
      10: ireturn
  protected native java.lang.Object clone() throws java.lang.CloneNotSupportedException;
  // ...
}

Further, the javap command allows us to check the system info, constants, and internal type signatures using various arguments.

We can list all arguments supported by the javap command by using the -help argument.

Now that we've seen a Java command-line solution for viewing the bytecode of a class file, let's examine a few bytecode-manipulation libraries.

4. Using ASM

ASM is a popular performance-oriented, low-level Java bytecode manipulation and analysis framework.

4.1. Setup

First, let's add the latest asm and asm-util Maven dependencies to our pom.xml:

<dependency>
    <groupId>org.ow2.asm</groupId>
    <artifactId>asm</artifactId>
    <version>8.0.1</version>
</dependency>
<dependency>
    <groupId>org.ow2.asm</groupId>
    <artifactId>asm-util</artifactId>
    <version>8.0.1</version>
</dependency>

4.2. View Bytecode

Then, we'll use the ClassReader and TraceClassVisitor to view the bytecode of the Object class:

try {
    ClassReader reader = new ClassReader("java.lang.Object");
    TraceClassVisitor tcv = new TraceClassVisitor(new PrintWriter(System.out));
    reader.accept(tcv, 0);
} catch (IOException e) {
    e.printStackTrace();
}

Here, we'll note that the TraceClassVisitor object requires the PrintWriter object to extract and produce the bytecode:

// class version 52.0 (52)
// access flags 0x21
public class java/lang/Object {

  // compiled from: Object.java

  // access flags 0x1
  public <init>()V
   L0
    LINENUMBER 37 L0
    RETURN
    MAXSTACK = 0
    MAXLOCALS = 1

  // access flags 0x101
  public native hashCode()I

  // access flags 0x1
  public equals(Ljava/lang/Object;)Z
   L0
    LINENUMBER 149 L0
    ALOAD 0
    ALOAD 1
    IF_ACMPNE L1
    ICONST_1
    GOTO L2
   L1

    // ...
}

5. Using BCEL

The Byte Code Engineering Library, popularly known as Apache Commons BCEL, provides a convenient way to create/manipulate Java class files.

5.1. Maven Dependency

As usual, let's add the latest bcel Maven dependency to our pom.xml:

<dependency>
    <groupId>org.apache.bcel</groupId>
    <artifactId>bcel</artifactId>
    <version>6.5.0</version>
</dependency>

5.2. Disassemble Class and View Bytecode

Then, we can use the Repository class to generate the JavaClass object:

try { 
    JavaClass objectClazz = Repository.lookupClass("java.lang.Object");
    System.out.println(objectClazz.toString());
} catch (ClassNotFoundException e) { 
    e.printStackTrace(); 
}

Here, we've used the toString method on the objectClazz object to see bytecode in a concise format:

public class java.lang.Object
file name		java.lang.Object
compiled from		Object.java
compiler version	52.0
access flags		33
constant pool		78 entries
ACC_SUPER flag		true

Attribute(s):
    SourceFile: Object.java

14 methods:
    public void <init>()
    private static native void registerNatives()
    public final native Class getClass() [Signature: ()Ljava/lang/Class<*>;]
    public native int hashCode()
    public boolean equals(Object arg1)
    protected native Object clone()
      throws Exceptions: java.lang.CloneNotSupportedException
    public String toString()
    public final native void notify()
	
    // ...

Further, the JavaClass class provides methods like getConstantPool, getFields, and getMethods to view the details of the disassembled class:

assertEquals("java.lang.Object", objectClazz.getFileName());
assertEquals(14, objectClazz.getMethods().length);
assertTrue(objectClazz.toString().contains("public class java.lang.Object"));

Similarly, set* methods are available for bytecode manipulation.

6. Using Javassist

Also, we can use the Javassist (Java Programming Assistant) library that provides high-level APIs to view/manipulate Java bytecode.

6.1. Maven Dependency

First, we'll add the latest javassist Maven dependency to our pom.xml:

<dependency>
    <groupId>org.javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>3.27.0-GA</version>
</dependency>

6.2. Generate ClassFile

Then, we can use the ClassPool and ClassFile classes to generate a Java class:

try {
    ClassPool cp = ClassPool.getDefault();
    ClassFile cf = cp.get("java.lang.Object").getClassFile();
    cf.write(new DataOutputStream(new FileOutputStream("Object.class")));
} catch (NotFoundException | IOException e) {
    e.printStackTrace();
}

Here, we've used the write method, which allows us to write the class file using the DataOutputStream object:

// Compiled from Object.java (version 1.8 : 52.0, super bit)
public class java.lang.Object {
  
  // Method descriptor #19 ()V
  // Stack: 0, Locals: 1
  public Object();
    0  return
      Line numbers:
        [pc: 0, line: 37]
  
  // Method descriptor #19 ()V
  private static native void registerNatives();
  
  // Method descriptor #24 ()Ljava/lang/Class;
  // Signature: ()Ljava/lang/Class<*>;
  public final native java.lang.Class getClass();
  
  // Method descriptor #28 ()I
  public native int hashCode();
  
  // ...

Also, the object of the ClassFile class provides access to the constant pool, fields, and methods:

assertEquals("java.lang.Object", cf.getName());
assertEquals(14, cf.getMethods().size());

7. Jclasslib

Additionally, we can use an IDE-based plugin to view the bytecode of a class file. For instance, let's explore the jclasslib Bytecode viewer plugin available for IntelliJ IDEA.

7.1. Installation

First, we'll install the plugin using the Settings/Preferences dialog:

7.2. View Bytecode of the Object Class

Then, we can choose “Show Bytecode With Jclasslib” option under the View menu to view bytecode of the selected Object class:

Next, a dialog will open to show the bytecode of the Object class:

7.3. View Details

Also, we can see various details of the bytecode like constant pool, fields, and methods using the Jclasslib plugin dialog:

Similarly, we have the Bytecode Visualizer Plugin to view the bytecode of a class file using the Eclipse IDE.

8. Conclusion

In this tutorial, we explored ways to view the bytecode of a class file in Java.

First, we examined the javap command along with its various arguments. Then, we went through a few bytecode manipulation libraries that provide the features to view and manipulate the bytecode.

Last, we looked into the IDE-based plugin jclasslib, which allows us to view bytecode in IntelliJ IDEA.

As usual, all the code implementations are available over on GitHub.

When Does JPA Set the Primary Key

1. Overview

In this tutorial, we'll illustrate the moment when JPA assigns a value to the primary key. We'll clarify what the JPA specification says, and then, we'll show examples using various JPA strategies for primary key generation.

2. Problem Statement

As we know, JPA (Java Persistence API) uses the EntityManager to manage the lifecycle of an Entity. At some point, the JPA provider needs to assign a value to the primary key. So, we may find ourselves asking, when does this happen? And where is the documentation that states this?

The JPA specification says:

A new entity instance becomes both managed and persistent by invoking the persist method on it or by cascading the persist operation.

So, we'll focus on the EntityManager.persist() method in this article.

3. The Generate Value Strategy

When we invoke the EntityManager.persist() method, the entity's state is changed according to the JPA specification:

If X is a new entity, it becomes managed. The entity X will be entered into the database at or before transaction commit or as a result of the flush operation.

This means there are various ways to generate the primary key. Generally, there are two solutions:

  • Pre-allocate the primary key
  • Allocate primary key after persisting in the database

To be more specific, JPA offers four strategies to generate the primary key:

  • GenerationType.AUTO
  • GenerationType.IDENTITY
  • GenerationType.SEQUENCE
  • GenerationType.TABLE

Let's take a look at them one by one.

3.1. GenerationType.AUTO

AUTO is the default strategy for @GeneratedValue. If we just want to have a primary key, we can use the AUTO strategy. The JPA provider will choose an appropriate strategy for the underlying database:

@Entity
@Table(name = "app_admin")
public class Admin {

    @Id
    @GeneratedValue
    private Long id;

    @Column(name = "admin_name")
    private String name;

    // standard getters and setters
}
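
Since the provider chooses the actual strategy, the exact moment the id becomes available may vary. A minimal test sketch, mirroring the tests in the following sections, can simply verify that the id is assigned by the time the transaction commits:

@Test
public void givenAutoStrategy_whenCommitTransaction_thenIdIsAssigned() {
    Admin admin = new Admin();
    admin.setName("Admin Name");

    entityManager.getTransaction().begin();
    entityManager.persist(admin);
    entityManager.getTransaction().commit();

    // with AUTO, the id is guaranteed to be set no later than commit time
    Assert.assertNotNull(admin.getId());
}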

3.2. GenerationType.IDENTITY

The IDENTITY strategy relies on the database auto-increment column. The database generates the primary key after each insert operation. JPA assigns the primary key value after performing the insert operation or upon transaction commit:

@Entity
@Table(name = "app_user")
public class User {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    @Column(name = "user_name")
    private String name;

    // standard getters and setters
}

Here, we verify the id values before and after the transaction commit:

@Test
public void givenIdentityStrategy_whenCommitTransction_thenReturnPrimaryKey() {
    User user = new User();
    user.setName("TestName");
        
    entityManager.getTransaction().begin();
    entityManager.persist(user);
    Assert.assertNull(user.getId());
    entityManager.getTransaction().commit();

    Long expectPrimaryKey = 1L;
    Assert.assertEquals(expectPrimaryKey, user.getId());
}

The IDENTITY strategy is supported by MySQL, SQL Server, PostgreSQL, DB2, Derby, and Sybase.

3.3. GenerationType.SEQUENCE

By using the SEQUENCE strategy, JPA generates the primary key using a database sequence. We first need to create a sequence on the database side before applying this strategy:

CREATE SEQUENCE article_seq
  MINVALUE 1
  START WITH 50
  INCREMENT BY 50

JPA sets the primary key after we invoke the EntityManager.persist() method and before we commit the transaction.

Let's define an Article entity with the SEQUENCE strategy:

@Entity
@Table(name = "article")
public class Article {
    
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "article_gen")
    @SequenceGenerator(name="article_gen", sequenceName="article_seq")
    private Long id;

    @Column(name = "article_name")
    private String name;

    // standard getters and setters
}

The sequence starts from 50, so the first id will be the next value, 51.

Now, let's test the SEQUENCE strategy:

@Test
public void givenSequenceStrategy_whenPersist_thenReturnPrimaryKey() {
    Article article = new Article();
    article.setName("Test Name");

    entityManager.getTransaction().begin();
    entityManager.persist(article);
    Long expectPrimaryKey = 51L;
    Assert.assertEquals(expectPrimaryKey, article.getId());

    entityManager.getTransaction().commit();
}

The SEQUENCE strategy is supported by Oracle, PostgreSQL, and DB2.

3.4. GenerationType.TABLE

The TABLE strategy generates the primary key from a table and works the same regardless of the underlying database.

We need to create a generator table on the database side to generate the primary key. The table should at least have two columns: one column to represent the generator's name and another to store the primary key value.

Firstly, let's create a generator table:

@Table(name = "id_gen")
@Entity
public class IdGenerator {

    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;
    
    @Column(name = "gen_name")
    private String gen_name;

    @Column(name = "gen_value")
    private Long gen_value;

    // standard getters and setters
}

Then, we need to insert two initial values into the generator table:

INSERT INTO id_gen (gen_name, gen_value) VALUES ('id_generator', 0);
INSERT INTO id_gen (gen_name, gen_value) VALUES ('task_gen', 10000);

JPA assigns the primary key values after calling EntityManager.persist() method and before the transaction commit.

Let's now use the generator table with the TABLE strategy. We can use allocationSize to pre-allocate some primary keys:

@Entity
@Table(name = "task")
public class Task {
    
    @TableGenerator(name = "id_generator", table = "id_gen", pkColumnName = "gen_name", valueColumnName = "gen_value",
        pkColumnValue="task_gen", initialValue=10000, allocationSize=10)
    @Id
    @GeneratedValue(strategy = GenerationType.TABLE, generator = "id_generator")
    private Long id;

    @Column(name = "name")
    private String name;

    // standard getters and setters
}

And the id starts from 10,000 after we invoke the persist method:

@Test
public void givenTableStrategy_whenPersist_thenReturnPrimaryKey() {
    Task task = new Task();
    task.setName("Test Task");

    entityManager.getTransaction().begin();
    entityManager.persist(task);
    Long expectPrimaryKey = 10000L;
    Assert.assertEquals(expectPrimaryKey, task.getId());

    entityManager.getTransaction().commit();
}

4. Conclusion

This article illustrates the moment when JPA sets the primary key under different strategies. In addition, we also learned about the usage of each of these strategies through examples.

The complete code can be found over on GitHub.

Spring YAML vs Properties

1. Introduction

YAML is a human-friendly notation used in configuration files. Why would we prefer this data serialization format over properties files in Spring Boot? Besides readability and reduction of repetition, YAML is the perfect language to write Configuration as Code for deployments.

In the same way, the use of YAML for Spring DevOps facilitates the storage of the configuration variables in the environment, as the Twelve-Factor App methodology recommends.

In this tutorial, we'll compare Spring YAML versus properties file in order to check the main advantages of using one over the other. But remember, the selection of YAML over properties file configuration is sometimes a decision of personal taste.

2. YAML Notation

YAML is a recursive acronym for “YAML Ain't Markup Language”. It provides the following characteristics:

  • More clarity and human-friendliness
  • Perfect for hierarchical configuration data
  • It supports enhanced capabilities such as maps, lists, and scalar types

Those capabilities make YAML the perfect companion for Spring configuration files. Let's see how it works!

3. Spring YAML Configuration

As it was mentioned in the previous sections, YAML is an extraordinary data format for configuration files. It's much more readable, and it provides enhanced capabilities over the properties file. Therefore, it makes sense to recommend this notation over the properties file configuration. Furthermore, from version 1.2, YAML is a superset of JSON.

In addition, in Spring the configuration files placed outside the artifact override those inside the packaged jar. Another interesting feature of Spring configuration is the possibility to assign environment variables at runtime. This is extremely important for DevOps deployments.

Spring profiles allow us to separate environments and apply different properties to each of them. YAML adds the possibility of including several profiles in the same file.

In our case, for deployment purposes, we'll have three: testing, development, and production:

spring:
  profiles:
    active:
    - test

---

spring:
  profiles: test
name: test-YAML
environment: testing
servers:
  - www.abc.test.com
  - www.xyz.test.com
  
---

spring:
  profiles: prod
name: prod-YAML
environment: production
servers:
  - www.abc.com
  - www.xyz.com
    
---

spring:
    profiles: dev
name: ${DEV_NAME:dev-YAML}
environment: development
servers:
    - www.abc.dev.com
    - www.xyz.dev.com

Let's now check the spring.profiles.active property, which selects the test environment by default. We can redeploy the artifact using different profiles without rebuilding the source code.

Another interesting feature in Spring is that you can enable the profile via the environment variable:

export SPRING_PROFILES_ACTIVE=dev

We'll see the relevance of this environment variable in the Testing section. Finally, we can configure YAML properties to take their values directly from the environment:

name: ${DEV_NAME:dev-YAML}

We can see that if no environment variable is configured, the default value dev-YAML is used.

4. Reduction of Repetition and Readability

The hierarchical structure of YAML lets us avoid repeating the upper levels of the configuration keys. Let's see the differences with an example:

component:
  idm:
    url: myurl
    user: user
    password: password
    description: >
      this should be a long 
      description
  service:
    url: myurlservice
    token: token
    description: >
      this should be another long 
      description

The same configuration becomes redundant using a properties file:

component.idm.url=myurl
component.idm.user=user
component.idm.password=password
component.idm.description=this should be a long \
                          description
component.service.url=myurlservice
component.service.token=token
component.service.description=this should be another long \ 
                              description

The hierarchical nature of YAML greatly enhances legibility. It is not only a question of avoiding repetition: well-used indentation also describes exactly what the configuration is about and what it is for. With YAML, just as we'd use a backslash \ in a properties file, we can break the content into multiple lines with the > character.

5. Lists and Maps

We can configure lists and maps using YAML and properties file.

There are two ways to assign values and store them in a list:

servers:
  - www.abc.test.com
  - www.xyz.test.com
  
external: [www.abc.test.com, www.xyz.test.com]

Both examples provide the same result. The equivalent configuration using a properties file would be more difficult to read:

servers[0]=www.abc.test.com
servers[1]=www.xyz.test.com

external=www.abc.test.com, www.xyz.test.com

Again, the YAML version is more human-readable and clear.

In the same way, we can configure maps:

map:
  firstkey: key1
  secondkey: key2
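
To consume such hierarchical values in code, we can bind them to a typed bean with @ConfigurationProperties. Here's a minimal sketch for the component.idm block from the previous section (the class name IdmProperties is ours, not part of the original example):

@Component
@ConfigurationProperties(prefix = "component.idm")
public class IdmProperties {

    // bound from component.idm.* in the YAML or properties file
    private String url;
    private String user;
    private String password;
    private String description;

    // standard getters and setters
}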

6. Testing

Now, let's check whether everything is working as expected. If we look at the application logs, we can see that the environment selected by default is testing:

2020-06-11 13:58:28.846  INFO 10720 --- [main] com.baeldung.yaml.MyApplication: ...
using environment:testing
name:test-YAML
servers:[www.abc.test.com, www.xyz.test.com]
external:[www.abc.test.com, www.xyz.test.com]
map:{firstkey=key1, secondkey=key2}
Idm:
   Url: myurl
   User: user
   Password: password
   Description: this should be a long description

Service:
   Url: myurlservice
   Token: token
   Description: this should be another long description

We can overwrite the name by configuring DEV_NAME in the environment:

export DEV_NAME=new-dev-YAML

We can see that the name of the environment changes executing the application with dev profile:

2020-06-11 17:00:45.459  INFO 19636 --- [main] com.baeldung.yaml.MyApplication: ...
using environment:development
name:new-dev-YAML
servers:[www.abc.dev.com, www.xyz.dev.com]

Let's run for the production environment using SPRING_PROFILES_ACTIVE=prod:

export SPRING_PROFILES_ACTIVE=prod

2020-06-11 17:03:33.074  INFO 20716 --- [main] ...
using environment:production
name:prod-YAML
servers:[www.abc.com, www.xyz.com]

7. Conclusion

In this tutorial, we described the intricacies of the use of YAML configuration compared to the properties file.

We showed that YAML is more human-friendly, reduces repetition, and is more concise than its properties file counterpart.

As always, the code is available over on GitHub.


Returning the Generated Keys in JDBC

1. Overview

In this quick tutorial, we're going to see how we can get the last auto-generated keys with pure JDBC.

2. Setup

In order to be able to execute SQL queries, we're going to use an in-memory H2 database.

For our first step, then, let's add its Maven dependency:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>1.4.200</version>
</dependency>

Also, we'll use a very simple table with just two columns:

public class JdbcInsertIdIntegrationTest {

    private static Connection connection;

    @BeforeClass
    public static void setUp() throws Exception {
        connection = DriverManager.getConnection("jdbc:h2:mem:generated-keys", "sa", "");
        connection
          .createStatement()
          .execute("create table persons(id bigint auto_increment, name varchar(255))");
    }

    @AfterClass
    public static void tearDown() throws SQLException {
        connection
          .createStatement()
          .execute("drop table persons");
        connection.close();
    }

    // omitted
}

Here, we're connecting to the generated-keys in-memory database and creating a table named persons in it.

3. Return Generated Keys Flag

One way to fetch the keys after automatic generation is to pass Statement.RETURN_GENERATED_KEYS to the prepareStatement() method:

String QUERY = "insert into persons (name) values (?)";
try (PreparedStatement statement = connection.prepareStatement(QUERY, Statement.RETURN_GENERATED_KEYS)) {
    statement.setString(1, "Foo");
    int affectedRows = statement.executeUpdate();
    assertThat(affectedRows).isPositive();

    // omitted
} catch (SQLException e) {
    // handle the database related exception appropriately
}

After preparing and executing the query, we can call the getGeneratedKeys() method on the PreparedStatement to get the id:

try (ResultSet keys = statement.getGeneratedKeys()) {
    assertThat(keys.next()).isTrue();
    assertThat(keys.getLong(1)).isGreaterThanOrEqualTo(1);
}

As shown above, we first call the next() method to move the result cursor. Then we use the getLong() method to get the first column and convert it to long at the same time.
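
Putting these steps together, a small helper method could look like the following sketch (the method name insertPersonAndReturnId is ours, not part of the original example):

static long insertPersonAndReturnId(Connection connection, String name) throws SQLException {
    String query = "insert into persons (name) values (?)";
    try (PreparedStatement statement = connection.prepareStatement(query, Statement.RETURN_GENERATED_KEYS)) {
        statement.setString(1, name);
        statement.executeUpdate();
        try (ResultSet keys = statement.getGeneratedKeys()) {
            if (!keys.next()) {
                throw new SQLException("No generated key returned");
            }
            // the first column holds the auto-generated id
            return keys.getLong(1);
        }
    }
}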

Moreover, it's also possible to use the same technique with normal Statements:

try (Statement statement = connection.createStatement()) {
    String query = "insert into persons (name) values ('Foo')";
    int affectedRows = statement.executeUpdate(query, Statement.RETURN_GENERATED_KEYS);
    assertThat(affectedRows).isPositive();

    try (ResultSet keys = statement.getGeneratedKeys()) {
        assertThat(keys.next()).isTrue();
        assertThat(keys.getLong(1)).isGreaterThanOrEqualTo(1);
    }
}

Also, it's worth mentioning that we're using try-with-resources extensively so that the statements and result sets are closed automatically.

4. Returning Columns

As it turns out, we can also ask JDBC to return specific columns after issuing a query. In order to do that, we just have to pass an array of column names:

try (PreparedStatement statement = connection.prepareStatement(QUERY, new String[] { "id" })) {
    statement.setString(1, "Foo");
    int affectedRows = statement.executeUpdate();
    assertThat(affectedRows).isPositive();

    // omitted
}

As shown above, we're telling JDBC to return the value of the id column after executing the given query. Similar to the previous example, we can fetch the id afterward:

try (ResultSet keys = statement.getGeneratedKeys()) {
    assertThat(keys.next()).isTrue();
    assertThat(keys.getLong(1)).isGreaterThanOrEqualTo(1);
}

We can use the same approach with simple Statements, too:

try (Statement statement = connection.createStatement()) {
    int affectedRows = statement.executeUpdate("insert into persons (name) values ('Foo')", 
      new String[] { "id" });
    assertThat(affectedRows).isPositive();

    try (ResultSet keys = statement.getGeneratedKeys()) {
        assertThat(keys.next()).isTrue();
        assertThat(keys.getLong(1)).isGreaterThanOrEqualTo(1);
    }
}

5. Conclusion

In this quick tutorial, we saw how we can fetch the generated keys after query execution with pure JDBC.

As usual, all the examples are available over on GitHub.

boolean and boolean[] Memory Layout in the JVM

1. Overview

In this quick article, we're going to see what the footprint of a boolean value is in the JVM under different circumstances.

First, we'll inspect the JVM to see the object sizes. Then, we'll understand the rationale behind those sizes.

2. Setup

To inspect the memory layout of objects in the JVM, we're going to use the Java Object Layout (JOL) extensively. Therefore, we need to add the jol-core dependency:

<dependency>
    <groupId>org.openjdk.jol</groupId>
    <artifactId>jol-core</artifactId>
    <version>0.10</version>
</dependency>

3. Object Sizes

If we ask JOL to print the VM details in terms of Object Sizes:

System.out.println(VM.current().details());

When the compressed references are enabled (the default behavior), we'll see the output:

# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

In the first few lines, we can see some general information about the VM. After that, we learn about object sizes:

  • Java references consume 4 bytes, booleans/bytes are 1 byte, chars/shorts are 2 bytes, ints/floats are 4 bytes, and finally, longs/doubles are 8 bytes
  • These types consume the same amount of memory even when we use them as array elements

So, in the presence of compressed references, each boolean value takes 1 byte. Similarly, each boolean in a boolean[] consumes 1 byte. However, alignment paddings and object headers can increase the space consumed by boolean and boolean[] as we'll see later.

3.1. No Compressed References

Even if we disable the compressed references via -XX:-UseCompressedOops, the boolean size won't change at all:

# Field sizes by type: 8, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 8, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]

On the other hand, Java references are taking twice the memory.

So despite what we might expect at first, booleans are consuming 1 byte instead of just 1 bit.

3.2. Word Tearing

In most architectures, there is no way to access a single bit atomically. Even if we wanted to do so, we would probably end up writing to adjacent bits while updating another one.

One of the design goals of the JVM is to prevent this phenomenon, known as word tearing. That is, in the JVM, every field and array element should be distinct; updates to one field or element must not interact with reads or updates of any other field or element.

To recap, addressability issues and word tearing are the main reasons why booleans are more than just one single bit.

4. Ordinary Object Pointers (OOPs)

Now that we know booleans are 1 byte, let's consider this simple class:

class BooleanWrapper {
    private boolean value;
}

If we inspect the memory layout of this class using JOL:

System.out.println(ClassLayout.parseClass(BooleanWrapper.class).toPrintable());

Then JOL will print the memory layout:

 OFFSET  SIZE      TYPE DESCRIPTION                               VALUE
      0    12           (object header)                           N/A
     12     1   boolean BooleanWrapper.value                      N/A
     13     3           (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total

The BooleanWrapper layout consists of:

  • 12 bytes for the header, including two mark words and one klass word. The HotSpot JVM uses the mark word to store the GC metadata, identity hashcode, and locking information. Also, it uses the klass word to store class metadata such as runtime type checks
  • 1 byte for the actual boolean value
  • 3 bytes of padding for alignment purposes

By default, objects should be aligned to 8-byte boundaries. Therefore, the JVM adds 3 bytes of padding to the 13 bytes of header and boolean field to round the instance size up to 16 bytes.

Therefore, boolean fields may consume more memory because of their field alignment.

4.1. Custom Alignment

If we change the alignment value to 32 via -XX:ObjectAlignmentInBytes=32, then the same class layout changes to:

OFFSET  SIZE      TYPE DESCRIPTION                               VALUE
      0    12           (object header)                           N/A
     12     1   boolean BooleanWrapper.value                      N/A
     13    19           (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 19 bytes external = 19 bytes total

As shown above, the JVM adds 19 bytes of padding to make the object size a multiple of 32.

5. Array OOPs

Let's see how the JVM lays out a boolean array in memory:

boolean[] value = new boolean[3];
System.out.println(ClassLayout.parseInstance(value).toPrintable());

This will print the instance layout as follows:

OFFSET  SIZE      TYPE DESCRIPTION                              
      0     4           (object header)  # mark word
      4     4           (object header)  # mark word
      8     4           (object header)  # klass word
     12     4           (object header)  # array length
     16     3   boolean [Z.<elements>    # [Z means boolean array                        
     19     5           (loss due to the next object alignment)

In addition to two mark words and one klass word, array headers contain an extra 4 bytes to store the array length.

Since our array has three elements, the size of the array elements is 3 bytes. However, these 3 bytes are padded by 5 alignment bytes, so the whole instance consumes 24 bytes in total.

Although each boolean element in an array is just 1 byte, the whole array consumes much more memory. In other words, we should consider the header and padding overhead while computing the array size.

6. Conclusion

In this quick tutorial, we saw that boolean fields are consuming 1 byte. Also, we learned that we should consider the header and padding overheads in object sizes.

For a more detailed discussion, it's highly recommended to check out the oops section of the JVM source code. Also, Aleksey Shipilëv has a much more in-depth article in this area.

As usual, all the examples are available over on GitHub.

Writing IntelliJ IDEA Plugins Using Gradle

1. Introduction

Over the past few years, IntelliJ from JetBrains has quickly become the top IDE for Java developers. In our most recent State of Java report, IntelliJ was the IDE of choice for 61% of respondents, up from 55% the year before.

One feature that makes IntelliJ so appealing to Java developers is the ability to extend and create new functionality using plugins.

In this tutorial, we'll look at writing an IntelliJ plugin using the new recommended way with Gradle to demonstrate a few ways we can extend the IDE. This article is a re-mix of a previous one that describes the creation of the same plugin using the Plugin Devkit.

2. Main Types of Plugins

The most common types of plugins include functionality for:

  • Custom language support: the ability to write, interpret, and compile code written in different languages
  • Framework integration: support for third-party frameworks such as Spring
  • Tool integration: integration with external tools such as Gradle
  • User interface add-ons: new menu items, tool windows, progress bars, and more

Plugins will often fall into multiple categories. For example, the Git plugin that ships with IntelliJ interacts with the git executable installed on the system. The plugin provides its tool window and popup menu items, while also integrating into the project creation workflow, preferences window, and more.

3. Create a Plugin

There are two supported ways of creating plugins. We'll use the recommended way for new projects with Gradle instead of using their Plugin Devkit.

Creating a Gradle-based plugin is done by using the New > Project menu.

Note that we must include Java and the IntelliJ Platform Plugin to ensure the required plugin classes are available on the classpath.

As of this writing, we can only use JDK 8 for writing IntelliJ plugins.

4. Example Plugin

We'll create a plugin that provides quick access to the popular Stack Overflow website from multiple areas in the IDE. It'll include:

  • a Tools menu item to visit the Ask a Question page
  • a popup menu item in both text editor and console output to search Stack Overflow for highlighted text

4.1. Creating Actions

Actions are the most common way to access a plugin. Actions get triggered by events in the IDE, such as clicking a menu item or a toolbar button.

The first step in creating an action is to create a Java class that extends AnAction. For our Stack Overflow plugin, we'll create two actions.

The first action opens the Ask a Question page in a new browser window:

public class AskQuestionAction extends AnAction {
    @Override
    public void actionPerformed(AnActionEvent e) {
        BrowserUtil.browse("https://stackoverflow.com/questions/ask");
    }
}

We use the built-in BrowserUtil class to handle all the nuances of opening a web page on different operating systems and browsers.

The second action searches Stack Overflow for the selected text. To perform the search, we need two parameters: the language tag and the text to search for.

To get the language tag, we'll use the Program Structure Interface (PSI). This API parses all the files in a project and provides a programmatic way to inspect them.

In this case, we use the PSI to determine the programming language of a file:

Optional<PsiFile> psiFile = Optional.ofNullable(e.getData(LangDataKeys.PSI_FILE));
String languageTag = psiFile.map(PsiFile::getLanguage)
  .map(Language::getDisplayName)
  .map(String::toLowerCase)
  .map(lang -> "[" + lang + "]")
  .orElse("");

To get the text to search for, we'll use the Editor API to retrieve highlighted text on the screen:

Editor editor = e.getRequiredData(CommonDataKeys.EDITOR);
CaretModel caretModel = editor.getCaretModel();
String selectedText = caretModel.getCurrentCaret().getSelectedText();

We can use the same action for both the editor and console windows because accessing the selected text works the same way in both.

Now, we can put this all together in an actionPerformed declaration:

@Override
public void actionPerformed(@NotNull AnActionEvent e) {
    Optional<PsiFile> psiFile = Optional.ofNullable(e.getData(LangDataKeys.PSI_FILE));
    String languageTag = psiFile.map(PsiFile::getLanguage)
      .map(Language::getDisplayName)
      .map(String::toLowerCase)
      .map(lang -> "[" + lang + "]")
      .orElse("");

    Editor editor = e.getRequiredData(CommonDataKeys.EDITOR);
    CaretModel caretModel = editor.getCaretModel();
    String selectedText = caretModel.getCurrentCaret().getSelectedText();

    BrowserUtil.browse("https://stackoverflow.com/search?q=" + languageTag + selectedText);
}

This action also overrides a second method named update, which allows us to enable or disable the action under different conditions. In this case, we disable the search action if there is no selected text:

@Override
public void update(@NotNull AnActionEvent e) {
    Editor editor = e.getRequiredData(CommonDataKeys.EDITOR);
    CaretModel caretModel = editor.getCaretModel();
    e.getPresentation().setEnabledAndVisible(caretModel.getCurrentCaret().hasSelection());
}

4.2. Registering Actions

Once we have our actions written, we need to register them with the IDE. There are two ways to do this.

The first way is using the plugin.xml file, which is created for us when we start a new project.

By default, the file will have an empty <actions> element, which is where we'll add our actions:

<actions>
    <action
      id="StackOverflow.AskQuestion.ToolsMenu"
      class="com.baeldung.intellij.stackoverflowplugin.AskQuestionAction"
      text="Ask Question on Stack Overflow"
      description="Ask a Question on Stack Overflow">
        <add-to-group group-id="ToolsMenu" anchor="last"/>
    </action>
    <action
      id="StackOverflow.Search.Editor"
      class="com.baeldung.intellij.stackoverflowplugin.SearchAction"
      text="Search on Stack Overflow"
      description="Search on Stack Overflow">
        <add-to-group group-id="EditorPopupMenu" anchor="last"/>
    </action>
    <action
      id="StackOverflow.Search.Console"
      class="com.baeldung.intellij.stackoverflowplugin.SearchAction"
      text="Search on Stack Overflow"
      description="Search on Stack Overflow">
        <add-to-group group-id="ConsoleEditorPopupMenu" anchor="last"/>
    </action>
</actions>

Using the XML file to register actions will ensure they register during IDE startup, which is usually preferable.

The second way to register actions is programmatically using the ActionManager class:

ActionManager.getInstance().registerAction("StackOverflow.SearchAction", new SearchAction());

This has the advantage of letting us dynamically register actions. For example, if we write a plugin to integrate with a remote API, we might want to register a different set of actions based on the version of the API that we call.

The disadvantage of this approach is that actions do not register at startup. We have to create an instance of ApplicationComponent to manage actions, which requires more coding and XML configuration.

5. Testing the Plugin

As with any program, writing an IntelliJ plugin requires testing. For a small plugin like the one we've written, it's sufficient to ensure the plugin compiles and that the actions we created work as expected when we click them.

We can manually test (and debug) our plugin by opening the Gradle tool window and executing the runIde task:

This will launch a new instance of IntelliJ with our plugin activated. Doing so allows us to click the different menu items we created and ensure the proper Stack Overflow pages open up.

If we wish to do more traditional unit testing, IntelliJ provides a headless environment to run unit tests. We can write tests using any test framework we want, and the tests run using real, unmocked components from the IDE.

6. Deploying the Plugin

The Gradle Plugin provides a simple way to package plugins so we can install and distribute them. Simply open the Gradle tool window and execute the buildPlugin task. This will generate a ZIP file inside the build/distributions directory.

The generated ZIP file contains the code and configuration files needed to load into IntelliJ. We can install it locally, or publish it to a plugin repository for use by others.

The screenshot below shows one of the new Stack Overflow menu items in action:

7. Conclusion

In this article, we developed a simple plugin that highlights how we can enhance the IntelliJ IDE.

While we primarily worked with actions, the IntelliJ plugin SDK offers several ways to add new functionality to the IDE. For further reading, check out their official getting started guide.

As always, the full code for our sample plugin can be found over on GitHub.

Exploring JVM Tuning Flags

1. Overview

It's possible to tune the HotSpot JVM with a variety of tuning flags. As there are hundreds of such flags, keeping track of them and their default values can be a little daunting.

In this tutorial, we're going to introduce a few ways to discover such tuning flags and learn how to work with them.

2. Overview of Java Options

The java command supports a wide variety of flags falling into the following categories:

  • Standard options that are guaranteed to be supported by all JVM implementations out there. Usually, these options are used for everyday actions such as -classpath, -cp, -version, and so on
  • Extra options that are not supported by all JVM implementations and are usually subject to change. These options start with -X

Please note that we shouldn't use these extra options casually. Moreover, some of those additional options are more advanced and begin with -XX.

Throughout this article, we're going to focus on more advanced -XX flags.

3. JVM Tuning Flags

To list the global JVM tuning flags, we can enable the PrintFlagsFinal flag as follows:

>> java -XX:+PrintFlagsFinal -version
[Global flags]
    uintx CodeCacheExpansionSize                   = 65536                                  {pd product} {default}
     bool CompactStrings                           = true                                   {pd product} {default}
     bool DoEscapeAnalysis                         = true                                   {C2 product} {default}
   double G1ConcMarkStepDurationMillis             = 10.000000                                 {product} {default}
   size_t G1HeapRegionSize                         = 1048576                                   {product} {ergonomic}
    uintx MaxHeapFreeRatio                         = 70                                     {manageable} {default}

// truncated
openjdk version "14" 2020-03-17
OpenJDK Runtime Environment (build 14+36-1461)
OpenJDK 64-Bit Server VM (build 14+36-1461, mixed mode, sharing)

As shown above, some flags have default values for this particular JVM version.

Default values for some flags may differ between platforms, as shown in the final column. For instance, product means that the default setting of the flag is uniform across all platforms, while pd product means that the default setting is platform-dependent. Flags marked manageable can be changed dynamically at runtime.
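
For instance, since MaxHeapFreeRatio is manageable, we could change it on a running JVM with the jinfo tool (a sketch, where 12345 stands in for the target process id):

>> jinfo -flag MaxHeapFreeRatio=50 12345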

3.1. Diagnostic Flags

The PrintFlagsFinal flag, however, does not show all possible tuning flags. For instance, to also see diagnostic tuning flags, we should add the UnlockDiagnosticVMOptions flag:

>> java -XX:+PrintFlagsFinal -version | wc -l
557

>> java -XX:+PrintFlagsFinal -XX:+UnlockDiagnosticVMOptions -version | wc -l
728

Clearly, there are about 170 more flags when we include diagnostic options. For example, printing native memory tracking stats is only available as part of diagnostic flags:

bool PrintNMTStatistics                       = false                                  {diagnostic} {default}

3.2. Experimental Flags

To also see experimental options, we should add the UnlockExperimentalVMOptions flag:

>> java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version | wc -l
809

3.3. JVMCI Flags

As of Java 9, the JVM compiler interface or JVMCI enables us to use a compiler written in Java, such as Graal, as a dynamic compiler.

To see options related to JVMCI, we should add a few more flags and also even enable the JVMCI:

>> java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions \
>> -XX:+JVMCIPrintProperties -XX:+EnableJVMCI -XX:+PrintFlagsFinal  -version | wc -l
1516

Most of the time, however, using global, diagnostic, and experimental options should suffice and will help us to find the flag we have in mind.

3.4. Putting it All Together

These combinations of options can help us to find a tuning flag, especially when we don't remember the exact name. For instance, to find the tuning flag related to soft references in Java:

>> alias jflags="java -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal  -version"
>> jflags | grep Soft
size_t SoftMaxHeapSize                          = 4294967296                             {manageable} {ergonomic}
intx SoftRefLRUPolicyMSPerMB                    = 1000                                   {product} {default}

From the result, we can easily guess that SoftRefLRUPolicyMSPerMB is the flag we're looking for.

4. Different Types of Flags

In the previous section, we glossed over an important subject: the flag types. Let's take another look at the java -XX:+PrintFlagsFinal -version output:

[Global flags]
    uintx CodeCacheExpansionSize                   = 65536                                  {pd product} {default}
     bool CompactStrings                           = true                                   {pd product} {default}
     bool DoEscapeAnalysis                         = true                                   {C2 product} {default}
   double G1ConcMarkStepDurationMillis             = 10.000000                                 {product} {default}
   size_t G1HeapRegionSize                         = 1048576                                   {product} {ergonomic}
    uintx MaxHeapFreeRatio                         = 70                                     {manageable} {default}
// truncated

As shown above, each flag has a specific type.

Boolean options are used to either enable or disable a feature. Such options don't require a value. To enable them, we just have to put a plus sign before the option name:

-XX:+PrintFlagsFinal

On the contrary, to disable them, we have to add a minus sign before their name:

-XX:-RestrictContended

Other flag types need an argument value. It's possible to separate the value from the option name by a space, a colon, an equal sign, or the argument may directly follow the option name (the exact syntax differs for each option):

-XX:ObjectAlignmentInBytes=16 -Xms5g -Xlog:gc
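
On a related note, if we want to check which of these options were actually passed to the JVM we're currently running on, we can ask the RuntimeMXBean. The following is a minimal sketch (the class name is just for illustration):

import java.lang.management.ManagementFactory;
import java.util.List;

public class PrintJvmArguments {

    public static void main(String[] args) {
        // Options passed to the JVM, e.g. -XX:+PrintFlagsFinal, -Xms5g, or -Xlog:gc
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        jvmArgs.forEach(System.out::println);
    }
}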

5. Documentation and Source Code

Finding the right flag name is one thing. Finding what that particular flag is doing under the hood is another story.

One way to find out these sorts of details is by looking at the documentation. For instance, the documentation for the java command in the JDK tools specification section is a great place to start.

Sometimes, no amount of documentation can beat the source code. Therefore, if we have the name of a particular flag, then we can explore the JVM source code to find out what's going on.

For instance, we can check out the HotSpot JVM's source code from GitHub or even their Mercurial repository and then:

>> git clone git@github.com:openjdk/jdk14u.git openjdk
>> cd openjdk/src/hotspot
>> grep -FR 'PrintFlagsFinal' .
./share/runtime/globals.hpp:  product(bool, PrintFlagsFinal, false,                                   
./share/runtime/init.cpp:  if (PrintFlagsFinal || PrintFlagsRanges) {

Here we're looking for all files containing the PrintFlagsFinal string. After finding the responsible files, we can look around and see how that specific flag works.

6. Conclusion

In this article, we saw how we could find almost all available JVM tuning flags and also learned a few tricks to work with them more effectively.

Event-Driven Data with Apache Druid

1. Introduction

In this tutorial, we'll understand how to work with event data and Apache Druid. We'll cover the basics of event data and the Druid architecture. As part of that, we'll create a simple data pipeline that leverages several Druid features, covering different modes of data ingestion and different ways to query the prepared data.

2. Basic Concepts

Before we plunge into the operation details of Apache Druid, let's first go through some of the basic concepts. The space we are interested in is real-time analytics of event data on a massive scale.

Hence, it's imperative to understand what we mean by event data and what it takes to analyze it in real-time at scale.

2.1. What Is Event Data?

Event data refers to a piece of information about a change that occurs at a specific point in time. Event data is almost ubiquitous in present-day applications. From the classical application logs to modern-day sensor data generated by things, it's practically everywhere. These are often characterized by machine-readable information generated at a massive scale.

They power several functions like prediction, automation, communication, and integration, to name a few. Additionally, they are of significance in event-driven architecture.

2.2. What Is Apache Druid?

Apache Druid is a real-time analytics database designed for fast analytics over event-oriented data. Druid was started in 2011, open-sourced under the GPL license in 2012, and moved to Apache License in 2015. It's managed by the Apache Foundation with community contributions from several organizations. It provides real-time ingestion, fast query performance, and high availability.

The name Druid refers to the fact that its architecture can shift to solve different types of data problems. It's often employed in business intelligence applications to analyze a high volume of real-time and historical data.

3. Druid Architecture

Druid is a column-oriented and distributed data store written in Java. It's capable of ingesting massive amounts of event data and offering low-latency queries on top of this data. Moreover, it offers the possibility to slice and dice the data arbitrarily.

It's quite interesting to understand how Druid architecture supports these features. In this section, we'll go through some of the important parts of Druid architecture.

3.1. Data Storage Design

It's important to understand how Druid structures and stores its data, which allows for partitioning and distribution. Druid partitions the data by default during processing and stores it in chunks and segments.

Druid stores data in what we know as “datasource”, which is logically similar to tables in relational databases. A Druid cluster can handle multiple data sources in parallel, ingested from various sources.

Each datasource is partitioned — based on time, by default, and further based on other attributes if so configured. A time range of data is known as a “chunk” — for instance, an hour's data if data is partitioned by the hour.

Every chunk is further partitioned into one or more "segments", which are single files consisting of many rows of data. A datasource may have anywhere from a few segments to millions of segments.

3.2. Druid Processes

Druid has a multi-process and distributed architecture. Hence, each process can be scaled independently, allowing us to create flexible clusters. Let's understand the important processes that are part of Druid:

  • Coordinator: This process is mainly responsible for segment management and distribution and communicates with historical processes to load or drop segments based on configurations
  • Overlord: This is the main process that is responsible for accepting tasks, coordinating task distribution, creating locks around tasks, and returning status to callers
  • Broker: This is the process to which all queries are sent to be executed in a distributed cluster; it collects metadata from Zookeeper and routes queries to processes having the right segments
  • Router: This is an optional process that can be used to route queries to different broker processes, thus providing query isolation to queries for more important data
  • Historical: These are the processes that store queryable data; they maintain a constant connection with Zookeeper and watch for segment information that they have to load and serve
  • MiddleManager: These are the worker processes that execute the submitted tasks; they forward the tasks to Peons running in separate JVMs, thus providing resource and log isolation

3.3. External Dependencies

Apart from the core processes, Druid depends on several external dependencies for its cluster to function as expected.

Let's see how a Druid cluster is formed from the core processes together with these external dependencies.

Druid uses deep storage to store any data that has been ingested into the system. Deep storage is not used to respond to queries; instead, it serves as a backup of the data and as a way to transfer data between processes. It can be anything from a local filesystem to a distributed object store like S3 or HDFS.

The metadata storage is used to hold shared system metadata like segment usage information and task information. However, it's never used to store the actual data. It's a relational database like Apache Derby, PostgreSQL, or MySQL.

Druid uses Apache Zookeeper to manage the current cluster state. It facilitates a number of operations in a Druid cluster, like coordinator/overlord leader election, the segment publishing protocol, and the segment load/drop protocol.

4. Druid Setup

Druid is designed to be deployed as a scalable, fault-tolerant cluster. However, setting up a production-grade Druid cluster is not trivial. As we have seen earlier, there are many processes and external dependencies to set up and configure. As it's possible to create a cluster in a flexible manner, we must pay attention to our requirements to set up individual processes appropriately.

Also, Druid is only supported in Unix-like environments and not on Windows. Moreover, Java 8 or later is required to run Druid processes. There are several single-server configurations available for setting up Druid on a single machine for running tutorials and examples. However, for running a production workload, it's recommended to set up a full-fledged Druid cluster with multiple machines.

For the purpose of this tutorial, we'll set up Druid on a single machine with the help of the official Docker image published on the Docker Hub. This enables us to run Druid on Windows as well, which, as we have discussed earlier, is not otherwise supported. There is a Docker compose file available, which creates a container for each Druid process and its external dependencies.

We have to provide configuration values to Druid as environment variables. The easiest way to achieve this is to provide a file called “environment” in the same directory as the Docker compose file.

Once we have the Docker compose and the environment file in place, starting up Druid is as simple as running a command in the same directory:

docker-compose up

This will bring up all the containers required for a single-machine Druid setup. We have to be careful to provide enough memory to the Docker machine, as Druid consumes a significant amount of resources.

5. Ingesting Data

The first step towards building a data pipeline using Druid is to load data into Druid. This process is referred to as data ingestion or indexing in Druid architecture. We have to find a suitable dataset to proceed with this tutorial.

Now, as we have gathered so far, we have to pick up data that are events and have some temporal nature, to make the most out of the Druid infrastructure.

The official guide for Druid uses simple and elegant data containing Wikipedia page edits for a specific date. We'll continue to use that for our tutorial here.

5.1. Data Model

Let's begin by examining the structure of the data we have with us. Most data pipelines we create are quite sensitive to data anomalies, and hence, it's necessary to clean up the data as much as possible.

Although there are sophisticated ways and tools to perform data analysis, we'll begin with visual inspection. A quick analysis reveals that the input data has events captured in JSON format, with a single event containing typical attributes:

{
  "time": "2015-09-12T02:10:26.679Z",
  "channel": "#pt.wikipedia",
  "cityName": null,
  "comment": "Houveram problemas na última edição e tive de refazê-las, junto com as atualizações da página.",
  "countryIsoCode": "BR",
  "countryName": "Brazil",
  "isAnonymous": true,
  "isMinor": false,
  "isNew": false,
  "isRobot": false,
  "isUnpatrolled": true,
  "metroCode": null,
  "namespace": "Main",
  "page": "Catarina Muniz",
  "regionIsoCode": null,
  "regionName": null,
  "user": "181.213.37.148",
  "delta": 197,
  "added": 197,
  "deleted": 0
}

While there are quite a number of attributes defining this event, there are a few that are of special interest to us when working with Druid:

  • Timestamp
  • Dimensions
  • Metrics

Druid requires us to identify a particular attribute as the timestamp column. In most situations, Druid's data parser automatically detects the best candidate, but we can always choose one explicitly, especially if our data doesn't have a fitting attribute.

Dimensions are the attributes that Druid stores as-is. We can use them for any purpose like grouping, filtering, or applying aggregators. We have a choice to select dimensions in the ingestion specification, which we'll discuss further in the tutorial.

Metrics are the attributes that, unlike dimensions, are stored in aggregated form by default. We can choose an aggregation function for Druid to apply to these attributes during ingestion. With roll-up enabled, these can lead to compact data representations.

5.2. Ingestion Methods

Now, we'll discuss the various ways we can perform data ingestion in Druid. Typically, event data is streaming in nature, which means it keeps being generated at a varying pace over time, like Wikipedia edits.

However, we may also have data batched over a period of time, where the data is more static in nature, like all Wikipedia edits that happened last year.

We may also have diverse data use cases to solve, and Druid has fantastic support for most of them. Let's go over two of the most common ways to use Druid in a data pipeline:

  • Streaming Ingestion
  • Batched Ingestion

The most common way to ingest data into Druid is through a streaming service such as Apache Kafka, from which Druid can read data directly. Druid supports other platforms, like Kinesis, as well. We have to start supervisors on the Overlord process, which creates and manages Kafka indexing tasks. We can start a supervisor by submitting a supervisor spec as a JSON file over an HTTP POST command to the Overlord process.
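
Just to illustrate the streaming path, here's a minimal sketch of a producer that pushes Wikipedia-edit-like JSON events into a Kafka topic that a Druid supervisor could then consume. The topic name, broker address, and event payload here are assumptions for illustration, not part of the official Druid setup:

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class WikipediaEditProducer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        // A hypothetical event shaped like the sample data we inspected earlier
        String event = "{\"time\":\"2015-09-12T02:10:26.679Z\",\"channel\":\"#pt.wikipedia\","
          + "\"page\":\"Catarina Muniz\",\"added\":197,\"deleted\":0,\"delta\":197}";

        // Each record becomes one event that a Kafka indexing task can ingest
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("wikipedia-edits", event));
        }
    }
}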

Alternatively, we can ingest data in batch, for example, from a local or remote file. Druid offers Hadoop-based batch ingestion for ingesting data from the Hadoop filesystem in the Hadoop file format. More commonly, we can choose native batch ingestion, either sequentially or in parallel. It's a simpler and more convenient approach, as it doesn't have any external dependencies.

5.3. Defining the Task Specification

For this tutorial, we'll set up a native batch ingestion task for the input data we have. We have the option of configuring the task from the Druid console, which gives us an intuitive graphical interface. Alternatively, we can define the task spec as a JSON file and submit it to the overlord process using a script or the command line.

Let's first define a simple task spec for ingesting our data in a file called wikipedia-index.json:

{
  "type" : "index_parallel",
  "spec" : {
    "dataSchema" : {
      "dataSource" : "wikipedia",
      "dimensionsSpec" : {
        "dimensions" : [
          "channel",
          "cityName",
          "comment",
          "countryIsoCode",
          "countryName",
          "isAnonymous",
          "isMinor",
          "isNew",
          "isRobot",
          "isUnpatrolled",
          "metroCode",
          "namespace",
          "page",
          "regionIsoCode",
          "regionName",
          "user",
          { "name": "added", "type": "long" },
          { "name": "deleted", "type": "long" },
          { "name": "delta", "type": "long" }
        ]
      },
      "timestampSpec": {
        "column": "time",
        "format": "iso"
      },
      "metricsSpec" : [],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "day",
        "queryGranularity" : "none",
        "intervals" : ["2015-09-12/2015-09-13"],
        "rollup" : false
      }
    },
    "ioConfig" : {
      "type" : "index_parallel",
      "inputSource" : {
        "type" : "local",
        "baseDir" : "quickstart/tutorial/",
        "filter" : "wikiticker-2015-09-12-sampled.json.gz"
      },
      "inputFormat" : {
        "type": "json"
      },
      "appendToExisting" : false
    },
    "tuningConfig" : {
      "type" : "index_parallel",
      "maxRowsPerSegment" : 5000000,
      "maxRowsInMemory" : 25000
    }
  }
}

Let's understand this task spec with respect to the basics we've gone through in previous sub-sections:

  • We have chosen the index_parallel task, which provides us with native batch ingestion in parallel
  • The datasource we'll be using in this task has the name “wikipedia”
  • The timestamp for our data is coming from the attribute “time”
  • There are a number of data attributes we are adding as dimensions
  • We're not using any metrics for our data in the current task
  • Roll-up, which is enabled by default, should be disabled for this task
  • The input source for the task is a local file named wikiticker-2015-09-12-sampled.json.gz
  • We're not using any secondary partitioning, which we could define in the tuningConfig

This task spec assumes that we've downloaded the data file wikiticker-2015-09-12-sampled.json.gz and kept it on the local machine where Druid is running. This may be trickier when we're running Druid as a Docker container. Fortunately, Druid comes with this sample data present by default at the location quickstart/tutorial.

5.4. Submitting the Task Specification

Finally, we can submit this task spec to the overlord process through the command line using a tool like curl:

curl -X 'POST' -H 'Content-Type:application/json' -d @wikipedia-index.json http://localhost:8081/druid/indexer/v1/task

Normally, the above command returns the ID of the task if the submission is successful. We can verify the state of our ingestion task through the Druid console or by performing queries, which we'll go through in the next section.
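
If we'd rather stay in Java, the same submission can be sketched with the JDK's built-in HTTP client, assuming Java 11+ and the same spec file and overlord endpoint (the class name is just for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

public class SubmitIngestionTask {

    public static void main(String[] args) throws Exception {
        // POST the task spec to the overlord's task endpoint, just like the curl command above
        HttpRequest request = HttpRequest.newBuilder()
          .uri(URI.create("http://localhost:8081/druid/indexer/v1/task"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofFile(Path.of("wikipedia-index.json")))
          .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString());

        // On success, the response body contains the ID of the created task
        System.out.println(response.statusCode() + " " + response.body());
    }
}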

5.5. Advanced Ingestion Concepts

Druid is best suited for cases where we have data at a massive scale to deal with, certainly not the kind of data we've seen in this tutorial! To enable these features at scale, the Druid architecture provides suitable tools and tricks.

While we'll not use them in this tutorial, let's quickly discuss roll-up and partitioning.

Event data can soon grow to massive volumes, which affects the query performance we can achieve. In many scenarios, it may be possible for us to summarise the data over time. This is what we know as roll-up in Druid. When roll-up is enabled, Druid makes an effort to roll up rows with identical dimensions and timestamps during ingestion. For instance, a thousand events with identical dimension values within the same query granularity can collapse into a single row carrying aggregated metrics such as their count. While it can save space, roll-up leads to a loss in query precision, so we must use it judiciously.

Another potential way to achieve better performance in the face of rising data volumes is distributing the data and, hence, the workload. By default, Druid partitions the data based on timestamps into time chunks containing one or more segments. Further, we can decide to do secondary partitioning using natural dimensions to improve data locality. Moreover, Druid sorts data within every segment by timestamp first and then by the other dimensions that we configure.

6. Querying Data

Once we have successfully performed the data ingestion, it should be ready for us to query. There are multiple ways to query data in Druid. The simplest way to execute a query in Druid is through the Druid console. However, we can also execute queries by sending HTTP commands or using a command-line tool.

The two prominent ways to construct queries in Druid are native queries and SQL-like queries. We're going to construct some basic queries in both these ways and send them over HTTP using curl. Let's find out how we can create some simple queries on the data we have ingested earlier in Druid.

6.1. Native Queries

Native queries in Druid use JSON objects, which we can send to a broker or a router for processing. Among other options, we can send these queries over an HTTP POST command.

Let's create a JSON file by the name simple_query_native.json:

{
  "queryType" : "topN",
  "dataSource" : "wikipedia",
  "intervals" : ["2015-09-12/2015-09-13"],
  "granularity" : "all",
  "dimension" : "page",
  "metric" : "count",
  "threshold" : 10,
  "aggregations" : [
    {
      "type" : "count",
      "name" : "count"
    }
  ]
}

This is a simple query that fetches the top ten pages with the highest number of page edits between the 12th and 13th of September, 2015.

Let's post this over HTTP using curl:

curl -X 'POST' -H 'Content-Type:application/json' -d @simple_query_native.json http://localhost:8888/druid/v2?pretty

This response contains the details of the top ten pages in JSON format:

[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : [ {
    "count" : 33,
    "page" : "Wikipedia:Vandalismusmeldung"
  }, {
    "count" : 28,
    "page" : "User:Cyde/List of candidates for speedy deletion/Subpage"
  }, {
    "count" : 27,
    "page" : "Jeremy Corbyn"
  }, {
    "count" : 21,
    "page" : "Wikipedia:Administrators' noticeboard/Incidents"
  }, {
    "count" : 20,
    "page" : "Flavia Pennetta"
  }, {
    "count" : 18,
    "page" : "Total Drama Presents: The Ridonculous Race"
  }, {
    "count" : 18,
    "page" : "User talk:Dudeperson176123"
  }, {
    "count" : 18,
    "page" : "Wikipédia:Le Bistro/12 septembre 2015"
  }, {
    "count" : 17,
    "page" : "Wikipedia:In the news/Candidates"
  }, {
    "count" : 17,
    "page" : "Wikipedia:Requests for page protection"
  } ]
} ]

6.2. Druid SQL

Druid has a built-in SQL layer, which offers us the liberty to construct queries in familiar SQL-like constructs. It leverages Apache Calcite to parse and plan the queries. However, Druid SQL converts the SQL queries to native queries on the query broker before sending them to data processes.

Let's see how we can create the same query as before, but using Druid SQL. As before, we'll create a JSON file by the name simple_query_sql.json:

{
  "query":"SELECT page, COUNT(*) AS Edits
    FROM wikipedia WHERE \"__time\"
    BETWEEN TIMESTAMP '2015-09-12 00:00:00' AND TIMESTAMP '2015-09-13 00:00:00'
    GROUP BY page ORDER BY Edits DESC LIMIT 10"
}

Please note that the query has been broken into multiple lines for readability, but it should appear on a single line. Again, as before, we'll POST this query over HTTP, but to a different endpoint:

curl -X 'POST' -H 'Content-Type:application/json' -d @simple_query_sql.json http://localhost:8888/druid/v2/sql

The output should be very similar to what we achieved earlier with the native query.

6.3. Query Types

We saw, in the earlier section, a type of query where we fetched the top ten results for the metric “count” based on an interval. This is just one type of query that Druid supports, and it's known as the TopN query. Of course, we can make this simple TopN query much more interesting by using filters and aggregations. But that is not in the scope of this tutorial. However, there are several other queries in Druid that may interest us.

Some of the popular ones include Timeseries and GroupBy.

Timeseries queries return an array of JSON objects, where each object represents a value as described in the time-series query, for instance, the daily average of a dimension over the last month.

GroupBy queries return an array of JSON objects, where each object represents a grouping as described in the group-by query. For example, we can query for the daily average of a dimension for the past month grouped by another dimension.

There are several other query types, including Scan, Search, TimeBoundary, SegmentMetadata, and DatasourceMetadata.

6.4. Advanced Query Concepts

Druid offers powerful ways to create sophisticated queries for building interesting data applications. These include various ways to slice and dice the data while still being able to provide incredible query performance.

While a detailed discussion of them is beyond the scope of this tutorial, let's discuss some of the important ones like Joins and Lookups, Multitenancy, and Query Caching.

Druid supports two ways of joining the data. The first is the join operators, and the second is query-time lookups. However, for better query performance, it's advisable to avoid query-time joins.

Multitenancy refers to the feature of supporting multiple tenants on the same Druid infrastructure while still offering them logical isolation. It's possible to achieve this in Druid through separate data sources per tenant or data partitioning by the tenant.

And finally, query caching is the key to performance in data-intensive applications. Druid supports query result caching at the segment and the query result levels. Further, the cache data can reside in memory or in external persistent storage.

7. Language Bindings

Although Druid has excellent support for creating ingestion specs and defining queries in JSON, it can sometimes be tedious to define these queries in JSON, especially when they get complex. Unfortunately, Druid doesn't offer an official client library in any specific language to help us in this regard. But there are quite a few language bindings developed by the community. One such client library, Druidry, is also available for Java.

We'll quickly see how we can build the TopN query we used earlier using this client library in Java.

Let's begin by defining the required dependency in Maven:

<dependency>
    <groupId>in.zapr.druid</groupId>
    <artifactId>druidry</artifactId>
    <version>2.14</version>
</dependency>

After this, we should be able to use the client library and create our TopN query:

DateTime startTime = new DateTime(2015, 9, 12, 0, 0, 0, DateTimeZone.UTC);
DateTime endTime = new DateTime(2015, 9, 13, 0, 0, 0, DateTimeZone.UTC);
Interval interval = new Interval(startTime, endTime);
Granularity granularity = new SimpleGranularity(PredefinedGranularity.ALL);
DruidDimension dimension = new SimpleDimension("page");
TopNMetric metric = new SimpleMetric("count");
DruidTopNQuery query = DruidTopNQuery.builder()
  .dataSource("wikipedia")
  .dimension(dimension)
  .threshold(10)
  .topNMetric(metric)
  .granularity(granularity)
  .aggregators(Arrays.asList(new LongSumAggregator("count", "count")))
  .intervals(Collections.singletonList(interval)).build();

After this, we can simply generate the required JSON structure, which we can use in the HTTP POST call:

ObjectMapper mapper = new ObjectMapper();
String requiredJson = mapper.writeValueAsString(query);
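
To close the loop, we could then send this JSON to the broker the same way we did with curl earlier. Here's a minimal sketch, again assuming Java 11+ and the same broker endpoint, reusing the requiredJson string generated above (the class and method names are just for illustration):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TopNQuerySender {

    static String sendToBroker(String requiredJson) throws Exception {
        // POST the serialized native query to the broker, mirroring the earlier curl call
        HttpRequest request = HttpRequest.newBuilder()
          .uri(URI.create("http://localhost:8888/druid/v2"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(requiredJson))
          .build();

        // The body contains the query results in JSON format
        return HttpClient.newHttpClient()
          .send(request, HttpResponse.BodyHandlers.ofString())
          .body();
    }
}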

8. Conclusion

In this tutorial, we went through the basics of event data and Apache Druid architecture.

Further, we set up a primary Druid cluster using Docker containers on our local machine. Then, we also went through the process of ingesting a sample dataset in Druid using the native batch task. After this, we saw the different ways we have to query our data in Druid. Lastly, we went through a client library in Java to construct Druid queries.

We have just scratched the surface of features that Druid has to offer. There are several possibilities in which Druid can help us build our data pipeline and create data applications. The advanced ingestion and querying features are the obvious next steps to learn, for effectively leveraging the power of Druid.

Moreover, creating a suitable Druid cluster that scales the individual processes as per the need should be the target to maximize the benefits.
