
Getting the Insert ID in JDBC


1. Introduction

When working with JDBC to insert data into a database, retrieving the auto-generated primary key is a common requirement. JDBC provides a mechanism to fetch the insert ID immediately after an insert operation.

This tutorial discusses how to get the insert ID immediately after an insert operation.

2. Setup

Before discussing and implementing logic to get the insert ID, we’ll first discuss the necessary setup steps.

To test our implementation, we’ll use an in-memory H2 database. We can add the h2 database dependency in the pom.xml file:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
    <version>2.1.214</version>
</dependency>

In our test setup, we can connect to the H2 database and populate the database with our sample table, i.e., the Employees table.

private static void populateDB() throws SQLException {
    String createTable = """
        CREATE TABLE EMPLOYEES (
            id SERIAL PRIMARY KEY,
            first_name VARCHAR(50),
            last_name VARCHAR(50),
            salary DECIMAL(10, 2)
        );
        """;
    try (PreparedStatement preparedStatement = connection.prepareStatement(createTable)) {
        preparedStatement.execute();
    }
}
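
The connection used by populateDB() isn’t shown in the snippet above; a minimal way to obtain one for the in-memory H2 database could look like this (the database name and credentials are placeholder assumptions):

private static Connection connection;
@BeforeAll
static void setup() throws SQLException {
    // an in-memory H2 database that lives as long as the connection stays open
    connection = DriverManager.getConnection("jdbc:h2:mem:testdb", "sa", "");
    populateDB();
}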

3. Retrieving Insert IDs

When executing an insert statement, if the table has an auto-generated key (such as AUTO_INCREMENT in MySQL, SERIAL in PostgreSQL, or IDENTITY in H2), JDBC can retrieve these keys using the getGeneratedKeys() method.

For inserting a record, we can use preparedStatement.executeUpdate(), which returns the number of affected rows. To fetch the insert IDs, we pass Statement.RETURN_GENERATED_KEYS when preparing the statement:

String sql = "INSERT INTO employees (first_name, last_name, salary) VALUES (?, ?, ?)";
PreparedStatement statement = connection.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
statement.setString(1, "first");
statement.setString(2, "last");
statement.setDouble(3, 100.0);
int numRows = statement.executeUpdate();
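
Alternatively, JDBC also lets us name the generated key columns we want returned, via the prepareStatement(String, String[]) overload; for example:

// explicitly request only the generated "id" column
PreparedStatement statement = connection.prepareStatement(sql, new String[] { "id" });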

Now, we can call statement.getGeneratedKeys() to get the ResultSet, which allows us to fetch the inserted IDs using getLong():

ResultSet generatedKeys = statement.getGeneratedKeys();
List<Long> insertIds = new ArrayList<>();
while (generatedKeys.next()) {
    insertIds.add(generatedKeys.getLong(1));
}

In the above code, getLong(1) retrieves the first generated key from the ResultSet. If the insert operation produces multiple generated key columns, we can access them by their respective positions: for instance, getLong(2) would fetch the second generated key in the row, getLong(3) the third, and so on. Additionally, we can also access the generated keys by column label, for example, getLong("id1"), getLong("id2"), and so on.
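
For reference, the GetInsertIds class exercised in the unit test below simply ties the snippets above together; a minimal sketch of it (the exact class layout is an assumption) could look like this:

public class GetInsertIds {
    public List<Long> insertAndReturnIds(Connection connection) throws SQLException {
        String sql = "INSERT INTO employees (first_name, last_name, salary) VALUES (?, ?, ?)";
        PreparedStatement statement = connection.prepareStatement(sql, Statement.RETURN_GENERATED_KEYS);
        statement.setString(1, "first");
        statement.setString(2, "last");
        statement.setDouble(3, 100.0);
        statement.executeUpdate();
        // collect every generated key returned by the driver
        List<Long> insertIds = new ArrayList<>();
        ResultSet generatedKeys = statement.getGeneratedKeys();
        while (generatedKeys.next()) {
            insertIds.add(generatedKeys.getLong(1));
        }
        return insertIds;
    }
}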

We can verify the result by writing a unit test:

@Test
public void givenDBPopulated_WhenGetInsertIds_ThenReturnsIds() throws SQLException {
    GetInsertIds getInsertIds = new GetInsertIds();
    List<Long> actualIds = getInsertIds.insertAndReturnIds(connection);
    ResultSet resultSet = connection.prepareStatement("select id from employees").executeQuery();
    List<Long> expectedIds = new ArrayList<>();
    while (resultSet.next()){
        expectedIds.add(resultSet.getLong(1));
    }
    assertEquals(expectedIds, actualIds);
}

4. Conclusion

In this article, we discussed the mechanism for getting the insert IDs of inserted records using JDBC’s PreparedStatement. We also implemented the logic and verified it with a unit test.

As usual, the complete source code for the examples is available over on GitHub.


Naming Executor Service Threads and Thread Pool in Java


1. Overview

ExecutorService provides a convenient way to manage threads and execute concurrent tasks in Java. When working with ExecutorService, assigning meaningful names to threads and thread pools can be useful to improve debugging, monitoring, and understanding of threads. In this article, we’ll learn about different ways of naming threads and thread pools in Java’s ExecutorService.

First, we’ll see how the default names of threads are set in ExecutorService. Then, we’ll see different ways to customize the thread name using a custom ThreadFactory, BasicThreadFactory of Apache Commons, and ThreadFactoryBuilder of the Guava library.

2. Naming Threads

The thread name is easy to set in Java when we’re not using an ExecutorService. While ExecutorService assigns default pool and thread names such as “pool-1-thread-1”, “pool-1-thread-2”, and so on, it’s also possible to specify custom names for the threads it manages.
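
For instance, with a plain Thread we can pass the name to the constructor or call setName(); the names used here are just examples:

// naming the thread directly through the constructor
Thread worker = new Thread(() -> System.out.println(Thread.currentThread().getName()), "my-worker-thread");
worker.start();
// or renaming the current thread at any time
Thread.currentThread().setName("my-renamed-thread");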

First, let’s create a simple program to run an ExecutorService. Later, we’ll see how it displays the default thread and thread pool name:

ExecutorService executorService = Executors.newFixedThreadPool(3);
for (int i = 0; i < 5; i++) {
    executorService.execute(() -> System.out.println(Thread.currentThread().getName()));
}

Now, let’s run the program. We can see the default thread name printed:

pool-1-thread-1
pool-1-thread-2
pool-1-thread-1
pool-1-thread-3
pool-1-thread-2

2.1. Using a Custom ThreadFactory

In ExecutorService, new threads are created using a ThreadFactory. By default, an ExecutorService uses Executors.defaultThreadFactory() to create the threads that execute its tasks.

By supplying a custom ThreadFactory to the ExecutorService, we can alter the thread’s name, priority, and so on.

First, let’s create our own MyThreadFactory which implements ThreadFactory. Then, we’ll create a custom name for any new thread created using our MyThreadFactory:

public class MyThreadFactory implements ThreadFactory {
    private final AtomicInteger threadNumber = new AtomicInteger(1);
    private final String threadNamePrefix;
    public MyThreadFactory(String threadNamePrefix) {
        this.threadNamePrefix = threadNamePrefix;
    }
    @Override
    public Thread newThread(Runnable runnable) {
        return new Thread(runnable, threadNamePrefix + threadNumber.getAndIncrement());
    }
}

Now, we’ll use our custom factory MyThreadFactory to set the thread name and pass it to the ExecutorService:

MyThreadFactory myThreadFactory = new MyThreadFactory("MyCustomThread-");
ExecutorService executorService = Executors.newFixedThreadPool(3, myThreadFactory);
for (int i = 0; i < 5; i++) {
    executorService.execute(() -> System.out.println(Thread.currentThread().getName()));
}

Finally, when we run the program, we can see our custom thread name printed for threads of ExecutorService:

MyCustomThread-1
MyCustomThread-2
MyCustomThread-2
MyCustomThread-3
MyCustomThread-1

2.2. Using BasicThreadFactory From Apache Commons

BasicThreadFactory from commons-lang3 implements the ThreadFactory interface and provides configuration options for the threads it creates, which helps us set the thread name.

First, let’s add the commons-lang3 dependency to our project:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.14.0</version>
</dependency>

Next, we create the BasicThreadFactory with our custom name. After that, we create the ExecutorService with our factory:

BasicThreadFactory factory = new BasicThreadFactory.Builder()
  .namingPattern("MyCustomThread-%d").priority(Thread.MAX_PRIORITY).build();
ExecutorService executorService = Executors.newFixedThreadPool(3, factory);
for (int i = 0; i < 5; i++) {
    executorService.execute(() -> System.out.println(Thread.currentThread().getName()));
}

Here, we can see the namingPattern() method takes the name pattern for the thread name.

Finally, let’s run the program to see our custom thread name printed:

MyCustomThread-1
MyCustomThread-2
MyCustomThread-2
MyCustomThread-3
MyCustomThread-1

2.3. Using ThreadFactoryBuilder From Guava

ThreadFactoryBuilder from Guava also provides options for customizing the threads it creates.

First, let’s add the guava dependency to our project:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>33.2.0-jre</version>
</dependency>

Next, we create the ThreadFactory with our custom name, and pass it to the ExecutorService:

ThreadFactory namedThreadFactory = new ThreadFactoryBuilder()
  .setNameFormat("MyCustomThread-%d").build();
ExecutorService executorService = Executors.newFixedThreadPool(3, namedThreadFactory);
for (int i = 0; i < 5; i++) {
    executorService.execute(() -> System.out.println(Thread.currentThread().getName()));
}

Here, we can see setNameFormat() takes the name pattern for the thread name.

Finally, when we run the program, we can see our custom thread names printed:

MyCustomThread-0
MyCustomThread-1
MyCustomThread-2
MyCustomThread-2
MyCustomThread-1

These are some of the ways we can name our threads when working with ExecutorService in Java, offering flexibility based on our application’s requirements.

3. Conclusion

In this article, we’ve learned about different ways of naming threads and thread pools in Java’s ExecutorService.

First, we saw how the default name is set. Later, we customized the thread names using our own ThreadFactory, as well as the ThreadFactory implementations provided by libraries like Apache Commons and Guava.

As always, the example code for this article is available over on GitHub.


Consumer Acknowledgments and Publisher Confirms with RabbitMQ


1. Overview

In this tutorial, we’ll learn how to ensure message publication to a RabbitMQ broker with publisher confirmations. Then, we’ll see how to tell the broker we successfully consumed a message with consumer acknowledgments.

2. Scenario

In simple applications, we often overlook explicit confirmation mechanisms when using RabbitMQ, relying instead on basic message publishing to a queue and automatic message acknowledgment upon consumption. However, despite RabbitMQ’s robust infrastructure, errors can still occur, necessitating a means to double-check message delivery to the broker and confirm successful message consumption. This is where publisher confirms and consumer acknowledgments come into play, providing a safety net.

3. Waiting for Publisher Confirms

Even without errors in our application, a published message can end up lost. For instance, it can get lost in transit due to an obscure network error. To circumvent that, AMQP provides transaction semantics to guarantee messages aren’t lost. However, this comes at a significant cost. Since transactions are heavy, the time to process messages can increase significantly, especially at large volumes.

Instead, we’ll employ the confirm mode, which, despite introducing some overhead, is faster than a transaction. This mode instructs the client and the broker to initiate a message count. Subsequently, the client verifies this count using the delivery tag sent back by the broker with the corresponding number. This process ensures the secure storage of messages for subsequent distribution to consumers.

To enter confirm mode, we need to call this once on our channel:

channel.confirmSelect();

Confirmation can take time, especially for durable queues since there’s an IO delay. So, RabbitMQ waits for confirmations asynchronously but provides synchronous methods to use in our application:

  • Channel.waitForConfirms() — blocks execution until all messages published since the last call are ACK’d (acknowledged) or NACK’d (rejected) by the broker.
  • Channel.waitForConfirms(timeout) — the same as above, but the wait is limited to a millisecond value; if the confirmations don’t arrive in time, we get a TimeoutException.
  • Channel.waitForConfirmsOrDie() — throws an exception if any message has been NACK’d since the last call. This is useful if we can’t tolerate losing any messages.
  • Channel.waitForConfirmsOrDie(timeout) — the same as above, but with a timeout.

3.1. Publisher Setup

Let’s start with a regular class to publish messages. We’ll only receive a channel and a queue to connect to:

class UuidPublisher {
    private Channel channel;
    private String queue;
    public UuidPublisher(Channel channel, String queue) {
        this.channel = channel;
        this.queue = queue;
    }
}

Then, we’ll add a method for publishing String messages:

public void send(String message) throws IOException {
    channel.basicPublish("", queue, null, message.getBytes());
}

When we send messages this way, we risk losing them during transit, so let’s include some code to ensure that the broker safely receives our messages.

3.2. Starting Confirm Mode on a Channel

We’ll start by modifying our constructor to call confirmSelect() on the channel at the end. This is necessary so we can use the “wait” methods on our channel:

public UuidPublisher(Channel channel, String queue) throws IOException {
    // ...
    this.channel.confirmSelect();
}

If we try to wait for confirmations without entering confirm mode, we’ll get an IllegalStateException. Then, we’ll choose one of the synchronous waitForConfirms() methods and call it after publishing a message in our send() method. Let’s go with the variant that takes a timeout so we can ensure we never wait forever:

public boolean send(String message) throws Exception {
    channel.basicPublish("", queue, null, message.getBytes());
    return channel.waitForConfirms(1000);
}

Returning true means the broker successfully received the message. This works well if we’re sending a few messages.

3.3. Confirming Published Messages in Batches

Since confirming messages takes time, we shouldn’t wait for confirmation after each publication. Instead, we should send a bunch of them before waiting for confirmation. Let’s modify our method to receive a list of messages and only wait after sending all of them:

public void sendAllOrDie(List<String> messages) throws Exception {
    for (String message : messages) {
        channel.basicPublish("", queue, null, message.getBytes());
    }
    channel.waitForConfirmsOrDie(1000);
}

This time, we’re using waitForConfirmsOrDie() because a false return with waitForConfirms() would mean the broker NACK’d an unknown number of messages. While this ensures we’ll get an exception if any of the messages are NACK’d, we can’t tell which failed.

4. Leveraging Confirm Mode to Guarantee Batch Publishing

When using confirm mode, it’s also possible to register a ConfirmListener on our channel. This listener takes two callback handlers: one for successful deliveries and another for broker failures. This way, we can implement a mechanism to ensure no message is left behind. We’ll start with a method that adds this listener to our channel:

private void createConfirmListener() {
    this.channel.addConfirmListener(
      (tag, multiple) -> {
        // ...
      }, 
      (tag, multiple) -> {
        // ...
      }
    );
}

In the callbacks, the tag parameter refers to the message’s sequential delivery tag, while multiple indicates whether this callback confirms multiple messages. In that case, the tag parameter points to the latest confirmed tag. Conversely, if the last callback was a NACK, all messages with a delivery tag greater than that NACK’d tag are also confirmed.

To coordinate these callbacks, we’ll keep unconfirmed messages in a ConcurrentSkipListMap. We’ll put our pending messages there, using its tag number as the key. This way, we can call headMap() and get a view of all previous messages up to the tag we’re receiving now:

private ConcurrentNavigableMap<Long, PendingMessage> pendingDelivery = new ConcurrentSkipListMap<>();

The callback for confirmed messages will remove all messages up to tag from our map:

(tag, multiple) -> {
    ConcurrentNavigableMap<Long, PendingMessage> confirmed = pendingDelivery.headMap(tag, true);
    confirmed.clear();
}

The headMap() will contain a single item if multiple is false and more than one otherwise. Consequently, we don’t need to check whether we receive a confirmation for multiple messages.

4.1. Implementing a Retry Mechanism for Rejected Messages

We’ll implement a retry mechanism for the callbacks for rejected messages. Also, we’ll include a maximum number of retries to avoid a situation where we retry forever. Let’s start with a class that’ll hold the current number of tries for a message and a simple method to increment this counter:

public class PendingMessage {
    private int tries;
    private String body;
    public PendingMessage(String body) {
        this.body = body;
    }
    public int incrementTries() {
        return ++this.tries;
    }
    // standard getters
}

Now, let’s use it to implement our callback. We start by getting a view of the rejected messages, then remove any items that have exceeded the maximum number of tries:

(tag, multiple) -> {
    ConcurrentNavigableMap<Long, PendingMessage> failed = pendingDelivery.headMap(tag, true);
    failed.values().removeIf(pending -> {
        return pending.incrementTries() >= MAX_TRIES;
    });
    // ...
}

Then, if we still have pending messages, we send them again. This time, we’ll also remove the message if an unexpected error occurs in our app:

if (!pendingDelivery.isEmpty()) {
    pendingDelivery.values().removeIf(message -> {
        try {
            channel.basicPublish("", queue, null, message.getBody().getBytes());
            return false;
        } catch (IOException e) {
            return true;
        }
    });
}

4.2. Putting It All Together

Finally, we can create a new method that sends messages in a batch but can detect rejected messages and try to send them again. We have to call getNextPublishSeqNo() on our channel to find out our message tag:

public void sendOrRetry(List<String> messages) throws IOException {
    createConfirmListener();
    for (String message : messages) {
        long tag = channel.getNextPublishSeqNo();
        pendingDelivery.put(tag, new PendingMessage(message));
        channel.basicPublish("", queue, null, message.getBytes());
    }
}

We create the listener before publishing messages; otherwise, we won’t receive confirmations. This will create a cycle of receiving callbacks until we’ve successfully sent or retried all messages.

5. Sending Consumer Delivery Acknowledgments

Before we look into manual acknowledgments, let’s start with an example without them. When using automatic acknowledgments, a message is considered successfully delivered as soon as the broker fires it to a consumer. Let’s see what a simple example looks like:

public class UuidConsumer {
    private String queue;
    private Channel channel;
    // all-args constructor
    public void consume() throws IOException {
        channel.basicConsume(queue, true, (consumerTag, delivery) -> {
            // processing...
        }, cancelledTag -> {
            // logging...
        });
    }
}

Automatic acknowledgments are activated when passing true to basicConsume() via the autoAck parameter. Despite being fast and straightforward, this is unsafe because the broker discards the message before we process it. So, the safest option is to deactivate it and send a manual acknowledgment with basicAck() on the channel, guaranteeing the message is successfully processed before it exits the queue:

channel.basicConsume(queue, false, (consumerTag, delivery) -> {
    long deliveryTag = delivery.getEnvelope().getDeliveryTag();
    // processing...
    channel.basicAck(deliveryTag, false);
}, cancelledTag -> {
    // logging...
});

In its simplest form, we acknowledge each message after processing it. We use the same delivery tag we received to acknowledge the consumption. Most importantly, to signal individual acknowledgments, we must pass false to basicAck(). This can be pretty slow, so let’s see how to improve it.

5.1. Defining Basic QoS on a Channel

Usually, RabbitMQ pushes messages as soon as they’re available. To avoid that, we’ll set basic Quality of Service (QoS) settings on our channel. So, let’s include a batchSize parameter in our constructor and pass it to basicQos() on our channel, so only this number of messages is prefetched:

public class UuidConsumer {
    // ...
    private int batchSize;
    public UuidConsumer(Channel channel, String queue, int batchSize) throws IOException {
        // ...
        this.batchSize = batchSize;
        channel.basicQos(batchSize);
    }
}

This helps keep messages available to other consumers while we process what we can.

5.2. Defining an Acknowledgement Strategy

Instead of sending an ACK to every message we process, we can improve performance by sending one ACK every time we reach our batch size. For a more complete scenario, let’s include a simple processing method. We’ll consider the message as processed if we can parse the message as a UUID:

private boolean process(String message) {
    try {
        UUID.fromString(message);
        return true;
    } catch (IllegalArgumentException e) {
        return false;
    }
}

Now, let’s modify our consume() method with a basic skeleton for sending batch acknowledgments:

channel.basicConsume(queue, false, (consumerTag, delivery) -> {
    String message = new String(delivery.getBody(), "UTF-8");
    long deliveryTag = delivery.getEnvelope().getDeliveryTag();
    if (!process(message)) {
        // ...
    } else if (deliveryTag % batchSize == 0) {
        // ...
    } else {
        // ...
    }
}, cancelledTag -> {
    // logging...
});

We’ll NACK the message if we can’t process it and check if we reached the batch size to ACK pending processed messages. Otherwise, we’ll store the delivery tag of the pending ACK so it’s sent in a later iteration. We’ll store that in a class variable:

private AtomicLong pendingTag = new AtomicLong();

5.3. Rejecting Messages

We reject messages when we can’t or don’t want to process them; when rejecting, we can also choose to re-queue them. Re-queueing is useful, for instance, if we’re over capacity and want another consumer to take the message instead of telling the broker to discard it. We have two methods for this:

  • channel.basicReject(deliveryTag, requeue) — rejects a single message, with the option to re-queue or discard it.
  • channel.basicNack(deliveryTag, multiple, requeue) — the same as above, but with the option to reject in batches. Passing true to multiple rejects every message since the last ACK up to the current delivery tag.

Since we’re rejecting messages individually, we’ll use the first option. If there’s a pending ACK, we send it first and reset the variable. Finally, we reject the message:

if (!process(message)) {
    if (pendingTag.get() != 0) {
        channel.basicAck(pendingTag.get(), true);
        pendingTag.set(0);
    }
    channel.basicReject(deliveryTag, false);
}

5.4. Acknowledging Messages In Batches

Since delivery tags are sequential, we can use the modulo operator to check if we’ve reached our batch size. If we have, we send an ACK and reset the pendingTag. This time, passing true to the “multiple” parameter is essential so the broker knows we’ve successfully processed all messages up to and including the current delivery tag:

else if (deliveryTag % batchSize == 0) {
    channel.basicAck(deliveryTag, true);
    pendingTag.set(0);
} else {
    pendingTag.set(deliveryTag);
}

Otherwise, we just set the pendingTag to check it in another iteration. Additionally, sending multiple acknowledgments for the same tag will result in a “PRECONDITION_FAILED – unknown delivery tag” error from RabbitMQ.

It’s important to note that when sending ACKs with the multiple flag, we have to consider scenarios where we’ll never reach the batch size because there are no more messages to process. One option is to keep a watcher thread that periodically checks if there are pending ACKs to send.
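
For illustration, a rough sketch of such a watcher, assuming access to the pendingTag and channel fields shown above, could flush any pending ACK on a schedule:

ScheduledExecutorService ackWatcher = Executors.newSingleThreadScheduledExecutor();
ackWatcher.scheduleAtFixedRate(() -> {
    long tag = pendingTag.getAndSet(0);
    if (tag != 0) {
        try {
            // acknowledge everything up to the last processed delivery tag
            channel.basicAck(tag, true);
        } catch (IOException e) {
            // put the tag back so the next run can retry
            pendingTag.compareAndSet(0, tag);
        }
    }
}, 1, 1, TimeUnit.SECONDS);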

6. Conclusion

In this article, we’ve explored the functionalities of publisher confirms and consumer acknowledgments in RabbitMQ, which are crucial for ensuring data safety and robustness in distributed systems.

Publisher confirmations allow us to verify successful message transmission to the RabbitMQ broker, reducing the risk of message loss. Consumer acknowledgments enable controlled and resilient message processing by confirming message consumption.

Through practical code examples, we’ve seen how to implement these features effectively, providing a foundation for building reliable messaging systems.

As always, the source code is available over on GitHub.


IncompatibleClassChangeError in Java


1. Overview

In this article, we’ll explore the IncompatibleClassChangeError in Java, a runtime error that occurs when the JVM detects a class change that is incompatible with the previously loaded class.

We’ll delve into its causes with examples and effective strategies for resolving it.

2. The IncompatibleClassChangeError Class in Java

The IncompatibleClassChangeError is a type of LinkageError in Java. Linkage errors usually indicate an issue with one or more dependent classes.

IncompatibleClassChangeError is a subclass of LinkageError and is raised when there is an incompatible change to the class definition of one or more dependent classes.

It should be noted that this is a subclass of Error, and hence we shouldn’t try to catch these errors, as they signify an abnormality in the application or the runtime.

Let’s try to simulate an IncompatibleClassChangeError in a program to understand it better.

3. Generating the Error

Let’s try to emulate a scenario which causes the IncompatibleClassChangeError.

3.1. Preparing Libraries

We start by creating a simple library that has a parent class Dinosaur and a child class Coelophysis that extends Dinosaur:

public class Dinosaur {
    public void species(String sp) {
        if(sp == null) {
            System.out.println("I am a generic Dinosaur");
        } else {
            System.out.println(sp);
        }
    }
}
public class Coelophysis extends Dinosaur {
    public void mySpecies() {
        species("My species is Coelophysis of the Triassic Period");
    }
    public static void main(String[] args) {
        Coelophysis coelophysis = new Coelophysis();
        coelophysis.mySpecies();
    }
}

We should notice that the species() method in the parent class is non-static.

3.2. Generating a JAR From the Library

Once this is done, we run mvn package to generate a jar file from this project.

If we create an instance of the Coelophysis class and call the species() method, it would work correctly and generate the desired output:

➜ javac Coelophysis.java
➜ java Coelophysis
My species is Coelophysis of the Triassic Period

3.3. Creating a Second Version of the Library

Next, we create another library that is similar but has a slightly different version of the parent class Dinosaur, in which the species() method is now static:

public class Dinosaur {
    public Dinosaur() {
    }
    public static void species(String sp) {
        if (sp == null) {
            System.out.println("I am a generic Dinosaur");
        } else {
            System.out.println(sp);
        }
    }
}

We create a jar for this project as well, and we import both of them into our client project as system-scoped Maven dependencies:

<dependency>
    <groupId>org.dinosaur</groupId>
    <artifactId>dinosaur</artifactId>
    <version>2</version>
    <scope>system</scope>
    <systemPath>${project.basedir}/src/main/java/com/baeldung/incompatibleclasschange/dinosaur-1.jar</systemPath>
</dependency>

3.4. Generating the Error

Now, when we call the Coelophysis class by passing the modified version as a classpath dependency, we get the error:

➜  java -cp dinosaur-2:dinosaur-1 Coelophysis
Exception in thread "main" java.lang.IncompatibleClassChangeError: Expecting non-static method 'void Dinosaur.species(java.lang.String)'
	at Coelophysis.mySpecies(Coelophysis.java:3)
	at Coelophysis.main(Coelophysis.java:8)

4. Common Causes of IncompatibleClassChangeError

IncompatibleClassChangeError in Java occurs when there is a binary incompatibility between classes, often caused by changes in the definition of a dependent class. Let’s walk through some common scenarios which might result in the error.

4.1. Changes to the Class Definition of a Dependent Class or Binary

Let’s consider a parent-child class scenario where a change is made to the class definition that a dependent class was compiled against. The change can be turning a non-static, non-private field or a method into a static one. In such a scenario, the dependent class raises an IncompatibleClassChangeError at runtime.

This happens because of the disruption introduced to the consistency expected by the JVM at runtime.

We can observe a similar behaviour with the following changes in a dependent file:

  • A non-final field becomes static
  • A class becomes an interface, and vice-versa
  • A non-constant field becomes non-static
  • Something changed in the signature of a method in the dependent classes

4.2. Changes in Inheritance Patterns

The JVM might also throw the error when there is a prohibited change in the inheritance pattern of a class. This includes scenarios such as implementing an interface without providing implementations for its required abstract methods, or otherwise incorrectly changing the class hierarchy.

4.3. Different Versions of the Same Dependency in the Classpath

Let’s consider that we’re using Maven for project dependency management and have included two libraries, A and B, in our classpath by defining them in the pom.xml. However, both of these libraries might depend on different versions of the same third library, C.

Therefore, both libraries try to pull their own version of library C into the classpath, and those versions may differ slightly in structure.

5. Fixing the IncompatibleClassChangeError Exception

Now that we’ve understood what causes the error, let’s see how we can fix and avoid it.

Whenever a dependent library or binary changes, we should recompile the client code against it to check for compatibility. We have to ensure that compile-time class definitions match the runtime class definitions. Maintaining backward binary compatibility is therefore crucial to ensure that dependent client applications don’t break.

Modern IDEs like IntelliJ already check for changing dependencies in the classpath and warn for incompatible changes.

Tools like Maven can also generate a complete dependency graph of all our dependencies and highlight incompatible or breaking changes in the pom.xml. Furthermore, performing a clean build recompiles the sources against the current dependencies, which helps keep this error away.
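
For example, we can print the resolved dependency graph from the command line and scan it for multiple versions of the same artifact:

mvn dependency:tree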

We can also use build tools such as Maven to ensure that duplicate or conflicting versions of the same dependency aren’t present in the classpath. It’s also good practice to continually remove stale class files from the target folder to ensure the latest class files are always present for execution.

6. Conclusion

In this tutorial, we discussed the IncompatibleClassChangeError and highlighted the critical importance of maintaining consistent class structures between compile-time and runtime.

We also discussed ways this error might be generated in an application and how we can effectively prevent this error.

As always, the code for the examples is available over on GitHub.


Counting an Occurrence in an Array


1. Overview

A common programming problem is counting the occurrences or frequencies of distinct elements in a list. It can be helpful, for example, when we want to know the highest or lowest occurrence or a specific occurrence of one or more elements.

In this tutorial, we’ll look at some common solutions to counting occurrences in an array.

2. Count Occurrences

We need to understand this problem’s constraints to approach a solution.

2.1. Constraints

First, we must understand whether we count:

  • occurrences of objects
  • occurrences of primitives

If we’re dealing with numbers, we need to know the range of values we want to count. This might be a small fixed range of values, or it could be the entire numeric range, with values appearing sparsely.

2.2. How to Approach a Solution

With primitives such as int or char, we can use a fixed-size array of counters to store the frequencies of each value. This works but has limitations due to the maximum size of the counting array that can be in memory. Furthermore, extending this to objects wouldn’t work.

Using maps is a more adaptable solution to the problem.

3. Using a Counters Array

Let’s use a counters array for positive integers in a fixed range.

3.1. Count Positive Integers in a Fixed Range

So, let’s say we have values 0…(n-1) and want to know their occurrences:

static int[] countOccurrencesWithCounter(int[] elements, int n) {
    int[] counter = new int[n];
    for (int i = 0; i < elements.length; i++) {
        counter[elements[i]]++;
    }
    return counter;
}

The algorithm is straightforward and loops over the array while incrementing the counter’s position of a specific element.

Let’s look at the algorithm complexity:

  • Time complexity: O(m), where m is the number of elements in the input array
  • Space complexity: O(n), where n is the size of the value range we count over (the counter array)

Let’s look at a unit test where we count the occurrences of the number 3 within the first ten numbers:

int[] counter = countOccurrencesWithCounter(new int[] { 2, 3, 1, 1, 3, 4, 5, 6, 7, 8 }, 10);
assertEquals(2, counter[3]);

Another interesting application of counters is for characters in a string. For example, we can look at counting frequencies in a string permutation.
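
As a quick illustration, a sketch that counts lowercase ASCII letters (restricting the input to ‘a’ through ‘z’ is an assumption here) looks much like the method above:

static int[] countLetterFrequencies(String input) {
    int[] counter = new int[26];
    for (char c : input.toCharArray()) {
        // map 'a'..'z' to indices 0..25
        counter[c - 'a']++;
    }
    return counter;
}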

3.2. Other Use Cases and Limitations

Although an array’s maximum size is quite large, it’s usually not good practice to use one for frequencies unless we know we’re counting over a finite, bounded set of values.

It wouldn’t be easy to use it for a sparse range of values. This applies, for example, to fractional numbers, where finding a suitable range to store the decimals would be difficult.

For negative numbers, we can use an offset and store the negative in the counter. For example, if we have a k offset representing the [-k, k] values range, we can create a counter array:

int[] counter = new int[(k * 2) + 1];

Then, we can store an occurrence at the value + k position.
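
A minimal sketch of this idea, assuming every value falls within [-k, k], could look like this:

static int[] countOccurrencesWithOffset(int[] elements, int k) {
    int[] counter = new int[(k * 2) + 1];
    for (int element : elements) {
        // shift by k so that -k maps to index 0 and +k maps to index 2k
        counter[element + k]++;
    }
    return counter;
}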

This approach has limitations due to the range of values that might not fit the actual values for which we want to store the frequencies. Moreover, we can’t use this data structure to count object occurrences.

4. Use Maps

Maps are more appropriate for counting occurrences. Furthermore, the size of a map is limited only by the JVM memory available, making it suitable for storing a large number of entries.

Like a counter, we increment the frequency, but this time, it’s related to a specific map key.  A map allows us to work with objects. Therefore, we can use generics to create a map with a generic key:

static <T> Map<T, Integer> countOccurrencesWithMap(T[] elements) {
    Map<T, Integer> counter = new HashMap<>();
    for (T element : elements) {
        counter.merge(element, 1, Integer::sum);
    }
    return counter;
}

Let’s look at the algorithm complexity:

  • Time complexity: O(n) for accessing the array
  • Space complexity: O(m) where m is the number of distinct values within the original array

Let’s look at a test to find occurrences for integers. With maps, we can also search for a negative integer:

Map<Integer, Integer> counter = countOccurrencesWithMap(new Integer[] { 2, 3, 1, -1, 3, 4, 5, 6, 7, 8, -1 });
assertEquals(2, counter.get(-1));

Likewise, we can count string occurrences:

Map<String, Integer> counter = countOccurrencesWithMap(new String[] { "apple", "orange", "banana", "apple" });
assertEquals(2, counter.get("apple"));

We could also look at Guava Multiset to store frequencies relative to specific keys.
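
For illustration, a minimal sketch using Guava’s HashMultiset (assuming the guava dependency is already on the classpath) could look like this:

Multiset<String> counter = HashMultiset.create();
counter.addAll(Arrays.asList("apple", "orange", "banana", "apple"));
// count() returns the number of occurrences of the given element
assertEquals(2, counter.count("apple"));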

5. Use Java 8 Streams

From Java 8, we can use streams to collect the count of the occurrences grouped by the distinct elements. It works just like the previous example with maps. However, using streams allows us to use functional programming and take advantage of parallel execution when possible.

Let’s look at the case where we count occurrences of integers:

static <T> Map<T, Long> countOccurrencesWithStream(T[] elements) {
    return Arrays.stream(elements)
      .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}

Notably, when we use arrays, we must first convert to a stream.

The algorithm complexity would be similar to using maps:

  • Time complexity: O(n) for accessing the array
  • Space complexity: O(m) where m is the number of distinct values of the array

The advantage of using streams might be related to the speed of execution. However, we still need to iterate over all the input elements and use space to create a map of occurrences.
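
If we want to experiment with the parallel execution mentioned earlier, a variant could combine a parallel stream with a concurrent collector; the countOccurrencesWithParallelStream name below is our own, and whether it’s actually faster depends on the input size and hardware:

static <T> Map<T, Long> countOccurrencesWithParallelStream(T[] elements) {
    return Arrays.stream(elements)
      .parallel()
      .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
}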

Let’s look at a test for integers:

Map<Integer, Long> counter = countOccurrencesWithStream(new Integer[] { 2, 3, 1, -1, 3, 4, 5, 6, 7, 8, -1 });
assertEquals(2, counter.get(-1));

Likewise, we look at a test for strings:

Map<String, Long> counter = countOccurrencesWithStream(new String[] { "apple", "orange", "banana", "apple" });
assertEquals(2, counter.get("apple"));

6. Conclusion

In this article, we saw solutions for counting occurrences in an array. The most adaptable solution is to use a map, simple or created with a stream. However, if we have primitive integers in a fixed range, we can use counters.

As always, the code presented in this article is available over on GitHub.


Testing Quarkus With Citrus


1. Overview

Quarkus, the Supersonic Subatomic Java, promises to deliver small artifacts, extremely fast boot time, and lower time-to-first-request. We can understand it as a framework that integrates Java standard technologies (Jakarta EE, MicroProfile, and others) and enables building a standalone application that can be deployed in any container runtime, easily fulfilling the requirements of cloud-native applications.

In this article, we’ll learn how to implement integration tests with Citrus, a framework written by Christoph Deppisch – Principal Software Engineer at Red Hat.

2. The Purpose of Citrus

The applications we develop typically don’t run isolated but communicate with other systems, such as databases, messaging systems, or online services. When testing our application, we could do this in an isolated manner by mocking the corresponding objects. But we also might want to test the communication of our application with external systems. That’s where Citrus comes into play.

Let’s take a closer look at the most common interaction scenarios.

2.1. HTTP

Our web application may have an HTTP-based API (e.g., a REST API). Citrus can act as an HTTP client that calls our application’s HTTP API and verifies the response (like REST-assured does). Our application might also be a consumer of another application’s HTTP API. Citrus could run an embedded HTTP server and act as a mock in this case.


2.2. Kafka

In this case, our application is a Kafka consumer. Citrus can act as a Kafka producer to send a record to a topic so that our application gets triggered by consuming the record. Our application also might be a Kafka producer.

Citrus can act as a consumer to verify the messages that our application sent to the topic during the test. Additionally, Citrus provides an embedded Kafka server to be independent of any external server during the test.


2.3. Relational Databases

Our application uses a relational database. Citrus can act as a JDBC client that verifies that the database has the expected state. Furthermore, Citrus provides a JDBC driver and an embedded database mock that can be instrumented to return test-case-specific results and verify the executed database queries.


2.4. Further Support

Citrus supports further external systems, such as REST, SOAP, JMS, Websocket, Mail, FTP, and Apache Camel endpoints. We can find a full listing in the documentation.

3. Citrus Tests With Quarkus

Quarkus has extensive support for writing integration tests, including mocking, test profiles, and testing native executables. Citrus provides a Quarkus Test Resource for the QuarkusTest runtime that extends Quarkus-based tests with Citrus capabilities.

Let’s build a sample that uses the most common technologies: a REST service provider that stores data in a relational database and sends a message to Kafka when a new item is created. For Citrus, it doesn’t matter how we implement this in detail. Our application is a black box, and only the external systems and the communication channels matter.


3.1. Maven Dependencies

To use Citrus in our Quarkus-based project, we can use the citrus-bom:

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.citrusframework</groupId>
            <artifactId>citrus-bom</artifactId>
            <version>4.2.1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>org.citrusframework</groupId>
        <artifactId>citrus-quarkus</artifactId>
        <scope>test</scope>
    </dependency>
</dependencies>

We can optionally add further modules, depending on the technologies used:

<dependency>
    <groupId>org.citrusframework</groupId>
    <artifactId>citrus-openapi</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.citrusframework</groupId>
    <artifactId>citrus-http</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.citrusframework</groupId>
    <artifactId>citrus-validation-json</artifactId>
</dependency>
<dependency>
    <groupId>org.citrusframework</groupId>
    <artifactId>citrus-validation-hamcrest</artifactId>
    <version>${citrus.version}</version>
</dependency>
<dependency>
    <groupId>org.citrusframework</groupId>
    <artifactId>citrus-sql</artifactId>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.citrusframework</groupId>
    <artifactId>citrus-kafka</artifactId>
    <scope>test</scope>
</dependency>

3.2. Application Configuration

There isn’t any global Quarkus configuration that needs to be done for Citrus. There’s just a warning in the logs about split packages, which we can avoid by adding this line to the application.properties file:

%test.quarkus.arc.ignored-split-packages=org.citrusframework.*

3.3. Test Setup for the Boundary

A typical test with Citrus has these elements:

  • a @CitrusSupport annotation that adds the Quarkus Test Resource to extend the Quarkus-based test processing
  • a @CitrusConfiguration annotation that includes one or multiple configuration classes for Citrus, that are used for the global configuration of communication endpoints and dependency injection into the test classes
  • fields to get endpoints and other Citrus-provided objects injected

So, if we want to test the boundary, we would need an HTTP client to send a request to our application and verify the response. First, we need to create the Citrus configuration class:

public class BoundaryCitrusConfig {
    public static final String API_CLIENT = "apiClient";
    @BindToRegistry(name = API_CLIENT)
    public HttpClient apiClient() {
        return http()
          .client()
          .requestUrl("http://localhost:8081")
          .build();
    }
}

Then, we create the test class:

@QuarkusTest
@CitrusSupport
@CitrusConfiguration(classes = {
    BoundaryCitrusConfig.class
})
class CitrusTests {
    @CitrusEndpoint(name = BoundaryCitrusConfig.API_CLIENT)
    HttpClient apiClient;
}

As a convention, we can skip the name attributes of the annotations if the declaring method in the configuration class and the field in the test class have the same name. This is shorter, but more error-prone because there are no compiler checks on the names.
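
Based on that convention, a sketch of the shorter form (assuming the names resolve purely by the matching method and field names) would drop the name attributes:

// configuration class: the registry name defaults to the method name "apiClient"
@BindToRegistry
public HttpClient apiClient() {
    return http()
      .client()
      .requestUrl("http://localhost:8081")
      .build();
}
// test class: the field name "apiClient" is matched against the registry entry
@CitrusEndpoint
HttpClient apiClient;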

3.4. Testing the Boundary

For writing the test, we need to know that Citrus has a declarative concept, defining the components:


  • Test Context is an object that provides test variables and functions, which, among other things, replace dynamic content in message payloads and headers.
  • A Test Action is an abstraction for each step in the test. This could be one interaction, like sending a request or receiving a response, including validations and verifications. It could also be just a simple output or a timer. Citrus provides a Java DSL, and XML as an alternative, to define test definitions with Test Actions. We can find a list of pre-defined Test Actions in the documentation.
  • A Test Action Builder is used to define and build a Test Action. Citrus uses the Builder pattern here.
  • The Test Action Runner uses the Test Action Builder to build the Test Action. Then, it executes the Test Action, providing the Test Context. For BDD style, we can use a GherkinTestActionRunner.

We can get the Test Action Runner injected too. The code shows a test that sends an HTTP POST request to http://localhost:8081/api/v1/todos with a JSON body and expects to receive a response with a 201 status code:

@CitrusResource
GherkinTestActionRunner t;
@Test
void shouldReturn201OnCreateItem() {
    t.when(
            http()
              .client(apiClient)
              .send()
              .post("/api/v1/todos")
              .message()
              .contentType(MediaType.APPLICATION_JSON)
              .body("{\"title\": \"test\"}")
    );
    t.then(
            http()
              .client(apiClient)
              .receive()
              .response(HttpStatus.CREATED)
    );
}

The body is written directly as a JSON string. Alternatively, we could use a Data Dictionary as shown in this sample.

For message validation, we have multiple possibilities. For example, using JSON-Path in combination with Hamcrest, we can extend the then block:

t.then(
        http()
          .client(apiClient)
          .receive()
          .response(HttpStatus.CREATED)
          .message()
          .type(MessageType.JSON)
          .validate(
                    jsonPath()
                      .expression("$.title", "test")
                      .expression("$.id", is(notNullValue()))
          )
);

Unfortunately, only Hamcrest is supported. For AssertJ, there’s been a GitHub issue opened in 2016.

3.5. Testing the Boundary Based on OpenAPI

We can also send requests based on an OpenAPI definition. This automatically validates the response concerning property and header constraints declared in the OpenAPI schema.

First, we need to load the OpenAPI schema. For example, if we have a YML file in our project, we can do this by defining an OpenApiSpecification field:

final OpenApiSpecification apiSpecification = OpenApiSpecification.from(
        Resources.create("classpath:openapi.yml")
);

We could also read the OpenApi from the running Quarkus application, if available:

final OpenApiSpecification apiSpecification = OpenApiSpecification.from(
    "http://localhost:8081/q/openapi"
);

For the test, we can refer to the operationId to send a request or to verify a response:

t.when(
        openapi()
          .specification(apiSpecification)
          .client(apiClient)
          .send("createTodo") // operationId
);
t.then(
        openapi()
          .specification(apiSpecification)
          .client(apiClient)
          .receive("createTodo", HttpStatus.CREATED)
);

This generates a request, including the necessary body, by creating random values. Currently, it’s impossible to use explicitly defined values for headers, parameters, or bodies (see this GitHub issue). Also, there is a bug when generating random date values. We can avoid this, at least for optional fields, by skipping random values:

@BeforeEach
void setup() {
    this.apiSpecification.setGenerateOptionalFields(false);
    this.apiSpecification.setValidateOptionalFields(false);
}

In this case, we must also disable strict validation, which would otherwise fail because the service returns optional fields (see this GitHub issue). We can do this by setting a system property with JUnit Pioneer. For this, we add the junit-pioneer dependency:

<dependency>
    <groupId>org.junit-pioneer</groupId>
    <artifactId>junit-pioneer</artifactId>
    <version>2.2.0</version>
    <scope>test</scope>
</dependency>

Then, we can add the @SystemProperty annotation to our test class before the @CitrusSupport annotation:

@SetSystemProperty(
    key = "citrus.json.message.validation.strict",
    value = "false"
)

3.6. Testing the Database Access

When we invoke the create operation of our REST API, it should store the new item in the database. To evaluate this, we can query the database for the newly created ID.

First, we need a data source. We can get this easily injected from Quarkus:

@Inject
DataSource dataSource;

Then, we need to extract the ID of the newly created item from the response body and store it as a Test Context variable:

t.when(
        http()
          .client(apiClient)
          .send()
          .post("/api/v1/todos")
          .message()
          .contentType(MediaType.APPLICATION_JSON)
          .body("{\"title\": \"test\"}")
);
t.then(
        http()
          .client(apiClient)
          .receive()
          .response(HttpStatus.CREATED)
          // save new id to test context variable "todoId"
          .extract(fromBody().expression("$.id", "todoId"))
);

We can now check the database with a query that uses the variable:

t.then(
        sql()
          .dataSource(dataSource)
          .query()
          .statement("select title from todos where id=${todoId}")
          .validate("title", "test")
);

3.7. Testing the Messaging

When we invoke the create operation of our REST API, it should send the new item to a Kafka topic. To evaluate this, we can subscribe to the topic and consume the message.

For this, we need a Citrus endpoint:

public class KafkaCitrusConfig {
    public static final String TODOS_EVENTS_TOPIC = "todosEvents";
    @BindToRegistry(name = TODOS_EVENTS_TOPIC)
    public KafkaEndpoint todosEvents() {
        return kafka()
          .asynchronous()
          .topic("todo-events")
          .build();
    }
}

Then, we want Citrus to inject this endpoint into our test:

@QuarkusTest
@CitrusSupport
@CitrusConfiguration(classes = {
    BoundaryCitrusConfig.class,
    KafkaCitrusConfig.class
})
class MessagingCitrusTest {
    @CitrusEndpoint(name = KafkaCitrusConfig.TODOS_EVENTS_TOPIC)
    KafkaEndpoint todosEvents;
    // ...
}

After sending and receiving the request as we saw earlier, we can then subscribe to the topic and consume and validate the message:

t.and(
        receive()
          .endpoint(todosEvents)
          .message()
          .type(MessageType.JSON)
          .validate(
                    jsonPath()
                      .expression("$.title", "test")
                      .expression("$.id", "${todoId}")
          )
);

3.8. Mocking the Servers

Citrus can mock external systems. This is helpful to avoid needing these external systems during tests, and it lets us directly verify the messages sent to those systems and mock their responses instead of validating the external system’s state after message processing.

In the case of Kafka, the Quarkus Dev Services feature runs a Docker container with a Kafka server. We could use the Citrus mock instead. We then have to disable the Dev Services feature in the application.properties file:

%test.quarkus.kafka.devservices.enabled=false

Then, we configure the Citrus mock server:

public class EmbeddedKafkaCitrusConfig {
    private EmbeddedKafkaServer kafkaServer;
    @BindToRegistry
    public EmbeddedKafkaServer kafka() {
        if (null == kafkaServer) {
            kafkaServer = new EmbeddedKafkaServerBuilder()
              .kafkaServerPort(9092)
              .topics("todo-events")
              .build();
        }
        return kafkaServer;
    }
    // stop the server after the test
    @BindToRegistry
    public AfterSuite afterSuiteActions() {
        return afterSuite()
          .actions(context -> kafka().stop())
          .build();
    }
}

We could then activate the mock server by just referring to this configuration class as already known:

@QuarkusTest
@CitrusSupport
@CitrusConfiguration(classes = {
    BoundaryCitrusConfig.class,
    KafkaCitrusConfig.class,
    EmbeddedKafkaCitrusConfig.class
})
class MessagingCitrusTest {
    // ...
}

We can also find mock servers for external HTTP services and relational databases.

4. Challenges

Writing tests with Citrus also has challenges. The API isn’t always intuitive. Integration with AssertJ is missing. Citrus throws exceptions instead of AssertionErrors when validation fails, resulting in confusing test reports. The online documentation is extensive, but the code samples contain Groovy code, and sometimes XML. There’s a repository with Java code samples on GitHub that might help. The Javadocs are incomplete.

It seems that integration with the Spring Framework is the main focus. The documentation often refers to Citrus configuration in Spring. The citrus-jdbc module depends on Spring Core and Spring JDBC, which we get as unnecessary transitive dependencies in our tests unless we exclude them.

5. Conclusion

In this tutorial, we’ve learned how to implement Quarkus tests with Citrus. Citrus provides many features to test the communication of our application with external systems. This also includes mocking these systems for the test. It’s well documented, but the included code samples target use cases other than integration with Quarkus. Fortunately, there is a GitHub repository that contains samples with Quarkus.

As usual, all the code implementations are available over on GitHub.


Blazing Fast Serialization Using Apache Fury


1. Overview

In this article, we’ll learn about Apache Fury, an incubating project under the Apache Software Foundation. This library promises blazing-fast performance, robust capabilities, and multi-language support.

We’ll examine some of the project’s basic features and compare its performance against other frameworks.

2. Serialization With Apache Fury

Serialization is a critical process in software development that enables efficient data exchange between systems. It allows the application to share the state and communicate through it.

Apache Fury is a serialization library designed to address the limitations of existing libraries and frameworks. It offers a high-performance, easy-to-use API for serializing and deserializing data across various programming languages, and it’s built to handle complex data structures and large data volumes efficiently. The key features offered by Apache Fury are:

  • High Performance: Apache Fury is optimized for speed, ensuring minimal overhead during serialization and deserialization processes.
  • Cross-Language Support: Supports multiple programming languages, making it versatile for different development environments (Java/Python/C++/Golang/JavaScript/Rust/Scala/TypeScript).
  • Complex Data Structures: Capable of handling intricate data models with ease.
  • Compact Serialization: Produces compact serialized data, reducing storage and transmission costs.
  • GraalVM Native Image Support: serialization code is generated ahead of time (AOT) for GraalVM native images, so no reflection or serialization JSON configuration is necessary.

3. Code Sample

First, we need to add the required dependency to our project so we can start interacting with the Fury library APIs:

<dependency>
    <groupId>org.apache.fury</groupId>
    <artifactId>fury-core</artifactId>
    <version>0.5.0</version>
</dependency>

To try Fury for the first time, let’s create a simple structure using different data types and at least one nested object, so we can simulate an everyday use case in an actual application. To do that, we’ll create a UserEvent class representing the state of a user event, which will later be serialized:

public class UserEvent implements Serializable {
    private final String userId;
    private final String eventType;
    private final long timestamp;
    private final Address address;
    // Constructor and getters
}

To introduce a bit more complexity to our event object, let’s define a nested structure for the address using a Java POJO named Address:

public class Address implements Serializable {
    private final String street;
    private final String city;
    private final String zipCode;
    // Constructor and getters
}

An important aspect is that Fury doesn’t require the class to implement the Serializable interface. However, later, we’ll use the Java native serializer, which does need it. Next, we should initiate the Fury context.

3.1. Fury Setup

Now, we’ll see how to set up Fury so we can start using it:

class FurySerializationUnitTest {
    @Test
    void whenUsingFurySerialization_thenGenerateByteOutput() {
        Fury fury = Fury.builder()
          .withLanguage(Language.JAVA)
          .withAsyncCompilation(true)
          .build();
        fury.register(UserEvent.class);
        fury.register(Address.class);
        
        // ...
    }
}

In this code snippet, we create the Fury object and set Java as the serialization language, as it’s optimal for this case. However, as mentioned before, Fury supports cross-language serialization (using Language.XLANG, for example). Moreover, we set the withAsyncCompilation option to true, which compiles serializers in the background using the JIT (Just-In-Time) compiler, so our application can continue processing other tasks without waiting for the compilation to complete. Fury uses non-blocking compilation to implement this optimization.

Once Fury is set up, we need to register the classes that may be serialized. This is important because Fury can use a pre-generated schema or metadata to streamline the serialization and deserialization process, eliminating the need for runtime reflection, which can be slow and resource-intensive.

Also, registering classes helps reduce the overhead associated with dynamically determining the class structure during serialization and deserialization, which can lead to faster processing times. Finally, registration is relevant from a security perspective, as it creates a safelist of classes that are allowed to be serialized and deserialized.

Fury’s registry prevents unintentional or malicious serialization of unexpected classes, which could lead to security vulnerabilities such as deserialization attacks. It also mitigates the risk of exploiting vulnerabilities in the serialization mechanism or within the classes themselves. Deserialization of arbitrary or unexpected classes can lead to code execution vulnerabilities.
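Class registration enforcement is configurable on the builder. As a hedged sketch (the requireClassRegistration() option is assumed to be available, and enabled by default, in the Fury version used here), we could make the intent explicit:

Fury strictFury = Fury.builder()
  .withLanguage(Language.JAVA)
  // reject serialization/deserialization of classes that weren't registered
  .requireClassRegistration(true)
  .build();
strictFury.register(UserEvent.class);
strictFury.register(Address.class);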

3.2. Using Fury

Now that Fury is configured, we can use this object to perform multiple serialization and deserialization operations. It offers many APIs with both low- and high-level access to the nuances of the serialization process, but in our case, we can call the following methods:

@Test
void whenUsingFurySerialization_thenGenerateByteOutput() { 
    //... setup
    byte[] serializedData = fury.serialize(event);
    UserEvent temp = (UserEvent) fury.deserialize(serializedData);
    //...
}
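For completeness, here’s what a full round trip could look like with concrete values. This is only a sketch: the constructor argument order and getter names are assumed from the POJOs defined earlier:

Address address = new Address("Main Street", "Springfield", "12345");
UserEvent event = new UserEvent("user-1", "LOGIN", System.currentTimeMillis(), address);

byte[] serializedData = fury.serialize(event);
UserEvent deserialized = (UserEvent) fury.deserialize(serializedData);

// the deserialized copy carries the same state as the original event
assertEquals(event.getUserId(), deserialized.getUserId());
assertEquals("Springfield", deserialized.getAddress().getCity());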

That’s all we need to execute these two basic operations with the library and leverage its potential. Nonetheless, how does it compare to other well-known serialization frameworks used in Java? Next, we’ll run some experiments to make such a comparison.

4. Comparing Apache Fury

First of all, this tutorial doesn’t intend to perform an extensive benchmark between Apache Fury and other frameworks. Having said that, to contextualize the kind of performance the project aims to achieve, let’s see how different libraries and frameworks perform against our sample use case. For our comparison, we used Java Native Serialization, Avro Serialization, and Protocol Buffers.
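As a rough illustration of the kind of measurement involved (this is a sketch, not the article’s actual benchmark harness; it reuses the fury and event objects from earlier), a timing loop for Fury could look like this, with analogous loops for the other frameworks:

long start = System.nanoTime();
for (int i = 0; i < 100_000; i++) {
    // serialize and immediately deserialize the same event
    byte[] bytes = fury.serialize(event);
    UserEvent roundTripped = (UserEvent) fury.deserialize(bytes);
}
long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
System.out.println("Fury: " + elapsedMs + " ms for 100K round trips");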

To compare each framework, our test measures the time it takes each of them to serialize and deserialize 100K of our events:

[Charts: serialization and deserialization times for 100K events across the compared frameworks]

As observed, Fury and Protobuf performed exceptionally in our experiment. Initially, Protobuf outperforms Fury, but later Fury seems to perform better, most likely due to the JIT compiler warming up. Nevertheless, both performed outstandingly. Finally, let’s have a look at the size of the output generated by each framework:

[Chart: serialized output size per framework]

When it comes to the serialization output, Protobuf seems to perform slightly better, producing a smaller payload. However, the difference between Fury and Protobuf is quite small, so we can say their results are comparable here as well.

Once again, that may not be true for all cases. This isn’t an extensive benchmark but rather a comparison based on our use case. Nonetheless, Apache Fury offers great performance and simple-to-use capabilities, which is the project’s aim.

5. Conclusion

In this tutorial, we looked at Fury, a serialization library that offers blazing-fast, cross-language serialization and deserialization, powered by JIT (just-in-time) compilation and zero-copy capabilities. Moreover, we saw how it performs compared to other well-known serialization frameworks in the Java ecosystem.

Regardless of which library or framework is faster/more efficient, Fury’s ability to handle complex data structures and provide cross-language support makes it an excellent choice for modern applications requiring high-speed data processing. By incorporating Apache Fury, developers can ensure their applications perform serialization and deserialization tasks with minimal overhead, enhancing overall efficiency and performance.

As usual, all code samples used in this article are available over on GitHub.

       

Check if a List Contains Elements With Certain Properties in Hamcrest


1. Overview

When writing unit tests in Java, particularly with the JUnit framework, we often need to verify that elements within a list have specific properties.

Hamcrest, a widely used Matcher library, provides straightforward and expressive ways to perform these checks.

In this tutorial, we’ll explore how to check if a list contains elements with specific properties using JUnit and Hamcrest’s Matchers.

2. Setting up Hamcrest and Examples

Before we set up examples, let’s quickly add Hamcrest dependency to our pom.xml:

<dependency>
    <groupId>org.hamcrest</groupId>
    <artifactId>hamcrest</artifactId>
    <version>2.2</version>
    <scope>test</scope>
</dependency>

We can check the artifact’s latest version in Maven Central.

Now, let’s create a simple POJO class:

public class Developer {
    private String name;
    private int age;
    private String os;
    private List<String> languages;
 
    public Developer(String name, int age, String os, List<String> languages) {
        this.name = name;
        this.age = age;
        this.os = os;
        this.languages = languages;
    }
    // ... getters are omitted
}

As the code shows, the Developer class holds some properties to describe a developer, such as name, age, the operating system (os), and programming languages (languages) the developer mainly uses.

Next, let’s create a list of Developer instances:

private static final List<Developer> DEVELOPERS = List.of(
    new Developer("Kai", 28, "Linux", List.of("Kotlin", "Python")),
    new Developer("Liam", 26, "MacOS", List.of("Java", "C#")),
    new Developer("Kevin", 24, "MacOS", List.of("Python", "Go")),
    new Developer("Saajan", 22, "MacOS", List.of("Ruby", "Php", "Typescript")),
    new Developer("Eric", 27, "Linux", List.of("Java", "C"))
);

We’ll take the DEVELOPERS list as an example to address how to check whether elements contain specific properties using JUnit and Hamcrest.

3. Using hasItem() and hasProperty()

Hamcrest provides a rich set of convenient Matchers. We can use the hasProperty() Matcher in combination with the hasItem() Matcher:

assertThat(DEVELOPERS, hasItem(hasProperty("os", equalTo("Linux"))));

This example shows how to check if at least one element’s os is “Linux“. 

We can pass a different property name to hasProperty() to verify another property, for instance:

assertThat(DEVELOPERS, hasItem(hasProperty("name", is("Kai"))));

In the example above, we use is(), an alias for equalTo(), to check if the list has an element with name equal to “Kai”.

Of course, in addition to equalTo() and is(), we can use other Matchers in hasProperty() to verify elements’ properties in different ways:

assertThat(DEVELOPERS, hasItem(hasProperty("age", lessThan(28))));
assertThat(DEVELOPERS, hasItem(hasProperty("languages", hasItem("Go"))));

The assertion statements read like natural language, such as: “assertThat the DEVELOPERS list hasItem which hasProperty whose name is age and value is lessThan 28″.

4. The anyOf() and allOf() Matchers

hasProperty() is convenient for checking one single property. We can also check if elements in a list satisfy multiple properties by combining multiple hasProperty() calls using anyOf() and allOf().

If any Matcher inside anyOf() is matched, the whole anyOf() is satisfied. Next, let’s understand it through an example:

assertThat(DEVELOPERS, hasItem(
  anyOf(
      hasProperty("languages", hasItem("C")),
      hasProperty("os", is("Windows"))) // <-- No dev has the OS "Windows"
));

As the example shows, although no element in the DEVELOPERS list has os equal to “Windows”, the assertion passes since we have an element (“Eric”) whose languages contain “C“.

Therefore, anyOf() corresponds to “OR” logic: the assertion passes if any element’s languages contain “C” OR its os is “Windows“.

Conversely, allOf() performs “AND” logic. Next, let’s see an example:

assertThat(DEVELOPERS, hasItem(
  allOf(
      hasProperty("languages", hasItem("C")),
      hasProperty("os", is("Linux")),
      hasProperty("age", greaterThan(25)))
));

In the test above, we check whether at least one element in DEVELOPERS simultaneously satisfies the three hasProperty() Matchers within allOf().

Since “Eric”‘s properties pass all three hasProperty() Matcher checks, the test passes.

Next, let’s make some changes to the test:

assertThat(DEVELOPERS, not(hasItem( // <-- not() matcher
  allOf(
      hasProperty("languages", hasItem("C#")),
      hasProperty("os", is("Linux")),
      hasProperty("age", greaterThan(25)))
)));

This time, we don’t have a match since no element can pass all three hasProperty() Matchers.

5. Using JUnit’s assertTrue() and Stream.anyMatch()

Hamcrest offers handy ways to verify that elements within a list have specific properties. Alternatively, we can use the standard JUnit assertTrue() assertion and anyMatch() from the Java Stream API to perform the same checks.

The anyMatch() method returns true if any element in the stream passes the check function. Next, let’s see some examples:

assertTrue(DEVELOPERS.stream().anyMatch(dev -> dev.getOs().equals("Linux")));
assertTrue(DEVELOPERS.stream().anyMatch(dev -> dev.getAge() < 28));
assertTrue(DEVELOPERS.stream().anyMatch(dev -> dev.getLanguages().contains("Go")));

It’s worth noting that when we use a lambda to examine an element’s properties, we can directly call getter methods to get their values. This can be easier than Hamcrest’s hasProperty() Matcher, which requires property names as literal Strings.

Of course, if it’s required, we can easily extend the lambda expression to do complex checks:

assertTrue(DEVELOPERS.stream().anyMatch(dev -> dev.getLanguages().contains("C") && dev.getOs().equals("Linux")));

The test above shows a lambda expression to check multiple properties of an element in the stream.

6. Conclusion

In this article, we’ve explored various ways to assert if a list contains elements with certain properties using JUnit and Hamcrest.

Whether working with simple properties or combining multiple conditions, Hamcrest provides a powerful toolkit for validating the properties of elements within collections. Furthermore, Hamcrest Matchers make our tests more readable and expressive, enhancing the clarity and maintainability of our test code.

Alternatively, we can perform this kind of check using JUnit assertTrue() and anyMatch() from Stream API.

As always, the complete source code for the examples is available over on GitHub.

       

How to Handle Default Values in Avro


1. Introduction

In this tutorial, we’ll explore the Apache Avro data serialization/deserialization framework. What’s more, we’ll learn how to approach schema definition with default values used when we initialize and serialize objects.

2. What Is Avro?

Apache Avro is a more powerful alternative to classic ways of formatting data. Generally, it uses JSON for the schema definition. Furthermore, the most popular use cases for Avro involve Apache Kafka, Hive, or Impala. Avro comes in handy for handling large volumes of data in real time (write-intensive, big data operations).

Let’s think of Avro data as being defined by a schema, and that schema is written in JSON.

The advantages of Avro are:

  • data is compressed automatically (less CPU resources needed)
  • data is fully typed (we’ll see later how we declare the type of each property)
  • schema accompanies the data
  • documentation is embedded in the schema
  • thanks to JSON, data can be read in any language
  • safe schema evolution

3. Avro Setup

First, let’s add the appropriate Avro Maven dependency:

<dependencies> 
    <dependency> 
        <groupId>org.apache.avro</groupId> 
        <artifactId>avro</artifactId> 
        <version>1.11.3</version> 
    </dependency> 
</dependencies>

Next, we’ll configure avro-maven-plugin that helps us with code generation:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.avro</groupId>
            <artifactId>avro-maven-plugin</artifactId>
            <version>1.11.3</version>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/java/com/baeldung/avro/</sourceDirectory>
                <outputDirectory>${project.basedir}/src/main/java/com/baeldung/avro/</outputDirectory>
                <stringType>String</stringType>
            </configuration>
            <executions>
                <execution>
                    <phase>generate-sources</phase>
                    <goals>
                        <goal>schema</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

Now let’s define an example schema, which Avro uses to generate the example class. The schema is a JSON formatted object definition, stored in a text file. We must ensure the file has the .avsc extension. In our example, we’ll name this file car.avsc.

Here’s what the initial schema looks like:

{
    "namespace": "generated.avro",
     "type": "record",
     "name": "Car",
     "fields": [
         {  "name": "brand",
            "type": "string"
         },
         {  "name": "number_of_doors",
            "type": "int"
         },
         {  "name": "color",
            "type": "string"
         }
     ]
}

Let’s take a look at the schema in a bit more detail. The namespace specifies the package where the generated class will be placed. A record is one of Avro’s complex types: it models a data aggregate with named fields, and the Maven plugin generates a corresponding Java class for it. Overall, Avro supports six kinds of complex types: record, enum, array, map, union and fixed.

In our example, type is a record. name is the name of the class and fields are its attributes and their types. Here’s where we handle the default value.

4. Avro Default Values

An important aspect of Avro is that a field can be made optional by using a union, in which case it defaults to null, or it can be assigned a particular default value when it hasn’t been initialized. So, we either have an optional field that will default to null or that field is initialized with the default value we specify in the schema.

Now, let’s look at the new schema that configures the default values:

{
    "namespace": "generated.avro",
    "type": "record",
    "name": "Car",
    "fields": [
        {   "name": "brand",
            "type": "string",
            "default": "Dacia"
         },
        {   "name": "number_of_doors",
            "type": "int",
            "default": 4
        },
        {   "name": "color",
            "type": ["null", "string"],
            "default": null
        }
    ]
}

We see that there are two types of attributes: String and int. We also notice that, in addition to type, the attributes now have a default entry. This allows a field to be left uninitialized, in which case it takes the specified default value.

In order for the default values to be used when we initialize the object, we must use the newBuilder() method of the Avro-generated class. As we can see in the test below, the builder design pattern lets us set only the attributes we care about and fall back to the defaults for the rest.

Let’s also look at the test:

@Test
public void givenCarJsonSchema_whenCarIsSerialized_thenCarIsSuccessfullyDeserialized() throws IOException {
    Car car = Car.newBuilder()
      .build();
    SerializationDeserializationLogic.serializeCar(car);
    Car deserializedCar = SerializationDeserializationLogic.deserializeCar();
    assertEquals("Dacia", deserializedCar.getBrand());
    assertEquals(4, deserializedCar.getNumberOfDoors());
    assertNull(deserializedCar.getColor());
}

We’ve instantiated a new car object without setting any of its attributes. Checking the attributes, we see that brand is initialized to Dacia, number_of_doors to 4 (both were assigned the default values from the schema), and color defaulted to null.

Furthermore, adding the optional syntax (a union with null) to a field allows its default value to be null. Therefore, even if the field is an int, the default can be null. This can be useful when we want to detect that the field hasn’t been set:

{ 
    "name": "number_of_wheels", 
    "type": ["null", "int"], 
    "default": null 
}
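After adding this field and regenerating the class, the corresponding getter returns a boxed value that stays null until someone sets it. A small sketch of how client code might use that (getNumberOfWheels() is the accessor Avro would generate for this field name):

Car car = Car.newBuilder().build();

// the ["null", "int"] union maps to a boxed Integer in the generated class
if (car.getNumberOfWheels() == null) {
    // the producer never set number_of_wheels
}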

5. Conclusion

Avro has been created to address the need for efficient serialization in the context of big data processing.

In this article, we’ve taken a look at Apache’s data serialization/deserialization framework, Avro. In addition, we’ve gone over its advantages and setup. However, most importantly, we’ve learned how to configure the schema to accept default values.

As always, the code is available over on GitHub.

       

Upload Files With GraphQL in Java


1. Introduction

GraphQL has transformed the way developers interact with APIs, offering a streamlined, powerful alternative to traditional REST approaches.

However, handling file uploads with GraphQL in Java, particularly within a Spring Boot application, requires a bit of setup due to the nature of GraphQL’s handling of binary data. In this tutorial, we’ll go through setting up file uploads using GraphQL in a Spring Boot application.

2. File Upload Using GraphQL vs. HTTP

In the realm of developing GraphQL APIs with Spring Boot, adhering to best practices often involves leveraging standard HTTP requests for handling file uploads.

By managing file uploads via dedicated HTTP endpoints and then linking these uploads to GraphQL mutations through identifiers like URLs or IDs, developers can effectively minimize the complexity and processing overhead typically associated with embedding file uploads directly within GraphQL queries. Such an approach not only simplifies the upload process but also helps avoid potential issues related to file size constraints and serialization demands, contributing to a more streamlined and scalable application structure.
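As a rough sketch of that recommended approach (the class name and in-memory storage below are purely illustrative), a plain REST endpoint could accept the file and return an identifier that a later GraphQL mutation references:

@RestController
@RequestMapping("/files")
class FileUploadController {

    private final Map<String, byte[]> storage = new ConcurrentHashMap<>();

    // accepts the binary upload over plain HTTP and returns an ID
    // that a subsequent GraphQL mutation can reference
    @PostMapping
    String upload(@RequestParam("file") MultipartFile file) throws IOException {
        String id = UUID.randomUUID().toString();
        storage.put(id, file.getBytes());
        return id;
    }
}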

Nonetheless, certain situations necessitate directly incorporating file uploads within GraphQL queries. In such scenarios, integrating file upload capabilities into GraphQL APIs demands a tailored strategy that carefully balances user experience with application performance. Therefore, we need to define a specialized scalar type for handling uploads. Additionally, this method involves the deployment of specific mechanisms for validating input and mapping uploaded files to the correct variables within GraphQL operations. Furthermore, uploading a file requires the multipart/form-data content type of a request body, so we need to implement a custom HttpHandler.

3. File Upload Implementation In GraphQL

This section outlines a comprehensive approach to integrating file upload functionality within a GraphQL API using Spring Boot. Through a series of steps, we’ll explore the creation and configuration of essential components designed to handle file uploads directly through GraphQL queries.

In this guide, we’ll utilize a specialized starter package to enable GraphQL support in a Spring Boot Application:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-graphql</artifactId>
    <version>3.3.0</version>
</dependency>

3.1. Custom Upload Scalar Type

First, we define a custom scalar type, Upload, within our GraphQL schema. The introduction of the Upload scalar type extends GraphQL’s capability to handle binary file data, enabling the API to accept file uploads. The custom scalar serves as a bridge between the client’s file upload requests and the server’s processing logic, ensuring a type-safe and structured approach to handling file uploads.

Let’s define it in the src/main/resources/file-upload/graphql/upload.graphqls file:

scalar Upload
type Mutation {
    uploadFile(file: Upload!, description: String!): String
}
type Query {
    getFile: String
}

In the definition above, we also have the description parameter to illustrate how to pass additional data along with a file.

3.2. UploadCoercing Implementation

In the context of GraphQL, coercing refers to the process of converting a value from one type to another. This is particularly important when dealing with custom scalar types, like our Upload type. In that case, we need to define how values associated with this type are parsed (converted from input) and serialized (converted to output).

The UploadCoercing implementation is crucial for managing these conversions in a way that aligns with the operational requirements of file uploads within a GraphQL API.

Let’s define the UploadCoercing class to handle the Upload type correctly:

public class UploadCoercing implements Coercing<MultipartFile, Void> {
    @Override
    public Void serialize(Object dataFetcherResult) {
        throw new CoercingSerializeException("Upload is an input-only type and cannot be serialized");
    }
    @Override
    public MultipartFile parseValue(Object input) {
        if (input instanceof MultipartFile) {
            return (MultipartFile) input;
        }
        throw new CoercingParseValueException("Expected type MultipartFile but was " + input.getClass().getName());
    }
    @Override
    public MultipartFile parseLiteral(Object input) {
        throw new CoercingParseLiteralException("Upload is an input-only type and cannot be parsed from literals");
    }
}

As we can see, this involves converting an input value (from a query or mutation) into a Java type that our application can understand and work with. For the Upload scalar, this means taking the file input from the client and ensuring it’s correctly represented as a MultipartFile in our server-side code.

3.3. MultipartGraphQlHttpHandler: Handling Multipart Requests

GraphQL, in its standard specification, is designed to handle JSON-formatted requests. This format works well for typical CRUD operations but falls short when dealing with file uploads, which are inherently binary data and not easily represented in JSON. The multipart/form-data content type is the standard for submitting forms and uploading files over HTTP, but handling these requests requires parsing the request body differently than a standard GraphQL request.

By default, GraphQL servers do not understand or handle multipart requests directly, often leading to a 404 Not Found response for such requests.  Therefore, we need to implement a handler that bridges that gap, ensuring that the multipart/form-data content type is correctly handled by our application.

Let’s implement this class:

public ServerResponse handleMultipartRequest(ServerRequest serverRequest) throws ServletException {
    HttpServletRequest httpServletRequest = serverRequest.servletRequest();
    Map<String, Object> inputQuery = Optional.ofNullable(this.<Map<String, Object>>deserializePart(httpServletRequest, "operations", MAP_PARAMETERIZED_TYPE_REF.getType())).orElse(new HashMap<>());
    final Map<String, Object> queryVariables = getFromMapOrEmpty(inputQuery, "variables");
    final Map<String, Object> extensions = getFromMapOrEmpty(inputQuery, "extensions");
    Map<String, MultipartFile> fileParams = readMultipartFiles(httpServletRequest);
    Map<String, List<String>> fileMappings = Optional.ofNullable(this.<Map<String, List<String>>>deserializePart(httpServletRequest, "map", LIST_PARAMETERIZED_TYPE_REF.getType())).orElse(new HashMap<>());
    fileMappings.forEach((String fileKey, List<String> objectPaths) -> {
        MultipartFile file = fileParams.get(fileKey);
        if (file != null) {
            objectPaths.forEach((String objectPath) -> {
                MultipartVariableMapper.mapVariable(objectPath, queryVariables, file);
            });
        }
    });
    String query = (String) inputQuery.get("query");
    String opName = (String) inputQuery.get("operationName");
    Map<String, Object> body = new HashMap<>();
    body.put("query", query);
    body.put("operationName", StringUtils.hasText(opName) ? opName : "");
    body.put("variables", queryVariables);
    body.put("extensions", extensions);
    WebGraphQlRequest graphQlRequest = new WebGraphQlRequest(serverRequest.uri(), serverRequest.headers().asHttpHeaders(), body, this.idGenerator.generateId().toString(), LocaleContextHolder.getLocale());
    if (logger.isDebugEnabled()) {
        logger.debug("Executing: " + graphQlRequest);
    }
    Mono<ServerResponse> responseMono = this.graphQlHandler.handleRequest(graphQlRequest).map(response -> {
        if (logger.isDebugEnabled()) {
            logger.debug("Execution complete");
        }
        ServerResponse.BodyBuilder builder = ServerResponse.ok();
        builder.headers(headers -> headers.putAll(response.getResponseHeaders()));
        builder.contentType(selectResponseMediaType(serverRequest));
        return builder.body(response.toMap());
    });
    return ServerResponse.async(responseMono);
}

The handleMultipartRequest method within the MultipartGraphQlHttpHandler class processes multipart/form-data requests. At first, we extract the HTTP request from the server request object, which allows access to the multipart files and other form data included in the request. Then, we attempt to deserialize the “operations” part of the request, which contains the GraphQL query or mutation, along with the “map” part, which specifies how to map files to the variables in the GraphQL operation.

After deserializing these parts, the method proceeds to read the actual file uploads from the request, using the mappings defined in the “map” to associate each uploaded file with the correct variable in the GraphQL operation.
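The MultipartVariableMapper used above isn’t shown in this article. A minimal sketch of what such a mapper could do, handling only simple paths like “variables.file” (illustrative only, not the article’s actual implementation):

public class MultipartVariableMapper {

    // replaces the null placeholder at an object path such as "variables.file"
    // inside the query variables with the uploaded file
    @SuppressWarnings("unchecked")
    public static void mapVariable(String objectPath, Map<String, Object> variables, MultipartFile file) {
        String[] segments = objectPath.split("\\.");
        if (segments.length < 2 || !"variables".equals(segments[0])) {
            throw new IllegalArgumentException("Unsupported object path: " + objectPath);
        }
        Map<String, Object> current = variables;
        for (int i = 1; i < segments.length - 1; i++) {
            current = (Map<String, Object>) current.get(segments[i]);
        }
        current.put(segments[segments.length - 1], file);
    }
}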

3.4. Implementing File Upload DataFetcher

As we have the uploadFile mutation for uploading files, we need to implement specific logic to accept a file and additional metadata from the client and save the file.

In GraphQL, every field within the schema is linked to a DataFetcher, a component responsible for retrieving the data associated with that field.

While some fields might require specialized DataFetcher implementations capable of fetching data from databases or other persistent storage systems, many fields simply extract data from in-memory objects. This extraction often relies on the field names and utilizes standard Java object patterns to access the required data.

Let’s write our implementation of the DataFetcher interface:

@Component
public class FileUploadDataFetcher implements DataFetcher<String> {
    private final FileStorageService fileStorageService;
    public FileUploadDataFetcher(FileStorageService fileStorageService) {
        this.fileStorageService = fileStorageService;
    }
    @Override
    public String get(DataFetchingEnvironment environment) {
        MultipartFile file = environment.getArgument("file");
        String description = environment.getArgument("description");
        String storedFilePath = fileStorageService.store(file, description);
        return String.format("File stored at: %s, Description: %s", storedFilePath, description);
    }
}

When the get method of this data fetcher is invoked by the GraphQL framework, it retrieves the file and the description from the mutation’s arguments. It then calls the FileStorageService to store the file, passing along the file and its description.
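The FileStorageService itself isn’t the focus of this article. A minimal local-disk sketch (the class name and the store() signature come from the data fetcher above; everything else, including the uploads directory, is an assumption) could be:

@Service
public class FileStorageService {

    private final Path uploadDir = Paths.get("uploads");

    public String store(MultipartFile file, String description) {
        // the description could be persisted as metadata; this sketch only stores the file
        try {
            Files.createDirectories(uploadDir);
            Path target = uploadDir.resolve(Objects.requireNonNull(file.getOriginalFilename()));
            file.transferTo(target);
            return target.toAbsolutePath().toString();
        } catch (IOException e) {
            throw new UncheckedIOException("Could not store file", e);
        }
    }
}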

4. Spring Boot Configuration for GraphQL Upload Support

The integration of file upload into a GraphQL API using Spring Boot is a multi-faceted process that requires the configuration of several key components.

Let’s define the configuration according to our implementation:

@Configuration
public class MultipartGraphQlWebMvcAutoconfiguration {
    private final FileUploadDataFetcher fileUploadDataFetcher;
    public MultipartGraphQlWebMvcAutoconfiguration(FileUploadDataFetcher fileUploadDataFetcher) {
        this.fileUploadDataFetcher = fileUploadDataFetcher;
    }
    @Bean
    public RuntimeWiringConfigurer runtimeWiringConfigurer() {
        return (builder) -> builder
          .type(newTypeWiring("Mutation").dataFetcher("uploadFile", fileUploadDataFetcher))
          .scalar(GraphQLScalarType.newScalar()
            .name("Upload")
            .coercing(new UploadCoercing())
            .build());
    }
    @Bean
    @Order(1)
    public RouterFunction<ServerResponse> graphQlMultipartRouterFunction(
      GraphQlProperties properties,
      WebGraphQlHandler webGraphQlHandler,
      ObjectMapper objectMapper
    ) {
        String path = properties.getPath();
        RouterFunctions.Builder builder = RouterFunctions.route();
        MultipartGraphQlHttpHandler graphqlMultipartHandler = new MultipartGraphQlHttpHandler(webGraphQlHandler, new MappingJackson2HttpMessageConverter(objectMapper));
        builder = builder.POST(path, RequestPredicates.contentType(MULTIPART_FORM_DATA)
          .and(RequestPredicates.accept(SUPPORTED_MEDIA_TYPES.toArray(new MediaType[]{}))), graphqlMultipartHandler::handleMultipartRequest);
        return builder.build();
    }
}

RuntimeWiringConfigurer plays a pivotal role in this setup, granting us the ability to link the GraphQL schema’s operations such as mutations and queries with the corresponding data fetchers. This linkage is crucial for the uploadFile mutation, where we apply the FileUploadDataFetcher to handle the file upload process.

Furthermore, the RuntimeWiringConfigurer is instrumental in defining and integrating the custom Upload scalar type within the GraphQL schema. This scalar type, associated with the UploadCoercing, enables the GraphQL API to understand and correctly handle file data, ensuring that files are properly serialized and deserialized as part of the upload process.

To address the handling of incoming requests, particularly those carrying the multipart/form-data content type necessary for file uploads, we employ the RouterFunction bean definition. This function is adept at intercepting these specific types of requests, allowing us to process them through the MultipartGraphQlHttpHandler. This handler is key to parsing multipart requests, extracting files, and mapping them to the appropriate variables in the GraphQL operation, thereby facilitating the execution of file upload mutations. We also apply the correct order by using the @Order(1) annotation.

5. Testing File Upload Using Postman

Testing file upload functionality in a GraphQL API via Postman requires a non-standard approach, as the built-in GraphQL payload format doesn’t directly support multipart/form-data requests, which are essential for uploading files. Instead, we must construct a multipart request manually, mimicking the way a client would upload a file alongside a GraphQL mutation.

In the Body tab, the selection should be set to form-data. Three key-value pairs are required: operations, map, and the file itself, whose key name must match the mapping defined in the map value.

For the operations key, the value should be a JSON object that encapsulates the GraphQL query and variables, with the file part represented by null as a placeholder. The type for this part remains as Text.

{"query": "mutation UploadFile($file: Upload!, $description: String!) { uploadFile(file: $file, description: $description) }","variables": {"file": null,"description": "Sample file description"}}

Next, the map key requires a value that is another JSON object, this time mapping the file variable to the form field containing the file. If we attach a file under the key 0, the map explicitly associates this key with the file variable in the GraphQL variables, ensuring the server correctly interprets which part of the form data contains the file. This value has the Text type as well.

{"0": ["variables.file"]}

Finally, we add the file itself with a key that matches its reference in the map object. In our case, we use 0 as the key for this value. Unlike the previous text values, the type for this part is File.

After executing the request, we should get a JSON response:

{
    "data": {
        "uploadFile": "File stored at: File uploaded successfully: C:\\Development\\TutorialsBaeldung\\tutorials\\uploads\\2023-06-21_14-22.bmp with description: Sample file description, Description: Sample file description"
    }
}

6. Conclusion

In this article, we’ve explored how to add file upload functionality to a GraphQL API using Spring Boot. We started by introducing a custom scalar type called Upload, which handles file data in GraphQL mutations.

We then implemented the MultipartGraphQlHttpHandler class to manage multipart/form-data requests, necessary for uploading files through GraphQL mutations. Unlike standard GraphQL requests that use JSON, file uploads need multipart requests to handle binary file data.

The FileUploadDataFetcher class processes the uploadFile mutation. It extracts and stores uploaded files and sends a clear response to the client about the file upload status.

Usually, it’s more efficient to use a plain HTTP request for file uploads and pass the resulting ID through a GraphQL query. However, sometimes directly using GraphQL for file uploads is necessary.

As always, code snippets are available over on GitHub.

       

Testcontainers JDBC Support


1. Overview

In this short article, we’ll learn about the Testcontainers JDBC support, and we’ll compare two different ways of spinning up Docker containers in our tests.

Initially, we’ll manage the Testcontainer’s lifecycle programmatically. After that, we’ll simplify this setup through a single configuration property, leveraging the framework’s JDBC support.

2. Managing Testcontainer Lifecycle Manually

Testcontainers is a framework that provides lightweight disposable Docker containers for testing. We can use it to run tests against real services such as databases, message queues, or web services without mocks or external dependencies.

Let’s imagine we want to use Testcontainers to verify the interaction with our PostgreSQL database. Firstly, we’ll add the testcontainers dependencies to our pom.xml:

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>testcontainers</artifactId>
    <version>1.19.8</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>postgresql</artifactId>
    <version>1.19.8</version>
    <scope>test</scope>
</dependency>

After that, we’ll have to manage the container’s lifecycle, following a few simple steps:

  • Create a container object
  • Start the container before all the tests
  • Configure the application to connect with the container
  • Stop the container at the end of the tests

We can implement these steps ourselves using JUnit5 and Spring Boot annotations such as @BeforeAll, @AfterAll, and @DynamicPropertyRegistry:

@SpringBootTest
class FullTestcontainersLifecycleLiveTest {
    static PostgreSQLContainer postgres = new PostgreSQLContainer("postgres:16-alpine")
      .withDatabaseName("test-db");
    @BeforeAll
    static void beforeAll() {
        postgres.start();
    }
    @DynamicPropertySource
    static void setProperties(DynamicPropertyRegistry registry) {
        registry.add("spring.datasource.url", postgres::getJdbcUrl);
        registry.add("spring.datasource.username", postgres::getUsername);
        registry.add("spring.datasource.password", postgres::getPassword);
    }
    @AfterAll
    static void afterAll() {
        postgres.stop();
    }
    // tests
}

Even though this solution allows us to customize specific lifecycle phases, it requires a complex setup. Luckily, the framework provides a convenient solution to launch containers and communicate with them through JDBC using minimal configuration.

3. Using the Testcontainers JDBC Driver

Testcontainers will automatically start a Docker container hosting our database when we use their JDBC driver. To do this, we need to update the JDBC URL for the test execution, and use the pattern: “jdbc:tc:<docker-image-name>:<image-tag>:///<database-name>”.

Let’s use this syntax to update spring.datasource.url in our test:

spring.datasource.url: jdbc:tc:postgresql:16-alpine:///test-db

Needless to say, this property can be defined in a dedicated configuration file or in the test itself through the @SpringBootTest annotation:

@SpringBootTest(properties =
  "spring.datasource.url: jdbc:tc:postgresql:16-alpine:///test-db"
)
class CustomTestcontainersDriverLiveTest {
    @Autowired
    HobbitRepository theShire;
    @Test
    void whenCallingSave_thenEntityIsPersistedToDb() {
        theShire.save(new Hobbit("Frodo Baggins"));
        assertThat(theShire.findAll())
          .hasSize(1).first()
          .extracting(Hobbit::getName)
          .isEqualTo("Frodo Baggins");
    }
}

As we can notice, we no longer have to manually handle the PostgreSQL container’s lifecycle. Testcontainers handles this complexity, allowing us to focus solely on the tests at hand.
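The special JDBC URL also accepts a few Testcontainers-specific parameters. For instance, TC_INITSCRIPT points to an init script on the test classpath that runs once the container starts; a sketch, assuming a schema.sql under src/test/resources:

spring.datasource.url: jdbc:tc:postgresql:16-alpine:///test-db?TC_INITSCRIPT=schema.sql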

4. Conclusion

In this brief tutorial, we explored different ways to spin up a Docker Container and connect to it via JDBC. Firstly, we manually created and started the container, and connected it with the application. This solution requires more boilerplate code but allows specific customizations. On the other hand, when we used the custom JDBC driver from Testcontainers, we achieved the same setup with just a single line of configuration.

As always, the complete code used in this article is available over on GitHub.

       

Using Amazon Athena With Spring Boot to Query S3 Data


1. Overview

We often store large amounts of data in Amazon S3, but analyzing this data can be challenging. Traditional methods require us to move the data or set up complex systems like a data warehouse.

Amazon Athena offers a simpler solution, allowing us to query our S3 data directly using SQL.

In this tutorial, we’ll explore using Amazon Athena to analyze data in our S3 buckets using Spring Boot. We’ll walk through the necessary configurations, execute Athena queries programmatically, and handle the results.

2. Understanding Amazon Athena

Amazon Athena is a serverless query service that allows us to perform ad-hoc queries on the data stored in our S3 buckets without setting up any infrastructure.

One of the key benefits of using Athena is that we only pay for the amount of data consumed while executing the query, making it a cost-effective solution for ad-hoc and occasional data analysis.

Athena also uses schema-on-read to translate our S3 data in-flight into a table-like structure. Specifically, this means we query our data without altering the source and without performing any extract, transform, and load (ETL) operations. The tables we define in Athena don’t contain the actual data like traditional databases. Instead, they store instructions on how to convert the source data for querying.

The data in our S3 buckets can originate from various AWS services, such as CloudTrail logs, VPC Flow Logs, and ALB Access Logs, or even custom data that we store in S3 in formats such as JSON, XML, Parquet, etc.

3. Setting up the Project

Before we use Amazon Athena, we’ll need to include the dependency for it and configure our application correctly.

3.1. Dependencies

Let’s start by adding the Amazon Athena dependency to our project’s pom.xml file:

<dependencies>
    <dependency>
        <groupId>software.amazon.awssdk</groupId>
        <artifactId>athena</artifactId>
        <version>2.26.0</version>
    </dependency>
</dependencies>

This dependency provides us with the AthenaClient and other related classes, which we’ll use to interact with the Athena service.

3.2. Defining Athena Configuration Properties

Now, to interact with the Athena service and execute queries, we need to configure our AWS credentials for authentication, the Athena database name to use for running our SQL queries, and the query result location, which is an S3 bucket where Athena stores the results of our queries.

We’ll store these properties in our project’s application.yaml file and use @ConfigurationProperties to map the values to a POJO, which our service layer references when interacting with Athena:

@Getter
@Setter
@Validated
@ConfigurationProperties(prefix = "com.baeldung.aws")
class AwsConfigurationProperties {
    @NotBlank
    private String accessKey;
    @NotBlank
    private String secretKey;
    @Valid
    private Athena athena = new Athena();
    @Getter
    @Setter
    public class Athena {
        @Nullable
        private String database = "default";
        @NotBlank
        private String s3OutputLocation;
    }
}

The s3OutputLocation field represents the S3 bucket location where Athena stores the results of our queries. This is necessary because Athena is serverless and doesn’t store any data itself. Instead, it performs the queries and writes the results to the specified S3 location, which our application can then read from.

We’ve also added validation annotations to ensure all the required properties are configured correctly. If any of the defined validations fail, it results in the Spring ApplicationContext failing to start up. This allows us to conform to the fail fast pattern.

Below is a snippet of our application.yaml file, which defines the required properties that will be mapped to our AwsConfigurationProperties class automatically:

com:
  baeldung:
    aws:
      access-key: ${AWS_ACCESS_KEY}
      secret-key: ${AWS_SECRET_KEY}
      athena:
        database: ${AMAZON_ATHENA_DATABASE}
        s3-output-location: ${AMAZON_ATHENA_S3_OUTPUT_LOCATION}

Accordingly, this setup allows us to externalize the Athena properties and easily access them in our application.
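For these values to be bound, the AwsConfigurationProperties class also needs to be registered with Spring. One way to do that, assuming it isn’t already covered by @ConfigurationPropertiesScan (the Application class name here is illustrative), is:

@SpringBootApplication
@EnableConfigurationProperties(AwsConfigurationProperties.class)
public class Application {

    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}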

4. Configuring Athena in Spring Boot

Now that we’ve defined our properties, let’s reference them to configure the necessary beans for interacting with Athena.

4.1. Creating the AthenaClient Bean

The AthenaClient is the main entry point for interacting with the Athena service. We’ll create a bean to set it up:

@Bean
public AthenaClient athenaClient() {
    String accessKey = awsConfigurationProperties.getAccessKey();
    String secretKey = awsConfigurationProperties.getSecretKey();
    AwsBasicCredentials awsCredentials = AwsBasicCredentials.create(accessKey, secretKey);
    
    return AthenaClient.builder()
      .credentialsProvider(StaticCredentialsProvider.create(awsCredentials))
      .build();
}

Here, we create an instance of AthenaClient using the configured AWS credentials. This client is used to start query executions and retrieve results from the S3 bucket.

4.2. Defining the QueryExecutionContext Bean

Next, we need to tell Athena which database to use when running our SQL queries:

@Bean
public QueryExecutionContext queryExecutionContext() {
    String database = awsConfigurationProperties.getAthena().getDatabase();
    return QueryExecutionContext.builder()
      .database(database)
      .build();
}

We create a QueryExecutionContext bean and specify the database to be used for our queries. The database name is retrieved from our configuration properties, which defaults to the default database if not explicitly specified.

4.3. Setting up the ResultConfiguration Bean

Finally, we need to configure where Athena should store the results of our SQL queries:

@Bean
public ResultConfiguration resultConfiguration() {
    String outputLocation = awsConfigurationProperties.getAthena().getS3OutputLocation();
    return ResultConfiguration.builder()
      .outputLocation(outputLocation)
      .build();
}

It’s important to note that the S3 bucket we use to store query results should differ from the bucket containing our source data.

This separation prevents query results from being interpreted as additional source data, which would lead to unexpected query results. Moreover, Athena should have read-only access to the source bucket to maintain data integrity, with write permissions only granted on the bucket we’ve provisioned to store results.

5. Executing Athena Queries

With the necessary configuration in place, let’s look at how we can execute queries using Athena. We’ll create a QueryService class, autowiring all the beans we’ve created, and expose a single public execute() method that encapsulates the query execution logic.

5.1. Starting a Query Execution

First, we’ll use the AthenaClient instance to start query execution:

public <T> List<T> execute(String sqlQuery, Class<T> targetClass) {
    String queryExecutionId;
    try {
        queryExecutionId = athenaClient.startQueryExecution(query -> 
            query.queryString(sqlQuery)
              .queryExecutionContext(queryExecutionContext)
              .resultConfiguration(resultConfiguration)
        ).queryExecutionId();
    } catch (InvalidRequestException exception) {
        log.error("Invalid SQL syntax detected in query {}", sqlQuery, exception);
        throw new QueryExecutionFailureException();
    }
    // ...rest of the implementation in the upcoming sections
}

We provide the SQL query string, the QueryExecutionContext, and the ResultConfiguration when starting the query execution. The startQueryExecution() method returns a unique queryExecutionId that we’ll use to track the query’s status and retrieve the results.

The targetClass argument specifies the Java class to which we’ll be mapping the query results.

We also handle the InvalidRequestException that the Athena SDK throws if the provided SQL query contains syntax errors. We catch this exception, log the error message along with the invalid query, and throw a custom QueryExecutionFailureException.
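QueryExecutionFailureException is a custom exception of our own; a minimal sketch is enough for our purposes:

public class QueryExecutionFailureException extends RuntimeException {

    public QueryExecutionFailureException() {
        super("Failed to execute the Athena query");
    }
}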

5.2. Waiting for Query Completion

After starting the query execution, we need to wait for it to complete before attempting to retrieve the results:

private static final long WAIT_PERIOD = 30;
private void waitForQueryToComplete(String queryExecutionId) {
    QueryExecutionState queryState;
    do {
        GetQueryExecutionResponse response = athenaClient.getQueryExecution(request -> 
            request.queryExecutionId(queryExecutionId));
        queryState = response.queryExecution().status().state();
        switch (queryState) {
            case FAILED:
            case CANCELLED:
                String error = response.queryExecution().status().athenaError().errorMessage();
                log.error("Query execution failed: {}", error);
                throw new QueryExecutionFailureException();
            case QUEUED:
            case RUNNING:
                try {
                    TimeUnit.MILLISECONDS.sleep(WAIT_PERIOD);
                } catch (InterruptedException exception) {
                    // restore the interrupt flag and abort polling
                    Thread.currentThread().interrupt();
                    throw new QueryExecutionFailureException();
                }
                break;
            case SUCCEEDED:
                queryState = QueryExecutionState.SUCCEEDED;
                return;
        }
    } while (queryState != QueryExecutionState.SUCCEEDED);
}

We create a private waitForQueryToComplete() method and periodically poll the query’s status using the getQueryExecution() method until it reaches the SUCCEEDED state.

If the query fails or is canceled, we log the error message and throw our custom QueryExecutionFailureException. If it’s queued or running, we wait for a short period before checking again.

We invoke the waitForQueryToComplete() method from our execute() method with the queryExecutionId we received from starting the query execution.

5.3. Processing Query Results

After the query execution completes successfully, we can retrieve the results:

GetQueryResultsResponse queryResult = athenaClient.getQueryResults(request -> 
    request.queryExecutionId(queryExecutionId));

The getQueryResults() method returns a GetQueryResultsResponse object containing the result set. We can process these results and convert them into instances of the class specified by the targetClass argument of our execute() method:

private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper().registerModule(new JsonOrgModule());
private <T> List<T> transformQueryResult(GetQueryResultsResponse queryResultsResponse, Class<T> targetClass) {
    List<T> response = new ArrayList<T>();
    List<Row> rows = queryResultsResponse.resultSet().rows();
    List<String> headers = rows.get(0).data().stream().map(Datum::varCharValue).toList();
    rows.stream()
      .skip(1)
      .forEach(row -> {
          JSONObject element = new JSONObject();
          List<Datum> data = row.data();
           
          for (int i = 0; i < headers.size(); i++) {
              String key = headers.get(i);
              String value = data.get(i).varCharValue();
              element.put(key, value);
          }
          T obj = OBJECT_MAPPER.convertValue(element, targetClass);
          response.add(obj);
      });
    return response;
}

Here, we extract the headers from the first row of the result set and then process each subsequent row, converting it into a JSONObject where the keys are the column names and the values are the corresponding cell values. We then use the ObjectMapper to convert each JSONObject into an instance of the specified target class, representing the domain model. These domain model objects are added to a list that is returned.

It’s important to note that our transformQueryResult() implementation is generic and works for all types of read queries, regardless of the table or domain model.

5.4. Executing SQL Queries With the execute() Method

With our execute() method fully implemented, we can now easily run SQL queries against our S3 data and retrieve the results as domain model objects:

String query = "SELECT * FROM users WHERE age < 25;";
List<User> users = queryService.execute(query, User.class);

record User(Integer id, String name, Integer age, String city) {}

Here, we define a SQL query that selects all users younger than 25 years. We pass this query and the User class to our execute() method. The User class is a simple record representing the structure of the data we expect to retrieve.

The execute() method takes care of starting the query execution, waiting for its completion, retrieving the results, and transforming them into a list of User objects. This abstraction allows us to focus on the query and the domain model, without worrying about the underlying interactions with Athena.

5.5. Parameterized Statements With Athena

It’s important to note that when constructing SQL queries with user input, we should be cautious about the risk of SQL injection attacks. Athena supports parameterized statements, which allow us to separate the SQL query from the parameter values, providing a safer way to execute queries with user input. While we’ve used a raw SQL query here for demonstration purposes, using parameterized statements when building queries with user-supplied input is strongly recommended.

To use parameterized queries, we can modify our execute() method to accept an optional list of parameters:

public <T> List<T> execute(String sqlQuery, List<String> parameters, Class<T> targetClass) {
    // ... same as above
    
    queryExecutionId = athenaClient.startQueryExecution(query -> 
        query.queryString(sqlQuery)
          .queryExecutionContext(queryExecutionContext)
          .resultConfiguration(resultConfiguration)
          .executionParameters(parameters)
    ).queryExecutionId();
    
    // ... same as above
}

We’ve added a new parameters argument to the execute() method, which is a list of string values that will be used in the parameterized query. When starting the query execution, we pass these parameters using the executionParameters() method.

Let’s look at how we can use our updated execute() method:

public List<User> getUsersByName(String name) {
    String query = "SELECT * FROM users WHERE name = ?";
    return queryService.execute(query, List.of(name), User.class);
}

This example defines a SQL query with a placeholder ‘?’ for the name parameter. We pass the name value as a list containing a single element to the execute() method, along with the query and the target class.

6. Automating Database and Table Creation

To query our S3 data using Athena, we need to first define a database and a table that’ll map to the data stored in our S3 bucket. While we can create these manually using the AWS Management Console, it’s more convenient to automate this process as part of our application startup.

We’ll place our SQL scripts for setting up the necessary database and table in a new athena-init directory, which we’ll create inside the src/main/resources directory.

To execute these SQL scripts, we’ll create an AthenaInitializer class that implements the ApplicationRunner interface:

@Component
@RequiredArgsConstructor
class AthenaInitializer implements ApplicationRunner {
    private final QueryService queryService;
    private final ResourcePatternResolver resourcePatternResolver;
    @Override
    public void run(ApplicationArguments args) throws Exception {
        Resource[] initScripts = resourcePatternResolver.getResources("classpath:athena-init/*.sql");
        for (Resource script : initScripts) {
            String sqlScript = FileUtils.readFileToString(script.getFile(), StandardCharsets.UTF_8);
            queryService.execute(sqlScript, Void.class);
        }
    }
}

Using constructor injection via Lombok, we inject instances of ResourcePatternResolver and QueryService that we created earlier.

We use the ResourcePatternResolver to locate all our SQL scripts in the athena-init directory. We then iterate over these scripts, read their contents using Apache Commons IO, and execute them using our QueryService.

We’ll first begin by creating a create-database.sql script to create a custom database:

CREATE DATABASE IF NOT EXISTS baeldung;

We create a custom database named baeldung if it doesn’t already exist. The database name used here can be configured in the application.yaml file, as we’ve seen earlier in the tutorial.

Similarly, to create a table named users in the baeldung database, we’ll create another script named create-users-table.sql with the following content:

CREATE EXTERNAL TABLE IF NOT EXISTS users (
  id INT,
  name STRING,
  age INT,
  city STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://baeldung-athena-tutorial-bucket/';

This script creates an external table named users with columns corresponding to the fields in the JSON data that we’ll store in S3. We specify JsonSerDe as the row format and provide the S3 location where we’ll store our JSON files.

Significantly, to correctly query the data stored in S3 using Athena, it’s important to ensure that each JSON record is entirely on a single line of text with no spaces or line breaks between keys and values:

{"id":1,"name":"Homelander","age":41,"city":"New York"}
{"id":2,"name":"Black Noir","age":58,"city":"Los Angeles"}
{"id":3,"name":"Billy Butcher","age":46,"city":"London"}

7. IAM Permissions

Finally, for our application to function, we’ll need to configure some permissions for the IAM user configured in our app.

Our policy should grant access to Athena, S3, and the AWS Glue Data Catalog:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowAthenaQueryExecution",
            "Effect": "Allow",
            "Action": [
                "athena:StartQueryExecution",
                "athena:GetQueryExecution",
                "athena:GetQueryResults"
            ],
            "Resource": "arn:aws:athena:region:account-id:workgroup/primary"
        },
        {
            "Sid": "AllowS3ReadAccessToSourceBucket",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::baeldung-athena-tutorial-bucket",
                "arn:aws:s3:::baeldung-athena-tutorial-bucket/*"
            ]
        },
        {
            "Sid": "AllowS3AccessForAthenaQueryResults",
            "Effect": "Allow",
            "Action": [
                "s3:GetBucketLocation",
                "s3:GetObject",
                "s3:ListBucket",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::baeldung-athena-tutorial-results-bucket",
                "arn:aws:s3:::baeldung-athena-tutorial-results-bucket/*"
            ]
        },
        {
            "Sid": "AllowGlueCatalogAccessForAthena",
            "Effect": "Allow",
            "Action": [
                "glue:CreateDatabase",
                "glue:GetDatabase",
                "glue:CreateTable",
                "glue:GetTable"
            ],
            "Resource": [
                "arn:aws:glue:region:account-id:catalog",
                "arn:aws:glue:region:account-id:database/baeldung",
                "arn:aws:glue:region:account-id:table/baeldung/users"
            ]
        }
    ]
}

The IAM policy consists of four key statements to build the permissions required for our Spring Boot application. The AllowAthenaQueryExecution statement provides the necessary permissions to interact with Athena itself, including starting queries, checking their status, and retrieving results.

Then, the AllowS3ReadAccessToSourceBucket statement allows read access to the S3 bucket that contains the source data we intend to query. The AllowS3AccessForAthenaQueryResults statement focuses on the S3 bucket where Athena stores query results. It grants permissions for Athena to write results to the configured S3 bucket and for our application to retrieve them.

Finally, to allow interactions with AWS Glue, which Athena uses as its metadata store, we define the AllowGlueCatalogAccessForAthena statement. It allows us to create and retrieve database and table definitions which are essential for Athena to understand the structure of our S3 data and execute SQL queries.

Our IAM policy conforms to the least privilege principle, granting only the necessary permissions required by our application to function correctly.

8. Conclusion

In this article, we’ve explored using Amazon Athena with Spring Boot to query data directly from our S3 buckets without setting up any complex infrastructure.

We discussed starting a query execution, waiting for its completion, and generically processing the query results. Additionally, we automated the creation of databases and tables using SQL scripts executed during application startup.

As always, all the code examples used in this article are available over on GitHub.

       

Java IDEs in 2024 – Survey Results


I recently ran a short survey to understand what IDEs are the most popular for Java development in 2024.

First, thank you to everyone who participated.

 

Now, let’s look at the results:

As we can see, IntelliJ is the most popular Java IDE (including both the Community and Ultimate versions), followed by Eclipse.

Visual Studio Code is also a notable entry here, coming in at number 3, which puts it ahead of NetBeans.

The results were collected over email from July 2 to July 10, 2024.

In total, there were 5788 respondents, with some users choosing more than one option:

IDE         Responses
IntelliJ    4418
Eclipse     1100
VSCode      730
NetBeans    426
Other       347

 

To see the evolution of IDE market share, we can compare this with a similar survey I ran in 2019:

At this point, IntelliJ was still the number one choice, but with a smaller percentage than today. In the meantime, the percentage of IntelliJ users has continued to grow, while Eclipse has decreased in popularity.

       

Building Simple Java Applications with Scala-CLI


1. Introduction

We can independently create and run a simple Java class with just a Java installation. However, including third-party dependencies usually requires a build tool like Maven or Gradle, which can be cumbersome for small and straightforward Java applications. Packaging a Java app as an executable also requires adding plugins to these build tools or using separate tools.

Fortunately, we can simplify Java app development for smaller projects using a practical Scala tool called Scala-CLI. In this tutorial, we’ll explore Scala-CLI’s features and how to use it for tasks like compiling, building, packaging, and using specific JDKs.

2. What is Scala-CLI?

Scala-CLI is a command-line tool primarily designed for building Scala applications. It simplifies compiling, running, and packaging Scala applications, eliminating the need for a comprehensive build system.

Interestingly, Scala-CLI can also build pure Java applications, leveraging its full capabilities to streamline development.

3. Set-Up

Scala-CLI can be installed by following the instructions provided on the official website. We can verify if the installation is successful by using the following command:

scala-cli --version

If the installation is successful, it prints the scala-cli version to the console.

Furthermore, we can install the Metals extension in VS Code Editor to work inside an IDE. Scala-CLI doesn’t need a pre-installed JDK on the machine, as it can manage the JDK itself.

However, there might sometimes be a conflict with other Java plugins. If there’s an issue with error reporting in VS Code, we might need to temporarily disable the other Java extensions.

We’ll also create a base directory named scala-cli for this tutorial, where we keep all the files related to this tutorial.

4. Hello World Program

As is the norm, let’s start with a simple Hello World program. First, we can create a new hello-world directory under the base directory. Next, let’s create a file named HelloWorld.java within this directory with the following code:

package com.baeldung;
public class HelloWorld {
    public static void main(String args[]) {
        System.out.println("Hello, World!");
    }
}

We can run this simple Java class using the command:

scala-cli HelloWorld.java

On the first run, it downloads the libraries it needs. If no JDK is installed, Scala-CLI automatically downloads an OpenJDK build and uses it for compilation; if one is already available, it uses that JDK. At the time of writing, Scala-CLI uses JDK 17 as the default version, although this may change to a newer version in the future. After that, it runs the class and prints the given text to the console:

Hello World Output

If we open this directory in VS Code, we might see compilation errors and lack proper code navigation. VS Code may still be able to provide navigation and code completion in this simple case; however, once we use third-party dependencies, it won’t resolve them. To support this, we can run the following command inside the hello-world directory:

scala-cli setup-ide .

This generates the necessary metadata required by the Metals plugin for advanced features.

The Metals plugin also provides a tooltip to run the main class directly from the VS Code IDE:

VS Code Metals Run

We can click on the run action, and it executes the program.

5. Directives

In the previous section, we ran a simple program using Scala-CLI. However, the same can be done with the java command alone and doesn’t require another tool. Now, let’s look at some of Scala-CLI’s most useful features.

Directives are metadata settings that have special meanings in Scala-CLI. The directives start with the special syntax //>. They must be placed before any Java code, even before the package statement. In this section, let’s explore some of the useful directives in Scala-CLI.

5.1. Specifying Java Version

In the previous sample code, we didn’t specify any Java version, and it uses either the default JDK or the already installed JDK for the execution. However, we can configure a specific version of JDK we want the code to use. For that, we use the jvm directive.

Let’s write some sample code that uses JDK 21. We can name the file Jdk21Sample.java (matching the public class below) and place it under the scala-cli/jdk-config path:

//> using jvm 21
package com.baeldung;
record Greet(String name){};
public class Jdk21Sample {
    public static void main(String args[]) {
        var greet = new Greet("Baeldung");
        var greeting = "Hello, " + greet.name();
        System.out.println(greeting);
    }
}

Notice the directive jvm being used with the required version of JDK. This means the Scala-CLI uses JDK 21 to execute the entire file.

By default, it uses Adoptium Open JDK of the specified version. We can also select another flavor of JDK explicitly in the directive. For example, to use Zulu JDK, we can use the directive:

//> using jvm zulu:21

When we run this directive, Scala-CLI downloads the Zulu JDK and executes the program using it. Without any manual setup, we can easily switch between different JDKs and execute the programs. Under the hood, Scala-CLI uses the tool Coursier to manage the dependencies and the JDK versions.
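As an aside, the JDK can also be selected on the command line with the --jvm option instead of a directive, for instance against the earlier HelloWorld.java (worth verifying against the Scala-CLI documentation for the installed version):

scala-cli --jvm zulu:21 HelloWorld.java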

5.2. Passing Javac Options and Java Properties

We can use directives to pass Java options to the compiler. Let’s look at an example:

//> using jvm 21
//> using javaOpt -Xmx2g, -DappName=baeldungApp, --enable-preview
//> using javaProp language=english, country=usa
//> using javacOpt --release 21 --enable-preview
public class JavaArgs {
    public static void main(String[] args) {
        String appName = System.getProperty("appName");
        String language = System.getProperty("language");
        String country = System.getProperty("country");
        String combinedStr = STR."appName = \{ appName } , language = \{ language } and country = \{ country }";
        System.out.println(combinedStr);
    }
}

The code above passes three different types of options. We can use the directive javaOpt to pass the JVM arguments such as -Xmx or system properties. Similarly, we can use javacOpt to pass options to the javac compiler used during the compilation. Furthermore, we can also pass Java properties to the application by using the directive javaProp.

These directives enable configuring and controlling arguments directly within the main Java class.
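When we run this file (assuming it’s named JavaArgs.java after its public class), the output should look roughly like this, given the directive values above:

appName = baeldungApp , language = english and country = usa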

5.3. Managing External Dependencies

We can manage external dependencies using the directive dep. To demonstrate this, let’s add a simple library. We can create a new file named DependencyApp.java and paste the following content:

//> using dep com.google.code.gson:gson:2.8.9
import com.google.gson.JsonParser;
import com.google.gson.JsonElement;
public class DependencyApp {
    public static void main(String args[]) {
        String jsonString = "{\"country\": \"Germany\", \"language\": \"German\", \"currency\": \"Euro\"}";
        var countryJson = JsonParser.parseString(jsonString);
        var country = countryJson.getAsJsonObject().get("country").getAsString();
        System.out.println("Selected country: " + country);
    }
}

In the code above, we added the dependency for Google Gson using the dep directive. Scala-CLI uses the Gradle-style syntax for dependency definitions.

When we execute the code, we get the following output:

Compiled project (Java)
[hint] ./DependencyApp.java:1:15
[hint] "gson is outdated, update to 2.11.0"
[hint]      gson 2.8.9 -> com.google.code.gson:gson:2.11.0
[hint] //> using dep com.google.code.gson:gson:2.8.9
[hint]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Selected country: Germany

We can observe that Scala-CLI can detect the library version and suggest updates if a newer version is available. This feature is convenient for maintaining up-to-date dependencies without manually searching the Maven Central repository.

6. Packaging the Java Application as an Executable

One of Scala-CLI’s most valuable features is its ability to create executable applications directly from code without requiring additional plugins. Let’s explore creating an executable application using the previously created file, DependencyApp.java.

We can use the following command to create the executable:

scala-cli --power package DependencyApp.java -o myApp --assembly

The package command creates a package. The --power flag indicates that this is a power-user command. We can specify the executable app name using the -o argument. If we include the --assembly flag, Scala-CLI creates an executable JAR; otherwise, it creates a library JAR.

When we use the assembly flag, Scala-CLI creates an executable JAR and wraps it in a standalone launcher script by default. To run the created package, we use the command:

./myApp

Instead of creating a wrapper executable, we can generate a plain JAR by passing an additional flag:

scala-cli --power package DependencyApp.java -o myApp.jar --assembly --preamble=false

Now, we can execute it as follows:

java -jar myApp.jar

Similarly, Scala-CLI allows us to create Docker images, GraalVM native images, and platform-specific formats such as deb, msi, etc.

7. Conclusion

In this article, we explored several key features of Scala-CLI, a tool similar to JBang. Scala-CLI, while strongly emphasizing Scala, also provides robust support for Java applications. We demonstrated how Scala-CLI streamlines development by simplifying various aspects of application building, offering easy customization, support for different packaging formats, and more.

As always, the sample code used in this tutorial is available over on GitHub.

       

Do Spring Prototype Beans Need to Be Destroyed Manually?


1. Introduction

In this tutorial, we’ll explore how the Spring Framework handles prototype beans and manages their lifecycle. Understanding how to use beans and their scopes is an important and useful aspect of application development. We’ll discover whether manually destroying prototype beans is necessary, when, and how to do it. 

Although Spring provides us with various helpful bean scopes, prototype will be the main topic of this lesson.

2. Prototype Bean and Its Lifecycle

Scope determines the bean’s lifecycle and visibility within the context in which it exists. According to their defined scope, the IoC container is responsible for managing the lifecycle of beans. A prototype scope dictates that the container creates a new instance of the bean every time it’s requested using getBean() or injected into another bean. In the case of creation and initialization, we can safely rely on Spring. However, the process of destroying beans is different.

Before we check the necessity of destroying the bean, let’s first look at how to create a prototype bean:

@Component
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class PrototypeExample {
}

3. Do Prototype Beans Need Manual Destruction?

Spring doesn’t automatically destroy prototype beans. Unlike singleton scope, where the IoC container takes care of the bean’s entire lifecycle, with a prototype, that’s not the case. The container will instantiate, configure, and assemble the prototype bean, but then it will stop keeping track of its state.

In Java, an object becomes eligible for garbage collection when it’s no longer reachable through any references. It’s usually sufficient to leave a prototype bean instance alone after its usage and let the garbage collector pick it up. In other words, we don’t have to bother destroying prototype beans in most use cases.

On the other hand, let’s consider scenarios where it’s advisable to destroy beans manually, for example, when working with processes that require resources such as file handles, database connections, or network connections. Since the prototype scope states that a bean is created each time we use it, resources are acquired and consumed each time as well. As a result, the accumulated usage over time can lead to issues such as memory leaks and the exhaustion of connection pools. That happens because we never release those resources; we keep creating new ones simply by using prototype beans.

That’s why we must ensure that we properly destroy prototype beans after we use them, closing all the resources that we created or used.

4. How to Destroy Prototype Bean?

There are several ways to destroy beans manually in Spring. It’s important to note that the container will apply each mechanism if we use more than one, but we need to use at least one.

Each example requires manually invoking the method destroyBean() from BeanFactory, except in the custom method approach where we can invoke our custom method. We’ll get BeanFactory from ApplicationContext and invoke bean destruction:

applicationContext.getBeanFactory().destroyBean(prototypeBean);
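For context, here’s a minimal sketch of the full flow, reusing the PrototypeExample bean from earlier: we request an instance, work with it, and then explicitly hand it back to the container for destruction:

PrototypeExample prototypeBean = applicationContext.getBean(PrototypeExample.class);
// ... work with the prototype instance ...
// ask the container to run this instance's destruction callbacks (e.g., @PreDestroy or DisposableBean)
applicationContext.getBeanFactory().destroyBean(prototypeBean);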

4.1. Using @PreDestroy Annotation

The @PreDestroy annotation marks the method of our bean that is responsible for destroying the bean. The method must not have any parameters and must not be static. Let’s see how that looks in practice:

@Component
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class PreDestroyBeanExample {
    @PreDestroy
    private void destroy() {
        // release all resources that the bean is holding
    }
}

4.2. DisposableBean Interface

The DisposableBean interface has a single callback method, destroy(), which we have to implement. The Spring team doesn’t recommend using the DisposableBean interface because it couples the code to Spring. Nevertheless, let’s take a look at how to use it:

@Component
@Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
public class DisposableBeanExample implements DisposableBean {
    @Override
    public void destroy() {
        // release all resources that the bean is holding
    }
}

4.3. DestructionAwareBeanPostProcessor Interface

DestructionAwareBeanPostProcessor, like other BeanPostProcessor variants, customizes the initialization of beans. One key difference is that it includes an additional method to execute custom logic before destroying a bean.

Before implementing an interface, we must ensure that we have a way to release resources from our bean. We can use a DisposableBean, just as in the previous example, or a custom method.

The next step is to implement an interface where we’ll invoke our destruction method:

@Component
public class CustomPostProcessor implements DestructionAwareBeanPostProcessor {
    @Override
    public void postProcessBeforeDestruction(Object bean, String beanName) throws BeansException {
        if (bean instanceof PostProcessorBeanExample) {
            ((PostProcessorBeanExample) bean).destroy();
        }
    }
}

4.4. Custom Method With POJOs

There could be a scenario in which we have a POJO that we want to define as a prototype bean. While defining the bean, we can use the destroyMethod property to specify a particular method that will be responsible for destroying the bean. Let’s see how that’s done:

public class CustomMethodBeanExample {
    public void destroy() {
        // release all resources that the bean is holding
    }
}
@Configuration
public class DestroyMethodConfig {
    @Bean(destroyMethod = "destroy")
    @Scope(value = ConfigurableBeanFactory.SCOPE_PROTOTYPE)
    public CustomMethodBeanExample customMethodBeanExample() {
        return new CustomMethodBeanExample();
    }
}

We successfully marked our custom method as a destroyMethod callback, but it will never be invoked. This is because the container only invokes it for beans whose lifecycle it fully controls. In scenarios like this, we could utilize DestructionAwareBeanPostProcessor or simply invoke our custom destruction method when we stop using prototype bean.
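For instance, here’s a minimal sketch of invoking the custom method ourselves once we no longer need the instance:

CustomMethodBeanExample bean = applicationContext.getBean(CustomMethodBeanExample.class);
try {
    // ... work with the prototype instance ...
} finally {
    // the container won't call this for prototypes, so we do it explicitly
    bean.destroy();
}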

5. Conclusion

In this article, we explored what prototype beans are and how Spring handles their initialization but then leaves clients to take care of their destruction.

Although it may not be necessary to manually destroy prototype beans, it’s recommended to do so if they handle resources like files, database connections, or network connections. Since prototype bean instances are created each time we request them, we can stack up resources pretty quickly. To avoid unwanted problems, such as memory leaks, we have to release those resources.

We learned several approaches that we can use to destroy beans, including @PreDestroy, DisposableBean interface, DestructionAwareBeanPostProcessor interface, and custom methods.

As always, full code examples are available over on GitHub.

       

Guide to FileOutputStream vs. FileChannel


1. Introduction

When working with file I/O operations in Java, FileOutputStream and FileChannel are the two common approaches for writing data to files. In this tutorial, we’ll explore their functionalities and understand their differences.

2. FileOutputStream

FileOutputStream is a part of the java.io package and is one of the simplest ways to write binary data to a file. It’s a good choice for straightforward write operations, especially with smaller files. Its simplicity makes it easy to use for basic file writing tasks.

Here’s a code snippet demonstrating how to write a byte array to a file using FileOutputStream:

byte[] data = "This is some data to write".getBytes();
try (FileOutputStream outputStream = new FileOutputStream("output.txt")) {
    outputStream.write(data);
} catch (IOException e) {
    // ...
}

In this example, we first create a byte array containing the data to write. Next, we initialize a FileOutputStream object, and specify the file name “output.txt“. The try-with-resources statement ensures automatic resource closing. The write() method of FileOutputStream writes the entire byte array “data” to the file.

3. FileChannel

FileChannel is a part of the java.nio.channels package and provides more advanced and flexible file I/O operations compared to FileOutputStream. It’s particularly well-suited for handling larger files, random access, and performance-critical applications. Its use of buffers allows for more efficient data transfer and manipulation.

Here’s a code snippet demonstrating how to write a byte array to a file using FileChannel:

byte[] data = "This is some data to write".getBytes();
ByteBuffer buffer = ByteBuffer.wrap(data);
try (FileChannel fileChannel = FileChannel.open(Path.of("output.txt"), 
  StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    fileChannel.write(buffer);
} catch (IOException e) {
    // ...
}

In this example, we create a ByteBuffer and wrap the byte array data into it. Then we initialize a FileChannel object using the FileChannel.open() method. Next, we also specify the file name “output.txt” and the necessary open options (StandardOpenOption.WRITE and StandardOpenOption.CREATE).

The write() method of FileChannel then writes the content of the ByteBuffer to the specified file.

4. Data Access

In this section, let’s dive into the differences between FileOutputStream and FileChannel in the context of data access.

4.1. FileOutputStream

FileOutputStream writes data sequentially, meaning it writes bytes to a file in the order they’re given, from the beginning to the end. It doesn’t support jumping to specific positions within the file to read or write data.

Here’s an example of writing data sequentially using FileOutputStream:

byte[] data1 = "This is the first line.\n".getBytes();
byte[] data2 = "This is the second line.\n".getBytes();
try (FileOutputStream outputStream = new FileOutputStream("output.txt")) {
    outputStream.write(data1);
    outputStream.write(data2);
} catch (IOException e) {
    // ...
}

In this code, “This is the first line.” will be written first, followed by “This is the second line.” on a new line in the “output.txt” file. We can’t write data in the middle of the file without rewriting everything from the beginning.
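FileOutputStream does, however, offer an append flag in its constructor. This doesn’t enable random access; it merely moves the sequential starting point to the end of an existing file:

try (FileOutputStream outputStream = new FileOutputStream("output.txt", true)) {
    outputStream.write("This is an appended line.\n".getBytes());
} catch (IOException e) {
    // ...
}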

4.2. FileChannel

On the other hand, FileChannel allows us to read or write data at any position in the file. This is because FileChannel uses a file pointer that can be moved to any position in the file. This is achieved using the position() method, which sets the position within the file where the next read or write occurs.

The code snippet below demonstrates how FileChannel can write data to specific positions within the file:

ByteBuffer buffer1 = ByteBuffer.wrap(data1);
ByteBuffer buffer2 = ByteBuffer.wrap(data2);
try (FileChannel fileChannel = FileChannel.open(Path.of("output.txt"), 
  StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    fileChannel.write(buffer1);
    fileChannel.position(10);
    fileChannel.write(buffer2);
} catch (IOException e) {
    // ...
}

In this example, data1 is written at the beginning of the file. Now, we want to write data2 starting from position 10. Therefore, we set the position to 10 using fileChannel.position(10), and data2 is then written starting at the 10th byte, overwriting any bytes already present at that offset.

5. Concurrency and Thread Safety

In this section, we’ll explore how FileOutputStream and FileChannel handle concurrency and thread safety.

5.1. FileOutputStream

FileOutputStream doesn’t handle synchronization internally. If two threads are trying to write to the same FileOutputStream concurrently, the result can be unpredictable data interleaving in the output file. Therefore we need synchronization to ensure thread safety.

Here’s an example using FileOutputStream with external synchronization:

final Object lock = new Object();
void writeToFile(String fileName, byte[] data) {
    synchronized (lock) {
        try (FileOutputStream outputStream = new FileOutputStream(fileName, true)) {
            outputStream.write(data);
            log.info("Data written by " + Thread.currentThread().getName());
        } catch (IOException e) {
            // ...
        }
    }
}

In this example, we use a common lock object to synchronize access to the file. When multiple threads write data to the file, this ensures thread safety:

Thread thread1 = new Thread(() -> writeToFile("output.txt", data1));
Thread thread2 = new Thread(() -> writeToFile("output.txt", data2));
thread1.start();
thread2.start();

5.2. FileChannel

In contrast, FileChannel supports file locking, allowing us to lock specific file sections to prevent other threads or processes from accessing that data simultaneously.

Here’s an example of using FileChannel with FileLock to handle concurrent access:

void writeToFileWithLock(String fileName, ByteBuffer buffer, int position) {
    try (FileChannel fileChannel = FileChannel.open(Path.of(fileName), StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
        // Acquire an exclusive lock on the file
        try (FileLock lock = fileChannel.lock(position, buffer.remaining(), false)) {
            fileChannel.position(position);
            fileChannel.write(buffer);
            log.info("Data written by " + Thread.currentThread().getName() + " at position " + position);
        } catch (IOException e) {
            // ...
        }
    } catch (IOException e) {
        // ...
    }
}

In this example, the FileLock object is used to ensure that the file section being written to is locked to prevent other threads from accessing it concurrently. When a thread calls writeToFileWithLock(), it first acquires a lock on the specific section of the file:

Thread thread1 = new Thread(() -> writeToFileWithLock("output.txt", buffer1, 0));
Thread thread2 = new Thread(() -> writeToFileWithLock("output.txt", buffer2, 20));
thread1.start();
thread2.start();

6. Performance

In this section, we’ll compare the performance of FileOutputStream and FileChannel using JMH. We’ll create a benchmark class that includes both FileOutputStream and FileChannel benchmarks to evaluate their performance in handling large files:

@Setup
public void setup() {
    largeData = new byte[1000 * 1024 * 1024]; // 1 GB of data
    Arrays.fill(largeData, (byte) 1);
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public void testFileOutputStream() {
    try (FileOutputStream outputStream = new FileOutputStream("largeOutputStream.txt")) {
        outputStream.write(largeData);
    } catch (IOException e) {
        // ...
    }
}
@Benchmark
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public void testFileChannel() {
    ByteBuffer buffer = ByteBuffer.wrap(largeData);
    try (FileChannel fileChannel = FileChannel.open(Path.of("largeFileChannel.txt"), StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
        fileChannel.write(buffer);
    } catch (IOException e) {
        // ...
    }
}

Let’s execute the benchmarks and compare the performance of FileOutputStream and FileChannel. The results show the average time taken for each operation in milliseconds:

Options opt = new OptionsBuilder()
  .include(FileIOBenchmark.class.getSimpleName())
  .forks(1)
  .build();
new Runner(opt).run();

After running the benchmarks, we obtained the following results:

Benchmark                             Mode  Cnt    Score    Error  Units
FileIOBenchmark.testFileChannel       avgt    5  431.414 ± 52.229  ms/op
FileIOBenchmark.testFileOutputStream  avgt    5  556.102 ± 91.512  ms/op

FileOutputStream is designed for simplicity and ease of use. However, when dealing with large files with high-frequency I/O operations, it can introduce some overhead. This is because the FileOutputStream operations are blocking, which means each write operation must be completed before the next one starts.

On the other hand, FileChannel supports memory-mapped I/O, which can map a section of the file into memory. This enables data manipulation directly in memory space, resulting in faster transfer.
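As a rough illustration (not part of the benchmark above), mapping a region of a file and writing into it might look like this:

byte[] data = "This is some data to write".getBytes();
try (FileChannel fileChannel = FileChannel.open(Path.of("mapped.txt"), 
  StandardOpenOption.READ, StandardOpenOption.WRITE, StandardOpenOption.CREATE)) {
    // map a region of the file into memory and write the bytes directly into it
    MappedByteBuffer mappedBuffer = fileChannel.map(FileChannel.MapMode.READ_WRITE, 0, data.length);
    mappedBuffer.put(data);
} catch (IOException e) {
    // ...
}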

7. Conclusion

In this article, we’ve explored two file I/O methods: FileOutputStream and FileChannel. FileOutputStream offers simplicity and ease for basic file writing tasks, ideal for smaller files and sequential data writing.

On the other hand, FileChannel provides advanced features like direct buffer access for better performance with large files.

As always, the source code for the examples is available over on GitHub.

       

How to Solve “java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking”


1. Introduction

In this article, we’ll see a common mistake that developers make while using Spring Webflux. Spring Webflux is a non-blocking web framework built from the ground up to take advantage of multi-core, next-generation processors and handle massive concurrent connections.

Since it’s a non-blocking framework, its threads shouldn’t be blocked. Let’s explore this in more detail.

2. Spring Webflux Threading Model

To understand this issue better we need to understand the threading model of Spring Webflux.

In Spring Webflux, a small pool of worker threads handles incoming requests. This contrasts with the Servlet model where each request gets a dedicated thread. Hence, the framework is protective of what happens on these request-accepting threads.

With that understanding in mind, let’s dive deep into the main focus of this article.

3. Understanding IllegalStateException With Thread Blocking

Let’s understand with the help of an example when and why we get the error “java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread” in Spring Webflux.

Let’s take an example of a file-searching API. This API reads a file from the file system and searches for the text provided by the user within the file.

3.1. File Service

Let’s first define a FileService class which reads a file’s content as a string:

@Service
public class FileService {
    @Value("${files.base.dir:/tmp/bael-7724}")
    private String filesBaseDir;
    public Mono<String> getFileContentAsString(String fileName) {
        return DataBufferUtils.read(Paths.get(filesBaseDir + "/" + fileName), DefaultDataBufferFactory.sharedInstance, DefaultDataBufferFactory.DEFAULT_INITIAL_CAPACITY)
          .map(dataBuffer -> dataBuffer.toString(StandardCharsets.UTF_8))
          .reduceWith(StringBuilder::new, StringBuilder::append)
          .map(StringBuilder::toString);
    }
}

It’s worth noting that FileService reads the file from the file system reactively (asynchronously).

3.2. File Content Search Service

We’re ready to leverage this FileService to write a file-searching service:

@Service
public class FileContentSearchService {
    @Autowired
    private FileService fileService;
    public Mono<Boolean> blockingSearch(String fileName, String searchTerm) {
        String fileContent = fileService
          .getFileContentAsString(fileName)
          .doOnNext(content -> ThreadLogger.log("1. BlockingSearch"))
          .block();
        boolean isSearchTermPresent = fileContent.contains(searchTerm);
        return Mono.just(isSearchTermPresent);
    }
}

The file-searching service returns a boolean depending on whether the search term is found in the file. For this, we call the getFileContentAsString() method of FileService. Since we get the result asynchronously, i.e., as a Mono<String>, we call block() to get the String value. After that, we check whether the fileContent contains the searchTerm. Finally, we wrap and return the result in a Mono.

3.3. Files Controller

Finally, we’ve got the FileController which makes use of the FileContentSearchService‘s blockingSearch() method:

@RestController
@RequestMapping("bael7724/v1/files")
public class FileController {
    ...
    @GetMapping(value = "/{name}/blocking-search")
    Mono<Boolean> blockingSearch(@PathVariable("name") String fileName, @RequestParam String term) {
        return fileContentSearchService.blockingSearch(fileName, term);
    }
}

3.4. Reproducing the Exception

We can observe that the Controller calls the method of FileContentSearchService which in turn calls the block() method. Since this was on a request-accepting thread, if we call our API in the current arrangement, we would encounter the notorious exception we’re after:

12:28:51.610 [reactor-http-epoll-2] ERROR o.s.b.a.w.r.e.AbstractErrorWebExceptionHandler - [ea98e542-1]  500 Server Error for HTTP GET "/bael7724/v1/files/a/blocking-search?term=a"
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-epoll-2
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:86)
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
    *__checkpoint ⇢ com.baeldung.filters.TraceWebFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ com.baeldung.filters.ExceptionalTraceFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ HTTP GET "/bael7724/v1/files/a/blocking-search?term=a" [ExceptionHandlingWebHandler]
Original Stack Trace:
	at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:86)
	at reactor.core.publisher.Mono.block(Mono.java:1712)
	at com.baeldung.bael7724.service.FileContentSearchService.blockingSearch(FileContentSearchService.java:20)
	at com.baeldung.bael7724.controller.FileController.blockingSearch(FileController.java:35)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)

3.5. The Root Cause

The root cause of this exception is calling block() on the request-accepting thread. In our sample code above, the block() method is being called on one of the threads from the thread pool which accepts the request. Specifically, on the threads marked as “non-blocking operations only”, i.e., threads implementing Reactor’s NonBlocking marker interface, like those started by Schedulers.parallel().
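Incidentally, Reactor lets us check this condition at runtime; here’s a small illustrative snippet:

// true when the current thread implements Reactor's NonBlocking marker interface
boolean blockingNotAllowed = Schedulers.isInNonBlockingThread();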

4. Solution

Let’s now look at what can be done to address this exception.

4.1. Embracing the Reactive Operations

The idiomatic approach is to use reactive operations instead of calling block(). Let’s update the code to make use of the map() operation to translate the String into Boolean:

public Mono<Boolean> nonBlockingSearch(String fileName, String searchTerm) {
    return fileService.getFileContentAsString(fileName)
      .doOnNext(content -> ThreadLogger.log("1. NonBlockingSearch"))
      .map(content -> content.contains(searchTerm))
      .doOnNext(content -> ThreadLogger.log("2. NonBlockingSearch"));
}

We’ve thus eliminated the need to call block() altogether. When we run the above method we notice the following thread context:

[1. NonBlockingSearch] ThreadName: Thread-4, Time: 2024-06-17T07:40:59.506215299Z
[2. NonBlockingSearch] ThreadName: Thread-4, Time: 2024-06-17T07:40:59.506361786Z
[1. In Controller] ThreadName: Thread-4, Time: 2024-06-17T07:40:59.506465805Z
[2. In Controller] ThreadName: Thread-4, Time: 2024-06-17T07:40:59.506543145Z

The above log statements indicate that we’ve performed the operations on the same thread pool which is accepting the request.

Notably, even though we didn’t encounter the exception, it’s better to run I/O operations, such as reading from a file, on a different thread pool.

4.2. Blocking on the Bounded Elastic Thread Pool

Let’s say we can’t avoid block() for some reason. Then how do we go about this? We concluded that the exception occurred as we called block() on the request-accepting thread pool. Hence, to call block() we need to switch the thread pool. Let’s see how we can do this:

public Mono<Boolean> workableBlockingSearch(String fileName, String searchTerm) {
    return Mono.just("")
      .doOnNext(s -> ThreadLogger.log("1. WorkableBlockingSearch"))
      .publishOn(Schedulers.boundedElastic())
      .doOnNext(s -> ThreadLogger.log("2. WorkableBlockingSearch"))
      .map(s -> fileService.getFileContentAsString(fileName)
        .block()
        .contains(searchTerm))
      .doOnNext(s -> ThreadLogger.log("3. WorkableBlockingSearch"));
}

To switch the thread pool, Spring Webflux provides two operators, publishOn() and subscribeOn(). We’ve used publishOn(), which changes the thread for the operations that come after it, without affecting the subscription or the upstream operations. Since the thread pool is now switched to the bounded elastic one, we can call block().

Now, if we run the workableBlockingSearch() method we’ll get the following thread context:

[1. WorkableBlockingSearch] ThreadName: parallel-2, Time: 2024-06-17T07:40:59.440562518Z
[2. WorkableBlockingSearch] ThreadName: boundedElastic-1, Time: 2024-06-17T07:40:59.442161018Z
[3. WorkableBlockingSearch] ThreadName: boundedElastic-1, Time: 2024-06-17T07:40:59.442891230Z
[1. In Controller] ThreadName: boundedElastic-1, Time: 2024-06-17T07:40:59.443058091Z
[2. In Controller] ThreadName: boundedElastic-1, Time: 2024-06-17T07:40:59.443181770Z

We can see that from number 2 onwards, the operations have indeed happened on the bounded elastic thread pool; hence, we didn’t get the IllegalStateException.
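As a side note, subscribeOn() offers a similar escape hatch by moving the subscription itself, and therefore the work inside the callable, off the request-accepting threads. Here’s a minimal sketch with illustrative naming:

public Mono<Boolean> subscribeOnBlockingSearch(String fileName, String searchTerm) {
    return Mono.fromCallable(() -> fileService.getFileContentAsString(fileName)
        .block() // safe here: the callable runs on a boundedElastic thread
        .contains(searchTerm))
      .subscribeOn(Schedulers.boundedElastic());
}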

4.3. Caveats

Let’s look at some caveats of this blocking approach.

There are many ways we can go wrong with calling block(). Let’s take one example where even though we use a Scheduler to switch the thread context it doesn’t behave the way we expect it to:

public Mono<Boolean> incorrectUseOfSchedulersSearch(String fileName, String searchTerm) {
    String fileContent = fileService.getFileContentAsString(fileName)
      .doOnNext(content -> ThreadLogger.log("1. IncorrectUseOfSchedulersSearch"))
      .publishOn(Schedulers.boundedElastic())
      .doOnNext(content -> ThreadLogger.log("2. IncorrectUseOfSchedulersSearch"))
      .block();
    boolean isSearchTermPresent = fileContent.contains(searchTerm);
    return Mono.just(isSearchTermPresent);
}

In the above code sample, we’ve used publishOn() as recommended in the solution but the block() method is still causing the exception. When we run the above code, we’ll get the following logs:

[1. IncorrectUseOfSchedulersSearch] ThreadName: Thread-4, Time: 2024-06-17T08:57:02.490298417Z
[2. IncorrectUseOfSchedulersSearch] ThreadName: boundedElastic-1, Time: 2024-06-17T08:57:02.491870410Z
14:27:02.495 [parallel-1] ERROR o.s.b.a.w.r.e.AbstractErrorWebExceptionHandler - [53e4bce1]  500 Server Error for HTTP GET "/bael7724/v1/files/robots.txt/incorrect-use-of-schedulers-search?term=r-"
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread parallel-1
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:86)
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException: 
Error has been observed at the following site(s):
    *__checkpoint ⇢ com.baeldung.filters.TraceWebFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ com.baeldung.filters.ExceptionalTraceFilter [DefaultWebFilterChain]
    *__checkpoint ⇢ HTTP GET "/bael7724/v1/files/robots.txt/incorrect-use-of-schedulers-search?term=r-" [ExceptionHandlingWebHandler]
Original Stack Trace:
	at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:86)
	at reactor.core.publisher.Mono.block(Mono.java:1712)
	at com.baeldung.bael7724.service.FileContentSearchService.incorrectUseOfSchedulersSearch(FileContentSearchService.java:64)
	at com.baeldung.bael7724.controller.FileController.incorrectUseOfSchedulersSearch(FileController.java:48)
	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)

This indicates that the second log statement did indeed run on the bounded elastic thread pool. However, we still encountered the exception. The reason is that block() still runs on the calling thread from the request-accepting pool: publishOn() only switches the thread for the downstream operators, not for the blocking call itself.

Let’s look at another caveat. Even when switching the thread pool, we cannot use the parallel thread pool, i.e., Schedulers.parallel(). As mentioned earlier, certain thread pools don’t allow invoking block() on their threads, and the parallel thread pool is one of them.

Finally, we’ve only used Schedulers.boundedElastic() in our examples. Instead, we could have also used any custom thread pool via Schedulers.fromExecutorService().
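For example, here’s a brief sketch of wiring such a custom scheduler (the pool size is arbitrary, and the executor should be shut down when the application stops):

ExecutorService executorService = Executors.newFixedThreadPool(4);
Scheduler customScheduler = Schedulers.fromExecutorService(executorService);
// then use .publishOn(customScheduler) instead of .publishOn(Schedulers.boundedElastic())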

5. Conclusion

In conclusion, to effectively address the issue of IllegalStateException in Spring Webflux when using blocking operations like block(), we should adopt a non-blocking, reactive approach. By leveraging reactive operators such as map(), we can perform operations on the same reactive thread pool, eliminating the need for explicit block(). If block() cannot be avoided, switching the execution context to a boundedElastic scheduler or a custom thread pool using publishOn() can isolate these operations from the reactive request-accepting thread pool, thus preventing exceptions.

It’s essential to be aware of thread pools that don’t support blocking calls and to ensure that the correct context switching is applied to maintain the application’s stability and performance.

As always, the source code used in this article is available over on GitHub.

       

Function Calling in Java and Spring AI Using the Mistral AI API


1. Overview

Using large language models, we can retrieve a lot of useful information. We can learn many new facts about anything and get answers based on existing data on the internet. We can ask them to process input data and perform various actions. But what if we ask the model to use an API to prepare the output?

For this purpose, we can use Function Calling. Function calling allows LLMs to interact with and manipulate data, perform calculations, or retrieve information beyond their inherent textual capabilities.

In this article, we’ll explore what function calling is and how we can use it to integrate the LLMs with our internal logic. As the model provider, we’ll use the Mistral AI API.

2. Mistral AI API

Mistral AI focuses on providing open and portable generative AI models for developers and businesses. We can use it for simple prompts as well as for function-calling integrations.

2.1. Retrieve API Key

To start using the Mistral API, we first need to retrieve the API key. Let’s go to the API-keys management console:

 

To activate any key we have to set up the billing configuration or use the trial period if available:

 

After settling everything, we can push the Create new key button to obtain the Mistral API key.

2.2. Example of Usage

Let’s start with simple prompting. We’ll ask the Mistral API to return a list of patient health statuses. Let’s implement such a call:

@Test
void givenHttpClient_whenSendTheRequestToChatAPI_thenShouldBeExpectedWordInResponse() throws IOException, InterruptedException {
    String apiKey = System.getenv("MISTRAL_API_KEY");
    String apiUrl = "https://api.mistral.ai/v1/chat/completions";
    String requestBody = "{"
      + "\"model\": \"mistral-large-latest\","
      + "\"messages\": [{\"role\": \"user\", "
      + "\"content\": \"What the patient health statuses can be?\"}]"
      + "}";
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder()
      .uri(URI.create(apiUrl))
      .header("Content-Type", "application/json")
      .header("Accept", "application/json")
      .header("Authorization", "Bearer " + apiKey)
      .POST(HttpRequest.BodyPublishers.ofString(requestBody))
      .build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
    String responseBody = response.body();
    logger.info("Model response: " + responseBody);
    Assertions.assertThat(responseBody)
      .containsIgnoringCase("healthy");
}

We created an HTTP request and sent it to the /chat/completions endpoint. Then, we used the API key as the authorization header value. As expected, in the response we see both metadata and the content itself:

Model response: {"id":"585e3599275545c588cb0a502d1ab9e0","object":"chat.completion",
"created":1718308692,"model":"mistral-large-latest",
"choices":[{"index":0,"message":{"role":"assistant","content":"Patient health statuses can be
categorized in various ways, depending on the specific context or medical system being used.
However, some common health statuses include:
1.Healthy: The patient is in good health with no known medical issues.
...
10.Palliative: The patient is receiving care that is focused on relieving symptoms and improving quality of life, rather than curing the underlying disease.",
"tool_calls":null},"finish_reason":"stop","logprobs":null}],
"usage":{"prompt_tokens":12,"total_tokens":291,"completion_tokens":279}}

A function-calling example is more complex and requires more preparation before the call. We’ll explore it in the next section.

3. Spring AI Integration

Let’s see a few examples of usage of the Mistral API with function calls. Using Spring AI we can avoid a lot of preparation work and let the framework do it for us.

3.1. Dependencies

The needed dependency is located in the Spring milestone repository. Let’s add it to our pom.xml:

<repositories>
    <repository>
        <id>spring-milestones</id>
        <name>Spring milestones</name>
        <url>https://repo.spring.io/milestone</url>
    </repository>
</repositories>

Now, let’s add the dependency for the Mistral API integration:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-mistral-ai-spring-boot-starter</artifactId>
    <version>0.8.1</version>
</dependency>

3.2. Configuration

Now let’s add the API key we obtained previously into the properties file:

spring:
  ai:
    mistralai:
      api-key: ${MISTRAL_AI_API_KEY}
      chat:
        options:
          model: mistral-small-latest

And that’s all that we need to start using the Mistral API.

3.3. Use Case With One Function

In our demo example, we’ll create a function that returns the patient’s health status based on their ID.

Let’s start by creating the patient record:

public record Patient(String patientId) {
}

Now let’s create another record for a patient’s health status:

public record HealthStatus(String status) {
}

In the next step, we’ll create a configuration class:

@Configuration
public class MistralAIFunctionConfiguration {
    public static final Map<Patient, HealthStatus> HEALTH_DATA = Map.of(
      new Patient("P001"), new HealthStatus("Healthy"),
      new Patient("P002"), new HealthStatus("Has cough"),
      new Patient("P003"), new HealthStatus("Healthy"),
      new Patient("P004"), new HealthStatus("Has increased blood pressure"),
      new Patient("P005"), new HealthStatus("Healthy"));
    @Bean
    @Description("Get patient health status")
    public Function<Patient, HealthStatus> retrievePatientHealthStatus() {
        return (patient) -> new HealthStatus(HEALTH_DATA.get(patient).status());
    }
}

Here, we’ve specified the dataset with patients’ health data. Additionally, we created the retrievePatientHealthStatus() function, which returns the health status for a given patient ID.

Now, let’s test our function by calling it within an integration:

@Import(MistralAIFunctionConfiguration.class)
@ExtendWith(SpringExtension.class)
@SpringBootTest
public class MistralAIFunctionCallingManualTest {
    @Autowired
    private MistralAiChatModel chatClient;
    @Test
    void givenMistralAiChatClient_whenAskChatAPIAboutPatientHealthStatus_thenExpectedHealthStatusIsPresentInResponse() {
        var options = MistralAiChatOptions.builder()
          .withFunction("retrievePatientHealthStatus")
          .build();
        ChatResponse paymentStatusResponse = chatClient.call(
          new Prompt("What's the health status of the patient with id P004?",  options));
        String responseContent = paymentStatusResponse.getResult().getOutput().getContent();
        logger.info(responseContent);
        Assertions.assertThat(responseContent)
          .containsIgnoringCase("has increased blood pressure");
    }
}

We’ve imported our MistralAIFunctionConfiguration class to add our retrievePatientHealthStatus() function to the test Spring context. We also injected MistralAiChatModel, which is instantiated automatically by the Spring AI starter.

In the request to the chat API, we’ve specified the prompt text containing one of the patients’ IDs and the name of the function to retrieve the health status. Then we called the API and verified that the response contained the expected health status.

Additionally, we’ve logged the whole response text, and here is what we see there:

The patient with id P004 has increased blood pressure.

3.4. Use Case With Multiple Functions

We can also specify multiple functions, and the AI decides which one to use based on the prompt we send.

To demonstrate it, let’s extend our HealthStatus record:

public record HealthStatus(String status, LocalDate changeDate) {
}

We’ve added the date when the status was last changed.

Now let’s modify the configuration class:

@Configuration
public class MistralAIFunctionConfiguration {
    public static final Map<Patient, HealthStatus> HEALTH_DATA = Map.of(
      new Patient("P001"), new HealthStatus("Healthy",
        LocalDate.of(2024,1, 20)),
      new Patient("P002"), new HealthStatus("Has cough",
        LocalDate.of(2024,3, 15)),
      new Patient("P003"), new HealthStatus("Healthy",
        LocalDate.of(2024,4, 12)),
      new Patient("P004"), new HealthStatus("Has increased blood pressure",
        LocalDate.of(2024,5, 19)),
      new Patient("P005"), new HealthStatus("Healthy",
        LocalDate.of(2024,6, 1)));
    @Bean
    @Description("Get patient health status")
    public Function<Patient, String> retrievePatientHealthStatus() {
        return (patient) -> HEALTH_DATA.get(patient).status();
    }
    @Bean
    @Description("Get when patient health status was updated")
    public Function<Patient, LocalDate> retrievePatientHealthStatusChangeDate() {
        return (patient) -> HEALTH_DATA.get(patient).changeDate();
    }
}

We’ve populated change dates for each of the status items. We also created the retrievePatientHealthStatusChangeDate() function, which returns information about the status change date.

Let’s see how we can use our two new functions with the Mistral API:

@Test
void givenMistralAiChatClient_whenAskChatAPIAboutPatientHealthStatusAndWhenThisStatusWasChanged_thenExpectedInformationInResponse() {
    var options = MistralAiChatOptions.builder()
      .withFunctions(
        Set.of("retrievePatientHealthStatus",
          "retrievePatientHealthStatusChangeDate"))
      .build();
    ChatResponse paymentStatusResponse = chatClient.call(
      new Prompt(
        "What's the health status of the patient with id P005",
        options));
    String paymentStatusResponseContent = paymentStatusResponse.getResult()
      .getOutput().getContent();
    logger.info(paymentStatusResponseContent);
    Assertions.assertThat(paymentStatusResponseContent)
      .containsIgnoringCase("healthy");
    ChatResponse changeDateResponse = chatClient.call(
      new Prompt(
        "When health status of the patient with id P005 was changed?",
        options));
    String changeDateResponseContent = changeDateResponse.getResult().getOutput().getContent();
    logger.info(changeDateResponseContent);
    Assertions.assertThat(changeDateResponseContent)
      .containsIgnoringCase("June 1, 2024");
}

In this case, we’ve specified two function names and sent two prompts. First, we asked about the health status of a patient, and then we asked when this status was changed. We verified that the results contain the expected information. Besides that, we logged all the responses, and here’s what they look like:

The patient with id P005 is currently healthy.
The health status of the patient with id P005 was changed on June 1, 2024.

4. Conclusion

Function calling is a great tool for extending LLM functionality. We can also use it to integrate LLMs with our own logic.

In this tutorial, we explored how to implement an LLM-based flow by calling one or more of our functions. Using this approach, we can build modern applications that are integrated with AI APIs.

As usual, the full source code can be found over on GitHub.

       

A Guide to the @AutoClose Extension in JUnit5


1. Overview

In this brief tutorial, we’ll explore the new @AutoClose JUnit 5 annotation, which helps us deal with classes that require a specific method call after test execution.

After that, we’ll learn how to use this extension to simplify our tests and remove boilerplate code from the @AfterAll block.

2. The @AutoClose Extension

In testing, there are scenarios where certain classes require specific actions to be performed after the test is completed. For example, this is often the case when we have test dependencies implementing the AutoCloseable interface. For demonstration purposes, let’s create our custom AutoCloseable class:

class DummyAutoCloseableResource implements AutoCloseable {
   
    // logger
   
    private boolean open = true;
    @Override
    public void close() {
        LOGGER.info("Closing Dummy Resource");
        open = false;
    }
}

When we finish running the tests, we close the resource using the @AfterAll block:

class AutoCloseableExtensionUnitTest {
    static DummyAutoCloseableResource resource = new DummyAutoCloseableResource();
    @AfterAll
    static void afterAll() {
        resource.close();
    }
    // tests
}

However, starting with JUnit5 version 5.11, we can use the @AutoClose extension to eliminate the boilerplate code. The extension is integrated into the JUnit5 framework, so we don’t need to add any special annotation at the class level. Instead, we can just annotate the field with @AutoClose:

class AutoCloseableExtensionUnitTest {
    @AutoClose
    DummyAutoCloseableResource resource = new DummyAutoCloseableResource();
    // tests
}

As we can see, this also removes the limitation of declaring the field static. Furthermore, the annotated field doesn’t necessarily have to implement the AutoCloseable interface. By default, the extension looks inside the annotated field and tries to find a method named “close“, but we can customize and point to a different function.

Let’s consider another use case, where we want to call the clear() method when we’ve finished working with the resource:

class DummyClearableResource {
   
    // logger
    public void clear() {
        LOGGER.info("Clear Dummy Resource");
    }
}

In this case, we can use the annotation’s value to indicate which method needs to be called after all the tests:

class AutoCloseableExtensionUnitTest {
    @AutoClose
    DummyAutoCloseableResource resource = new DummyAutoCloseableResource();
    @AutoClose("clear")
    DummyClearableResource clearResource = new DummyClearableResource();
    // tests
}

3. Conclusion

In this short article, we discussed the new @AutoClose extension and used it for practical examples. We explored how it helps us keep tests concise and manage resources that need to be closed.

As usual, all code samples used in this article are available over on GitHub.

       

Maven Spotless Plugin for Java


1. Overview

In this tutorial, we’ll explore the Maven Spotless Plugin and use it to enforce a consistent code style across the project. Initially, we’ll use a minimal configuration to analyze the source code and address potential formatting violations. After that, we’ll gradually update the plugin’s configuration to use customized rules and execute these checks during a specific Maven phase.

2. Getting Started

The Maven Spotless Plugin is a tool that automatically formats and enforces code style standards across various programming languages during the build process. Getting started with Spotless is very easy: all we need to do is specify our preferred coding format in the spotless-maven-plugin configuration.

Let’s start by adding the plugin to our pom.xml and configuring it to use the Google Java Style:

<plugin>
    <groupId>com.diffplug.spotless</groupId>
    <artifactId>spotless-maven-plugin</artifactId>
    <version>2.43.0</version>
    <configuration>
        <java>
            <googleJavaFormat/>
        </java>
    </configuration>
</plugin>

That’s it! We can now run “mvn spotless:check”, and the plugin will automatically scan our Java files and check whether we use the correct formatting. In the console, we’ll see a summary of the files that were scanned and how many of them have failures.

As we can see, if the plugin finds at least one formatting violation, the build will fail. If we scroll down, we’ll see a representation of the detected formatting issues. In this case, our code uses tabs, whereas the Google specification requires indentation blocks of two spaces.

Furthermore, Spotless will automatically fix all the violations when we execute the command “mvn spotless:apply”. Let’s use the command to correct the violations and compare the source code with the remote branch.

As we can see, our source code was formatted correctly, and it now complies with the Google Java standard.
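
For instance, a class that was previously indented with tabs would now look like this hypothetical, reformatted snippet:

// previously indented with tabs; after spotless:apply the file uses
// the two-space indentation required by the Google Java Style
class Greeter {
  String greet(String name) {
    return "Hello, " + name;
  }
}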

3. Custom Formatting Rules

So far, we’ve used a minimal configuration of the Spotless plugin to verify that our codebase is formatted consistently. However, we can also configure our own formatting rules using an Eclipse Formatter Profile. This profile is an XML file with a standardized structure that is compatible with a wide range of IDEs and formatting plugins.

Let’s add one of these files to the root folder of our project and call it baeldung-style.xml:

<profiles version="21">
    <profile kind="CodeFormatterProfile" name="baeldung-style" version="21">
        <setting id="org.eclipse.jdt.core.formatter.tabulation.char" value="space"/>
        <setting id="org.eclipse.jdt.core.formatter.use_tabs_only_for_leading_indentations" value="true"/>
        <setting id="org.eclipse.jdt.core.formatter.indentation.size" value="4"/>
        <!--   other settings...   -->
        <setting id="org.eclipse.jdt.core.formatter.enabling_tag" value="@formatter:on"/>
        <setting id="org.eclipse.jdt.core.formatter.disabling_tag" value="@formatter:off"/>
    </profile>
</profiles>

Now, let’s update the pom.xml and add our custom formatter profile. We’ll remove the <googleJavaFormat/> step and replace it with an <eclipse> formatter that uses the settings from our custom XML file:

<plugin>
    <groupId>com.diffplug.spotless</groupId>
    <artifactId>spotless-maven-plugin</artifactId>
    <version>2.43.0</version>
    <configuration>
        <java>
            <eclipse>
                <file>${project.basedir}/baeldung-style.xml</file>
            </eclipse>
        </java>
    </configuration>
</plugin>

That’s it! Now we can re-run “mvn spotless:check” to ensure the project follows our custom conventions.
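
For instance, since baeldung-style.xml configures space-only indentation with a size of four, a compliant class would look roughly like this (hypothetical example):

class ReportPrinter {
    void printReport(String report) {
        // four-space, space-only indentation, as configured in baeldung-style.xml
        System.out.println(report);
    }
}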

4. Additional Steps

Apart from verifying if the code is formatted properly, we can also use Spotless to perform static analysis and apply small improvements. After we specify the preferred code style in the plugin configuration, we can follow up with additional steps:

<java>
    <eclipse>
        <file>${project.basedir}/baeldung-style.xml</file>
    </eclipse>
    <licenseHeader>
        <content>/* (C)$YEAR */</content>
    </licenseHeader>
    <importOrder/>
    <removeUnusedImports />
    <formatAnnotations />
</java>

When we execute spotless:apply, each “step” will verify and enforce a specific rule:

  • <licenseHeader> checks if the files contain the correct copyright header
  • <importOrder> and <removeUnusedImports> make sure the imports are relevant and follow a consistent order
  • <formatAnnotations> ensures that type annotations are positioned on the same line as the fields they describe

If we run the command, we can expect all these changes to be applied automatically.
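
As a rough sketch, assuming a hypothetical package and class name, a processed file might end up looking like this:

/* (C)2024 */
package com.baeldung.spotless;

import java.util.ArrayList;
import java.util.List;

public class AdditionalStepsExample {
    // the <licenseHeader> step inserted the header above ($YEAR resolves to the current year),
    // while <removeUnusedImports> and <importOrder> cleaned up and sorted the imports
    private final List<String> names = new ArrayList<>();
}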

5. Binding to a Maven Phase

Until now, we’ve only used the Spotless plugin by directly triggering the Maven goals “spotless:check” and “spotless:apply”. However, we can also bind these goals to a specific Maven phase. Phases are predefined stages in the Maven build lifecycle that execute tasks in a particular order to automate the build process.

For example, the “package” phase bundles the compiled code and other resources into a distributable format, such as a JAR or WAR file. Let’s use this phase to integrate with the Spotless plugin and execute “spotless:check”:

<plugin>
    <groupId>com.diffplug.spotless</groupId>
    <artifactId>spotless-maven-plugin</artifactId>
    <version>2.43.0</version>
    
    <configuration>
        <java>
            <!--  formatter and additional steps  -->
        </java>
    </configuration>
    
    <executions>
        <execution>
            <goals>
                <goal>check</goal>
            </goals>
            <phase>package</phase>
        </execution>
    </executions>
</plugin>

Consequently, Spotless’ check goal will be automatically executed during Maven’s package phase. In other words, we can enforce a consistent code style by causing the Maven build to fail if the source code does not adhere to the specified format and style guidelines.

6. Conclusion

In this article, we learned about the Maven Spotless Plugin, initially using it to enforce the Google Java Style across our project. Then, we transitioned to an Eclipse Formatter Profile with our own formatting rules.

Apart from formatting, we explored other configurable steps that can improve our code, and perform minor refactorings. Lastly, we discussed binding Spotless goals to specific Maven phases to ensure a consistent code style is enforced throughout the build process.

As always, the code examples can be found over on GitHub.

       