
Filter a List by Any Matching Field


1. Overview

In many real-world Java applications, we need to filter a List of objects by checking whether any of their fields match a given String. In other words, we want to search for a String and keep the objects that contain it in any of their properties.

In this tutorial, we’ll walk through different approaches to filtering a List by any matching fields in Java.

2. Introduction to the Problem

As usual, let’s understand the problem through examples.

Let’s say we have a Book class with fields like title, tags, intro, and pages:

class Book {
    private String title;
    private List<String> tags;
    private String intro;
    private int pages;
    
    public Book(String title, List<String> tags, String intro, int pages) {
        this.title = title;
        this.tags = tags;
        this.intro = intro;
        this.pages = pages;
    }
    // ... getter and setter methods are omitted
}

Next, let’s create four Book instances using the defined constructor and put them into a List<Book>:

static final Book JAVA = new Book(
  "The Art of Java Programming",
  List.of("Tech", "Java"),
  "Java is a powerful programming language.",
  400);
 
static final Book KOTLIN = new Book(
  "Let's Dive Into Kotlin Codes",
  List.of("Tech", "Java", "Kotlin"),
  "It is big fun learning how to write Kotlin codes.",
  300);
 
static final Book PYTHON = new Book(
  "Python Tricks You Should Know",
  List.of("Tech", "Python"),
  "The path of being a Python expert.",
  200);
 
static final Book GUITAR = new Book(
  "How to Play a Guitar",
  List.of("Art", "Music"),
  "Let's learn how to play a guitar.",
  100);
 
static final List<Book> BOOKS = List.of(JAVA, KOTLIN, PYTHON, GUITAR);

Now, we want to perform a filter operation on BOOKS to find all Book objects that contain a keyword String in the title, tags, or intro. In other words, we would like to execute a full-text search on BOOKS.

For example, if we want to search for “Java”, the JAVA and KOTLIN instances should be in the result. This is because JAVA.title contains “Java” and KOTLIN.tags contains “Java”. 

Similarly, if we search for “Art”, we expect JAVA and GUITAR to be found, since JAVA.title and GUITAR.tags contain the word “Art”. When “Let’s” is the keyword, KOTLIN and GUITAR should be in the result, as KOTLIN.title and GUITAR.intro contain the keyword.

Next, let’s explore how to solve this problem and use these three keyword examples to check our solutions.

For simplicity, we assume that none of the Book properties are null, and we’ll leverage unit test assertions to verify whether our solutions work as expected.

Next, let’s dive into the code.

3. Using the Stream.filter() Method

The Stream API provides the convenient filter() method, which allows us to easily filter objects in a Stream using a lambda expression.

Next, let’s solve the full-text search problem using this approach:

List<Book> fullTextSearchByLogicalOr(List<Book> books, String keyword) {
    return books.stream()
      .filter(book -> book.getTitle().contains(keyword) 
        || book.getIntro().contains(keyword) 
        || book.getTags().stream().anyMatch(tag -> tag.contains(keyword)))
      .toList();
}

As we can see, the above implementation is pretty straightforward. In the lambda expression that we pass to filter(), we check whether any property contains the keyword.

It’s worth mentioning that as Book.tags is a List<String>, we leverage Stream.anyMatch() to check if any tag in the List contains the keyword.

Next, let’s test whether this approach works correctly:

List<Book> byJava = fullTextSearchByLogicalOr(BOOKS, "Java");
assertThat(byJava).containsExactlyInAnyOrder(JAVA, KOTLIN);
 
List<Book> byArt = fullTextSearchByLogicalOr(BOOKS, "Art");
assertThat(byArt).containsExactlyInAnyOrder(JAVA, GUITAR);
 
List<Book> byLets = fullTextSearchByLogicalOr(BOOKS, "Let's");
assertThat(byLets).containsExactlyInAnyOrder(KOTLIN, GUITAR);

The test passes if we give it a run.

In this example, we only need to check three properties of the Book class. However, in a real application, a full-text search might need to check a dozen properties of a class. In that case, the lambda expression would be quite long and could make the Stream pipeline difficult to read and maintain.

Of course, we can extract the lambda expression into a separate method to solve this, as sketched below. Alternatively, we can create a function that generates a String representation for the full-text search.
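
For the first option, a minimal sketch might look like this (the containsKeyword() helper name is ours, not part of the original example):

boolean containsKeyword(Book book, String keyword) {
    // Same matching logic as before, extracted so the Stream pipeline stays short
    return book.getTitle().contains(keyword)
      || book.getIntro().contains(keyword)
      || book.getTags().stream().anyMatch(tag -> tag.contains(keyword));
}

List<Book> fullTextSearchByExtractedPredicate(List<Book> books, String keyword) {
    return books.stream()
      .filter(book -> containsKeyword(book, keyword))
      .toList();
}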

Next, let’s take a closer look at the second option.

4. Creating a String Representation For Filtering

We know toString() returns a String representation of an object. Similarly, we can create a method to provide an object’s String representation for full-text search:

class Book {
    // ... unchanged codes omitted
    public String strForFiltering() {
        String tagsStr = String.join("\n", tags);
        return String.join("\n", title, intro, tagsStr);
    }
}

As the code above shows, the strForFiltering() method joins all String values required for the full-text search into a single linebreak-separated String. If we take KOTLIN as an example, this method returns the following String:

String expected = """
  Let's Dive Into Kotlin Codes
  It is big fun learning how to write Kotlin codes.
  Tech
  Java
  Kotlin""";
assertThat(KOTLIN.strForFiltering()).isEqualTo(expected);

In this example, we use a Java text block to present the multiline String.

Then, a full-text search becomes an easy task: we just check whether the result of book.strForFiltering() contains the keyword:

List<Book> fullTextSearchByStrForFiltering(List<Book> books, String keyword) {
    return books.stream()
      .filter(book -> book.strForFiltering().contains(keyword))
      .toList();
}

Next, let’s check if this solution works as expected:

List<Book> byJava = fullTextSearchByStrForFiltering(BOOKS, "Java");
assertThat(byJava).containsExactlyInAnyOrder(JAVA, KOTLIN);
 
List<Book> byArt = fullTextSearchByStrForFiltering(BOOKS, "Art");
assertThat(byArt).containsExactlyInAnyOrder(JAVA, GUITAR);
 
List<Book> byLets = fullTextSearchByStrForFiltering(BOOKS, "Let's");
assertThat(byLets).containsExactlyInAnyOrder(KOTLIN, GUITAR);

The test passes. Therefore, this approach does the job.

5. Creating a General Full-Text Search Method

In this section, let’s try to create a general method to perform a full-text search on any object:

boolean fullTextSearchOnObject(Object obj, String keyword, String... excludedFields) {
    Field[] fields = obj.getClass().getDeclaredFields();
    for (Field field : fields) {
        if (Arrays.stream(excludedFields).noneMatch(exceptName -> exceptName.equals(field.getName()))) {
            field.setAccessible(true);
            try {
                Object value = field.get(obj);
                if (value != null) {
                    if (value.toString().contains(keyword)) {
                        return true;
                    }
                    if (!field.getType().isPrimitive() && !(value instanceof String) 
                      && fullTextSearchOnObject(value, keyword, excludedFields)) {
                        return true;
                    }
                }
            } catch (InaccessibleObjectException | IllegalAccessException ignored) {
                //ignore reflection exceptions
            }
        }
    }
    return false;
}

The fullTextSearchOnObject() method accepts three parameters: the object on which we want to perform the search, the keyword, and the names of the excluded fields.

The implementation uses reflection to retrieve all fields of the object. Then, we loop through the fields, skip the ones listed in excludedFields using Stream.noneMatch(), and obtain each field’s value with field.get(obj). Since we aim to perform a String-based search, we convert the value to a String using toString() and check if it contains the search term.

Our object may contain nested objects. Therefore, we recursively check the fields of nested objects: if a field holds an object other than a primitive or a String, we call fullTextSearchOnObject() on its value, enabling us to search through deeply nested structures.

Now, we can make use of fullTextSearchOnObject() to create a method to full-text filter a List of Book objects:

List<Book> fullTextSearchByReflection(List<Book> books, String keyword, String... excludeFields) {
    return books.stream().filter(book -> fullTextSearchOnObject(book, keyword, excludeFields)).toList();
}

Next, let’s run the same test to verify if this approach works as expected:

List<Book> byJava = fullTextSearchByReflection(BOOKS, "Java", "pages");
assertThat(byJava).containsExactlyInAnyOrder(JAVA, KOTLIN);
 
List<Book> byArt = fullTextSearchByReflection(BOOKS, "Art", "pages");
assertThat(byArt).containsExactlyInAnyOrder(JAVA, GUITAR);
 
List<Book> byLets = fullTextSearchByReflection(BOOKS, "Let's", "pages");
assertThat(byLets).containsExactlyInAnyOrder(KOTLIN, GUITAR);

As we can see, we passed “pages” as the excluded field in the test above. If required, we can conveniently extend the excluded fields for a custom full-text search:

List<Book> byArtExcludeTag = fullTextSearchByReflection(BOOKS, "Art", "tags", "pages");
assertThat(byArtExcludeTag).containsExactlyInAnyOrder(JAVA);

This example shows how to perform a full-text search on the title and intro fields only. As GUITAR contains the search keyword “Art” only in tags, it gets filtered out.

Using reflection and recursion, we can implement a full-text search on a Java object that checks all fields, including nested fields, for a given String keyword. This approach allows us to dynamically search through an object’s fields without explicitly knowing the structure of the class.

6. Conclusion

In this article, we’ve explored different solutions for filtering a List by any field matching a String in Java.

These techniques will help us write cleaner, more maintainable code while providing a powerful way to search and filter Java objects.

As always, the complete source code for the examples is available over on GitHub.

       

Configure CORS Policy for Spring Cloud Gateway


1. Overview

Cross-Origin Resource Sharing (CORS) is a security mechanism for browser-based applications that allows a web page from one domain to access another domain’s resources. Browsers enforce the same-origin policy, which restricts such cross-origin access by default.

Spring provides first-class support for easily configuring CORS in any Spring, Spring Boot web, or Spring Cloud Gateway application.

In this article, we’ll learn how to set up a Spring Cloud Gateway application with a backend API. Also, we’ll access the gateway API and debug a common CORS-related error.

Then, we’ll configure the Spring gateway API with Spring CORS support.

2. Implement the API Gateway With Spring Cloud Gateway

Let’s imagine we need to build a Spring Cloud gateway service to expose a backend REST API.

2.1. Implement the Backend REST API

Our backend application will have an endpoint to return User data.

First, let’s model the User class:

public class User {
    private long id;
    private String name;
    //standard getters and setters
}

Next, we’ll implement the UserController with the getUser endpoint:

@GetMapping(path = "/user/{id}")
public User getUser(@PathVariable("id") long userId) {
    LOGGER.info("Getting user details for user Id {}", userId);
    return userMap.get(userId);
}

2.2. Implement the Spring Cloud Gateway Service

Now let’s implement an API gateway service using the Spring Cloud Gateway support.

First, we’ll include the spring-cloud-starter-gateway dependency:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-gateway</artifactId>
    <version>4.1.5</version>
</dependency>

2.3. Configure the API Routing

We can expose the User service endpoint using the Spring Cloud Gateway routing option.

We’ll configure the predicates with the /user path and set the uri property with the backend URI http://<hostname>:<port>:

spring:
  cloud:
    gateway:
      routes:
        -  id: user_service_route
           predicates:
             - Path=/user/**
           uri: http://localhost:8081

3. Test the Spring Gateway API

Now we’ll test the Spring gateway service with a cURL command in the terminal and from a browser window.

3.1. Testing the Gateway API With cURL

Let’s run both services, User and Gateway:

$ java -jar ./spring-backend-service/target/spring-backend-service-1.0.0-SNAPSHOT.jar
$ java -jar ./spring-cloud-gateway-service/target/spring-cloud-gateway-service-1.0.0-SNAPSHOT.jar

Now, let’s access the /user endpoint using the gateway service URL:

$ curl -v 'http://localhost:8080/user/100001'
< HTTP/1.1 200 OK
< Content-Type: application/json
{"id":100001,"name":"User1"}

As tested above, we’re able to get the backend API response.

3.2. Testing With the Browser Console

To experiment in a browser environment, we’ll open a frontend application, e.g., https://www.baeldung.com, and use the browser’s developer tools.

We’ll use the JavaScript fetch function to call the API from a different origin URL:

fetch("http://localhost:8080/user/100001")

When we run this call, the browser blocks the request and reports a CORS error in the console.

We’ll further debug the API request from the browser’s network tab:

OPTIONS /user/100001 HTTP/1.1
Access-Control-Request-Method: GET
Access-Control-Request-Private-Network: true
Connection: keep-alive
Host: localhost:8080
Origin: https://www.baeldung.com

Also, let’s verify the API response:

HTTP/1.1 403 Forbidden
...
content-length: 0

The above request failed because the web page URL’s scheme, domain, and port are different from the gateway API’s. The browser expects the server to include the Access-Control-Allow-Origin header, but instead gets an error.

By default, Spring returns a 403 Forbidden error for the preflight OPTIONS request because the origin is different.

Next, we’ll fix the error by using the Spring Cloud gateway’s supported CORS configuration.

4. Configure CORS Policy in API Gateway

We’ll now configure the CORS policy to allow a different origin to access the gateway API.

Let’s configure the CORS access policy using the globalcors properties:

spring:
  cloud:
    gateway:
      globalcors:
        corsConfigurations:
          '[/**]':
            allowedOrigins: "https://www.baeldung.com"
            allowedMethods:
              - GET
            allowedHeaders: "*"

We should note that the globalcors properties apply the CORS policy to all routed endpoints.

Alternatively, we can configure the CORS policy per API route:

spring:
  cloud:
    gateway:
      routes:
        -  id: user_service_route
           ....
           metadata:
             cors:
               allowedOrigins: 'https://www.baeldung.com,http://localhost:3000'
               allowedMethods:
                 - GET
                 - POST
               allowedHeaders: '*'

The allowedOrigins field can be configured with a specific domain name, comma-separated domain names, or the * wildcard character to allow any cross-origin request. Similarly, the allowedMethods and allowedHeaders properties can be configured with specific values or with the * wildcard.

Also, we can use the alternative allowedOriginPatterns configuration to get more flexibility with cross-origin pattern matching:

allowedOriginPatterns:
  - https://*.example1.com
  - https://www.example2.com:[8080,8081]
  - https://www.example3.com:[*]

In contrast to the allowedOrigins property, allowedOriginPatterns allows the * wildcard character in any part of the URL, including the scheme, domain name, and port number. In addition, we can specify comma-separated port numbers within brackets. However, the allowedOriginPatterns property doesn’t support regular expressions.

Now, let’s re-verify the user API in the browser’s console window.

This time, we get an HTTP 200 response from the API gateway.

Also, let’s confirm the Access-Control-Allow-Origin header in the OPTIONS API response:

HTTP/1.1 200 OK
...
Access-Control-Allow-Origin: https://www.baeldung.com
Access-Control-Allow-Methods: GET
content-length: 0

We should note that it’s recommended to configure a finite set of allowed origins to provide the highest level of security.

By default, the CORS specification doesn’t allow cookies or CSRF tokens in cross-origin requests. However, we can enable them by setting the allowCredentials property to true. Note that allowCredentials doesn’t work together with the * wildcard in the allowedOrigins and allowedHeaders properties.
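
For instance, a minimal sketch of such a credentials-enabled configuration might look like the following (origins and headers listed explicitly instead of the wildcard; the values are illustrative):

spring:
  cloud:
    gateway:
      globalcors:
        corsConfigurations:
          '[/**]':
            allowedOrigins: "https://www.baeldung.com"
            allowedMethods:
              - GET
            allowedHeaders:
              - Content-Type
            allowCredentials: true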

5. Conclusion

In this article, we’ve learned how to implement a gateway service using Spring Cloud Gateway. We’ve also encountered a common CORS error while testing the API from a browser console.

Finally, we’ve demonstrated how to fix the CORS error by configuring the application with the allowedOrigins and allowedMethods properties.

As always, the example code can be found over on GitHub.

       

Mock Nested Method Calls Using Mockito


1. Overview

In this tutorial, we’ll see how to mock nested method calls using Mockito stubs, specifically deep stubs. To learn more about testing with Mockito, check out our comprehensive Mockito series.

2. Explaining the Problem

In complex code, especially legacy code, it’s sometimes hard to initialize all the objects needed for a unit test, and it’s easy to drag in many dependencies we don’t need in our tests. On the other hand, mocking those objects can lead to null pointer exceptions.

Let’s see a code example where we’ll explore the limitations of those two approaches.

For both, we’ll need some classes to test. First, let’s add a NewsArticle class:

public class NewsArticle {
    String name;
    String link;
    public NewsArticle(String name, String link) {
        this.name = name;
        this.link = link;
    }
// Usual getters and setters
}

Also, we’ll need a Reporter class:

public class Reporter {
    String name;
    NewsArticle latestArticle;
    public Reporter(String name, NewsArticle latestArticle) {
        this.name = name;
        this.latestArticle = latestArticle;
    }
// Usual getters and setters
}

And finally, let’s create a NewsAgency class:

public class NewsAgency {
    List<Reporter> reporters;
    public NewsAgency(List<Reporter> reporters) {
        this.reporters = reporters;
    }
    public List<String> getLatestArticlesNames(){
        List<String> results = new ArrayList<>();
        for(Reporter reporter : this.reporters){
            results.add(reporter.getLatestArticle().getName());
        }
        return results;
    }
}

Understanding the relationship between them is important. First, a NewsArticle is reported by a Reporter. And a Reporter works for a NewsAgency.

NewsAgency contains a method getLatestArticlesNames() that returns the names of the latest articles written by all the reporters working for the NewsAgency. This method will be the subject of our unit test.

Let’s take a first stab at this unit test by initializing all the objects.

3. Initializing Objects

In our test, as a first approach, we’ll initialize all the objects:

public class NewsAgencyTest {
    @Test
    void getAllArticlesTest(){
        String title1 = "new study reveals the dimension where the single socks disappear";
        NewsArticle article1 = new NewsArticle(title1,"link1");
        Reporter reporter1 = new Reporter("Tom", article1);
        String title2 = "secret meeting of cats union against vacuum cleaners";
        NewsArticle article2 = new NewsArticle(title2,"link2");
        Reporter reporter2 = new Reporter("Maria", article2);
        List<String> expectedResults = List.of(title1, title2);
        NewsAgency newsAgency = new NewsAgency(List.of(reporter1, reporter2));
        List<String> actualResults = newsAgency.getLatestArticlesNames();
        assertEquals(expectedResults, actualResults);
    }
}

We can see how the initialization of all objects can get tedious when our objects become more and more complex. As mocks serve exactly this purpose, we’ll use them to simplify and avoid cumbersome initializations.

4. Mocking Objects

Let’s use mocks to test the same getLatestArticlesNames() method:

    @Test
    void getAllArticlesTestWithMocks(){
        Reporter mockReporter1 = mock(Reporter.class);
        String title1 = "cow flying in London, royal guard still did not move";
        when(mockReporter1.getLatestArticle().getName()).thenReturn(title1);
        Reporter mockReporter2 = mock(Reporter.class);
        String title2 = "drunk man accidentally runs for mayor and wins";
        when(mockReporter2.getLatestArticle().getName()).thenReturn(title2);
        NewsAgency newsAgency = new NewsAgency(List.of(mockReporter1, mockReporter2));
        List<String> expectedResults = List.of(title1, title2);
        assertEquals(newsAgency.getLatestArticlesNames(), expectedResults);
    }

If we try to execute this test as is, we’ll receive a null pointer exception. The root cause is that the call to mockReporter1.getLatestArticle() returns null, which is expected behavior: an unstubbed mock method returns a default value, and for object return types that default is null.

5. Using Deep Stubs

Deep stubs are an easy way to mock nested calls: they let us keep using mocks while stubbing only the calls we actually need in our tests.

Let’s use them in our example and rewrite the unit test using mocks and deep stubs:

    @Test
    void getAllArticlesTestWithMocksAndDeepStubs(){
        Reporter mockReporter1 = mock(Reporter.class, Mockito.RETURNS_DEEP_STUBS);
        String title1 = "cow flying in London, royal guard still did not move";
        when(mockReporter1.getLatestArticle().getName()).thenReturn(title1);
        Reporter mockReporter2 = mock(Reporter.class, Mockito.RETURNS_DEEP_STUBS);
        String title2 = "drunk man accidentally runs for mayor and wins";
        when(mockReporter2.getLatestArticle().getName()).thenReturn(title2);
        NewsAgency newsAgency = new NewsAgency(List.of(mockReporter1, mockReporter2));
        List<String> expectedResults = List.of(title1, title2);
        assertEquals(newsAgency.getLatestArticlesNames(), expectedResults);
    }

Adding Mockito.RETURNS_DEEP_STUBS allowed us to access all nested methods and objects. In our code example, we did not need to mock multiple levels of objects in mockReporter1 to access mockReporter1.getLatestArticle().getName().

6. Conclusion

In this article, we learned how to use deep stubs to solve the issue of nested method calls with Mockito.

We should keep in mind that the need for them is often a symptom of a violation of the Law of Demeter, a guideline in object-oriented programming that favors low coupling and discourages chains of nested method calls. Therefore, deep stubs should be reserved for legacy code; in clean, modern code we should favor refactoring away the nested calls.
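
As a hedged sketch of such a refactoring (not part of the article’s code), Reporter could expose just the article name so that NewsAgency, and its tests, never reach through the nested NewsArticle:

public class Reporter {
    // ... existing fields and constructor

    // Expose only what callers need instead of the whole nested NewsArticle
    public String getLatestArticleName() {
        return latestArticle.getName();
    }
}

With this in place, getLatestArticlesNames() could call reporter.getLatestArticleName(), and a plain mock stubbed with when(mockReporter1.getLatestArticleName()).thenReturn(title1) would be enough, with no deep stubs required.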

The complete source code for the examples is available over on GitHub.

       

Get a List of IP Connected in Same Network (Subnet) using Java


1. Introduction

In networking, retrieving the list of IP addresses connected within the same network (subnet) is essential for tasks such as network monitoring and device administration. Additionally, this helps identify active devices in a specific IP range and ensures they are reachable.

In this tutorial, we’ll explore various methods in Java to scan and retrieve a list of IP addresses within the same subnet. We’ll cover solutions using Java’s InetAddress class and enhancements using Java 8 Stream API. Finally, we’ll demonstrate more advanced subnet handling with the Apache Commons Net library.

2. Understanding IP Addresses and Subnets

An IP address uniquely identifies devices on a network, while a subnet groups a range of IP addresses together. Subnets allow networks to be divided into smaller, more manageable blocks, helping to improve performance and security.

A subnet is typically represented by an IP address and a subnet mask (for example, 192.168.1.0/24). The subnet mask defines which portion of the IP address represents the network and which portion identifies individual hosts.

For instance, the subnet 192.168.1.0/24 covers all addresses from 192.168.1.1 to 192.168.1.254. In this case, the first three octets (192.168.1) represent the network, and the last octet can be any number from 1 to 254, identifying the individual hosts.

The subnet mask 255.255.255.0 indicates that the first three sections of the IP address represent the network, while the last section varies for the host.
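
To make the mask’s role concrete, here’s a small illustrative helper (ours, not from the article’s code) that derives the network address by ANDing each address byte with the corresponding mask byte:

// For 192.168.1.42 with mask 255.255.255.0, this returns "192.168.1.0"
static String networkAddress(byte[] ip, byte[] mask) {
    StringBuilder network = new StringBuilder();
    for (int i = 0; i < ip.length; i++) {
        network.append((ip[i] & mask[i]) & 0xFF);
        if (i < ip.length - 1) {
            network.append('.');
        }
    }
    return network.toString();
}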

Let’s see how to determine the subnet dynamically in Java:

private String getSubnet() throws UnknownHostException {
    InetAddress localHost = InetAddress.getLocalHost();
    byte[] ipAddr = localHost.getAddress();
    return String.format("%d.%d.%d", (ipAddr[0] & 0xFF), (ipAddr[1] & 0xFF), (ipAddr[2] & 0xFF));
}

This method retrieves the local machine’s IP address, and we extract the subnet from the first three octets. The code dynamically calculates the subnet based on the environment, allowing it to adapt to different network configurations.

3. Using Java’s InetAddress Class

One of the simplest ways to check the reachability of devices in a network is by using Java’s InetAddress class. Furthermore, this class allows us to verify whether a device at a specific IP address is reachable within a given timeout period.

Once we determine the subnet dynamically, we can loop through the possible IP addresses within the subnet by appending numbers from 1 to 254 to the base address:

@Test
public void givenSubnet_whenScanningForDevices_thenReturnConnectedIPs() throws Exception {
    String subnet = getSubnet();
    List<String> connectedIPs = new ArrayList<>();
    for (int i = 1; i <= 254; i++) {
        String ip = subnet + "." + i;
        if (InetAddress.getByName(ip).isReachable(100)) {
            connectedIPs.add(ip);
        }
    }
    assertFalse(connectedIPs.isEmpty());
}

For each IP, we use the InetAddress.getByName() method to create an InetAddress object. Then, we check if it’s reachable using the isReachable() method. We add the IP address to the list if the device is reachable.

The list of IP addresses will vary depending on the devices currently connected to the same network.

3.1. Streamlining Subnet Scanning with Java 8 Stream API

Java 8 introduced the Stream API, which allows us to process collections and arrays in a concise and functional manner. Moreover, we can use this feature to perform subnet scanning in a streamlined way:

@Test
public void givenSubnet_whenUsingStream_thenReturnConnectedIPs() throws UnknownHostException {
    String subnet = getSubnet();
    List<String> connectedIPs = IntStream.rangeClosed(1, 254)
            .mapToObj(i -> subnet + "." + i)
            .filter(ip -> {
                try {
                    return InetAddress.getByName(ip).isReachable(100);
                } catch (Exception e) {
                    return false;
                }
            })
            .toList();
    assertFalse(connectedIPs.isEmpty());
}

Here, we use IntStream.rangeClosed(1, 254) to generate the range of possible IP addresses. We then use mapToObj() to append the generated number to the dynamically retrieved subnet and filter() to check if each IP is reachable.

While this doesn’t introduce new networking capabilities, it demonstrates how we can organize and streamline the solution using the Stream API, a powerful addition introduced in Java 8.

4. Advanced Subnet Handling with Apache Commons Net Library

For more advanced subnet management, we can use the Apache Commons Net library, which provides utilities for handling subnets. One use case is checking for open ports (such as port 80) on devices within a subnet using TelnetClient, a subclass of SocketClient provided by the library:

@Test
public void givenSubnet_whenCheckingForOpenPorts_thenReturnDevicesWithOpenPort() throws UnknownHostException {
    SubnetUtils utils = new SubnetUtils(getSubnet() + ".0/24");
    int port = 80;
    List<String> devicesWithOpenPort = Arrays.stream(utils.getInfo().getAllAddresses())
            .filter(ip -> {
                TelnetClient telnetClient = new TelnetClient();
                try {
                    telnetClient.setConnectTimeout(100);
                    telnetClient.connect(ip, port);
                    return telnetClient.isConnected();
                } catch (Exception e) {
                    return false;
                } finally {
                    try {
                        if (telnetClient.isConnected()) {
                            telnetClient.disconnect();
                        }
                    } catch (IOException ex) {
                        System.err.println(ex.getMessage());
                    }
                }
            })
            .toList();
    assertFalse(devicesWithOpenPort.isEmpty());
}

In this example, SubnetUtils generates all valid IP addresses within the subnet (such as 192.168.1.0/24). For each IP address, we attempt to connect to port 80 using the TelnetClient.connect() method. If the connection succeeds, we add the IP address to the list of devices with the open port. We then close the connection in the finally block using telnetClient.disconnect() to ensure proper resource management.

5. Conclusion

In this tutorial, we’ve explored different ways to scan and retrieve a list of IP addresses connected within the same subnet using Java.

We used the InetAddress class for simplicity, Java 8’s Stream API for concise functional programming, and the Apache Commons Net library for more robust subnet handling and advanced tasks like port scanning.

As always, the complete code samples for this article can be found over on GitHub.

       

Formatting Java Code From the Command Line


1. Overview

Although we can format code within an Integrated Development Environment (IDE), we might want to use command-line formatters to automate the process. This way, we can enforce the same code style even when there are many developers and the codebase is large.

In this tutorial, we’ll discuss formatting Java code from the command line. We’ll run the examples in Linux, but the formatters we’ll discuss are also available in other operating systems like Windows.

2. Sample Code

We’ll use an unformatted version of the simple “Hello World” program:

	public class HelloWorld 
{
 	   public static     void main(   String[]    args   )
	{
System.out.println(   
	"Hello World!")
 ;
    } }

There are several problems with the format of this code:

  • The code isn’t indented properly
  • There are superfluous whitespaces such as static     void
  • The expression starting with System.out.println spans more than one line
  • The closing curly braces of the class and the method are on the same line

Additionally, we prefer a starting curly brace to be attached to the end of a line that starts a class or method, not to be on a new line.

First, let’s test whether we can compile and run the unformatted code:

$ javac HelloWorld.java
$ java HelloWorld
Hello World!

The program compiles and runs as expected.

3. Using astyle

astyle (Artistic Style) is a source code formatter that supports several languages, including Java. The version of astyle we’ll use is 3.6.3. Once we download it, we can use it to format HelloWorld.java:

$ astyle --squeeze-ws --style=java HelloWorld.java
Formatted  /home/baeldung/projects/formatter/HelloWorld.java

The --squeeze-ws option removes superfluous whitespaces. The --style=java option specifies using attached braces. Finally, we passed the input file, HelloWorld.java. We can pass multiple files. It can also process directories recursively.

Let’s check the content of the formatted HelloWorld.java:

$ cat HelloWorld.java
public class HelloWorld {
    public static void main( String[] args ) {
        System.out.println(
            "Hello World!")
        ;
    }
}

Now, the source code is formatted with proper indentation. astyle uses a 4-space indentation by default. There are no superfluous spaces. Each closing curly brace is on a new line.

However, the expression starting with System.out.println still occupies three lines.

astyle provides many more options that can customize the formatting of source code both in Java and other languages.

4. Using google-java-format

google-java-format is another option for formatting Java code from the command line. It formats source code using Google Java Style. We can also use it as a plugin in IDEs like IntelliJ and Eclipse.

The version of google-java-format we’ll use is 1.24.0. Once we download the corresponding jar file, we can run it using java:

$ java -jar ./google-java-format-1.24.0-all-deps.jar -r HelloWorld.java

The -r option specifies replacing the input file with the formatted version. Otherwise, google-java-format sends the output to stdout. We passed the input file to be formatted, HelloWorld.java, after the -r option. It’s possible to format more than one file. It can also process files in directories.

Let’s check the content of the formatted HelloWorld.java:

$ cat HelloWorld.java
public class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}

As is apparent from the output, google-java-format formatted the source code properly and eliminated all the formatting problems we listed before. Google Java Style uses a 2-space indentation by default.

google-java-format has many other options. We can list them using the -h option.

5. Using idea.sh format

IntelliJ is a popular IDE for developing Java applications. Normally, we use the shell script provided by IntelliJ, idea.sh, to launch the IDE. However, we can also format source code from the command line when we run it together with the format keyword, i.e., idea.sh format:

$ idea.sh format -allowDefaults HelloWorld.java
...
Formatting /home/baeldung/projects/formatter/HelloWorld.java...OK
1 file(s) scanned.
1 file(s) formatted.

We’ve truncated the output since it’s long. The command-line formatter launches an instance of the IntelliJ IDE and formats the source code. Therefore, it fails if we have another instance of IntelliJ running.

The -allowDefaults option uses the default code style settings. We passed the file to be formatted, HelloWorld.java, after this option. It’s possible to format multiple files and files in directories.

Let’s check the content of the formatted HelloWorld.java:

$ cat HelloWorld.java 
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println(
                "Hello World!")
        ;
    }
}

All the formatting problems seem to have gone except the statement starting with System.out.println. It still occupies three lines.

Using the -allowDefaults option is helpful when there’s no code style defined or the file doesn’t belong to a project. However, it’s also possible to specify other code style settings using the -s option. We can use the code style settings in the project directory in this case.

If we use another code style setting using the -s option, we can turn off keeping line breaks by setting the ij_java_keep_line_breaks option to false in the code style configuration file:

ij_java_keep_line_breaks = false

Besides the -allowDefaults and -s options, idea.sh format supports other options as well.

6. Using Eclipse’s Formatter Application

Eclipse is another popular IDE for developing Java applications. Just like IntelliJ, Eclipse also supports formatting source code from the command line:

$ eclipse -noSplash -data /home/baeldung/eclipse-workspace -application org.eclipse.jdt.core.JavaCodeFormatter -config org.eclipse.jdt.core.prefs HelloWorld.java
...
Configuration Name: org.eclipse.jdt.core.prefs
Starting format job ...
Done.

We’ve truncated the output as it’s long. The -noSplash option is for disabling the splash screen. The formatter requires a workspace, so we pass the workspace directory using the -data option. It’s /home/baeldung/eclipse-workspace in our example.

We specify to run the formatter using the -application org.eclipse.jdt.core.JavaCodeFormatter option. Running the formatter fails if we have another instance of Eclipse running.

The last option is the -config option. We use the -config option to specify the configuration file for the formatter application. This file can be created from the code formatter settings of a Java project within the Eclipse IDE. Another alternative is to copy and use an existing configuration file.

We used the org.eclipse.jdt.core.prefs configuration file in our example. We set the org.eclipse.jdt.core.formatter.tabulation.char parameter in the configuration file to space to use spaces instead of tabs for indentation:

org.eclipse.jdt.core.formatter.tabulation.char=space

Finally, we passed the file to be formatted, HelloWorld.java. We can specify multiple source files or directories.

Let’s check the content of the formatted HelloWorld.java:

$ cat HelloWorld.java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

As is apparent from the output, Eclipse’s formatter application formatted the source code properly and eliminated all the formatting problems we listed before.

7. Conclusion

In this article, we discussed formatting Java code from the command line in Linux. We used an unformatted version of the “Hello World” program in the examples.

Firstly, we examined astyle. It not only supports Java but also other languages like C and C++.

Secondly, we examined google-java-format, which formats Java code to comply with Google Java Style. It can be used as a command-line program and a plugin in several IDEs.

Then, we saw that two popular IDEs, IntelliJ and Eclipse, also provide command-line tools to format Java code. We learned that we need to execute the idea.sh format command to run IntelliJ’s formatter. Similarly, we used the org.eclipse.jdt.core.JavaCodeFormatter application together with eclipse to run Eclipse’s code formatter from the command line.

       

Conversion from POJO to Avro Record


1. Introduction

When working with Apache Avro in Java applications, we often need to convert Plain Old Java Objects (POJOs) to their Avro equivalent. While it’s perfectly acceptable to do this manually by setting each field individually, a generic, reflection-based conversion is a better and more maintainable approach.

In this article, we’ll explore how to convert POJOs into Avro objects in a way that’s robust to changes in the original Java class structure.

2. The Straightforward Approach

Let’s say we have a POJO that we want to convert to an Avro object.

Let’s see our POJO:

public class Pojo {
    private final Map<String, String> aMap;
    private final long uid;
    private final long localDateTime;
    public Pojo() {
        aMap = new HashMap<>();
        uid = ThreadLocalRandom.current().nextLong();
        localDateTime = LocalDateTime.now().atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
        aMap.put("mapKey", "mapValue");
    }
    //getters
}

Then, we have the class that does the mapping with its specific method:

public static Record mapPojoToRecordStraightForward(Pojo pojo){
    Schema schema = ReflectData.get().getSchema(pojo.getClass());
    GenericData.Record avroRecord = new GenericData.Record(schema);
    avroRecord.put("uid", pojo.getUid());
    avroRecord.put("localDateTime", pojo.getLocalDateTime());
    avroRecord.put("aMap", pojo.getaMap());
    return avroRecord;
}

As we can see, the straightforward approach involves explicitly setting each field. Just by looking at this solution, we can see the problems that could appear in the future. This solution is brittle and requires updates whenever the POJO structure changes. It is not the best solution.

Note that we can pull the schema from sources other than the POJO itself; for example, we could have also looked it up by schema version.
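
For instance, a minimal sketch of building the schema from a JSON definition instead of the class (the schema string below is illustrative) could use Avro’s Schema.Parser:

// Illustrative schema definition; in practice it could come from a schema registry or an .avsc file
String schemaJson = """
    {"type":"record","name":"Pojo","fields":[
      {"name":"uid","type":"long"},
      {"name":"localDateTime","type":"long"},
      {"name":"aMap","type":{"type":"map","values":"string"}}]}""";
Schema schema = new Schema.Parser().parse(schemaJson);
GenericData.Record avroRecord = new GenericData.Record(schema);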

3. Generic Conversion Using Reflection

Another approach is to use Java Reflection. This method uses reflection and iterates over every field in the POJO. Next, it sets each field in the Avro Record.

Here’s what this would look like:

public static Record mapPojoToRecordReflection(Pojo pojo) throws IllegalAccessException {
    Class<?> pojoClass = pojo.getClass();
    Schema schema = ReflectData.get().getSchema(pojoClass);
    GenericData.Record avroRecord = new GenericData.Record(schema);
    for (Field field : pojoClass.getDeclaredFields()) {
        field.setAccessible(true);
        avroRecord.put(field.getName(), field.get(pojo));
    }

Afterwards, it goes through each superclass and sets those fields in the record:

    // Handle superclass fields
    Class<?> superClass = pojoClass.getSuperclass();
    while (superClass != null && superClass != Object.class) {
        for (Field field : superClass.getDeclaredFields()) {
            field.setAccessible(true);
            avroRecord.put(field.getName(), field.get(pojo));
        }
        superClass = superClass.getSuperclass();
    }
    return avroRecord;
}

This method is straightforward, but it can be slow for large objects or when called frequently.

4. Using Avro’s ReflectDatumWriter Class

Avro has built-in functionality for this scenario: the ReflectDatumWriter class. First, we generate an Avro schema from the POJO class. Next, we create a ReflectDatumWriter to serialize the POJO. Then, we set up a ByteArrayOutputStream and a BinaryEncoder for writing:

public static GenericData.Record mapPojoToRecordReflectDatumWriter(Object pojo) throws IOException {
    Schema schema = ReflectData.get().getSchema(pojo.getClass());
    ReflectDatumWriter<Object> writer = new ReflectDatumWriter<>(schema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);

Next, we serialize the POJO to binary format:

    writer.write(pojo, encoder);
    encoder.flush();

Finally, we create a BinaryDecoder to read the serialized data and use a GenericDatumReader to deserialize the binary data into a GenericData.Record:

    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericDatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
    return reader.read(null, decoder);
}

This method uses Avro’s serialization and deserialization features to convert the POJO to an Avro Record. Note that this version is more efficient for complex objects but adds unnecessary overhead for simple ones.

5. Conclusion

In this article, we’ve explored different ways to convert POJOs into Avro records in Java. We started with a straightforward approach, which, although simple, has drawbacks in maintainability and flexibility. Next, we analyzed a solution using Java reflection, which is more robust and adapts more easily to changes in the class structure. However, it has performance issues for larger objects or frequent calls.

Finally, we looked at a solution that uses Avro’s ReflectDatumWriter class. This class is designed for this specific purpose and is the most appropriate choice for our needs. Furthermore, it benefits from Avro’s internal optimizations and is recommended for complex scenarios.

To sum up, it’s important to evaluate the specific context of our needs. This way, we can choose the approach that best fits our criteria for performance, maintainability, and scalability.

As always, the code is available over on GitHub.

       

EntityManagerFactory vs. SessionFactory


1. Introduction

In this tutorial, we’ll explore the similarities and differences between SessionFactory and EntityManagerFactory.

As their names suggest, both are factory classes used to create objects for database communication. Beyond just creating objects, they offer additional features to help us interact with the database.

In the following sections, we’ll examine the distinctions between these two factory classes so we can have a better understanding of when to use them.

2. What Is EntityManagerFactory?

The Java Persistence API (JPA) serves as a specification for managing persistent data within Java applications. It provides a standard way to interact with relational databases. EntityManager, as a core interface of JPA, is used to interact with the persistence context and manage the lifecycle of entities. It provides lightweight instances with methods for basic CRUD operations.

That said, we notice that we’ll frequently require EntityManager instances, and that is where EntityManagerFactory will help us. EntityManagerFactory is a JPA interface that creates instances of EntityManager, enabling interaction with the persistence context in a thread-safe manner.

2.1. Setup Process

As a first step, let’s begin by defining an entity:

@Entity(name = "persons")
public class Person {
    @Id
    @GeneratedValue(strategy= GenerationType.IDENTITY)
    private Integer id;
    private String name;
    private String email;
    // omitted getters and setters
}

There are several ways to set up the configuration; we’ll cover the approach that uses the persistence.xml file. To begin, we need to create a new file inside the resources/META-INF folder and define the connection details:

<persistence-unit name="com.baeldung.sfvsemf.persistence_unit" transaction-type="RESOURCE_LOCAL">
    <description>Persistence Unit for SessionFactory vs EntityManagerFactory code example</description>
    <class>com.baeldung.sfvsemf.entity.Person</class>
    <exclude-unlisted-classes>true</exclude-unlisted-classes>
    <properties>
        <property name="hibernate.hbm2ddl.auto" value="update"/>
        <property name="hibernate.show_sql" value="true"/>
        <property name="hibernate.generate_statistics" value="false"/>
        <property name="hibernate.dialect" value="org.hibernate.dialect.H2Dialect"/>
        <property name="jakarta.persistence.jdbc.driver" value="org.h2.Driver"/>
        <property name="jakarta.persistence.jdbc.url" value="jdbc:h2:mem:db2;DB_CLOSE_DELAY=-1"/>
        <property name="jakarta.persistence.jdbc.user" value="sa"/>
        <property name="jakarta.persistence.jdbc.password" value=""/>
    </properties>
</persistence-unit>

Note that for simplicity we’re using an H2 in-memory database in this example, but the approach isn’t limited to it. Most relational databases work the same way; we just need to ensure that the correct dialect and driver class are used.

2.2. Usage Example

With the configuration complete, creating EntityManager objects with an EntityManagerFactory is simple.

Using EntityManagerFactory can be risky if not done properly because it’s expensive to create. In other words, instantiating an EntityManagerFactory requires a lot of resources, so it’s recommended to manage it as a singleton.
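
As a hedged sketch of what such a singleton holder might look like (the JpaUtil class name is ours, not from the example project):

public final class JpaUtil {

    // Created once and reused across the application
    private static final EntityManagerFactory ENTITY_MANAGER_FACTORY =
      Persistence.createEntityManagerFactory("com.baeldung.sfvsemf.persistence_unit");

    private JpaUtil() {
    }

    public static EntityManagerFactory getEntityManagerFactory() {
        return ENTITY_MANAGER_FACTORY;
    }
}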

While we won’t go into much detail, we’ll cover the basic operations with a code example. For brevity, we won’t use a singleton here; instead, we’ll simply instantiate the EntityManagerFactory, create an EntityManager, and use it for simple database operations.

Let’s see how that looks in practice:

@Test
public void givenEntityManagerFactory_whenPersistAndFind_thenAssertObjectPersisted() {
    EntityManagerFactory entityManagerFactory =
      Persistence.createEntityManagerFactory("com.baeldung.sfvsemf.persistence_unit");
    EntityManager entityManager = entityManagerFactory.createEntityManager();
    try {
        entityManager.getTransaction().begin();
        Person person = new Person("John", "johndoe@email.com");
        entityManager.persist(person);
        entityManager.getTransaction().commit();
        Person persistedPerson = entityManager.find(Person.class, person.getId());
        assertEquals(person.getName(), persistedPerson.getName());
        assertEquals(person.getEmail(), persistedPerson.getEmail());
    } catch (Exception ex) {
        entityManager.getTransaction().rollback();
    } finally {
        entityManager.close();
        entityManagerFactory.close();
    }
}

3. What Is SessionFactory?

Hibernate, a popular ORM framework, uses SessionFactory as its factory class to create and manage Session instances. Like EntityManagerFactory, SessionFactory offers a thread-safe way to handle database connections and CRUD operations.

In turn, a Session is similar to an EntityManager: it interacts with the database, manages transactions, and handles the complete lifecycle of an entity.

3.1. Setup Process

Before we proceed with the setup, we’ll assume a working knowledge of configuring and using Hibernate, as we won’t give an in-depth explanation in this article. If needed, refer to our Hibernate-related articles to learn more.

To demonstrate how SessionFactory works, we’ll use the same entity class as in the previous example. Hibernate primarily uses the hibernate.cfg.xml file for configuration, so let’s add it to our resources:

<hibernate-configuration>
    <session-factory>
        <property name="hibernate.connection.driver_class">org.h2.Driver</property>
        <property name="hibernate.connection.url">jdbc:h2:mem:db2;DB_CLOSE_DELAY=-1</property>
        <property name="hibernate.connection.username">sa</property>
        <property name="hibernate.connection.password"></property>
        <property name="hibernate.dialect">org.hibernate.dialect.H2Dialect</property>
        <property name="hibernate.hbm2ddl.auto">update</property>
        <property name="hibernate.show_sql">true</property>
        <mapping class="com.baeldung.sfvsemf.entity.Person"/>
    </session-factory>
</hibernate-configuration>

3.2. Usage Example

Before using SessionFactory, it’s important to mention that, similarly to EntityManagerFactory, it’s expensive to create. The general recommendation is to use it as a singleton as well.

Once we’ve configured the SessionFactory, let’s see how to use it to create a Session instance and perform basic database operations:

@Test
void givenSessionFactory_whenPersistAndFind_thenAssertObjectPersisted() {
    SessionFactory sessionFactory = new Configuration().configure().buildSessionFactory();
    Session session = sessionFactory.openSession();
    Transaction transaction = null;
    try {
        transaction = session.beginTransaction();
        Person person = new Person("John", "johndoe@email.com");
        session.persist(person);
        transaction.commit();
        Person persistedPerson = session.find(Person.class, person.getId());
        assertEquals(person.getName(), persistedPerson.getName());
        assertEquals(person.getEmail(), persistedPerson.getEmail());
    } catch (Exception ex) {
        if (transaction != null) {
            transaction.rollback();
        }
    } finally {
        session.close();
        sessionFactory.close();
    }
}

4. Comparing EntityManagerFactory and SessionFactory

Both factories share several similarities and serve the same purpose: their primary role is to provide instances for database communication. Let’s explore their other similarities, their differences, and the scenarios where each one fits best.

4.1. Key Similarities and Differences

In addition to the responsibility of creating session instances, there are other similarities:

  • Both provide supplementary query capabilities with CriteriaBuilder and HibernateCriteriaBuilder.
  • They support transactions, helping us maintain data integrity.
  • We must manage them carefully because they are resource-intensive; it’s best to instantiate them once and reuse the instance.
  • Their thread-safe design allows for concurrent access.

If we look at their declarations, we notice that SessionFactory actually extends EntityManagerFactory. However, there are some key differences between the two. The main one is that SessionFactory is a Hibernate-specific concept, while EntityManagerFactory is a standard JPA interface.

Another important difference is that Hibernate supports second-level caching. It operates at the SessionFactory level, allowing cached data to be shared across all sessions. This feature is specific to Hibernate and isn’t available in the JPA specification.
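
As a hedged illustration, opting an entity into Hibernate’s second-level cache typically involves annotating it and enabling the cache in hibernate.cfg.xml (for instance via the hibernate.cache.use_second_level_cache property and a cache region factory, not shown here); the exact setup depends on the chosen cache provider:

@Entity(name = "persons")
@Cacheable
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // from org.hibernate.annotations
public class Person {
    // same fields, getters, and setters as before
}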

4.2. Use Case Comparison

EntityManagerFactory should be used when building applications that need to stay vendor-independent; in other words, when we want to be able to swap the underlying provider (Hibernate, EclipseLink, OpenJPA, etc.) easily. If we prefer Hibernate or need some of its specific features, such as second-level caching or batch querying, we should use SessionFactory.

In summary, EntityManagerFactory tends to be more flexible and portable across various JPA implementations, whereas SessionFactory is tightly coupled to Hibernate.

For a better understanding, let’s do a side-by-side comparison.

  • Standardization: EntityManagerFactory is part of the JPA specification, while SessionFactory is specific to Hibernate.
  • Caching: EntityManagerFactory supports first-level caching, whereas SessionFactory supports both first- and second-level caching.
  • Query language: EntityManagerFactory uses JPQL (Java Persistence Query Language), while SessionFactory can use both JPQL and HQL (Hibernate Query Language), offering more flexibility in queries.
  • Flexibility: EntityManagerFactory is vendor-agnostic and works with any JPA-compliant framework, while SessionFactory works only with Hibernate.
  • Use case: EntityManagerFactory when flexibility and portability are important, SessionFactory when we want to utilize Hibernate’s features.

5. Conclusion

In this article, we explored the setup and usage of the EntityManagerFactory and SessionFactory. We learned that both serve the essential purpose of creating session objects for database communication. It became clear that SessionFactory is Hibernate’s specific adaptation of the standard EntityManagerFactory.

In cases where we want Hibernate-specific features, SessionFactory is a good choice. However, for a more standardized approach, we should lean towards the JPA specification, which means EntityManagerFactory is the better option.

As always, full code examples are available over on GitHub.

       

Intellij Idea – How to Build Project Automatically


1. Introduction

IntelliJ IDEA is a powerful IDE for Java and other JVM-based languages, offering numerous features to enhance productivity. One key feature allows IntelliJ IDEA to automatically build projects whenever changes occur, eliminating the need for manual compilation. This feature is particularly useful when working on large projects or needing continuous compilation for features like hot reload.

In this tutorial, we’ll explore how to enable automatic project builds in IntelliJ IDEA and integrate it with features like hot reload for faster development.

2. Building Our Project

By default, IntelliJ IDEA doesn’t build projects automatically after every code change; the IDE requires manual intervention to compile the code. Depending on the nature of our development tasks, we may need to compile files many times per session. We can either click the “Build Project” menu option or use hotkeys to compile files:

  • Compile individual files: We can press Ctrl + F9 to compile only the modified files. This is useful when working on specific modules or classes within larger projects.
  • Recompile the entire project: To ensure that all files, including those that depend on the current changes, are compiled, we use the shortcut Ctrl + Shift + F9. This triggers a full project recompilation.

3. Enabling Automatic Project Builds

However, IntelliJ IDEA can automatically build the project whenever changes occur without requiring us to manually trigger a build.

Moreover, enabling this feature can significantly streamline the development process, especially when combined with automated tests, live reloads, or other continuous integration tools.

3.1. Step 1: Open Build Settings

To enable automatic builds, we change the settings in IntelliJ IDEA by navigating to File > Settings.

In the settings dialog, we search for “Compiler” and select Build, Execution, Deployment > Compiler from the search results.

This brings us to the main settings page where we can manage how our project is built.

3.2. Step 2: Enable Automatic Build

Now that we’re in the Compiler settings, we check the Build project automatically option to enable automatic builds whenever changes are detected.

Build project automatically ensures that IntelliJ keeps the compiled output up to date as we edit and save files; checking this single checkbox is all we need to do to enable the feature.

4. Conclusion

Enabling automatic project builds in IntelliJ IDEA is an easy way to reduce manual work during development, especially when used alongside hot reload tools. Moreover, the steps outlined in this article help us set up automatic builds and integrate them with real-time feedback mechanisms provided by IntelliJ’s notifications.

By continuously building the project as we code, we can ensure that our code is always compiled, reducing the risk of errors going unnoticed until later stages of development. This automatic building process helps improve development efficiency and allows for faster feedback cycles, making it easier to maintain code quality.

       

Logstash vs. Kafka


1. Overview

Logstash and Kafka are two powerful tools for managing real-time data streams. While Kafka excels as a distributed event streaming platform, Logstash is a data processing pipeline for ingesting, filtering, and forwarding data to various outputs.

In this tutorial, we’ll examine the difference between Kafka and Logstash in more detail and provide examples of their usage.

2. Requirements

Before learning the difference between Logstash and Kafka, let’s ensure we have a few prerequisites installed and basic knowledge of the technologies involved. First, we need to install Java 8 or later.

Logstash is part of the ELK stack (Elasticsearch, Logstash, Kibana) but can be installed and used independently. For Logstash, we can visit the official Logstash download page and download the appropriate package for our operating system (Linux, macOS, or Windows).

We also need to install Kafka and have confidence in our understanding of the publisher-subscriber model.

3. Logstash

Let’s look at the main Logstash components and a command-line example to process a log file.

3.1. Logstash Components

Logstash is an open-source data processing pipeline within the ELK Stack used to collect, process, and forward data from multiple sources. It’s composed of several core components that work together to collect, transform, and output data:

  1. Inputs: These bring data into Logstash from various sources such as log files, databases, message queues like Kafka, or cloud services. Inputs define where the raw data comes from.
  2. Filters: These components process and transform the data. Common filters include Grok for parsing unstructured data, mutate for modifying fields, and date for timestamp formatting. Filters allow for deep customization and data preparation before sending it to its final destination.
  3. Outputs: After processing, outputs send the data to destinations such as Elasticsearch, databases, message queues, or local files. Logstash supports multiple parallel outputs, making it ideal for distributing data to various endpoints.
  4. Codecs: Codecs encode and decode data streams, such as converting JSON to structured objects or reading plain text. They act as mini-plugins that process the data as it’s being ingested or sent out.
  5. Pipelines: A pipeline is a defined data flow through inputs, filters, and outputs. Pipelines can create complex workflows, enabling data processing in multiple stages.

These components work together to make Logstash a powerful tool for centralizing logs, transforming data, and integrating with various external systems.

3.2. Logstash Example

Let’s give an example of how we process an input file to an output in JSON format. Let’s create an example.log input file in the /tmp directory:

2024-10-12 10:01:15 INFO User login successful
2024-10-12 10:05:32 ERROR Database connection failed
2024-10-12 10:10:45 WARN Disk space running low

We can then run the logstash -e command by providing a configuration:

$ sudo logstash -e '
input { 
  file { 
    path => "/tmp/example.log" 
    start_position => "beginning" 
    sincedb_path => "/dev/null" 
  } 
} 
filter { 
  grok { 
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{GREEDYDATA:message}" }
  } 
  mutate {
    remove_field => ["log", "timestamp", "event", "@timestamp"]
  }
}
output { 
  file {
    path => "/tmp/processed-logs.json"
    codec => json_lines
  }
}'

Let’s explain the different parts of the configuration:

  • The whole chain of commands (input/filter/output) is a pipeline.
  • Extract timestamp, log level, and message fields from the logs with the grok filter.
  • Remove unnecessary info with a mutate filter.
  • Apply JSON format with Codec in the output filter.
  • After the input example.log file is processed, the output will be encoded in JSON format in the processed-logs.json file.

Let’s see an output example:

{"message":["2024-10-12 10:05:32 ERROR Database connection failed","Database connection failed"],"host":{"name":"baeldung"},"@version":"1"}
{"message":["2024-10-12 10:10:45 WARN Disk space running low","Disk space running low"],"host":{"name":"baeldung"},"@version":"1"}
{"message":["2024-10-12 10:01:15 INFO User login successful","User login successful"],"host":{"name":"baeldung"},"@version":"1"}

As we can see, the output file is JSON with additional info, such as the @version, that we can use, for example, to document the change and ensure that any downstream processes (like querying in Elasticsearch) are aware of it to maintain data consistency.

4. Kafka

Let’s look at the main Kafka components and a command-line example of publishing and consuming a message.

4.1. Kafka Components

Apache Kafka is an open-source distributed event streaming platform for building real-time data pipelines and applications.

Let’s look at its main components:

  1. Topics and Partitions: Kafka organizes messages into categories called topics. Each topic is divided into partitions, which allow data to be processed on multiple servers in parallel. For example, in an e-commerce application, we might have separate topics for order data, payment transactions, and user activity logs.
  2. Producers and Consumers: Producers publish data (messages) to Kafka topics, while consumers are applications or services that read and process these messages. Producers push data to Kafka’s distributed brokers, ensuring scalability, while consumers can subscribe to topics and read messages from specific partitions. Kafka guarantees that consumers read messages in order within each partition.
  3. Brokers: Kafka brokers are servers that store and manage topic partitions. Multiple brokers comprise a Kafka cluster, distributing data and ensuring fault tolerance. If one broker fails, other brokers take over the data, providing high availability.
  4. Kafka Streams and Kafka Connect: Kafka Streams is a powerful stream processing library that allows real-time data processing directly from Kafka topics. Thus, it enables applications to process and transform data on the fly, such as calculating real-time analytics or detecting patterns in financial transactions. On the other hand, Kafka Connect simplifies the integration of Kafka with external systems. It provides connectors for integrating databases, cloud services, and other applications.
  5. ZooKeeper and KRaft: Traditionally, Kafka used ZooKeeper for distributed configuration management, including managing broker metadata and leader election for partition replication. With the introduction of KRaft (Kafka Raft), Kafka now supports ZooKeeper-less architectures, but ZooKeeper is still commonly used in many setups.

Together, these components enable Kafka to deliver a scalable, fault-tolerant, distributed messaging platform that can handle massive volumes of streaming data.

4.2. Kafka Example

Let’s create a topic, publish a simple “Hello, World” message, and consume it.

First, let’s create a topic. A topic can be divided into multiple partitions and typically represents one subject of our domain:

$ /bin/kafka-topics.sh \
  --create \
  --topic hello-world \
  --bootstrap-server localhost:9092 \
  --partitions 1 \
  --replication-factor 1

We’ll get a confirmation message for the topic creation:

$ Created topic hello-world.

Let’s now try to send a message to the topic:

$ /bin/kafka-console-producer.sh \
  --topic hello-world \
  --bootstrap-server localhost:9092 \
  <<< "Hello, World!"

Now, we can consume our messages:

$ /bin/kafka-console-consumer.sh \
  --topic hello-world \
  --from-beginning \
  --bootstrap-server localhost:9092

By consuming, we’ll get the messages stored in Kafka’s log for that specific topic:

Hello, World!
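
Beyond the console tools, we can produce the same message programmatically. Here's a minimal sketch using the official kafka-clients Java API, assuming the kafka-clients dependency is on the classpath, a broker is running on localhost:9092, and the hello-world topic from above exists:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloWorldProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Send a single record to the hello-world topic; closing the producer flushes pending records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("hello-world", "Hello, World!"));
        }
    }
}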

5. Core Differences Between Logstash and Kafka

Logstash and Kafka are integral components of modern data processing architectures, each fulfilling distinct yet complementary roles.

5.1. Logstash

Logstash is an open-source data processing pipeline specializing in ingesting data, transforming it, and sending the results to various outputs. Its strength lies in its ability to parse and enrich data, making it ideal for processing log and event data.

For instance, a typical use case might involve a web application where Logstash ingests logs from multiple servers. Then, it applies filters to extract relevant fields such as timestamps and error messages. Finally, it forwards this enriched data to Elasticsearch for indexing and visualization in Kibana to monitor application performance and diagnose real-time issues.

5.2. Kafka

In contrast, Kafka is a distributed streaming platform that excels in handling high-throughput, fault-tolerant, and real-time data streaming. It functions as a message broker, facilitating the publishing of and subscribing to streams of records.

For example, in an e-commerce architecture, Kafka can capture user activity events from various services, such as website clicks, purchases, and inventory updates. These events can be produced into Kafka topics, allowing multiple downstream services (like recommendation engines, analytics platforms, and notification systems) to consume the data in real-time.

5.3. Differences

While Logstash focuses on data transformation, enriching raw logs, and sending them to various destinations, Kafka emphasizes reliable message delivery and stream processing, allowing real-time data flows across diverse systems.

Let’s look at the main differences:

Feature | Logstash | Kafka
Primary Purpose | Data collection, processing, and transformation pipeline for log and event data | Distributed message broker for real-time data streaming
Architecture | A plugin-based pipeline with inputs, filters, and outputs to handle data flow | Cluster-based, with Producers and Consumers interacting via Brokers and Topics
Message Retention | Processes data in real-time and generally does not store data permanently | Stores messages for a configurable retention period, enabling the replay of messages
Data Ingestion | Ingests data from multiple sources (logs, files, databases, and more) with multiple input plugins | Ingests large volumes of data from producers in a scalable, distributed way
Data Transformation | Powerful data transformation using filters like grok, mutate, and GeoIP | Limited data transformation (typically done in downstream systems)
Message Delivery Guarantee | Processes data in a flow; no built-in delivery semantics for message guarantees | Supports delivery semantics: at least once, at most once, or exactly once
Integration Focus | Primarily integrates various data sources and forwards them to storage/monitoring systems like Elasticsearch, databases, or files | Primarily integrates distributed data streaming systems and analytics platforms
Typical Use Cases | Centralized logging, data parsing, transformation, and real-time systems monitoring | Event-driven architectures, streaming analytics, distributed logging, and data pipelines

Together, they enable organizations to build robust data pipelines that facilitate real-time insights and decision-making, demonstrating their critical roles in the evolving landscape of data architecture.

6. Can Logstash and Kafka Work Together?

Logstash and Kafka can seamlessly collaborate to create a robust data processing pipeline, combining their strengths to enhance data ingestion, processing, and delivery.

6.1. From Logstash

For example, Logstash can act as a data collector and processor that ingests various data sources, such as logs, metrics, and events, and then transforms this data to fit specific formats or schemas. For instance, in a microservices architecture, Logstash can collect logs from various microservices, apply filters to extract pertinent information, and then forward the structured data to Kafka topics for further processing.
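
As a minimal sketch of this direction (the app-logs topic name and broker address are our own assumptions), a Logstash pipeline can forward its processed events to Kafka with the kafka output plugin:

output {
  kafka {
    bootstrap_servers => "localhost:9092"
    topic_id => "app-logs"
    codec => json
  }
}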

6.2. To Kafka

Once the data is in Kafka, it can be consumed by multiple applications and services that require real-time processing and analytics. For example, a financial institution may use Kafka to stream transaction data from its payment processing system, which various applications — including fraud detection systems, analytics platforms, and reporting tools — can consume.

6.3. Logstash With Kafka

Logstash facilitates the initial ingestion and transformation of logs and events. At the same time, Kafka is a scalable, fault-tolerant messaging backbone that ensures reliable data delivery across the architecture.
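
In the other direction, Logstash can also consume from Kafka and forward the events to a store such as Elasticsearch. Here's a minimal sketch, again assuming the app-logs topic and locally running services:

input {
  kafka {
    bootstrap_servers => "localhost:9092"
    topics => ["app-logs"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}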

By integrating Logstash and Kafka, organizations can build robust and flexible data pipelines that efficiently handle high volumes of data, enabling real-time analytics and insights. This collaboration allows data ingestion to be decoupled from processing, fostering scalability and resilience within their data architecture.

7. Conclusion

In this tutorial, we saw how Logstash and Kafka work by providing architectural and command-line examples. We covered their main components and described the practical use cases each is best suited for. Finally, we saw the main differences between these two systems and how they can work together.

       

Introduction to Apache Commons Validator


1. Overview

In this tutorial, we’ll learn the basics of Apache Commons Validator, a powerful library from Apache Commons that simplifies data validation in Java applications. First, we’ll understand its use cases. Next, we’ll cover how to set it up. Finally, we’ll learn to use its built-in validators and explore practical use cases like form validation and API input validation.

2. What Is Apache Commons Validator?

Apache Commons Validator is a Java library that validates input data against common constraints. In particular, it provides out-of-the-box support for various types of validation, including email, URL, and date validations. We use it to avoid reinventing validation logic and ensure consistency across our application.

2.1. Why Use a Validation Library?

Data validation is error-prone and tedious, especially when dealing with complex constraints. Consequently, a dedicated validation library like Apache Commons Validator saves time by providing pre-built validators for common use cases. Not only does it promote cleaner code, but it also ensures that inputs are consistently checked, which improves the overall security and integrity of our applications in the long run.

2.2. Real-World Use Cases for Apache Commons Validator

APIs need robust validation to ensure they only accept correctly formatted data. With this in mind, Apache Commons Validator allows us to validate input payloads quickly, without writing custom validation logic from scratch. This is especially useful when clients send data over HTTP requests, where inputs like URLs, numeric values, credit card numbers, or postal codes require validation before further processing.

Validation also plays a critical role in ensuring the security and integrity of data within an application. Improperly validated data can lead to security vulnerabilities like SQL injection or cross-site scripting (XSS). We use the library to reduce such risks by applying strict input validation rules, ensuring that only valid data is allowed into the system and safeguarding against malicious entries.

3. Maven Dependencies

We need to add the commons-validator dependency to our Maven pom.xml:

<dependency>
    <groupId>commons-validator</groupId>
    <artifactId>commons-validator</artifactId>
    <version>1.9.0</version>
</dependency>

4. Built-in Validators

The reusable validators live in the org.apache.commons.validator.routines package. Commons Validator serves two purposes: providing standard, independent validation routines/functions and offering a mini framework for validation. This package was created in version 1.3.0 to separate those two concerns: its contents don’t depend on the framework aspect of Commons Validator, so we can use the routines entirely on their own.

Let’s get familiar with some of the validator routines.

5. Date and Time Validators

The date and time validators can validate based on a specified format or use a standard format for a specified Locale. There are three options available:

  • Date Validator – validates dates and converts them to a java.util.Date type.
  • Calendar Validator – validates dates and converts them to a java.util.Calendar type.
  • Time Validator – validates times and converts them to a java.util.Calendar type.

5.1. Validating a Date Value

Let’s learn to use a date validator. First, let’s see how to use the validate() method. It returns the parsed date if the input is valid; otherwise, it returns null.

@Test
void givenDate_whenValidationIsCalled_thenChecksDate() {
    DateValidator validator = DateValidator.getInstance();
    String validDate = "28/01/2024";
    String invalidDate = "28/13/2024";
    assertNotNull(validator.validate(validDate, "dd/MM/yyyy"));
    assertTrue(validator.isValid(validDate, "dd/MM/yyyy"));
    assertNull(validator.validate(invalidDate, "dd/MM/yyyy"));
    assertFalse(validator.isValid(invalidDate, "dd/MM/yyyy"));
    GregorianCalendar gregorianCalendar = new GregorianCalendar(2024, Calendar.JANUARY, 28, 10, 30);
    gregorianCalendar.setTimeZone(TimeZone.getTimeZone("GMT"));
    Date date = gregorianCalendar.getTime();
    assertEquals("28-Jan-2024", validator.format(date, "dd-MMM-yyyy"));
    TimeZone timeZone = TimeZone.getTimeZone("GMT+5");
    assertEquals("28/01/2024 15:30", validator.format(date, "dd/MM/yyyy HH:mm", timeZone));
}

This test verifies the DateValidator by checking valid and invalid date formats. First, it uses validate(), which returns a Date for valid input and null otherwise. Then, it uses isValid(), which returns true for valid input and false otherwise. After that, it formats a GregorianCalendar date as a string and confirms the correct output, accounting for time zone adjustments.
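
The CalendarValidator and TimeValidator routines work the same way. As a quick sketch (the patterns and sample values below are our own choice), both also return null for invalid input:

@Test
void givenTimeAndCalendarStrings_whenValidationIsCalled_thenChecksValues() {
    TimeValidator timeValidator = TimeValidator.getInstance();
    assertNotNull(timeValidator.validate("18:30", "HH:mm"));
    assertNull(timeValidator.validate("25:70", "HH:mm"));
    CalendarValidator calendarValidator = CalendarValidator.getInstance();
    assertNotNull(calendarValidator.validate("28/01/2024", "dd/MM/yyyy"));
    assertNull(calendarValidator.validate("28/13/2024", "dd/MM/yyyy"));
}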

6. Numeric Validators

The numeric validators validate values according to a specified format, using either a standard or a custom format for a given Locale. They cover the different numeric data types: Byte Validator, Short Validator, Integer Validator, Long Validator, Float Validator, Double Validator, BigInteger Validator, and BigDecimal Validator.

6.1. Validating a Numeric Value

Generally, we use the IntegerValidator to validate integer values:

@Test
void givenNumericString_whenValidationIsCalled_thenReturnsNumber() {
    IntegerValidator validator = IntegerValidator.getInstance();
    String pattern = "00000";
    int number = 1234;
    
    String formattedNumber = validator.format(number, pattern, Locale.US);
    
    assertEquals(number, validator.validate(formattedNumber, pattern));
    assertNotNull(validator.validate("123.4", Locale.GERMAN));
}

This test checks IntegerValidator functionality by formatting a number with a pattern and then validating it. First, it confirms that the formatted number matches the original. After that, it tests locale-specific validation, ensuring the validator correctly interprets numeric strings in different locales.

7. Currency Validators

The default implementation converts currency amounts to a java.math.BigDecimal. Additionally, it provides lenient currency symbol validation, meaning that currency amounts are valid with or without the symbol.

@Test
void givenCurrencyString_whenValidationIsCalled_thenReturnsCurrency() {
    BigDecimalValidator validator = CurrencyValidator.getInstance();
    
    assertEquals(new BigDecimal("1234.56"), validator.validate("$1,234.56", Locale.US));
    assertEquals("$1,234.56", validator.format(1234.56, Locale.US));
}

This test validates that CurrencyValidator correctly parses a U.S. currency string into a BigDecimal value and formats a numeric value back into the U.S. currency format. It ensures proper handling of locale-specific currency strings.

8. Other Validators

There are many more validator routines available in Apache Commons Validator; we’ll exercise a few of them right after this list:

  • Regular Validators allow us to validate input using Java 1.4+ regular expression support, giving us the flexibility to define complex patterns for validation.
  • Check Digit routines assist us in validating and calculating check digits for various types of codes, such as EAN/UPC, credit card numbers, and ISBNs.
  • Code Validators provide comprehensive code validation, including checking the format, enforcing minimum and maximum length requirements, and validating check digits.
  • ISBN Validators help validate ISBN-10 and ISBN-13 formats, ensuring that the provided ISBNs are accurate.
  • IP Address Validators enable the validation of IPv4 addresses, ensuring they conform to the correct format and structure.
  • Email Address Validators provide robust validation for email addresses, ensuring they adhere to industry standards for correct formatting and structure.
  • URL Validators help validate URLs based on their scheme, domain, and authority, ensuring they are correctly formatted and valid.
  • Domain Name Validators validate domain names and check them against the official IANA TLD list, ensuring they are properly formatted and within the valid TLDs.
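
To get a quick feel for some of these routines, here’s a short sketch, assuming the corresponding classes are imported from org.apache.commons.validator.routines; the sample values are our own, not from the library’s documentation:

@Test
void givenVariousInputs_whenRoutineValidatorsAreCalled_thenTheyValidate() {
    assertTrue(EmailValidator.getInstance().isValid("user@example.com"));
    assertFalse(EmailValidator.getInstance().isValid("not-an-email"));
    assertTrue(UrlValidator.getInstance().isValid("https://www.baeldung.com"));
    assertTrue(InetAddressValidator.getInstance().isValidInet4Address("192.168.0.1"));
    assertTrue(ISBNValidator.getInstance().isValid("978-3-16-148410-0"));
}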

9. Conclusion

In this tutorial, we first explored the Apache Commons Validator library, focusing on practical examples for validating dates, numbers, and currencies. Then, we demonstrated using specific validators like DateValidator, IntegerValidator, and CurrencyValidator through code snippets. Finally, we briefly introduced other available validators, showcasing the library’s versatility for common data validation needs.

As usual, the full source code can be found over on GitHub.

       

Get the Schema From an Avro File


1. Introduction

In this tutorial, we’ll explore how to extract the schema from an Apache Avro file in Java. Furthermore, we’ll cover how to read data from Avro files. This is a common requirement in big data processing systems.

Apache Avro is a data serialization framework that provides a compact, fast binary data format. As such, it’s popular in the big data ecosystem, particularly with Apache Hadoop. Therefore, understanding how to work with Avro files is crucial for tasks involving data processing.

2. Maven Dependencies

To get Avro up and running in Java, we need to add the Avro core library to our Maven project:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.12.0</version>
</dependency>

For testing purposes, we’ll use JUnit Jupiter. If we’re already using the Spring Boot Starter Test dependency, we don’t have to add JUnit separately, since that module brings it in automatically (along with the Mockito framework).

For JUnit, let’s use the latest available version:

<dependency>
    <groupId>org.junit.jupiter</groupId>
    <artifactId>junit-jupiter-api</artifactId>
    <version>5.11.2</version>
    <scope>test</scope>
</dependency>

Whenever we start a new project, it’s good to make sure we’re using the latest stable versions of the respective dependencies.

3. Understanding and Extracting Avro Schema

Before we dive into the code for extracting schemas, let’s briefly recap the structure of an Avro file:

  • File header – contains metadata about the file, including the schema.
  • Data blocks – the actual serialized data.
  • Sync markers – 16-byte markers that separate data blocks, allowing readers to split the file and resynchronize.

The schema of an Avro file describes the structure of the data inside it. The schema itself is stored as JSON in the file header and includes information about the fields, their names, and their data types.

Now, let’s write a method to extract the schema from an Avro file:

public static Schema extractSchema(String avroFilePath) throws IOException {
    File avroFile = new File(avroFilePath);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader)) {
        return dataFileReader.getSchema();
    }
}

First, we create a File object representing the Avro file. Next, we instantiate a GenericDatumReader. Instantiating this class without specifying a schema allows it to read any Avro file.

Next, we create a DataFileReader using the Avro file and the GenericDatumReader as arguments.

We use the getSchema() method of DataFileReader to extract the schema. The DataFileReader is wrapped in a try-with-resources block to ensure proper resource management.

This approach allows us to extract the schema without needing to know its structure beforehand, making it a versatile option for working with various Avro files.
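
For example, once extracted, the schema can be printed as pretty JSON or inspected field by field (the file path here is purely illustrative):

Schema schema = AvroSchemaExtractor.extractSchema("/tmp/users.avro");
System.out.println(schema.toString(true));
schema.getFields()
  .forEach(field -> System.out.println(field.name() + ": " + field.schema().getType()));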

4. Reading Data from Avro File

Once we have obtained the schema, we can read the data from the Avro file.

Let’s write a reading method:

public static List<GenericRecord> readAvroData(String avroFilePath) throws IOException { 
    
    File avroFile = new File(avroFilePath);
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    List<GenericRecord> records = new ArrayList<>();
    
    try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<>(avroFile, datumReader)) {
        GenericRecord record = null;
        while (dataFileReader.hasNext()) {
            record = dataFileReader.next(record);
            records.add(record);
        }
    }
    return records;
}

First, we create a File from the avroFilePath. Next, we create a GenericDatumReader object, which is used to read Avro data. By creating it without specifying a schema, it can read any Avro file without knowing the schema in advance.

Then, we create a DataFileReader which is the main tool we’ll use to extract information from the Avro file. Finally, we iterate through the file using the hasNext() and next() methods and add the records to the list.

In addition, it’s good to note that we’re reusing the GenericRecord object in the next() method call. This is an optimization that helps reduce object creation and garbage collection overhead.

5. Testing

To make sure our code works correctly, let’s write some unit tests. To start with our setup, let’s create a tempDir. Using the @TempDir annotation, JUnit automatically creates a temporary directory for use in tests.

As such, this is useful for creating temporary files during tests without worrying about cleanup. JUnit creates it before tests run and deletes it after:

@TempDir
Path tempDir;
private File avroFile;
private Schema schema;

Next, we’re going to set up some things before each test:

@BeforeEach
void setUp() throws IOException {
    schema = new Schema.Parser().parse("""
                                    {
                                        "type": "record",
                                        "name": "User",
                                        "fields": [
                                            {"name": "name", "type": "string"},
                                            {"name": "age", "type": "int"}
                                        ]
                                    }
                                    """);
    avroFile = tempDir.resolve("test.avro").toFile();
    GenericRecord user1 = new GenericData.Record(schema);
    user1.put("name", "John Doe");
    user1.put("age", 30);
    try (DataFileWriter<GenericRecord> dataFileWriter = 
      new DataFileWriter<>(new GenericDatumWriter<>(schema))) {
        dataFileWriter.create(schema, avroFile);
        dataFileWriter.append(user1);
    }
}

Finally, let’s test our functionality:

@Test
void whenSchemaIsExistent_thenItIsExtractedCorrectly() throws IOException {
    Schema extractedSchema = AvroSchemaExtractor.extractSchema(avroFile.getPath());
    assertEquals(schema, extractedSchema);
}
@Test
void whenAvroFileHasContent_thenItIsReadCorrectly() throws IOException {
    List<GenericRecord> records = AvroSchemaExtractor.readAvroData(avroFile.getPath());
    assertEquals("John Doe", records.get(0).get(0).toString());
}

These tests create a temporary Avro file with a sample schema and data. Then, they verify that our methods correctly extract the schema and read the data.

6. Conclusion

In this article, we’ve explored how to extract the schema from an Avro file and read its data using Java. In addition, we’ve demonstrated how to use GenericDatumReader and DataFileReader to handle Avro files without prior knowledge of the schema.

Furthermore, these techniques are crucial for working with Avro in various Java applications, such as data analytics or big data processing. By applying these methods we can manage Avro files in a flexible way.

Finally, we should remember to correctly handle exceptions and manage resources properly in our projects. This way, we’ll be able to work with serialized data in an efficient way, especially in Avro-centric ecosystems.

As always, the code is available over on GitHub.

       

Java Weekly, Issue 565


1. Spring and Java

>> Exploring New Features in JDK 23: Just-Write-And-Run prototyping with JEP-477 not only for beginners [foojay.io]

Quick prototyping made much easier with JDK 23. A long time coming 🙂

>> Advancing AI by Accelerating Java on Parallel Architectures [inside.java]

A solid read to understand just how dynamic the Java platform is. Low-level but well worth it.

>> JEP targeted to JDK 24: 484: Class-File API [openjdk.org]

And an interesting API addition in the upcoming JDK 24. Not that far away, actually.

Also worth reading:

Webinars and presentations:

Time to upgrade:

2. Technical & Musings

>> Benchmarking LLM for business workloads [abdullin.com]

An interesting, boots-on-the-ground read comparing AI models in practical scenarios.

Also worth reading:

3. Pick of the Week

>> Software Engineer Titles Have (Almost) Lost All Their Meaning [trevorlasn.com]

       

Validation Using Yavi


1. Introduction

Yavi is a Java validation library that allows us to easily and cleanly ensure that our objects are in a valid state.

Yavi is an excellent lightweight choice for object validation within Java applications. It doesn’t rely on reflection or adding additional annotations to the objects being validated, so it can be used entirely separately from the classes we wish to validate. It also emphasizes a type-safe API, ensuring we can’t accidentally define impossible validation rules. In addition, it has full support for any types that we can define in our application, as well as having a large range of predefined constraints to rely on whilst still allowing us to easily define our own where necessary.

In this tutorial, we’re going to have a look at Yavi. We’ll see what it is, what we can do with it, and how to use it.

2. Dependencies

Before using Yavi, we need to include the latest version in our build, which is 0.14.1 at the time of writing.

If we’re using Maven, we can include this dependency in our pom.xml file:

<dependency>
    <groupId>am.ik.yavi</groupId>
    <artifactId>yavi</artifactId>
    <version>0.14.1</version>
</dependency>

At this point, we’re ready to start using it in our application.

3. Simple Validations

Once we’ve got Yavi available in our project, we’re ready to start using it to validate our objects.

The simplest validators that we can build are for simple value types such as String or Integer. Each of the supported types is constructed using a builder class found in the am.ik.yavi.builder package:

StringValidator<String> validator = StringValidatorBuilder.of("name", c -> c.notBlank())
  .build();

This constructs a validator that we can use to validate String instances to ensure they’re not blank. The first parameter to the builder is a name for the value we’re validating, and the second is a lambda that defines the validation rules to apply.

More often though, we want to validate an entire bean, not just a single value. Validators for these are built using the ValidatorBuilder instead, to which we can add multiple rules for different fields:

public record Person(String name, int age) {}
Validator<Person> validator = ValidatorBuilder.of(Person.class)
  .constraint(Person::name, "name", c -> c.notBlank())
  .constraint(Person::age, "age", c -> c.positiveOrZero().lessThan(150))
  .build();

Every constraint() call adds a constraint to our validator for a different field. The first parameter is the getter for the field, which must be a method on the type we’re validating, and the second is the name reported for that field in any violations. The third parameter is then a lambda for defining the constraint, the same as before. Yavi ensures that this is suitable for the return type of our getter method. For example, we can use notBlank() on a String field but not on an integer field.

Once we’ve got our validator, we can use it to validate appropriate objects:

ConstraintViolations result = validator.validate(new Person("", 42));
assertFalse(result.isValid());

This returned ConstraintViolations object tells us whether or not the provided object is valid, and if it’s invalid we can see what the actual violations are:

assertEquals(1, result.size());
assertEquals("name", result.get(0).name());
assertEquals("charSequence.notBlank", result.get(0).messageKey());

Here we can see that the name field is invalid and that the violation is because it shouldn’t be blank.

3.1. Validating Nested Objects

Often our beans that we want to validate have other beans within them, and we want to ensure these are also valid. We can achieve this using the nest() method of our builder instead of the constraint() call:

public record Name(String firstName, String surname) {}
public record Person(Name name, int age) {}
Validator<Name> nameValidator = ValidatorBuilder.of(Name.class)
  .constraint(Name::firstName, "firstName", c -> c.notBlank())
  .constraint(Name::surname, "surname", c -> c.notBlank())
  .build();
Validator<Person> personValidator = ValidatorBuilder.of(Person.class)
  .nest(Person::name, "name", nameValidator)
  .constraint(Person::age, "age", c -> c.positiveOrZero().lessThan(150))
  .build();

Once defined, we can use this the same as before. Now though, Yavi will automatically compose the names of any violations using dotted notation so that we can see exactly what’s happened:

assertEquals(2, result.size());
assertEquals("name.firstName", result.get(0).name());
assertEquals("name.surname", result.get(1).name());

Here we’ve got our two expected violations – one on name.firstName and the other on name.surname. These tell us that the fields in question are nested within the name field of the outer object.

3.2. Cross-Field Validations

In some cases, we can’t validate a single field in isolation. The validation rules might depend on the values of other fields in the same object. We can achieve this using the constraintOnTarget() method, which validates the provided object and not a single field of it:

record Range(int start, int end) {}
Validator<Range> validator = ValidatorBuilder.of(Range.class)
  .constraintOnTarget(range -> range.end() > range.start(), "end", "range.endGreaterThanStart",
    "\"end\" must be greater than \"start\"")
  .build();

In this case, we’re ensuring that the end value of our range is greater than the start value. When doing this, we need to provide some extra values since we’re effectively creating a custom constraint.

Unsurprisingly, using this validator is the same as before. However, because we’ve defined the constraint ourselves we’ll get our custom values through in the violation:

assertEquals(1, result.size());
assertEquals("end", result.get(0).name());
assertEquals("range.endGreaterThanStart", result.get(0).messageKey());

4. Custom Constraints

Most of the time, Yavi provides us with all the constraints that we need for validating our objects. However, in some cases, we might need something that isn’t covered by the standard set.

We saw earlier an example of writing a custom constraint inline by providing a lambda. We can do similar within our constraint builder to define a custom constraint for any field:

Validator<Data> validator = ValidatorBuilder.of(Data.class)
  .constraint(Data::palindrome, "palindrome",
    c -> c.predicate(s -> validatePalindrome(s), "palindrome.valid", "\"{0}\" must be a palindrome"))
  .build();

Here we’ve used the predicate() method to provide our lambda, as well as giving it a message key and default message. This lambda can do anything we want, as long as it fits the definition of java.util.function.Predicate<T>. In this case, we’re using a function that checks if a string is a palindrome or not.

Sometimes though we might want to write our custom constraint in a more reusable manner, we’re able to do this by creating a class that implements the CustomConstraint interface:

class PalindromeConstraint implements CustomConstraint<String> {
    @Override
    public boolean test(String input) {
        String reversed = new StringBuilder()
          .append(input)
          .reverse()
          .toString();
        return input.equals(reversed);
    }
    @Override
    public String messageKey() {
        return "palindrome.valid";
    }
    @Override
    public String defaultMessageFormat() {
        return "\"{0}\" must be a palindrome";
    }
}

Functionally this is the same as our lambda, only as a class we can more easily reuse it between validators. In this case, we need only pass an instance of this to our predicate() call and everything else is configured for us:

Validator<Data> validator = ValidatorBuilder.of(Data.class)
  .constraint(Data::palindrome, "palindrome", c -> c.predicate(new PalindromeConstraint()))
  .build();

Whichever of these methods we use, we can use the resulting validator exactly as expected:

ConstraintViolations result = validator.validate(new Data("other"));
assertFalse(result.isValid());
assertEquals(1, result.size());
assertEquals("palindrome", result.get(0).name());
assertEquals("palindrome.valid", result.get(0).messageKey());

Here we can see that our field is invalid and that the result includes our defined message key to indicate exactly what was wrong with it.

5. Conditional Constraints

Not all constraints make sense to be applied in all cases. Yavi gives us some tools to configure some constraints to only work in some cases.

One option that we have is to provide a context to the validator. We can define this as any type that we want, as long as it implements the ConstraintGroup interface, though an enum is a very convenient option:

enum Action implements ConstraintGroup {
    CREATE,
    UPDATE,
    DELETE
}

We can then define a constraint using the constraintOnCondition() wrapper to define a constraint that only applies under a particular context:

 Validator<Person> validator = ValidatorBuilder.of(Person.class)
  .constraint(Person::name, "name", c -> c.notBlank())
  .constraintOnCondition(Action.UPDATE.toCondition(),
    b -> b.constraint(Person::id, "id", c -> c.notBlank()))
  .build();

This will always validate that the name field isn’t blank, but will only validate that the id field isn’t blank if we provide a context of UPDATE.

When using this, we need to validate slightly differently, by providing the context along with the value we’re validating:

ConstraintViolations result = validator.validate(new Person(null, "Baeldung"), Action.UPDATE);
assertFalse(result.isValid());

If we want to have even more control, the constraintOnCondition() method can take a lambda that accepts the value being validated and the context, and indicates if the constraint should be applied. This allows us to define whatever conditions we want:

Validator<Person> validator = ValidatorBuilder.of(Person.class)
  .constraintOnCondition((person, ctx) -> person.id() != null,
    b -> b.constraint(Person::name, "name", c -> c.notBlank()))
  .build();

In this case, the name field will only be validated if the id field has a value:

ConstraintViolations result = validator.validate(new Person(null, null));
assertTrue(result.isValid());

6. Argument Validation

One of Yavi’s unique points is its ability to wrap method calls in validation, ensuring that the arguments are valid before calling the method.

Argument validators are all built using the ArgumentsValidatorBuilder builder class. To ensure type safety, this constructs one of 16 possible types, supporting between 1 and 16 arguments to the method.

This is especially useful to wrap the call to the constructor. This allows us to guarantee valid arguments before calling the constructor, instead of constructing a potentially invalid object and validating it afterwards:

Arguments2Validator<String, Integer, Person> validator = ArgumentsValidatorBuilder.of(Person::new)
  .builder(b -> b
    ._string(Arguments1::arg1, "name", c -> c.notBlank())
    ._integer(Arguments2::arg2, "age", c -> c.positiveOrZero())
  )
  .build();

The slightly unusual _string() and _integer() syntax is there so the compiler knows the type to use for each argument.

Once we’ve built our validator, we can then call it passing in all of the appropriate arguments:

Validated<Person> result = validator.validate("", -1);

This result tells us if the arguments were valid, and if not then returns the validation errors:

assertFalse(result.isValid());
assertEquals(2, result.errors().size());
assertEquals("name", result.errors().get(0).name());
assertEquals("charSequence.notBlank", result.errors().get(0).messageKey());
assertEquals("age", result.errors().get(1).name());
assertEquals("numeric.positiveOrZero", result.errors().get(1).messageKey());

If the arguments were all valid then we can instead get back the result of our method – in this case, the constructed object:

assertTrue(result.isValid());
Person person = result.value();

We can also use this same technique to wrap methods on objects as well:

record Person(String name, int age) {
    boolean isOlderThan(int check) {
        return this.age > check;
    }
}
Arguments2Validator<Person, Integer, Boolean> validator = ArgumentsValidatorBuilder.of(Person::isOlderThan)
  .builder(b -> b
    ._integer(Arguments2::arg2, "age", c -> c.positiveOrZero())
  )
  .build();

This will validate the arguments on the method call and only call the method if they’re all valid. In this case, we pass the instance we’re calling the method on as the first argument and then pass all other arguments afterward:

Person person = new Person("Baeldung", 42);
Validated<Boolean> result = validator.validate(person, -1);

As before, if the arguments pass validation then Yavi calls the method and we can access the return value. If the arguments fail validation, it never calls the wrapped method and instead returns the validation errors.

7. Annotation Processing

So far, Yavi has been rather repetitive in several places. For example, we’ve needed to specify the field name and the method reference to get the value, which typically has the same name. Yavi comes with a Java annotation processor that will help here.

7.1. Annotating Fields

We can annotate fields on our objects with the @ConstraintTarget annotation to automatically generate some meta classes:

record Person(@ConstraintTarget String name, @ConstraintTarget int age) {}

These annotations can go on constructor arguments, getters, or fields and it will work the same.

We can then use these generated classes when we’re building our validator:

Validator<Person> validator = ValidatorBuilder.of(Person.class)
  .constraint(_PersonMeta.NAME, c -> c.notBlank())
  .constraint(_PersonMeta.AGE, c -> c.positiveOrZero().lessThan(150))
  .build();

We no longer need to specify both the field name and the corresponding getter. In addition, if we try to use a field that doesn’t exist then this will no longer compile.

Once built, this validator behaves exactly like the earlier one and can be used in the same way.

7.2. Annotating Arguments

We can also use the same technique for method arguments when using the support for wrapping method calls. In this case, we use the @ConstraintArguments annotation instead, annotating the method or constructor that we plan on validating:

record Person(String name, int age) {
    @ConstraintArguments
    Person {
    }
    @ConstraintArguments
    boolean isOlderThan(int check) {
        return this.age() > check;
    }
}

Yavi generates one meta-class for each annotated method, which we can use to build the validators as before:

Arguments2Validator<String, Integer, Person> validator = ArgumentsValidatorBuilder.of(Person::new)
  .builder(b -> b
    .constraint(_PersonArgumentsMeta.NAME, c -> c.notBlank())
    .constraint(_PersonArgumentsMeta.AGE, c -> c.positiveOrZero())
  )
  .build();

As before, we no longer need to manually specify the argument names or positions. We also no longer need to specify the correct type of constraint – our meta-class has already defined all of this for us.

8. Summary

This was a quick introduction to Yavi. There’s a lot more that we can do with this library, and it also offers good integration with popular frameworks such as Spring. Next time you need a validation library, why not give it a try?

As usual, all of the examples from this article are available over on GitHub.

       

Invoke a GoLang Function from Java


1. Introduction

As we know, Java and Go are two prominent programming languages, each excelling in different domains. Java is renowned for its portability and extensive ecosystem, while Go is celebrated for its simplicity, performance, and efficient concurrency handling. In certain scenarios, combining the strengths of both languages can lead to more robust and efficient applications.

In this tutorial, we’ll explore how to invoke Go functions from Java without writing any C code, utilizing the Java Native Access (JNA) library to bridge the gap between the two languages.

2. Bridging Java and Go with JNA

Traditionally, invoking native code from Java required writing Java Native Interface (JNI) code in C, which adds complexity and overhead. However, with the advent of the Java Native Access (JNA) library, it is possible to call native shared libraries directly from Java without delving into C code. This approach simplifies the integration process and allows developers to harness Go’s capabilities seamlessly within Java applications.

To understand how this integration works, first, we’ll explore the role of the Java Native Access (JNA) library in bridging Java and Go. Specifically, JNA provides a straightforward way to call functions in native shared libraries from Java code. By compiling Go code into a shared library and exporting the necessary functions, Java can interact with Go functions as if they were part of its own ecosystem.

In essence, this process involves writing the Go functions, compiling them into a shared library, and then creating corresponding Java interfaces that map to these functions using JNA.

3. Setting Up the Project

Before diving into the implementation, it is essential to set up the project environment properly. This involves configuring the build tools and dependencies required for the integration.
In our case, we need the following components:

  • Java Development Kit (JDK): For compiling and running Java code.
  • Go Programming Language: For writing and compiling Go code.
  • Java Native Access (JNA) Library: Included as a dependency in the Maven project.
  • Build Tools: Maven for Java and the Go compiler for Go code.

Let’s add the JNA library as a maven dependency:

<dependency>
    <groupId>net.java.dev.jna</groupId>
    <artifactId>jna-platform</artifactId>
    <version>5.15.0</version>
</dependency>

4. Invoking a Go Function from Java

To demonstrate the integration of Go functions into a Java application, we’ll create a simple program where Java calls a Go function that prints a message to the console. The implementation involves several key steps: writing the Go function, compiling it into a shared library, and then writing the Java code that uses the Java Native Access (JNA) library to invoke the Go function.

4.1. Building the Go Shared Library

First, we need to define the Go code properly. To make a Go function accessible from the shared library, we must export it with C linkage. This is achieved by including the import “C” statement and placing the //export directive before the function definition. Additionally, an empty main function is required when building a shared library in Go.

Let’s create a file called hello.go:

package main
/*
#include <stdlib.h>
*/
import "C"
import "fmt"
//export SayHello
func SayHello() {
    fmt.Println("Hello Baeldung from Go!")
}
func main() {}

Let’s highlight the most important parts of the code above:

  • The import “C” statement enables CGO, allowing the use of C features and exporting functions.
  • The //export SayHello directive tells the Go compiler to export the SayHello function with C linkage.
  • The empty main function is necessary when compiling the Go code into a shared library.

After writing the Go function, the next step is to compile the Go code into a shared library. This is done using the Go build command with the -buildmode=c-shared option, which produces a shared library file (e.g., libhello.so on Linux or libhello.dll on Windows).

Depending on the operating system we use, we need to execute different commands to compile the shared library.

For Linux-based operating systems, we can use the following command:

go build -o libhello.so -buildmode=c-shared hello.go

For Windows, in order to build a DLL, we use the following command:

go build -o libhello.dll -buildmode=c-shared hello.go

For macOS, to get a dylib library, we execute the corresponding command:

go build -o libhello.dylib -buildmode=c-shared hello.go

As a result, we should find the shared library in our current directory.

For convenience, all these scripts are included in the source code along with the README.md file.

4.2. Creating the JNA Interface

The next step is to proceed to the Java side of the integration. In the Java application, we utilize the Java Native Access (JNA) library to load the shared library and call the Go function. First, we define a Java interface that extends com.sun.jna.Library, declaring the methods corresponding to the exported Go function:

import com.sun.jna.Library;
public interface GoLibrary extends Library {
    void SayHello();
}

In this interface, GoLibrary extends Library, a marker interface from JNA, and the SayHello method signature matches the exported Go function from the shared library.

Next, we write the Java application that uses this interface to load the shared library and invoke the Go function.

import com.sun.jna.Native;
public class HelloGo {
    public static void main(String[] args) {
        GoLibrary goLibrary = Native.load("hello", GoLibrary.class);
        goLibrary.SayHello();
    }
}

In this code, Native.load loads the shared library libhello.so (omitting the lib prefix and .so extension), creates an instance of GoLibrary to access the exported functions, and calls the SayHello method, which invokes the Go function and prints the message to the console.

When running the Java application, it is important to ensure that the shared library is accessible in the library path. This can be achieved by placing the shared library in the same directory as the Java application or by setting the appropriate environment variables:

For Linux-based systems, we define environment variables by calling the export command:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.

For Windows, we need to add the directory containing the libhello.dll file to the PATH environment variable. This can be done using the following command in Command Prompt (or permanently through the system environment settings):

set PATH=%PATH%;C:\path\to\directory

For macOS, we use a command similar to Linux:

export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:.

Finally, if everything is set up correctly, we should be able to run our app and get the following output:

Hello Baeldung from Go!

5. Passing Parameters and Returning Values

Building upon the simple example, we can enhance the integration by passing parameters to the Go function and returning values to the Java application. This demonstrates how data can be exchanged between Java and Go.

First, we modify the Go code to include a function that adds two integers and returns the result. In the Go code, we define the AddNumbers function to accept two integers of type C.int and return a C.int.

//export AddNumbers
func AddNumbers(a, b C.int) C.int {
    return a + b
}

After updating the Go code, we need to recompile the shared library to include the new function.

Next, we update the Java interface to include the AddNumbers function. We define an interface corresponding to the Go function, specifying the method signature with appropriate parameters and return types.

public interface GoLibrary extends Library {
    void SayHello();
    int AddNumbers(int a, int b);
}

As a result, we can call the AddNumbers function and pass int parameters:

public static void main(String[] args) {
    GoLibrary goLibrary = Native.load("hello", GoLibrary.class);
    int result = goLibrary.AddNumbers(2, 3);
    System.out.printf("Result is %d%n", result);
}

After running the application we should see the result of the calculation in the output:

Result is 5

6. Handling Complex Data Types

In addition to simple data types, it is often necessary to handle more complex data types, such as strings. To pass a string from Java to Go and receive a string back, we need to handle pointers to C strings in the Go code and map them appropriately in Java.

First, we’ll implement a Go function that accepts a string and returns a greeting message. In the Go code, we define the Greet function, which accepts a *C.char and returns a *C.char. We use C.GoString to convert the C string to a Go string and C.CString to convert the Go string back to a C string.

//export Greet
func Greet(name *C.char) *C.char {
    greeting := fmt.Sprintf("Hello, %s!", C.GoString(name))
    return C.CString(greeting)
}

After adding the new function, we need to recompile the shared library.

Next, we need to update the Java interface to include the Greet function. Since the Go function returns a C string, we’ll use the Pointer class provided by JNA to handle the returned value.
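
For completeness, here’s roughly what the updated interface looks like; JNA maps the Java String parameter to a C char*, and we receive the returned char* as a Pointer:

import com.sun.jna.Library;
import com.sun.jna.Pointer;
public interface GoLibrary extends Library {
    void SayHello();
    int AddNumbers(int a, int b);
    Pointer Greet(String name);
}

With that declaration in place, we can call Greet from our main method: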

public static void main(String[] args) {
    GoLibrary goLibrary = Native.load("hello", GoLibrary.class);
    Pointer ptr = goLibrary.Greet("Alice");
    String greeting = ptr.getString(0);
    System.out.println(greeting);
    Native.free(Pointer.nativeValue(ptr));
}

In this code, the Greet method is called with the argument “Alice,” the returned Pointer retrieves the string, and Native.free releases the memory allocated by the Go function.

If we run the app, we should get the following result:

Hello, Alice!

7. Conclusion

By following the guidelines in this tutorial, we can easily integrate Go functions into a Java application using the Java Native Access (JNA) library without writing any C code. This method combines Go’s performance and concurrency with Java, streamlining integration and speeding up development.

Key factors include ensuring compatibility between Java and Go data types, properly managing memory to avoid leaks, and setting up robust error handling. By combining Java’s ecosystem with Go’s performance optimizations, developers can create efficient and powerful applications.

JNA offers several advantages over JNI, such as eliminating the need for C code, supporting cross-platform development, and simplifying the native integration process for faster implementation.

In conclusion, integrating Go functions into Java using JNA is an effective and straightforward approach that enhances performance while simplifying development.

As always, code snippets are available over on GitHub.

       

Introduction to Apache Iceberg


1. Introduction

This tutorial will discuss Apache Iceberg, a popular open table format in today’s big data landscape.

We’ll explore Iceberg’s architecture and some of its important features through a hands-on example with open-source distributions.

2. Origin of Apache Iceberg

Iceberg was started at Netflix by Ryan Blue and Dan Weeks around 2017. It came into existence mainly because of the limitations of the Hive table format. One of the critical issues with Hive was its inability to guarantee correctness in the absence of stable atomic transactions.

The design goals of Iceberg were to address these issues and provide three key improvements:

  • Support ACID transactions and ensure the correctness of data
  • Improve performance by allowing fine-grained operations at the level of files
  • Simplify and abstract away table maintenance

Iceberg was later open-sourced and contributed to the Apache Foundation, where it became a top-level project in 2020. As a result, Apache Iceberg has become the most popular open standard for table formats. Almost all major players in the big data landscape nowadays support Iceberg tables.

3. Architecture of Apache Iceberg

One of Iceberg’s key architecture decisions was tracking the complete list of data files within a table instead of directories. This approach has many advantages, like better query performance.

This all happens in the metadata layer, one of the three layers in Iceberg's architecture.

 

When a reader queries an Iceberg table, it loads the table's metadata using the current snapshot (s1). If we update this table, the update optimistically creates a new metadata file with a new snapshot (s2).

Then, the value of the current metadata pointer is atomically updated to point to this new metadata file. If the snapshot on which this update was based (s1) is no longer current, the write operation must be aborted.

3.1. Catalog Layer

The catalog layer has several functions, but most importantly, it stores the location of the current metadata pointer. Any compute engine that wishes to operate on Iceberg tables must access the catalog and get this current metadata pointer.

The catalog also supports atomic operations while updating the current metadata pointer. This is essential for allowing atomic transactions on Iceberg tables.

Available features depend on the catalog we use. For instance, Nessie provides a Git-inspired data version control.

3.2. Metadata Layer

The metadata layer contains the hierarchy of files. The one on the top is a metadata file that stores metadata about an Iceberg table. It tracks the table’s schema, partitioning config, custom properties, snapshots, and also which snapshot is the current one.

The metadata file points to a manifest list, a list of manifest files. The manifest list stores metadata about each manifest file that makes up a snapshot, including information like the location of the manifest file and what snapshot it was added to.

Finally, the manifest file tracks data files and provides additional details. Manifest files allow Iceberg to track data at the file level and contain useful information that improves the efficiency and performance of the read operations.

3.3. Data Layer

The data layer is where data files sit, most likely in a cloud object storage service like AWS S3. Iceberg supports several file formats, such as Apache Parquet, Apache Avro, and Apache ORC.

Parquet is the default file format for storing data in Iceberg. It’s a column-oriented data file format. Its key benefit is efficient storage. Moreover, it comes with high-performance compression and encoding schemes. It also supports efficient data access, especially for queries that target specific columns from a wide table.

4. Important Features of Apache Iceberg

Apache Iceberg offers transactional consistency, allowing multiple applications to work together on the same data.

It also has features like snapshots, complete schema evolution, and hidden partitioning.

4.1. Snapshots

Iceberg table metadata maintains a snapshot log that represents the changes applied to a table.

Hence, a snapshot represents the state of the table at some time. Iceberg supports reader isolation and time travel queries based on snapshots.

For snapshot lifecycle management, Iceberg also supports branches and tags, which are named references to snapshots:

Snapshots with branches and tags in Apache Iceberg.

 

Here, we tagged the important snapshots as “end-of-week,” “end-of-month,” and “end-of-year” to retain them for auditing purposes. Their lifecycle management is controlled by branch- and tag-level retention policies.

Branches and tags can have multiple use cases, like retaining important historical snapshots for auditing.

The schema tracked for a table is valid across all branches. However, querying a tag uses the snapshot’s schema.

4.2. Partitioning

Iceberg partitions the data by grouping similar rows when writing. For example, it can partition log events by date and group them into files with the same event date. This way, it can skip files for other dates that don’t have useful data and make queries faster.

Interestingly, Iceberg supports hidden partitioning. That means it handles the tedious and error-prone task of producing partition values for rows in a table. Users don’t need to know how the table is partitioned, and the partition layouts can evolve as needed.

This is a fundamental difference from partitioning supported by earlier table formats like Hive. With Hive, we must provide the partition values. This ties our working queries to the table’s partitioning scheme, so it can’t change without breaking queries.

4.3. Evolution

Iceberg supports table evolution seamlessly and refers to it as “in-place table evolution.” For instance, we can change the table schema, even in a nested structure. Further, the partition layout can also change in response to data volume changes.

To support this, Iceberg does not require rewriting table data or migrating to a new table. Behind the scenes, Iceberg performs schema evolution just by performing metadata changes. So, no data files get rewritten to perform the update.

We can also update the Iceberg table partitioning in an existing table. The old data written with an earlier partition spec remains unchanged. However, the new data gets written using the new partition spec. Metadata for each partition version is kept separately.

5. Hands-on With Apache Iceberg

Apache Iceberg has been designed as an open community standard. It’s a popular choice in modern data architectures and is interoperable with many data tools.

In this section, we’ll see Apache Iceberg in action by deploying an Iceberg REST catalog over Minio storage with Trino as the query engine.

5.1. Installation

We’ll use Docker images to deploy and connect Minio, Iceberg REST catalog, and Trino. It’s preferable to have a solution like Docker Desktop or Podman to complete these installations.

Let’s begin by creating a network within Docker:

docker network create data-network

The commands in this tutorial are meant for a Windows machine. Changes might be required for other operating systems.

Let’s now deploy Minio with persistent storage (mount host directory “data” as volume):

docker run --name minio --net data-network -p 9000:9000 -p 9001:9001 \
  --volume .\data:/data quay.io/minio/minio:RELEASE.2024-09-13T20-26-02Z.fips \
  server /data --console-address ":9001"

As the next step, we’ll deploy the Iceberg Rest catalog. This is a Tabular contributed image with a thin server to expose an Iceberg Rest catalog server-side implementation backed by an existing catalog implementation:

docker run --name iceberg-rest --net data-network -p 8181:8181 \
  --env-file ./env.list \
  tabulario/iceberg-rest:1.6.0

Here, we provide the environment variables as a file containing all the necessary configuration for the Iceberg REST catalog to work with Minio:

CATALOG_WAREHOUSE=s3://warehouse/
CATALOG_IO__IMPL=org.apache.iceberg.aws.s3.S3FileIO
CATALOG_S3_ENDPOINT=http://minio:9000
CATALOG_S3_PATH-STYLE-ACCESS=true
AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
AWS_REGION=us-east-1

Now, we’ll deploy Trino to work with the Iceberg REST catalog. We can configure Trino to use the REST catalog and Minio that we deployed earlier by providing a properties file as a volume mount:

docker run --name trino --net data-network -p 8080:8080 \
  --volume .\catalog:/etc/trino/catalog \
  --env-file ./env.list \
  trinodb/trino:449

The properties file contains the details of the REST catalog and Minio:

connector.name=iceberg 
iceberg.catalog.type=rest 
iceberg.rest-catalog.uri=http://iceberg-rest:8181/
iceberg.rest-catalog.warehouse=s3://warehouse/
iceberg.file-format=PARQUET
hive.s3.endpoint=http://minio:9000
hive.s3.path-style-access=true

As before, we also feed the environment variable as a file with access credentials for Minio:

AWS_ACCESS_KEY_ID=minioadmin
AWS_SECRET_ACCESS_KEY=minioadmin
AWS_REGION=us-east-1

The property hive.s3.path-style-access is required for Minio and isn’t necessary if we use AWS S3.

5.2. Data Operations

We can use Trino to perform different operations on the REST catalog. Trino comes with a built-in CLI to make this easier for us. Let’s first get access to the CLI from within the Docker container:

docker exec -it trino trino

This should provide us with a shell-like prompt to submit our SQL queries. As we have seen earlier, a client for Iceberg needs to begin by accessing the catalog first. Let’s see if we have any default catalogs available to us:

trino> SHOW catalogs;
 Catalog
---------
 iceberg
 system
(2 rows)

We’ll use iceberg. Let’s begin by creating a schema in Trino (which translates to a namespace in Iceberg):

trino> CREATE SCHEMA iceberg.demo;
CREATE SCHEMA

Now, we can create a table inside this schema:

trino> CREATE TABLE iceberg.demo.customer (
    -> id INT,
    -> first_name VARCHAR,
    -> last_name VARCHAR,
    -> age INT);
CREATE TABLE

Let’s insert a few rows:

trino> INSERT INTO iceberg.demo.customer (id, first_name, last_name, age) VALUES
    -> (1, 'John', 'Doe', 24),
    -> (2, 'Jane', 'Brown', 28),
    -> (3, 'Alice', 'Johnson', 32),
    -> (4, 'Bob', 'Williams', 26),
    -> (5, 'Charlie', 'Smith', 35);
INSERT: 5 rows

We can query the table to fetch the inserted data:

trino> SELECT * FROM iceberg.demo.customer;
 id | first_name | last_name | age
----+------------+-----------+-----
  1 | John       | Doe       |  24
  2 | Jane       | Brown     |  28
  3 | Alice      | Johnson   |  32
  4 | Bob        | Williams  |  26
  5 | Charlie    | Smith     |  35
(5 rows)

As we can see, we can use the familiar SQL syntax to work with a highly scalable and open table format for massive volumes of data.

5.3. A Peek Into the Files

Let’s see what type of files are generated in our storage.

Minio provides a console that we can access at http://localhost:9001. We find two directories under warehouse/demo:

  • data
  • metadata

Let’s first look into the metadata directory:

Iceberg Minio Metadata

It contains the metadata files (*.metadata.json), manifest lists (snap-*.avro), and manifest files (*.avro, *.stats). The .stats file contains information about the table’s data used to improve the query performance.

Now, let’s see what’s there in the data directory:

Iceberg Minio data directory

It has a data file in the Parquet format that contains the actual data that we created through our queries.

6. Conclusion

Apache Iceberg has become a popular choice for implementing data lakehouses today. It offers features like snapshots, hidden partitioning, and in-place table evolution.

Together with the REST catalog specification, it’s fast becoming the de-facto standard for open table formats.

       

How to Mock Multiple Responses for the Same Request


1. Overview

In this article, we’ll explore how to mock multiple responses for the same request using MockServer.

A MockServer simulates real APIs by mimicking their behavior, allowing us to test applications without needing backend services.

2. Application Set Up

Let’s consider a payment processing API that provides an endpoint for handling payment requests. When a payment is initiated, this API calls an external bank payment service. The bank’s API responds with a reference paymentId. Using this ID, the API periodically checks the payment status by polling the bank’s API, ensuring the payment is processed successfully.

Let’s begin by defining the payment request model, which includes the card details needed to process the payment:

public record PaymentGatewayRequest(
  String cardNumber, String expiryMonth, String expiryYear, String currency, int amount, String cvv) {
}

Similarly, let’s define the payment response model, which contains the payment status:

public record PaymentGatewayResponse(UUID id, PaymentStatus status) {
    public enum PaymentStatus {
        PENDING,
        AUTHORIZED,
        DECLINED,
        REJECTED
    }
}

Now, let’s add the controller and implementation to integrate with the bank’s payment service for submitting payment and status polling. The API will keep polling while the payment status starts as pending and later updates to AUTHORIZED, DECLINED, or REJECTED:

@PostMapping("payment/process")
public ResponseEntity<PaymentGatewayResponse> submitPayment(@RequestBody PaymentGatewayRequest paymentGatewayRequest) 
  throws JSONException {
    String paymentSubmissionResponse = webClient.post()
      .uri("http://localhost:9090/payment/submit")
      .body(BodyInserters.fromValue(paymentGatewayRequest))
      .retrieve()
      .bodyToMono(String.class)
      .block();
    UUID paymentId = UUID.fromString(new JSONObject(paymentSubmissionResponse).getString("paymentId"));
    PaymentGatewayResponse.PaymentStatus paymentStatus = PaymentGatewayResponse.PaymentStatus.PENDING;
    while (paymentStatus.equals(PaymentGatewayResponse.PaymentStatus.PENDING)) {
        String paymentStatusResponse = webClient.get()
          .uri("http://localhost:9090/payment/status/%s".formatted(paymentId))
          .retrieve()
          .bodyToMono(String.class)
          .block();
        paymentStatus = PaymentGatewayResponse.PaymentStatus.
          valueOf(new JSONObject(paymentStatusResponse).getString("paymentStatus"));
        logger.info("Payment Status {}", paymentStatus);
    }
    return new ResponseEntity<>(new PaymentGatewayResponse(paymentId, paymentStatus), HttpStatus.OK);
}

To test this API and ensure it polls the payment status until reaching a terminal state, we need the ability to mock multiple responses from the payment status API. The mock response should initially return a PENDING status a few times before updating to AUTHORIZED, enabling us to effectively validate the polling mechanism.

3. How to Mock Multiple Responses for the Same Request

The first step in testing this API is to start a mock server on port 9090. Our API uses this port to interact with the bank’s payment submission and status services:

class PaymentControllerTest {
    private ClientAndServer clientAndServer;
    private final MockServerClient mockServerClient = new MockServerClient("localhost", 9090);
    @BeforeEach
    void setup() {
        clientAndServer = startClientAndServer(9090);
    }
    
    @AfterEach
    void tearDown() {
        clientAndServer.stop();
    }
    // ...
}

Next, let’s set up a mock for the payment submission endpoint to return the paymentId:

mockServerClient
  .when(request()
    .withMethod("POST")
    .withPath("/payment/submit"))
  .respond(response()
    .withStatusCode(200)
    .withBody("{\"paymentId\": \"%s\"}".formatted(paymentId))
    .withHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE));

To mock multiple responses for the same request, we need to use the Times class together with the when() method.

The when() method uses the Times argument to specify how many times a request should match. This allows us to mock different responses for repeated requests.

Following that, let’s mock the payment status endpoint to return a PENDING status 4 times:

mockServerClient
  .when(request()
    .withMethod("GET")
    .withPath("/payment/status/%s".formatted(paymentId)), Times.exactly(4))
  .respond(response()
    .withStatusCode(200)
    .withHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
    .withBody("{\"paymentStatus\": \"%s\"}"
    .formatted(PaymentGatewayResponse.PaymentStatus.PENDING.toString())));

Next, let’s mock the payment status endpoint to return AUTHORIZED:

mockServerClient
  .when(request()
    .withMethod("GET")
    .withPath("/payment/status/%s".formatted(paymentId)))
  .respond(response()
    .withStatusCode(200)
    .withHeader(HttpHeaders.CONTENT_TYPE, MediaType.APPLICATION_JSON_VALUE)
    .withBody("{\"paymentStatus\": \"%s\"}"
    .formatted(PaymentGatewayResponse.PaymentStatus.AUTHORIZED.toString())));

Lastly, let’s send a request to the payment processing API endpoint to receive the AUTHORIZED result:

webTestClient.post()
  .uri("http://localhost:9000/api/payment/process")
  .bodyValue(new PaymentGatewayRequest("4111111111111111", "12", "2025", "USD", 10000, "123"))
  .exchange()
  .expectStatus()
  .isOk()
  .expectBody(PaymentGatewayResponse.class)
  .value(response -> {
      Assertions.assertNotNull(response);
      Assertions.assertEquals(PaymentGatewayResponse.PaymentStatus.AUTHORIZED, response.status());
  });

We should see the log printing “Payment Status PENDING” four times, followed by “Payment Status AUTHORIZED“.

4. Conclusion

In this tutorial, we explored how to mock multiple responses for the same request, enabling flexible testing of APIs using the Times class.

The default when() method in MockServerClient uses Times.unlimited() to respond to all matching requests consistently. To mock a response for a specific number of requests, we can use Times.exactly().

As always, the source code for the examples is available over on GitHub.

       

Guide to Prometheus Java Client


1. Introduction

As distributed systems become more complex, monitoring becomes crucial to maintaining application performance and quickly identifying issues. One of the best tools for this is Prometheus, a robust open-source monitoring and alerting toolkit.

The Prometheus Java client allows us to instrument our applications with minimal effort by exposing real-time metrics for Prometheus to scrape and monitor.

We will explore how to use the Prometheus Java client library with Maven, including creating custom metrics and configuring an HTTP server to expose them. Additionally, we will cover the different metric types offered by the library, and provide practical examples that tie all these elements together.

2. Setting Up the Project

To get started with the Prometheus Java client, we’ll use Maven to manage our project’s dependencies. There are several essential dependencies that we need to add to our pom.xml file to enable Prometheus metrics collection and exposure:

<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>prometheus-metrics-core</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>prometheus-metrics-instrumentation-jvm</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>prometheus-metrics-exporter-httpserver</artifactId>
    <version>1.3.1</version>
</dependency>

Let's look at each dependency in turn. prometheus-metrics-core is the core library of the Prometheus Java client. It provides the foundation for defining and registering custom metrics such as counters, gauges, and histograms.

prometheus-metrics-instrumentation-jvm provides out-of-the-box JVM metrics, including heap memory usage, garbage collection times, thread counts, and more.

prometheus-metrics-exporter-httpserver provides an embedded HTTP server to expose metrics in the Prometheus format. It creates a /metrics endpoint that Prometheus can scrape to collect data.

3. Creating and Exposing JVM Metrics

This section will cover how to expose the JVM metrics available through the Prometheus Java client. These metrics offer valuable insights into the performance of our application. Thanks to the prometheus-metrics-instrumentation-jvm dependency, we can easily register out-of-the-box JVM metrics without needing custom instrumentation:

public static void main(String[] args) throws InterruptedException, IOException {
    JvmMetrics.builder().register();
    HTTPServer server = HTTPServer.builder()
      .port(9400)
      .buildAndStart();
    System.out.println("HTTPServer listening on http://localhost:" + server.getPort() + "/metrics");
    Thread.currentThread().join();
}

To make the JVM metrics available to Prometheus, we exposed them over an HTTP endpoint. We used the prometheus-metrics-exporter-httpserver dependency to set up a simple HTTP server that listens on a port and serves the metrics.

We used the join() method to keep the main thread running indefinitely, ensuring that the HTTP server stays active so Prometheus can continuously scrape the metrics over time.

3.1. Testing the Application

Once the application is running, we can open a browser and navigate to http://localhost:9400/metrics to view the exposed metrics, or we can use the curl command to fetch and inspect them from the command line:

$ curl http://localhost:9400/metrics

We should see a list of JVM-related metrics in the Prometheus format, similar to this:

# HELP jvm_memory_bytes_used Used bytes of a given JVM memory area.
# TYPE jvm_memory_bytes_used gauge
jvm_memory_bytes_used{area="heap",} 5242880
jvm_memory_bytes_used{area="nonheap",} 2345678
# HELP jvm_gc_collection_seconds Time spent in a given JVM garbage collector in seconds.
# TYPE jvm_gc_collection_seconds summary
jvm_gc_collection_seconds_count{gc="G1 Young Generation",} 5
jvm_gc_collection_seconds_sum{gc="G1 Young Generation",} 0.087
...

The output displays various JVM metrics, such as memory usage, garbage collection details, and thread counts. Prometheus collects and analyzes these metrics, which are formatted in its dedicated exposition format.

4. Metric Types

In the Prometheus Java client, metrics are categorized into different types, each serving a specific purpose in measuring various aspects of our application’s behavior. These types are based on the OpenMetrics standard, which Prometheus adheres to.

Let’s explore the main metric types available in the Prometheus Java client and how they are typically used.

4.1. Counter

A Counter is a metric that only increments over time. We can use it to count the requests received, errors encountered, or tasks completed. Counters can't be decreased; their values reset only when the process restarts.

We can count the total number of HTTP requests our application handles:

Counter requestCounter = Counter.builder()
  .name("http_requests_total")
  .help("Total number of HTTP requests")
  .labelNames("method", "status")
  .register();
requestCounter.labelValues("GET", "200").inc();

We use labelNames and labelValues to add dimensions or context to our metrics. Labels in Prometheus are key-value pairs that allow us to differentiate between different categories of the same metric.
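To illustrate, each distinct combination of label values is tracked as its own time series. Here's a small sketch that reuses the counter defined above (the label values are made up for illustration):

// each label combination becomes a separate series in the Prometheus output
requestCounter.labelValues("GET", "200").inc();
requestCounter.labelValues("POST", "201").inc();
requestCounter.labelValues("GET", "500").inc(3); // increment by an arbitrary amount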

4.2. Gauge

A Gauge is a metric that can increase or decrease over time. We can use them to track values that change over time, like memory usage, temperature, or the number of active threads.

To measure the current memory usage or CPU load, we would typically use a gauge:

Gauge memoryUsage = Gauge.builder()
  .name("memory_usage_bytes")
  .help("Current memory usage in bytes")
  .register();
memoryUsage.set(5000000);
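
Besides setting an absolute value, a gauge can also be adjusted incrementally with inc() and dec(). Here's a small sketch with an assumed metric name:

// a gauge we move up and down as workers start and finish
Gauge activeThreads = Gauge.builder()
  .name("active_threads")
  .help("Number of active worker threads")
  .register();
activeThreads.inc(); // a worker starts
activeThreads.dec(); // a worker finishes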

4.3. Histogram

A Histogram is typically used to observe and track the distribution of values over time, such as request latencies or response sizes. It records observations in pre-configured buckets and provides metrics on the number of observations in each bucket, the total count, and the sum of all observed values. This allows us to understand the data distribution and calculate percentiles or ranges.

Let’s walk through a detailed example that measures HTTP request durations and uses custom buckets to track specific ranges of response times:

Histogram requestLatency = Histogram.builder()
  .name("http_request_latency_seconds")
  .help("Tracks HTTP request latency in seconds")
  .labelNames("method")
  .register();
Random random = new Random();
for (int i = 0; i < 100; i++) {
    double latency = 0.1 + (3 * random.nextDouble());
    requestLatency.labelValues("GET").observe(latency);
}

We didn’t specify any custom buckets when creating the histogram therefore the library uses a set of default buckets. The default buckets cover exponential ranges of values, which suit many applications that measure durations or latencies. Specifically, the default bucket boundaries are as follows:

[5ms, 10ms, 25ms, 50ms, 100ms, 250ms, 500ms, 1s, 2.5s, 5s, 10s, +Inf]

When checking the result we might see output like this:

http_request_latency_seconds_bucket{method="GET",le="0.005"} 0
http_request_latency_seconds_bucket{method="GET",le="0.01"} 0
http_request_latency_seconds_bucket{method="GET",le="0.025"} 0
http_request_latency_seconds_bucket{method="GET",le="0.05"} 0
http_request_latency_seconds_bucket{method="GET",le="0.1"} 0
http_request_latency_seconds_bucket{method="GET",le="0.25"} 6
http_request_latency_seconds_bucket{method="GET",le="0.5"} 15
http_request_latency_seconds_bucket{method="GET",le="1.0"} 32
http_request_latency_seconds_bucket{method="GET",le="2.5"} 79
http_request_latency_seconds_bucket{method="GET",le="5.0"} 100
http_request_latency_seconds_bucket{method="GET",le="10.0"} 100
http_request_latency_seconds_bucket{method="GET",le="+Inf"} 100
http_request_latency_seconds_count{method="GET"} 100
http_request_latency_seconds_sum{method="GET"} 157.8138389516349

Each bucket shows how many observations fell into that range. For example, http_request_latency_seconds_bucket{method="GET",le="0.25"} shows that 6 requests took less than or equal to 250ms. The +Inf bucket captures all observations, so its count is the total number of observations.

4.4. Summary

A Summary is similar to a Histogram, but instead of using predefined buckets, it calculates quantiles to summarize the observed data. As a result, it becomes useful for tracking request latencies or response sizes. Furthermore, it helps us to determine key metrics like the median (50th percentile) or the 90th percentile:

Summary requestDuration = Summary.builder()
  .name("http_request_duration_seconds")
  .help("Tracks the duration of HTTP requests in seconds")
  .quantile(0.5, 0.05)
  .quantile(0.9, 0.01)
  .register();
for (int i = 0; i < 100; i++) {
    double duration = 0.05 + (2 * random.nextDouble());
    requestDuration.observe(duration);
}

We define two quantiles:

  • 0.5 (50th percentile) approximates the median, with a 5% error.
  • 0.9 (90th percentile) shows that 90% of the requests were faster than this value, with a 1% error.

When Prometheus scrapes the metrics, we’ll see output like this:

http_request_duration_seconds{quantile="0.5"} 1.3017345289221114
http_request_duration_seconds{quantile="0.9"} 1.8304437814581778
http_request_duration_seconds_count 100
http_request_duration_seconds_sum 110.5670284649691

The quantiles show the observed value at the 50th and 90th percentiles. In other words, 50% of requests took less than roughly 1.3 seconds, and 90% took less than roughly 1.8 seconds.

4.5. Info

An Info metric stores static labels about the application. It is used for version numbers, build information, or environment details. It is not a performance metric but a way to add informative metadata to the Prometheus output. For example, we can record the application version and build number:

Info appInfo = Info.builder()
  .name("app_info")
  .help("Application version information")
  .labelNames("version", "build")
  .register();
appInfo.addLabelValues("1.0.0", "12345");

4.6. StateSet

A StateSet metric represents multiple states that can be either active or inactive. It is useful when we need to track different operational states of our application or feature flag statuses:

StateSet stateSet = StateSet.builder()
  .name("feature_flags")
  .help("Feature flags")
  .labelNames("env")
  .states("feature1")
  .register();
stateSet.labelValues("dev").setFalse("feature1");

5. Overview of Prometheus Metric Types

The Prometheus Java client provides various metrics to capture different dimensions of our application’s performance and behavior. Below is a summary table that outlines the key features of each metric type, including their purpose and usage examples:

Metric Type | Description | Example Use Case
Counter | A metric that only increases over time, typically used for counting events | Counting the number of HTTP requests or errors
Gauge | Can increase or decrease; used for values that fluctuate over time | Tracking memory usage or the number of active threads
Histogram | Measures the distribution of values into configurable buckets | Observing request latencies or response sizes
Summary | Tracks the distribution of observations and calculates configurable quantiles | Measuring request duration or latency percentiles
Info | Stores static labels with metadata about the application | Capturing version or build information
StateSet | Tracks multiple operational states that can be active or inactive | Monitoring feature flag statuses

6. Conclusion

In this article, we’ve explored how to effectively use the Prometheus Java client to monitor applications by instrumenting custom and JVM metrics. First, we covered setting up your project using Maven dependencies. Consequently, we moved on to exposing metrics via an HTTP endpoint. Afterward, we discussed key metric types such as Counters, Gauges, Histograms, and Summaries, each serving a distinct purpose in tracking various performance indicators.

As always, the full implementation code of this article can be found over on GitHub.

       

AspectJ Pointcut For All Methods Inside Package


1. Overview

AspectJ is a powerful tool for handling cross-cutting concerns like logging, security, and transaction management in Java applications. A common use case is applying an aspect to all methods within a specific package. In this tutorial, we’ll learn to create a pointcut in AspectJ that matches all methods in a package, with step-by-step code examples.

To learn more about AspectJ, check out our comprehensive AspectJ tutorials.

2. Maven Dependencies

When running an AspectJ program, the classpath should contain the classes and aspects, along with the AspectJ runtime library aspectjrt:

<dependency>
    <groupId>org.aspectj</groupId> 
    <artifactId>aspectjrt</artifactId>
    <version>1.9.22.1</version>
</dependency>

In addition to the AspectJ runtime dependency, we’ll also need to include the aspectjweaver library to introduce advice to the Java class at load time:

<dependency>
    <groupId>org.aspectj</groupId>
    <artifactId>aspectjweaver</artifactId> 
    <version>1.9.22.1</version>
</dependency>

3. What is a Pointcut?

A pointcut in AspectJ is a core concept defining where an aspect should be applied in the code. Aspects manage cross-cutting concerns like logging, security, or transaction management. A pointcut specifies specific points, called join points, in the program’s execution where the aspect’s advice (or action) should run. These join points can be identified using different expressions, including method signatures, class names, or specific packages.

  • A join point is a specific moment in program execution where an aspect can be applied, such as a method call, method execution, object instantiation, or field access.
  • Advice is the action an aspect takes at a join point. It can run before (@Before), after (@After), or around (@Around) the join point.
  • A pointcut expression is a declaration that defines which join points should be matched. Its syntax can target method executions, field accesses, and more.

3.1. Pointcut Syntax

A pointcut expression usually has two key components: the type of join point and the signature pattern. The type of join point defines the event, including a method call, method execution, or constructor execution. The signature pattern identifies specific methods or fields using class, package, parameters, or return type criteria.

4. Pointcut Expression

To create a pointcut that matches all methods in a specific package, we can use the following expression:

execution(* com.baeldung.aspectj..*(..))

Here’s a breakdown of this expression:

  • execution: The pointcut designator, specifies that we’re targeting method execution.
  • *: A wildcard indicating any return type.
  • com.baeldung.aspectj..*: Matches any class within the com.baeldung.aspectj package and any sub-packages.
  • (..): Matches any method parameters.
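
By adjusting these parts, we can narrow the match. For instance, the following expression (shown purely as an illustration; it isn't used in the examples below) targets only public void methods of classes whose names end in Service:

execution(public void com.baeldung.aspectj..*Service.*(..))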

4.1. Logging Aspect for All Methods in a Package

Let’s create an example aspect that logs the execution of all methods within a package named com.baeldung.aspectj:

@Before("execution(* com.baeldung.aspectj..*(..))")
public void pointcutInsideAspectjPackage(JoinPoint joinPoint) {
    String methodName = joinPoint.getSignature().getName();
    String className = joinPoint.getTarget().getClass().getSimpleName();
    System.out.println(
        "Executing method inside aspectj package: " + className + "." + methodName
    );
}

The pointcut expression in @Before targets all methods within the com.baeldung.aspectj package and its sub-packages.
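
Note that, in a Spring-based setup like the one used in the tests below, this advice would typically live inside a bean annotated with @Aspect. Here's a minimal sketch, with the class name assumed:

@Aspect
@Component
public class PackageLoggingAspect {

    @Before("execution(* com.baeldung.aspectj..*(..))")
    public void pointcutInsideAspectjPackage(JoinPoint joinPoint) {
        // logging advice shown above
    }
}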

Let’s create UserService in the service package:

@Service
public class UserService {
    public void createUser(String name, int age) {
        System.out.println("Request to create user: " + name + " | age: " + age);
    }
    public void deleteUser(String name) {
        System.out.println("Request to delete user: " + name);
    }
}

When the UserService methods run, the aspect pointcutInsideAspectjPackage() will log both methods. Now, let’s test our code:

@Test
void testUserService() {
    userService.createUser("create new user john", 21);
    userService.deleteUser("john");
}

The aspect pointcutInsideAspectjPackage() should be invoked right before the createUser() and deleteUser() methods in the UserService class are executed:

Executing method inside aspectj package: UserService.createUser
Request to create user: create new user john | age: 21
Executing method inside aspectj package: UserService.deleteUser
Request to delete user: john

Next, let’s create another class named UserRepository in a different package called the repository package:

@Repository
public class UserRepository {
    public void createUser(String name, int age) {
        System.out.println("User: " + name + ", age:" + age + " is created.");
    }
    public void deleteUser(String name) {
        System.out.println("User: " + name + " is deleted.");
    }
}

When methods in the UserRepository class are executed, the aspect pointcutInsideAspectjPackage() will log both methods. Now, let’s test our code:

@Test
void testUserRepository() {
    userRepository.createUser("john", 21);
    userRepository.deleteUser("john");
}

The aspect pointcutInsideAspectjPackage() should be invoked right before the createUser() and deleteUser() methods in the UserRepository class are executed:

Executing method inside aspectj package: UserRepository.createUser
User: john, age:21 is created.
Executing method inside aspectj package: UserRepository.deleteUser
User: john is deleted.

4.2. Logging Aspect for All Methods in a Sub-Package

Let’s create an example aspect that logs the execution of all methods within a package named com.baeldung.aspectj.service:

@Before("execution(* com.baeldung.aspectj.service..*(..))")
public void pointcutInsideServicePackage(JoinPoint joinPoint) {
    String methodName = joinPoint.getSignature().getName();
    String className = joinPoint.getTarget().getClass().getSimpleName();
    System.out.println(
        "Executing method inside service package: " + className + "." + methodName
    );
}

The pointcut expression inside the @Before annotation (execution(* com.baeldung.aspectj.service..*(..))) matches all methods within the com.baeldung.aspectj.service package.

Next, let’s create another class named MessageService in the service package to provide additional test cases:

@Service
public class MessageService {
    public void sendMessage(String message) {
        System.out.println("sending message: " + message);
    }
    public void receiveMessage(String message) {
        System.out.println("receiving message: " + message);
    }
}

When any method of MessageService is executed, the aspect pointcutInsideServicePackage() will log both methods. Now, let’s test our code:

@Test
void testMessageService() {
    messageService.sendMessage("send message from user john");
    messageService.receiveMessage("receive message from user john");
}

Both the previously defined pointcutInsideAspectjPackage() and the new pointcutInsideServicePackage() should be invoked right before the sendMessage() and receiveMessage() methods in the MessageService class are called:

Executing method inside aspectj package: MessageService.sendMessage
Executing method inside service package: MessageService.sendMessage 
sending message: send message from user john
Executing method inside aspectj package: MessageService.receiveMessage
Executing method inside service package: MessageService.receiveMessage
receiving message: receive message from user john

4.3. Logging Aspect by Excluding a Specific Package

Let’s create an example aspect that will exclude the execution of a specific package named com.baeldung.aspectj.service:

@Before("execution(* com.baeldung.aspectj..*(..)) && !execution(* com.baeldung.aspectj.repository..*(..))")
public void pointcutWithoutSubPackageRepository(JoinPoint joinPoint) {
    String methodName = joinPoint.getSignature().getName();
    String className = joinPoint.getTarget().getClass().getSimpleName();
    System.out.println(
        "Executing method without sub-package repository: " + className + "." + methodName
    );
}

The pointcut expression inside the @Before annotation (execution(* com.baeldung.aspectj..*(..)) && !execution(* com.baeldung.aspectj.repository..*(..))) matches all methods within the com.baeldung.aspectj package and its sub-packages, excluding the repository sub-package.

Now, let’s re-run our previous unit test. The aspect named pointcutWithoutSubPackageRepository() should be invoked right before all methods in the aspectj package while excluding the repository sub-package in this case:

Executing method inside aspectj package: UserService.createUser
Executing method inside service package: UserService.createUser
Executing method without sub-package repository: UserService.createUser
Request to create user: create new user john | age: 21
Executing method inside aspectj package: UserService.deleteUser
Executing method inside service package: UserService.deleteUser
Executing method without sub-package repository: UserService.deleteUser
Request to delete user: john
Executing method inside aspectj package: UserRepository.createUser
User: john, age:21 is created.
Executing method inside aspectj package: UserRepository.deleteUser
User: john is deleted.
Executing method inside aspectj package: MessageService.sendMessage
Executing method inside service package: MessageService.sendMessage
Executing method without sub-package repository: MessageService.sendMessage
sending message: send message from user john
Executing method inside aspectj package: MessageService.receiveMessage
Executing method inside service package: MessageService.receiveMessage
Executing method without sub-package repository: MessageService.receiveMessage
receiving message: receive message from user john

5. Conclusion

In this tutorial, we learned that a pointcut in AspectJ is a powerful tool for specifying exactly where aspect advice should be applied (such as to methods, classes, or fields).

Creating a pointcut to target all methods within the main package or a specific package is straightforward with AspectJ. We can also exclude certain packages if needed.

This approach is useful for applying the same logic, such as logging or security checks, across multiple classes and methods without duplicating code. By defining a pointcut for the desired package, we can keep our code clean and easy to maintain.

The code examples are available over on GitHub.

       

Introduction to Jakarta Persistence 3.2


1. Introduction

Jakarta Persistence (formerly JPA) is the standard API for object-relational mapping in Java. It enables developers to manage relational data in Java applications and simplifies database interactions by mapping Java objects to database tables using annotations and entity classes.

In this tutorial, we’ll explore some of the key new features introduced in Jakarta Persistence 3.2, highlighting improvements in configuration, performance, and usability.

2. What is Jakarta Persistence 3.2?

Jakarta Persistence 3.2 is the latest version of the Jakarta Persistence API, which provides a standardized approach for object-relational mapping (ORM) in Java applications.

This version introduces improvements in query capabilities, performance, usability, and enhanced support for modern database features.

To add support for Jakarta Persistence 3.2, we must add the following Maven dependency to our pom.xml:

<dependency>
    <groupId>jakarta.persistence</groupId>
    <artifactId>jakarta.persistence-api</artifactId>
    <version>3.2.0</version>
</dependency>

Additionally, we need the latest Hibernate 7 version, which supports this API:

<dependency>
    <groupId>org.hibernate.orm</groupId>
    <artifactId>hibernate-core</artifactId>
    <version>7.0.0.Beta1</version>
</dependency>

3. Key New Features

Jakarta Persistence 3.2 introduces a few new features to improve database connection handling, schema configuration, and transaction management.

3.1. Persistence Configuration

The latest Jakarta Persistence 3.2 version adds a programmatic API to obtain an instance of the EntityManagerFactory interface using the PersistenceConfiguration class instead of the traditional persistence.xml file. This provides flexibility, especially in environments where runtime configurations may vary.

To demonstrate the new features and enhancements, let’s create the Employee entity class with a few fields like id, fullName, and department:

@Entity
public class Employee {
    @Id
    private Long id;
    private String fullName;
    private String department;
    // getters and setters ...
}

Here, the @Entity annotation indicates that the Employee class is a persistent entity and the @Id annotation marks the id field as the primary key.

Now, let’s programmatically configure an instance of the EntityManagerFactory class using the newly introduced PersistenceConfiguration class:

EntityManagerFactory emf = new PersistenceConfiguration("EmployeeData")
  .jtaDataSource("java:comp/env/jdbc/EmployeeData")
  .managedClass(Employee.class)
  .property(PersistenceConfiguration.LOCK_TIMEOUT, 5000)
  .createEntityManagerFactory();
assertNotNull(emf);

We create the instance of the EntityManagerFactory by setting up the data source, registering the entity class, and configuring properties like lock timeouts.

3.2. Schema Manager API

The new version of Jakarta Persistence also introduces the Schema Manager API, allowing developers to manage schema programmatically. This simplifies database migrations and schema validation in both development and production environments.

For instance, we can now enable schema creation using the API:

emf.getSchemaManager().create(true);

In total, there are four functions available for schema management:

  • create(): creates the tables associated with entities in the persistence unit
  • drop(): drops tables associated with entities in the persistence unit
  • validate(): validates the schema against the entity mappings
  • truncate(): clears data from tables related to entities
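
For illustration, the remaining operations can be called on the same SchemaManager instance. Here's a minimal sketch that reuses the emf created earlier; note that validate() reports a mismatch by throwing an exception:

SchemaManager schemaManager = emf.getSchemaManager();
try {
    schemaManager.validate(); // compare the database schema with the entity mappings
} catch (Exception e) {
    // the schema doesn't match the mappings
}
schemaManager.truncate();     // clear data from the mapped tables
schemaManager.drop(true);     // drop the mapped tables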

3.3. Run/Call in Transaction

There are now new methods like runInTransaction() and callInTransaction() to improve the handling of database transactions by providing an application-managed EntityManager with an active transaction.

With these methods, we can perform operations within a transaction scope and access the underlying database connection when necessary.

We can use these methods to run a query within a transaction and operate directly on the database connection:

emf.runInTransaction(em -> em.runWithConnection(connection -> {
    try (var stmt = ((Connection) connection).createStatement()) {
        stmt.execute(
          "INSERT INTO employee (id, fullName, department) VALUES (8, 'Jane Smith', 'HR')"
        );
    } catch (Exception e) {
        Assertions.fail("JDBC operation failed");
    }
}));
var employee = emf.callInTransaction(em -> em.find(Employee.class, 8L));
assertNotNull(employee);
assertEquals("Jane Smith", employee.getFullName());

First, we’ve executed SQL to insert a new employee into the database within a transaction using the runInTransaction(). Then, the callInTransaction() method retrieves and verifies the inserted employee’s details.

3.4. TypedQueryReference Interface

Named queries are usually referenced by strings in Jakarta Persistence, making them prone to errors such as typos in the query name.

The newly introduced TypedQueryReference interface aims to solve this by linking named queries to the static metamodel, thus making them type-safe and discoverable at compile-time.

Let’s update our Employee entity with a named query to search using the department field:

@Entity
@NamedQuery(
  name = "Employee.byDepartment",
  query = "FROM Employee WHERE department = :department",
  resultClass = Employee.class
)
public class Employee {
// ...
}

Once compiled, the corresponding static metamodel would be generated as follows:

@StaticMetamodel(Employee.class)
@Generated("org.hibernate.processor.HibernateProcessor")
public abstract class Employee_ {
    public static final String QUERY_EMPLOYEE_BY_DEPARTMENT = "Employee.byDepartment";
    public static final String FULL_NAME = "fullName";
    public static final String ID = "id";
    public static final String DEPARTMENT = "department";
    // ...
}

Now, we can use the QUERY_EMPLOYEE_BY_DEPARTMENT constant to refer to the named query byDepartment defined on the Employee entity:

Map<String, TypedQueryReference> namedQueries = emf.getNamedQueries(Employee.class);
List employees = em.createQuery(namedQueries.get(QUERY_EMPLOYEE_BY_DEPARTMENT))
  .setParameter("department", "Science")
  .getResultList();
assertEquals(1, employees.size());

In the code snippet, we can observe that the getNamedQueries() method of EntityManagerFactory returns a map of the named query and its TypedQueryReference. Then, we used the EntityManager‘s createQuery() method to get the employees from the Science department and assert that the list contains exactly one result, confirming the query’s expected output.

Therefore, the TypedQueryReference interface ensures that the named query exists and is correctly referenced, providing compile-time validation.

3.5. Type-safety in EntityGraph

Jakarta Persistence’s entity graphs allow the eager loading of properties when executing a query.

Now, with the new version of Jakarta Persistence, they are type-safe too – ensuring properties referenced in the graph are valid and exist at compile time, reducing the risk of errors.

For example, let’s use the static metamodel Employee_ to ensure type safety at compile time:

var employeeGraph = emf.callInTransaction(em -> em.createEntityGraph(Employee.class));
employeeGraph.addAttributeNode(Employee_.department);
var employee = emf.callInTransaction(em -> em.find(employeeGraph, 7L));
assertNotNull(employee);
assertEquals("Engineering", employee.getDepartment());

Here, the department property is accessed from the static metamodel class, which validates that it exists in the Employee class; if we got the property name wrong, we'd get a compilation error.

4. Usability Enhancements

Jakarta Persistence 3.2 introduces several performance and usability enhancements to simplify database queries and improve overall application performance.

4.1. Streamlined JPQL

A streamlined query syntax is now supported in JPQL; it's the same syntax commonly used in the Jakarta Data Query Language, a subset of JPQL.

For example, when an entity doesn’t specify an alias, it automatically defaults to the associated table:

Employee employee = emf.callInTransaction(em -> 
  em.createQuery("from Employee where fullName = 'Tony Blair'", Employee.class).getSingleResult()
);
assertNotNull(employee);

Here, we didn’t specify an alias for the Employee entity. Instead, the alias defaults to this, allowing us to perform operations directly on the entity without needing to qualify field names.

4.2. cast() Function

The new cast() method in the Jakarta Persistence allows us to cast query results:

emf.runInTransaction(em -> em.persist(new Employee(11L, "123456", "Art")));
TypedQuery<Integer> query = em.createQuery(
  "select cast(e.fullName as integer) from Employee e where e.id = 11", Integer.class
);
Integer result = query.getSingleResult();
assertEquals(123456, result);

In this example, we first insert a new Employee record with 123456 as the value for the fullName. Then, using a JPQL query, we cast the String property fullName to an Integer.

4.3. left() and right() Functions

Next, JPQL also provides string manipulation functions like left(), which extracts a given number of characters from the start of a string:

TypedQuery<String> query = em.createQuery(
  "select left(e.fullName, 3) from Employee e where e.id = 2", String.class
);
String result = query.getSingleResult();
assertEquals("Tom", result);

Here, we’ve extracted the substring Tom from the left of the fullName using the JPQL functions left().

Similarly, it also provides the right() method for substring extraction:

query = em.createQuery("select right(e.fullName, 6) from Employee e where e.id = 2", String.class);
result = query.getSingleResult();
assertEquals("Riddle", result);

So, as demonstrated, we’ve extracted the substring Riddle from the right of the fullName.

4.4. replace() Function

Similarly, the replace() function is also available in JPQL now, allowing us to replace part of the String:

TypedQuery<String> query = em.createQuery(
  "select replace(e.fullName, 'Jade', 'Jane') from Employee e where e.id = 4", String.class
);
String result = query.getSingleResult();
assertEquals("Jane Gringer", result);

Here, the replace() function has replaced the occurrence of Jade with the new String value Jane in the fullName property.

4.5. id() Function

Additionally, the new id() method lets us extract the identifier of the database record:

TypedQuery<Long> query = em.createQuery(
  "select id(e) from Employee e where e.fullName = 'John Smith'", Long.class
);
Long result = query.getSingleResult();
assertEquals(1L, result);

The id() function fetches the primary key of the Employee record matching the fullName to John Smith.

4.6. Improved Sorting

Finally, sorting improvements for Jakarta Persistence 3.2 add null-first and case-insensitive ordering using scalar expressions like lower() and upper():

emf.runInTransaction(em -> {
    em.persist(new Employee(21L, "alice", "HR"));
    em.persist(new Employee(22L, "Bob", "Engineering"));
    em.persist(new Employee(23L, null, "Finance"));
    em.persist(new Employee(24L, "charlie", "HR"));
});
TypedQuery<Employee> query = em.createQuery(
  "SELECT e FROM Employee e ORDER BY lower(e.fullName) ASC NULLS FIRST, e.id DESC", Employee.class
);
List<Employee> sortedEmployees = query.getResultList();
assertNull(sortedEmployees.get(0).getFullName());
assertEquals("alice", sortedEmployees.get(1).getFullName());
assertEquals("Bob", sortedEmployees.get(2).getFullName());
assertEquals("charlie", sortedEmployees.get(3).getFullName());

In this example, we’ve sorted the Employee records by fullName in case-insensitive ascending order (using the lower() function), with null values first, and by the id descending.

5. Conclusion

In this article, we’ve discussed the latest Jakarta Persistence 3.2 with a host of new features and improvements to streamline ORM operations and efficient data handling.

We covered features including simplified persistence configuration, programmatic schema management, and transaction management. Then, we explored usability improvements to JPQL providing additional functions to write better queries.

The complete code for this article is available over on GitHub.

       

Introduction to S3proxy


1. Overview

Amazon S3 has cemented itself as the most widely used cloud storage backend due to its scalability, durability, and extensive feature set. This is evident from the fact that many other storage backends aim to be compatible with the S3 API, which is the programming interface used to interact with Amazon S3.

However, applications that rely on the S3 API may face challenges when migrating to alternative storage backends that are not fully compatible. This can lead to significant development effort and vendor lock-in.

This is where S3Proxy comes to the rescue. S3Proxy is an open-source library that addresses the above challenge by providing a compatibility layer between the S3 API and various storage backends. It allows us to seamlessly interact with different storage backends using the already familiar S3 API, without the need for extensive modifications.

In this tutorial, we’ll explore how to integrate S3Proxy in a Spring Boot application and configure it to work with Azure Blob Storage and Google Cloud Storage. We’ll also look at how to set up a file system as a storage backend for local development and testing.

2. How S3Proxy Works

Before we dive into the implementation, let’s take a closer look at how S3Proxy works.

S3Proxy sits between the application and the storage backend, acting as a proxy server. When the application sends a request using the S3 API, S3Proxy intercepts the request and translates it into the corresponding API call for the configured storage backend. Similarly, the response from the storage backend is translated back into the S3 format and returned to the application.

Diagram showing how S3Proxy works to translate S3 API calls to other storage backends.

S3Proxy runs using an embedded Jetty server and handles the translation process using Apache jclouds, a multi-cloud toolkit, to interact with various storage backends.

3. Setting up the Project

Before we can use S3Proxy to access various storage backends, we’ll need to include the necessary SDK dependencies and configure our application correctly.

3.1. Dependencies

Let’s start by adding the necessary dependencies to our project’s pom.xml file:

<dependency>
    <groupId>org.gaul</groupId>
    <artifactId>s3proxy</artifactId>
    <version>2.3.0</version>
</dependency>
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.28.23</version>
</dependency>

The S3Proxy dependency provides us with the proxy server and the necessary Apache jclouds components that we’ll configure later in the tutorial.

Meanwhile, the Amazon S3 dependency provides us with the S3Client class, a Java wrapper around the S3 API.

3.2. Defining Cloud-Agnostic Storage Properties

Now, we’ll define a set of cloud-agnostic storage properties that can be used across different storage backends.

We’ll store these properties in our project’s application.yaml file and use @ConfigurationProperties to map the values to a POJO, which we’ll reference when defining our jclouds components and S3Client bean:

@ConfigurationProperties(prefix = "com.baeldung.storage")
class StorageProperties {
    private String identity;
    private String credential;
    private String region;
    private String bucketName;
    private String proxyEndpoint;
    // standard setters and getters
}

The above properties represent the common configuration parameters required by most storage backends, such as the security credentials, region, and bucket name. In addition, we also declare the proxyEndpoint property, which specifies the URL where our embedded S3Proxy server will be running.

Let’s have a look at a snippet of our application.yaml file that defines the required properties that’ll be mapped to our StorageProperties class automatically:

com:
  baeldung:
    storage:
      identity: ${STORAGE_BACKEND_IDENTITY}
      credential: ${STORAGE_BACKEND_CREDENTIAL}
      region: ${STORAGE_BACKEND_REGION}
      bucket-name: ${STORAGE_BACKEND_BUCKET_NAME}
      proxy-endpoint: ${S3PROXY_ENDPOINT}

We use the ${} property placeholder to load the values of our properties from environment variables.

Accordingly, this setup allows us to externalize our backend storage properties and easily access them in our application.
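
For the binding to work, Spring Boot also needs to pick up the StorageProperties class, for example via @EnableConfigurationProperties or @ConfigurationPropertiesScan. Here's a minimal sketch; the configuration class shown is an assumption rather than part of the original setup:

@Configuration
@EnableConfigurationProperties(StorageProperties.class)
class StoragePropertiesConfig {
    // no additional beans needed; this only registers the properties class
}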

3.3. Initializing S3Proxy at Application Startup

To ensure that the embedded S3Proxy server is up and running when our application starts, we’ll create an S3ProxyInitializer class that implements the ApplicationRunner interface:

@Component
class S3ProxyInitializer implements ApplicationRunner {
    private final S3Proxy s3Proxy;
    // standard constructor
    @Override
    public void run(ApplicationArguments args) throws Exception {
        s3Proxy.start();
    }
}

Using constructor injection, we inject an instance of S3Proxy and use it to start the embedded proxy server inside the run() method.

It’s important to note that we haven’t yet created a bean of the S3Proxy class; we’ll do that in the next section.
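Optionally, since S3Proxy also exposes a stop() method, we could add a @PreDestroy hook to the same initializer so that the embedded server shuts down cleanly with the application context; a minimal sketch:

@PreDestroy
void stopProxy() {
    try {
        // stop the embedded proxy server when the application shuts down
        s3Proxy.stop();
    } catch (Exception e) {
        // wrap the checked exception, since @PreDestroy methods shouldn't declare one
        throw new IllegalStateException("Failed to stop S3Proxy", e);
    }
}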

4. Accessing Azure Blob Storage

Now, to access Azure Blob Storage using S3Proxy, we’ll create a StorageConfiguration class and inject our cloud-agnostic StorageProperties that we created earlier. We’ll define all the necessary beans in this new class.
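The bean definitions in the following snippets all live inside this configuration class. As a minimal sketch (the exact class layout is our own, assuming constructor injection of the properties), it might look like this:

@Configuration
@EnableConfigurationProperties(StorageProperties.class)
public class StorageConfiguration {

    private final StorageProperties storageProperties;

    public StorageConfiguration(StorageProperties storageProperties) {
        this.storageProperties = storageProperties;
    }

    // BlobStore, S3Proxy, and S3Client beans are defined below
}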

First, let’s start by creating a BlobStore bean. This bean represents the underlying storage backend that we’ll be interacting with:

@Bean
public BlobStore azureBlobStore() {
    return ContextBuilder
      .newBuilder("azureblob")
      .credentials(storageProperties.getIdentity(), storageProperties.getCredential())
      .build(BlobStoreContext.class)
      .getBlobStore();
}

We use Apache jclouds’ ContextBuilder to create a BlobStoreContext instance configured with the azureblob provider. Then, we obtain the BlobStore instance from this context.

We also pass the security credentials from our injected StorageProperties instance. For Azure Blob Storage, the name of the storage account will be our identity, and its corresponding access key will be our credential.

With our BlobStore configured, let’s define the S3Proxy bean:

@Bean
public S3Proxy s3Proxy(BlobStore blobStore) {
    return S3Proxy
      .builder()
      .blobStore(blobStore)
      .endpoint(URI.create(storageProperties.getProxyEndpoint()))
      .build();
}

We create our S3Proxy bean using the BlobStore instance and the proxyEndpoint configured in our application.yaml file. This bean is responsible for translating the S3 API calls to the underlying storage backend.

Finally, let’s create our S3Client bean:

@Bean
public S3Client s3Client() {
    S3Configuration s3Configuration = S3Configuration
      .builder()
      .checksumValidationEnabled(false)
      .build();
    AwsCredentials credentials = AwsBasicCredentials.create(
        storageProperties.getIdentity(),
        storageProperties.getCredential()
    );
    return S3Client
      .builder()
      .region(Region.of(storageProperties.getRegion()))
      .endpointOverride(URI.create(storageProperties.getProxyEndpoint()))
      .credentialsProvider(StaticCredentialsProvider.create(credentials))
      .serviceConfiguration(s3Configuration)
      .build();
}

We should note that we disable checksum validation in the S3Configuration. This is necessary because Azure returns a non-MD5 ETag, which would cause an error when using the default configuration.

In this tutorial, for simplicity, we’ll use the same S3Client bean for the other storage backends as well. However, if we’re not using Azure Blob Storage, we can remove this configuration.

With these beans in place, our application can now interact with Azure Blob Storage using the familiar S3 API.
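To illustrate how application code might consume this setup, here’s a hypothetical FileStorageService sketch (the service class and its save() method are our own, not part of S3Proxy or the AWS SDK):

@Service
public class FileStorageService {

    private final S3Client s3Client;
    private final StorageProperties storageProperties;

    public FileStorageService(S3Client s3Client, StorageProperties storageProperties) {
        this.s3Client = s3Client;
        this.storageProperties = storageProperties;
    }

    public void save(String key, byte[] content) {
        // the S3 request hits the embedded S3Proxy, which forwards it to Azure Blob Storage
        s3Client.putObject(request -> request
            .bucket(storageProperties.getBucketName())
            .key(key),
          RequestBody.fromBytes(content));
    }
}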

5. Accessing GCP Cloud Storage

Now, to access Google Cloud Storage, we’ll only need to make changes to our BlobStore bean.

First, let’s create a new BlobStore bean for Google Cloud Storage. We’ll use Spring profiles to conditionally create either the Azure or GCP BlobStore bean based on the active profile:

@Bean
@Profile("azure")
public BlobStore azureBlobStore() {
    // ... same as above
}
@Bean
@Profile("gcp")
public BlobStore gcpBlobStore() {
    return ContextBuilder
      .newBuilder("google-cloud-storage")
      .credentials(storageProperties.getIdentity(), storageProperties.getCredential())
      .build(BlobStoreContext.class)
      .getBlobStore();
}

Here, we create a BlobStore instance using the google-cloud-storage provider when the gcp profile is active.

For Google Cloud Storage, the identity will be our service account’s email address, and the credential will be the corresponding RSA private key.

With this configuration change, our application can now interact with Google Cloud Storage using the S3 API.
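Which BlobStore bean gets created is driven by the active Spring profile. For example, to target Google Cloud Storage, we could activate the gcp profile in application.yaml (or via the SPRING_PROFILES_ACTIVE environment variable):

spring:
  profiles:
    active: gcp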

6. Local Development and Testing Using File System

For local development and testing, it’s often convenient to use the local file system as the storage backend. Let’s see how we can configure S3Proxy to work with it.

6.1. Setting up Our Local Configuration

First, let’s add a new property to our StorageProperties class to specify the base directory for our local file system storage:

private String localFileBaseDirectory;
// standard setters and getters

Next, we’ll create a new LocalStorageConfiguration class. We’ll use @Profile to activate this class for the local and test profiles. In this class, we’ll update our beans as needed to work with the local file system:

@Configuration
@Profile("local | test")
@EnableConfigurationProperties(StorageProperties.class)
public class LocalStorageConfiguration {
    
    private final StorageProperties storageProperties;
    // standard constructor
    
    @Bean
    public BlobStore blobStore() {
        Properties properties = new Properties();
        String fileSystemDir = storageProperties.getLocalFileBaseDirectory();
        properties.setProperty("jclouds.filesystem.basedir", fileSystemDir);
        return ContextBuilder
          .newBuilder("filesystem")
          .overrides(properties)
          .build(BlobStoreContext.class)
          .getBlobStore();
    }
    @Bean
    public S3Proxy s3Proxy(BlobStore blobStore) {
        return S3Proxy
          .builder()
          .awsAuthentication(AuthenticationType.NONE, null, null)
          .blobStore(blobStore)
          .endpoint(URI.create(storageProperties.getProxyEndpoint()))
          .build();
    }
}

Here, we create a BlobStore bean using the filesystem provider and configure our base directory.

Then, we create an S3Proxy bean for our file system BlobStore. Notice that we set the authentication type to NONE since we don’t need any authentication for local file system storage.

Finally, let’s create a simplified S3Client bean that doesn’t require any credentials:

@Bean
public S3Client s3Client() {
    return S3Client
      .builder()
      .region(Region.US_EAST_1)
      .endpointOverride(URI.create(storageProperties.getProxyEndpoint()))
      .build();
}

In the above, we hardcode the US_EAST_1 region; however, the region selection doesn’t really matter for this configuration.

With this setup, our application is now configured to use the local file system as its storage backend. This eliminates the need to connect to a real cloud storage service, which reduces cost and speeds up our development and testing cycles.

6.2. Testing Interactions With S3Client

Now, let’s write a test to verify that we can, in fact, use the S3Client to interact with our local file system storage.

We’ll start by defining the necessary properties in our application-local.yaml file:

com:
  baeldung:
    storage:
      proxy-endpoint: http://127.0.0.1:8080
      bucket-name: baeldungbucket
      local-file-base-directory: tmp-store

Next, let’s set up our test class:

@SpringBootTest
@TestInstance(Lifecycle.PER_CLASS)
@ActiveProfiles({ "local", "test" })
@EnableConfigurationProperties(StorageProperties.class)
class LocalFileSystemStorageIntegrationTest {
    @Autowired
    private S3Client s3Client;
    @Autowired
    private StorageProperties storageProperties;
    @BeforeAll
    void setup() {
        File directory = new File(storageProperties.getLocalFileBaseDirectory());
        directory.mkdir();
        String bucketName = storageProperties.getBucketName();
        try {
            s3Client.createBucket(request -> request.bucket(bucketName));
        } catch (BucketAlreadyOwnedByYouException exception) {
            // do nothing
        }
    }
    
    @AfterAll
    void teardown() throws IOException {
        File directory = new File(storageProperties.getLocalFileBaseDirectory());
        FileUtils.forceDelete(directory);
    }
}

In our setup() method, annotated with @BeforeAll, we create the base directory and the bucket if they don’t exist. In our teardown() method, we delete the base directory to clean up after our tests.
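As a side note, the teardown relies on FileUtils.forceDelete() from Apache Commons IO; if it isn’t already on the test classpath, a dependency along these lines (the version is just an example) would be needed:

<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.16.1</version>
    <scope>test</scope>
</dependency>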

Finally, let’s write a test to verify that we can upload a file using the S3Client class:

@Test
void whenFileUploaded_thenFileSavedInFileSystem() throws IOException {
    // Prepare test file to upload
    String key = RandomString.make(10) + ".txt";
    String fileContent = RandomString.make(50);
    MultipartFile fileToUpload = createTextFile(key, fileContent);
    
    // Save file to file system
    s3Client.putObject(request -> 
        request
          .bucket(storageProperties.getBucketName())
          .key(key)
          .contentType(fileToUpload.getContentType()),
        RequestBody.fromBytes(fileToUpload.getBytes()));
    
    // Verify that the file is saved successfully by checking if it exists in the file system
    List<S3Object> savedObjects = s3Client.listObjects(request -> 
        request.bucket(storageProperties.getBucketName())
    ).contents();
    assertThat(savedObjects)
      .anyMatch(savedObject -> savedObject.key().equals(key));
}
private MultipartFile createTextFile(String fileName, String content) throws IOException {
    byte[] fileContentBytes = content.getBytes();
    InputStream inputStream = new ByteArrayInputStream(fileContentBytes);
    return new MockMultipartFile(fileName, fileName, "text/plain", inputStream);
}

In our test method, we first prepare a MultipartFile with a random name and random content. We then use the S3Client to upload this file to our test bucket.

Finally, we verify that the file was saved successfully by listing all objects in the bucket and asserting that the file with the random key is present.
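As an optional follow-up, we could also read the object back through the proxy and compare its content; a short sketch that builds on the same test, reusing its key and fileContent variables:

String savedContent = s3Client
  .getObjectAsBytes(request -> request
    .bucket(storageProperties.getBucketName())
    .key(key))
  .asUtf8String();
assertThat(savedContent).isEqualTo(fileContent);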

7. Conclusion

In this article, we’ve explored integrating S3Proxy in our Spring Boot application.

We walked through the necessary configurations and set up cloud-agnostic storage properties to use across different storage backends.

Then, we looked at how we can access Azure Blob Storage and GCP Cloud Storage using the Amazon S3 API.

Finally, we set up an environment using a file system for local development and testing.

As always, all the code examples used in this article are available over on GitHub.

       