Overriding Spring Boot Managed Dependency Versions

1. Introduction

Spring Boot is an excellent framework for quickly starting new projects. One of the ways it helps developers quickly create new applications is by defining a set of dependencies suitable for most users.

However, in some cases, it may be necessary to override one or more dependency versions.

In this tutorial, we'll look at how to override Spring Boot managed dependencies and their versions.

2. Spring Boot Bill of Materials (BOM)

Let's start by looking at how Spring Boot manages dependencies. In short, Spring Boot uses a Bill of Materials (BOM) to define dependencies and versions.

Most Spring Boot projects inherit from the spring-boot-starter-parent artifact, which itself inherits from the spring-boot-dependencies artifact. This latter artifact is the Spring Boot BOM, which is just a Maven POM file with a large dependencyManagement section:

<dependencyManagement>
    <dependencies>
        <dependency>
            ...
        </dependency>
        <dependency>
            ...
        </dependency>
    </dependencies>
</dependencyManagement>

By using Maven's dependencyManagement, the BOM can specify default library versions should our application choose to use them. Let's look at an example.

One of the entries in the Spring Boot BOM is as follows:

<dependency>
    <groupId>org.apache.activemq</groupId>
    <artifactId>activemq-amqp</artifactId>
    <version>${activemq.version}</version>
</dependency>

This means any artifact in the project that depends on ActiveMQ will get this version by default.

Also, notice that the version is specified using a property placeholder. This is a common practice in the Spring Boot BOM, which defines the values for these placeholders inside its own properties section.

3. Overriding Spring Boot Managed Dependency Versions

Now that we understand how Spring Boot manages dependency versions, let's look at how we can override them.

3.1. Maven

For Maven, we have two options for overriding a Spring Boot managed dependency. First, for any dependency where the Spring Boot BOM specifies the version with a property placeholder, we simply need to set that property in our project POM:

<properties>
    <activemq.version>5.16.3</activemq.version>
</properties>

This would cause any dependency that uses the activemq.version property to use our specified version instead of the one in the Spring Boot BOM.

Additionally, if the version is specified directly within the dependency tag in the BOM rather than as a placeholder, then we can simply override it in our project's dependency entry:

<dependency>
    <groupId>org.apache.activemq</groupId>
    <artifactId>activemq-amqp</artifactId>
    <version>5.16.3</version>
</dependency>

3.2. Gradle

Gradle requires a plugin to honor dependency management from the Spring Boot BOM. Therefore, to get started, we have to include the plugin and import the BOM:

apply plugin: "io.spring.dependency-management"
dependencyManagement {
  imports {
    mavenBom 'org.springframework.boot:spring-boot-dependencies:2.5.5'
  }
}

Now, if we want to override a specific version of a dependency, we just need to specify the corresponding property from the BOM as a Gradle ext property:

ext['activemq.version'] = '5.16.3'

And if there is no property in the BOM to override, we can always specify the version directly when we declare the dependency:

compile 'org.apache.activemq:activemq-amqp:5.16.3'

3.3. Caveats

Several caveats are worth mentioning here.

For starters, it's important to remember that Spring Boot is built and tested using the library versions specified in its BOM. Any time we specify a different library version, there is a risk of introducing an incompatibility. Therefore, it's essential to test our applications whenever we deviate from the standard dependency versions.

Also, remember that these tips only apply when we use the Spring Boot Bill of Materials (BOM). For Maven, this means using the Spring Boot parent. And for Gradle, this means using the Spring dependencies plugin.

4. Finding Dependency Versions

We've seen how Spring Boot manages dependency versions and how we can override them. In this section, we'll look at how we can find the version of a library our project is using. This is useful for identifying library versions and confirming that any overrides we apply to a project are being honored. 

4.1. Maven

Maven provides a goal that we can use to display a list of all dependencies and their versions. For example, if we run the command:

mvn dependency:tree

We should see output similar to:

[INFO] com.baeldung:dependency-demo:jar:0.0.1-SNAPSHOT
[INFO] +- org.springframework.boot:spring-boot-starter-web:jar:2.5.7-SNAPSHOT:compile
[INFO] |  +- org.springframework.boot:spring-boot-starter:jar:2.5.7-SNAPSHOT:compile
[INFO] |  |  +- org.springframework.boot:spring-boot:jar:2.5.7-SNAPSHOT:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-autoconfigure:jar:2.5.7-SNAPSHOT:compile
[INFO] |  |  +- org.springframework.boot:spring-boot-starter-logging:jar:2.5.7-SNAPSHOT:compile
[INFO] |  |  |  +- ch.qos.logback:logback-classic:jar:1.2.6:compile
[INFO] |  |  |  |  \- ch.qos.logback:logback-core:jar:1.2.6:compile

The output shows all artifacts and versions that are dependencies of the project. These dependencies are presented in a tree structure, making it easy to identify how every artifact is imported into the project.

In the example above, the logback-classic artifact is a dependency of the spring-boot-starter-logging library, which itself is a dependency of the spring-boot-starter module. Thus, we can navigate up the tree back to our top-level project.

4.2. Gradle

Gradle provides a task that generates a similar dependency tree. For example, if we run the command:

gradle dependencies

We will get output similar to:

compileClasspath - Compile classpath for source set 'main'.
\--- org.springframework.boot:spring-boot-starter-web -> 1.3.8.RELEASE
     +--- org.springframework.boot:spring-boot-starter:1.3.8.RELEASE
     |    +--- org.springframework.boot:spring-boot:1.3.8.RELEASE
     |    |    +--- org.springframework:spring-core:4.2.8.RELEASE
     |    |    \--- org.springframework:spring-context:4.2.8.RELEASE
     |    |         +--- org.springframework:spring-aop:4.2.8.RELEASE

Just like the Maven output, we can easily identify why each artifact is being pulled into the project, along with the version being used.

5. Conclusion

In the article, we have learned how Spring Boot manages dependency versions. We also saw how to override those dependency versions in both Maven and Gradle. Finally, we saw how we could verify dependency versions in both project types.

What Is the --release Option in the Java 9 Compiler?

1. Overview

In this tutorial, we'll learn about Java 9's new command-line option --release. The Java compiler running with the --release N option automatically generates class files compatible with Java version N. We'll discuss how this option relates to the existing compiler command-line options -source and -target.

2. Need for the --release Option

To understand the need for a --release option, let us consider a scenario where we need to compile our code with Java 8 and want the compiled classes to be compatible with Java 7.

It was possible to achieve this before Java 9 by using the -source and -target options, where

  • -source: specifies the Java version accepted by the compiler
  • -target: specifies the Java version of the class files to produce

Suppose the compiled program uses APIs exclusively available in the current version of the platform, in our case, Java 8. In that case, the compiled program cannot run on earlier versions like Java 7, regardless of the values passed to the -source and -target options.

Furthermore, to target Java versions 8 and below, we would also need to add the -bootclasspath option along with -source and -target.

To streamline this cross-compilation problem, Java 9 introduced the new --release option.

3. Relationship With -source and -target Options

According to the JDK definition, --release N can be expanded as:

  • for N < 9, -source N -target N -bootclasspath <documented-APIs-from-N>
  • for N >= 9, -source N -target N --system <documented-APIs-from-N>
Here are a few details about these internal options:
  • -bootclasspath: a semicolon-separated list of directories, JAR archives, and ZIP archives for searching boot class files
  • --system: overrides the location of system modules for Java 9 and later versions
Also, the documented APIs are located in $JDK_ROOT/lib/ct.sym, which is a ZIP file containing class files stripped down according to the Java version.

For Java version N < 9, these APIs include the bootstrap classes retrieved from jars located in jre/lib/rt.jar and other related jars.

For Java version N >= 9, these APIs include the bootstrap classes retrieved from the Java modules located in the jdkpath/jmods/ directory.

4. Usage with the Command Line

First, let's create a sample class and use the flip() method of ByteBuffer, which was overridden in Java 9 to return a ByteBuffer (in earlier versions, it's only inherited from Buffer and returns a Buffer):

import java.nio.ByteBuffer;
public class TestForRelease {
    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.allocate(16);
        bb.flip();
        System.out.println("Baeldung: --release option test is successful");
    }
}

4.1. With the Existing -source and -target Options

Let's compile the code in Java 9 using the -source and -target options value as 8:

/jdk9path/bin/javac TestForRelease.java -source 8 -target 8 

The result of this is successful, but with a warning:

warning: [options] bootstrap class path not set in conjunction with -source 8
1 warning

Now, let's run our code on Java 8:

/jdk8path/bin/java TestForRelease

We see that this fails:

Exception in thread "main" java.lang.NoSuchMethodError: java.nio.ByteBuffer.flip()Ljava/nio/ByteBuffer;
at com.corejava.TestForRelease.main(TestForRelease.java:9)

As we can see, this isn't what we expected from passing 8 to the -target option. The compiler produced Java 8-compatible bytecode, yet the class still fails on Java 8.

Let's understand this in more detail. During compilation, we got a warning, which we ignored. This is because the Java compiler, by default, compiles against the latest APIs. In other words, even though we set the -source and -target options to 8, the compilation was performed against the Java 9 classes.

Therefore, we must pass another command-line option, -bootclasspath, to the Java compiler so that it compiles against the correct API classes.

Now, let's recompile the same code with the -bootclasspath option:

/jdk9path/bin/javac TestForRelease.java -source 8 -target 8 -bootclasspath ${jdk8path}/jre/lib/rt.jar

Again, the result of this is successful, and we have no warning.

Now, let's run our code on Java 8, and we see that this is successful:

/jdk8path/bin/java TestForRelease 
Baeldung: --release option test is successful

Although cross-compilation works now, we had to provide three command-line options.

4.2. With –release Option

Now, let's compile the same code with the --release option:

/jdk9path/bin/javac TestForRelease.java --release 8

Again, the compilation is successful this time, with no warnings.

Finally, when we run the code on Java 8, we see that it is successful:

/jdk8path/bin/java TestForRelease
Baeldung: --release option test is successful

We see that it's straightforward with the --release option as javac internally sets the correct values for -source, -target, and -bootclasspath.

5. Usage with the Maven Compiler Plugin

Usually, we use build tools like Maven or Gradle rather than the command-line javac tool. So, in this section, we will see how we can apply the --release option in the Maven compiler plugin.

Let's first see how we use the existing -source and -target options:

<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
            <source>1.8</source>
            <target>1.8</target>
        </configuration>
    </plugin>
 </plugins>

Here's how we can use the --release option:

<plugins>
    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
            <release>8</release>
        </configuration>
    </plugin>
 </plugins>

Although the behavior is the same as we described earlier, the way we are passing these values to the Java compiler is different.

6. Conclusion

In this article, we learned about the --release option and its relationship with the existing -source and -target options. Then, we saw how to use the option on the command line and with the Maven compiler plugin.

Finally, we saw that the new --release option requires fewer command-line options for cross-compilation. For this reason, it's recommended to use it whenever possible instead of the -source, -target, and -bootclasspath options.

Splitting a Java String by Multiple Delimiters

1. Introduction

We all know that splitting a string is a very common task. However, we often split using just one delimiter.

In this tutorial, we'll discuss in detail different options for splitting a string by multiple delimiters.

2. Splitting a Java String by Multiple Delimiters

In order to show how each of the solutions below performs splitting, we'll use the same example string:

String example = "Mary;Thomas:Jane-Kate";
String[] expectedArray = new String[]{"Mary", "Thomas", "Jane", "Kate"};

2.1. Regex Solution

Programmers often use different regular expressions to define a search pattern for strings. They're also a very popular solution when it comes to splitting a string. So, let's see how we can use a regular expression to split a string by multiple delimiters in Java.

First, we don't need to add a new dependency, since regular expressions are available in the java.util.regex package. We just have to define the input string we want to split and a pattern.

The next step is to apply the pattern, which can match zero or more times. To split by all the different delimiters, we use the OR operator (|). With this logical operator, the input is split wherever it matches any one of the delimiter characters in the pattern.

We'll write a simple test to demonstrate this approach:

String[] names = example.split(";|:|-");
Assertions.assertEquals(4, names.length);
Assertions.assertArrayEquals(expectedArray, names);

We've defined a test string with names that should be split by characters in the pattern. The pattern itself contains a semicolon, a colon, and a hyphen. When applied to the example string, we'll get four names in the array.
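
The same three delimiters can also be expressed as a single character class instead of an alternation. Here's a minimal, self-contained sketch using Pattern directly; the class name and the "[;:-]" pattern are our own choices for illustration, not part of the example above:

import java.util.Arrays;
import java.util.regex.Pattern;

public class CharacterClassSplitExample {
    public static void main(String[] args) {
        String example = "Mary;Thomas:Jane-Kate";
        // "[;:-]" matches a semicolon, a colon, or a hyphen (the trailing hyphen is literal)
        String[] names = Pattern.compile("[;:-]").split(example);
        System.out.println(Arrays.toString(names)); // [Mary, Thomas, Jane, Kate]
    }
}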

2.2. Guava Solution

Guava also offers a solution for splitting a string by multiple delimiters. Its solution is based on a Splitter class. This class extracts the substrings from an input string using the separator sequence. We can define this sequence in multiple ways:

  • as a single character
  • a fixed string
  • a regular expression
  • CharMatcher instance

Further on, the Splitter class has two methods for defining the delimiters. So, let's test both of them.

Firstly, we'll add the Guava dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>

Then, we'll start with the on method: public static Splitter on(Pattern separatorPattern)

It takes the pattern for defining the delimiters for splitting. First, we'll define the combination of the delimiters and compile the pattern. After that, we can split the string.

In our example, we'll use a regular expression to specify the delimiters:

Iterable<String> names = Splitter.on(Pattern.compile(";|:|-")).split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);

The other method is the onPattern method: public static Splitter onPattern(String separatorPattern)

The difference between this and the previous method is that the onPattern method takes the pattern as a string. There is no need to compile it like in the on method. We'll define the same combination of the delimiters for testing the onPattern method:

Iterable<String> names = Splitter.onPattern(";|:|-").split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);

In both tests, we managed to split the string and get the array with four names.

Since we're splitting an input string with multiple delimiters, we can also use the anyOf method in the CharMatcher class:

Iterable<String> names = Splitter.on(CharMatcher.anyOf(";:-")).split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);

This option comes only with the on method in the Splitter class. The outcome is the same as for the previous two tests.

2.3. Apache Commons Solution

The last option we'll discuss is available in the Apache Commons Lang 3 library.

We'll start by adding the Apache Commons Lang dependency to our pom.xml file:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>

Next, we'll use the split method from the StringUtils class:

String[] names = StringUtils.split(example, ";:-");
Assertions.assertEquals(4, names.length);
Assertions.assertArrayEquals(expectedArray, names);

We only have to define all the characters we'll use to split the string. Calling the split method will divide the example string into four names.

3. Conclusion

In this article, we've seen different options for splitting an input string by multiple delimiters. First, we discussed a solution based on regular expressions and plain Java. Later, we showed different options available in Guava. Finally, we wrapped up our examples with a solution based on the Apache Commons Lang 3 library.

As always, the code for these examples is available over on GitHub.

Switching Between Frames Using Selenium WebDriver in Java

1. Introduction

Managing frames and iframes is a crucial skill for test automation engineers. Selenium WebDriver allows us to work with both frames and iframes in the same way.

In this tutorial, we’ll explore a few distinct methods to switch between frames with Selenium WebDriver. These methods include using a WebElement, a name or ID, and an index.

By the end, we’ll be well-equipped to tackle iframe interactions confidently, enhancing the scope and effectiveness of our automation tests.

2. Difference Between Frame and Iframe

The terms frames and iframes are often encountered in web development. Each serves a distinct purpose in structuring and enhancing web content.

Frames, an older HTML feature, partition a web page into separate sections where each section has its own dedicated HTML document. Although frames are deprecated, they are still encountered on the web.

Iframes (inline frames) embed a separate HTML document within a single frame on a web page. They are widely used in web pages for various purposes, such as incorporating external content like maps, social media widgets, advertisements, or interactive forms seamlessly.

3. Switch to Frame Using a WebElement

Switching using a WebElement is the most flexible option. We can locate the specific iframe we want using any selector, such as ID, name, CSS selector, or XPath:

WebElement iframeElement = driver.findElement(By.cssSelector("#frame_selector"));
driver.switchTo().frame(iframeElement);

For a more reliable approach, it’s better to use explicit waits, such as ExpectedConditions.frameToBeAvailableAndSwitchToIt():

WebElement iframeElement = driver.findElement(By.cssSelector("#frame_selector"));
new WebDriverWait(driver, Duration.ofSeconds(10))
  .until(ExpectedConditions.frameToBeAvailableAndSwitchToIt(iframeElement));

This helps ensure that the iframe is fully loaded and ready for interaction, reducing potential timing issues and making our automation scripts more robust when working with iframes.

4. Switch to Frame Using a Name or ID

Another method to navigate into a frame is by leveraging its name or ID attribute. This approach is straightforward and particularly useful when these attributes are unique:

driver.switchTo().frame("frame_name_or_id");

Using an explicit wait ensures that the frame is fully loaded and prepared for interaction:

new WebDriverWait(driver, Duration.ofSeconds(10))
  .until(ExpectedConditions.frameToBeAvailableAndSwitchToIt("frame_name_or_id"));

5. Switch to Frame Using an Index

Selenium allows us to switch to a frame using a simple numerical index. The first frame has an index of 0, the second has an index of 1, and so on. Switching to frames using an index offers a flexible and convenient approach, especially when an iframe lacks a distinct name or ID.

By specifying the index of the frame, we can seamlessly navigate through the frames within a web page:

driver.switchTo().frame(0);

An explicit wait makes the code more robust:

new WebDriverWait(driver, Duration.ofSeconds(10))
  .until(ExpectedConditions.frameToBeAvailableAndSwitchToIt(0));

However, it’s important to use frame indexes with caution because the order of frames can change on a web page. If a frame is added or removed, it can disrupt the index order, leading to potential failures in our automated tests.

6. Switching to a Nested Frame

When frames are nested, it means that one or more frames are embedded within other frames, forming a parent-child relationship. This hierarchy can continue to multiple levels, resulting in complex nested frame structures:

<!DOCTYPE html>
<html>
<head>
    <title>Frames Example</title>
</head>
<body>
    <h1>Main Content</h1>
    <p>This is the main content of the web page.</p>
    <iframe id="outer_frame" width="400" height="300">
        <h2>Outer Frame</h2>
        <p>This is the content of the outer frame.</p>
        <iframe id="inner_frame" width="300" height="200">
            <h3>Inner Frame</h3>
            <p>This is the content of the inner frame.</p>
        </iframe>
    </iframe>
    <p>More content in the main page.</p>
</body>
</html>

Selenium provides a straightforward method for handling them. To access an inner frame within a nested frame structure, we should switch from the outermost to the inner one sequentially. This allows us to access the elements within each frame as we go deeper into the hierarchy:

driver.switchTo().frame("outer_frame");
driver.switchTo().frame("inner_frame");

7. Switching Back From Frame or Nested Frame

Selenium provides distinct methods for switching back from frames and nested frames. To return to the main content, we can use the defaultContent() method:

driver.switchTo().defaultContent();

It essentially exits all frames and ensures that our subsequent interactions take place in the main context of the web page. This is particularly useful when we’ve completed tasks within frames and need to continue our actions in the main content.

For moving to the parent frame, we can use the parentFrame() method:

driver.switchTo().parentFrame();

This method allows us to transition from a child frame back to its immediate parent frame. It’s particularly valuable when we’re working with nested frames, each embedded within another, and we need to move between them.
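
Putting these pieces together, a minimal sketch of a complete round trip might look like this, reusing the frame names from the nested example above:

// drill down into the nested structure
driver.switchTo().frame("outer_frame");
driver.switchTo().frame("inner_frame");
// ... interact with elements inside the inner frame ...

// step back up one level, to outer_frame
driver.switchTo().parentFrame();

// or jump straight back to the main page content
driver.switchTo().defaultContent();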

8. Conclusion

In this article, we’ve explored frames and how to work with them using Selenium WebDriver. We’ve learned different methods to switch between them using WebElements, names or IDs, and numerical indices. These methods offer flexibility and precision.

By using explicit waits, we’ve ensured reliable interactions with frames, reducing potential issues and making our automation scripts more robust.

We’ve learned how to handle nested frames by sequentially switching from the outermost frame to the inner ones, allowing us to access elements within complex nested frame structures. We also learned how to switch back to the main content as well as move to the parent frame.

In conclusion, mastering frame and iframe handling with Selenium WebDriver is vital for test automation engineers. With the knowledge and techniques, we’re well-prepared to confidently deal with frames.

As always, the code presented in this article is available over on GitHub.

Why Is sun.misc.Unsafe.park Actually Unsafe?

1. Overview

Java provides certain APIs for internal use and discourages their unnecessary use elsewhere. The JVM developers gave the packages and classes names such as Unsafe, which should serve as a warning. However, that often doesn’t stop developers from using these classes.

In this tutorial, we’ll learn why Unsafe.park() is actually unsafe. The goal isn’t to scare but to educate and provide better insight into the inner workings of the park() and unpark(Thread) methods.

2. Unsafe

The Unsafe class contains a low-level API that is meant to be used only by internal libraries. However, sun.misc.Unsafe is still accessible even after the introduction of JPMS. This was done to maintain backward compatibility and support all the libraries and frameworks that might use this API. The reasons are explained in more detail in JEP 260.

In this article, we won’t use Unsafe directly but rather the LockSupport class from the java.util.concurrent.locks package that wraps calls to Unsafe:

public static void park() {
    UNSAFE.park(false, 0L);
}
public static void unpark(Thread thread) {
    if (thread != null)
        UNSAFE.unpark(thread);
}

3. park() vs. wait()

The park() and unpark(Thread) methods are functionally similar to wait() and notify(). Let’s review their differences and understand the danger of using the former instead of the latter.

3.1. Lack of Monitors

Unlike wait() and notify(), park() and unpark(Thread) don’t require a monitor. Any code that can get a reference to the parked thread can unpark it. This might be useful in low-level code but can introduce additional complexity and hard-to-debug problems. 

Monitors are designed in Java so that a thread cannot use one if it hasn’t acquired it in the first place. This is done to prevent race conditions and simplify the synchronization process. Let’s try to notify a thread without acquiring its monitor:

@Test
@Timeout(3)
void giveThreadWhenNotifyWithoutAcquiringMonitorThrowsException() {
    Thread thread = new Thread() {
        @Override
        public void run() {
            synchronized (this) {
                try {
                    this.wait();
                } catch (InterruptedException e) {
                    // The thread was interrupted
                }
            }
        }
    };
    assertThrows(IllegalMonitorStateException.class, () -> {
        thread.start();
        Thread.sleep(TimeUnit.SECONDS.toMillis(1));
        thread.notify();
        thread.join();
    });
}

Trying to notify a thread without acquiring a monitor results in IllegalMonitorStateException. This mechanism enforces better coding standards and prevents possible hard-to-debug problems.

Now, let’s check the behavior of park() and unpark(Thread):

@Test
@Timeout(3)
void giveThreadWhenUnparkWithoutAcquiringMonitor() {
    Thread thread = new Thread(LockSupport::park);
    assertTimeoutPreemptively(Duration.of(2, ChronoUnit.SECONDS), () -> {
        thread.start();
        LockSupport.unpark(thread);
    });
}

We can control threads with little work. The only thing required is the reference to the thread. This provides us with more power over locking, but at the same time, it exposes us to many more problems.

It’s clear why park() and unpark(Thread) might be helpful for low-level code, but we should avoid this in our usual application code because it might introduce too much complexity and unclear code.

3.2. Information About the Context

The fact that no monitors are involved also reduces the contextual information available. In other words, the thread is parked, and it’s unclear why, when, and whether other threads are parked for the same reason. Let’s run two threads:

public class ThreadMonitorInfo {
    private static final Object MONITOR = new Object();
    public static void main(String[] args) throws InterruptedException {
        Thread waitingThread = new Thread(() -> {
            try {
                synchronized (MONITOR) {
                    MONITOR.wait();
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }, "Waiting Thread");
        Thread parkedThread = new Thread(LockSupport::park, "Parked Thread");
        waitingThread.start();
        parkedThread.start();
        waitingThread.join();
        parkedThread.join();
    }
}

Let’s check the thread dump using jstack:

"Parked Thread" #12 prio=5 os_prio=31 tid=0x000000013b9c5000 nid=0x5803 waiting on condition [0x000000016e2ee000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at com.baeldung.park.ThreadMonitorInfo$$Lambda$2/284720968.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:750)
"Waiting Thread" #11 prio=5 os_prio=31 tid=0x000000013b9c4000 nid=0xa903 in Object.wait() [0x000000016e0e2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007401811d8> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:502)
        at com.baeldung.park.ThreadMonitorInfo.lambda$main$0(ThreadMonitorInfo.java:12)
        - locked <0x00000007401811d8> (a java.lang.Object)
        at com.baeldung.park.ThreadMonitorInfo$$Lambda$1/1595428806.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:750)

While analyzing the thread dump, it’s clear that the parked thread carries less information. Thus, it might create a situation where a certain thread problem is hard to debug, even with a thread dump at hand.

An additional benefit of using dedicated concurrent structures or specific locks is that they provide even more context in thread dumps, giving more information about the application state. Many JVM concurrency mechanisms use park() internally. However, if a thread dump shows that a thread is waiting on, for example, a CyclicBarrier, it’s clear that it’s waiting for other threads.
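
For instance, a thread blocked on a CyclicBarrier parks through the same LockSupport machinery, but its stack trace names the barrier, which makes the dump much easier to read. Here’s a minimal sketch (the class and thread names are our own, and, like the example above, the program intentionally stays blocked so we can inspect it with jstack):

import java.util.concurrent.CyclicBarrier;

public class BarrierDumpExample {
    public static void main(String[] args) throws InterruptedException {
        // a barrier for two parties; our single thread blocks until a second party arrives
        CyclicBarrier barrier = new CyclicBarrier(2);
        Thread waitingOnBarrier = new Thread(() -> {
            try {
                barrier.await();
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        }, "Barrier Thread");
        waitingOnBarrier.start();
        // a thread dump taken now shows CyclicBarrier.await() in the stack,
        // so the reason for parking is immediately visible
        waitingOnBarrier.join();
    }
}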

3.3. Interrupted Flag

Another interesting thing is the difference in handling interrupts. Let’s review the behavior of a waiting thread:

@Test
@Timeout(3)
void givenWaitingThreadWhenNotInterruptedShouldNotHaveInterruptedFlag() throws InterruptedException {
    Thread thread = new Thread() {
        @Override
        public void run() {
            synchronized (this) {
                try {
                    this.wait();
                } catch (InterruptedException e) {
                    // The thread was interrupted
                }
            }
        }
    };
    thread.start();
    Thread.sleep(TimeUnit.SECONDS.toMillis(1));
    thread.interrupt();
    thread.join();
    assertFalse(thread.isInterrupted(), "The thread shouldn't have the interrupted flag");
}

If we interrupt a thread in its waiting state, the wait() method immediately throws an InterruptedException and clears the interrupted flag. That’s why the best practice is to wait inside a while loop that checks the waiting condition, rather than relying on the interrupted flag.
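
That guard is the classic wait loop. Here’s a minimal sketch, where the lock object and the ready flag are placeholders of our own:

private final Object lock = new Object();
private boolean ready = false;

void awaitReady() throws InterruptedException {
    synchronized (lock) {
        // re-check the condition after every wake-up, spurious or not
        while (!ready) {
            lock.wait();
        }
    }
}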

In contrast, a parked thread doesn’t react to the interrupt immediately, but rather on its own terms. Also, the interrupt doesn’t cause an exception; the thread simply returns from the park() method. Subsequently, the interrupted flag isn’t reset, as it is when a waiting thread is interrupted:

@Test
@Timeout(3)
void givenParkedThreadWhenInterruptedShouldNotResetInterruptedFlag() throws InterruptedException {
    Thread thread = new Thread(LockSupport::park);
    thread.start();
    thread.interrupt();
    assertTrue(thread.isInterrupted(), "The thread should have the interrupted flag");
    thread.join();
}

Not accounting for this behavior may cause problems when handling the interruption. For example, if we don’t check and reset the flag after a parked thread is interrupted, subtle bugs can follow.
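
A common pattern is therefore to check, and if appropriate clear, the status explicitly after park() returns, since a still-set flag also makes any later park() call return immediately. Here’s a minimal sketch of such a check; it’s our own illustration, not code from the tests above:

Thread worker = new Thread(() -> {
    LockSupport.park();
    // park() returned: it could be an unpark(), an interrupt, or a spurious wake-up
    if (Thread.interrupted()) { // reads and clears the flag
        // handle the interruption explicitly, e.g. stop the work loop
        return;
    }
    // otherwise, continue normal processing
});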

3.4. Preemptive Permits

Parking and unparking work on the idea of a binary semaphore. Thus, we can provide a thread with a preemptive permit. For example, we can unpark a thread, which would give it a permit, and the subsequent park won’t suspend it but would take the permit and proceed:

private final Thread parkedThread = new Thread() {
    @Override
    public void run() {
        LockSupport.unpark(this);
        LockSupport.park();
    }
};
@Test
void givenThreadWhenPreemptivePermitShouldNotPark()  {
    assertTimeoutPreemptively(Duration.of(1, ChronoUnit.SECONDS), () -> {
        parkedThread.start();
        parkedThread.join();
    });
}

This technique can be used in some complex synchronization scenarios. As the parking uses a binary semaphore, we cannot add up permits, and two unpark calls wouldn’t produce two permits:

private final Thread parkedThread = new Thread() {
    @Override
    public void run() {
        LockSupport.unpark(this);
        LockSupport.unpark(this);
        LockSupport.park();
        LockSupport.park();
    }
};
@Test
void givenThreadWhenRepeatedPreemptivePermitShouldPark()  {
    Callable<Boolean> callable = () -> {
        parkedThread.start();
        parkedThread.join();
        return true;
    };
    boolean result = false;
    Future<Boolean> future = Executors.newSingleThreadExecutor().submit(callable);
    try {
        result = future.get(1, TimeUnit.SECONDS);
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
        // Expected the thread to be parked
    }
    assertFalse(result, "The thread should be parked");
}

In this case, the thread would have only one permit, and the second call to the park() method would park the thread. This might produce some undesired behavior if not appropriately handled.

4. Conclusion

In this article, we learned why the park() method is considered unsafe. JVM developers hide internal APIs, or discourage their use, for specific reasons. This is not only because they might be dangerous and produce unexpected results right now, but also because these APIs might change in the future, and their support isn’t guaranteed.

Additionally, these APIs require extensive learning about underlying systems and techniques, which may differ from platform to platform. Not following this might result in fragile code and hard-to-debug problems.

As always, the code in this article is available over on GitHub.

HashSet toArray() Method in Java

1. Introduction

HashSet is one of the common data structures that we can utilize in Java Collections.

In this tutorial, we’ll dive into the toArray() method of the HashSet class, illustrating how to convert a HashSet to an array.

2. Converting HashSet to Array

Let’s look at a set of examples that illustrate how to apply the toArray() method to convert a HashSet into an array.

2.1. HashSet to an Array of Strings

In the following method, we are seeking to convert a HashSet of strings into an array of strings:

@Test
public void givenStringHashSet_whenConvertedToArray_thenArrayContainsStringElements() {
    HashSet<String> stringSet = new HashSet<>();
    stringSet.add("Apple");
    stringSet.add("Banana");
    stringSet.add("Cherry");
    // Convert the HashSet of Strings to an array of Strings
    String[] stringArray = stringSet.toArray(new String[0]);
    // Test that the array is of the correct length
    assertEquals(3, stringArray.length);
    for (String str : stringArray) {
        assertTrue(stringSet.contains(str));
    }
}

Here, a HashSet named stringSet is initialized with three String elements: “Apple”, “Banana”, and “Cherry”. To be specific, the test method ensures that the resulting array has a length of 3, matching the number of elements in the HashSet.

Then, it iterates through the stringArray and checks if each element is contained within the original stringSet, asserting that the array indeed contains the String elements, confirming the successful conversion of the HashSet to a String array. 
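
For comparison, the no-argument toArray() overload returns an Object[] rather than a String[]; here’s a minimal sketch, with variable names of our own:

Object[] objectArray = stringSet.toArray();
// the elements are typed as Object here and would need individual casts
assertEquals(3, objectArray.length);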

2.2. HashSet to an Array of Integers

Additionally, we can utilize the toArray() method to convert an Integer HashSet into an array of Integers as follows:

@Test
public void givenIntegerHashSet_whenConvertedToArray_thenArrayContainsIntegerElements() {
    HashSet<Integer> integerSet = new HashSet<>();
    integerSet.add(5);
    integerSet.add(10);
    integerSet.add(15);
    // Convert the HashSet of Integers to an array of Integers
    Integer[] integerArray = integerSet.toArray(new Integer[0]);
    // Test that the array is of the correct length
    assertEquals(3, integerArray.length);
    for (Integer num : integerArray) {
        assertTrue(integerSet.contains(num));
    }
    assertTrue(integerSet.contains(5));
    assertTrue(integerSet.contains(10));
    assertTrue(integerSet.contains(15));
}

Here, we create a HashSet named integerSet with three Integer elements: (5, 10, and 15). The test method is responsible for verifying the conversion of this Integer HashSet into an array of Integers, referred to as integerArray.

Moreover, it confirms that the resulting array has length = 3, corresponding to the number of elements in the original HashSet. Subsequently, the method iterates through integerArray, ensuring each element is contained within the original integerSet.

3. Conclusion

In conclusion, it is easy to convert a HashSet into an array using the toArray() method of the HashSet class. This can also be useful while handling array-based data structures or some other components in our Java apps.

As always, the complete code samples for this article can be found over on GitHub.

Convert ResultSet Into Map

1. Introduction

Java applications widely use the Java Database Connectivity (JDBC) API to connect and execute queries on a database. ResultSet is a tabular representation of the data extracted by these queries.

In this tutorial, we’ll learn how to convert the data of a JDBC ResultSet into a Map.

2. Setup

We’ll write a few test cases to achieve our goal. Our data source will be an H2 database. H2 is a fast, open-source, in-memory database that supports the JDBC API. Let’s add the relevant Maven dependency:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
</dependency>

Once the database connection is ready, we’ll write a method to do the initial data setup for our test cases. To achieve this, we first create a JDBC Statement and then use it to create a database table named employee. The employee table consists of the columns empId, empName, and empCity, which hold the ID, name, and city of each employee. We can now insert sample data into the table using the Statement.execute() method:

void initialDataSetup() throws SQLException {
    Statement statement = connection.createStatement();
    String sql = "CREATE TABLE employee ( " +
      "empId INTEGER not null, " +
      "empName VARCHAR(50), " +
      "empCity VARCHAR(50), " +
      "PRIMARY KEY (empId))";
    statement.execute(sql);
    List<String> sqlQueryList = Arrays.asList(
      "INSERT INTO employee VALUES (1, 'Steve','London')", 
      "INSERT INTO employee VALUES (2, 'John','London')", 
      "INSERT INTO employee VALUES (3, 'David', 'Sydney')",
      "INSERT INTO employee VALUES (4, 'Kevin','London')", 
      "INSERT INTO employee VALUES (5, 'Jade', 'Sydney')");
    
    for (String query: sqlQueryList) {
        statement.execute(query);
    }
}

3. ResultSet to Map

Now that the sample data is present in the database, we can query it for extraction. Querying the database gives the output in the form of a ResultSet. Our goal is to transform the data from this ResultSet into a Map where the key is the city name, and the value is the list of employee names in that city.

3.1. Using Java 7

We’ll first create a PreparedStatement from the database connection and provide an SQL query to it. Then, we can use the PreparedStatement.executeQuery() method to get the ResultSet.

We can now iterate over the ResultSet data and fetch the column data individually. In order to do this, we can use the ResultSet.getString() method by passing the column name of the employee table into it. After that, we can use the Map.containsKey() method to check if the map already contains an entry for that city name. If there’s no key found for that city, we’ll add an entry with the city name as the key and an empty ArrayList as the value. Then, we add the employee’s name to the list of employee names for that city:

@Test
void whenUsingContainsKey_thenConvertResultSetToMap() throws SQLException {
    ResultSet resultSet = connection.prepareStatement(
        "SELECT * FROM employee").executeQuery();
    Map<String, List<String>> valueMap = new HashMap<>();
    while (resultSet.next()) {
        String empCity = resultSet.getString("empCity");
        String empName = resultSet.getString("empName");
        if (!valueMap.containsKey(empCity)) {
            valueMap.put(empCity, new ArrayList<>());
        }
        valueMap.get(empCity).add(empName);
    }
    assertEquals(3, valueMap.get("London").size());
}

3.2. Using Java 8

Java 8 introduced the concept of lambda expressions and default methods. We can leverage them in our implementation to simplify the entry of new keys in the output map. We can use the Map interface’s computeIfAbsent() method, which takes two parameters: a key and a mapping function. If the key is found, it returns the corresponding value; otherwise, it uses the mapping function to create a default value and stores it in the map as a new key-value pair. We can then add the employee’s name to the returned list.

Here’s the modified version of the previous test case using Java 8:

@Test
void whenUsingComputeIfAbsent_thenConvertResultSetToMap() throws SQLException {
    ResultSet resultSet = connection.prepareStatement(
        "SELECT * FROM employee").executeQuery();
    Map<String, List<String>> valueMap = new HashMap<>();
    while (resultSet.next()) {
        String empCity = resultSet.getString("empCity");
        String empName = resultSet.getString("empName");
        valueMap.computeIfAbsent(empCity, data -> new ArrayList<>()).add(empName);
    }
    assertEquals(3, valueMap.get("London").size());
}

3.3. Using Apache Commons DbUtils

Apache Commons DbUtils is a third-party library that provides additional, simplified functionality for JDBC operations. It provides an interesting interface named ResultSetHandler that consumes a JDBC ResultSet as input and allows us to transform it into the object the application expects. Moreover, this library uses the QueryRunner class to run SQL queries on the database table. The QueryRunner.query() method takes the database connection, SQL query, and ResultSetHandler as input and directly returns the expected format.

Let’s look at an example of how to create a Map from a ResultSet using ResultSetHandler:

@Test
void whenUsingDbUtils_thenConvertResultSetToMap() throws SQLException {
    ResultSetHandler <Map<String, List<String>>> handler = new ResultSetHandler <Map <String, List<String>>>() {
        public Map<String, List<String>> handle(ResultSet resultSet) throws SQLException {
            Map<String, List<String>> result = new HashMap<>();
            while (resultSet.next()) {
                String empCity = resultSet.getString("empCity");
                String empName = resultSet.getString("empName");
                result.computeIfAbsent(empCity, data -> new ArrayList<>()).add(empName);
            }
            return result;
        }
    };
    QueryRunner run = new QueryRunner();
    Map<String, List<String>> valueMap = run.query(connection, "SELECT * FROM employee", handler);
    assertEquals(3, valueMap.get("London").size());
}
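
Since ResultSetHandler declares a single abstract method, handle(ResultSet), we could also write the handler more compactly as a lambda. Here’s a minimal sketch of the equivalent declaration:

ResultSetHandler<Map<String, List<String>>> handler = resultSet -> {
    Map<String, List<String>> result = new HashMap<>();
    while (resultSet.next()) {
        result.computeIfAbsent(resultSet.getString("empCity"), city -> new ArrayList<>())
          .add(resultSet.getString("empName"));
    }
    return result;
};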

4. Conclusion

To summarize, we took a look at several ways we can aggregate data from ResultSet and convert it into a Map using Java 7, Java 8, and the Apache DbUtils library.

As always, the full code for this article can be found over on GitHub.

MongoDB Atlas Search Using the Java Driver and Spring Data

1. Introduction

In this tutorial, we’ll learn how to use Atlas Search functionalities using the Java MongoDB driver API. By the end, we’ll have a grasp on creating queries, paginating results, and retrieving meta-information. Also, we’ll cover refining results with filters, adjusting result scores, and selecting specific fields to be displayed.

2. Scenario and Setup

MongoDB Atlas has a free forever cluster that we can use to test all features. To showcase Atlas Search functionalities, we’ll only need a service class. We’ll connect to our collection using MongoTemplate.

2.1. Dependencies

First, to connect to MongoDB, we’ll need spring-boot-starter-data-mongodb:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
    <version>3.1.2</version>
</dependency>

2.2. Sample Dataset

Throughout this tutorial, we’ll use the movies collection from MongoDB Atlas’s sample_mflix sample dataset to simplify examples. It contains data about movies since the 1900s, which will help us showcase the filtering capabilities of Atlas Search.

2.3. Creating an Index With Dynamic Mapping

For Atlas Search to work, we need indexes. These can be static or dynamic. A static index is helpful for fine-tuning, while a dynamic one is an excellent general-purpose solution. So, let’s start with a dynamic index.

There are a few ways to create search indexes (including programmatically); we’ll use the Atlas UI. There, we can do this by accessing Search from the menu, selecting our cluster, then clicking Go to Atlas Search:

After clicking on Create Search Index, we’ll choose the JSON Editor to create our index, then click Next:

Finally, on the next screen, we choose our target collection, a name for our index, and input our index definition:

{
    "mappings": {
        "dynamic": true
    }
}

We’ll use the name idx-queries for this index throughout this tutorial. Note that if we name our index default, we don’t need to specify its name when creating queries. Most importantly, dynamic mappings are a simple choice for more flexible, frequently changing schemas.

By setting mappings.dynamic to true, Atlas Search automatically indexes all dynamically indexable and supported field types in a document. While dynamic mappings provide convenience, especially when the schema is unknown, they tend to consume more disk space and might be less efficient compared to static ones.

2.4. Our Movie Search Service

We’ll base our examples on a service class containing some search queries for our movies, extracting interesting information from them. We’ll slowly build them up to more complex queries:

@Service
public class MovieAtlasSearchService {
    private final MongoCollection<Document> collection;
    public MovieAtlasSearchService(MongoTemplate mongoTemplate) {
        MongoDatabase database = mongoTemplate.getDb();
        this.collection = database.getCollection("movies");
    }
    // ...
}

All we need is a reference to our collection for future methods.

3. Constructing a Query

Atlas Search queries are created via pipeline stages, represented by a List<Bson>. The most essential stage is Aggregates.search(), which receives a SearchOperator and, optionally, a SearchOptions object. Since we called our index idx-queries instead of default, we must include its name with SearchOptions.searchOptions().index(). Otherwise, we’ll get no errors and no results.

Many search operators are available to define how we want to conduct our query. In this example, we’ll find movies by tags using SearchOperator.text(), which performs a full-text search. We’ll use it to search the contents of the fullplot field with SearchPath.fieldPath(). We’ll omit static imports for readability:

public Collection<Document> moviesByKeywords(String keywords) {
    List<Bson> pipeline = Arrays.asList(
        search(
          text(
            fieldPath("fullplot"), keywords
          ),
          searchOptions()
            .index("idx-queries")
        ),
        project(fields(
          excludeId(),
          include("title", "year", "fullplot", "imdb.rating")
        ))
    );
    return collection.aggregate(pipeline)
      .into(new ArrayList<>());
}

Also, the second stage in our pipeline is Aggregates.project(), which represents a projection. If not specified, our query results will include all the fields in our documents. But we can set it and choose which fields we want (or don’t want) to appear in our results. Note that specifying a field for inclusion implicitly excludes all other fields except the _id field. So, in this case, we’re excluding the _id field and passing a list of the fields we want. Note we can also specify nested fields, like imdb.rating.

To execute the pipeline, we call aggregate() on our collection. This returns an object we can use to iterate on results. Finally, for simplicity, we call into() to iterate over results and add them to a collection, which we return. Note that a big enough collection can exhaust the memory in our JVM. We’ll see how to eliminate this concern by paginating our results later on.
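
In the meantime, if memory is a concern, we can also iterate the results through the driver’s cursor instead of collecting them. Here’s a minimal sketch, reusing the pipeline variable from above:

try (MongoCursor<Document> cursor = collection.aggregate(pipeline).iterator()) {
    while (cursor.hasNext()) {
        Document movie = cursor.next();
        // process one document at a time instead of materializing the whole list
    }
}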

Most importantly, pipeline stage order matters. We’ll get an error if we put the project() stage before search().

Let’s take a look at the first two results of calling moviesByKeywords(“space cowboy”) on our service:

[
    {
        "title": "Battle Beyond the Stars",
        "fullplot": "Shad, a young farmer, assembles a band of diverse mercenaries in outer space to defend his peaceful planet from the evil tyrant Sador and his armada of aggressors. Among the mercenaries are Space Cowboy, a spacegoing truck driver from Earth; Gelt, a wealthy but experienced assassin looking for a place to hide; and Saint-Exmin, a Valkyrie warrior looking to prove herself in battle.",
        "year": 1980,
        "imdb": {
            "rating": 5.4
        }
    },
    {
        "title": "The Nickel Ride",
        "fullplot": "Small-time criminal Cooper manages several warehouses in Los Angeles that the mob use to stash their stolen goods. Known as \"the key man\" for the key chain he always keeps on his person that can unlock all the warehouses. Cooper is assigned by the local syndicate to negotiate a deal for a new warehouse because the mob has run out of storage space. However, Cooper's superior Carl gets nervous and decides to have cocky cowboy button man Turner keep an eye on Cooper.",
        "year": 1974,
        "imdb": {
            "rating": 6.7
        }
    },
    (...)
]

3.1. Combining Search Operators

It’s possible to combine search operators using SearchOperator.compound(). In this example, we’ll use it to include must and should clauses. A must clause contains one or more conditions for matching documents. On the other hand, a should clause contains one or more conditions that we’d prefer our results to include.

This alters the score so the documents that meet these conditions appear first:

public Collection<Document> late90sMovies(String keywords) {
    List<Bson> pipeline = asList(
        search(
          compound()
            .must(asList(
              numberRange(
                fieldPath("year"))
                .gteLt(1995, 2000)
            ))
            .should(asList(
              text(
                fieldPath("fullplot"), keywords
              )
            )),
          searchOptions()
            .index("idx-queries")
        ),
        project(fields(
          excludeId(),
          include("title", "year", "fullplot", "imdb.rating")
        ))
    );
    return collection.aggregate(pipeline)
      .into(new ArrayList<>());
}

We kept the same searchOptions() and projected fields from our first query. But, this time, we moved text() to a should clause because we want the keywords to represent a preference, not a requirement.

Then, we created a must clause, including SearchOperator.numberRange(), to only show movies from 1995 to 2000 (exclusive) by restricting the values on the year field. This way, we only return movies from that era.

Let’s see the first two results for hacker assassin:

[
    {
        "title": "Assassins",
        "fullplot": "Robert Rath is a seasoned hitman who just wants out of the business with no back talk. But, as things go, it ain't so easy. A younger, peppier assassin named Bain is having a field day trying to kill said older assassin. Rath teams up with a computer hacker named Electra to defeat the obsessed Bain.",
        "year": 1995,
        "imdb": {
            "rating": 6.3
        }
    },
    {
        "fullplot": "Thomas A. Anderson is a man living two lives. By day he is an average computer programmer and by night a hacker known as Neo. Neo has always questioned his reality, but the truth is far beyond his imagination. Neo finds himself targeted by the police when he is contacted by Morpheus, a legendary computer hacker branded a terrorist by the government. Morpheus awakens Neo to the real world, a ravaged wasteland where most of humanity have been captured by a race of machines that live off of the humans' body heat and electrochemical energy and who imprison their minds within an artificial reality known as the Matrix. As a rebel against the machines, Neo must return to the Matrix and confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion.",
        "imdb": {
            "rating": 8.7
        },
        "year": 1999,
        "title": "The Matrix"
    },
    (...)
]

4. Scoring the Result Set

When we query documents with search(), the results appear in order of relevance. This relevance is based on the calculated score, from highest to lowest. This time, we’ll modify late90sMovies() to receive a SearchScore modifier to boost the relevance of the plot keywords in our should clause:

public Collection<Document> late90sMovies(String keywords, SearchScore modifier) {
    List<Bson> pipeline = asList(
        search(
          compound()
            .must(asList(
              numberRange(
                fieldPath("year"))
                .gteLt(1995, 2000)
            ))
            .should(asList(
              text(
                fieldPath("fullplot"), keywords
              )
              .score(modifier)
            )),
          searchOptions()
            .index("idx-queries")
        ),
        project(fields(
          excludeId(),
          include("title", "year", "fullplot", "imdb.rating"),
          metaSearchScore("score")
        ))
    );
    return collection.aggregate(pipeline)
      .into(new ArrayList<>());
}

Also, we include metaSearchScore(“score”) in our fields list to see the score for each document in our results. For example, we can now multiply the relevance of our “should” clause by the value of the imdb.votes field like this:

late90sMovies(
  "hacker assassin", 
  SearchScore.boost(fieldPath("imdb.votes"))
)

And this time, we can see that The Matrix comes first, thanks to the boost:

[
    {
        "fullplot": "Thomas A. Anderson is a man living two lives (...)",
        "imdb": {
            "rating": 8.7
        },
        "year": 1999,
        "title": "The Matrix",
        "score": 3967210.0
    },
    {
        "fullplot": "(...) Bond also squares off against Xenia Onatopp, an assassin who uses pleasure as her ultimate weapon.",
        "imdb": {
            "rating": 7.2
        },
        "year": 1995,
        "title": "GoldenEye",
        "score": 462604.46875
    },
    (...)
]

4.1. Using a Score Function

We can achieve greater control by using a function to alter the score of our results. Let’s pass a function to our method that adds the value of the year field to the natural score. This way, newer movies end up with a higher score:

late90sMovies(keywords, function(
  addExpression(asList(
    pathExpression(
      fieldPath("year"))
      .undefined(1), 
    relevanceExpression()
  ))
));

That code starts with SearchScore.function(), to which we pass a SearchScoreExpression.addExpression() since we want an add operation. Then, since we want to add a value from a field, we use a SearchScoreExpression.pathExpression() and specify the field we want: year. Also, we call undefined() to determine a fallback value for year in case it’s missing. In the end, we call relevanceExpression() to return the document’s relevance score, which is added to the value of year.

When we execute that, we’ll see “The Matrix” now appears first, along with its new score:

[
    {
        "fullplot": "Thomas A. Anderson is a man living two lives (...)",
        "imdb": {
            "rating": 8.7
        },
        "year": 1999,
        "title": "The Matrix",
        "score": 2003.67138671875
    },
    {
        "title": "Assassins",
        "fullplot": "Robert Rath is a seasoned hitman (...)",
        "year": 1995,
        "imdb": {
            "rating": 6.3
        },
        "score": 2003.476806640625
    },
    (...)
]

That’s useful for defining what should have greater weight when scoring our results.

5. Getting Total Rows Count From Metadata

If we need to get the total number of results in a query, we can use Aggregates.searchMeta() instead of search() to retrieve metadata information only. With this method, no documents are returned. So, we’ll use it to count the number of movies from the late 90s that also contain our keywords.

For meaningful filtering, we’ll also include the keywords in our must clause:

public Document countLate90sMovies(String keywords) {
    List<Bson> pipeline = asList(
        searchMeta(
          compound()
            .must(asList(
              numberRange(
                fieldPath("year"))
                .gteLt(1995, 2000),
              text(
                fieldPath("fullplot"), keywords
              )
            )),
          searchOptions()
            .index("idx-queries")
            .count(total())
        )
    );
    return collection.aggregate(pipeline)
      .first();
}

This time, searchOptions() includes a call to SearchOptions.count(SearchCount.total()), which ensures we get an exact total count (instead of a lower bound, which is faster depending on the collection size). Also, since we expect a single object in the results, we call first() on aggregate().

Finally, let’s see what is returned for countLate90sMovies(“hacker assassin”):

{
    "count": {
        "total": 14
    }
}

This is useful for getting information about our collection without including documents in our results.

6. Faceting on Results

In MongoDB Atlas Search, a facet query is a feature that allows retrieving aggregated and categorized information about our search results. It helps us analyze and summarize data based on different criteria, providing insights into the distribution of search results.

Also, it enables grouping search results into different categories or buckets and retrieving counts or additional information about each category. This helps answer questions like “How many documents match a specific category?” or “What are the most common values for a certain field within the results?”

6.1. Creating a Static Index

In our first example, we’ll create a facet query to give us information about genres from movies since the 1900s and how these relate. We’ll need an index with facet types, which we can’t have when using dynamic indexes.

So, let’s start by creating a new search index in our collection, which we’ll call idx-facets. Note that we’ll keep dynamic as true so we can still query the fields that are not explicitly defined:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "genres": [
        {
          "type": "stringFacet"
        },
        {
          "type": "string"
        }
      ],
      "year": [
        {
          "type": "numberFacet"
        },
        {
          "type": "number"
        }
      ]
    }
  }
}

We kept dynamic mappings enabled so that fields we don’t declare explicitly remain searchable. Then, we selected the fields we’re interested in for indexing faceted information. Since we also want to use filters in our query, for each field, we specify an index of a standard type (like string) and one of a faceted type (like stringFacet).

6.2. Running a Facet Query

Creating a facet query involves using searchMeta() with the SearchCollector.facet() method, which receives our facets and an operator for filtering results. When defining the facets, we have to choose a name and use a SearchFacet method that corresponds to the type of index we created. In our case, we define a stringFacet() and a numberFacet():

public Document genresThroughTheDecades(String genre) {
    List<Bson> pipeline = asList(
      searchMeta(
        facet(
          text(
            fieldPath("genres"), genre
          ), 
          asList(
            stringFacet("genresFacet", 
              fieldPath("genres")
            ).numBuckets(5),
            numberFacet("yearFacet", 
              fieldPath("year"), 
              asList(1900, 1930, 1960, 1990, 2020)
            )
          )
        ),
        searchOptions()
          .index("idx-facets")
      )
    );
    return collection.aggregate(pipeline)
      .first();
}

We filter movies of a specific genre using the text() operator. Since films generally contain multiple genres, the stringFacet() will also show five related genres (as specified by numBuckets()) ranked by frequency. For the numberFacet(), we must set the boundaries separating our aggregated results. We need at least two, with the last one being exclusive.

Finally, we return only the first result. Let’s see what it looks like if we filter by the “horror” genre:

{
    "count": {
        "lowerBound": 1703
    },
    "facet": {
        "genresFacet": {
            "buckets": [
                {
                    "_id": "Horror",
                    "count": 1703
                },
                {
                    "_id": "Thriller",
                    "count": 595
                },
                {
                    "_id": "Drama",
                    "count": 395
                },
                {
                    "_id": "Mystery",
                    "count": 315
                },
                {
                    "_id": "Comedy",
                    "count": 274
                }
            ]
        },
        "yearFacet": {
            "buckets": [
                {
                    "_id": 1900,
                    "count": 5
                },
                {
                    "_id": 1930,
                    "count": 47
                },
                {
                    "_id": 1960,
                    "count": 409
                },
                {
                    "_id": 1990,
                    "count": 1242
                }
            ]
        }
    }
}

Since we didn’t specify a total count, we get a lower bound count, followed by our facet names and their respective buckets.

6.3. Including a Facet Stage to Paginate Results

Let’s return to our late90sMovies() method and include a $facet stage in our pipeline. We’ll use it for pagination and a total rows count. The search() and project() stages will remain unmodified:

public Document late90sMovies(int skip, int limit, String keywords) {
    List<Bson> pipeline = asList(
        search(
          // ...
        ),
        project(fields(
          // ...
        )),
        facet(
          new Facet("rows",
            skip(skip),
            limit(limit)
          ),
          new Facet("totalRows",
            replaceWith("$$SEARCH_META"),
            limit(1)
          )
        )
    );
    return collection.aggregate(pipeline)
      .first();
}

We start by calling Aggregates.facet(), which receives one or more facets. Then, we instantiate a Facet to include skip() and limit() from the Aggregates class. While skip() defines our offset, limit() will restrict the number of documents retrieved. Note that we can name our facets anything we like.

Also, we call replaceWith(“$$SEARCH_META“) to get metadata info in this field. Most importantly, so that our metadata information is not repeated for each result, we include a limit(1). Finally, when our query has metadata, the result becomes a single document instead of an array, so we only return the first result.
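
As a sketch of what the combined result looks like (with the actual values elided), the aggregation now returns a single document holding our paginated movies under rows and the search metadata under totalRows:

{
    "rows": [
        {
            "title": "The Matrix",
            "year": 1999,
            (...)
        },
        (...)
    ],
    "totalRows": [
        {
            "count": {
                "lowerBound": (...)
            }
        }
    ]
}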

7. Conclusion

In this article, we saw how MongoDB Atlas Search provides developers with a versatile and potent toolset. Integrating it with the Java MongoDB driver API can enhance search functionalities, data aggregation, and result customization. Our hands-on examples have aimed to provide a practical understanding of its capabilities. Whether implementing a simple search or seeking intricate data analytics, Atlas Search is an invaluable tool in the MongoDB ecosystem.

Remember to leverage the power of indexes, facets, and dynamic mappings to make our data work for us. As always, the source code is available over on GitHub.

       

Sharing Memory Between JVMs

$
0
0

1. Introduction

In this tutorial, we’ll show how to share memory between two or more JVMs running on the same machine. This capability enables very fast inter-process communication since we can move data blocks around without any I/O operation.

2. How Shared Memory Works

A process running in any modern operating system gets what’s called a virtual memory space. We call it virtual because, although it looks like a large, contiguous, and private addressable memory space, in fact, it’s made of pages spread all over the physical RAM. Here, page is just OS slang for a block of contiguous memory, whose size depends on the particular CPU architecture in use. For x86-64, a page can be as small as 4 KB or as large as 1 GB.

At a given time, only a fraction of this virtual space is actually mapped to physical pages. As time passes and the process starts to consume more memory for its tasks, the OS starts to allocate more physical pages and map them to the virtual space. When the demand for memory exceeds what’s physically available, the OS will start to swap out pages that are not being used at that moment to secondary storage to make room for the request.

A shared memory block behaves just like regular memory, but, in contrast with regular memory, it is not private to a single process. When a process changes the contents of any byte within this block, any other process with access to this same shared memory “sees” this change instantly.

This is a list of common uses for shared memory:

  • Debuggers (ever wondered how a debugger can inspect variables in another process?)
  • Inter-process communication
  • Read-only content sharing between processes (ex: dynamic library code)
  • Hacks of all sorts ;^)

3. Shared Memory and Memory-Mapped Files

A memory-mapped file, as the name suggests, is a regular file whose contents are directly mapped to a contiguous area in the virtual memory of a process. This means that we can read and/or change its contents without explicit use of I/O operations. The OS will detect any writes to the mapped area and will schedule a background I/O operation to persist the modified data.

Since there are no guarantees on when this background operation will happen, the OS also offers a system call to flush any pending changes. This is important for use cases like database redo logs, but not needed for our inter-process communication (IPC, for short) scenario.

Memory-mapped files are commonly used by database servers to achieve high throughput I/O operations, but we can also use them to bootstrap a shared-memory-based IPC mechanism. The basic idea is that all processes that need to share data map the same file and, voilà, they now have a shared memory area.

4. Creating Memory-Mapped Files in Java

In Java, we use the FileChannel‘s map() method to map a region of a file into memory, which returns a MappedByteBuffer that allows us to access its contents:

MappedByteBuffer createSharedMemory(String path, long size) {
    try (FileChannel fc = (FileChannel)Files.newByteChannel(new File(path).toPath(),
      EnumSet.of(
        StandardOpenOption.CREATE,
        StandardOpenOption.SPARSE,
        StandardOpenOption.WRITE,
        StandardOpenOption.READ))) {
        return fc.map(FileChannel.MapMode.READ_WRITE, 0, size);
    }
    catch( IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

The use of the SPARSE option here is quite relevant. As long as the underlying OS and file system support it, we can map a sizable memory area without actually consuming disk space.

Now, let’s create a simple demo application. The Producer application will allocate a shared memory buffer large enough to hold 64KB of data plus a SHA1 hash (20 bytes). Next, it will start a loop where it fills the buffer with random data, followed by its SHA1 hash. We’ll repeat this operation continuously for 30 seconds and then exit:

// ... SHA1 digest initialization omitted
MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
Random rnd = new Random();
long start = System.currentTimeMillis();
long iterations = 0;
int capacity = shm.capacity();
System.out.println("Starting producer iterations...");
while(System.currentTimeMillis() - start < 30000) {
    for (int i = 0; i < capacity - hashLen; i++) {
        byte value = (byte) (rnd.nextInt(256) & 0x00ff);
        digest.update(value);
        shm.put(i, value);
    }
    // Write hash at the end
    byte[] hash = digest.digest();
    shm.put(capacity - hashLen, hash);
    iterations++;
}
System.out.printf("%d iterations run\n", iterations);

To test that we indeed can share memory, we’ll also create a Consumer app that will read the buffer’s content for 30 seconds. At each iteration, it computes the buffer content’s hash and compares it with the one written by the Producer at the buffer’s end:

// ... digest initialization omitted
MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
long start = System.currentTimeMillis();
long iterations = 0;
int capacity = shm.capacity();
System.out.println("Starting consumer iterations...");
long matchCount = 0;
long mismatchCount = 0;
byte[] expectedHash = new byte[hashLen];
while (System.currentTimeMillis() - start < 30000) {
    for (int i = 0; i < capacity - 20; i++) {
        byte value = shm.get(i);
        digest.update(value);
    }
    byte[] hash = digest.digest();
    shm.get(capacity - hashLen, expectedHash);
    if (Arrays.equals(hash, expectedHash)) {
        matchCount++;
    } else {
        mismatchCount++;
    }
    iterations++;
}
System.out.printf("%d iterations run. matches=%d, mismatches=%d\n", iterations, matchCount, mismatchCount);

To test our memory-sharing scheme, let’s start both programs at the same time. This is their output when running on a 3 GHz, quad-core Intel i7 machine:

# Producer output
Starting producer iterations...
11722 iterations run
# Consumer output
Starting consumer iterations...
18893 iterations run. matches=11714, mismatches=7179

We can see that, in many cases, the hash the consumer computes doesn’t match the one written by the producer. Welcome to the wonderful world of concurrency issues!

5. Synchronizing Shared Memory Access

The root cause for the issue we’ve seen is that we need to synchronize access to the shared memory buffer. The Consumer must wait for the Producer to finish writing the hash before it starts reading the data. On the other hand, the Producer also must wait for the Consumer to finish consuming the data before writing to it again.

For a regular multithreaded application, solving this issue is no big deal. The standard library offers several synchronization primitives that allow us to control who can write to the shared memory at a given time.

However, ours is a multi-JVM scenario, so none of those standard methods apply. So, what should we do? Well, the short answer is that we’ll have to cheat. We could resort to OS-specific mechanisms like semaphores, but this would hinder our application’s portability. Also, this implies using JNI or JNA, which also complicates things.

Enter Unsafe. Despite its somewhat scary name, this standard library class offers exactly what we need to implement a simple lock mechanism: the compareAndSwapInt() method.

This method implements an atomic test-and-set primitive that takes four arguments. Although not clearly stated in the documentation, it can target not only Java objects but also a raw memory address. For the latter, we pass null in the first argument, which makes it treat the offset argument as a virtual memory address.

When we call this method, it will first check the value at the target address and compare it with the expected value. If they’re equal, then it will modify the location’s content to the new value and return true indicating success. If the value at the location is different from expected, nothing happens, and the method returns false.

More importantly, this atomic operation is guaranteed to work even in multicore architectures, which is a critical feature for synchronizing multiple executing threads.

Let’s create a SpinLock class that takes advantage of this method to implement a (very!) simple lock mechanism:

//... package and imports omitted
public class SpinLock {
    private static final Unsafe unsafe;
    // ... unsafe initialization omitted
    private final long addr;
    public SpinLock(long addr) {
        this.addr = addr;
    }
    public boolean tryLock(long maxWait) {
        long deadline = System.currentTimeMillis() + maxWait;
        while (System.currentTimeMillis() < deadline ) {
            if (unsafe.compareAndSwapInt(null, addr, 0, 1)) {
                return true;
            }
        }
        return false;
    }
    public void unlock() {
        unsafe.putInt(addr, 0);
    }
}

This implementation lacks key features, like checking whether it owns the lock before releasing it, but it will suffice for our purpose.

Okay, so how do we get the memory address that we’ll use to store the lock status? This must be an address within the shared memory buffer so both processes can use it, but the MappedByteBuffer class does not expose the actual memory address.

Inspecting the object that map() returns, we can see that it is a DirectByteBuffer. This class has a public method called address() that returns exactly what we want. Unfortunately, this class is package-private so we can’t use a simple cast to access this method.

To bypass this limitation, we’ll cheat a little again and use reflection to invoke this method:

private static long getBufferAddress(MappedByteBuffer shm) {
    try {
        Class<?> cls = shm.getClass();
        Method maddr = cls.getMethod("address");
        maddr.setAccessible(true);
        Long addr = (Long) maddr.invoke(shm);
        if (addr == null) {
            throw new RuntimeException("Unable to retrieve buffer's address");
        }
        return addr;
    } catch (NoSuchMethodException | InvocationTargetException | IllegalAccessException ex) {
        throw new RuntimeException(ex);
    }
}

Here, we’re using setAccessible() to make the address() method callable through the Method handle. However, be aware that, from Java 17 onwards, this technique won’t work unless we explicitly use the --add-opens runtime flag (for instance, --add-opens java.base/java.nio=ALL-UNNAMED).

6. Adding Synchronization to Producer and Consumer

Now that we have a lock mechanism, let’s apply it to the Producer first. For the purposes of this demo, we’ll assume that the Producer will always start before the Consumer. We need this so we can initialize the buffer, clearing its content including the area we’ll use with the SpinLock:

public static void main(String[] args) throws Exception {
    // ... digest initialization omitted
    MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
    // Cleanup lock area 
    shm.putInt(0, 0);
    long addr = getBufferAddress(shm);
    System.out.println("Starting producer iterations...");
    long start = System.currentTimeMillis();
    long iterations = 0;
    Random rnd = new Random();
    int capacity = shm.capacity();
    SpinLock lock = new SpinLock(addr);
    while(System.currentTimeMillis() - start < 30000) {
        if (!lock.tryLock(5000)) {
            throw new RuntimeException("Unable to acquire lock");
        }
        try {
            // Skip the first 4 bytes, as they're used by the lock
            for (int i = 4; i < capacity - hashLen; i++) {
                byte value = (byte) (rnd.nextInt(256) & 0x00ff);
                digest.update(value);
                shm.put(i, value);
            }
            // Write hash at the end
            byte[] hash = digest.digest();
            shm.put(capacity - hashLen, hash);
            iterations++;
        }
        finally {
            lock.unlock();
        }
    }
    System.out.printf("%d iterations run\n", iterations);
}

Compared to the unsynchronized version, there are just minor changes:

  • Retrieve the memory address associated with the MappedByteBuffer
  • Create a SpinLock instance using this address. The lock uses an int, so it will take the four initial bytes of the buffer
  • Use the SpinLock instance to protect the code that fills the buffer with random data and its hash

Now, let’s apply similar changes to the Consumer side:

public static void main(String[] args) throws Exception {
    // ... digest initialization omitted
    MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
    long addr = getBufferAddress(shm);
    System.out.println("Starting consumer iterations...");
    Random rnd = new Random();
    long start = System.currentTimeMillis();
    long iterations = 0;
    int capacity = shm.capacity();
    long matchCount = 0;
    long mismatchCount = 0;
    byte[] expectedHash = new byte[hashLen];
    SpinLock lock = new SpinLock(addr);
    while (System.currentTimeMillis() - start < 30000) {
        if (!lock.tryLock(5000)) {
            throw new RuntimeException("Unable to acquire lock");
        }
        try {
            for (int i = 4; i < capacity - hashLen; i++) {
                byte value = shm.get(i);
                digest.update(value);
            }
            byte[] hash = digest.digest();
            shm.get(capacity - hashLen, expectedHash);
            if (Arrays.equals(hash, expectedHash)) {
                matchCount++;
            } else {
                mismatchCount++;
            }
            iterations++;
        } finally {
            lock.unlock();
        }
    }
    System.out.printf("%d iterations run. matches=%d, mismatches=%d\n", iterations, matchCount, mismatchCount);
}

With those changes, we can now run both sides and compare them with the previous result:

# Producer output
Starting producer iterations...
8543 iterations run
# Consumer output
Starting consumer iterations...
8607 iterations run. matches=8607, mismatches=0

As expected, the reported iteration count is lower compared to the non-synchronized version. The main reason is that we spend most of the time within the critical section of the code, holding the lock. Whichever program holds the lock prevents the other side from doing anything.

If we compare the average iteration count reported from the first case, it will be approximately the same as the sum of iterations we got this time. This shows that the overhead added by the lock mechanism itself is minimal.

7. Conclusion

In this tutorial, we’ve explored how to share a memory area between two JVMs running on the same machine. We can use the technique presented here as the foundation for high-throughput, low-latency inter-process communication libraries.

As usual, all code is available over on GitHub.

       

Java Weekly, Issue 516

$
0
0

1. Spring and Java

>> Table partitioning with Spring and Hibernate [vladmihalcea.com]

Splitting a large table into multiple smaller partition tables using Spring and Hibernate — allowing a more efficient seek/scan.

>> Pattern Matching for switch – Sip of Java [inside.java]

And, a crash course on Pattern Matching in Java 21: type patterns, guard conditions, null handling, exhaustiveness, sealed hierarchies, and records.

Also worth reading:

Webinars and presentations:

Time to upgrade:

2. Technical & Musings

>> Exploring the OpenTelemetry Collector [blog.frankel.ch]

Exploring the OpenTelemetry collector: setup, intermediary data processing, receivers, exporters, log manipulation, and more!

Also worth reading:

3. Pick of the Week

Our only “sale” of the year is now live – Black Friday.

If you’re planning to go deep into Spring and Spring Boot, this is a good time to explore:

>> All Baeldung Courses

       

Synchronize a Static Variable Among Different Threads

$
0
0

1. Overview

In Java, it’s not uncommon to need synchronized access to static variables. In this short tutorial, we’ll look at several ways to synchronize access to static variables among different threads.

2. About Static Variables

As a quick refresher, static variables belong to the class rather than an instance of the class. This means all instances of a class have the same state for the variable.

For example, let’s consider an Employee class with a static variable:

public class Employee {
    static int count;
    int id;
    String name;
    String title;
}

In this case, the count variable is static and represents the number of total employees that have ever worked at the company. No matter how many Employee instances we create, all of them will share the same value for count.

We can then add code to the constructor to ensure we track the count with each new employee:

public Employee(int id, String name, String title) {
    count = count + 1;
    // ...
}

While this approach is straightforward, it isn’t thread-safe: the increment is a read-modify-write operation, and reads of the count variable may not see the latest value. This is especially problematic in a multi-threaded environment where several threads create instances of the Employee class.
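
To make the problem concrete, here’s a minimal sketch, assuming it runs in the same package as Employee so it can read the package-private count field, where several threads create employees concurrently and some increments get lost:

static void raceDemo() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(8);
    for (int i = 0; i < 10_000; i++) {
        int id = i;
        pool.submit(() -> new Employee(id, "name-" + id, "engineer"));
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    // With the unsynchronized constructor, this often prints a value below 10000
    System.out.println(Employee.count);
}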

Below, we’ll see different ways to synchronize access to the count variable.

3. Synchronizing Static Variables With the synchronized Keyword

The first way we can synchronize our static variable is by using Java’s synchronized keyword. There are several ways we can utilize this keyword for accessing our static variable.

First, we can create a static method that uses the synchronized keyword as a modifier in its declaration:

public Employee(int id, String name, String title) {
    incrementCount();
    // ...
}
private static synchronized void incrementCount() {
    count = count + 1;
}
public static synchronized int getCount() {
    return count;
}

In this case, the synchronized keyword locks on the class object because the variable is static. This means no matter how many instances of Employee we create, only one can access the variable at once, as long as they use the two static methods.

Secondly, we can use a synchronized block to explicitly synchronize on the class object:

private static void incrementCount() {
    synchronized(Employee.class) {
        count = count + 1;
    }
}
public static int getCount() {
    synchronized(Employee.class) {
        return count;
    }
}

Note that this is functionally equivalent to the first example, but the code is a little more explicit.

Finally, we can also use a synchronized block with a specific object instance instead of the class:

private static final Object lock = new Object();
public Employee(int id, String name, String title) {
    incrementCount();
    // ...
}
private static void incrementCount() {
    synchronized(lock) {
        count = count + 1;
    }
}
public static int getCount() {
    synchronized(lock) {
        return count;
    }
}

This variant is sometimes preferred because the lock is private to our class. In the first example, it’s possible for other code outside of our control to also lock on our class. With a private lock, we have full control over how it’s used.
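
For instance, here’s a sketch of hypothetical code outside our control that grabs the same monitor and stalls both static methods from the first two examples:

static void hogEmployeeClassLock() throws InterruptedException {
    // Hypothetical outside code: Employee.class is publicly reachable, so anyone can
    // acquire the same monitor that the synchronized static methods use
    synchronized (Employee.class) {
        // While we hold the lock, incrementCount() and getCount() cannot run
        TimeUnit.SECONDS.sleep(10);
    }
}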

The Java synchronized keyword is only one way to synchronize access to a static variable. Below, we’ll look at some Java APIs that can also provide synchronization to static variables.

4. Java APIs To Synchronize Static Variables

The Java programming language offers several APIs that can help with synchronization. Let’s look at two of them.

4.1. Atomic Wrappers

Introduced in Java 1.5, the AtomicInteger class is an alternative way to synchronize access to our static variable. This class provides atomic read and write operations, ensuring a consistent view of the underlying value across all threads.

For example, we could rewrite our Employee class using the AtomicInteger type instead of int:

public class Employee {
    private final static AtomicInteger count = new AtomicInteger(0);
    public Employee(int id, String name, String title) {
        count.incrementAndGet();
    }
    public static int getCount() {
        return count.get();
    }
}

In addition to AtomicInteger, Java provides atomic wrappers for long and boolean, as well as reference types. All of these wrapper classes are great tools for synchronizing access to static data.

4.2. Reentrant Locks

Also introduced in Java 1.5, the ReentrantLock class is another mechanism we can use to synchronize access to static data. It provides the same basic behavior and semantics as the synchronized keyword we used earlier but with additional capabilities.

Let’s see an example of how our Employee class can use a ReentrantLock to guard access to count:

public class Employee {
    private static int count = 0;
    private static final ReentrantLock lock = new ReentrantLock();
    public Employee(int id, String name, String title) {
        lock.lock();
        try {
            count = count + 1;
        }
        finally {
            lock.unlock();
        }
        // set fields
    }
    public static int getCount() {
        lock.lock();
        try {
            return count;
        }
        finally {
            lock.unlock();
        }
    }
}

There are a couple of things to note about this approach. First, it’s much more verbose than the others. Each time we access the shared variable, we have to ensure we lock right before the access and unlock right after. This can lead to programmer errors if we forget to do this sequence in every place we access the shared static variable.

Additionally, the documentation for the class suggests using a try/finally block to properly lock and unlock. This adds additional lines of code and verbosity, as well as more potential for programmer error if we forget to do this in all cases.

That said, the ReentrantLock class offers additional behavior beyond the synchronized keyword. Among other things, it allows us to set a fairness flag and query the state of the lock to get a detailed view of how many threads are waiting on it.
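
As a quick sketch of those extras, we can request a fair lock and inspect its waiting queue:

ReentrantLock fairLock = new ReentrantLock(true); // threads acquire the lock roughly in arrival order
boolean contended = fairLock.hasQueuedThreads();  // are any threads currently waiting?
int waiting = fairLock.getQueueLength();          // estimated number of waiting threads
boolean held = fairLock.isLocked();               // is the lock currently held by any thread?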

5. Conclusion

In this article, we looked at several different ways to synchronize access to a static variable across different instances and threads. We first looked at the Java synchronized keyword and saw examples of how we use it as both a method modifier and a static code block.

We then looked at two features of the Java concurrent API: AtomicInteger and ReentrantLock. Both of these APIs offer ways to synchronize access to shared data with some additional benefits beyond the synchronized keyword.

All of the examples above can be found over on GitHub.

       

Create Table Using ASCII in a Console in Java

$
0
0

1. Overview

The Java standard library provides the printf() and format() methods to output formatted data to the console. These two methods make it possible to create a table using ASCII characters in a console app. Also, there’s a third-party library named AsciiTable that further simplifies the task.

In this tutorial, we’ll learn how to use the Java standard API and a third-party API to create a table using ASCII characters in Java.

2. Project Setup

To understand how to output a table to the console in Java, let’s create a simple project that outputs a person’s name, height, weight, and body mass index (BMI) to the console.

First, let’s create a class named BodyMassIndex:

class BodyMassIndex {
    private String name;
    private double height;
    private double weight;
 
    // constructor, getters and setters  
 
    double calculate() {
        double bmi = weight / (height * height);
        String formattedBmi = String.format("%.2f", bmi);
        return Double.parseDouble(formattedBmi);
    }
}

Here, we create a class named BodyMassIndex. Its constructor accepts name, height, and weight as parameters. Also, we define a method named calculate() to compute body mass index.

We’ll also create a new class named BodyMassIndexApplication, which will have methods that use the BodyMassIndex object to construct a table using ASCII characters.

Next, let’s create BodyMassIndex objects in the class and store them in an ArrayList:

List<BodyMassIndex> bodyMassIndices = new ArrayList<>();
bodyMassIndices.add(new BodyMassIndex("Tom", 1.8, 80));
bodyMassIndices.add(new BodyMassIndex("Elton", 1.9, 90));
bodyMassIndices.add(new BodyMassIndex("Harry", 1.9, 90));
bodyMassIndices.add(new BodyMassIndex("Hannah", 1.9, 90));

In the subsequent sections, we’ll output the data in a tabular form to the console using System.out.format() and AsciiTable.

3. Using the System.out.format() method

The Java PrintStream object System.out provides methods like format() and printf() to output formatted strings to the console. Both are handy for constructing a table using ASCII characters. We can use these methods to carefully place ASCII characters to draw lines and position data.

3.1. System.out.format()

We’ll use format specifiers to place the data in the right column correctly. Let’s see an example code that outputs a table to the console using ASCII characters:

System.out.format("+---------+---------+---------+-------+%n");
System.out.format("| Name    | Height  |  Weight | BMI   |%n");
System.out.format("+---------+---------+---------+-------+%n");
String leftAlignment = "| %-7s | %-7.2f | %-7.2f | %-5.2f |%n";  
for (BodyMassIndex bodyMassIndex : bodyMassIndices) {
    System.out.format(leftAlignment, bodyMassIndex.getName(), bodyMassIndex.getHeight(), bodyMassIndex.getWeight(), bodyMassIndex.calculate());
    System.out.format("+---------+---------+---------+-------+%n");
}

In the code above, we create a header of four columns for the table. First, we use the plus sign to show the beginning and ending of each column. Next, we use hyphens to draw horizontal lines. Then, we use the newline character to terminate each line.

Furthermore, we use the pipe sign to draw vertical lines. The +, -, and | characters are arranged based on the structure of the table.

Finally, we declare a string variable named leftAlignment and assign it a format string. The format string helps format output to the console and contains the following elements:

  • | – Separate the columns in the output
  • %-7s – Helps to left-align a string and use a minimum field width of 7 characters
  • %-7.2f – Helps to left-align a float and to use a minimum field width of 7 characters and 2 decimal places
  • %-5.2f – Helps to left-align a float and use a minimum field width of 5 characters and 2 decimal places
  • %n – Newline character

Alternatively, we can use System.out.printf() in place of System.out.format(). Both methods provide the same result.

3.2. The Output

Here’s the generated table:

[Console output: ASCII table rendered with System.out.format()]
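
Based on the format strings above, the rendered table should look roughly like this (BMI values rounded to two decimal places):

+---------+---------+---------+-------+
| Name    | Height  |  Weight | BMI   |
+---------+---------+---------+-------+
| Tom     | 1.80    | 80.00   | 24.69 |
+---------+---------+---------+-------+
| Elton   | 1.90    | 90.00   | 24.93 |
+---------+---------+---------+-------+
| Harry   | 1.90    | 90.00   | 24.93 |
+---------+---------+---------+-------+
| Hannah  | 1.90    | 90.00   | 24.93 |
+---------+---------+---------+-------+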

The console displays a table constructed using ASCII characters. The table is rendered on the console based on our specifications.

4. Using the AsciiTable Library

AsciiTable is a third-party library that makes it easy to create a nice-looking ASCII table.

4.1. The AsciiTable Library

To use the AsciiTable library, let’s add its dependency to the pom.xml:

<dependency>
    <groupId>de.vandermeer</groupId>
    <artifactId>asciitable</artifactId>
    <version>0.3.2</version>
</dependency>

Next, let’s see an example code that uses the library to create the BMI data table in ASCII format:

AsciiTable asciiTable = new AsciiTable();
asciiTable.addRule();
asciiTable.addRow("Name", "Height", "Weight", "BMI");
asciiTable.addRule();
for (BodyMassIndex bodyMassIndex : bodyMassIndices) {
    asciiTable.addRow(bodyMassIndex.getName(), bodyMassIndex.getHeight(), bodyMassIndex.getWeight(), bodyMassIndex.calculate());
    asciiTable.addRule();
}
asciiTable.setTextAlignment(TextAlignment.CENTER);
String render = asciiTable.render();
System.out.println(render);

In the code above, we create an AsciiTable object. Next, we invoke addRule() on it to add a horizontal line. Then, we use the addRow() method to populate the AsciiTable object with data.

Also, the AsciiTable class provides methods to format the data. We align the data to the center by invoking setTextAlignment() on the AsciiTable object. The method accepts a TextAlignment enum value to specify the text alignment.

Finally, we invoke render() on the AsciiTable object, which returns the table as a string.

4.2. The Output

Here’s the output on the console:

[Console output: table rendered with the AsciiTable library]

The AsciiTable library provides an easy way to create nice-looking ASCII tables for the console with minimal code.

5. Conclusion

In this article, we learned how to output a table to the console using the built-in System.out.format() method and the AsciiTable library.

Both methods provide a working way to achieve the task. However, while using the AsciiTable library requires less work to align columns properly, the System.out.format() method gives more direct control over styling.

As usual, the complete source code for the examples is available over on GitHub.

       

Representing Furthest Possible Date in Java

$
0
0

1. Introduction

There are scenarios where it’s essential to represent the furthest conceivable date value, particularly when dealing with default or placeholder dates.

In this tutorial, we’ll learn how to represent the furthest possible date using the java.util.Date class and the java.lang.Long class.

2. Why Represent the Furthest Possible Date?

Let’s consider a scenario where we’re developing a software licensing system, and we want these licenses to be valid indefinitely unless they’re explicitly set to expire.

In scenarios like this one, it’s crucial to have a clear representation of the furthest possible date value in our code. This representation serves as a reference point for no expiration date, streamlining the logic of checking and managing license validity.

3. What Is the Furthest Possible Date?

The furthest possible date value in Java is the largest possible date that can be represented by the java.util.Date class.

This class stores the date and time as a long integer that represents the number of milliseconds since January 1, 1970, 00:00:00 GMT (the epoch).

The maximum value of a long integer is Long.MAX_VALUE, which is equal to 9223372036854775807. Therefore, Java’s furthest possible date value is the date and time corresponding to this number of milliseconds.

4. How to Represent the Furthest Possible Date?

To represent the furthest possible date in Java, we can use the following steps:

  • Create a Date object by passing Long.MAX_VALUE as the argument to its constructor. This creates a Date object with the furthest possible date and time.
  • Optionally, we can format the Date object using a SimpleDateFormat object to display it in a human-readable format.

Here’s an example of how to represent the furthest possible date:

public class MaxDateDisplay {
    public String getMaxDateValue() {
        Date maxDate = new Date(Long.MAX_VALUE);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        return "The maximum date value in Java is: " + sdf.format(maxDate);
    }
}

5. Unit Testing for Formatting the Furthest Possible Date

To verify, we create an instance of MaxDateDisplay and call the getMaxDateValue() method. Then, we can use assertEquals() to compare the expected output with the actual result:

@Test
void whenGetMaxDate_thenCorrectResult() {
    MaxDateDisplay display = new MaxDateDisplay();
    String result = display.getMaxDateValue();
    assertEquals("The maximum date value in Java is: 292278994-08-17 07:12:55.807", result);
}

6. Unit Testing for Comparing Date

When sorting or comparing dates, a known furthest possible date value can serve as a placeholder, particularly when null values aren’t desired. It signifies that a date is set to the furthest conceivable point in the future, making it a valuable tool in comparison operations.

Here’s an example of how to compare the date value:

@Test
void whenCompareTodayWithMaxDate_thenCorrectResult() {
    Date today = new Date();
    Date maxDate = new Date(Long.MAX_VALUE);
    int comparisonResult = today.compareTo(maxDate);
    assertTrue(comparisonResult < 0);
}

7. Conclusion

In this article, we learned how to represent the furthest possible date using the java.util.Date class and the java.lang.Long class. We also saw examples of putting this value to work, such as formatting it for display and using it as a placeholder in date comparisons.

As always, the example code is available over on GitHub.

       

How to Avoid NoSuchElementException in Stream API

$
0
0

1. Overview

In this short tutorial, we’ll explain how to avoid NoSuchElementException when working with the Stream API.

First, we’re going to explain the main cause of the exception. Then, we’ll showcase how to reproduce and fix it using practical examples.

2. The Cause of the Exception

Before delving deep into the details, let’s understand what the exception means.

In short, NoSuchElementException is thrown to signal that the requested element doesn’t exist. For instance, trying to access an element that is not available or present will lead to this exception.

Typically, calling the get() method on an empty Optional instance is one of the most common causes of NoSuchElementException when working with the Stream API.

3. Producing the Exception

Now that we know what the exception is, let’s go down the rabbit hole and see how to reproduce it in practice.

For example, let’s create a list of names and filter it using the Stream API:

@Test(expected = NoSuchElementException.class)
public void givenEmptyOptional_whenCallingGetMethod_thenThrowNoSuchElementException() {
    List<String> names = List.of("William", "Amelia", "Albert", "Philip");
    Optional<String> emptyOptional = names.stream()
      .filter(name -> name.equals("Emma"))
      .findFirst();
    emptyOptional.get();
}

As we can see, we used the filter() method to find the name “Emma”. Furthermore, we chained with the findFirst() method to get an Optional containing the first found element or an empty Optional if the filtered stream is empty.

Here, our list doesn’t contain the name “Emma”, so findFirst() returns an empty Optional. Consequently, calling get() throws a NoSuchElementException because we’re trying to get a name that doesn’t exist, and an empty Optional doesn’t hold any value.

4. Avoiding the Exception

Now, let’s see how to fix the exception. The easiest way would be to check if there’s a value present in our Optional instance before calling the get() method.

Fortunately, the Optional class provides the isPresent() method specifically for this purpose. So, let’s see it in action:

@Test
public void givenEmptyOptional_whenUsingIsPresentMethod_thenReturnDefault() {
    List<String> names = List.of("Tyler", "Amelia", "James", "Emma");
    Optional<String> emptyOptional = names.stream()
      .filter(name -> name.equals("Lucas"))
      .findFirst();
    String name = "unknown";
    if (emptyOptional.isPresent()) {
        name = emptyOptional.get();
    }
    assertEquals("unknown", name);
}

Here, we used isPresent() to make sure that there’s a value inside our Optional instance before calling the get() method. That way, we avoid the NoSuchElementException exception.

Please notice that the use of isPresent() comes with the cost of the if-else statements. So, can we do it better? Yes!

Typically, the best way to go is to use the orElse() method. In short, this method returns the value if it’s present, or the given fallback argument otherwise:

@Test
public void givenEmptyOptional_whenUsingOrElseMethod_thenReturnDefault() {
    List<String> names = List.of("Nicholas", "Justin", "James");
    Optional<String> emptyOptional = names.stream()
      .filter(name -> name.equals("Lucas"))
      .findFirst();
    String name = emptyOptional.orElse("unknown");
    assertEquals("unknown", name);
}

As shown above, this method offers a more convenient and straightforward way to avoid NoSuchElementException.

Alternatively, we can use the orElseGet() method to achieve the same outcome:

@Test
public void givenEmptyOptional_whenUsingOrElseGetMethod_thenReturnDefault() {
    List<String> names = List.of("Thomas", "Catherine", "David", "Olivia");
    Optional<String> emptyOptional = names.stream()
      .filter(name -> name.equals("Liam"))
      .findFirst();
    String name = emptyOptional.orElseGet(() -> "unknown");
    assertEquals("unknown", name);
}

Unlike orElse(), orElseGet() accepts a supplier as a parameter. Another key difference is that the argument passed to orElse() is always evaluated, even if the Optional instance has a value, whereas the supplier passed to orElseGet() is only invoked when the Optional value isn’t present.
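
To make the difference tangible, here’s a small sketch, where fallback() is a helper of our own rather than part of the Optional API, that logs whenever the fallback value is computed:

@Test
public void givenPresentOptional_whenUsingOrElseAndOrElseGet_thenOnlyOrElseEvaluatesFallback() {
    Optional<String> present = Optional.of("Amelia");
    // orElse() always evaluates its argument, so fallback() runs here
    // even though the Optional already holds a value
    assertEquals("Amelia", present.orElse(fallback()));
    // orElseGet() only invokes the supplier for an empty Optional,
    // so fallback() is never called here
    assertEquals("Amelia", present.orElseGet(() -> fallback()));
}

private String fallback() {
    System.out.println("computing fallback");
    return "unknown";
}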

Please note that our article on the difference between the orElse() and orElseGet() methods does a great job of covering the topic.

5. Best Practices to Avoid NoSuchElementException

In a nutshell, there are several key points to keep in mind when working with the Stream API to avoid the NoSuchElementException exception:

  • Always check if the returned stream/optional is not empty before calling the get() method.
  • Try to define a fallback value using orElse() or orElseGet().
  • Use a filter before calling any terminal operation on a stream.

6. Conclusion

In this short article, we explored different ways of avoiding the exception NoSuchElementException when working with the Stream API.

Along the way, we illustrated how to reproduce the exception and how to avoid it using practical examples.

As always, the full source code of the examples is available over on GitHub.

       

Ensuring Message Ordering in Kafka: Strategies and Configurations

$
0
0

1. Overview

In this article, we will explore the challenges and solutions surrounding message ordering in Apache Kafka. Processing messages in the correct order is crucial for maintaining data integrity and consistency in distributed systems. While Kafka offers mechanisms to maintain message order, achieving this in a distributed environment presents its own set of complexities.

2. Ordering Within a Partition and Its Challenges

Kafka maintains order within a single partition by assigning a unique offset to each message. This guarantees sequential message appending within that partition. However, when we scale up and use multiple partitions, maintaining a global order becomes complex. Different partitions receive messages at varying rates, complicating strict ordering across them.

2.1. Producer and Consumer Timing

Let’s talk about how Kafka handles the order of messages. There’s a bit of a difference between the order in which a producer sends messages and how the consumer receives them. By sticking to just one partition, we process messages in the order they arrive at the broker. However, this order might not match the sequence in which we originally sent them. This mix-up can happen because of things like network latency or when we resend a message. To keep things in line, we can implement producers with acknowledgements and retries. This way, we make sure that messages not only reach Kafka but also arrive in the right order.
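
As a sketch of such a producer setup (the property values here are illustrative), we can require full acknowledgements, allow retries, and enable idempotence with a bounded number of in-flight requests so that retried sends can’t overtake each other within a partition:

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JacksonSerializer.class.getName());
// Wait for all in-sync replicas to acknowledge each write
props.put(ProducerConfig.ACKS_CONFIG, "all");
// Retry transient failures instead of dropping messages
props.put(ProducerConfig.RETRIES_CONFIG, 3);
// Idempotence plus a bounded number of in-flight requests keeps retried
// messages from being written out of order within a partition
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5);
KafkaProducer<Long, UserEvent> orderedProducer = new KafkaProducer<>(props);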

2.2. Challenges with Multiple Partitions

This distribution across partitions, while beneficial for scalability and fault tolerance, introduces the complexity of achieving global message ordering. For instance, we’re sending out two messages, M1 and M2, in that order. Kafka gets them just like we sent them, but it puts them in different partitions. Here’s the catch, just because M1 was sent first doesn’t mean it’ll be processed before M2. This can be challenging in scenarios where the order of processing is crucial, such as financial transactions.

2.3. Single Partition Message Ordering

We create topics with the name ‘single_partition_topic’, which has one partition, and ‘multi_partition_topic’, which has 5 partitions. Below is an example of a topic with a single partition, where the producer is sending a message to the topic:

Properties producerProperties = new Properties();
producerProperties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
producerProperties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
producerProperties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JacksonSerializer.class.getName());
producer = new KafkaProducer<>(producerProperties);
for (long sequenceNumber = 1; sequenceNumber <= 10; sequenceNumber++) {
    UserEvent userEvent = new UserEvent(UUID.randomUUID().toString());
    userEvent.setGlobalSequenceNumber(sequenceNumber);
    userEvent.setEventNanoTime(System.nanoTime());
    ProducerRecord<Long, UserEvent> producerRecord = new ProducerRecord<>(Config.SINGLE_PARTITION_TOPIC, userEvent);
    Future<RecordMetadata> future = producer.send(producerRecord);
    sentUserEventList.add(userEvent);
    RecordMetadata metadata = future.get();
    logger.info("User Event ID: " + userEvent.getUserEventId() + ", Partition : " + metadata.partition());
}

UserEvent is a POJO class that implements the Comparable interface, which helps in sorting messages by globalSequenceNumber (an external sequence number). Since the producer is sending POJO message objects, we implemented a custom Jackson Serializer and Deserializer.

Partition 0 receives all user events, and the event IDs appear in the following sequence:

841e593a-bca0-4dd7-9f32-35792ffc522e
9ef7b0c0-6272-4f9a-940d-37ef93c59646
0b09faef-2939-46f9-9c0a-637192e242c5
4158457a-73cc-4e65-957a-bf3f647d490a
fcf531b7-c427-4e80-90fd-b0a10bc096ca
23ed595c-2477-4475-a4f4-62f6cbb05c41
3a36fb33-0850-424c-81b1-dafe4dc3bb18
10bca2be-3f0e-40ef-bafc-eaf055b4ee26
d48dcd66-799c-4645-a977-fa68414ca7c9
7a70bfde-f659-4c34-ba75-9b43a5045d39

In Kafka, each consumer group operates as a distinct entity. If two consumers belong to different consumer groups, they both will receive all the messages on the topic. This is because Kafka treats each consumer group as a separate subscriber.

If two consumers belong to the same consumer group and subscribe to a topic with multiple partitions, Kafka will ensure that each consumer reads from a unique set of partitions. This is to allow concurrent processing of messages.

Kafka ensures that within a consumer group, no two consumers read the same message, thus each message is processed only once per group.

The code below is for a consumer consuming messages from the same topic:

Properties consumerProperties = new Properties();
consumerProperties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, KAFKA_CONTAINER.getBootstrapServers());
consumerProperties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, LongDeserializer.class.getName());
consumerProperties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, JacksonDeserializer.class.getName());
consumerProperties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
consumerProperties.put(Config.CONSUMER_VALUE_DESERIALIZER_SERIALIZED_CLASS, UserEvent.class);
consumerProperties.put(ConsumerConfig.GROUP_ID_CONFIG, "test-group");
consumer = new KafkaConsumer<>(consumerProperties);
consumer.subscribe(Collections.singletonList(Config.SINGLE_PARTITION_TOPIC));
ConsumerRecords<Long, UserEvent> records = consumer.poll(TIMEOUT_WAIT_FOR_MESSAGES);
records.forEach(record -> {
    UserEvent userEvent = record.value();
    receivedUserEventList.add(userEvent);
    logger.info("User Event ID: " + userEvent.getUserEventId());
});

In this case, the output shows the consumer consuming messages in the same order; below are the sequential event IDs from the output:

841e593a-bca0-4dd7-9f32-35792ffc522e
9ef7b0c0-6272-4f9a-940d-37ef93c59646
0b09faef-2939-46f9-9c0a-637192e242c5
4158457a-73cc-4e65-957a-bf3f647d490a
fcf531b7-c427-4e80-90fd-b0a10bc096ca
23ed595c-2477-4475-a4f4-62f6cbb05c41
3a36fb33-0850-424c-81b1-dafe4dc3bb18
10bca2be-3f0e-40ef-bafc-eaf055b4ee26
d48dcd66-799c-4645-a977-fa68414ca7c9
7a70bfde-f659-4c34-ba75-9b43a5045d39

2.4. Multiple Partition Message Ordering

For a topic with multiple partitions, the consumer and producer configurations are the same. The only difference is the topic and partitions where messages go; this time, the producer sends messages to the topic ‘multi_partition_topic’:

Future<RecordMetadata> future = producer.send(new ProducerRecord<>(Config.MULTI_PARTITION_TOPIC, sequenceNumber, userEvent));
sentUserEventList.add(userEvent);
RecordMetadata metadata = future.get();
logger.info("User Event ID: " + userEvent.getUserEventId() + ", Partition : " + metadata.partition());

The consumer consumes messages from the same topic:

consumer.subscribe(Collections.singletonList(Config.MULTI_PARTITION_TOPIC));
ConsumerRecords<Long, UserEvent> records = consumer.poll(TIMEOUT_WAIT_FOR_MESSAGES);
records.forEach(record -> {
    UserEvent userEvent = record.value();
    receivedUserEventList.add(userEvent);
    logger.info("User Event ID: " + userEvent.getUserEventId());
});

The producer output lists event IDs alongside their respective partitions as below:

939c1760-140e-4d0c-baa6-3b1dd9833a7d, 0
47fdbe4b-e8c9-4b30-8efd-b9e97294bb74, 4
4566a4ec-cae9-4991-a8a2-d7c5a1b3864f, 4
4b061609-aae8-415f-94d7-ae20d4ef1ca9, 3
eb830eb9-e5e9-498f-8618-fb5d9fc519e4, 2
9f2a048f-eec1-4c68-bc33-c307eec7cace, 1
c300f25f-c85f-413c-836e-b9dcfbb444c1, 0
c82efad1-6287-42c6-8309-ae1d60e13f5e, 4
461380eb-4dd6-455c-9c92-ae58b0913954, 4
43bbe38a-5c9e-452b-be43-ebb26d58e782, 3

For the consumer, the output shows that it isn’t consuming messages in the same order. The event IDs from the output are below:

939c1760-140e-4d0c-baa6-3b1dd9833a7d
47fdbe4b-e8c9-4b30-8efd-b9e97294bb74
4566a4ec-cae9-4991-a8a2-d7c5a1b3864f
c82efad1-6287-42c6-8309-ae1d60e13f5e
461380eb-4dd6-455c-9c92-ae58b0913954
eb830eb9-e5e9-498f-8618-fb5d9fc519e4
4b061609-aae8-415f-94d7-ae20d4ef1ca9
43bbe38a-5c9e-452b-be43-ebb26d58e782
c300f25f-c85f-413c-836e-b9dcfbb444c1
9f2a048f-eec1-4c68-bc33-c307eec7cace

3. Message Ordering Strategies

3.1. Using a Single Partition

We could use a single partition in Kafka, as demonstrated in our earlier example with ‘single_partition_topic’, which ensures the ordering of messages. However, this approach has its trade-offs:

  • Throughput Constraint: Imagine we’re at a busy pizza shop. If we’ve only got one chef (the producer) and one waiter (the consumer) working on one table (the partition), they can only serve so many pizzas before things start to back up. In the world of Kafka, when we’re dealing with a ton of messages, sticking to a single partition is like that one-table scenario. A single partition becomes a bottleneck in high-volume scenarios and the rate of message processing is limited since only one producer and one consumer can operate on a single partition at a time.
  • Reduced Parallelism: In the case of the example above, if we have multiple chefs (producers) and waiters (consumers) working on multiple tables (partitions), then the number of orders fulfilled increases. Kafka’s strength lies in parallel processing across multiple partitions. With just one partition, this advantage is lost, leading to sequential processing and further restricting message flow

In essence, while a single partition guarantees order, it does so at the expense of reduced throughput.

3.2. External Sequencing with Time Window Buffering

In this approach, the producer tags each message with a global sequence number. Multiple consumer instances consume messages concurrently from different partitions and use these sequence numbers to reorder messages, ensuring global order.

In a real-world scenario with multiple producers, we will manage a global sequence by a shared resource that’s accessible across all producer processes, such as a database sequence or a distributed counter. This ensures that the sequence numbers are unique and ordered across all messages, irrespective of which producer sends them:

for (long sequenceNumber = 1; sequenceNumber <= 10 ; sequenceNumber++) {
    UserEvent userEvent = new UserEvent(UUID.randomUUID().toString());
    userEvent.setEventNanoTime(System.nanoTime());
    userEvent.setGlobalSequenceNumber(sequenceNumber);
    Future<RecordMetadata> future = producer.send(new ProducerRecord<>(Config.MULTI_PARTITION_TOPIC, sequenceNumber, userEvent));
    sentUserEventList.add(userEvent);
    RecordMetadata metadata = future.get();
    logger.info("User Event ID: " + userEvent.getUserEventId() + ", Partition : " + metadata.partition());
}

On the consumer side, we group the messages into time windows and then process them sequentially. We batch together messages that arrive within a specific time frame, and once the window elapses, we process the batch. This ensures that messages within that time frame are processed in order, even if they arrive at different times within the window. Before processing, the consumer buffers the messages and reorders them based on their sequence numbers. To do that, the consumer needs a buffer period during which it polls for messages multiple times before processing the buffered batch, and this period must be long enough to cope with potential ordering issues:

consumer.subscribe(Collections.singletonList(Config.MULTI_PARTITION_TOPIC));
List<UserEvent> buffer = new ArrayList<>();
long lastProcessedTime = System.nanoTime();
ConsumerRecords<Long, UserEvent> records = consumer.poll(TIMEOUT_WAIT_FOR_MESSAGES);
records.forEach(record -> {
    buffer.add(record.value());
});
while (!buffer.isEmpty()) {
    if (System.nanoTime() - lastProcessedTime > BUFFER_PERIOD_NS) {
        processBuffer(buffer, receivedUserEventList);
        lastProcessedTime = System.nanoTime();
    }
    records = consumer.poll(TIMEOUT_WAIT_FOR_MESSAGES);
    records.forEach(record -> {
        buffer.add(record.value());
    });
}
void processBuffer(List<UserEvent> buffer, List<UserEvent> receivedUserEventList) {
    // sort by global sequence number (UserEvent is assumed to be Comparable on that field)
    Collections.sort(buffer);
    buffer.forEach(userEvent -> {
        receivedUserEventList.add(userEvent);
        logger.info("Processing message with Global Sequence number: " + userEvent.getGlobalSequenceNumber() + ", User Event Id: " + userEvent.getUserEventId());
    });
    buffer.clear();
}

Each event ID appears in the output alongside its corresponding partition, as shown below:

d6ef910f-2e65-410d-8b86-fa0fc69f2333, 0
4d6bfe60-7aad-4d1b-a536-cc735f649e1a, 4
9b68dcfe-a6c8-4cca-874d-cfdda6a93a8f, 4
84bd88f5-9609-4342-a7e5-d124679fa55a, 3
55c00440-84e0-4234-b8df-d474536e9357, 2
8fee6cac-7b8f-4da0-a317-ad38cc531a68, 1
d04c1268-25c1-41c8-9690-fec56397225d, 0
11ba8121-5809-4abf-9d9c-aa180330ac27, 4
8e00173c-b8e1-4cf7-ae8c-8a9e28cfa6b2, 4
e1acd392-db07-4325-8966-0f7c7a48e3d3, 3

Consumer output with global sequence numbers and event IDs:

1, d6ef910f-2e65-410d-8b86-fa0fc69f2333
2, 4d6bfe60-7aad-4d1b-a536-cc735f649e1a
3, 9b68dcfe-a6c8-4cca-874d-cfdda6a93a8f
4, 84bd88f5-9609-4342-a7e5-d124679fa55a
5, 55c00440-84e0-4234-b8df-d474536e9357
6, 8fee6cac-7b8f-4da0-a317-ad38cc531a68
7, d04c1268-25c1-41c8-9690-fec56397225d
8, 11ba8121-5809-4abf-9d9c-aa180330ac27
9, 8e00173c-b8e1-4cf7-ae8c-8a9e28cfa6b2
10, e1acd392-db07-4325-8966-0f7c7a48e3d3

3.3. Considerations for External Sequencing with Buffering

In this approach, each consumer instance buffers messages and processes them in order based on their sequence numbers. However, there are a few considerations:

  • Buffer Size: The buffer’s size can increase depending on the volume of incoming messages. In implementations that prioritize strict ordering by sequence numbers, we might see significant buffer growth, especially if there are delays in message delivery. For instance, if we process 100 messages per minute but suddenly receive 200 due to a delay, the buffer will grow unexpectedly. So we must manage the buffer size effectively and have strategies ready in case it exceeds the anticipated limit
  • Latency: When we buffer messages, we’re essentially making them wait a bit before processing (introducing latency). On one hand, it helps us keep things orderly; on the other, it slows down the whole process. It’s all about finding the right balance between maintaining order and minimizing latency
  • Failures: If consumers fail, we might lose the buffered messages. To prevent this, we might need to regularly save the state of our buffer
  • Late Messages: Messages arriving post-processing of their window will be out of sequence. Depending on the use case, we might need strategies to handle or discard such messages
  • State Management: If processing involves stateful operations, we’ll need mechanisms to manage and persist state across windows.
  • Resource Utilization: Keeping a lot of messages in the buffer requires memory. We need to ensure that we have enough resources to handle this, especially if messages are staying in the buffer for longer periods

3.4. Idempotent Producers

Kafka’s idempotent producer feature aims to deliver messages precisely once, thus preventing any duplicates. This is crucial in scenarios where a producer might retry sending a message due to network errors or other transient failures. While the primary goal of idempotency is to prevent message duplication, it indirectly influences message ordering. Kafka achieves idempotency using two things: a Producer ID (PID) and a sequence number, which acts as the idempotency key and is unique within the context of a specific partition.

  • Sequence Numbers: Kafka assigns sequence numbers to each message sent by the producer. These sequence numbers are unique per partition, ensuring that messages, when sent by the producer in a specific sequence, are written in that same order within a specific partition upon being received by Kafka. Sequence numbers guarantee order within a single partition. However, when producing messages to multiple partitions, there’s no global order guarantee across partitions. For example, if a producer sends messages M1, M2, and M3 to partitions P1, P2, and P3, respectively, each message receives a unique sequence number within its partition. However, it does not guarantee the relative consumption order across these partitions
  • Producer ID (PID): When enabling idempotency, the broker assigns a unique Producer ID (PID) to each producer. This PID, combined with the sequence number, enables Kafka to identify and discard any duplicate messages that result from producer retries

Kafka guarantees message ordering by writing messages to partitions in the order they’re produced, thanks to sequence numbers, and prevents duplicates using the PID and the idempotency feature. To enable the idempotent producer, we need to set the “enable.idempotence” property to true in the producer’s configuration:

props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
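
For completeness, here’s a minimal sketch of a producer configuration with idempotence enabled. The broker address, the LongSerializer for the sequence-number key, and Spring Kafka’s JsonSerializer for the UserEvent value are assumptions made for illustration; idempotence also requires acks=all and a non-zero retry budget, which recent client versions already default to:

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, LongSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, JsonSerializer.class.getName()); // Spring Kafka serializer, assumed
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
// settings required by idempotence: acknowledgments from all in-sync replicas and retries allowed
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.RETRIES_CONFIG, String.valueOf(Integer.MAX_VALUE));
KafkaProducer<Long, UserEvent> producer = new KafkaProducer<>(props);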

4. Key Configurations for Producer and Consumer

There are key configurations for Kafka producers and consumers that can influence message ordering and throughput.

4.1. Producer Configurations

  • MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION: If we’re sending a bunch of messages, this setting decides how many requests we can have in flight without waiting for an acknowledgment. If we set it higher than 1 without turning on idempotence, we might end up disturbing the order of our messages if we have to resend them. But, if we turn on idempotence, Kafka keeps messages in order, even if we send a bunch at once. For super strict order, like ensuring every message is acknowledged before the next one is sent, we should set this value to 1. If we want to prioritize speed over perfect order, we can set it up to 5, but without idempotence this can potentially introduce ordering issues.
  • BATCH_SIZE_CONFIG and LINGER_MS_CONFIG: BATCH_SIZE_CONFIG controls the batch size in bytes, aiming to group records for the same partition into fewer requests for better performance. If we set this limit too low, we’ll be sending out lots of small batches, which can slow us down. But if we set it too high, it might not be the best use of our memory. Kafka can wait a bit before sending a batch if it’s not full yet. This wait time is controlled by LINGER_MS_CONFIG. If more messages come in quickly enough to fill up the batch, they go immediately, but if not, Kafka doesn’t keep waiting – it sends whatever we have when the time’s up. It’s about balancing speed and efficiency, making sure we’re sending just enough messages at a time without unnecessary delays.
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "16384");
props.put(ProducerConfig.LINGER_MS_CONFIG, "5");

4.2. Consumer Configurations

  • MAX_POLL_RECORDS_CONFIG: It’s the limit on how many records our Kafka consumer grabs each time it asks for data. If we set this number high, we can chow down on a lot of data at once, boosting our throughput. But there’s a catch – the more we take, the trickier it might be to keep everything in order. So, we need to find that sweet spot where we’re efficient but not overwhelmed
  • FETCH_MIN_BYTES_CONFIG: If we set this number high, Kafka waits until it has enough data to meet our minimum bytes before sending it over. This can mean fewer trips (or fetches), which is great for efficiency. But if we’re in a hurry and want our data fast, we might set this number lower, so Kafka sends us whatever it has more quickly. For instance, if our consumer application is resource-intensive or needs to maintain strict message order, especially with multi-threading, a smaller batch might be beneficial
  • FETCH_MAX_WAIT_MS_CONFIG: This will decide how long our consumer waits for Kafka to gather enough data to meet our FETCH_MIN_BYTES_CONFIG. If we set this time high, our consumer is willing to wait longer, potentially getting more data in one go. But if we’re in a rush, we set this lower, so our consumer gets data faster, even if it’s not as much. It’s a balancing act between waiting for a bigger haul and getting things moving quickly
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");
props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");
props.put(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG, "500");

5. Conclusion

In this article, we have delved into the intricacies of message ordering in Kafka. We have explored the challenges and presented strategies to address them. Whether it’s through single partitions, external sequencing with time window buffering, or idempotent producers, Kafka offers custom solutions to meet the needs of message ordering.

As always, the source for the examples is available on GitHub.

       

Executable Comments in Java

$
0
0

1. Overview

Comments can be useful when we need additional notes in our code. They can help us make our code more understandable. Additionally, they can be especially useful in methods that perform complex operations.

In this tutorial, we’ll explore cases where comments in our code can become executable. Or at least it may appear like they can.

2. Comments

Before we dive in, let’s revisit comments in Java. They are part of the Java syntax and come in two basic formats:

  • Single-line comments
  • Multiline comments

The text from the “//” characters to the end of the line represents a single-line comment:

// This is a single-line comment.

Additionally, a multiple-line comment (also known as a multiline comment) starts with the “/*” and ends with the “*/” symbol. Everything in between is treated as a comment:

/* This is a
 * multiline
 * comment.
 */

3. Comments and Unicode

Now, let’s start with an example. The following code prints “Baeldung” in the standard output:

// \u000d System.out.println("Baeldung");

Because the line begins with the “//”, which represents the start of a single-line comment, we might conclude the “System.out.println(“Baeldung”);” statement is part of that comment as well.

However, this isn’t accurate. It’s important to note Java doesn’t allow comment execution.

With that in mind, let’s examine our example in detail and see the reasons why the code prints “Baeldung” in the console.

3.1. Unicode Escapes

The code from the example isn’t treated as a comment because of the “\u000d” Unicode escape sequence we placed before it.

All Java programs use the ASCII character set. However, since there are non-Latin characters we can’t represent using ASCII codes, Java allows Unicode to appear in comments, identifiers, keywords, literals, and separators.

Furthermore, to be able to use all non-ASCII characters in our code, we need to embed them through Unicode escape sequences. They start with a backslash (“\”) followed by the letter “u” which is then followed by a four-digit hexadecimal code of a specific character.

Using this convention, the CR (or Carriage return) becomes “\u000d“.

Additionally, the Unicode escape sequences are transformed into the corresponding Unicode characters during the lexical translation defined in the Java Language Specification.

Moving forward, let’s take a closer look at how Java performs the lexical transformation.

3.2. Lexical Translation

When executing the lexical translation, the decoding of Unicode escapes takes precedence over any other processing, even if the escape is part of a comment. To put it differently, Java will first translate all Unicode escape sequences and then move forward with the other translations.

Simply put, during this transformation, each Unicode escape is translated into the corresponding Unicode character. The result of that step then goes through the remaining translation steps, such as recognizing line terminators and tokens.

As a side effect, our code won’t compile if we put an invalid Unicode escape inside the comment. Java treats everything that starts with the “\u” as a Unicode escape.
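
To illustrate, here’s a hypothetical class that fails to compile solely because of a comment: the Windows-style path contains “\u” followed by characters that aren’t hexadecimal digits:

public class InvalidEscape {
    public static void main(String[] args) {
        // Backup location: C:\users\baeldung  <-- the compiler sees "\u", expects four hex digits,
        // finds "sers" instead, and reports "illegal unicode escape"
        System.out.println("This class never compiles");
    }
}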

Thanks to this transformation, we can use Unicode escapes to include any Unicode characters using only ASCII characters. This way, ASCII-based programs and tools can still process the code written in Unicode.

Now, back to our example. We used the Unicode escape sequence “\u000d“, which represents a new line.

When we compile our code, the lexical translation will happen first. Therefore, the “\u000d” will translate to the new line. Since, by definition, a single-line comment ends at the end of the line, the code we put after the Unicode escape won’t be part of the comment anymore.

As a result of the transformation, our code will appear in the new line:

//
System.out.println("Baeldung");

3.3. Unicode and IDEs

Nowadays, we often use an IDE as a development tool. Additionally, we frequently rely on it and expect it’ll warn us if something in our code seems suspicious.

However, when it comes to IDEs and Unicode characters, depending on the IDE we’re using, it sometimes displays the code in the wrong way. It might not interpret Unicode escape sequences correctly and, thus, displays incorrect code highlighting.

Since we can use Unicode escapes instead of ASCII characters, nothing prevents us from substituting other parts of the code with Unicode escapes:

\u002f\u002f This is a comment
\u0053ystem.out.println("Baeldung");

Here, we replaced the “//” and the letter “S” with Unicode escapes. The code still prints “Baeldung” in the console.

4. Conclusion

In this tutorial, we learned how comments and Unicode escape sequences work together.

To sum up, Java doesn’t allow executable comments. When using Unicode escapes in our code, Java translates them into the corresponding characters before any other transformation.

Being able to write Unicode characters is useful when we’d like to use non-Latin characters we can’t represent in any other way in our program. Although it’s perfectly legal to write an entire codebase using just Unicode escapes, we should avoid them and use them only when necessary.

       

Differences Between * and ? in Cron Expressions

$
0
0

1. Overview

With a cron scheduler, we can automate repetitive tasks we’d otherwise need to handle manually. Additionally, the cron expression allows us to schedule jobs executing at the desired date and time.

For scheduling jobs in Java, we usually use the Quartz library. It’s an open-source solution for job scheduling written entirely in Java. Furthermore, if we’re working with the Spring framework, we can use the @Scheduled annotation to easily schedule tasks.
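
As a quick, hypothetical illustration of the Spring approach, a six-field cron expression (seconds, minutes, hours, day of month, month, day of week) goes directly on the annotation:

@Component
public class ReportJob {
    // runs at 10:30:00 on the first day of every month
    @Scheduled(cron = "0 30 10 1 * ?")
    public void generateMonthlyReport() {
        // ...
    }
}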

Although a cron expression represents a powerful way to schedule tasks, its syntax can sometimes be confusing and overwhelming.

In this tutorial, we’ll examine the differences between the ? and the * symbols in cron expressions.

2. Fields in Cron Expression

Before we dive in, let’s explore the fields that can appear in the cron expressions.

In Quartz, a cron expression represents a string that involves up to seven fields separated by whitespace, each representing a specific unit of date and time:

Field        | Required | Allowed Values        | Allowed Special Characters
Seconds      | Yes      | 0-59                  | , - * /
Minutes      | Yes      | 0-59                  | , - * /
Hours        | Yes      | 0-23                  | , - * /
Day of Month | Yes      | 1-31                  | , - * / ? L W
Month        | Yes      | 0-11 (or JAN-DEC)     | , - * /
Day of Week  | Yes      | 1-7 (or SUN-SAT)      | , - * / ? L C #
Year         | No       | 1970-2099 (or empty)  | , - * /

As we can see in the table above, all fields are mandatory except the field that specifies a year. If we don’t provide a value, the job will be executed every year.

Additionally, the syntax for the Unix cron expressions is a bit different:

Field        | Required | Allowed Values      | Allowed Special Characters
Minutes      | Yes      | 0-59                | , - * /
Hours        | Yes      | 0-23                | , - * /
Day of Month | Yes      | 1-31                | , - * /
Month        | Yes      | 1-12 (or JAN-DEC)   | , - * /
Day of Week  | Yes      | 0-6 (or SUN-SAT)    | , - * /

The Unix cron expression consists of five fields followed by the command we’d like to execute. Unlike Quartz, there aren’t specific fields where we’d specify seconds and years. It focuses on scheduling tasks for the current year.

It’s worth noting that the cron expression in Unix doesn’t allow the ? symbol to appear in the expression.

In the next sections, we’ll primarily focus on the cron expressions with the Quartz library.

3. The ? in Cron Expression

Next, let’s examine the question mark symbol (?) in the cron expression. Simply put, it represents no specific value.

We can use it only within the fields that specify the day of the month and the day of the week.

However, it’s important to note the day of the month and the day of the week fields are mutually exclusive. In other words, we can’t specify values for both fields in the same expression.

For instance, the following expression results in an error:

0 30 10 1 OCT 2 2023

Additionally, to easily understand the expression, let’s see it in the table:

Seconds | Minutes | Hours | Day of Month | Month | Day of Week | Year
0       | 30      | 10    | 1            | OCT   | 2           | 2023

We set values for both the day of the month and the day of the week parameters, which isn’t supported with Quartz.

The cron expression would be invalid even if we use the day of the month that falls on the correct weekday:

0 30 10 30 OCT 2 2023

Here, the 30th of October in 2023 falls on a Monday, but the expression is still not valid.

Furthermore, since we’re required to set values for both fields, we need to put the ? symbol on one of them to indicate the value is unset. The field where we set ? will be ignored:

0 0 0 30 OCT ?

From the example, the job runs at midnight on the 30th of October, every year.
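
For context, this is roughly how such an expression is wired into a Quartz trigger; the trigger identity used here is made up for the example:

Trigger trigger = TriggerBuilder.newTrigger()
  .withIdentity("octoberTrigger", "examples")
  // midnight on the 30th of October, every year; '?' leaves the day of the week unset
  .withSchedule(CronScheduleBuilder.cronSchedule("0 0 0 30 OCT ?"))
  .build();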

Additionally, the ? can appear only once in a cron expression. Setting both values with ? would result in an error as well:

0 30 * ? OCT ?

4. The * in Cron Expression

On the other hand, the asterisk (*) in the cron expression means all the values. To put it differently, we’d use it to set all the values defined for a specific field.

Furthermore, unlike the ?, we can use * within any field in the cron expression.

As an example, let’s create a cron expression where we’ll set all the values from the field representing the hours:

0 30 * 1 OCT ?

Next, let’s see in the tabular format:

Seconds | Minutes | Hours | Day of Month | Month | Day of Week | Year
0       | 30      | *     | 1            | OCT   | ?           | empty

The job executes on the first of October, at minute 30 and second 0 of every hour.

Additionally, we can use the * for multiple fields as well:

* * * * OCT ?

This job runs every second, every day in October.

4.1. Day of Month and Day of Week In Linux Cron

When it comes to the day of the month and the weekday fields in Linux cron, they behave differently than the ones from Quartz.

Firstly, they’re not mutually exclusive. We can set both values in the same cron expression.

Secondly, if both fields contain values other than the asterisks, they form a union:

30 10 1 10 5

The job from the example above executes at 10:30 on the first of October and on every Friday in October.

Lastly, if one of the values starts with the asterisk, they form an intersection:

30 10 */1 * 1

Here, the job runs at 10:30, but only on days that fall on a Monday.

5. Comparison Between * and ?

To conclude, let’s list the main differences between the * and the ? special characters in the cron expression:

The * Symbol                                      | The ? Symbol
Stands for all allowed values of a specific field | Means no specific value
Can be used in any field                          | Can be used only in the day of the month and the day of the week fields
Used to specify all the values from the field     | Used to set an empty value
Can appear multiple times in the same expression  | Only one can exist per expression

6. Conclusion

In this article, we learned the differences between the asterisk and the question mark special characters in cron expressions.

To sum up, we’d use the * in the field of a cron expression to include all allowed values for that specific field. On the contrary, the ? represents no specific value and can be used only within the day of month and day of week fields.

Since Quartz doesn’t support setting values for both of those fields at the same time, we need to use ? in one of them to leave that field empty.

       

Overriding Spring Beans in Integration Test

$
0
0

1. Overview

We might want to override some of our application’s beans in Spring integration testing. Typically, this can be done using Spring Beans specifically defined for testing. However, by providing more than one bean with the same name in a Spring context, we might get a BeanDefinitionOverrideException.

This tutorial will show how to mock or stub integration test beans in a Spring Boot application while avoiding the BeanDefinitionOverrideException.

2. Mock or Stub in Testing

Before digging into the details, we should be confident in how to use a Mock or Stub in testing. This is a powerful technique to make sure our application is not prone to bugs.

We can also apply this approach with Spring. However, direct mocking of integration test beans is only available if we use Spring Boot.

Alternatively, we can stub or mock a bean using a test configuration.

3. Spring Boot Application Example

As an example, let’s create a simple Spring Boot application consisting of a controller, a service, and a configuration class:

@RestController
public class Endpoint {
    private final Service service;
    public Endpoint(Service service) {
        this.service = service;
    }
    @GetMapping("/hello")
    public String helloWorldEndpoint() {
        return service.helloWorld();
    }
}

The /hello endpoint will return a string provided by a service that we want to replace during testing:

public interface Service {
    String helloWorld();
}
public class ServiceImpl implements Service {
    public String helloWorld() {
        return "hello world";
    }
}

Notably, we’ll use an interface. Therefore, when required, we’ll stub the implementation to get a different value.

We also need a configuration to load the Service bean:

@Configuration
public class Config {
    @Bean
    public Service helloWorld() {
        return new ServiceImpl();
    }
}

Finally, let’s add the @SpringBootApplication:

@SpringBootApplication
public class Application {
    public static void main(String[] args) {
        SpringApplication.run(Application.class, args);
    }
}

4. Overriding Using @MockBean

@MockBean has been available since version 1.4.0 of Spring Boot. We don’t need any test configuration. Therefore, it’s sufficient to add the @SpringBootTest annotation to our test class:

@SpringBootTest(classes = { Application.class, Endpoint.class })
@AutoConfigureMockMvc
class MockBeanIntegrationTest {
    @Autowired
    private MockMvc mockMvc;
    @MockBean
    private Service service;
    @Test
    void givenServiceMockBean_whenGetHelloEndpoint_thenMockOk() throws Exception {
        when(service.helloWorld()).thenReturn("hello mock bean");
        this.mockMvc.perform(get("/hello"))
          .andExpect(status().isOk())
          .andExpect(content().string(containsString("hello mock bean")));
    }
}

We are confident that there is no conflict with the main configuration. This is because @MockBean will inject a Service mock into our application.

Finally, we use Mockito to fake the service return:

when(service.helloWorld()).thenReturn("hello mock bean");

5. Overriding Without @MockBean

Let’s explore more options for overriding beans without @MockBean. We’ll look at four different approaches: Spring profiles, conditional properties, the @Primary annotation, and bean definition overriding. We can then stub or mock the bean implementation.

5.1. Using @Profile

Defining profiles is a well-known practice with Spring. First, let’s create a configuration using @Profile:

@Configuration
@Profile("prod")
public class ProfileConfig {
    @Bean
    public Service helloWorld() {
        return new ServiceImpl();
    }
}

Then, we can define a test configuration with our service bean:

@TestConfiguration
public class ProfileTestConfig {
    @Bean
    @Profile("stub")
    public Service helloWorld() {
        return new ProfileServiceStub();
    }
}

The ProfileServiceStub service will stub the ServiceImpl already defined:

public class ProfileServiceStub implements Service {
    public String helloWorld() {
        return "hello profile stub";
    }
}

We can create a test class including the main and test configuration:

@SpringBootTest(classes = { Application.class, ProfileConfig.class, Endpoint.class, ProfileTestConfig.class })
@AutoConfigureMockMvc
@ActiveProfiles("stub")
class ProfileIntegrationTest {
    @Autowired
    private MockMvc mockMvc;
    @Test
    void givenConfigurationWithProfile_whenTestProfileIsActive_thenStubOk() throws Exception {
        this.mockMvc.perform(get("/hello"))
          .andExpect(status().isOk())
          .andExpect(content().string(containsString("hello profile stub")));
    }
}

We activate the stub profile in the ProfileIntegrationTest. Therefore, the prod profile is not loaded. Thus, the test configuration will load the Service stub.

5.2. Using @ConditionalOnProperty

Similarly to a profile, we can use the @ConditionalOnProperty annotation to switch between different bean configurations.

Therefore, we’ll have a service.stub property in our main configuration:

@Configuration
public class ConditionalConfig {
    @Bean
    @ConditionalOnProperty(name = "service.stub", havingValue = "false")
    public Service helloWorld() {
        return new ServiceImpl();
    }
}

At runtime, we need to set this condition to false, typically in our application.properties file:

service.stub=false

Conversely, in the test configuration, we want to trigger loading of the Service stub. Therefore, we need this condition to be true:

@TestConfiguration
public class ConditionalTestConfig {
    @Bean
    @ConditionalOnProperty(name="service.stub", havingValue="true")
    public Service helloWorld() {
        return new ConditionalStub();
    }
}

Then, let’s also add our Service stub:

public class ConditionalStub implements Service {
    public String helloWorld() {
        return "hello conditional stub";
    }
}

Finally, let’s create our test class. We’ll set the service.stub conditional to true and load the Service stub:

@SpringBootTest(classes = {  Application.class, ConditionalConfig.class, Endpoint.class, ConditionalTestConfig.class }
, properties = "service.stub=true")
@AutoConfigureMockMvc
class ConditionIntegrationTest {
    @Autowired
    private MockMvc mockMvc;
    @Test
    void givenConditionalConfig_whenServiceStubIsTrue_thenStubOk() throws Exception {
        this.mockMvc.perform(get("/hello"))
          .andExpect(status().isOk())
          .andExpect(content().string(containsString("hello conditional stub")));
    }
}

5.3. Using @Primary

We can also use the @Primary annotation. Given our main configuration, we can define a primary service in a test configuration to be loaded with higher priority:

@TestConfiguration
public class PrimaryTestConfig {
    @Primary
    @Bean("service.stub")
    public Service helloWorld() {
        return new PrimaryServiceStub();
    }
}

Notably, the bean’s name needs to be different. Otherwise, we’ll still bump into the original exception. We can change the name property of @Bean or the method’s name.

Again, we need a Service stub:

public class PrimaryServiceStub implements Service {
    public String helloWorld() {
        return "hello primary stub";
    }
}

Finally, let’s create our test class by defining all relevant components:

@SpringBootTest(classes = { Application.class, NoProfileConfig.class, Endpoint.class, PrimaryTestConfig.class })
@AutoConfigureMockMvc
class PrimaryIntegrationTest {
    @Autowired
    private MockMvc mockMvc;
    @Test
    void givenTestConfiguration_whenPrimaryBeanIsDefined_thenStubOk() throws Exception {
        this.mockMvc.perform(get("/hello"))
          .andExpect(status().isOk())
          .andExpect(content().string(containsString("hello primary stub")));
    }
}

5.4. Using spring.main.allow-bean-definition-overriding Property

What if we can’t apply any of the previous options? Spring provides the spring.main.allow-bean-definition-overriding property so we can directly override the main configuration.

Let’s define a test configuration:

@TestConfiguration
public class OverrideBeanDefinitionTestConfig {
    @Bean
    public Service helloWorld() {
        return new OverrideBeanDefinitionServiceStub();
    }
}

Then, we need our Service stub:

public class OverrideBeanDefinitionServiceStub implements Service {
    public String helloWorld() {
        return "hello no profile stub";
    }
}

Again, let’s create a test class. If we want to override the Service bean, we need to set our property to true:

@SpringBootTest(classes = { Application.class, Config.class, Endpoint.class, OverrideBeanDefinitionTestConfig.class }, 
  properties = "spring.main.allow-bean-definition-overriding=true")
@AutoConfigureMockMvc
class OverrideBeanDefinitionIntegrationTest {
    @Autowired
    private MockMvc mockMvc;
    @Test
    void givenNoProfile_whenAllowBeanDefinitionOverriding_thenStubOk() throws Exception {
        this.mockMvc.perform(get("/hello"))
          .andExpect(status().isOk())
          .andExpect(content().string(containsString("hello no profile stub")));
    }
}

5.5. Using a Mock Instead of a Stub

So far, while using test configuration, we have seen examples with stubs. However, we can also mock a bean. This will work for any test configuration we have seen previously. However, to demonstrate, we’ll follow the profile example.

This time, instead of a stub, we return a Service using the Mockito mock method:

@TestConfiguration
public class ProfileTestConfig {
    @Bean
    @Profile("mock")
    public Service helloWorldMock() {
        return mock(Service.class);
    }
}

Likewise, we make a test class activating the mock profile:

@SpringBootTest(classes = { Application.class, ProfileConfig.class, Endpoint.class, ProfileTestConfig.class })
@AutoConfigureMockMvc
@ActiveProfiles("mock")
class ProfileIntegrationMockTest {
    @Autowired
    private MockMvc mockMvc;
    @Autowired
    private Service service;
    @Test
    void givenConfigurationWithProfile_whenTestProfileIsActive_thenMockOk() throws Exception {
        when(service.helloWorld()).thenReturn("hello profile mock");
        this.mockMvc.perform(get("/hello"))
          .andExpect(status().isOk())
          .andExpect(content().string(containsString("hello profile mock")));
    }
}

Notably, this works similarly to the @MockBean. However, we use the @Autowired annotation to inject a bean into the test class. Compared to a stub, this approach is more flexible and will allow us to directly use the when/then syntax inside the test cases.

6. Conclusion

In this tutorial, we learned how to override a bean during Spring integration testing.

We saw how to use @MockBean. Furthermore, we created the main configuration using @Profile or @ConditionalOnProperty to switch between different beans during tests. Also, we have seen how to give a higher priority to a test bean using @Primary.

Finally, we saw a straightforward solution using the spring.main.allow-bean-definition-overriding property to override a main configuration bean.

As always, the code presented in this article is available over on GitHub.

       

Working with Exceptions in Java CompletableFuture

$
0
0

1. Introduction

Java 8 introduced a new abstraction based on Future for running asynchronous tasks – the CompletableFuture class. It basically came about to overcome the issues of the old Future API.

In this tutorial, we’re going to look into the ways to work with exceptions when we use CompletableFuture.

2. CompletableFuture Recap

First, we might need to recap a little bit about what the CompletableFuture is. CompletableFuture is a Future implementation that allows us to run and, most importantly, chain asynchronous operations. In general, there are three possible outcomes for an async operation – it can complete normally, complete exceptionally, or be canceled from outside. CompletableFuture has various API methods to address all of these possible outcomes.

As with lots of the other methods in CompletableFuture, these methods have non-async, async, and async using specific Executor variations. So, without further delay, let’s look at ways to handle exceptions in CompletableFuture one by one.
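
As a quick, illustrative sketch of those variations (using handle() as the example), the non-async form runs the callback on the thread that completes the stage or on the caller, handleAsync() hands it to the common ForkJoinPool, and the two-argument handleAsync() uses the executor we supply:

ExecutorService pool = Executors.newFixedThreadPool(2);
CompletableFuture.supplyAsync(() -> 42)
  .handle((result, ex) -> result)              // non-async variant
  .handleAsync((result, ex) -> result)         // async variant, common ForkJoinPool
  .handleAsync((result, ex) -> result, pool);  // async variant with a specific Executor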

3. handle()

First, we have a handle() method. By using this method, we can access and transform the entire result of the CompletionStage regardless of the outcome. That is, the handle() method accepts a BiFunction functional interface. So, this interface has two inputs. In the handle() method case, parameters will be the result of the previous CompletionStage and the Exception that occurred.

The important thing is that both of these parameters are optional, meaning they can be null. This makes sense: if the previous CompletionStage completed normally, the Exception is null since there wasn’t one; likewise, if it completed exceptionally, the result value is null.

Let’s now look at an example of handle() method usage:

@ParameterizedTest
@MethodSource("parametersSource_handle")
void whenCompletableFutureIsScheduled_thenHandleStageIsAlwaysInvoked(int radius, long expected)
  throws ExecutionException, InterruptedException {
    long actual = CompletableFuture
      .supplyAsync(() -> {
          if (radius <= 0) {
              throw new IllegalArgumentException("Supplied with non-positive radius '%d'");
          }
          return Math.round(Math.pow(radius, 2) * Math.PI);
      })
      .handle((result, ex) -> {
          if (ex == null) {
              return result;
          } else {
              return -1L;
          }
      })
      .get();
    Assertions.assertThat(actual).isEqualTo(expected);
}
static Stream<Arguments> parametersSource_handle() {
    return Stream.of(Arguments.of(1, 3), Arguments.of(-1, -1));
}

The thing to notice here is that the handle() method returns a new CompletionStage that will always execute, regardless of the previous CompletionStage result. So, handle() transforms the source value from the previous stage to some output value. Therefore, the value that we’re going to obtain via the get() method is the one returned from the handle() method.

4. exceptionally()

The handle() method is not always convenient, especially if we want to process exceptions only if there is one. Luckily, we have an alternative – exceptionally().

This method allows us to provide a callback to be executed only if the previous CompletionStage ended up with an Exception. If no exceptions were thrown, then the callback is omitted, and the execution chain is continued to the next callback (if any) with the value of the previous one.

To understand, let’s look at a concrete example:

@ParameterizedTest
@MethodSource("parametersSource_exceptionally")
void whenCompletableFutureIsScheduled_thenExceptionallyExecutedOnlyOnFailure(int a, int b, int c, long expected)
  throws ExecutionException, InterruptedException {
    long actual = CompletableFuture
      .supplyAsync(() -> {
          if (a <= 0 || b <= 0 || c <= 0) {
              throw new IllegalArgumentException(String.format("Supplied with incorrect edge length [%s]", List.of(a, b, c)));
          }
          return a * b * c;
      })
      .exceptionally((ex) -> -1)
      .get();
    Assertions.assertThat(actual).isEqualTo(expected);
}
static Stream<Arguments> parametersSource_exceptionally() {
    return Stream.of(
      Arguments.of(1, 5, 5, 25),
      Arguments.of(-1, 10, 15, -1)
    );
}

So here, it works in the same manner as handle(), but we have an Exception instance as a parameter to our callback. This parameter will never be null, so our code is a bit simpler now.

The important thing to notice here is that the exceptionally() method’s callback executes only if the previous stage completes with an Exception. It basically means that if the Exception occurred somewhere in the execution chain, and there already was a handle() method that caught it – the exceptionally() callback won’t be executed afterward:

@ParameterizedTest
@MethodSource("parametersSource_exceptionally")
void givenCompletableFutureIsScheduled_whenHandleIsAlreadyPresent_thenExceptionallyIsNotExecuted(int a, int b, int c, long expected)
  throws ExecutionException, InterruptedException {
    long actual = CompletableFuture
      .supplyAsync(() -> {
          if (a <= 0 || b <= 0 || c <= 0) {
              throw new IllegalArgumentException(String.format("Supplied with incorrect edge length [%s]", List.of(a, b, c)));
          }
          return a * b * c;
      })
      .handle((result, throwable) -> {
          if (throwable != null) {
              return -1;
          }
          return result;
      })
      .exceptionally((ex) -> {
          System.exit(1);
          return 0;
      })
      .get();
    Assertions.assertThat(actual).isEqualTo(expected);
}

Here, exceptionally() is not invoked since the handle() method already catches the Exception, if any. Therefore, unless the Exception occurs inside the handle() method, the exceptionally() method here won’t be ever executed.

5. whenComplete()

We also have a whenComplete() method in the API. It accepts the BiConsumer with two parameters: the result and the exception from the previous stage, if any. This method, however, is significantly different from the ones above.

The difference is that whenComplete() will not translate any exceptional outcomes from the previous stages. So, even considering that whenComplete()‘s callback will always run, the exception from the previous stage, if any, will propagate further:

@ParameterizedTest
@MethodSource("parametersSource_whenComplete")
void whenCompletableFutureIsScheduled_thenWhenCompletedExecutedAlways(Double a, long expected) {
    try {
        CountDownLatch countDownLatch = new CountDownLatch(1);
        long actual = CompletableFuture
          .supplyAsync(() -> {
              if (a.isNaN()) {
                  throw new IllegalArgumentException("Supplied value is NaN");
              }
              return Math.round(Math.pow(a, 2));
          })
          .whenComplete((result, exception) -> countDownLatch.countDown())
          .get();
        Assertions.assertThat(countDownLatch.await(20L, java.util.concurrent.TimeUnit.SECONDS)).isTrue();
        Assertions.assertThat(actual).isEqualTo(expected);
    } catch (Exception e) {
        Assertions.assertThat(e.getClass()).isSameAs(ExecutionException.class);
        Assertions.assertThat(e.getCause().getClass()).isSameAs(IllegalArgumentException.class);
    }
}
static Stream<Arguments> parametersSource_whenComplete() {
    return Stream.of(
      Arguments.of(2d, 4),
      Arguments.of(Double.NaN, 1)
    );
}

As we can see here, the callback inside whenComplete() runs in both test invocations. However, in the second invocation, we completed with the ExecutionException, which has our IllegalArgumentException as its cause. So, as we can see, the exception from the previous stage propagates to the caller. We’ll cover the reasons why this happens in the next section.

6. Unhandled Exceptions

Finally, we need to touch on unhandled exceptions a bit. In general, if an exception remains uncaught, the CompletableFuture completes exceptionally, but the Exception doesn’t propagate to the caller on its own. In our case above, we got the ExecutionException from the get() method invocation because we tried to access the result when the CompletableFuture had ended up with an Exception.

Thus, we need to check the result of the CompletableFuture before the get() invocation. There are a couple of ways to do so. The first and probably most familiar approach is via the isCompletedExceptionally()/isCancelled()/isDone() methods. Those methods return a boolean indicating whether the CompletableFuture completed with an exception, was cancelled from outside, or completed successfully.

However, it is worth mentioning that there is also a state() method (available since Java 19) that returns a State enum instance. This instance represents the state of the CompletableFuture, like RUNNING, SUCCESS, etc. So, this is another way to access the outcome of the CompletableFuture.
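
As a minimal, illustrative sketch of that idea (the failing division is just a stand-in for a real task, and we assume the surrounding method declares throws Exception), we can inspect the future before touching its result:

CompletableFuture<Integer> future = CompletableFuture.supplyAsync(() -> 1 / 0);
// crude wait so the async task can finish before we inspect it; fine for a sketch, not for production
Thread.sleep(100);
if (future.isCompletedExceptionally()) {
    System.out.println("Task failed, skipping get()");
} else if (future.isDone()) {
    System.out.println("Result: " + future.get());
}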

7. Conclusion

In this article, we’ve explored the ways to handle exceptions that occur in CompletableFuture stages.

As always, the source code for this article is available over on GitHub.

       

Manage Kafka Consumer Groups

$
0
0

1. Introduction

Consumer groups help to create more scalable Kafka applications by allowing more than one consumer to read from the same topic.

In this tutorial, we’ll understand consumer groups and how they rebalance partitions between their consumers.

2. What Are Consumer Groups?

A consumer group is a set of unique consumers associated with one or more topics. Each consumer can read from zero, one, or more than one partition. Furthermore, each partition can only be assigned to a single consumer at a given time. The partition assignment changes as the group members change. This is known as group rebalancing.

The consumer group is a crucial part of Kafka applications. This allows the grouping of similar consumers and makes it possible for them to read in parallel from a partitioned topic. Hence, it improves the performance and scalability of Kafka applications.

2.1. The Group Coordinator and the Group Leader

When we instantiate a consumer group, Kafka also creates the group coordinator. The group coordinator regularly receives requests from the consumers, known as heartbeats. If a consumer stops sending heartbeats, the coordinator assumes that the consumer has either left the group or crashed. That’s one possible trigger for a partition rebalance.

The first consumer who requests the group coordinator to join the group becomes the group leader. When a rebalance occurs for any reason, the group leader receives a list of the group members from the group coordinator. Then, the group leader reassigns the partitions among the consumers in that list using a customizable strategy set in the partition.assignment.strategy configuration.
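
For instance, a minimal sketch of overriding this strategy in the consumer properties (choosing the CooperativeStickyAssignor here is just an example) could look like this:

// opt into cooperative rebalancing instead of the default assignor list
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG, CooperativeStickyAssignor.class.getName());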

2.2. Committed Offsets

Kafka uses the committed offset to keep track of the last position read from a topic. The committed offset is the position in the topic to which a consumer acknowledges having successfully processed. In other words, it’s the starting point for itself and other consumers to read events in subsequent rounds.

Kafka stores the committed offsets from all partitions inside an internal topic named __consumer_offsets. We can safely trust its information since topics are durable and replicated across brokers, which makes them fault-tolerant.

2.3. Partition Rebalancing

A partition rebalance changes the partition ownership from one consumer to another. Kafka executes a rebalance automatically when a new consumer joins the group or when a consumer member of the group crashes or unsubscribes.

To improve scalability, when a new consumer joins the group, Kafka fairly shares the partitions from the other consumers with the newly added consumer. Additionally, when a consumer crashes, its partitions must be assigned to the remaining consumers in the group to avoid the loss of any unprocessed messages.

The partition rebalance uses the __consumer_offsets topic to make a consumer start reading a reassigned partition from the correct position.

During a rebalance, consumers can’t consume messages. In other words, the group is effectively unavailable until the rebalance is done. Additionally, consumers lose their state and need to recalculate their cached values. The unavailability and cache recalculation during partition rebalance make the event consumption slower.

3. Setting up the Application

In this section, we’ll configure the basics to get a Spring Kafka application up and running.

3.1. Creating the Basic Configurations

First, let’s configure the topic and its partitions:

@Configuration
public class KafkaTopicConfiguration {
    @Value(value = "${spring.kafka.bootstrap-servers}")
    private String bootstrapAddress;
    @Bean
    public KafkaAdmin kafkaAdmin() {
        Map<String, Object> configs = new HashMap<>();
        configs.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        return new KafkaAdmin(configs);
    }
    @Bean
    public NewTopic celciusTopic() {
        return TopicBuilder.name("topic-1")
            .partitions(2)
            .build();
    }
}

The above configuration is straightforward. We’re simply configuring a new topic named topic-1 with two partitions.

Now, let’s configure the producer:

@Configuration
public class KafkaProducerConfiguration {
    @Value(value = "${spring.kafka.bootstrap-servers}")
    private String bootstrapAddress;
    @Bean
    public ProducerFactory<String, Double> kafkaProducer() {
        Map<String, Object> configProps = new HashMap<>();
        configProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        configProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        configProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, DoubleSerializer.class);
        return new DefaultKafkaProducerFactory<>(configProps);
    }
    @Bean
    public KafkaTemplate<String, Double> kafkaProducerTemplate() {
        return new KafkaTemplate<>(kafkaProducer());
    }
}

In the Kafka producer configuration above, we’re setting the broker address and the serializers the producer uses to write message keys and values.

Finally, let’s configure the consumer:

@Configuration
public class KafkaConsumerConfiguration {
    @Value(value = "${spring.kafka.bootstrap-servers}")
    private String bootstrapAddress;
    @Bean
    public ConsumerFactory<String, Double> kafkaConsumer() {
        Map<String, Object> props = new HashMap<>();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapAddress);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, DoubleDeserializer.class);
        return new DefaultKafkaConsumerFactory<>(props);
    }
    @Bean
    public ConcurrentKafkaListenerContainerFactory<String, Double> kafkaConsumerContainerFactory() {
        ConcurrentKafkaListenerContainerFactory<String, Double> factory = new ConcurrentKafkaListenerContainerFactory<>();
        factory.setConsumerFactory(kafkaConsumer());
        return factory;
    }
}

3.2. Setting up the Consumers

In our demo application, we’ll start with two consumers that belong to the same group named group-1 from topic-1:

@Service
public class MessageConsumerService {
    @KafkaListener(topics = "topic-1", groupId = "group-1")
    public void consumer0(ConsumerRecord<?, ?> consumerRecord) {
        trackConsumedPartitions("consumer-0", consumerRecord);
    }
    @KafkaListener(topics = "topic-1", groupId = "group-1")
    public void consumer1(ConsumerRecord<?, ?> consumerRecord) {
        trackConsumedPartitions("consumer-1", consumerRecord);
    }
}

The MessageConsumerService class registers two consumers to listen to topic-1 inside group-1 using the @KafkaListener annotation.

Now, let’s also define a field and a method in the MessageConsumerService class to keep track of the consumed partition:

Map<String, Set<Integer>> consumedPartitions = new ConcurrentHashMap<>();
private void trackConsumedPartitions(String key, ConsumerRecord<?, ?> record) {
    consumedPartitions.computeIfAbsent(key, k -> new HashSet<>());
    consumedPartitions.computeIfPresent(key, (k, v) -> {
        v.add(record.partition());
        return v;
    });
}

In the code above, we used ConcurrentHashMap to map each consumer name to a HashSet of all partitions consumed by that consumer.

4. Visualizing Partition Rebalance When a Consumer Leaves

Now that we have all configurations set up and the consumers registered, we can visualize what Kafka does when one of the consumers leaves group-1. To do that, let’s define the skeleton for the Kafka integration test that uses an embedded broker:

@SpringBootTest(classes = ManagingConsumerGroupsApplicationKafkaApp.class)
@EmbeddedKafka(partitions = 2, brokerProperties = {"listeners=PLAINTEXT://localhost:9092", "port=9092"})
public class ManagingConsumerGroupsIntegrationTest {
    private static final String CONSUMER_1_IDENTIFIER = "org.springframework.kafka.KafkaListenerEndpointContainer#1";
    private static final int TOTAL_PRODUCED_MESSAGES = 50000;
    private static final int MESSAGE_WHERE_CONSUMER_1_LEAVES_GROUP = 10000;
    @Autowired
    KafkaTemplate<String, Double> kafkaTemplate;
    @Autowired
    KafkaListenerEndpointRegistry kafkaListenerEndpointRegistry;
    @Autowired
    MessageConsumerService consumerService;
}

In the above code, we inject the necessary beans to produce and consume messages: kafkaTemplate and consumerService. We’ve also injected the bean kafkaListenerEndpointRegistry to manipulate registered consumers.

Finally, we defined three constants that will be used in our test case.

Now, let’s define the test case method:

@Test
public void givenContinuousMessageFlow_whenAConsumerLeavesTheGroup_thenKafkaTriggersPartitionRebalance() throws InterruptedException {
    int currentMessage = 0;
    do {
        kafkaTemplate.send("topic-1", RandomUtils.nextDouble(10.0, 20.0));
        currentMessage++;
        if (currentMessage == MESSAGE_WHERE_CONSUMER_1_LEAVES_GROUP) {
            String containerId = kafkaListenerEndpointRegistry.getListenerContainerIds()
                .stream()
                .filter(a -> a.equals(CONSUMER_1_IDENTIFIER))
                .findFirst()
                .orElse("");
            MessageListenerContainer container = kafkaListenerEndpointRegistry.getListenerContainer(containerId);
            Thread.sleep(2000);
            Objects.requireNonNull(container).stop();
            kafkaListenerEndpointRegistry.unregisterListenerContainer(containerId);
        }
    } while (currentMessage != TOTAL_PRODUCED_MESSAGES);
    Thread.sleep(2000);
    assertEquals(1, consumerService.consumedPartitions.get("consumer-1").size());
    assertEquals(2, consumerService.consumedPartitions.get("consumer-0").size());
}

In the test above, we’re creating a flow of messages, and at a certain point, we remove one of the consumers so Kafka will reassign its partitions to the remaining consumer. Let’s break down the logic to make it more transparent:

  1. The main loop uses kafkaTemplate to produce 50,000 events of random numbers using Apache Commons’ RandomUtils. When an arbitrary number of messages is produced —10,000 in our case — we stop and unregister one consumer from the broker.
  2. To unregister a consumer, we first use a stream to search for the matching consumer in the container and retrieve it using the getListenerContainer() method. Then, we call stop() to stop the container Spring component’s execution. Finally, we call unregisterListenerContainer() to programmatically unregister the listener associated with the container variable from the Kafka Broker.

Before discussing the test assertions, let’s glance at a few log lines that Kafka generated during the test execution.

The first vital line to see is the one that shows the LeaveGroup request made by consumer-1 to the group coordinator:

INFO o.a.k.c.c.i.ConsumerCoordinator - [Consumer clientId=consumer-group-1-1, groupId=group-1] Member consumer-group-1-1-4eb63bbc-336d-44d6-9d41-0a862029ba95 sending LeaveGroup request to coordinator localhost:9092

Then, the group coordinator automatically triggers a rebalance and shows the reason behind that:

INFO  k.coordinator.group.GroupCoordinator - [GroupCoordinator 0]: Preparing to rebalance group group-1 in state PreparingRebalance with old generation 2 (__consumer_offsets-4) (reason: Removing member consumer-group-1-1-4eb63bbc-336d-44d6-9d41-0a862029ba95 on LeaveGroup)

Returning to our test, we’ll assert that the partition rebalance occurred correctly. Since we unregistered the consumer ending in 1, its partitions should be reassigned to the remaining consumer, which is consumer-0. Hence, we’ve used the map of tracked consumed records to check that consumer-1 only consumed from one partition, whereas consumer-0 consumed from two partitions.

5. Useful Consumer Configurations

Now, let’s talk about a few consumer configurations that impact partition rebalance and the trade-offs of setting specific values for them.

5.1. Session Timeouts and Heartbeats Frequency

The session.timeout.ms parameter indicates the maximum time in milliseconds that the group coordinator can wait for a consumer to send a heartbeat before triggering a partition rebalance. Alongside session.timeout.ms, the heartbeat.interval.ms indicates the frequency in milliseconds that a consumer sends heartbeats to the group coordinator.

We should modify the consumer timeout and heartbeat frequency together so that heartbeat.interval.ms is always lower than session.timeout.ms. This is because we don’t want to let a consumer die by timeout before sending their heartbeats. Typically, we set the heartbeat interval to 33% of the session timeout to guarantee that more than one heartbeat is sent before the consumer dies.
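
As a small sketch (the values below are illustrative, not recommendations), both settings go into the consumer properties we configured earlier:

// a 30-second session timeout with heartbeats at roughly a third of that interval
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "30000");
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000");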

The default consumer session timeout is set to 45 seconds. We can modify that value as long as we understand the trade-offs of modifying it.

When we set the session timeout lower than the default, we increase the speed at which the consumer group recovers from a failure, improving the group availability. However, in Kafka versions before 0.10.1.0, if the main thread of a consumer is blocked when consuming a message that takes longer than the session timeout, the consumer can’t send heartbeats. Therefore, the consumer is considered dead, and the group coordinator triggers an unwanted partition rebalance. This was fixed in KIP-62, introducing a background thread that only sends heartbeats.

If we set higher values for the session timeout, we lose the ability to detect failures quickly. However, this might fix the unwanted partition rebalance problem mentioned above for Kafka versions older than 0.10.1.0.

5.2. Max Poll Interval Time

Another configuration is max.poll.interval.ms, indicating the maximum time the broker can wait for an idle consumer, i.e., one that isn’t calling poll(). After that time passes, the consumer stops sending heartbeats until it reaches the configured session timeout and leaves the group. The default for max.poll.interval.ms is five minutes.

If we set higher values for max.poll.interval.ms, we’re giving more room for consumers to remain idle, which might be helpful to avoid rebalances. However, increasing that time might also increase the number of idle consumers if there are no messages to consume. This can be a problem in a low-throughput environment because consumers can remain idle longer, increasing infrastructure costs.
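
Again, as an illustrative sketch, this is a plain consumer property; the ten-minute value is just an example for slow, batch-style processing:

// allow up to 10 minutes between poll() calls before the consumer is considered failed
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");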

6. Conclusion

In this article, we’ve looked at the fundamentals of the roles of the group leader and the group coordinator. We’ve also looked into how Kafka manages consumer groups and partitions.

We’ve seen in practice how Kafka automatically rebalances the partitions within the group when one of its consumers leaves the group.

It’s essential to understand when Kafka triggers partition rebalance and tune the consumer configurations accordingly.

As always, the source code for the article is available over on GitHub.

       

