Extracting Text Between Parentheses in Java

1. Overview

When we code in Java, there are many scenarios where we need to extract text enclosed within parentheses. Understanding how to retrieve the text between parentheses is an essential skill.

In this tutorial, we’ll explore different methods to achieve this, focusing on regular expressions and some popular external libraries.

2. Introduction to the Problem

When our input contains only one pair of parentheses, we can extract the content between them using two replaceAll() method calls:

String myString = "a b c (d e f) x y z";
 
String result = myString.replaceAll(".*[(]", "")
  .replaceAll("[)].*", "");
assertEquals("d e f", result);

As the example above shows, the first replaceAll() removes everything up to and including the ‘(‘ character. The second replaceAll() removes everything from ‘)‘ to the end of the String. Thus, what remains is the text between ‘(‘ and ‘)‘.

However, this approach won’t work if our input has multiple “(…)” pairs. For example, let’s say we have another input:

static final String INPUT = "a (b c) d (e f) x (y z)";

There are three pairs of parentheses in INPUT. Therefore, we expect to see extracted values in the following String List:

static final List<String> EXPECTED = List.of("b c", "e f", "y z");

Next, let’s see how to extract these String values from the INPUT String.

For simplicity, we’ll leverage unit test assertions to verify whether each approach works as expected.

3. Greedy vs Non-greedy Regex Pattern

Regular expressions (regex) provide a powerful and flexible method for pattern matching and text extraction. So, let’s use regex to do the job.

Some of us may come up with this pattern to extract text between ‘(‘ and ‘)’: “[(](.*)[)]“. This pattern is pretty straightforward:

  • [(] and [)] match the literal ‘(‘ and ‘)’
  • (.*) is a capturing group that matches anything between ‘(‘ and ‘)’

Next, let’s check if this pattern solves the problem correctly:

String myRegex = "[(](.*)[)]";
Matcher myMatcher = Pattern.compile(myRegex)
  .matcher(INPUT);
List<String> myResult = new ArrayList<>();
while (myMatcher.find()) {
    myResult.add(myMatcher.group(1));
}
assertEquals(List.of("b c) d (e f) x (y z"), myResult);

As the above test shows, using this pattern, we only have one String element in the result List: “b c) d (e f) x (y z”. This is because the ‘*’ quantifier applies a greedy match. In other words, “[(](.*)[)]” matches the first ‘(‘ in the input and then everything up to the last ‘)’ character, even if the content includes other “(…)” pairs.

This isn’t what we expected. To solve the problem, we need non-greedy matching, which means the pattern must match each “(…)” pair.

To make the ‘*’ quantifier non-greedy, we can add a question mark ‘?’ after it: “[(](.*?)[)]“.

Next, let’s test if this pattern can extract the expected String elements:

String regex = "[(](.*?)[)]";
List<String> result = new ArrayList<>();
Matcher matcher = Pattern.compile(regex)
  .matcher(INPUT);
while (matcher.find()) {
    result.add(matcher.group(1));
}
assertEquals(EXPECTED, result);

As we can see, the non-greedy regex pattern “[(](.*?)[)]” does the job.

4. Using the Negated Character Class

Apart from using the non-greedy quantifier (*?), we can also solve the problem using regex’s negated character class:

String regex = "[(]([^)]*)";
List<String> result = new ArrayList<>();
Matcher matcher = Pattern.compile(regex)
  .matcher(INPUT);
while (matcher.find()) {
    result.add(matcher.group(1));
}
assertEquals(EXPECTED, result);

As the code shows, our regex pattern to extract texts between parentheses is “[(]([^)]*)“. Let’s break it down to understand how it works:

  • [(] – Matches the literal ‘(‘ character
  • [^)]* – Matches any character that isn’t ‘)’; as it follows [(], it only matches characters inside the parentheses.
  • ([^)]*) – We create a capturing group to extract the text between parentheses without including any opening or closing parenthesis.

Alternatively, we can replace the “[(]” character class with a positive lookbehind assertion “(?<=[(])“. Lookbehind assertions allow us to match a group of characters only if a specified pattern precedes them. In this example, (?<=[(]) asserts that what immediately precedes the current position is an opening parenthesis ‘(‘:

String regex = "(?<=[(])[^)]*";
List<String> result = new ArrayList<>();
Matcher matcher = Pattern.compile(regex)
    .matcher(INPUT);
while (matcher.find()) {
    result.add(matcher.group());
}
assertEquals(EXPECTED, result);

It’s worth noting that since lookaround is a zero-width assertion, the ‘(‘ character won’t be captured. Thus, we don’t need to create a capturing group to extract the expected text.

5. Using StringUtils From Apache Commons Lang 3

Apache Commons Lang 3 is a widely used library. Its StringUtils class offers a rich set of convenient methods for manipulating String values.
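
If the library isn’t already on the classpath, we can add it to our pom.xml (the version shown was current at the time of writing; it’s worth checking Maven Central for the latest):

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.14.0</version>
</dependency>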

If we have only one pair of parentheses in the input, the StringUtils.substringBetween() method allows us to extract the String between them straightforwardly:

String myString = "a b c (d e f) x y z";
 
String result = StringUtils.substringBetween(myString, "(", ")");
assertEquals("d e f", result);

When the input has multiple pairs of parentheses, StringUtils.substringsBetween() returns texts inside parentheses pairs in an array:

String[] results = StringUtils.substringsBetween(INPUT, "(", ")");
assertArrayEquals(EXPECTED.toArray(), results);

If we’re using the Apache Commons Lang 3 library already in our project, these two methods are good choices for this task.

6. Conclusion

In this article, we’ve explored different ways to extract text between parentheses in Java. By understanding and applying these techniques, we can efficiently parse and process text in our Java applications.

As always, the complete source code for the examples is available over on GitHub.

Reading CSV Headers Into a List

1. Overview

In this short tutorial, we’ll explore different ways of reading CSV headers into a list in Java.

First, we’ll learn how to do this using JDK classes. Then, we’ll see how to achieve the same objective using external libraries such as OpenCSV and Apache Commons CSV.

2. Using BufferedReader

The BufferedReader class provides the easiest solution to tackle our challenge. It offers a fast and efficient way to read a CSV file as it reduces the number of IO operations by reading the content chunk by chunk.
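
The examples below assume an employees.csv file under src/test/resources whose first line contains the headers; the data rows here are made up for illustration:

ID,First name,Last name,Salary
1,John,Doe,60000
2,Jane,Smith,65000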

So, let’s see it in action:

class CsvHeadersAsListUnitTest {
    private static final String CSV_FILE = "src/test/resources/employees.csv";
    private static final String COMMA_DELIMITER = ",";
    private static final List<String> EXPECTED_HEADERS = List.of("ID", "First name", "Last name", "Salary");
    @Test
    void givenCsvFile_whenUsingBufferedReader_thenGetHeadersAsList() throws IOException {
        try (BufferedReader reader = new BufferedReader(new FileReader(CSV_FILE))) {
            String csvHeadersLine = reader.readLine();
            List<String> headers = Arrays.asList(csvHeadersLine.split(COMMA_DELIMITER));
            assertThat(headers).containsExactlyElementsOf(EXPECTED_HEADERS);
        }
    }
}

As we can see, we use try-with-resources to create a BufferedReader instance. That way, we make sure that the file is closed afterward. Furthermore, we invoke the readLine() method once to extract the first line which denotes the headers. Finally, we use the split() method alongside Arrays#asList to get the headers as a list.

3. Using Scanner

The Scanner class provides another solution to achieve the same outcome. As the name implies, it scans and reads the content of a given file. So, let’s add another test case to see how to use Scanner to read CSV file headers:

@Test
void givenCsvFile_whenUsingScanner_thenGetHeadersAsList() throws IOException {
    try(Scanner scanner = new Scanner(new File(CSV_FILE))) {
        String csvHeadersLine = scanner.nextLine();
        List<String> headers = Arrays.asList(csvHeadersLine.split(COMMA_DELIMITER));
        assertThat(headers).containsExactlyElementsOf(EXPECTED_HEADERS);
    }
}

Similarly, the Scanner class has the nextLine() method that we can use to get the first line of the input file. Here, the first line represents the headers of our CSV file.

4. Using OpenCSV

Alternatively, we can use the OpenCSV library to read the headers of a particular CSV file. Before getting into the nitty-gritty, let’s add its Maven dependency to the pom.xml file:

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.9</version>
</dependency>

Typically, OpenCSV comes with a set of ready-to-use classes and methods for reading and parsing CSV files. So, let’s exemplify the use of this library with a practical example:

@Test
void givenCsvFile_whenUsingOpenCSV_thenGetHeadersAsList() throws CsvValidationException, IOException {
    try (CSVReader csvReader = new CSVReader(new FileReader(CSV_FILE))) {
        List<String> headers = Arrays.asList(csvReader.readNext());
        assertThat(headers).containsExactlyElementsOf(EXPECTED_HEADERS);
    }
}

As we see above, OpenCSV offers the class CSVReader to read a given file’s content. The CSVReader class provides the readNext() method to retrieve the next line directly as a String array, which in our case is the header line.

5. Using Apache Commons CSV

Another solution is to use the Apache Commons CSV library. As the name suggests, it offers several handy features for creating and reading CSV files.

To start, we need to add the latest version of its dependency to pom.xml:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.11.0</version>
</dependency>

In short, the CSVParser class of Apache Commons CSV provides the getHeaderNames() method to return a read-only list of header names:

@Test
void givenCsvFile_whenUsingApacheCommonsCsv_thenGetHeadersAsList() throws IOException {
    CSVFormat csvFormat = CSVFormat.DEFAULT.builder()
      .setDelimiter(COMMA_DELIMITER)
      .setHeader()
      .build();
    try (BufferedReader reader = new BufferedReader(new FileReader(CSV_FILE));
        CSVParser parser = CSVParser.parse(reader, csvFormat)) {
        List<String> headers = parser.getHeaderNames();
        assertThat(headers).containsExactlyElementsOf(EXPECTED_HEADERS);
    }
}

Here, we use the CSVParser class to parse the input file according to the format specified. The headers are parsed automatically from the input file with the help of the setHeader() method.

6. Conclusion

In this short article, we explored different solutions for reading CSV file’s headers as a list.

First, we learned how to do this using JDK. Then, we saw how to achieve the same objective using external libraries.

As always, the code used in this article can be found over on GitHub.

How to Convert org.w3c.dom.Document to String in Java

1. Overview

When handling XML in Java, we’ll often have an instance of org.w3c.dom.Document that we need to convert to a String. Typically, we might want to do this for a number of reasons, such as serialization, logging, and working with HTTP requests or responses.

In this quick tutorial, we’ll see how to convert a Document to a String. To learn more about working with XML in Java, check out our comprehensive series on XML.

2. Creating a Simple Document

Throughout this tutorial, the focus of our examples will be a simple XML document describing some fruit:

<fruit>
    <name>Apple</name>
    <color>Red</color>
    <weight unit="grams">150</weight>
    <sweetness>7</sweetness>
</fruit>

Let’s go ahead and create an XML Document object from that string:

private static final String FRUIT_XML = "<fruit><name>Apple</name><color>Red</color><weight unit=\"grams\">150</weight><sweetness>7</sweetness></fruit>"; 
public static Document getDocument() throws SAXException, IOException, ParserConfigurationException {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    Document document = factory.newDocumentBuilder()
      .parse(new InputSource(new StringReader(FRUIT_XML)));
    return document;
}

As we can see we create a factory for building a new Document, and then we call the parse method with the content of the given input source. In this case, our input source is a StringReader object containing our Fruit XML string payload.

3. Conversion Using XML Transformation APIs

The javax.xml.transform package contains a set of generic APIs for performing transformations from a source to a result. In our case, the source is the XML document and the result is the output string:

public static String toString(Document document) throws TransformerException {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    StringWriter stringWriter = new StringWriter();
    transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
    return stringWriter.toString();
}

Let’s walk through the key parts of our toString method:

First, we start by creating our TransformerFactory. We’ll use this factory to create the transformer, and in this example, the transformer will simply use the platform’s default.

Now, we can specify the source and result of the transformation. Here, we’ll use our Document to construct a DOM source and a StringWriter to hold the result.

Finally, we call toString on our StringWriter object, which returns the character stream’s current value as a string.

4. Unit Testing

Now that we have a simple way to convert XML documents to strings, let’s go ahead and test that it works properly:

@Test
public void givenXMLDocument_thenConvertToStringSuccessfully() throws Exception {
    Document document = XmlDocumentToString.getDocument();
    String expectedDeclaration = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>";
    assertEquals(expectedDeclaration + XmlDocumentToString.FRUIT_XML, XmlDocumentToString.toString(document));
}

Note that our conversion adds the standard XML declaration to the start of the string by default. In our test, we simply check that the converted string matches the original fruit XML, including the standard declaration.

5. Customizing the Output

Now, let’s take a look at our output. By default, our transformer doesn’t apply any kind of output formatting:

<?xml version="1.0" encoding="UTF-8" standalone="no"?><fruit><name>Apple</name><color>Red</color><weight unit="grams">150</weight><sweetness>7</sweetness></fruit>

Obviously, it doesn’t take long for our XML documents to become difficult to read using this one-line formatting, especially for large documents. Fortunately, the Transformer interface provides a variety of output properties to help us customize the output.

Let’s refactor our transformation code a little bit using some of these output properties:

public static String toStringWithOptions(Document document) throws TransformerException {
    Transformer transformer = getTransformer();
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    StringWriter stringWriter = new StringWriter();
    transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
    return stringWriter.toString();
}
private static Transformer getTransformer() throws TransformerConfigurationException {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    return transformerFactory.newTransformer();
}

Sometimes, we might want to exclude the XML declaration. We can configure our transformer to do this by setting the OutputKeys.OMIT_XML_DECLARATION property.

Now, to apply some indentation, we can use two properties: OutputKeys.INDENT and the indent-amount property to specify the amount of indentation. This will indent the output correctly, as by default, the indentation uses zero spaces.

With the above properties set, we get a much nicer-looking output:

<fruit>
    <name>Apple</name>
    <color>Red</color>
    <weight unit="grams">150</weight>
    <sweetness>7</sweetness>
</fruit>
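
If we want to guard this formatting with a test, a quick check might look like this (a sketch; it assumes the toStringWithOptions() method and the four-space indentation configured above):

@Test
public void givenXMLDocument_thenConvertToPrettyStringSuccessfully() throws Exception {
    Document document = XmlDocumentToString.getDocument();
    String prettyXml = XmlDocumentToString.toStringWithOptions(document);
    // the declaration is omitted, so the output starts with the root element
    assertTrue(prettyXml.startsWith("<fruit>"));
    // child elements are indented with four spaces
    assertTrue(prettyXml.contains("    <name>Apple</name>"));
}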

6. Conclusion

In this short article, we learned how to create an XML Document from a Java String object, and then we saw how to convert this Document back into a String using the javax.xml.transform package.

In addition to this, we also saw several ways we can customize the output of the XML, which can be useful when logging the XML to the console.

As always, the full source code of the article is available over on GitHub.

gRPC Authentication in Java Using Application Layer Transport Security (ALTS)

1. Overview

In this tutorial, we’ll explore the role of ALTS (Application Layer Transport Security) in gRPC applications. As we know, ensuring authentication and data security is difficult but essential in a distributed architecture.

ALTS is a custom built-in mutual authentication and transport encryption solution from Google that is available exclusively in Google’s cloud infrastructure. ALTS simplifies authentication and data encryption between gRPC services and can be enabled with minimal code changes. Hence, it’s popular among developers as they can focus more on writing business logic.

2. Key Differences Between ALTS and TLS

ALTS is similar to TLS but has a different trust model optimized for Google’s infrastructure. Let’s quickly take a look at the key differences between them:

  • Trust model – ALTS is identity-based, relying on GCP IAM service accounts; TLS is certificate-based and requires certificate management, including renewal and revocation
  • Design – ALTS is simpler; TLS is more complex
  • Usage context – ALTS secures gRPC services running in Google data centers; TLS secures web browsing (HTTPS), email, instant messaging, VoIP, etc.
  • Message serialization – ALTS uses Protocol Buffers; TLS uses X.509 certificates encoded with ASN.1
  • Performance – ALTS is optimized for low-latency, high-throughput communication within Google’s data centers; TLS is designed for general use

3. Sample Application Using ALTS

The ALTS feature is enabled by default on the Google Cloud Platform (GCP). It uses GCP service accounts to secure RPC calls between gRPC services. Specifically, it runs on Google Compute Engine or Kubernetes Engine (GKE) within Google’s infrastructure.

Let’s assume there’s an Operation Theater (OT) booking system in a hospital that consists of a front-end and a backend service.

The OT Booking system comprises two services running in the Google Cloud Platform (GCP). A front-end service makes remote procedure calls to the backend service. We’ll develop the services using the gRPC framework. Considering the sensitive nature of the data, we’ll utilize the built-in ALTS feature in GCP to enable authentication and encryption for data in transit.

First, let’s define the protobuf ot_booking.proto file:

syntax = "proto3";
package otbooking;
option java_multiple_files = true;
option java_package = "com.baeldung.grpc.alts.otbooking";
service OtBookingService {
  rpc getBookingInfo(BookingRequest) returns (BookingResponse) {}
}
message BookingRequest {
  string patientID = 1;
  string doctorID = 2;
  string description = 3;
}
message BookingResponse {
  string bookingDate = 1;
  string condition = 2;
}

Basically, we declared a service OtBookingService with the RPC getBookingInfo(), and two DTOs BookingRequest and BookingResponse in the protobuf file.

Next, let’s have a look at the important classes of this application:

The Maven plugin compiles the protobuf file and auto-generates some classes such as OtBookingServiceGrpc, OtBookingServiceImplBase, BookingRequest, and BookingResponse. We’ll use the gRPC library class AltsChannelBuilder to enable ALTS to create the ManagedChannel object on the client side. Finally, we’ll use OtBookingServiceGrpc to generate the OtBookingServiceBlockingStub to call the RPC getBookingInfo() method running on the server side.

Like AltsChannelBuilder, the AltsServerBuilder class helps enable ALTS on the server side. We register the interceptor ClientAuthInterceptor to help authenticate the client. Finally, we register the OtBookingService to the io.grpc.Server object and then start the service.

Furthermore, we’ll discuss the implementation in the next section.

4. Application Implementation Using ALTS

Let’s implement the classes we discussed earlier. Then, we’ll demonstrate by running the services on the GCP virtual machines.

4.1. Prerequisite

Since ALTS is a built-in feature in GCP, we’ll have to provision a few cloud resources for running the sample application.

First, we’ll create two IAM service accounts to associate with the front-end and back-end servers respectively.

Then, we’ll create two virtual machines hosting the front-end and back-end services respectively.

The virtual machine prod-booking-client-vm is associated with prod-ot-booking-client-svc service account. Similarly, prod-booking-service-vm is associated with prod-ot-booking-svc service account. The service accounts serve as the servers’ identities and ALTS uses them for authorization and encryption.

4.2. Implementation

Let’s first start with an entry into the pom.xml file to resolve the Maven dependency:

<dependency>
    <groupId>io.grpc</groupId>
    <artifactId>grpc-alts</artifactId>
    <version>1.63.0</version>
</dependency>

Then, we’ll implement the backend, starting with the AltsOtBookingServer class:

public class AltsOtBookingServer {
    public static void main(String[] args) throws IOException, InterruptedException {
        final String CLIENT_SERVICE_ACCOUNT = args[0];
        Server server = AltsServerBuilder.forPort(8080)
          .intercept(new ClientAuthInterceptor(CLIENT_SERVICE_ACCOUNT))
          .addService(new OtBookingService())
          .build();
        server.start();
        server.awaitTermination();
    }
}

gRPC provides a special class AltsServerBuilder for configuring the server in ALTS mode. We’ve registered the ClientAuthInterceptor on the server to intercept all the RPCs before they hit the endpoints in the OtBookingService class.

Let’s take a look at the ClientAuthInterceptor class:

public class ClientAuthInterceptor implements ServerInterceptor {
    String clientServiceAccount = null;
    public ClientAuthInterceptor(String clientServiceAccount) {
        this.clientServiceAccount = clientServiceAccount;
    }
    @Override
    public <ReqT, RespT> ServerCall.Listener<ReqT> interceptCall(ServerCall<ReqT, RespT> serverCall, Metadata metadata,
        ServerCallHandler<ReqT, RespT> serverCallHandler) {
        Status status = AuthorizationUtil.clientAuthorizationCheck(serverCall,
            Lists.newArrayList(this.clientServiceAccount));
        if (!status.isOk()) {
            serverCall.close(status, new Metadata());
        }
        return serverCallHandler.startCall(serverCall, metadata);
    }
}

All the RPCs hit the interceptCall() method in ClientAuthInterceptor. Then, we invoke the clientAuthorizationCheck() method of the gRPC library class AuthorizationUtil to authorize the client service account. Finally, the RPC proceeds only when the authorization is successful.
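
For completeness, here’s a minimal sketch of what the OtBookingService implementation registered on the server might look like; the booking data returned here is made up for illustration:

public class OtBookingService extends OtBookingServiceGrpc.OtBookingServiceImplBase {
    @Override
    public void getBookingInfo(BookingRequest request, StreamObserver<BookingResponse> responseObserver) {
        // illustrative response; a real implementation would look up the booking
        // for the given patient and doctor IDs
        BookingResponse response = BookingResponse.newBuilder()
          .setBookingDate("2024-07-15")
          .setCondition("CONFIRMED")
          .build();
        responseObserver.onNext(response);
        responseObserver.onCompleted();
    }
}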

Next, let’s take a look at the front-end service:

public class AltsOtBookingClient {
    public static void main(String[] args) {
        final String SERVER_ADDRESS = args[0];
        final String SERVER_ADDRESS_SERVICE_ACCOUNT = args[1];
        ManagedChannel managedChannel = AltsChannelBuilder.forTarget(SERVER_ADDRESS)
          .addTargetServiceAccount(SERVER_ADDRESS_SERVICE_ACCOUNT)
          .build();
        OtBookingServiceGrpc.OtBookingServiceBlockingStub OTBookingServiceStub = OtBookingServiceGrpc
          .newBlockingStub(managedChannel);
        BookingResponse bookingResponse = OTBookingServiceStub.getBookingInfo(BookingRequest.newBuilder()
          .setPatientID("PT-1204")
          .setDoctorID("DC-3904")
          .build());
        managedChannel.shutdown();
    }
}

Similar to AltsServerBuilder, gRPC offers an AltsChannelBuilder class for enabling ALTS on the client side. We can call the addTargetServiceAccount() method multiple times to add more than one potential target service account. Further, we initiate the RPC by calling the getBookingInfo() method on the stub.

The same service account can be associated with multiple virtual machines. Hence, it provides a certain degree of flexibility and agility to scale the services horizontally.

4.3. Run on Google Compute Engine

Let’s log in to both servers and then clone the GitHub repository hosting the source code of the demo gRPC service:

git clone https://github.com/eugenp/tutorials.git

After cloning, we’ll compile the code in the tutorials/grpc directory:

mvn clean compile

Post successful compilation, we’ll start the backend service in prod-booking-service-vm:

mvn exec:java -Dexec.mainClass="com.baeldung.grpc.alts.server.AltsOtBookingServer" \
-Dexec.arguments="prod-ot-booking-client-svc@grpc-alts-demo.iam.gserviceaccount.com"

We ran the AltsOtBookingServer class with the service account of the front-end client as an argument.

Once the service is up and running, we’ll initiate an RPC from the front-end service running on the virtual machine prod-booking-client-vm:

mvn exec:java -Dexec.mainClass="com.baeldung.grpc.alts.client.AltsOtBookingClient" \
-Dexec.arguments="10.128.0.2:8080,prod-ot-booking-svc@grpc-alts-demo.iam.gserviceaccount.com"

We ran the AltsOtBookingClient class with two arguments. The first argument is the target server where the backend service is running and the second is the service account associated with the backend server.

The command runs successfully, and the service returns a response after authenticating the client.

Now, let’s suppose we disable the client service account.

As a result, ALTS prevents the RPC from reaching the backend service: the RPC fails with the status UNAVAILABLE.

Next, let’s disable the service account of the backend server.

Surprisingly, the RPC still goes through, but after restarting the servers, it fails like in the earlier scenario.

It seems that ALTS had cached the service account status earlier; after the server restart, the RPC failed with the status UNKNOWN.

5. Conclusion

In this article, we delved into the gRPC Java library supporting ALTS. With minimal code, ALTS can be enabled in gRPC services. It also provides greater flexibility in controlling the authorization of gRPC services with the help of GCP IAM service accounts.

However, it works only in GCP infrastructure as it’s provided out of the box. Hence, to run gRPC services outside of GCP infrastructure, TLS support in gRPC is crucial and must be manually configured.

As usual, the code used here is available over on GitHub.

Autowiring an Interface With Multiple Implementations

1. Introduction

In this article, we’ll explore autowiring an interface with multiple implementations in Spring Boot, ways to do that, and some use cases. This is a powerful feature that allows developers to inject different implementations of the interface into the application dynamically.

2. Default Behavior

Usually, when we have multiple interface implementations and try to autowire that interface into the component, we’ll get an error – “required a single bean, but X were found”. The reason is simple: Spring doesn’t know which implementation we want to see in that component. Fortunately, Spring provides multiple tools to be more specific.
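
Throughout the following examples, we’ll assume a simple interface along these lines (the getHelloMessage() method is the one used in the map example later on):

public interface GoodService {
    String getHelloMessage();
}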

3. Introducing Qualifiers

With the @Qualifier annotation, we specify which bean we want to autowire among multiple candidates. We can apply it to the component itself to give it a custom qualifier name:

@Service
@Qualifier("goodServiceA-custom-name")
public class GoodServiceA implements GoodService {
    // implementation
}

After that, we annotate parameters with @Qualifier to specify which implementation we want:

@Autowired
public SimpleQualifierController(
    @Qualifier("goodServiceA-custom-name") GoodService niceServiceA,
    @Qualifier("goodServiceB") GoodService niceServiceB,
    GoodService goodServiceC
) {
        this.goodServiceA = niceServiceA;
        this.goodServiceB = niceServiceB;
        this.goodServiceC = goodServiceC;
}

In the example above, we can see that we used our custom qualifier to autowire GoodServiceA. At the same time, for GoodServiceB, we do not have a custom qualifier:

@Service
public class GoodServiceB implements GoodService {
    // implementation
}

In this case, we autowired the component by class name. The qualifier for such autowiring should be in camel case; for example, “myAwesomeClass” is a valid qualifier if the class name is “MyAwesomeClass”.

The third parameter in the above code is even more interesting. We didn’t even need to annotate it with @Qualifier, because Spring will try to autowire the component by parameter name by default, and if GoodServiceC exists we’ll avoid the error:

@Service 
public class GoodServiceC implements GoodService { 
    // implementation 
}

4. Primary Component

Furthermore, we can annotate one of the implementations with @Primary. Spring will use this implementation if there are multiple candidates and autowiring by parameter name or a qualifier is not applicable:

@Primary
@Service
public class GoodServiceC implements GoodService {
    // implementation
}

It is useful when we frequently use one of the implementations and helps to avoid the “required a single bean” error.

5. Profiles

It is possible to use Spring profiles to decide which component to autowire. For example, we may have a FileStorage interface with two implementations – S3FileStorage and AzureFileStorage. We can make S3FileStorage active only on the prod profile and AzureFileStorage only for the dev profile.

@Service
@Profile("dev")
public class AzureFileStorage implements FileStorage {
    // implementation
}
@Service
@Profile("prod")
public class S3FileStorage implements FileStorage {
    // implementation
}
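
Which of the two beans gets registered then depends on the active profile, which we can set, for example, in application.properties (assuming a standard Spring Boot setup):

spring.profiles.active=dev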

6. Autowire Implementations Into a Collection

Spring allows us to inject all available beans of a specific type into a collection. Here is how we autowire all implementations of the GoodService into a list:

@Autowired
public SimpleCollectionController(List<GoodService> goodServices) {
    this.goodServices = goodServices;
}

Also, we can autowire implementations into a set, a map, or an array. When using a map, the format typically is Map<String, GoodService>, where the keys are the names of the beans, and the values are the bean instances themselves:

@Autowired
public SimpleCollectionController(Map<String, GoodService> goodServiceMap) {
        this.goodServiceMap = goodServiceMap;
}
public void printAllHellos() {
    String messageA = goodServiceMap.get("goodServiceA").getHelloMessage();
    String messageB = goodServiceMap.get("goodServiceB").getHelloMessage();
    // print messages
}

Important note: Spring will autowire all candidate beans into a collection regardless of qualifiers or parameter names, as long as they are active. It ignores beans annotated with @Profile that do not match the current profile. Similarly, Spring includes beans annotated with @Conditional only if the conditions are met (more details in the next section).

7. Advanced Control

Spring allows us to have additional control over which candidates are selected for autowiring.

For more precise conditions on which bean becomes a candidate for autowiring, we can annotate it with @Conditional. The annotation takes a class that implements the Condition functional interface as its parameter. For example, here is a Condition that checks whether the operating system is Windows:

public class OnWindowsCondition implements Condition {
    @Override 
    public boolean matches(ConditionContext context, AnnotatedTypeMetadata metadata) {
        return context.getEnvironment().getProperty("os.name").toLowerCase().contains("windows");
    } 
}

Here is how we annotate our component with @Conditional:

@Component 
@Conditional(OnWindowsCondition.class) 
public class WindowsFileService implements FileService {
    @Override 
    public void readFile() {
        // implementation
    } 
}

In this example, WindowsFileService will become a candidate for autowiring only if matches() in OnWindowsCondition returns true.

We should be careful with @Conditional annotations for non-collection autowiring since multiple beans that match the condition will cause an error.

Also, we will get an error if no candidates are found. Because of this, when integrating @Conditional with autowiring, it makes sense to set an optional injection. This ensures that the application can still proceed without throwing an error if it does not find a suitable bean. There are two approaches to achieve this:

@Autowired(required = false)
private GoodService goodService; // not very safe, we should check this for null
@Autowired
private Optional<GoodService> goodService; // safer way

When we autowire into the collection, we can specify the order of the components by using @Order annotation:

@Order(2) 
public class GoodServiceA implements GoodService { 
    // implementation
 } 
@Order(1) 
public class GoodServiceB implements GoodService {
    // implementation 
}

If we try to autowire List<GoodService>, GoodServiceB will be placed before GoodServiceA. Important note: @Order doesn’t work when we’re autowiring into a Set.
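
To illustrate the List case, a constructor like the following (the class name is just illustrative) receives GoodServiceB as the first element:

@Autowired
public OrderedCollectionController(List<GoodService> goodServices) {
    // goodServices.get(0) is GoodServiceB (@Order(1)), followed by GoodServiceA (@Order(2))
    this.goodServices = goodServices;
}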

8. Conclusion

In this article, we discussed the tools Spring provides for the management of the multiple implementations of the interface during autowiring. These tools and techniques enable a more dynamic approach when designing a Spring Boot application. However, like with every instrument, we should ensure their necessity, as careless use can introduce bugs and complicate long-term support.

As always, the examples are available over on GitHub.

Removing Bracket Characters in a Java String

1. Overview

When working with String values in Java, there are times when we need to clean up our data by removing specific characters. One common scenario is removing bracket characters. With the right approach, removing these characters can be straightforward.

In this tutorial, we’ll explore how to achieve this.

2. Introduction to the Problem

First, let’s make the requirement clear: what are bracket characters?

If we focus on ASCII characters, there are three pairs of bracket characters:

  • Parentheses/round brackets – ‘(‘ and ‘)’
  • Square brackets – ‘[‘ and ‘]’
  • Curly brackets – ‘{‘ and ‘}’

Apart from these three pairs, we often use ‘<‘ and ‘>’ as angle brackets in practice, such as in XML tags.

However, ‘<‘ and ‘>’  actually aren’t bracket characters. They’re defined as “less than” and “greater than” characters. But we’ll treat them as the fourth pair of bracket characters, as they’re often used as angle brackets.

Therefore, we aim to remove the four pairs of characters from a given String.

Let’s say we have a String value:

 static final String INPUT = "This (is) <a> [nice] {string}!";

As we can see, the INPUT String contains all eight bracket characters. After removing all bracket characters, we expect to get this result:

"This is a nice string!"

Of course, our input may contain Unicode characters. This tutorial also addresses the Unicode String scenario.

Next, let’s take INPUT as an example and see how to remove characters.

3. Using the StringUtils.replaceChars() Method

Apache Commons Lang 3 is a widely used library. The StringUtils class from this library provides a rich set of helper methods that allow us to manipulate strings conveniently.

For example, we can solve our problem using the replaceChars() method. This method allows us to replace multiple characters in one go. Further, we can employ it to delete characters:

String result = StringUtils.replaceChars(INPUT, "(){}[]<>", null);
assertEquals("This is a nice string!", result);

As the code above shows, we pass the String “(){}[]<>” as the searchChars argument and a null value as the replaceChars argument. When replaceChars is null, the method deletes all characters contained in searchChars from the input String. Therefore, replaceChars() does the job.

4. Using the Regex-Based replaceAll() Method

Regular expressions (regex) are powerful tools for matching patterns within strings, allowing us to efficiently search, replace, and manipulate text based on defined criteria.

Next, let’s see how to remove bracket characters using the regex-based replaceAll() method from the Java standard library:

String regex = "[(){}<>\\[\\]]";
String result = INPUT.replaceAll(regex, "");
assertEquals("This is a nice string!", result);

The regex pattern looks pretty straightforward. It has only one character class, which includes the bracket characters.

Sharp eyes might have noticed that we only escaped the ‘[‘ and ‘]‘ characters in the character class while leaving ‘(){}<>‘ as they are. This is because regex matches characters in a character class literally, meaning all characters within a character class lose their special meanings and don’t need to be escaped.

However, since ‘[‘ and ‘]‘ are used to define the character class itself, we must escape them to distinguish between their roles as delimiters of the character class and as literal characters within the class.

5. Removing Unicode Bracket Characters

We’ve seen how to delete bracket characters from a String input that includes only ASCII characters. Next, let’s see how to remove Unicode bracket characters.

Let’s say we have another String input containing Unicode and ASCII bracket characters:

static final String INPUT_WITH_UNICODE = "⟨T⟩❰h❱「i」⦇s⦈ (is) <a> [nice] {string}!";

As the example shows, apart from the ASCII bracket characters “(){}[]<>”, it contains the following Unicode characters:

  • ⟨ and ⟩ – mathematical angle brackets (U+27E8 and U+27E9)
  • ❰ and ❱ – heavy angle brackets (U+2770 and U+2771)
  • 「 and 」 – corner brackets (U+300C and U+300D)
  • ⦇ and ⦈ – image brackets (U+2987 and U+2988)

There are still many more Unicode bracket characters that our example doesn’t cover. Fortunately, regex supports Unicode category matching.

We can use \p{Ps} and \p{Pe} to match all opening and closing bracket characters.

Next, let’s see if these categories can tell replaceAll() to delete all bracket characters:

String regex = "\\p{Ps}|\\p{Pe}";
 
String result = INPUT.replaceAll(regex, "");
assertEquals("This is <a> nice string!", result);
 
String resultWithUnicode = INPUT_WITH_UNICODE.replaceAll(regex, "");
assertEquals("This is <a> nice string!", resultWithUnicode);

The test above shows that most bracket characters have been removed. However, the ASCII characters ‘<‘ and ‘>‘ remain. This is because ‘<‘ and ‘>‘ are defined as “less than” and “greater than” rather than angle brackets. That is to say, they don’t belong to the bracket category and aren’t matched by the regex.

If we want to remove ‘<‘ and ‘>‘, we can add the character class “[<>]” to the pattern:

String regex = "\\p{Ps}|\\p{Pe}|[<>]";
 
String result = INPUT.replaceAll(regex, "");
assertEquals("This is a nice string!", result);
 
String resultWithUnicode = INPUT_WITH_UNICODE.replaceAll(regex, "");
assertEquals("This is a nice string!", resultWithUnicode);

As we can see, this time, we got the expected result.

6. Conclusion

In this article, we’ve explored different ways to remove bracket characters from an input String and discussed how to remove Unicode brackets through an example.

As always, the complete source code for the examples is available over on GitHub.

Effective Scaling of Hot Application Instances with OpenJDK CRaC Help in Containers

1. Introduction

In this tutorial, we’ll learn about Coordinated Restore at Checkpoint (CRaC), an OpenJDK project that allows us to start Java programs with a shorter time to the first transaction. Further, we’ll understand how Alpaquita Containers can make it easy for us to achieve CRaC in a Spring Boot application.

2. How Does OpenJDK CRaC Approach the Slow Warmup Problem in Java?

Java applications historically have received their fair share of criticism for slow startup and longer warmup time, the time they need to reach stable peak performance. Moreover, they consume more computing resources during warm-up than they need during stable operation.

This behavior can largely be attributed to how the HotSpot Java Virtual Machine (JVM) works fundamentally. When an application starts, the JVM looks for hotspots in the code and compiles them for better performance. However, this requires time and computing resources.

Moreover, this has to be repeated for every instance of the application. The problem is exacerbated further in cloud-native architectures like microservices and serverless, where we need the warm-up time to be as low as possible with fairly stable resource consumption.

What if we could run an application to its peak performance and checkpoint that state? Then, we could use this checkpoint to start multiple instances of the application without having to spend that much time on warm-up. This is fundamentally what the OpenJDK CRaC API promises us.

CRaC is based on Checkpoint & Restore In Userspace (CRIU), a project to implement checkpoint and restore functionality for Linux. CRIU allows freezing a container or an individual application and restoring it from the saved checkpoint files.

However, CRaC takes the generic approach of CRIU and adds several enhancements and adjustments to make it suitable for Java applications. For instance, CRaC imposes certain restrictions on the state of the application to guarantee the consistency and safety of the checkpoint.

3. Challenges with CRaC Adoption

CRaC opens new opportunities for Java-based applications to be more efficient in the cloud environment. Here, Spring is one of the popular frameworks to develop Java-based applications. With the release of Spring Boot 3.2, we now have initial support for CRaC in the Spring framework.

But, CRaC is not as portable a solution as it may seem. As we already discussed, CRaC works only on Linux as CRIU is a Linux-specific feature. On other operating systems, CRaC has a no-op implementation for creating and loading snapshots.

Moreover, CRaC requires all files and network connections to be closed before taking a snapshot. These files and network connections have to be re-opened after restoring the checkpoint. This requires support in the Java runtime and the framework.
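
To give an idea of what this coordination looks like in application code, here’s a minimal sketch using the org.crac API (Core, Context, and Resource); the socket here simply stands in for any connection the application keeps open:

public class SocketResource implements Resource {
    private final String host;
    private final int port;
    private Socket socket;

    public SocketResource(String host, int port) throws IOException {
        this.host = host;
        this.port = port;
        this.socket = new Socket(host, port);
        // ask CRaC to notify this object before a checkpoint and after a restore
        Core.getGlobalContext().register(this);
    }

    @Override
    public void beforeCheckpoint(Context<? extends Resource> context) throws Exception {
        // close open connections so the snapshot can be taken
        socket.close();
    }

    @Override
    public void afterRestore(Context<? extends Resource> context) throws Exception {
        // re-establish the connection in the restored process
        socket = new Socket(host, port);
    }
}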

So, not only do we need support from Spring, we also need a CRaC-enabled version of the JDK, like the Liberica JDK provided by BellSoft. Moreover, we need to run our Spring application on a Linux distribution, for instance, Alpaquita Linux by BellSoft.

So, if we can package our application with a CRaC-enabled JDK running on a Linux-like environment as a portable container, it makes the solution quite portable and plug-and-play. This is quite the promise that BellSoft delivers for modern Java applications!

4. CRaC with Alpaquita Containers

BellSoft is an OpenJDK vendor that provides end-to-end solutions for cloud-native Java applications. As part of this, it offers a suite of containers highly optimized for running Java applications. They package Alpaquita Linux and Liberica JDK, both of which are BellSoft offerings.

Alpaquita Linux is the only Linux distribution purpose-built for Java and optimized for the deployment of cloud-native applications. It features better performance through kernel optimizations, memory management, and optimized mallocs. It has a base image size of just 3.28 MB!

Liberica JDK is an open-source Java runtime for cloud-native Java deployments. With the support for the widest range of architectures and operating systems, it’s truly a unified Java runtime. Apart from being secure and compliant, it helps in building cost and time-efficient containers.

BellSoft manages several public images, offering various combinations of JDK type (jre, jdk, or jdk-all), Java version (includes support for the latest LTS release, Java 21), and libc type (glibc or musl). Now, BellSoft also offers images that provide CRaC and CDS (Class Data Sharing).

These ready-to-use images allow us to integrate CRaC in a Spring Boot application seamlessly. This is available for JDK 17 and 21 with x86_64 architecture as of now. BellSoft claims that Alpaquita Containers with CRaC provide up to 164 times faster startup time and 1.1 times smaller images.

The reduction in image size is largely attributed to the decrease in the Resident Set Size (RSS), the portion of memory occupied by a process that is held in the main memory (RAM). One of the key factors for this is that Liberica JDK with CRaC performs full garbage collection before the checkpoint.

5. Getting Things to Work!

BellSoft’s offerings are a great fit for Spring Boot-based Java applications. Spring recommends using BellSoft Liberica JDK, and it’s the default Java runtime in Spring Boot. For our tutorial, we’ll use a Spring Boot application and perform CRaC with an Alpaquita Container.

5.1. Preparing the Application

For this tutorial, we’ll create a simple Spring Boot application to explore CRaC. We’ll just reuse the application we created for our last tutorial. We’ll be using Java 21 and Spring Boot 3.2.5 for this tutorial. CRaC works well under this combination.

However, to be able to use CRaC, we need to add the crac package available at the Maven central repository as a dependency in our Spring Boot application:

implementation("org.crac:crac:1.4.0")
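
For a Maven-based project, the equivalent dependency in pom.xml would be:

<dependency>
    <groupId>org.crac</groupId>
    <artifactId>crac</artifactId>
    <version>1.4.0</version>
</dependency>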

Now, we have to build the application using Gradle to generate an executable JAR in the “./build/libs“ directory:

$ ./gradlew clean build

Now that we’ve created a simple Spring Boot application with CRaC dependency, we need to run it using a JDK that supports CRaC. For this, we’ll use an Alpaquita Container that supports CRaC. BellSoft manages multiple images on its Docker Hub repository.

Thankfully, all the images that support CRaC have the tag ‘crac‘. We’ll pull one such image on our machine for this tutorial:

$ docker pull bellsoft/liberica-runtime-container:jdk-21-crac-slim-glibc

Here, “jdk-21-crac-slim-glibc” is the tag of the image. With this, we are all set to experiment with checkpoint and restore features of CRaC. We’ll see how Alpaquita Containers make this effortless and portable.

5.2. Starting the Application

Let’s first create a directory called “checkpoint” inside “./build/libs” to hold the application dump. Now we’ll use the Alpaquita Container image that we had pulled previously to run the application JAR that we created in the previous subsection:

$ docker run -p 8080:8080 \
  --rm --privileged \
  -v $(pwd)/build/libs:/crac/ \
  -w /crac \
  --name fibonacci-crac \
  bellsoft/liberica-runtime-container:jdk-21-crac-slim-glibc \
  java -Xmx512m -XX:CRaCCheckpointTo=/crac/checkpoint \
  -jar spring-bellsoft-0.0.1-SNAPSHOT.jar

Let’s spend some time to understand this command. Here, we’ve mapped container port 8080 to the host machine port 8080. We’ve also used the “privileged” mode as this is necessary for the underlying CRIU to work properly.

Further, we’ve mapped the directory where our application JAR is present as a volume within the container and used that as the working directory. Lastly, we’ve provided the Java command to run the JAR with some necessary parameters.

If everything goes smoothly, we should be able to check the container log and verify that our application has indeed started:

2024-04-22T15:27:39.730Z  INFO 129 --- [main] 
  com.baeldung.demo.Application : Started Application in 3.203 seconds (process running for 4.727)

Now, we should perform some requests to the application so that the JVM can get the compiled hot code for better performance. Although, for our simple application, these effects would be negligible.

5.3. Performing the Checkpoint

We are ready to perform the checkpoint of the application at this moment. But before we do that, let’s check the size of RSS to compare this with what we see after the restore. We would require the Process ID (PID) of the application to do so:

$ docker exec fibonacci-crac ps -a | grep spring-bellsoft

Once we’ve got the PID, we can use the ‘pmap‘ command to find the size of the RSS:

$ docker exec fibonacci-crac pmap -x <PID> | tail -1
total            4845016  134128  118736       0

The output of this command shows the size of the RSS in kilobytes, the second value here (134128).

Now, let’s perform the checkpoint of the application at this state. We can do this by using the ‘jcmd‘ command that sends a command to the JVM to perform the checkpoint:

$ docker exec fibonacci-crac jcmd <PID> JDK.checkpoint

Please note that ‘fibonacci-crac‘ is the name of the container we used when starting it. As a result of this command, the Java instance is dumped and the container stops. The application dump consists of multiple files at the location we specified:

$ ls
core-129.img  core-139.img  core-149.img  core-198.img   pagemap-129.img
core-130.img  core-140.img  core-150.img  core-199.img   pages-1.img
core-131.img  core-141.img  core-151.img  core-200.img   pstree.img
core-132.img  core-142.img  core-152.img  dump4.log      seccomp.img
core-133.img  core-143.img  core-154.img  fdinfo-2.img   stats-dump
core-134.img  core-144.img  core-155.img  files.img      timens-0.img
core-135.img  core-145.img  core-156.img  fs-129.img
core-136.img  core-146.img  core-158.img  ids-129.img
core-137.img  core-147.img  core-159.img  inventory.img
core-138.img  core-148.img  core-160.img  mm-129.img

This dump includes the exact state of the running Java application and the information about the heap, JIT-compiled code, etc. But, as we discussed earlier, the Liberica JDK we are using here performs a full garbage collection just before the checkpoint.

5.4. Starting the Application from the Dump

Now, what is left for us to do is to use the application dump we created earlier to restore an instance of our application. This is as easy as starting the application regularly:

$ docker run -p 8080:8080 \
  --rm --privileged \
  -v $(pwd)/build/libs:/crac/ \
  -w /storage \
  --name fibonacci-crac-from-checkpoint \
  bellsoft/liberica-runtime-container:jdk-21-crac-slim-glibc \
  java -XX:CRaCRestoreFrom=/crac/checkpoint

Like before, if everything goes smoothly, we should be able to verify this from the application log:

2024-04-22T16:02:21.582Z  INFO 129 --- [Attach Listener] 
  o.s.c.support.DefaultLifecycleProcessor : 
  Spring-managed lifecycle restart completed (restored JVM running for 1494 ms)

As we can see, the application has been restored to the state at which the checkpoint was created. The restore happens much faster, although it’s less noticeable for this simple application.

5.5. Results Overview

As we did before taking the checkpoint, let’s again check the size of the RSS after the restore and preferably after a few requests to the application:

$ docker exec fibonacci-crac-from-checkpoint pmap -x 129 | tail -1
total            5044580  120261   62728       0

As we can see, the value (120261) is less than the one we noticed before the checkpoint, although the difference is less pronounced given the simple nature of the application we’re using in this tutorial.

We may also notice that the RSS just after the restore increases after the first request and then reaches some steady state. However, this value is still lower than the RSS we observed before taking the application dump.

This reduction in RSS is largely attributed to Liberica JDK with CRaC performing full garbage collection before the checkpoint. On restore, the HotSpot virtual machine returns part of the native memory to the OS, which includes pages freed during GC.

6. CRaC vs. GraalVM Native Image

The problems we discussed with Java have been there since its inception. But only recently have we faced stringent requirements to be as cost-efficient as possible in the cloud. One of the key enablers for this is scale-to-zero, meaning automatically scaling all resources to zero when not in use.

Of course, this requires our applications to be blazing fast to come to life and start responding to requests. So, solutions before CRaC were also proposed in response to this need. Of these, GraalVM Native Image addressed wider objectives including slow start-up time.

Hence, it’s worth comparing CRaC with GraalVM Native Image. GraalVM Native Image is an Ahead-of-Time (AOT) compiler that creates native executables for Linux, Windows, and macOS. BellSoft provides a Liberica Native Image Kit to generate native images based on GraalVM.

Like CRaC, GraalVM Native Image can help reduce start-up time significantly. But GraalVM fares better in terms of lesser memory usage, better security, and lower application file size. Moreover, we can generate GraalVM Native Image for multiple operating systems.

However, with GraalVM, we cannot use some Java features like loading arbitrary classes at runtime. Moreover, many observability and testing frameworks don’t support GraalVM, as it doesn’t allow dynamic code generation at runtime, and we cannot run Java agents.

So which one is better, CRaC or GraalVM Native Image? Well, both technologies have their own space. However, GraalVM Native Image solves the same problems as CRaC but with more constraints and a potentially more expensive troubleshooting experience.

7. Conclusion

In this tutorial, we understood what CRaC is and how we can use this to our advantage in a cloud-native environment. Further, we reviewed BellSoft’s offerings like Alpaquita Containers that support CRaC. Lastly, we developed a Spring Boot application and saw CRaC in action.

Introduction to Apache Nutch

1. Introduction

In this tutorial, we’re going to have a look at Apache Nutch. We’ll see what it is, what we can do with it, and how to use it.

Apache Nutch is a ready-to-go web crawler that we can use out of the box, and that integrates with other tools from the Apache ecosystem, such as Apache Hadoop and Apache Solr.

2. Setting up Nutch

Before we can start using Nutch, we’ll need to download the latest version. We can find this at https://nutch.apache.org/download/ and just download the latest binary version, which at the time of writing was 1.20. Once downloaded, we need to unzip it into an appropriate directory.

Once unzipped, we need to configure the user agent that Nutch will be using when it accesses other sites. We do this by editing conf/nutch-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>http.agent.name</name>
        <value>MyNutchCrawler</value>
    </property>
</configuration>

This configures Nutch so that all HTTP requests made to retrieve files use a value of MyNutchCrawler for the user agent. Obviously, the exact value to use here will depend on the crawler setup that we’re configuring.

3. Crawling Our First Site

Now that we have Nutch installed, we’re ready to crawl our first URL. The crawling process in Nutch consists of several stages, which allows us a lot of flexibility when necessary.

We manage the entire process using the bin/nutch command-line tool. This tool allows us to execute various parts of the Nutch suite.

3.1. Injecting Seed URLs

Before we can crawl any URLs, we first need to seed some base URLs. We do this by creating some text files containing the URLs and then injecting them into our crawl database using the inject command:

$ mkdir -p urls
$ echo https://www.baeldung.com > urls/seed.txt
$ bin/nutch inject crawl/crawldb urls

This injects the URL https://www.baeldung.com into our crawl database, which will be crawl/crawldb. Because this is our first URL, this will also create the crawl database from scratch.

Let’s check the URLs that are in our database to make sure:

$ bin/nutch readdb crawl/crawldb -dump crawl/log
$ cat crawl/log/part-r-00000
https://www.baeldung.com/	Version: 7
Status: 1 (db_unfetched)
Fetch time: Sat May 18 09:31:09 BST 2024
Modified time: Thu Jan 01 01:00:00 GMT 1970
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 1.0
Signature: null
Metadata:

Here, we see that we’ve got a single URL and that it’s never been fetched.

3.2. Generating Crawl Segments

The next step in our crawl process is to generate a segment into which we’ll create the crawl data. This is done using the generate command, telling it where our crawl database is and where to create the segments:

$ bin/nutch generate crawl/crawldb crawl/segments
.....
2024-05-18 09:48:00,281 INFO o.a.n.c.Generator [main] Generator: Partitioning selected urls for politeness.
2024-05-18 09:48:01,288 INFO o.a.n.c.Generator [main] Generator: segment: crawl/segments/20240518100617
2024-05-18 09:48:02,645 INFO o.a.n.c.Generator [main] Generator: finished, elapsed: 3890 ms

In this case, we’ve just generated a new segment located in crawl/segments/20240518100617. The segment name is always the current timestamp, so segment names are always unique and increasing.

By default, this will generate segment data for every URL that’s ready to fetch, including every URL we’ve never fetched or where the fetch interval has expired.

If desired, we can instead generate data for a limited set of URLs using the -topN parameter. This will then restrict the crawl phase to only fetching that many URLs:

$ bin/nutch generate crawl/crawldb crawl/segments -topN 20

At this point, let’s query the segment and see what it looks like:

$ bin/nutch readseg -list crawl/segments/20240518100617
NAME		GENERATED	FETCHER START		FETCHER END		FETCHED	PARSED
20240518100617	1		?		?	?	?

This tells us that we’ve got one URL but that nothing has yet been fetched.

3.3. Fetching and Parsing URLs

Once we’ve generated our crawl segment, we’re ready to fetch the URLs. We do this using the fetch command, pointing it toward the segment that it needs to fetch:

$ bin/nutch fetch crawl/segments/20240518100617

This command will start up the fetcher, running a number of concurrent threads to fetch all of our outstanding URLs.

Let’s query the segment again to see what’s changed:

$ bin/nutch readseg -list crawl/segments/20240518100617
NAME		GENERATED	FETCHER START		FETCHER END		FETCHED	PARSED
20240518100617	1		2024-05-18T10:11:16	2024-05-18T10:11:16	1	?

Now, we can see that we’ve actually fetched our URL, but we’ve not yet parsed it. We do this with the parse command, again pointing it towards the segment that’s just been fetched:

$ bin/nutch parse crawl/segments/20240518100617

Once it’s finished, we’ll query our segment and see that the URLs have now been parsed:

$ bin/nutch readseg -list crawl/segments/20240518100617
NAME		GENERATED	FETCHER START		FETCHER END		FETCHED	PARSED
20240518100617	1		2024-05-18T10:11:16	2024-05-18T10:11:16	1	1

3.4. Updating the Crawl Database

The final step in our crawl process is to update our crawl database. Up to this point, we’ve fetched our set of URLs and parsed them but haven’t done anything with that data.

Updating our crawl database will merge our parsed URLs into our database, including the actual page contents, but it will also inject any discovered URLs so that the next crawl round will use them. We achieve this with the updatedb command, pointing to both our crawl database and the segment that we wish to update it from:

$ bin/nutch updatedb crawl/crawldb crawl/segments/20240518100617

After we’ve done this, our database is updated with all of our crawl results.

Let’s check it again to see how it’s looking:

$ bin/nutch readdb crawl/crawldb -stats
2024-05-18 10:21:42,675 INFO o.a.n.c.CrawlDbReader [main] CrawlDb statistics start: crawl/crawldb
2024-05-18 10:21:44,344 INFO o.a.n.c.CrawlDbReader [main] Statistics for CrawlDb: crawl/crawldb
2024-05-18 10:21:44,344 INFO o.a.n.c.CrawlDbReader [main] TOTAL urls:	59
.....
2024-05-18 10:21:44,352 INFO o.a.n.c.CrawlDbReader [main] status 1 (db_unfetched):	58
2024-05-18 10:21:44,352 INFO o.a.n.c.CrawlDbReader [main] status 2 (db_fetched):	1
2024-05-18 10:21:44,352 INFO o.a.n.c.CrawlDbReader [main] CrawlDb statistics: done

Here, we see that we now have 59 URLs in our database, of which we’ve fetched one, with another 58 we haven’t yet fetched.

3.5. Inverting Links

In addition to updating the crawl database, we can also maintain an inverted link database.

Our crawl data so far includes all of the pages that we’ve crawled, and for each of those, a set of “outlinks” – pages that each of these links out to.

In addition to this, we can also generate a database of “inlinks” – for each of our crawled pages, the set of pages that link to it. We use the invertlinks command for this, pointing to our link database and the segment that we wish to include:

$ bin/nutch invertlinks crawl/linkdb crawl/segments/20240518100617

Note that this database of “inlinks” only includes cross-domain links, so it only contains links to a page that come from a different domain.

3.6. Crawling Again

Now that we’ve crawled one page and discovered 58 new URLs, we can run the entire process again and crawl all of these new pages. We do this by repeating the process that we did before, starting with generating a new segment, and working all the way through to updating our crawl database with it:

$ bin/nutch generate crawl/crawldb crawl/segments
$ bin/nutch fetch crawl/segments/20240518102556
$ bin/nutch parse crawl/segments/20240518102556
$ bin/nutch updatedb crawl/crawldb crawl/segments/20240518102556
$ bin/nutch invertlinks crawl/linkdb crawl/segments/20240518102556

Unsurprisingly, this time the fetch process took a lot longer. This is because we’re now fetching a lot more URLs than before.

If we again query the crawl database, we’ll see that we have a lot more data fetched now:

$ bin/nutch readdb crawl/crawldb -stats
2024-05-18 10:33:15,671 INFO o.a.n.c.CrawlDbReader [main] CrawlDb statistics start: crawl/crawldb
2024-05-18 10:33:17,344 INFO o.a.n.c.CrawlDbReader [main] Statistics for CrawlDb: crawl/crawldb
2024-05-18 10:33:17,344 INFO o.a.n.c.CrawlDbReader [main] TOTAL urls:	900
.....
2024-05-18 10:33:17,351 INFO o.a.n.c.CrawlDbReader [main] status 1 (db_unfetched):	841
2024-05-18 10:33:17,351 INFO o.a.n.c.CrawlDbReader [main] status 2 (db_fetched):	52
2024-05-18 10:33:17,351 INFO o.a.n.c.CrawlDbReader [main] status 3 (db_gone):	1
2024-05-18 10:33:17,351 INFO o.a.n.c.CrawlDbReader [main] status 4 (db_redir_temp):	1
2024-05-18 10:33:17,351 INFO o.a.n.c.CrawlDbReader [main] status 5 (db_redir_perm):	5
2024-05-18 10:33:17,351 INFO o.a.n.c.CrawlDbReader [main] CrawlDb statistics: done

We now have 900 total URLs, of which we’ve fetched 52. The reason that only 52 URLs were processed when we had 59 in our list before is that not all of the URLs in our list could be fetched and parsed. Some of them were images, or JSON files, or other resources that Nutch is unable to parse out of the box.

We can now repeat this process as much as we wish, on whatever cadence we wish.

4. Restricting Domains

One issue that we have with the crawler so far is that it will follow any URLs, regardless of where they go. For example, if we dump the list of URLs from our crawl database – that is, URLs that either we have fetched or else that we’re going to on the next round – then we’ll see there are 60 different hosts, including:

  • www.baeldung.com
  • courses.baeldung.com
  • github.com
  • www.linkedin.com

Depending on our desired result, this might not be good. If we want a generic web crawler that will scan the entire web, this is ideal. If we want to only scan a single site or a set of sites, then this is problematic.

Usefully, Nutch has a built-in mechanism for exactly this case. We can configure a set of regular expressions to either include or exclude URLs. These are found in the conf/regex-urlfilter.txt file.

Every non-comment line in this file is a regular expression prefixed with either a “-” (meaning exclude) or a “+” (meaning include). If we get to the end of the file without a match, then the URL is excluded.

We’ll see that the very last line is currently “+.“. This will include every single URL that none of the earlier rules excluded.

If we change this line to instead read “+^https?://www\.baeldung\.com“, then this will now only match URLs that start with either http://www.baeldung.com or https://www.baeldung.com.
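
As a rough sketch, the tail of conf/regex-urlfilter.txt would then look something like this (the comments and earlier rules in the stock file may differ):

# accept anything within our target site; a URL that matches no rule is excluded
+^https?://www\.baeldung\.com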

Note that we can’t retroactively apply these rules. Only crawls that happen after they’re configured are affected. However, if we delete all of our crawl data and start again with these rules in place, after two passes, we end up with:

$ bin/nutch readdb crawl/crawldb -stats
2024-05-18 17:57:34,921 INFO o.a.n.c.CrawlDbReader [main] CrawlDb statistics start: crawl/crawldb
2024-05-18 17:57:36,595 INFO o.a.n.c.CrawlDbReader [main] Statistics for CrawlDb: crawl/crawldb
2024-05-18 17:57:36,596 INFO o.a.n.c.CrawlDbReader [main] TOTAL urls:	670
.....
2024-05-18 17:57:36,607 INFO o.a.n.c.CrawlDbReader [main] status 1 (db_unfetched):	613
2024-05-18 17:57:36,607 INFO o.a.n.c.CrawlDbReader [main] status 2 (db_fetched):	51
2024-05-18 17:57:36,607 INFO o.a.n.c.CrawlDbReader [main] status 4 (db_redir_temp):	1
2024-05-18 17:57:36,607 INFO o.a.n.c.CrawlDbReader [main] status 5 (db_redir_perm):	5
2024-05-18 17:57:36,607 INFO o.a.n.c.CrawlDbReader [main] CrawlDb statistics: done

We get a total of 670 URLs instead of 900. So, we can see that, without this exclusion rule, we’d have had an extra 230 URLs that were outside the site we wanted to crawl.

5. Indexing with Solr

Once we’ve got our crawl data, we need to be able to use it. The obvious approach is to query it with a search engine, and Nutch comes with standard support for integrating with Apache Solr.

First, we need a Solr server to use. If we don’t already have one installed, the Solr quickstart guide will show us how to install one.

Once we’ve got this, we need to create a new Solr collection to index our crawled sites into:

# From the Solr install
$ bin/solr create -c nutch

Once we’ve done this, we need to configure Nutch to know about this. We do this by adding some configuration to our conf/nutch-site.xml file within the Nutch install:

<property>
   <name>storage.data.store.class</name>
   <value>org.apache.gora.solr.store.SolrStore</value>
</property>
<property>
   <name>solr.server.url</name>
   <value>http://localhost:8983/solr/nutch</value>
</property>

The storage.data.store.class setting configures the storage mechanism to use, and the solr.server.url setting configures the URL of the Solr collection we want to index our crawl data into.

At this point, we can index our crawl data using the index command:

# From the Nutch install
$ bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20240518100617 -filter -normalize -deleteGone
2024-05-19 11:12:12,502 INFO o.a.n.i.s.SolrIndexWriter [pool-5-thread-1] Indexing 1/1 documents
2024-05-19 11:12:12,502 INFO o.a.n.i.s.SolrIndexWriter [pool-5-thread-1] Deleting 0 documents
2024-05-19 11:12:13,730 INFO o.a.n.i.IndexingJob [main] Indexer: number of documents indexed, deleted, or skipped:
2024-05-19 11:12:13,732 INFO o.a.n.i.IndexingJob [main] Indexer: 1 indexed (add/update)
2024-05-19 11:12:13,732 INFO o.a.n.i.IndexingJob [main] Indexer: finished, elapsed: 2716 ms

We need to run this every time we do a crawl, on the segment that we’ve just generated for that crawl.

Once we’ve done this, we can use Solr to query our indexed crawl data.

For example, searching our crawl data for pages with a title containing “Spring” returns 19 documents.

6. Automating the Crawl Process

So far, we’ve successfully crawled our site. However, many steps were needed to achieve this result.

Thankfully, Nutch comes with a script that does all of this for us automatically – bin/crawl. We can use this to perform all of our steps in the correct order, get the segment IDs correct every time, and run the process for as many rounds as we want. This can also include injecting seed URLs at the start and sending the results to Solr after each round.

For example, to run the entire process that we’ve just described for two rounds, we can execute:

$ ./bin/crawl -i -s urls crawl 2

Let’s break down the command:

  • “-i” tells it to index the crawled data in our configured search index.
  • “-s urls” tells it where to find our seed URLs.
  • “crawl” tells it where to store our crawl data.
  • “2” tells it the number of crawl rounds to run.

If we run this on a clean Nutch install – having first configured our conf/nutch-site.xml and conf/regex-urlfilter.txt files – then the end result will be exactly the same as if we ran all of our previous steps by hand.

7. Conclusion

We’ve seen here an introduction to Nutch, how to set it up and crawl our first website, and how to index the data into Solr so that we can search it. However, this only scratches the surface of what we can achieve with Nutch, so why not explore more for yourself?

       

Return Auto Generated ID From Insert With MyBatis and Spring


1. Overview

MyBatis is an open-source Java persistence framework that can be used as an alternative to JDBC and Hibernate. It helps us reduce code and simplifies the retrieval of the result, allowing us to focus solely on writing custom SQL queries or stored procedures.

In this tutorial, we’ll learn how to return an auto-generated ID when inserting data using MyBatis and Spring Boot.

2. Dependency Setup

Before we start, let’s add the mybatis-spring-boot-starter dependency to the pom.xml:

<dependency>
    <groupId>org.mybatis.spring.boot</groupId>
    <artifactId>mybatis-spring-boot-starter</artifactId>
    <version>3.0.3</version>
</dependency>

3. Example Setup

Let’s start by creating a simple example we’ll use throughout the article.

3.1. Defining Entity

Firstly, let’s create a simple entity class representing a car:

public class Car {
    private Long id;
    private String model;
    // getters and setters
}

Secondly, let’s define an SQL statement that creates a table and place it in the car-schema.sql file:

CREATE TABLE IF NOT EXISTS CAR
(
    ID    INTEGER PRIMARY KEY AUTO_INCREMENT,
    MODEL VARCHAR(100) NOT NULL
);

3.2. Defining DataSource

Next, let’s specify a data source. We’ll use the H2 embedded database:

@Bean
public DataSource dataSource() {
    EmbeddedDatabaseBuilder builder = new EmbeddedDatabaseBuilder();
    return builder
      .setType(EmbeddedDatabaseType.H2)
      .setName("testdb")
      .addScript("car-schema.sql")
      .build();
}
@Bean
public SqlSessionFactory sqlSessionFactory() throws Exception {
    SqlSessionFactoryBean factoryBean = new SqlSessionFactoryBean();
    factoryBean.setDataSource(dataSource());
    return factoryBean.getObject();
}

Now that we’re all set up, let’s see how to retrieve the auto-generated ID using the annotation-based and XML-based approaches.

4. Using Annotations

Let’s define the Mapper, which represents an interface MyBatis uses to bind methods to the corresponding SQL statements:

@Mapper
public interface CarMapper {
    // ...
}

Next, let’s add an insert statement:

@Insert("INSERT INTO CAR(MODEL) values (#{model})")
void save(Car car);

Instinctively, we might be tempted to simply change the return type to long and expect MyBatis to return the ID of the created entity. However, that’s not how it works: for an insert, MyBatis returns the number of affected rows, so we’d get back 1, indicating the insert statement was successful.
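
For instance, a mapper method like the following (a hypothetical variant of our save() method) would return the row count rather than the generated key:

// Returns the number of inserted rows (1 on success), not the generated ID
@Insert("INSERT INTO CAR(MODEL) values (#{model})")
long saveAndReturnAffectedRows(Car car);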

To retrieve the generated ID, we can use either @Options or @SelectKey annotations.

4.1. The @Options Annotation

One way we can extend our insert statement is by using the @Options annotation:

@Insert("INSERT INTO CAR(MODEL) values (#{model})")
@Options(useGeneratedKeys = true, keyColumn = "ID", keyProperty = "id")
void saveUsingOptions(Car car);

Here, we set three properties:

  • useGeneratedKeys – indicates whether we want to use the generated keys feature
  • keyColumn – sets the name of the column that holds a key
  • keyProperty – represents the name of the field that will hold a key value

Additionally, we can specify multiple key properties by separating them with commas.
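
For example, assuming the table also had a generated VERSION column mapped to a version field (a purely hypothetical addition to our Car entity), the annotation might look like this:

@Insert("INSERT INTO CAR(MODEL) values (#{model})")
@Options(useGeneratedKeys = true, keyColumn = "ID,VERSION", keyProperty = "id,version")
void saveReturningMultipleKeys(Car car);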

In the background, MyBatis uses reflection to map the value from the ID column into the id field of the Car object.

Next, let’s create a test to confirm everything is working as expected:

@Test
void givenCar_whenSaveUsingOptions_thenReturnId() {
    Car car = new Car();
    car.setModel("BMW");
    carMapper.saveUsingOptions(car);
    assertNotNull(car.getId());
}

4.2. The @SelectKey Annotation

Another way to return an ID is to use the @SelectKey annotation. This annotation can be useful when we want to use sequences or identity functions to retrieve the identifier.

Moreover, if we decorate our method with the @SelectKey annotation, MyBatis ignores annotations such as @Options.

Let’s create a new method inside CarMapper to retrieve an identity value after an insert:

@Insert("INSERT INTO CAR(MODEL) values (#{model})")
@SelectKey(statement = "CALL IDENTITY()", before = false, keyColumn = "ID", keyProperty = "id", resultType = Long.class)
void saveUsingSelectKey(Car car);

Let’s examine the properties we used:

  • statement – holds a statement that will be executed after the insert statement
  • before – indicates whether the statement should execute before or after the insert
  • keyColumn – holds the name of the column that represents a key
  • keyProperty – specifies the name of the field that will hold the value the statement returns
  • resultType – represents the type of the keyProperty

Furthermore, we should note that the IDENTITY() function was removed in newer versions of the H2 database. More details can be found here.

To be able to execute CALL IDENTITY() on the H2 database, we need to set the mode to LEGACY:

"testdb;MODE=LEGACY"

Let’s test our method to confirm it works correctly:

@Test
void givenCar_whenSaveUsingSelectKey_thenReturnId() {
    Car car = new Car();
    car.setModel("BMW");
    carMapper.saveUsingSelectKey(car);
    assertNotNull(car.getId());
}

5. Using XML

Let’s see how to achieve the same functionality, but this time, we’ll use the XML-based approach.

First, let’s define the CarXmlMapper interface:

@Mapper
public interface CarXmlMapper {
     // ...
}

Unlike the annotation-based approach, we won’t write SQL statements directly in the Mapper interface. Instead, we’ll define the XML mapper file and put all the queries in it:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" 
  "http://mybatis.org/dtd/mybatis-3-mapper.dtd" >
<mapper namespace="com.baeldung.mybatis.generatedid.CarXmlMapper">
</mapper>

Additionally, in the namespace attribute, we specify the fully-qualified name of the CarXmlMapper interface.

5.1. The UseGeneratedKeys Attribute

Moving forward, let’s define a method inside the CarXmlMapper interface:

void saveUsingOptions(Car car);

Additionally, let’s use the XML mapper to define the insert statement and map it to the saveUsingOptions() method we placed inside the CarXmlMapper interface:

<insert id="saveUsingOptions" parameterType="com.baeldung.mybatis.generatedid.Car"
  useGeneratedKeys="true" keyColumn="ID" keyProperty="id">
    INSERT INTO CAR(MODEL)
    VALUES (#{model});
</insert>

Let’s explore the attributes we used:

  • id – binds the query to the specific method in the CarXmlMapper class
  • parameterType – the type of the parameter of the saveUsingOptions() method
  • useGeneratedKeys – indicates we want to use the generated ID feature
  • keyColumn – specifies the name of the column that represents a key
  • keyProperty – specifies the name of the field of the Car object that will hold the key

In addition, let’s test our solution:

@Test
void givenCar_whenSaveUsingOptions_thenReturnId() {
    Car car = new Car();
    car.setModel("BMW");
    carXmlMapper.saveUsingOptions(car);
    assertNotNull(car.getId());
}

5.2. The SelectKey Element

Next, let’s add a new method inside the CarXmlMapper interface to see how to retrieve the identity using the selectKey element:

void saveUsingSelectKey(Car car);

Furthermore, let’s specify the statement inside the XML mapper file and bind it to the method:

<insert id="saveUsingSelectKey" parameterType="com.baeldung.mybatis.generatedid.Car">
    INSERT INTO CAR(MODEL)
    VALUES (#{model});
    <selectKey resultType="Long" order="AFTER" keyColumn="ID" keyProperty="id">
        CALL IDENTITY()
    </selectKey>
</insert>

Here, we defined the selectKey element using the following attributes:

  • resultType – specifies the type the statement returns
  • order – indicates whether the statement CALL IDENTITY() should be called before or after the insert statement
  • keyColumn – holds the name of the column representing an identifier
  • keyProperty – holds the name of the field to which the key should be mapped

Lastly, let’s create a test:

@Test
void givenCar_whenSaveUsingSelectKey_thenReturnId() {
    Car car = new Car();
    car.setModel("BMW");
    carXmlMapper.saveUsingSelectKey(car);
    assertNotNull(car.getId());
}

6. Conclusion

In this article, we learned how to retrieve the auto-generated ID from the insert statement using MyBatis and Spring.

To sum up, we explored how to retrieve the ID using the annotation-based approach and the @Options and @SelectKey annotations. Furthermore, we examined how to return the ID using the XML-based approach.

As always, the entire source code can be found over on GitHub.

       

concat() vs. merge() Operators in RxJava Observables


1. Overview

concat() and merge() are two powerful operators used to combine multiple Observable instances in RxJava.

concat() emits items from each Observable sequentially, waiting for each to complete before moving to the next, while merge() concurrently emits items from all Observable instances as they’re produced.

In this tutorial, we’ll explore scenarios in which concat() and merge() show similar and different behaviors.

2. Synchronous Sources

The concat() and merge() operators behave exactly the same when both sources are synchronous. Let’s simulate this scenario to understand it better.

2.1. Scenario Setup

Let’s start by using the Observable.just() factory method to create three synchronous sources:

Observable<Integer> observable1 = Observable.just(1, 2, 3);
Observable<Integer> observable2 = Observable.just(4, 5, 6);
Observable<Integer> observable3 = Observable.just(7, 8, 9);

Further, let’s create two subscribers, namely, testSubscriberForConcat and testSubscriberForMerge:

TestSubscriber<Integer> testSubscriberForConcat = new TestSubscriber<>();
TestSubscriber<Integer> testSubscriberForMerge = new TestSubscriber<>();

Great! We’ve got everything we need to test the scenario.

2.2. concat() and merge()

First, let’s apply the concat() operator and subscribe the resultant Observable with testSubscriberForConcat:

Observable.concat(observable1, observable2, observable3)
  .subscribe(testSubscriberForConcat);

Further, let’s verify that the emissions are in order where items from observable1 appear before observable2, and observable2 before observable3:

testSubscriberForConcat.assertValues(1, 2, 3, 4, 5, 6, 7, 8, 9);

Similarly, we can apply the merge() operator and subscribe the outcome with the testSubscriberForMerge:

Observable.merge(observable1, observable2, observable3).subscribe(testSubscriberForMerge);

Next, let’s verify that the emissions through merge follow the same order as that from the concatenation:

testSubscriberForMerge.assertValues(1, 2, 3, 4, 5, 6, 7, 8, 9);

Lastly, we must note that synchronous Observable instances emit all items immediately and then signal completion. Further, each Observable completes its emission before the next one starts. Consequently, both operators process each Observable sequentially, producing the same output.

As such, whether the sources are synchronous or asynchronous, the general rule is that if we need to maintain the order of emissions by source, we should use concat(). On the other hand, if we want to combine items as they’re emitted from multiple sources, we should use merge().

3. Predictable Asynchronous Sources

In this section, let’s simulate a scenario with asynchronous sources where the order of emissions is predictable.

3.1. Scenario Setup

Let’s create two asynchronous sources, namely, observable1 and observable2:

Observable<Integer> observable1 = Observable.interval(100, TimeUnit.MILLISECONDS)
  .map(i -> i.intValue() + 1)
  .take(3);
Observable<Integer> observable2 = Observable.interval(30, TimeUnit.MILLISECONDS)
  .map(i -> i.intValue() + 4)
  .take(7);

We should note that emissions from observable1 arrive after 100ms, 200ms, and 300ms, respectively. On the other hand, emissions from observable2 arrive at intervals of 30ms.

Now, let’s create the testSubscriberForConcat and testSubscriberForMerge as well:

TestSubscriber<Integer> testSubscriberForConcat = new TestSubscriber<>();
TestSubscriber<Integer> testSubscriberForMerge = new TestSubscriber<>();

Fantastic! We’re ready to test this scenario.

3.2. concat() vs. merge()

First, let’s apply the concat() operator and call subscribe() with testSubscriberForConcat:

Observable.concat(observable1, observable2)
  .subscribe(testSubscriberForConcat);

Next, we must call the awaitTerminalEvent() method to ensure that all emissions are received:

testSubscriberForConcat.awaitTerminalEvent();

Now, we can validate that the result contains all items from observable1 followed by all items from observable2:

testSubscriberForConcat.assertValues(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

Further, let’s apply the merge() operator and call subscribe() with testSubscriberForMerge:

Observable.merge(observable1, observable2)
  .subscribe(testSubscriberForMerge);

Lastly, let’s wait for the emissions and check the emitted values:

testSubscriberForMerge.awaitTerminalEvent();
testSubscriberForMerge.assertValues(4, 5, 6, 1, 7, 8, 9, 2, 10, 3);

The result contains all items interleaved together in the order of their actual emission from observable1 and observable2.

4. Asynchronous Sources With Race Conditions

In this section, we’ll simulate a scenario with two asynchronous sources in which the order of combined emission is fairly unpredictable.

4.1. Scenario Setup

First, let’s create the two asynchronous sources with exactly the same delay:

Observable<Integer> observable1 = Observable.interval(100, TimeUnit.MILLISECONDS)
  .map(i -> i.intValue() + 1)
  .take(3);
Observable<Integer> observable2 = Observable.interval(100, TimeUnit.MILLISECONDS)
  .map(i -> i.intValue() + 4)
  .take(3);

We know that one emission from each source arrives after 100ms, 200ms, and 300ms. However, we can’t predict the exact order because of the race conditions.

Next, let’s create two test subscribers:

TestSubscriber<Integer> testSubscriberForConcat = new TestSubscriber<>();
TestSubscriber<Integer> testSubscriberForMerge = new TestSubscriber<>();

Perfect! We’re good to go now.

4.2. concat() vs. merge()

First, let’s apply the concat() operator, followed by a subscription with testSubscriberForConcat:

Observable.concat(observable1, observable2)
  .subscribe(testSubscriberForConcat);
testSubscriberForConcat.awaitTerminalEvent();

Now, let’s verify that the outcome of the concat() operator is unaffected by the race condition and still groups the emissions by source:

testSubscriberForConcat.assertValues(1, 2, 3, 4, 5, 6);

Further, let’s apply the merge() operator and subscribe with testSubscriberForMerge:

Observable.merge(observable1, observable2)
  .subscribe(testSubscriberForMerge);
testSubscriberForMerge.awaitTerminalEvent();

Next, let’s accumulate all the emissions in a list and verify that it contains all values:

List<Integer> actual = testSubscriberForMerge.getOnNextEvents();
List<Integer> expected = Arrays.asList(1, 2, 3, 4, 5, 6);
assertTrue(actual.containsAll(expected) && expected.containsAll(actual));

Lastly, let’s also log the emissions to see it in action:

21:05:43.252 [main] INFO actual emissions: [4, 1, 2, 5, 3, 6]

We may receive a different order on different runs.

5. Conclusion

In this article, we saw how the concat() and merge() operators in RxJava handle synchronous and asynchronous data sources. Further, we compared scenarios involving predictable and unpredictable patterns of emissions, emphasizing the differences between the two operators.

As always, the code from this article is available over on GitHub.

       

Introduction to BitcoinJ


1. Overview

Cryptocurrency is a secure and decentralized value store. It adopts a peer-to-peer (P2P) network for propagation and verification of transactions.

BitcoinJ is a Java library that simplifies the process of creating Bitcoin applications that enable users to perform cryptocurrency transactions seamlessly.

In this tutorial, we’ll explore BitcoinJ by delving into its key features and components. Also, we’ll explore how to create a wallet, fund the wallet, and send some coins to another wallet.

2. What Is BitcoinJ?

BitcoinJ is a Java library that simplifies the process of creating Bitcoin applications. It provides tools to create and manage Bitcoin wallets, send and receive transactions, and integrate with Bitcoin’s mainnet, testnet, and regtest networks.

Also, it provides Simplified Payment Verification (SPV) to interact with the Bitcoin network without downloading the whole blockchain.

3. Features of BitcoinJ

BitcoinJ allows us to easily create Bitcoin wallets, including generating addresses, managing private and public keys, and handling seed phrases for wallet recovery.

Furthermore, it provides functionality for sending and receiving Bitcoin transactions, enabling us to build an application that can handle Bitcoin transfers.

Additionally, it supports integration with the Bitcoin main network (mainnet) where real-world Bitcoin transactions occur. Also, it supports the testnet and regtest network for testing and prototyping.
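
As a quick sketch, each of these networks has a corresponding NetworkParameters implementation in the org.bitcoinj.params package:

NetworkParameters mainNet = MainNetParams.get();   // real Bitcoin network
NetworkParameters testNet = TestNet3Params.get();  // public test network
NetworkParameters regTest = RegTestParams.get();   // local regression-test network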

Finally, it allows event listeners to respond to various events, such as incoming transactions or changes in the blockchain.

4. Basic Setup

To start interacting with the library, let’s add bitcoinj-core dependency to the pom.xml:

<dependency>
    <groupId>org.bitcoinj</groupId>
    <artifactId>bitcoinj-core</artifactId>
    <version>0.17-alpha4</version>
</dependency>

This dependency provides the Wallet and WalletAppKit classes to create a Bitcoin wallet.

Also, let’s add the slf4j-api and slf4j-simple dependencies for logging:

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>2.1.0-alpha1</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-simple</artifactId>
    <version>2.1.0-alpha1</version>
</dependency>

These dependencies are essential to log the activities of the application.

5. The Wallet Class

The wallet is an important component in cryptocurrency because it provides functionality to manage transactions. The BitcoinJ library provides the Wallet and WalletAppKit classes to create a wallet.

5.1. Creating Wallet

Before creating a wallet, we need to define the network the wallet interacts with.

BitcoinJ supports three types of networks – mainnet for production, testnet for testing, and regtest for local regression testing.

Let’s create the NetworkParameters object to connect to a test network:

NetworkParameters params = TestNet3Params.get();

Next, let’s create a Wallet instance that accepts the NetworkParameters as a parameter:

void createWallet() throws IOException {
    Wallet wallet = Wallet.createDeterministic(params, Script.ScriptType.P2PKH);
    File walletFile = new File("baeldung.dat");
    wallet.saveToFile(walletFile);
}

In the method above, we create a Wallet that interacts with the test network. We use the P2PKH script type that represents a Pay-to-Pubkey-Hash address, which is a common type of Bitcoin address. Finally, we save the wallet as baeldung.dat.

Notably, the wallet address, seed phrase, and public and private keys are generated while creating the wallet.

5.2. Retrieving Wallet Details

After creating a wallet, we can retrieve essential details like the address, public key, private key, and the seed phrase:

Wallet loadWallet() throws IOException, UnreadableWalletException {
    File walletFile = new File("baeldung.dat");
    Wallet wallet = Wallet.loadFromFile(walletFile);
    logger.info("Address: " + wallet.currentReceiveAddress().toString());
    logger.info("Seed Phrase: " + wallet.getKeyChainSeed().getMnemonicString());
    logger.info("Balance: " + wallet.getBalance().toFriendlyString());
    logger.info("Public Key: " + wallet.findKeyFromAddress(wallet.currentReceiveAddress()).getPublicKeyAsHex());
    logger.info("Private Key: " + wallet.findKeyFromAddress(wallet.currentReceiveAddress()).getPrivateKeyAsHex());
    return wallet;
}

Here, we load the wallet from the baeldung.dat file by invoking the loadFromFile() method on the Wallet instance. Then, we log the address, seed phrase, and balance of the wallet to the console.

Additionally, we log the public and private keys to the console.

5.3. Recovering Wallet

If we lose access to the wallet file but still have the seed phrase, we can recover the wallet from it:

Wallet loadUsingSeed(String seedWord) throws UnreadableWalletException {
    DeterministicSeed seed = new DeterministicSeed(seedWord, null, "", Utils.currentTimeSeconds());
    return Wallet.fromSeed(params, seed);
}

In the code above, we create a DeterministicSeed object that accepts the seed phrases and time as arguments. Then we call the Wallet.fromSeed() method, passing params and seed as arguments. This creates a new Wallet instance from the provided seed phrase and network parameter.

5.4. Securing Wallet

We need to protect the private key and seed phrase from unauthorized access. Anyone who gains access to the private key can access our wallet and spend the available funds.

BitcoinJ provides the encrypt() method to set a password for our wallet:

// ...
wallet.encrypt("password");
wallet.saveToFile(walletFile);
// ...

Here, we invoke the encrypt() method on the Wallet object. It accepts the intended password as an argument. With the wallet encrypted, no one can access the private key without decrypting the wallet first:

// ...
Wallet wallet = Wallet.loadFromFile(walletFile);
wallet.decrypt("password");
// ...

Here, we decrypt the wallet with the password we used to encrypt it. Notably, it’s important to keep the wallet file, seed phrase, and private key away from any external party.

5.5. Connecting to a Peer Group

Currently, our wallet is isolated and isn’t aware of any transactions because it’s not synchronized with the blockchain. We need to connect it to a peer group:

void connectWalletToPeer() throws BlockStoreException, UnreadableWalletException, IOException {
    Wallet wallet = loadWallet();
    BlockStore blockStore = new MemoryBlockStore(params);
    BlockChain chain = new BlockChain(params, wallet, blockStore);
    PeerGroup peerGroup = new PeerGroup(params, chain);
    peerGroup.addPeerDiscovery(new DnsDiscovery(params));
    peerGroup.addWallet(wallet);
    peerGroup.start();
    peerGroup.downloadBlockChain();
}

Here, we first load the wallet and then create a BlockStore instance which stores the blockchain in memory. Also, we create a BlockChain instance, which manages the data structure behind Bitcoin.

Finally, we create a PeerGroup instance to establish network connections. These classes are necessary to interact with testnet and synchronize our wallet with the network.

However, downloading the whole blockchain may be resource-intensive. Hence, the WalletAppKit class uses SPV by default to simplify this process.

6. The WalletAppKit Class

BitcoinJ provides the WalletAppKit class, which simplifies the process of setting up a wallet. The class abstracts away creating the BlockStore, BlockChain, and PeerGroup instances, making it easier to work with the Bitcoin network.

Let’s create a wallet using the WalletAppKit and log the wallet details after creation:

NetworkParameters params = TestNet3Params.get();
WalletAppKit kit = new WalletAppKit(params, new File("."), "baeldungkit") {
    @Override
    protected void onSetupCompleted() {
        logger.info("Wallet created and loaded successfully.");
        logger.info("Receive address: " + wallet().currentReceiveAddress());
        logger.info("Seed Phrase: " + wallet().getKeyChainSeed());
        logger.info("Balance: " + wallet().getBalance().toFriendlyString());
        logger.info("Public Key: " + wallet().findKeyFromAddress(wallet().currentReceiveAddress())
          .getPublicKeyAsHex());
        logger.info("Private Key: " + wallet().findKeyFromAddress(wallet().currentReceiveAddress())
          .getPrivateKeyAsHex());
        wallet().encrypt("password");
    }
};
kit.startAsync();
kit.awaitRunning();
kit.setAutoSave(true);

Here, we set up a new wallet with testnet parameters and we specify the directory to store the wallet data. We start the WalletAppKit object asynchronously. We enable auto-save to locally store the latest wallet information.

Here are the wallet details:

[ STARTING] INFO com.baeldung.bitcoinj.Kit - Wallet created and loaded successfully.
[ STARTING] INFO com.baeldung.bitcoinj.Kit - Receive address: moqVLcdRFjyXehgRAK5bJBK6rDN2vq14Wc
[ STARTING] INFO com.baeldung.bitcoinj.Kit - Seed Phrase: DeterministicSeed{unencrypted}
[ STARTING] INFO com.baeldung.bitcoinj.Kit - Balance: 0

The wallet currently has zero Bitcoin.

6.1. Adding an Event Listener

We can add an event listener to the wallet to respond to various events, such as incoming transactions:

kit.wallet()
  .addCoinsReceivedEventListener((wallet, tx, prevBalance, newBalance) -> {
      logger.info("Received tx for " + tx.getValueSentToMe(wallet));
      logger.info("New balance: " + newBalance.toFriendlyString());
  });

The event listener above logs the amount received and the new wallet balance for each incoming transaction.

Moreover, let’s add an event listener to log the current wallet balance when we send a coin to another wallet:

kit.wallet()
  .addCoinsSentEventListener((wallet, tx, prevBalance, newBalance) -> logger.info("new balance: " + newBalance.toFriendlyString()));

The code above listens for coins being sent out of the wallet and logs the new balance.

6.2. Receiving Bitcoin

Let’s fund our wallet by requesting test bitcoins from a Bitcoin testnet faucet:

test bitcoin from test faucet

In the image above, we sent 0.0001238 bitcoins to our wallet address. However, the transaction needs a few confirmations before our wallet receives the funds. Notably, our application must be running to stay in sync with the test blockchain.

Next, let’s verify the transaction on a blockchain explorer like BlockCypher:

sending history on block cypher

Finally, let’s re-execute our program and check the logs for the received coins.

The wallet balance is now updated to reflect the received Bitcoin.

6.3. Sending Bitcoin

We can easily send bitcoins to another address by creating a Coin instance and specifying the amount to send:

String receiveAddress = "n1vb1YZXyMQxvEjkc53VULi5KTiRtcAA9G";
Coin value = Coin.valueOf(200);
final Coin amountToSend = value.subtract(Transaction.REFERENCE_DEFAULT_MIN_TX_FEE);
final Wallet.SendResult sendResult = kit.wallet()
  .sendCoins(kit.peerGroup(), Address.fromString(params, receiveAddress), amountToSend);

In the code above, we specify the recipient address and send 200 satoshis. Also, we subtract the transaction fee for the miner. Finally, we call sendCoins(), which initiates the transfer of the coins.

7. Conclusion

In this article, we learned how to use the BitcoinJ library by interacting with a Bitcoin test network. We discussed key features such as wallet management, transaction handling, network integration, and event handling.

As always, the complete source code for the examples is available over on GitHub.

       

Java Weekly, Issue 544


1. Spring and Java

>> Model Data, the Whole Data, and Nothing but the Data – Data Oriented Programming v1.1 [inside.java]

An interesting look at how we can use Records and Sealed types to model aggregates and interfaces to express different alternatives for those models.

>> JEP 477 Enhances Beginner Experience with Implicitly Declared Classes and Instance Main Methods and the JEP (Third Preview) [infoq.com] [openjdk.org]

No need to declare a proper public static main inside a class anymore.

>> Implement your primary key as a Record using an IdClass [thorben-janssen.com]

IdClass to represent compound primary keys: now it’s possible to use Java records elegantly and concisely. Good stuff.

Also worth reading:

Webinars and presentations:

Time to upgrade:

2. Technical & Musings

>> Data Fetching Patterns in Single-Page Applications [martinfowler.com]

Optimizing data fetching in Single Page Applications: strategies and best practices!

>> PostgreSQL COPY result set to file [vladmihalcea.com]

Using the PostgreSQL COPY command to export a large result set to an external file.

Also worth reading:

3. Pick of the Week

>> Extreme brainstorming questions to trigger new, better ideas [asmartbear.com]

       

Removing BOM Characters When Reading from File


1. Introduction

The Byte Order Mark (BOM) indicates the encoding of a file but can cause issues if we don’t handle it correctly, especially when processing text data. Moreover, it isn’t uncommon to encounter files that start with a BOM character when reading text files.

In this tutorial, we’ll explore how to detect and remove BOM characters when reading from a file in Java, focusing specifically on UTF-8 encoding.

2. Understanding BOM Characters

A BOM is a special Unicode character at the start of a text file or stream that indicates its encoding and, for UTF-16 and UTF-32, its endianness (byte order). For UTF-8, the BOM is the byte sequence 0xEF 0xBB 0xBF.

While useful for encoding detection, BOM characters can interfere with text processing if not properly removed.
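
As a minimal sketch, we can represent the UTF-8 BOM in Java as a three-byte prefix and check for it with a small, hypothetical helper:

// The UTF-8 BOM as raw bytes: 0xEF 0xBB 0xBF
private static final byte[] UTF8_BOM = { (byte) 0xEF, (byte) 0xBB, (byte) 0xBF };

// Returns true if the given bytes start with the UTF-8 BOM
static boolean startsWithUtf8Bom(byte[] bytes) {
    return bytes.length >= 3
      && (bytes[0] & 0xFF) == 0xEF
      && (bytes[1] & 0xFF) == 0xBB
      && (bytes[2] & 0xFF) == 0xBF;
}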

3. Using InputStream and Reader

The traditional approach to handling BOMs involves using InputStream and Reader in Java. This approach lets us manually detect and remove BOMs from the input stream before processing the file’s content.

First, we should read the content of a file completely, as follows:

private String readFully(Reader reader) throws IOException {
    StringBuilder content = new StringBuilder();
    char[] buffer = new char[1024];
    int numRead;
    while ((numRead = reader.read(buffer)) != -1) {
        content.append(buffer, 0, numRead);
    }
    return content.toString();
}

Here, we utilize a StringBuilder to accumulate the content read from the Reader. By repeatedly reading chunks of characters into a buffer array and appending them to the StringBuilder, we ensure that the entire content of the file is captured. Finally, the accumulated content is returned as a string.

Now, let’s apply the readFully() method within a test case to demonstrate how we can effectively handle BOMs using InputStream and Reader:

@Test
public void givenFileWithBOM_whenUsingInputStreamAndReader_thenRemoveBOM() throws IOException {
    try (InputStream is = new FileInputStream(filePath)) {
        byte[] bom = new byte[3];
        int n = is.read(bom, 0, bom.length);
        Reader reader;
        if (n == 3 && (bom[0] & 0xFF) == 0xEF && (bom[1] & 0xFF) == 0xBB && (bom[2] & 0xFF) == 0xBF) {
            reader = new InputStreamReader(is, StandardCharsets.UTF_8);
        } else {
            reader = new InputStreamReader(new FileInputStream(filePath), StandardCharsets.UTF_8);
        }
        assertEquals(expectedContent, readFully(reader));
    }
}

In this test, we first set up the file path using the class loader’s resource and handle potential URI syntax exceptions. Then, we utilize the FileInputStream to open an InputStream to the file and create a Reader with UTF-8 encoding using the InputStreamReader.

Additionally, we utilize the read() method of the input stream to read the first 3 bytes into a byte array to check for the presence of a BOM.

If we detect a UTF-8 BOM (0xEF, 0xBB, 0xBF), those three bytes have already been consumed, so we wrap the same stream in a Reader and assert the content using the readFully() method we defined earlier. Otherwise, we reopen the file with a new InputStreamReader with UTF-8 encoding and perform the same assertion.

4. Using Apache Commons IO

An alternative to the manual detection and removal of BOMs is provided by Apache Commons IO, a library offering various utilities for common I/O operations. Among these utilities is the BOMInputStream class, which simplifies handling BOMs by automatically detecting and removing them from an input stream.

Here’s how we implement this approach:

@Test
public void givenFileWithBOM_whenUsingApacheCommonsIO_thenRemoveBOM() throws IOException {
    try (BOMInputStream bomInputStream = new BOMInputStream(new FileInputStream(filePath));
         Reader reader = new InputStreamReader(bomInputStream, StandardCharsets.UTF_8)) {
        assertTrue(bomInputStream.hasBOM());
        assertEquals(expectedContent, readFully(reader));
    }
}

In this test case, we wrap the FileInputStream with a BOMInputStream, automatically detecting and removing any BOM in the input stream. Moreover, we use the assertTrue() method to check if a BOM was detected and removed successfully using the hasBOM() method.

We then create a Reader using the BOMInputStream and assert the content using the readFully() method to ensure that the content matches the expected content without being affected by the BOM.

5. Using NIO (New I/O)

Java’s NIO (New I/O) package provides efficient file-handling capabilities, including support for reading file contents into memory buffers. Leveraging NIO, we can detect and remove BOMs from a file using ByteBuffer and Files classes.

Here’s how we can implement a test case using NIO for BOM handling:

@Test
public void givenFileWithBOM_whenUsingNIO_thenRemoveBOM() throws IOException, URISyntaxException {
    byte[] fileBytes = Files.readAllBytes(Paths.get(filePath));
    ByteBuffer buffer = ByteBuffer.wrap(fileBytes);
    if (buffer.remaining() >= 3) {
        byte b0 = buffer.get();
        byte b1 = buffer.get();
        byte b2 = buffer.get();
        if ((b0 & 0xFF) == 0xEF && (b1 & 0xFF) == 0xBB && (b2 & 0xFF) == 0xBF) {
            assertEquals(expectedContent, StandardCharsets.UTF_8.decode(buffer).toString());
        } else {
            buffer.position(0);
            assertEquals(expectedContent, StandardCharsets.UTF_8.decode(buffer).toString());
        }
    } else {
        assertEquals(expectedContent, StandardCharsets.UTF_8.decode(buffer).toString());
    }
}

In this test case, we read the file’s contents into a ByteBuffer using the readAllBytes() method. We then check for a BOM’s presence by inspecting the buffer’s first three bytes. If a UTF-8 BOM is detected, we skip it; otherwise, we reset the buffer position.

6. Conclusion

In conclusion, by employing different Java libraries and techniques, handling BOMs in file reading operations becomes straightforward and ensures smooth text processing.

As always, the complete code samples for this article can be found over on GitHub.

       

A Guide to Micrometer in Quarkus


1. Introduction

Monitoring and observability are indispensable aspects of modern application development, especially in cloud-native and microservices architectures.

Quarkus has emerged as a popular choice for building Java-based applications and is known for its lightweight and fast nature. Integrating Micrometer into our Quarkus applications provides a robust solution for monitoring various aspects of our application’s performance and behavior.

In this tutorial, we’ll explore advanced monitoring techniques for using Micrometer in Quarkus.

2. Maven Dependency

To use Micrometer with Quarkus, we need to include the quarkus-micrometer-registry-prometheus dependency:

<dependency>
    <groupId>io.quarkus</groupId>
    <artifactId>quarkus-micrometer-registry-prometheus</artifactId>
    <version>3.11.0</version>
</dependency>

This dependency provides the necessary interfaces and classes for instrumenting our code and includes a specific registry implementation. Specifically, the micrometer-registry-prometheus is a popular choice that implements the Prometheus REST endpoint to expose metrics in our Quarkus application.

This also transitively includes the core quarkus-micrometer dependency. In addition to the metrics registry we’ll use for custom metrics, this provides out-of-the-box metrics from the JVM, thread pools, and HTTP requests.

3. Counters

Now that we’ve seen how to include Micrometer in our Quarkus application, let’s implement custom metrics. First, we’ll look at adding basic counters to our application to track the usage of various operations.

Our Quarkus application implements a simple endpoint to determine whether a given string is a palindrome. Palindromes are strings that read the same backward and forward, such as “radar” or “level”. We specifically want to count each time this palindrome check is invoked.

Let’s create a Micrometer counter:

@Path("/palindrome")
@Produces("text/plain")
public class PalindromeResource {
    private final MeterRegistry registry;
    public PalindromeResource(MeterRegistry registry) {
        this.registry = registry;
    }
    @GET
    @Path("check/{input}")
    public boolean checkPalindrome(String input) {
        registry.counter("palindrome.counter").increment();
        boolean result = internalCheckPalindrome(input);
        return result;
    }
    private boolean internalCheckPalindrome(String input) {
        int left = 0;
        int right = input.length() - 1;
        while (left < right) {
            if (input.charAt(left) != input.charAt(right)) {
                return false;
            }
            left++;
            right--;
        }
        return true;
    }
}

We can execute our palindrome check with a GET request to ‘/palindrome/check/{input}‘, where input is the word we want to check.

To implement our counter, we injected the MeterRegistry into our PalindromeResource. Notably, we increment() the counter before every palindrome check. Finally, after calling the endpoint several times, we can call the ‘/q/metrics‘ endpoint to see the counter metric. We’ll find the number of times we called our operation as the palindrome_counter_total entry.

4. Timers

We can also track the duration of palindrome checks. To achieve this, we’ll add a Micrometer Timer to our PalindromeResource:

@GET
@Path("check/{input}")
public boolean checkPalindrome(String input) {
    Timer.Sample sample = Timer.start(registry);
    boolean result = internalCheckPalindrome(input);
    sample.stop(registry.timer("palindrome.timer"));
    return result;
}

First, we start the timer, which creates a Timer.Sample that tracks the operation duration. We then call our internalCheckPalindrome() method after starting the timer. Finally, we stop the timer and record the elapsed time. By incorporating this timer, we can monitor the duration of each palindrome check, which also enables us to identify performance bottlenecks and optimize the efficiency of our application.

Micrometer follows Prometheus conventions for timer metrics, converting measured durations into seconds and including this unit in the metric name.

After calling the endpoint multiple times we can see the following metrics at the same metrics endpoint:

  • palindrome_timer_seconds_count – how many times the timer recorded a palindrome check
  • palindrome_timer_seconds_sum – the total duration of all method calls
  • palindrome_timer_seconds_max – the maximum observed duration within a decaying interval

Finally, looking at the data produced by the timer, we can divide the sum by the count to calculate how long (on average) it takes to determine whether a string is a palindrome.
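
If we prefer to compute this inside the application rather than from the scraped metrics, a minimal sketch (assuming the injected MeterRegistry from our resource class) could read the timer’s aggregates directly:

// A sketch: reading the timer's aggregate values programmatically
double averagePalindromeCheckSeconds() {
    Timer timer = registry.timer("palindrome.timer");
    long count = timer.count();                               // number of recorded checks
    double totalSeconds = timer.totalTime(TimeUnit.SECONDS);  // sum of all recorded durations
    return count == 0 ? 0.0 : totalSeconds / count;           // average duration per check
}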

5. Gauges

A gauge is a metric representing a single numerical value that can arbitrarily go up and down. Gauges allow us to monitor real-time metrics, providing insights into dynamic values and helping us quickly respond to changing conditions. They’re particularly useful for tracking frequently fluctuating values, such as queue sizes and thread counts.

Let’s say we want to keep all of the checked words in memory before saving them to a database or sending them to another service. We’ll want to track the number of elements to monitor our program’s memory usage. Let’s implement a gauge for this.

We’ll initialize a gauge in our constructor after injecting the registry and we’ll declare an empty list to store the inputs:

private final LinkedList<String> list = new LinkedList<>();
public PalindromeResource(MeterRegistry registry) {
    this.registry = registry;
    registry.gaugeCollectionSize("palindrome.list.size", Tags.empty(), list);
}

Now we’ll add elements to our list whenever we receive input and check the palindrome_list_size value to see the size of our gauge:

list.add(input);

The gauge effectively gives us a snapshot of the current program state.

We can also simulate the emptying of the list and reset our gauge:

@DELETE
@Path("empty-list")
public void emptyList() {
    list.clear();
}

This shows that gauges are real-time measurements. After clearing the list, our palindrome_list_size gauge is reset to zero until we check more palindromes.

6. Conclusion

In our journey with Micrometer in Quarkus, we’ve learned to track how often we perform specific operations using counters, the duration of operations with timers, and monitor real-time metrics with gauges. These tools provide valuable insights into our application’s performance, enabling us to make informed decisions for optimization.

As always, the full implementation code of this article can be found over on GitHub.

       

Embedded PostgreSQL for Spring Boot Tests


1. Overview

Writing integration tests with databases offers several options for test databases. One effective option is to use a real database, ensuring that our integration tests closely mimic production behavior.

In this tutorial, we’ll demonstrate how to use Embedded PostgreSQL for Spring Boot tests and review a few alternatives.

2. Dependencies and Configuration

We’ll start by adding the Spring Data JPA dependency, as we’ll use it to create our repositories:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>

To write integration tests for a Spring Boot application, we need to include the Spring Test dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
</dependency>

Finally, we need to include the Embedded Postgres dependency:

<dependency>
    <groupId>com.opentable.components</groupId>
    <artifactId>otj-pg-embedded</artifactId>
    <version>1.0.3</version>
    <scope>test</scope>
</dependency>

Also, let’s set the basic configuration for our integration tests:

spring.jpa.properties.hibernate.dialect = org.hibernate.dialect.PostgreSQLDialect
spring.jpa.hibernate.ddl-auto=create-drop

We’ve specified the PostgreSQLDialect and enabled schema recreation before our test execution.

3. Usage

First things first, let’s create the Person entity that we’ll use in our tests:

@Entity
public class Person {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    @Column
    private String name;
    // getters and setters
}

Now, let’s create a Spring Data Repository for our entity:

public interface PersonRepository extends JpaRepository<Person, Long> {
}

After that, let’s create a test configuration class:

@Configuration
@EnableJpaRepositories(basePackageClasses = PersonRepository.class)
@EntityScan(basePackageClasses = Person.class)
public class EmbeddedPostgresConfiguration {
    private static EmbeddedPostgres embeddedPostgres;
    @Bean
    public DataSource dataSource() throws IOException {
        embeddedPostgres = EmbeddedPostgres.builder()
          .setImage(DockerImageName.parse("postgres:14.1"))
          .start();
        return embeddedPostgres.getPostgresDatabase();
    }
    public static class EmbeddedPostgresExtension implements AfterAllCallback {
        @Override
        public void afterAll(ExtensionContext context) throws Exception {
            if (embeddedPostgres == null) {
                return;
            }
            embeddedPostgres.close();
        }
    }
}

Here, we’ve specified the packages of our repository and entity. We’ve created the data source using the EmbeddedPostgres builder, selecting the version of the Postgres database to use during the tests. Additionally, we’ve added the EmbeddedPostgresExtension to ensure that the embedded Postgres connection is closed after executing the test class.

Finally, let’s create the test class:

@DataJpaTest
@ExtendWith(EmbeddedPostgresExtension.class)
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@ContextConfiguration(classes = {EmbeddedPostgresConfiguration.class})
public class EmbeddedPostgresIntegrationTest {
    @Autowired
    private PersonRepository repository;
    @Test
    void givenEmbeddedPostgres_whenSavePerson_thenSavedEntityShouldBeReturnedWithExpectedFields(){
        Person person = new Person();
        person.setName("New user");
        Person savedPerson = repository.save(person);
        assertNotNull(savedPerson.getId());
        assertEquals(person.getName(), savedPerson.getName());
    }
}

We’ve used the @DataJpaTest annotation to set up a basic Spring test context. We’ve extended the test class with our EmbeddedPostgresExtension and attached our EmbeddedPostgresConfiguration to the test context. After that, we successfully created a Person entity and saved it in the database.

4. Flyway Integration

Flyway is a popular migration tool that helps manage schema changes. When we use it, it’s important to include it in our integration tests. In this section, we’ll see how it can be done using the embedded Postgres. Let’s start with the dependencies:

<dependency>
    <groupId>org.flywaydb</groupId>
    <artifactId>flyway-core</artifactId>
</dependency>

After that, let’s specify the database schema in the flyway migration script:

CREATE SEQUENCE IF NOT EXISTS person_seq INCREMENT 50;

CREATE TABLE IF NOT EXISTS person(
    id bigint NOT NULL,
    name character varying(255)
);

Now we can create the test configuration:

@Configuration
@EnableJpaRepositories(basePackageClasses = PersonRepository.class)
@EntityScan(basePackageClasses = Person.class)
public class EmbeddedPostgresWithFlywayConfiguration {
    @Bean
    public DataSource dataSource() throws SQLException {
        return PreparedDbProvider
          .forPreparer(FlywayPreparer.forClasspathLocation("db/migrations"))
          .createDataSource();
    }
}

We’ve defined the data source bean, using PreparedDbProvider and FlywayPreparer to specify the location of the migration scripts. Finally, here’s our test class:

@DataJpaTest(properties = { "spring.jpa.hibernate.ddl-auto=none" })
@ExtendWith(EmbeddedPostgresExtension.class)
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@ContextConfiguration(classes = {EmbeddedPostgresWithFlywayConfiguration.class})
public class EmbeddedPostgresWithFlywayIntegrationTest {
    @Autowired
    private PersonRepository repository;
    @Test
    void givenEmbeddedPostgres_whenSavePerson_thenSavedEntityShouldBeReturnedWithExpectedFields(){
        Person person = new Person();
        person.setName("New user");
        Person savedPerson = repository.save(person);
        assertNotNull(savedPerson.getId());
        assertEquals(person.getName(), savedPerson.getName());
        List<Person> allPersons = repository.findAll();
        Assertions.assertThat(allPersons).contains(person);
    }
}

We’ve disabled the spring.jpa.hibernate.ddl-auto property to allow Flyway to handle schema changes. After that, we saved our Person entity in the database and successfully retrieved it.

5. Alternatives

5.1. Testcontainers

The latest versions of the embedded Postgres project use Testcontainers under the hood. Therefore, one alternative is to use the Testcontainers library directly. Let’s start by adding the necessary dependencies:

<dependency>
    <groupId>org.testcontainers</groupId>
    <artifactId>postgresql</artifactId>
    <version>1.19.8</version>
    <scope>test</scope>
</dependency>

Now we’ll create the initializer class, where we configure the PostgreSQLContainer for our tests:

public class TestContainersInitializer implements
  ApplicationContextInitializer<ConfigurableApplicationContext>, AfterAllCallback {
    private static final PostgreSQLContainer postgreSQLContainer = new PostgreSQLContainer(
      "postgres:14.1")
      .withDatabaseName("postgres")
      .withUsername("postgres")
      .withPassword("postgres");
    @Override
    public void initialize(ConfigurableApplicationContext applicationContext) {
        postgreSQLContainer.start();
        TestPropertyValues.of(
          "spring.datasource.url=" + postgreSQLContainer.getJdbcUrl(),
          "spring.datasource.username=" + postgreSQLContainer.getUsername(),
          "spring.datasource.password=" + postgreSQLContainer.getPassword()
        ).applyTo(applicationContext.getEnvironment());
    }
    @Override
    public void afterAll(ExtensionContext context) throws Exception {
        if (postgreSQLContainer == null) {
            return;
        }
        postgreSQLContainer.close();
    }
}

We’ve created the PostgreSQLContainer instance and implemented the ApplicationContextInitializer interface to set the configuration properties for our test context. Additionally, we’ve implemented the AfterAllCallback to close the Postgres container connection after the tests. Now, let’s create the test class:

@DataJpaTest
@ExtendWith(TestContainersInitializer.class)
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@ContextConfiguration(initializers = TestContainersInitializer.class)
public class TestContainersPostgresIntegrationTest {
    @Autowired
    private PersonRepository repository;
    @Test
    void givenTestcontainersPostgres_whenSavePerson_thenSavedEntityShouldBeReturnedWithExpectedFields() {
        Person person = new Person();
        person.setName("New user");
        Person savedPerson = repository.save(person);
        assertNotNull(savedPerson.getId());
        assertEquals(person.getName(), savedPerson.getName());
    }
}

Here, we’ve extended the tests by using our TestContainersInitializer and specified the initializer for the test configuration with the @ContextConfiguration annotation. We’ve created the same test cases as in the previous section and successfully saved our Person entity in the Postgres database running in a test container.

5.2. Zonky Embedded Database

Zonky Embedded Database was created as a fork of Embedded Postgres and continues to support running a test database without Docker. Let’s add the dependencies that we need to use this library:

<dependency>
    <groupId>io.zonky.test</groupId>
    <artifactId>embedded-postgres</artifactId>
    <version>2.0.7</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>io.zonky.test</groupId>
    <artifactId>embedded-database-spring-test</artifactId>
    <version>2.5.1</version>
    <scope>test</scope>
</dependency>

After that, we’re able to write the test class:

@DataJpaTest
@AutoConfigureEmbeddedDatabase(provider = ZONKY)
public class ZonkyEmbeddedPostgresIntegrationTest {
    @Autowired
    private PersonRepository repository;
    @Test
    void givenZonkyEmbeddedPostgres_whenSavePerson_thenSavedEntityShouldBeReturnedWithExpectedFields(){
        Person person = new Person();
        person.setName("New user");
        Person savedPerson = repository.save(person);
        assertNotNull(savedPerson.getId());
        assertEquals(person.getName(), savedPerson.getName());
    }
}

Here, we’ve specified the @AutoConfigureEmbeddedDatabase annotation using the ZONKY provider, enabling us to use the embedded Postgres database without Docker. This library also supports other providers such as Embedded and Docker. Finally, we’ve successfully saved our Person entity in the database.
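
For instance, switching to a Docker-backed database should only require changing the provider. Here’s a minimal sketch, assuming the library’s DOCKER provider constant and reusing the same repository test:

@DataJpaTest
@AutoConfigureEmbeddedDatabase(provider = DOCKER)
public class DockerEmbeddedPostgresIntegrationTest {
    // same @Autowired repository and test as in the ZONKY example above
}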

6. Conclusion

In this article, we’ve explored how to use the Embedded Postgres database for testing purposes and reviewed some alternatives. There are various ways to incorporate the Postgres database in tests, both with and without Docker containers. The best choice depends on your specific use case.

As usual, the full source code can be found over on GitHub.

       

Converting java.sql.Timestamp to java.util.Calendar


1. Introduction

In this tutorial, we’ll learn how to convert a java.sql.Timestamp object to a java.util.Calendar object.

First, we’ll see how Java’s two classes, Timestamp and Calendar, handle time. We’ll then explore the common use-cases for performing the conversion. Finally, we’ll examine how to convert from one object to the other.

2. Time in Java

In general, we deal with time expressed in milliseconds, where one millisecond equals a thousandth of a second. If we need more precision, nanoseconds (billionths of a second) are also commonly used.

The java.sql.Timestamp class is a subclass of java.util.Date that adds an integer representation of nanoseconds. It’s a class that is intended to be used with timestamp data types coming from SQL databases. Time is expressed as the number of milliseconds since January 1, 1970, 00:00:00 GMT, with the aforementioned fractional seconds field added to increase precision.

On the other hand, the java.util.Calendar class allows us to extract day, month, or year information from a Timestamp. We can also use it to work with dates in the future, or extend it with our own Calendar implementation. Calendar also doesn’t handle precision finer than milliseconds, which we’ll explore later.
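
For instance, here’s a quick sketch of pulling calendar fields from a point in time (the millisecond value is an arbitrary example):

Calendar calendar = Calendar.getInstance();
calendar.setTimeInMillis(1713544200801L);
int year = calendar.get(Calendar.YEAR);
int month = calendar.get(Calendar.MONTH); // zero-based: January is 0
int day = calendar.get(Calendar.DAY_OF_MONTH);
int millis = calendar.get(Calendar.MILLISECOND); // the finest field Calendar offers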

3. Converting Time Objects

The prime use case for converting between the two types is when we need to extract a timestamp field from a database. Our application might need to display a value to a user or take action when a timestamp falls within a certain time range. Normally, we’d try to use the newer, more functional java.time classes introduced in Java 8, but with some legacy APIs that isn’t always possible.

3.1. Timestamp to Calendar

In either case, once we have a Timestamp object, we can convert it to a Calendar:

Timestamp timestamp = new Timestamp(1713544200801L);
Calendar calendar = Calendar.getInstance();
calendar.setTimeInMillis(timestamp.getTime());

As we see above, we can set the Calendar‘s time using the Timestamp‘s getTime() method. We’ve created a Timestamp from a sample millisecond value for simplicity’s sake. To ensure that we’ve done the proper conversion, we can even test it:

assertEquals(calendar.getTimeInMillis(), timestamp.getTime());

Once we’ve verified that our conversion is correct, we can continue to use our Calendar as needed.
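
If we convert in several places, we might wrap both directions in a small utility, such as the SqlTimestampToCalendarConverter used by the test in the next section. Here’s a minimal sketch of what it could look like – the class name comes from that test, while the method bodies are our assumption:

public class SqlTimestampToCalendarConverter {

    public static Calendar timestampToCalendar(Timestamp timestamp) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTimeInMillis(timestamp.getTime());
        return calendar;
    }

    public static Timestamp calendarToTimestamp(Calendar calendar) {
        return new Timestamp(calendar.getTimeInMillis());
    }
}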

3.2. Calendar to Timestamp

Our test ensures the milliseconds for both objects match. However, it’s important to note that Timestamp is precise to the nanosecond, and when we convert the Calendar back to a Timestamp we lose that precision:

int nanos = 801789562;
int losslessNanos = 801000000;
Timestamp timestamp = new Timestamp(1713544200801L);
timestamp.setNanos(nanos);
assertEquals(nanos, timestamp.getNanos());
Calendar calendar = SqlTimestampToCalendarConverter.timestampToCalendar(timestamp);
timestamp = SqlTimestampToCalendarConverter.calendarToTimestamp(calendar);
assertEquals(losslessNanos, timestamp.getNanos());

Here, when calling the getTime() method of the Timestamp to convert it to milliseconds, its nanoseconds field is internally divided by 1,000,000 using integer division. As a result, we end up with 801 milliseconds, losing the fractional precision of the original 801789562-nanosecond value.

So, we should use another solution if our task requires nanosecond precision.

4. Conclusion

In this article, we learned that Timestamp and Calendar both handle time in terms of milliseconds since epoch time. In addition, we discovered that the Timestamp class tracks nanoseconds, which might be useful based on our requirements.

By exploring the usage of these objects, we learned how to convert a Timestamp to a Calendar, which is useful when interacting with specific calendar fields, like day or month.

As always, the full source code of our examples is available over on GitHub.

       

UDP Messaging Using Aeron


1. Introduction

In this article, we’re going to look at Aeron, a multi-language library maintained by Adaptive Financial Consulting for efficient UDP messaging between applications. It’s designed for performance, aiming for high throughput, low latency, and fault tolerance.

2. Dependencies

Before we can use Aeron, we need to include the latest version in our build, which is 1.44.1 at the time of writing.

If we’re using Maven, we can include its dependency in pom.xml:

<dependency>
    <groupId>io.aeron</groupId>
    <artifactId>aeron-all</artifactId>
    <version>1.44.1</version>
</dependency>

Or if we’re using Gradle, we can include it in build.gradle:

implementation("io.aeron:aeron-all:1.44.1")

At this point, we’re ready to start using it in our application.

Note that, at present, some parts of Aeron don’t work out of the box with Java 16 or newer. This is due to specific interactions that JPMS blocks.

3. Media Driver

Aeron works with a level of indirection between the application and the transport. This is known as the Media Driver because it’s the interaction between our application and the transmission media.

Every Aeron process interacts with a media driver and, through that, can interact with other processes – either on the same machine or remotely. This interaction happens via the file system: we need to point the media driver and all applications at the same directory on disk, where the driver keeps its internal files. Note that we can only have a single media driver running against any given directory at a time. Attempting to run more than one will fail.

We’re able to run the media driver embedded within our application when we want to keep things simple:

MediaDriver mediaDriver = MediaDriver.launch();

This will launch a media driver with all of the default settings. In particular, this will run with the default media driver directory.

We also have an alternative launch method that’s designed for embedded use. This acts exactly as before, only it generates a random directory to ensure that multiple instances on the same machine won’t clash:

MediaDriver mediaDriver = MediaDriver.launchEmbedded();

In both of these cases, we can also provide a MediaDriver.Context object to further configure the media driver:

MediaDriver.Context context = new MediaDriver.Context();
context.threadingMode(ThreadingMode.SHARED);
MediaDriver mediaDriver = MediaDriver.launch(context);

When doing this, we need to close the media driver when we’ve finished with it. The interface implements AutoCloseable, so we can use the try-with-resources pattern to manage this.
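
For example, here’s a minimal sketch that relies on try-with-resources to guarantee the driver is shut down, even if an exception occurs:

try (MediaDriver mediaDriver = MediaDriver.launchEmbedded()) {
    // connect Aeron clients against mediaDriver.aeronDirectoryName() and do our messaging here
    System.out.println("Media driver directory: " + mediaDriver.aeronDirectoryName());
}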

Alternatively, we can run the media driver as an external application. We can do this using the aeron-all.jar JAR file that we’ve included as our dependency:

$ java -cp aeron-all-1.44.1.jar io.aeron.driver.MediaDriver

This will function precisely the same as MediaDriver.launch() above.

4. Aeron API Client

We perform all API interactions using Aeron via the Aeron class. We need to create a new instance of this and point it at our media driver. Simply creating a new instance will point at the media driver in the default location – precisely as if we’d launched it with MediaDriver.launch():

Aeron aeron = Aeron.connect();

Alternatively, we can provide an Aeron.Context object to configure the connection, including specifying the directory that the media driver is running in:

Aeron.Context ctx = new Aeron.Context();
ctx.aeronDirectoryName(mediaDriver.aeronDirectoryName());
Aeron aeron = Aeron.connect(ctx);

We must do this if our media driver is in a non-standard directory, including when we started it with MediaDriver.launchEmbedded(). If the directory that we’re pointing at doesn’t have a running media driver, the Aeron.connect() call will block until it does.

We can connect as many Aeron clients as we need to the same media driver. Typically, these would be from different applications, but they can be from the same one if needed. However, if we do this, then we need to use new instances of Aeron.Context as well:

Aeron.Context ctx1 = new Aeron.Context();
ctx1.aeronDirectoryName(mediaDriver.aeronDirectoryName());
aeron1 = Aeron.connect(ctx1);
System.out.println("Aeron 1 connected: " + aeron1);
Aeron.Context ctx2 = new Aeron.Context();
ctx2.aeronDirectoryName(mediaDriver.aeronDirectoryName());
aeron2 = Aeron.connect(ctx2);
System.out.println("Aeron 2 connected: " + aeron2);

As with the MediaDriver, the Aeron instance is AutoCloseable. This means we can wrap it with the try-with-resources pattern to ensure that we close it correctly.

5. Sending and Receiving Messages

Now that we’ve got our Aeron API client, we’re ready to use it to send and receive messages.

5.1. Buffers

Aeron represents all messages – both sending and receiving – as DirectBuffer instances. Ultimately, these are nothing more than a set of bytes, but they provide us with a set of methods to work with a standard set of types.

When we’re sending messages, we need to construct the buffer ourselves from our own data. For this, we’re best off using an UnsafeBuffer instance – named because it uses sun.misc.Unsafe to read and write the values from our underlying buffer. Creating this requires either a byte array or a ByteBuffer instance, and we can use BufferUtil.allocateDirectAligned() to allocate the underlying memory efficiently:

UnsafeBuffer buffer = new UnsafeBuffer(BufferUtil.allocateDirectAligned(256, 64));

Once we’ve got our buffer, we then have a whole range of getXyz() and putXyz() methods that we can use to manipulate the data in our buffer:

// Put a string into the buffer starting at index 0.
int length = buffer.putStringWithoutLengthUtf8(0, message); 
// Read a string of the given length from the buffer starting from the given offset.
String message = buffer.getStringWithoutLengthUtf8(offset, length); 

Note that we need to manage the offsets in the buffer ourselves. Whenever we put data into the buffer, it returns the length of the written data so we can calculate the next offset. When we read from the buffer, we need to know what the length will be.
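
For instance, here’s a small sketch of writing two values back-to-back while tracking the offset ourselves, so that we know where each value starts and how long it is:

UnsafeBuffer buffer = new UnsafeBuffer(BufferUtil.allocateDirectAligned(256, 64));
int firstLength = buffer.putStringWithoutLengthUtf8(0, "first");
int secondLength = buffer.putStringWithoutLengthUtf8(firstLength, "second");
// "first" occupies [0, firstLength), "second" occupies [firstLength, firstLength + secondLength)
String second = buffer.getStringWithoutLengthUtf8(firstLength, secondLength);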

5.2. Channels and Streams

Sending and receiving data with Aeron is done using identified streams transmitted over specific channels.

We specify channels as URIs in a particular format, telling Aeron how to transmit the messages. Our media driver then uses this to interact with our transmission media, ensuring that it sends and receives the messages correctly. Streams are identified simply as numbers. The only requirement is that the two ends of the same communication use the same stream ID.

The simplest such channel is aeron:ipc, which transmits and receives using shared memory within the media driver. Note that this can only work if both sides use the same media driver and don’t allow for networking.

More usefully, we can use aeron:udp to send and receive using UDP. This allows us to communicate with any other application anywhere we can connect. In particular, our application will communicate with the media driver, and then the media drivers will communicate with each other.

When specifying a UDP channel, we need to include at least the host and port. On the receiving side, this is where we’ll be listening, and on the sending side, this is where we’ll be sending messages. For example, aeron:udp?endpoint=localhost:20121 will send and receive messages via UDP on localhost:20121.

5.3. Subscriptions

Once our media driver and Aeron client are set up, we’re ready to receive messages. We do this by creating a subscription to a particular stream on a particular channel and then polling this for messages.

Adding a subscription is enough for the media driver to set up everything to be able to receive our messages. We do this with the addSubscription() method on our Aeron instance:

Subscription subscription = aeron.addSubscription("aeron:udp?endpoint=localhost:20121", 1001);

As before, we need to close this when we no longer use it so the media driver knows to stop listening for messages. As always, this is AutoCloseable, so we can use try-with-resources to manage it.

When we have our subscription, we need to receive messages. Aeron performs this with a polling mechanism, giving us complete control over when it processes messages. To poll for messages, we need to provide a FragmentHandler that will process the message received. We can implement this with a lambda if we want to have all of the code inline or as a separate class implementing the interface if we want to reuse it:

FragmentHandler fragmentHandler = (buffer, offset, length, header) -> {
    String data = buffer.getStringWithoutLengthUtf8(offset, length);
    System.out.printf("Message from session %d (%d@%d) <<%s>>%n",
            header.sessionId(), length, offset, data);
};

Aeron calls this with a buffer, the offset at which the data starts, and the length of the data received. We can then process this buffer however we need for our application.

When we’re ready to poll for a new message, we use the Subscription.poll() method:

int fragmentsRead = subscription.poll(fragmentHandler, 10);

Here, we’ve provided our FragmentHandler instance and the number of message fragments to consider when trying to receive a single message. Note that we’ll receive up to one message at a time, even if many are available in the media driver. However, if no messages are available, this will immediately return, and if the messages received are too large, we might receive only part of them.
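
In practice, a receiving application typically polls in a loop, since poll() returns immediately when nothing is available. Here’s a minimal sketch – the running flag is our own, hypothetical shutdown signal:

AtomicBoolean running = new AtomicBoolean(true);
while (running.get()) {
    int fragmentsRead = subscription.poll(fragmentHandler, 10);
    if (fragmentsRead == 0) {
        Thread.yield(); // nothing received: back off briefly before polling again
    }
}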

5.4. Publications

The other side of our messaging is sending messages. We do this with a Publication, which can send messages to a particular stream on a particular channel.

We can add a new publication with the Aeron.addPublication() method. We then need to wait for it to connect, which requires that a subscription is on the receiving end ready to receive the messages:

ConcurrentPublication publication = aeron.addPublication("aeron:udp?endpoint=localhost:20121", 1001);
while (!publication.isConnected()) {
    TimeUnit.MILLISECONDS.sleep(100);
}

If there’s no connection, it will immediately fail to send the messages rather than waiting for someone to add a subscription.

As before, we need to close this when we’re no longer using it so that the media driver can free up any allocated resources. As always, this is AutoCloseable, so we can use try-with-resources to manage it.

Once we’ve got a connected publication, we can offer it messages. These are always provided as populated buffers, which will then be sent to the connected subscriber:

UnsafeBuffer buffer = new UnsafeBuffer(BufferUtil.allocateDirectAligned(256, 64));
buffer.putStringWithoutLengthUtf8(0, message);
long result = publication.offer(buffer, 0, message.length());

If the message was sent, we’ll be returned a value indicating the number of bytes transmitted, which might be smaller than the number of bytes we expected to send if the buffer was too large. Alternatively, it might return one of a set of error codes to us, all of which are negative numbers and, therefore, easily distinguishable from the success case:

  • Publication.NOT_CONNECTED – The publication wasn’t connected to a subscriber.
  • Publication.BACK_PRESSURED – Back pressure from the subscribers means that we can’t send any more messages right now.
  • Publication.ADMIN_ACTION – Some administrative actions, such as log rotation, caused the send to fail. In this case, it’s typically safe to immediately retry.
  • Publication.CLOSED – The Publication instance has been closed.
  • Publication.MAX_POSITION_EXCEEDED – The buffer within the media driver is full. Typically, we can solve this by closing the Publication and creating a new one instead.
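
As a rough sketch, a sender might retry the transient conditions and treat the rest as fatal – this is our own simplistic policy, not something Aeron mandates. Here, buffer is the populated buffer from the previous snippet and length is the number of bytes we wrote into it:

long result;
do {
    result = publication.offer(buffer, 0, length);
    if (result == Publication.BACK_PRESSURED || result == Publication.ADMIN_ACTION) {
        Thread.yield(); // transient condition: back off briefly and retry
    } else if (result < 0) {
        throw new IllegalStateException("Failed to offer message, code: " + result);
    }
} while (result < 0);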

6. Conclusion

We’ve seen a quick overview of Aeron, how to set it up, and how to use it for messaging between applications. This library can do much more, so why not try it out and see?

All of the examples are available over on GitHub.

       

Using Enum in Spring Data JPA Queries


1. Overview

When building our persistence layer with Spring Data JPA, we often work with entities with enum fields. These enum fields represent a fixed set of constants, such as the status of an order, the role of a user, or the stage of an article in a publishing system.

Querying entities based on their enum fields is a common requirement, and Spring Data JPA provides several ways to accomplish this.

In this tutorial, we’ll explore how we can query enum fields declared in our entity classes using standard JPA methods and native queries.

2. Application Setup

2.1. Data Model

First, let’s define our data model, including an enum field. The central entity in our example is the Article class, which declares an enum field ArticleStage to represent the different stages an article can be in:

public enum ArticleStage {
    TODO, IN_PROGRESS, PUBLISHED;
}

The ArticleStage enum holds three possible stages, representing the lifecycle of an article from its initial creation to its final published state.

Next, let’s create the Article entity class with the ArticleStage enum field:

@Entity
@Table(name = "articles")
public class Article {
    @Id
    private UUID id;
    private String title;
    private String author;
    @Enumerated(EnumType.STRING)
    private ArticleStage stage;
    // standard constructors, getters and setters
}

We map our Article entity class to the articles database table. Additionally, we use the @Enumerated annotation to specify that the stage field should be persisted as a string in the database.

2.2. Repository Layer

With our data model defined, we can now create a repository interface that extends JpaRepository to interact with our database:

@Repository
public interface ArticleRepository extends JpaRepository<Article, UUID> {
}

In the upcoming sections, we’ll be adding query methods to this interface to explore different ways of querying our Article entity by its enum field.

3. Standard JPA Query Methods

Spring Data JPA allows us to define derived query methods in our repository interfaces using method names. This approach works perfectly for simple queries.

Let’s examine how this can be used to query the enum field in our entity class.

3.1. Querying by a Single Enum Value

We can find articles by a single ArticleStage enum value by defining a method in our ArticleRepository interface:

List<Article> findByStage(ArticleStage stage);

Spring Data JPA will generate the appropriate SQL query based on the method name.

We can also combine the stage parameter with other fields to create more specific queries. For example, we can declare a method to find an article by its title and stage:

Article findByTitleAndStage(String title, ArticleStage stage);

We’ll use Instancio to generate test Article data and test these queries:

Article article = Instancio.create(Article.class);
articleRepository.save(article);
List<Article> retrievedArticles = articleRepository.findByStage(article.getStage());
assertThat(retrievedArticles).element(0).usingRecursiveComparison().isEqualTo(article);

Article article = Instancio.create(Article.class);
articleRepository.save(article);
Article retrievedArticle = articleRepository.findByTitleAndStage(article.getTitle(), article.getStage());
assertThat(retrievedArticle).usingRecursiveComparison().isEqualTo(article);

3.2. Querying by Multiple Enum Values

We can also find articles by multiple ArticleStage enum values:

List<Article> findByStageIn(List<ArticleStage> stages);

Spring Data JPA will generate an SQL query that uses the IN clause to find articles whose stage matches any of the provided values.

To verify that our declared method works as expected, let’s test it:

List<Article> articles = Instancio.of(Article.class).stream().limit(100).toList();
articleRepository.saveAll(articles);
List<ArticleStage> stagesToQuery = List.of(ArticleStage.TODO, ArticleStage.IN_PROGRESS);
List<Article> retrievedArticles = articleRepository.findByStageIn(stagesToQuery);
assertThat(retrievedArticles)
  .isNotEmpty()
  .extracting(Article::getStage)
  .doesNotContain(ArticleStage.PUBLISHED)
  .hasSameElementsAs(stagesToQuery);

4. Native Queries

In addition to the standard JPA methods we explored in the previous section, Spring Data JPA also supports native SQL queries. Native queries are useful for executing complex SQL queries and allow us to invoke database-specific functions.

Moreover, we can use SpEL (Spring Expression Language) with the @Query annotation to construct dynamic queries based on method parameters.

Let’s see how we can use native queries with SpEL to query our entity class Article by its ArticleStage enum value.

4.1. Querying by a Single Enum Value

To query article records by a single enum value using a native query, we can define a method in our ArticleRepository interface and annotate it with the @Query annotation:

@Query(nativeQuery = true, value = "SELECT * FROM articles WHERE stage = :#{#stage?.name()}")
List<Article> getByStage(@Param("stage") ArticleStage stage);

We set the nativeQuery attribute to true to indicate that we’re using a native SQL query instead of the default JPQL definition.

We use a SpEL expression :#{#stage?.name()} in the query to refer to the enum value passed to the method parameter. The ?. safe navigation operator in the expression handles null input gracefully.

Let’s verify that our native query method works as expected:

Article article = Instancio.create(Article.class);
articleRepository.save(article);
List<Article> retrievedArticles = articleRepository.getByStage(article.getStage());
assertThat(retrievedArticles).element(0).usingRecursiveComparison().isEqualTo(article);

4.2. Querying by Multiple Enum Values

To query article records by multiple enum values using a native query, we can define another method in our ArticleRepository interface:

@Query(nativeQuery = true, value = "SELECT * FROM articles WHERE stage IN (:#{#stages.![name()]})")
List<Article> getByStageIn(@Param("stages") List<ArticleStage> stages);

Here, we use the IN clause in our SQL query to fetch articles whose stage matches any of the provided values.

The SpEL expression #stages.![name()] transforms the list of enum values into a list of strings representing their names.

Let’s see the behavior of this method:

List<Article> articles = Instancio.of(Article.class).stream().limit(100).toList();
articleRepository.saveAll(articles);
List<ArticleStage> stagesToQuery = List.of(ArticleStage.TODO, ArticleStage.IN_PROGRESS);
List<Article> retrievedArticles = articleRepository.getByStageIn(stagesToQuery);
assertThat(retrievedArticles)
  .isNotEmpty()
  .extracting(Article::getStage)
  .doesNotContain(ArticleStage.PUBLISHED)
  .hasSameElementsAs(stagesToQuery);

5. Conclusion

In this article, we explored how to query enum fields in our entity classes using Spring Data JPA. We’ve looked at both standard JPA methods and native queries with SpEL to achieve this.

We’ve learned how to query entities using both single and multiple enum values. The standard JPA methods provide a clean and straightforward way to query enum fields, while native queries offer more control and flexibility to execute complex SQL queries.

As always, all the code examples used in this article are available over on GitHub.

       

Injecting a Mock as a Spring Bean in a Spock Spring Test


1. Introduction

When we test Spring applications using Spock, we sometimes want to change the behavior of a Spring-managed component. In this tutorial, we’ll learn how to inject our own Stub, Mock, or Spy in place of a Spring auto-wired dependency. We’ll use a Spock Stub for most examples, but the same techniques apply when we use a Mock or Spy.

2. Setup

Let’s begin by adding our dependencies and creating a class with a dependency we can replace.

2.1. Dependencies

First, let’s add our Maven compile dependency for Spring Boot 3:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
    <version>3.3.0</version>
</dependency>

Now, let’s add our Maven test dependencies for spring-boot-starter-test and spock-spring. Since we’re using Spring Boot 3 / Spring 6, we need Spock v2.4-M1 or later to get Spock’s compatible Spring annotations, so let’s use 2.4-M4-groovy-4.0:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <version>3.3.0</version>
    <scope>test</scope>
</dependency>
<dependency>
    <groupId>org.spockframework</groupId>
    <artifactId>spock-spring</artifactId>
    <version>2.4-M4-groovy-4.0</version>
    <scope>test</scope>
</dependency>

2.2. Our Subject

With our dependencies in place, let’s create an AccountService class with a Spring-managed DataProvider dependency that we can use for our tests.

First, let’s create our AccountService:

@Service
public class AccountService {
    private final DataProvider provider;
    public AccountService(DataProvider provider) {
        this.provider = provider;
    }
    public String getData(String param) {
        return "Fetched: " + provider.fetchData(param);
    }
}

Now, let’s create a DataProvider that we’ll substitute later:

@Component
public class DataProvider {
    public String fetchData(final String input) {
        return "data for " + input;
    }
}

3. Basic Test Class

Now, let’s create a basic test class that validates our AccountService using its usual components.

We’ll use Spring’s @ContextConfiguration to bring our AccountService and DataProvider into scope and autowire in our two classes:

@ContextConfiguration(classes = [AccountService, DataProvider])
class AccountServiceTest extends Specification {
    @Autowired
    DataProvider dataProvider
    @Autowired
    @Subject
    AccountService accountService
    def "given a real data provider when we use the real bean then we get our usual response"() {
        when: "we fetch our data"
        def result = accountService.getData("Something")
        then: "our real dataProvider responds"
        result == "Fetched: data for Something"
    }
}

4. Using Spock’s Spring Annotations

Now that we have our basic test let’s explore the options for stubbing our subject’s dependencies.

4.1. @StubBeans Annotation

We often write tests where we want a dependency stubbed out and don’t care about customizing its response. We can use Spock’s @StubBeans annotation to create Stubs for each dependency in our class.

So, let’s create a test Specification annotated with Spock’s @StubBeans annotation to stub our DataProvider class:

@StubBeans(DataProvider)
@ContextConfiguration(classes = [AccountService, DataProvider])
class AccountServiceStubBeansTest extends Specification {
    @Autowired
    @Subject
    AccountService accountService
    // ...
}

Notice that we don’t need to declare a separate Stub for our DataProvider since the StubBeans annotation creates one for us.

Our generated Stub will return an empty string when our AccountService‘s getData method calls its fetchData method. Let’s create a test to assert that:

def "given a Service with a dependency when we use a @StubBeans annotation then a stub is created and injected to the service"() {
    when: "we fetch our data"
    def result = accountService.getData("Something")
    then: "our StubBeans gave us an empty string response from our DataProvider dependency"
    result == "Fetched: "
}

Our generated DataProvider stub returned an empty String from fetchData, causing our AccountService‘s getData to return “Fetched: ” with nothing appended.

4.2. @SpringBean Annotation

When we need to customize responses, we create a Stub, Mock, or Spy. So let’s use Spock’s SpringBean annotation inside our test, instead of the StubBeans annotation, to replace our DataProvider with a Spock Stub:

@SpringBean
DataProvider mockProvider = Stub()

Note that we can’t declare our SpringBean as a def or Object; we need to declare a specific type like DataProvider.

Now, let’s create an AccountServiceSpringBeanTest Specification with a test that sets up the stub to return “42” when its fetchData method is called:

@ContextConfiguration(classes = [AccountService, DataProvider])
class AccountServiceSpringBeanTest extends Specification {

    // ...
    def "given a Service with a dependency when we use a @SpringBean annotation then our stub is injected to the service"() {
        given: "a stubbed response"
        mockProvider.fetchData(_ as String) >> "42"

        when: "we fetch our data"
        def result = accountService.getData("Something")

        then: "our SpringBean overrode the original dependency"
        result == "Fetched: 42"
    }
}

The @SpringBean annotation ensures our Stub is injected into the AccountService so that we get our stubbed “42” response. Our @SpringBean-annotated Stub wins even when there’s a real DataProvider in the context.

4.3. @SpringSpy Annotation

Sometimes, we need a Spy to access the real object and modify some of its responses. So let’s use Spock’s SpringSpy annotation to wrap our DataProvider with a Spock Spy:

@SpringSpy
DataProvider mockProvider

First, let’s create a test that verifies our spied object’s fetchData method was invoked and returned the real “data for Something” response:

@ContextConfiguration(classes = [AccountService, DataProvider])
class AccountServiceSpringSpyTest extends Specification {
    @SpringSpy
    DataProvider dataProvider
    @Autowired
    @Subject
    AccountService accountService
    def "given a Service with a dependency when we use @SpringSpy and override a method then the original result is returned"() {
        when: "we fetch our data"
        def result = accountService.getData("Something")
        then: "our SpringSpy was invoked once and allowed the real method to return the result"
        1 * dataProvider.fetchData(_)
        result == "Fetched: data for Something"
    }
}

The @SpringSpy annotation wrapped a Spy around the auto-wired DataProvider and ensured our Spy was injected into the AccountService. Our Spy verified our DataProvider’s fetchData method was invoked without changing its result.

Now let’s add a test where our Spy overrides the result with “spied”:

def "given a Service with a dependency when we use @SpringSpy and override a method then our spy's result is returned"() {
    when: "we fetch our data"
    def result = accountService.getData("Something")
    then: "our SpringSpy was invoked once and overrode the original method"
    1 * dataProvider.fetchData(_) >> "spied"
    result == "Fetched: spied"
}

This time, our injected Spy bean verified our DataProvider‘s fetchData method was invoked and replaced its response with “spied”.

4.4. @SpringBean in a @SpringBootTest

Now that we’ve seen the SpringBean annotation in our @ContextConfiguration test, let’s create another test class but use @SpringBootTest:

@SpringBootTest
class AccountServiceSpringBootTest extends Specification {
    // ...
}

The test in our new class is identical to the one we created in AccountServiceSpringBeanTest, so we won’t repeat it here!

However, our @SpringBootTest test class won’t run unless it has a SpringBootApplication, so let’s create a TestApplication class:

@SpringBootApplication
class TestApplication {
    static void main(String[] args) {
        SpringApplication.run(TestApplication, args)
    }
}

When we run our test, Spring Boot tries to initialize a DataSource. In our case, since we’re only using spring-boot-starter, we don’t have a DataSource, so our test fails to initialize:

Failed to configure a DataSource: 'url' attribute is not specified and no embedded datasource could be configured.

So, let’s exclude DataSource auto-configuration by adding a spring.autoconfigure.exclude property to our SpringBootTest annotation:

@SpringBootTest(properties = ["spring.autoconfigure.exclude=org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration"])

Now, when we run our test, it runs successfully!

4.5. Context Caching

When using the @SpringBean annotation, we should note its impact on the Spring test framework’s context caching. Usually, when we run multiple tests annotated with @SpringBootTest, our Spring context gets cached rather than created each time. This makes our tests quicker to run.

However, Spock’s @SpringBean attaches a mock to a specific test instance, similar to the @MockBean annotation in the spring-boot-test module. This prevents context caching, which can slow down our overall test execution when many of our tests use these annotations.

5. Conclusion

In this tutorial, we learned how to stub our Spring-managed dependencies using Spock’s @StubBeans annotation. Next, we learned how to replace a dependency with a Stub, Mock, or Spy using Spock’s @SpringBean or @SpringSpy annotation. Finally, we noted that excessive use of the @SpringBean annotation can slow down the overall execution of our tests by interfering with the Spring test framework’s context caching. So, we should be judicious in our use of this feature!

As usual, the code for this article is available over on GitHub.

List Private Keys From a Keystore


1. Overview

Managing and securing private keys is a critical aspect of many applications. Java Keystore (JKS) is a popular format for storing cryptographic keys and certificates.

In this tutorial, we’ll explore two methods for listing and exporting private keys from a keystore: one using the command line and another using Java.

2. Using Command Line

First, we use the keytool utility provided by the JDK to list all the entries in the keystore, showing each entry’s alias, type, and certificate details:

keytool -list -keystore mykeystore.jks -storepass mypassword

In the above command, mykeystore.jks is our keystore file name, and mypassword is its password. The output will look something like:

Keystore type: PKCS12
Keystore provider: SUN
Your keystore contains 2 entries
Alias name: privatekey1
Creation date: May 29, 2024
Entry type: PrivateKeyEntry
Certificate chain length: 1
Certificate[1]:
Owner: CN=Example, OU=Development, O=Company, L=City, ST=State, C=Country
...
Alias name: privatekey2
Creation date: May 29, 2024
...

Next, we export the private key we want (-srcalias option) from the JKS file to a PKCS12 (.p12) file:

keytool -importkeystore -srckeystore mykeystore.jks -destkeystore mykeystore.p12 -srcstoretype JKS -deststoretype PKCS12 
  -srcalias privatekey1 -srcstorepass mypassword -deststorepass mypassword

Then we use the openssl command to extract the private key from the PKCS12 keystore:

openssl pkcs12 -in mykeystore.p12 -nocerts -nodes -out privatekey.pem -passin pass:mypassword

This extracts the private key in PEM format. The -nocerts option tells openssl not to output the certificates, and -nodes prevents the private key from being encrypted.

Finally, we convert the PEM private key to PKCS8 format:

openssl pkcs8 -in privatekey.pem -topk8 -nocrypt -out privatekey-pkcs8.pem

The -topk8 option converts the key to PKCS8 format, and the -nocrypt option ensures the key is not encrypted.

The final result will look like:

-----BEGIN PRIVATE KEY-----
MIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQCymad+US28aEBs
hj5nPJyiPotlyafiJSIKwbOu1rHcUYQukDxzRiKgp/j5dzneWhd7BUKDGLUNPL21
...
k7x6oTwzOTJsWsED69ZOC1E=
-----END PRIVATE KEY-----

3. Using Java

We can also use Java to list private keys from a keystore:

try (InputStream is = new FileInputStream("mykeystore.jks")) {
    // Load the keystore
    KeyStore keystore = KeyStore.getInstance(KeyStore.getDefaultType());
    char[] passwordCharArr = "mypassword".toCharArray();
    keystore.load(is, passwordCharArr);
    for (String alias : Collections.list(keystore.aliases())) {
        if (keystore.isKeyEntry(alias)) {
            KeyStore.PrivateKeyEntry pkEntry = (KeyStore.PrivateKeyEntry) keystore.getEntry(
              alias, new KeyStore.PasswordProtection(passwordCharArr));
            PrivateKey privateKey = pkEntry.getPrivateKey();
            System.out.println("Alias: " + alias);
            System.out.println("-----BEGIN PRIVATE KEY-----");
            System.out.println(Base64.getMimeEncoder(64, "\n".getBytes())
              .encodeToString(privateKey.getEncoded()));
            System.out.println("-----END PRIVATE KEY-----");
        }
    }
}

Let’s break down our code steps:

  • Load the keystore from a file.
  • Iterate through all aliases in the keystore.
  • Check if it is a key entry (including private keys).
  • Retrieve and print the private key in PKCS8 format.

Additionally, we use try-with-resources so we don’t have to worry about closing the InputStream manually.

4. Conclusion

Listing private keys from a keystore can be done using both command line tools and Java programs. The command-line approach is straightforward, while Java allows more flexible and programmatic access to keystore contents.

The example code from this article can be found over on GitHub.

       