
Spring Kafka Trusted Packages Feature


1. Introduction

In this tutorial, we’ll review the Spring Kafka trusted packages feature. We’ll see the motivation behind it, along with its usage. All with practical examples, as always.

2. Prerequisite

In general, the Spring Kafka module allows us, as users, to specify some metadata about the POJO we’re sending. It usually takes the form of Kafka message headers. For instance, if we configure the ProducerFactory in this way:

@Bean
public ProducerFactory<Object, SomeData> producerFactory() {
    JsonSerializer<SomeData> jsonSerializer = new JsonSerializer<>();
    jsonSerializer.setAddTypeInfo(true);
    return new DefaultKafkaProducerFactory<>(
      producerFactoryConfig(),
      new StringOrBytesSerializer(),
      jsonSerializer
    );
}
@Data
@AllArgsConstructor
static class SomeData {
    private String id;
    private String type;
    private String status;
    private Instant timestamp;
}
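
The producerFactoryConfig() helper referenced above isn’t shown in the snippet; a minimal sketch of what we assume it returns (just the mandatory broker address, using Kafka’s ProducerConfig keys):

private Map<String, Object> producerFactoryConfig() {
    // hypothetical helper: only the broker address is configured here
    Map<String, Object> props = new HashMap<>();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    return props;
}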

Then we can produce a new message into a topic using a KafkaTemplate configured with the producerFactory above:

public void sendDataIntoKafka() {
    SomeData someData = new SomeData("1", "active", "sent", Instant.now());
    kafkaTemplate.send(new ProducerRecord<>("sourceTopic", null, someData));
}

Then, in this case, we’ll get the following message in the console of the Kafka consumer:

CreateTime:1701021806470 __TypeId__:com.baeldung.example.SomeData null {"id":"1","type":"active","status":"sent","timestamp":1701021806.153965150}

As we can see, the type information of the POJO inside the message is placed in the headers. This is a Spring Kafka feature that only Spring recognizes; from the point of view of Kafka or any other framework, these headers are just metadata. Therefore, we can assume here that both the consumer and the producer use Spring to handle Kafka messaging.

3. Trusted Packages Feature

Having said that, in some cases this is quite a useful feature. When messages in a topic can have different payload schemas, hinting at the payload type helps the consumer deserialize them correctly.

However, we generally know which message schemas can occur in a topic. So it’s a good idea to restrict the payload schemas a consumer will accept, and that is exactly what the Spring Kafka trusted packages feature is about.

4. Usages Samples

The Spring Kafka trusted packages feature is configured at the deserializer level. If trusted packages are configured, Spring looks up the type headers of the incoming message. Then, it checks that all of the types provided in the message, for both the key and the value, are trusted.

It essentially means that the Java classes of the key and value, specified in the corresponding headers, must reside inside the trusted packages. If everything is OK, Spring passes the message on to further deserialization. If the headers are not present, Spring simply deserializes the object and doesn’t check the trusted packages:

@Bean
public ConsumerFactory<String, SomeData> someDataConsumerFactory() {
    JsonDeserializer<SomeData> payloadJsonDeserializer = new JsonDeserializer<>();
    payloadJsonDeserializer.addTrustedPackages("com.baeldung.example");
    return new DefaultKafkaConsumerFactory<>(
      consumerConfigs(),
      new StringDeserializer(),
      payloadJsonDeserializer
    );
}
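
For completeness, this consumer factory is typically wired into a listener container factory so that @KafkaListener methods pick it up; a minimal sketch (the bean name is ours):

@Bean
public ConcurrentKafkaListenerContainerFactory<String, SomeData> someDataListenerContainerFactory() {
    ConcurrentKafkaListenerContainerFactory<String, SomeData> factory =
      new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(someDataConsumerFactory());
    return factory;
}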

It is also worth mentioning that Spring trusts all packages if we substitute the concrete packages with an asterisk (*):

JsonDeserializer<SomeData> payloadJsonDeserializer = new JsonDeserializer<>();
payloadJsonDeserializer.trustedPackages("*");

However, in such cases, the trusted packages check effectively does nothing and only incurs additional overhead. Let’s now jump into the motivation behind the feature we just saw.

5. Motivation

5.1. First Motivation: Consistency

This feature is great for two major reasons. First, we can fail fast if something goes wrong in the cluster. Imagine that a particular producer accidentally publishes messages into a topic it isn’t supposed to publish to. It can cause a lot of problems, especially if we succeed at deserializing the incoming message. In this case, the whole system behavior can be undefined.

So if the producer publishes messages with type information included and the consumer knows what types it trusts, then this all can be avoided. This, of course, assumes that the producer’s message type is different from the one that the consumer expects. But this assumption is pretty fair since this producer shouldn’t publish messages into this topic at all.

5.2. Second Motivation: Security

But what is most important is the security concern. In our previous example, we emphasized that the producer has published messages into the topic unintentionally. But that could be an intentional attack as well. The malicious producer might intentionally publish a message into a particular topic in order to exploit the deserialization vulnerabilities. So by preventing the deserialization of unwanted messages, Spring provides additional security measures to reduce security risks.

What is really important to understand here is that the trusted packages feature is not a solution for a “headers spoofing” attack. In such an attack, the attacker manipulates the headers of a message to deceive the recipient into believing that the message is legitimate and originated from a trusted source. By providing the correct type headers, the attacker may deceive Spring, and the latter will proceed with message deserialization. But this problem is quite complex and is not the topic of this discussion. In general, Spring merely provides an additional security measure to minimize the risk of a successful attack.

6. Conclusion

In this article, we explored the Spring Kafka trusted packages feature. It adds consistency and security to our distributed messaging system. It is critical to keep in mind that trusted packages remain vulnerable to header spoofing; within that limitation, Spring Kafka still provides a worthwhile additional security measure.

As always, the source code for this article is available over on GitHub.

       

Understanding NewSQL Databases


1. Overview

Databases are one of the ways we store collections of data, and there are different types of databases available. Different DBMSs, such as SQL and NoSQL databases, became popular based on the requirements (performance, consistency, etc.) and the type of data (structured, schemaless, etc.) to be stored. We now have something called NewSQL databases, which combine the best of both SQL and NoSQL databases.

In this tutorial, we’ll take a look at SQL and NoSQL databases and then understand what NewSQL databases are all about.

2. SQL and NoSQL Databases

For decades, traditional SQL databases have served as the foundation of data storage and retrieval. SQL Databases provide robust ACID compliance, guaranteeing dependability, consistency, and data integrity. Some of the popular use cases include OLTP and OLAP applications and Data Warehousing applications.

On the other hand, database requirements changed along with the digital landscape. With the rise of the internet came an abundance of data, as multiple sources produced huge amounts of data instantly. Despite their consistency and dependability, traditional SQL databases found it difficult to handle the demands of these fast-moving data streams.

As a result, NoSQL databases emerged as viable alternatives. NoSQL databases put performance, scalability, and flexibility above ACID compliance. They use several data models (document, key-value, column-family, etc.) that enable them to perform well in particular use cases, including networked systems, real-time analytics, and unstructured data storage. Social media applications and document-oriented applications are some of the most common use cases.

While NoSQL databases offered a solution to the scalability problem, they came with trade-offs, most notably relaxed consistency models. In scenarios where strong data consistency and transactional guarantees were essential, NoSQL databases fell short. This led to the need for a new kind of database system which could have the best of both, SQL and NoSQL.

3. NewSQL Database

NewSQL databases try to address the limitations of the existing databases. They’re engineered as a relational database with a distributed and fault-tolerant architecture. They aim to provide solutions by providing a database with the following features:

  • Scalability and ACID compliance: Designed to scale horizontally, the NewSQL databases handle large amounts of data by distributing them across nodes/clusters. Additionally, they maintain strict ACID compliance resulting in a system that is highly available and with strong transactional integrity.
  • Performance optimization: Various techniques, such as in-memory processing, indexing, and caching, are implemented to provide low-latency data access and high performance.
  • Distributed architecture: Multiple nodes are used to replicate data so that there is no single point of failure. Consequently, it ensures high availability and fault tolerance
  • SQL compatibility: NewSQL systems are compatible with SQL query language thus avoiding re-learning and migration overheads

3.1. Use-Cases

NewSQL databases are most suited for applications requiring strong transactional consistency along with high performance and scalability. Let’s take a look at some of the use cases below:

  • Financial systems: A large amount of data needs to be processed with low latency and high accuracy
  • E-commerce platforms: Although the load is relatively stable and expected, periodically there may be a surge of data/load that needs to be supported
  • Real-time feedback systems: Many systems that rely on real-time data analysis, like airline pricing, fraud detection, etc., can benefit from a high-performance transactional system.
  • Smart devices and cities: With increased automation across multiple sectors and use cases, the continuous data stream can be processed efficiently using a NewSQL database.

However, there are some use cases where the NewSQL database may not be suitable:

  • Well-established data models: Applications with well-established data models with manageable load and/or scalability capabilities may not be suitable candidates for migration to NewSQL databases, e.g., legacy applications
  • Predictable loads: Applications with predictable loads that may not need to leverage dynamic scalability
  • Strict ACID compliance: SQL databases better serve applications with non-negotiable ACID properties

3.2. Popular NewSQL Databases

Let’s take a look at some of the popular NewSQL databases:

  • CockroachDB: An open-source distributed database designed to survive different types of failures while still maintaining ACID compliance. It uses distributed architecture and provides strong failover capabilities along with automatic data replication.
  • NuoDB: It uses a patented “elastically scalable databases” architecture to provide NoSQL benefits while retaining ACID compliance.
  • VoltDB: An in-memory database that uses a shared-nothing architecture designed for high-velocity data ingestion.

3.3. Drawbacks

While NewSQL databases address the limitations of SQL and NoSQL databases, they come with their own set of drawbacks:

  • Use case specific: Different NewSQL databases are suitable for different use cases and there is no single solution for all use cases
  • Complex learning curve: Since NewSQL databases tend to be use case specific, each new implementation may require a non-overlapping learning curve
  • Compatibility issues: NewSQL databases may not be always compatible with existing data models and schemas which may lead to considerable migration efforts

4. Conclusion

In this article, we explored the evolution of data storage and retrieval from traditional SQL to NoSQL databases and then finally to NewSQL databases. We also looked at the different use cases for NewSQL databases as well as some of the popular NewSQL databases.

NewSQL databases bridge the gap between SQL and NoSQL databases by combining transactional consistency with scalability and performance. We saw some of the use cases where it may be beneficial to use any of the enumerated NewSQL databases.

       

Verify That Lambda Expression Was Called Using Mockito


1. Overview

In this tutorial, we’ll look at how we can test that our code calls a lambda function. There are two approaches to achieving this goal to consider. We’ll first check that the lambda is invoked with the correct arguments. Then, we’ll look at testing the behavior instead and checking if the lambda code has executed and produced the expected result.

2. Example Class Under Test

To start, let’s create a class LambdaExample that has an ArrayList we’ll call bricksList:

class LambdaExample {
    ArrayList<String> bricksList = new ArrayList<>();
}

Now, let’s add an inner class called BrickLayer, which will be able to add bricks for us:

class LambdaExample {
    BrickLayer brickLayer = new BrickLayer();
    class BrickLayer {
        void layBricks(String bricks) {
            bricksList.add(bricks);
        }
    }
}

BrickLayer doesn’t do much. It has a single method, layBricks(), that will add a brick to our List for us. This could have been an external class, but to keep the concepts together and simple, an inner class works here.

Finally, we can add a method to LambdaExample to call layBricks() via a lambda:

void createWall(String bricks) {
    Runnable build = () -> brickLayer.layBricks(bricks);
    build.run();
}
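
The behavior test later in this article reads the list back through a getter, which we assume LambdaExample also declares (it isn’t shown in the snippets above):

ArrayList<String> getBricksList() {
    return bricksList;
}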

Again, we’ve kept things simple. Our real-world applications are more complex but this streamlined example will help explain the test methods.

In the upcoming sections, we’ll test whether calling createWall() results in the expected execution of layBricks() within our lambda.

3. Testing Correct Invocation

The first testing method we’ll look at is based on confirming that the lambda is called when we expect it. Furthermore, we’ll need to confirm that it received the correct arguments. To start we’ll need to create Mocks of both BrickLayer and LambdaExample:

@Mock
BrickLayer brickLayer;
@InjectMocks
LambdaExample lambdaExample;

We’ve applied the @InjectMocks annotation to LambdaExample so that it uses the mocked BrickLayer object. We’ll be able to confirm the call to the layBricks() method because of this.

We can now write our test:

@Test
void whenCallingALambda_thenTheInvocationCanBeConfirmedWithCorrectArguments() {
    String bricks = "red bricks";
    lambdaExample.createWall(bricks);
    verify(brickLayer).layBricks(bricks);
}

In this test, we’ve defined the String we want to add to bricksList and passed it as an argument to createWall(). Let’s keep in mind that we’re using the Mock we created earlier as the instance of LambdaExample.

We’ve then used Mockito’s verify() method. verify() is hugely helpful for this kind of test. It confirms that layBricks() was called and that the argument was what we expected.

There’s much more we can do with verify(). For example, confirming how many times a method is called. For our purposes, however, it’s sufficient to confirm that our lambda invoked the method as expected.
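
A quick sketch of that invocation-count check, using Mockito’s times():

@Test
void whenCallingALambdaTwice_thenTheInvocationCountCanBeConfirmed() {
    lambdaExample.createWall("red bricks");
    lambdaExample.createWall("red bricks");
    // times(2) asserts that layBricks() was invoked exactly twice with this argument
    verify(brickLayer, times(2)).layBricks("red bricks");
}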

4. Testing Correct Behaviour

The second route we can go down for testing is to not worry about what gets called and when. Instead, we’ll confirm that the expected behavior of the lambda function occurs. There will almost always be a good reason we’re calling a function. Perhaps to perform a calculation or to get or set a variable.

In our example, the lambda adds a given String to an ArrayList. In this section, let’s verify that the lambda successfully executes that task:

@Test
void whenCallingALambda_thenCorrectBehaviourIsPerformed() {
    LambdaExample lambdaExample = new LambdaExample();
    String bricks = "red bricks";
        
    lambdaExample.createWall(bricks);
    ArrayList<String> bricksList = lambdaExample.getBricksList();
        
    assertEquals(bricks, bricksList.get(0));
}

Here, we’ve created an instance of the LambdaExample class. Next, we’ve called createWall() to add a brick to the ArrayList.

We should now see that bricksList contains the String we just added, assuming the code correctly executed the lambda. We confirmed that by retrieving bricksList from lambdaExample and checking its contents.

We can conclude that the lambda is executing as expected, as that’s the only way our String could have ended up in the ArrayList.

5. Conclusion

In this article, we’ve looked at two methods for testing lambda calls. The first is useful when we can mock the class containing the function and inject it into the class which calls it as a lambda. In that case, we can use Mockito to verify the call to the function and the correct arguments. This offers no confidence that the lambda went on to do what we expected, however.

The alternative is to test that the lambda produces the expected results when called. This offers more test coverage and is often preferable if it’s simple to access and confirm the correct behavior of the function call.

As always, the full code for the examples is available over on GitHub.

       

Deserializing JSON to Java Record using Gson


1. Introduction

The deserialization process involves converting a JSON representation of an object (or data) into an equivalent object in a programming language, such as a Java object. Gson, a popular Java library for JSON serialization and deserialization, simplifies this process.

In this tutorial, we’ll explore how to deserialize JSON data into Java records using Gson.

2. Creating a Java Record

Before diving into the code examples, we need to ensure that we have the Gson library added to our project. We can add it as a dependency in our build tool, such as Maven or Gradle. For Maven, we add the following dependency:

<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.9</version>
</dependency>

Let’s start by defining a simple Java record that we’ll use for deserialization. For example, consider a Person record with name, age, and address fields:

public record Person(String name, int age, String address) {
    // No need to explicitly define constructors, getters, or other methods
}

3. Deserializing JSON to Java Record

Now, let’s see how we can use Gson to deserialize JSON data into our Person record. Assume we have the following JSON representation of a person:

{ "name": "John Doe", "age": 30, "address": "123 Main St" }

Let’s use Gson’s fromJson() method to convert this JSON string into a Person record:

@Test
public void givenJsonString_whenDeserialized_thenPersonRecordCreated() {
    String json = "{\"name\":\"John Doe\",\"age\":30,\"address\":\"123 Main St\"}";
    Person person = new Gson().fromJson(json, Person.class);
    assertEquals("John Doe", person.name());
    assertEquals(30, person.age());
    assertEquals("123 Main St", person.address());
}

In this example, the fromJson() method takes the JSON string and the class type (Person.class) to which the JSON should be converted. Subsequently, Gson automatically maps the JSON fields to the corresponding record components.
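
The reverse direction works just as well; here is a small sketch using toJson() on the same record (the assertion style is assumed to match the tests above):

@Test
public void givenPersonRecord_whenSerialized_thenJsonContainsFields() {
    Person person = new Person("John Doe", 30, "123 Main St");
    String json = new Gson().toJson(person);
    assertTrue(json.contains("\"name\":\"John Doe\""));
    assertTrue(json.contains("\"address\":\"123 Main St\""));
}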

4. Handling Nested Objects

What if we have a JSON that includes nested objects? Gson can handle them as well!

Let’s extend our Person record to include a Contact record for the person’s contact information:

public record Contact(String email, String phone) {
    // Constructor, getters, and other methods are automatically generated
}
public record Person(String name, int age, String address, Contact contact) {
    // Constructor, getters, and other methods are automatically generated
}

Now, let’s consider a JSON representation that includes contact information:

{ "name": "John Doe", "age": 30, "address": "123 Main St", "contact": { "email": "john.doe@example.com", "phone": "555-1234" } }

The deserialization code remains almost the same, with Gson handling the nested objects:

@Test
public void givenNestedJsonString_whenDeserialized_thenPersonRecordCreated() {
    String json = "{\"name\":\"John Doe\",\"age\":30,\"address\":\"123 Main St\",\"contact\":{\"email\":\"john.doe@example.com\",\"phone\":\"555-1234\"}}";
    Person person = new Gson().fromJson(json, Person.class);
    assertNotNull(person);
    assertEquals("John Doe", person.name());
    assertEquals(30, person.age());
    assertEquals("123 Main St", person.address());
    Contact contact = person.contact();
    assertNotNull(contact);
    assertEquals("john.doe@example.com", contact.email());
    assertEquals("555-1234", contact.phone());
}

5. Conclusion

In conclusion, the combination of Gson and Java records provides a concise and expressive way to handle JSON deserialization, even with nested structures.

As always, the complete code samples for this article can be found over on GitHub.

       

String vs StringBuffer Comparison in Java


1. Overview

String and StringBuffer are two important classes used while working with strings in Java. In simple words, a string is a sequence of characters. For example, “java”, “spring” and so on.

The main difference between a String and a StringBuffer is that a String is immutable, whereas a StringBuffer is mutable and thread-safe.

In this tutorial, let’s compare String and StringBuffer classes and understand the similarities and differences between the two. 

2. String 

The String class represents character strings. Java implements all string literals, such as “baeldung”, as an instance of this class.

Let’s create a String literal:

String str = "baeldung";

Let’s also create a String object:

char data[] = {'b', 'a', 'e', 'l', 'd', 'u', 'n', 'g'};
String str = new String(data);

We can also do the following:

String str = new String("baeldung");

Strings are constants and immutable, making them shareable.

2.1. String Literal vs. String Object

String literals are immutable strings stored inside a special memory area called the string pool, which lives in the heap memory. Java doesn’t allocate new memory for string literals having the same value. Instead, it uses string interning.

In contrast, the JVM allocates separate memory in the heap, outside the string pool, for a newly created String object.

Thus, each string object refers to a different memory address, even though both may have the same value. Note that a String literal is still a String object. However, the reverse is not true.
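
A minimal sketch of this distinction, assuming JUnit-style assertions:

String fromLiteral = "baeldung";
String fromConstructor = new String("baeldung");

assertEquals(fromLiteral, fromConstructor);  // same character content
assertNotSame(fromLiteral, fromConstructor); // different objects: the constructor allocates outside the pool
assertSame(fromLiteral, "baeldung");         // literals with the same value share one pooled instance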

2.2. String Pool

String literals are stored in a reserved memory area of the Java heap called the String Pool.

2.3. String Interning

String interning is an optimization technique the compiler uses to avoid redundant memory allocation. It avoids allocating memory for a new string literal if a similar value already exists. Instead, it works with the existing copy:

[Diagram: string memory allocation in the string pool]

Common operations on String include concatenation, comparison, and searching. The Java language also provides special support for the string concatenation operator (+) and for the conversion of other objects to strings. It is worth noting that the compiler implements the + operator with builder-style appends under the hood (historically StringBuffer, later StringBuilder, and an invokedynamic-based strategy in recent versions):

String str = "String"; 
str = str.concat("Buffer");
assertThat(str).isEqualTo("StringBuffer");
assertThat(str.indexOf("Buffer")).isEqualTo(6);

3. StringBuffer

A StringBuffer is a sequence of characters, just like a String. However, unlike a String, it’s mutable. We can modify a StringBuffer through method calls such as append() and insert(). The append() method adds a character sequence at the end of the StringBuffer, while insert() inserts a sequence of characters at a specified index. The StringBuffer class overloads both methods to handle any object; the object is converted to its string representation before being appended or inserted into the StringBuffer:

StringBuffer sBuf = new StringBuffer("String");
sBuf.append("Buffer");
assertThat(sBuf).isEqualToIgnoringCase("StringBuffer");
sBuf.insert(0, "String vs ");
assertThat(sBuf).isEqualToIgnoringCase("String vs StringBuffer");

StringBuffer is thread-safe and can work in a multi-threaded environment. Its synchronization ensures the correct ordering of operations and avoids data races.

Java 1.5 introduced StringBuilder as a drop-in, non-synchronized (and therefore faster) replacement for StringBuffer in single-threaded code.
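
The API is the same, so the earlier example works unchanged with StringBuilder; a minimal sketch:

StringBuilder sb = new StringBuilder("String");
sb.append("Builder"); // same append()/insert() API, but without synchronization
assertThat(sb.toString()).isEqualTo("StringBuilder");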

4. Performance Comparison

String and StringBuffer have similar performance. However, string manipulation is faster with StringBuffer than String because String requires the creation of a new object each time, and all changes happen to the new String, leading to more time and memory consumption.

Let’s do a quick micro-benchmark with JMH to compare the concatenation performance of String and StringBuffer:

@BenchmarkMode(Mode.SingleShotTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Measurement(batchSize = 100000, iterations = 10)
@Warmup(batchSize = 100000, iterations = 10)
@State(Scope.Thread)
public class ComparePerformance {
    String strInitial = "springframework";
    String strFinal = "";
    String replacement = "java-";
    @Benchmark
    public String benchmarkStringConcatenation() {
        strFinal = "";
        strFinal += strInitial;
        return strFinal;
    }
    @Benchmark
    public StringBuffer benchmarkStringBufferConcatenation() {
        StringBuffer stringBuffer = new StringBuffer(strFinal);
        stringBuffer.append(strInitial);
        return stringBuffer;
    }
}
Running the benchmark produces results similar to:

Benchmark                                              Mode  Cnt   Score    Error  Units
ComparePerformance.benchmarkStringBufferConcatenation    ss   10  16.047 ± 11.757  ms/op
ComparePerformance.benchmarkStringConcatenation          ss   10   3.492 ±  1.309  ms/op

5. Comparison Table

To summarise the differences:

  • A String is a sequence of characters and is immutable; a StringBuffer is like a String but can be modified, i.e., it’s mutable
  • A String can be shared easily due to its immutability; a StringBuffer can be shared across threads only thanks to its synchronization
  • Modifying a String requires the creation of a new string; modifying a StringBuffer requires a call to certain methods
  • String modification is slow; StringBuffer modification is faster
  • A String uses the string pool for storing its data; a StringBuffer uses heap memory

6. Conclusion

In this article, we compared String and StringBuffer classes. As always, the example code is available over on GitHub.

       

Static Final Variables in Java


1. Overview

Simply put, static final variables, also called constants, are key features in Java to create a class variable that won’t change after initialization. However, in the case of a static final object reference, the state of the object may change.

In this tutorial, we’ll learn how to declare and initialize constant variables. Also, we’ll discuss their usefulness.

2. static final Variables

The static keyword associates a variable to a class itself, not to instances of the class.

Furthermore, the final keyword makes a variable immutable. Its value can’t change after initialization.

The combination of the two keywords helps create a constant. They are mostly named using uppercase and underscores to separate words.

2.1. Initializing static final Variables

Here’s an example of how to declare a static final field and assign a value:

class Bike {
    public static final int TIRE = 2;
}

Here, we create a class named Bike with a constant class variable named TIRE and initialize it to two.

Alternatively, we can initialize the variable via a static initializer block:

public static final int PEDAL;
static {
    PEDAL = 5;
}

This will compile without an error:

@Test
void givenPedalConstantSetByStaticBlock_whenGetPedal_thenReturnFive() {
    assertEquals(5, Bike.PEDAL);
}

Here are some key rules for constant variables:

  • We must initialize upon declaration or in a static initializer block
  • We can’t reassign it after initialization

Attempting to initialize it anywhere else results in a compilation error.

Also, we can’t initialize it via the constructor because constructors are invoked when we create an instance of a class. Static variables belong to the class itself and not to individual instances.

2.2. static final Objects

We can also create static final object references:

public static final HashMap<String, Integer> PART = new HashMap<>();

Since the PART reference is constant, it can’t be reassigned:

PART = new HashMap<>();

The code above doesn’t compile, because we’re trying to assign a new reference to a final variable.

However, we can modify the state of the object:

@Test
void givenPartConstantObject_whenObjectStateChanged_thenCorrect() {
    Bike.PART.put("seat", 1);
    assertEquals(1, Bike.PART.get("seat"));
    Bike.PART.put("seat", 5);
    assertEquals(5, Bike.PART.get("seat"));
}

Here, we can change the value of the seat despite setting it to one initially. We mutate the contents of PART despite being a constant reference. Only the reference itself is immutable.

Notably, the final keyword only makes primitive types, String, and other immutable types constant. In the case of an object, it only makes the reference constant, but the state of the object can be altered.
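
If the object’s state must not change either, one common approach (a sketch of our own, not part of the original example) is to build the map once and expose an unmodifiable view of it:

public static final Map<String, Integer> IMMUTABLE_PART;
static {
    Map<String, Integer> parts = new HashMap<>();
    parts.put("seat", 1);
    // wrap the populated map; IMMUTABLE_PART.put(...) now throws UnsupportedOperationException
    IMMUTABLE_PART = Collections.unmodifiableMap(parts);
}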

3. Why Constants Are Useful

Using static final variables has several advantages. It provides better performance since its values are inlined at compile time instead of a runtime value lookup.

Moreover, declaring reusable values as constant avoids duplicating literals. Constant can be reused anywhere in the code, depending on the access modifier. A constant with a private access modifier will only be usable within the class.

Additionally, a static final variable of primitive or String type is thread-safe. Its value remains unchanged when shared among multiple threads.

Finally, giving semantic names to constant values increases code readability. Also, it makes code self-documenting. For example, the java.lang.Math class provides constants like PI:

@Test
void givenMathClass_whenAccessingPiConstant_thenVerifyPiValueIsCorrect() {
    assertEquals(3.141592653589793, Math.PI);
}

Math.PI encapsulates the mathematical constant value in a reusable way.

4. Conclusion

In this article, we learned how to declare and initialize a constant variable. Also, we highlighted some of its use cases.

A final static variable defines a class-level constant. However, a static final object may still be mutable, even if the reference can’t change.

As always, the complete source code for the examples is available over on GitHub.

       

Differences Between Entities and DTOs


1. Overview

In the realm of software development, there is a clear distinction between entities and DTOs (Data Transfer Objects). Understanding their precise roles and differences can help us build more efficient and maintainable software.

In this article, we’ll explore the differences between entities and DTOs and try to offer a clear understanding of their purpose, and when to employ them in our software projects. While going through each concept, we’ll sketch a trivial application of user management, using Spring Boot and JPA.

2. Entities

Entities are fundamental components that represent real-world objects or concepts within the domain of our application. They often correspond directly to database tables or domain objects. Therefore, their primary purpose is to encapsulate and manage the state and behavior of these objects.

2.1. Entity Example

Let’s create some entities for our project, representing a user that has multiple books. We’ll start by creating the Book entity:

@Entity
@Table(name = "books")
public class Book {
    @Id
    private String name;
    private String author;
    // standard constructors / getters / setters
}

Now, we need to define our User entity:

@Entity
@Table(name = "users")
public class User {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private Long id;
    private String firstName;
    private String lastName;
    private String address;
    @OneToMany(cascade=CascadeType.ALL)
    private List<Book> books;
    
    public String getNameOfMostOwnedBook() {
        Map<String, Long> bookOwnershipCount = books.stream()
          .collect(Collectors.groupingBy(Book::getName, Collectors.counting()));
        return bookOwnershipCount.entrySet().stream()
          .max(Map.Entry.comparingByValue())
          .map(Map.Entry::getKey)
          .orElse(null);
    }
    // standard constructors / getters / setters
}

2.2. Entity Characteristics

In our entities, we can identify some distinctive characteristics. In the first place, entities commonly incorporate Object-Relational Mapping (ORM) annotations. For instance, the @Entity annotation marks the class as an entity, creating a direct link between a Java class and a database table.

The @Table annotation is used to specify the name of the database table associated with the entity. Additionally, the @Id annotation defines a field as the primary key. These ORM annotations simplify the process of database mapping.

Moreover, entities often need to establish relationships with other entities, reflecting associations between real-world concepts. A common example is the @OneToMany annotation we’ve used to define a one-to-many relationship between a user and the books he owns.

Furthermore, entities don’t have to serve solely as passive data objects but can also contain domain-specific business logic. For instance, let’s consider a method such as getNameOfMostOwnedBook(). This method, residing within the entity, encapsulates domain-specific logic to find the name of the book the user owns the most. This approach aligns with OOP principles and the DDD approach by keeping domain-specific operations within entities, fostering code organization and encapsulation.

Additionally, entities may incorporate other particularities, such as validation constraints or lifecycle methods.

3. DTOs

DTOs primarily act as pure data carriers, without having any business logic. They’re used to transmit data between different applications or parts of the same application.

In simple applications, it’s common to use the domain objects directly as DTOs. However, as applications grow in complexity, exposing the entire domain model to external clients may become less desirable from a security and encapsulation perspective.

3.1. DTO Example

To keep our application as simple as possible, we will implement only the functionalities of creating a new user and retrieving the current users. To do so, let’s start by creating a DTO to represent a book:

public class BookDto {
    @JsonProperty("NAME")
    private final String name;
    @JsonProperty("AUTHOR")
    private final String author;
    // standard constructors / getters
}

For the user, let’s define two DTOs. One is designed for the creation of a user, while the second one is tailored for response purposes:

public class UserCreationDto {
    @JsonProperty("FIRST_NAME")
    private final String firstName;
    @JsonProperty("LAST_NAME")
    private final String lastName;
    @JsonProperty("ADDRESS")
    private final String address;
    @JsonProperty("BOOKS")
    private final List<BookDto> books;
    // standard constructors / getters
}
public class UserResponseDto {
    @JsonProperty("ID")
    private final Long id;
    @JsonProperty("FIRST_NAME")
    private final String firstName;
    @JsonProperty("LAST_NAME")
    private final String lastName;
    @JsonProperty("BOOKS")
    private final List<BookDto> books;
    // standard constructors / getters
}

3.2. DTO Characteristics

Based on our examples, we can identify a few particularities: immutability, validation annotations, and JSON mapping annotations.

Making DTOs immutable is a best practice. Immutability ensures that the data being transported is not accidentally altered during its journey. One way to achieve this is by declaring all properties as final and not implementing setters. Alternatively, the @Value annotation from Lombok or Java records, introduced in Java 14, offers a concise way to create immutable DTOs.
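
For instance, with records, the BookDto above could collapse into a single declaration (a sketch, assuming Jackson 2.12+ so that @JsonProperty binds to the record components):

public record BookDto(@JsonProperty("NAME") String name,
                      @JsonProperty("AUTHOR") String author) {
}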

Moving on, DTOs can also benefit from validation, to ensure that the data transferred via the DTOs meets specific criteria. This way, we can detect and reject invalid data early in the data transfer process, preventing the pollution of the domain with unreliable information.

Moreover, we may usually find JSON mapping annotations in DTOs, to map JSON properties to the fields of our DTOs. For example, the @JsonProperty annotation allows us to specify the JSON names of our DTOs.

4. Repository, Mapper, and Controller

To demonstrate the utility of having both entities and DTOs represent data within our application, we need to complete our code. We’ll start by creating a repository for our User entity:

@Repository
public interface UserRepository extends JpaRepository<User, Long> {
}

Next, we’ll proceed with creating a mapper to be able to convert from one to another:

public class UserMapper {
    public static UserResponseDto toDto(User entity) {
        return new UserResponseDto(
          entity.getId(),
          entity.getFirstName(),
          entity.getLastName(),
          entity.getBooks().stream().map(UserMapper::toDto).collect(Collectors.toList())
        );
    }
    public static User toEntity(UserCreationDto dto) {
        return new User(
          dto.getFirstName(),
          dto.getLastName(),
          dto.getAddress(),
          dto.getBooks().stream().map(UserMapper::toEntity).collect(Collectors.toList())
        );
    }
    public static BookDto toDto(Book entity) {
        return new BookDto(entity.getName(), entity.getAuthor());
    }
    public static Book toEntity(BookDto dto) {
        return new Book(dto.getName(), dto.getAuthor());
    }
}

In our example, we’ve done the mapping manually between entities and DTOs. For more complex models, to avoid boilerplate code, we could’ve used tools like MapStruct.
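
As a rough sketch of that alternative, a MapStruct mapper declares the conversions as an interface and generates the implementation at build time (the interface name is ours; properties are matched by name):

@Mapper
public interface UserDtoMapper {
    UserResponseDto toDto(User entity);
    User toEntity(UserCreationDto dto);
    BookDto toDto(Book entity);
    Book toEntity(BookDto dto);
}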

Now, we only need to create the controller:

@RestController
@RequestMapping("/users")
public class UserController {
    private final UserRepository userRepository;
    public UserController(UserRepository userRepository) {
        this.userRepository = userRepository;
    }
    @GetMapping
    public List<UserResponseDto> getUsers() {
        return userRepository.findAll().stream().map(UserMapper::toDto).collect(Collectors.toList());
    }
    @PostMapping
    public UserResponseDto createUser(@RequestBody UserCreationDto userCreationDto) {
        return UserMapper.toDto(userRepository.save(UserMapper.toEntity(userCreationDto)));
    }
}

5. Why Do We Need Both Entities and DTOs?

5.1. Separation of Concerns

In our example, the entities are closely tied to the database schema and domain-specific operations. On the other hand, DTOs are designed only for data transfer purposes.

In some architectural paradigms, such as hexagonal architecture, we may find an additional layer, commonly referred to as the Model or Domain Model. This layer serves the crucial purpose of totally decoupling the domain from any intrusive technology. This way, the core business logic remains independent of the implementation details of databases, frameworks, or external systems.

5.2. Hiding Sensitive Data

When dealing with external clients or systems, controlling what data is exposed to the outside world is essential. Entities may contain sensitive information or business logic that should remain hidden from external consumers. DTOs act as a barrier that helps us expose only safe and relevant data to the clients.

5.3. Performance

The DTO pattern, as introduced by Martin Fowler, involves batching up multiple parameters in a single call. Instead of making multiple calls to fetch individual pieces of data, we can bundle related data into a DTO and transmit it in a single request. This approach reduces the overhead associated with multiple network calls.

One way of implementing the DTO pattern is through GraphQL, which allows the client to specify the data it desires, allowing multiple queries in a single request.

6. Conclusion

As we’ve learned throughout this article, entities and DTOs have different roles and can be very distinct. The combination of both entities and DTOs ensures data security, separation of concerns, and efficient data management in complex software systems. This approach leads to more robust and maintainable software solutions.

As always, the source code is available over on GitHub.

       

Handling NullPointerException in findFirst() When the First Element Is Null


1. Overview

In this short tutorial, we’ll explore different ways of avoiding NullPointerException when working with the findFirst() method.

First, we’ll explain what causes the method to fail with NullPointerException. Then, we’ll demonstrate how to reproduce and fix the exception using practical examples.

2. Explaining the Problem

In short, a NullPointerException is thrown to signal that we’re doing some operation using null where an object is required.

Typically, we use findFirst() to return an Optional instance holding the first element of a given stream. However, according to the documentation, the method throws NullPointerException if the first returned element is null.

So, the main question here is how to avoid the NullPointerException exception when the first element of our stream is null. Before diving deep and answering our question, let’s reproduce the exception.

3. Reproducing the NullPointerException

For instance, let’s assume we have a list of String objects:

List<String> inputs = Arrays.asList(null, "foo", "bar");

Now, let’s try to get the first element of our list using the findFirst() method:

@Test(expected = NullPointerException.class)
public void givenStream_whenCallingFindFirst_thenThrowNullPointerException() {
    Optional<String> firstElement = inputs.stream()
      .findFirst();
}

As we can see, the test case fails with NullPointerException because the first selected element of our list is null.

The Optional API states that it’s the caller’s responsibility to ensure that the value is not null because it doesn’t provide any way to distinguish between “the value is present but set to null” and “the value is not present”. This is why the documentation prohibits the scenario where null is returned when using findFirst().

4. Avoiding the Exception

The easiest way to avoid NullPointerException in this case is to filter the stream before calling the findFirst() method.

So, let’s see how we can do this in practice:

@Test
public void givenStream_whenUsingFilterBeforeFindFirst_thenCorrect() {
    Optional<String> firstNotNullElement = inputs.stream()
      .filter(Objects::nonNull)
      .findFirst();
    assertTrue(firstNotNullElement.isPresent());
}

Here, we used the Objects#nonNull method to filter only objects that are not null. That way, we ensure the selected first element is not null. As a result, we avoid NullPointerException.

Another option would be to use the Optional#ofNullable method before calling the findFirst() method.

This method returns an Optional instance with the specified value if it’s not null. Otherwise, it returns an empty Optional.

So, let’s see it in action:

@Test
public void givenStream_whenUsingOfNullableBeforeFindFirst_thenCorrect() {
    Optional<String> firstElement = inputs.stream()
      .map(Optional::ofNullable)
      .findFirst()
      .flatMap(Function.identity());
    assertTrue(firstElement.isEmpty());
}

As shown above, we map each element into an Optional object that accepts null with the help of the ofNullable() method. Then, we get the first mapped element using findFirst().

The returned element denotes an Optional of an Optional since findFirst() returns an Optional. This is why we used flatMap() to flatten the nested Optional.

Please note that Function#identity always returns its input argument. In our case, flatMap(Function.identity()) unwraps the inner Optional, which is empty because the first element of our list is null.

5. Conclusion

In this short article, we explained how to avoid NullPointerException when working with the findFirst() method.

Along the way, we showcased how to reproduce and solve the exception using practical examples.

As always, the full source code of the examples is available over on GitHub.

       

Skip Bytes in InputStream in Java


1. Introduction

In Java programming, InputStream is a fundamental class for reading bytes from a source. However, there are scenarios where it becomes necessary to skip a certain number of bytes within an InputStream.

In this tutorial, we’ll delve into the skip() method, exploring how it can be effectively employed to skip bytes within a Java InputStream.

2. An Overview

InputStream is an abstract class that serves as the superclass for all classes representing an input stream of bytes. Moreover, it provides methods for reading bytes from a stream, making it a fundamental component for input operations.

In the same context, there are various situations where skipping bytes becomes necessary. One common scenario is when dealing with file headers or metadata that are not required for a specific operation. Hence, skipping unnecessary bytes can improve performance and reduce the amount of data that needs to be processed.

3. Skipping Bytes Using the skip() Method

The InputStream class in Java provides a built-in method called skip(long n) for skipping a specified number of bytes. The n parameter denotes the number of bytes to be skipped.

Let’s take the following example:

@Test
void givenInputStreamWithBytes_whenSkipBytes_thenRemainingBytes() throws IOException {
    byte[] inputData = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    InputStream inputStream = new ByteArrayInputStream(inputData);
    long bytesToSkip = 3;
    long skippedBytes = inputStream.skip(bytesToSkip);
    assertArrayEquals(new byte[]{4, 5, 6, 7, 8, 9, 10}, readRemainingBytes(inputStream));
    assert skippedBytes == bytesToSkip : "Incorrect number of bytes skipped";
}

The test begins by setting up an array of bytes, ranging from 1 to 10, and creating an InputStream using a ByteArrayInputStream initialized with the byte array. Subsequently, the code specifies the number of bytes to skip (in this case, 3) and invokes the skip() method on the InputStream.

The test then employs assertions to validate that the remaining bytes in the input stream match the expected array {4, 5, 6, 7, 8, 9, 10} using the readRemainingBytes() method:

byte[] readRemainingBytes(InputStream inputStream) throws IOException {
    byte[] buffer = new byte[inputStream.available()];
    int bytesRead = inputStream.read(buffer);
    if (bytesRead == -1) {
        throw new IOException("End of stream reached");
    }
    return buffer;
}

This method reads the remaining bytes into a buffer and ensures that the end of the stream hasn’t been reached.
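
One caveat worth noting: skip() may skip fewer bytes than requested (its return value tells us how many were actually skipped), so callers sometimes loop, or, on Java 12 and later, use skipNBytes(), which either skips exactly n bytes or throws an exception; a minimal sketch:

@Test
void givenInputStream_whenSkipNBytes_thenExactNumberOfBytesSkipped() throws IOException {
    InputStream inputStream = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
    inputStream.skipNBytes(2); // skips exactly two bytes or throws EOFException (Java 12+)
    assertEquals(3, inputStream.read());
}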

4. Conclusion

In conclusion, efficient byte stream management is crucial in Java, and the InputStream class, particularly the skip() method, provides a valuable tool for skipping bytes when handling input operations, enhancing performance and reducing unnecessary data processing.

As always, the complete code samples for this article can be found over on GitHub.

       

Unreachable Statements in Java


1. Overview

In this tutorial, we’ll talk about the Java specification that states that the compiler should raise an error if any statement is unreachable. An unreachable statement is a code that can never be executed during the program execution because there is no way for the program flow to reach it. We’ll see various code examples that correspond to this definition.

2. Code After break Instruction in a Loop

In a loop, if we put instructions after a break statement, they’re not reachable:

public class UnreachableStatement {
    
    public static void main(String[] args) {
        for (int i=0; i<10; i++) {
            break;
            int j = 0;
        }
    }
}

Let’s try to compile our code with javac:

$ javac UnreachableStatement.java
UnreachableStatement.java:9: error: unreachable statement
            int j = 0;
                ^
1 error

As expected, the compilation failed because the int j = 0; statement isn’t reachable. Similarly, an instruction after the continue keyword in a loop isn’t reachable:

public static void main(String[] args) {
    int i = 0;
    while (i<5) {
        i++;
        continue;
        int j = 0;
    }
}

3. Code After a while(true)

A while(true) instruction means the code within runs forever. Thus, any code after that isn’t reachable:

public static void main(String[] args) {
    while (true) {}
    int j = 0;
}

Once again, the statement int j = 0; isn’t reachable in the previous code. This remark is also valid for the equivalent code using the do-while structure:

public static void main(String[] args) {
    do {} while (true);
    int j = 0;
}

On the other hand, any code inside a while(false) loop isn’t reachable:

public static void main(String[] args) {
    while (false) {
        int j = 0;
    }
}

4. Code After Method Returns

A method immediately exits on a return statement. Hence, any code after this instruction isn’t reachable:

public static void main(String[] args) {
    return;
    int i = 0;
}

Once more, the int i = 0; line isn’t reachable, provoking a compiler error. Similarly, when a throw statement isn’t enclosed within a try-catch block or specified in the throws clause, the method completes exceptionally. Thus, any code after this line isn’t reachable:

public static void main(String[] args) throws Exception {
    throw new Exception();
    int i = 0;
}

To recap, if all code branches return, the following code isn’t reachable by any means:

public static void main(String[] args) throws Exception {
    int i = new Random().nextInt(0, 10);
    if (i > 5) {
        return;
    } else {
        throw new Exception();
    }
    int j = 0;
}

In this code, we chose a random number between 0 (inclusive) and 10 (exclusive). If this number is greater than 5, we return immediately, and if not, we throw a generic Exception. Thus, there is no possible execution path for the code after the if-else block.

5. Dead but Reachable Code

Lastly, let’s notice that even obvious dead code isn’t mandatorily unreachable from the compiler’s perspective. In particular, it doesn’t evaluate the conditions inside an if statement:

public static void main(String[] args) {
    if (false) {
        return;
    }
}

This code compiles successfully even if we know at first glance that the code inside the if block is dead code.
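
The language allows this on purpose so that a constant flag can switch code paths on and off without breaking compilation; a minimal sketch of that idiom:

static final boolean DEBUG = false;

static void log(String message) {
    if (DEBUG) { // compiles even though the branch never runs
        System.out.println(message);
    }
}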

6. Conclusion

In this article, we looked at many unreachable statements. There is an ongoing debate in the developer community about whether unreachable code should raise a warning or an error. The Java language follows the principle that every written statement should have a purpose, so it raises an error. Other compilers, such as C++ compilers, can still compile such code despite the inconsistency and only raise a warning.

       

Create a Mutable String in Java


1. Introduction

In this tutorial, we’ll discuss a few ways to create a mutable String in Java.

2. Immutability of Strings

Unlike other programming languages like C or C++, Strings are immutable in Java.

This immutable nature of Strings also means that any modifications to a String create a new String in memory with the modified content and return the updated reference. Java provides library classes such as StringBuffer and StringBuilder to work with mutable text data efficiently.

3. Mutable String Using Reflection

We can attempt to create a mutable String in Java by using the Reflection framework. The Reflection framework in Java allows us to inspect and modify the structure of objects, methods, and their attributes at runtime. While it is a very powerful tool, it should be used with caution as it can leave bugs in the program without warnings.

We can employ some of the framework’s methods to update the value of Strings, thereby creating a mutable object. Let’s start by creating two Strings, one as a String literal and another with the new keyword:

String myString = "Hello World";
String otherString = new String("Hello World");

Now, we use Reflection’s getDeclaredField() method on the String class to obtain a Field instance and make it accessible for us to override the value:

Field f = String.class.getDeclaredField("value");
f.setAccessible(true);
f.set(myString, "Hi World".toCharArray());

When we set the value of our first string to something else and try printing the second string, the mutated value appears:

System.out.println(otherString);
Hi World

Therefore, we mutated a String, and any String objects referring to this literal get the updated value of “Hi World” in them. This can introduce bugs in the system and cause a lot of breakage. Java programs run with the underlying assumption that Strings are immutable. Any deviation from that may be catastrophic in nature.

It is also important to note that the above example is extremely dated and won’t work with newer Java releases.

4. Charsets and Strings

4.1. Introduction to Charsets

The solution discussed above has a lot of disadvantages and is inconvenient. A different way of mutating a String is to implement a custom Charset for our program.

Computers understand man-made characters only by their numeric codes. A Charset is a dictionary that maintains the mapping of characters against their binary counterpart. For example, ASCII has a character set of 128 characters. A standardized character encoding format, along with a defined Charset, ensures that text is properly interpreted in digital systems worldwide.

Java provides extensive support for encodings and conversions. This includes US-ASCII, ISO-8859-1, UTF-8, and UTF-16, to name a few.

4.2. Using a Charset

Let’s see an example of how we can use Charsets to encode and decode Strings. We’ll take a non-ASCII String and then encode it using UTF-8 charset. Conversely, we’ll then decode the string to the original input using the same charset.

Let’s start with the input String:

String inputString = "Hello, दुनिया";

We obtain a charset for UTF-8 using the Charset.forName() method of java.nio.charset.Charset and also get an encoder:

Charset charset = Charset.forName("UTF-8");
CharsetEncoder encoder = charset.newEncoder();

The encoder object has an encode() method, which expects a CharBuffer object, a ByteBuffer object, and an endOfInput flag.

The CharBuffer object is a buffer for holding Character data and can be obtained as follows:

CharBuffer charBuffer = CharBuffer.wrap(inputString);
ByteBuffer byteBuffer = ByteBuffer.allocate(64);

We also create a ByteBuffer object of size 64 and then pass these to the encode() method to encode the input String:

encoder.encode(charBuffer, byteBuffer, true);

The byteBuffer object is now storing the encoded characters. We can decode the contents of the byteBuffer object to reveal the original String again:

private static String decodeString(ByteBuffer byteBuffer) {
    Charset charset = Charset.forName("UTF-8");
    CharsetDecoder decoder = charset.newDecoder();
    CharBuffer decodedCharBuffer = CharBuffer.allocate(50);
    decoder.decode(byteBuffer, decodedCharBuffer, true);
    decodedCharBuffer.flip();
    return decodedCharBuffer.toString();
}

The following test verifies that we are able to decode the String back to its original value:

String inputString = "hello दुनिया";
String result = ch.decodeString(ch.encodeString(inputString));
Assertions.assertEquals(inputString, result);

4.3. Creating a Custom Charset

We can also create our custom Charset class definition for our programs. To do this, we must provide concrete implementations of the following methods:

  • newDecoder() – this should return a CharsetDecoder instance
  • newEncoder() – this should return a CharsetEncoder instance

We start with an inline Charset definition by creating a new instance of Charset as follows:

private final Charset myCharset = new Charset("mycharset", null) {
    // implement methods
}

We have already seen that Charsets extensively use CharBuffer objects in characters’ encoding and decoding lifecycle. In our custom charset definition, we create a shared CharBuffer object to use throughout the program:

private final AtomicReference<CharBuffer> cbRef = new AtomicReference<>();

Let’s now write our simple inline implementations of the newEncoder() and newDecoder() methods to complete our Charset definition. We’ll also inject the shared CharBuffer object cbRef in the methods:

@Override
public CharsetDecoder newDecoder() {
    return new CharsetDecoder(this, 1.0f, 1.0f) {
        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            cbRef.set(out);
            while (in.remaining() > 0) {
                out.append((char) in.get());
            }
            return CoderResult.UNDERFLOW;
        }
    };
}
@Override
public CharsetEncoder newEncoder() {
    CharsetEncoder cd = new CharsetEncoder(this, 1.0f, 1.0f) {
        @Override
        protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                char currentChar = in.get();
                if (currentChar > 127) {
                    return CoderResult.unmappableForLength(1);
                }
                out.put((byte) currentChar);
            }
            return CoderResult.UNDERFLOW;
        }
    };
    return cd;
}

4.4. Mutating a String With Custom Charset

We have now completed our Charset definition, and we can use this charset in our program. Let’s notice that we have a shared CharBuffer instance, which is updated with the output CharBuffer in the decoding process. This is an essential step towards mutating the string.

String class in Java provides multiple constructors to create and initialize a String, and one of them takes in a bytes array and a Charset:

public String(byte[] bytes, Charset charset) {
    this(bytes, 0, bytes.length, charset);
}

We use this constructor to create a String, and we pass our custom charset object myCharset to it:

public String createModifiableString(String s) {
    return new String(s.getBytes(), myCharset);
}

Now that we have our String, let's try to mutate it by leveraging the shared CharBuffer:

public void modifyString() {
    CharBuffer cb = cbRef.get();
    cb.position(0);
    cb.put("something");
}

Here, we update the CharBuffer’s contents to a different value at the 0th position. As this character buffer is shared, and the charset maintains a reference to it in the decodeLoop() method of the decoder, the underlying char[] is also changed. We can verify this by adding a test:

String s = createModifiableString("Hello");
Assert.assertEquals("Hello", s);
modifyString();
Assert.assertEquals("something", s);

5. Final Thoughts on String Mutation

We have seen a few ways to mutate a String. String mutation is controversial in the Java world mainly because almost all programs in Java assume the non-mutating nature of Strings.

However, we often need to work with changing character sequences, which is why Java provides us with the StringBuffer and StringBuilder classes. These classes work with mutable sequences of characters and are hence easily modifiable. Using these classes is the best and most efficient way of working with mutable character sequences.
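For instance, here's a quick sketch of in-place modification with StringBuilder (the values are illustrative):

StringBuilder sb = new StringBuilder("Hello");
sb.replace(0, sb.length(), "something"); // replaces the entire contents in place
Assertions.assertEquals("something", sb.toString());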

6. Conclusion

In this article, we looked into mutable Strings and ways of mutating a String. We also understood the disadvantages and difficulties in having a straightforward algorithm for mutating a String.

As usual, the code for this article is available over on GitHub.

       

Convert a Hex String to an Integer in Java


1. Introduction

Converting a hexadecimal (Hex) string into an integer is a frequent task during programming, particularly when handling data types that use hexadecimal notations.

In this tutorial, we’ll dive into various approaches to converting a Hex String into an int in Java.

2. Understanding Hexadecimal Representation

Hexadecimal employs base-16, so each digit can take on one of 16 possible values: zero through nine, followed by A through F.

Let’s also note that, in most cases, hexadecimal strings begin with “0x” to denote their base.
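For example, the value “0x00FF00” used throughout this tutorial expands to its decimal equivalent as follows:

0x00FF00 = 0*16^5 + 0*16^4 + 15*16^3 + 15*16^2 + 0*16^1 + 0*16^0
         = 61440 + 3840
         = 65280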

3. Using Integer.parseInt()

The easiest way of converting a hex string to an integer in Java is via the Integer.parseInt() method. It converts a string into an integer, assuming the base in which it was written. For us, the base is 16:

@Test
public void givenValidHexString_whenUsingParseInt_thenExpectCorrectDecimalValue() {
    String hexString = "0x00FF00";
    int expectedDecimalValue = 65280;
    int decimalValue = Integer.parseInt(hexString.substring(2), 16);
    assertEquals(expectedDecimalValue, decimalValue);
}

In the above code, the hexadecimal string “0x00FF00” is converted to its corresponding decimal value of 65280 using Integer.parseInt(), and the test asserts that the result matches the expected decimal value. Note that we use the substring(2) method to remove the “0x” prefix from the hexString.

4. Using BigInteger

For more flexibility when working with very large or unsigned hexadecimal values, we can consider using a BigInteger. It operates on arbitrary precision integers and can, therefore, be used in myriad contexts.

Here’s how we can convert a hex string to a BigInteger and then extract the integer value:

@Test
public void givenValidHexString_whenUsingBigInteger_thenExpectCorrectDecimalValue() {
    String hexString = "0x00FF00";
    int expectedDecimalValue = 65280;
    BigInteger bigIntegerValue = new BigInteger(hexString.substring(2), 16);
    int decimalValue = bigIntegerValue.intValue();
    assertEquals(expectedDecimalValue, decimalValue);
}

5. Using Integer.decode()

Another way for changing a Hex string into an integer is provided by the Integer.decode() method. This approach deals with hexadecimal as well as decimal strings.

Here, we use Integer.decode() without stating the base, as it is determined from the string itself:

@Test
public void givenValidHexString_whenUsingIntegerDecode_thenExpectCorrectDecimalValue() {
    String hexString = "0x00FF00";
    int expectedDecimalValue = 65280;
    int decimalValue = Integer.decode(hexString);
    assertEquals(expectedDecimalValue, decimalValue);
}

Because the Integer.decode() method can handle the “0x” prefix in the string, we don’t need to manually remove it using substring(2) as we did in the previous approaches.
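Besides “0x”, Integer.decode() also accepts the “#” prefix and an optional sign placed before the prefix. A couple of illustrative assertions:

assertEquals(65280, Integer.decode("#00FF00").intValue());
assertEquals(-255, Integer.decode("-0xFF").intValue());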

6. Conclusion

In conclusion, we discussed the significance of hexadecimal representation and delved into three distinct approaches: Integer.parseInt() for a straightforward conversion, BigInteger for handling large or unsigned values, and Integer.decode() for versatility in handling both hexadecimal and decimal strings, including the “0x” prefix.

As always, the complete code samples for this article can be found over on GitHub.

       

Splitting Streams in Kafka


1. Introduction

In this tutorial, we’ll explore how to dynamically route messages in Kafka Streams. Dynamic routing is particularly useful when the destination topic for a message depends on its content, enabling us to direct messages based on specific conditions or attributes within the payload. This kind of conditional routing finds real-world applications in various domains like IoT event handling, user activity tracking, and fraud detection.

We’ll walk through the problem of consuming messages from a single Kafka topic and conditionally routing them to multiple destination topics. The primary focus will be on how to set this up in a Spring Boot application using the Kafka Streams library.

2. Kafka Streams Routing Techniques

Dynamic routing of messages in Kafka Streams isn’t confined to a single approach but rather can be achieved using multiple techniques. Each has its distinct advantages, challenges, and suitability for various scenarios:

  • KStream Conditional Branching: The KStream.split().branch() method is the conventional means to segregate a stream based on predicates. While this method is easy to implement, it has limitations when it comes to scaling the number of conditions and can become less manageable.
  • Branching with KafkaStreamBrancher: This feature appeared in Spring Kafka version 2.2.4. It offers a more elegant and readable way to create branches in a Kafka Stream, eliminating the need for ‘magic numbers’ and allowing more fluid chaining of stream operations.
  • Dynamic Routing with TopicNameExtractor: Another method for topic routing is to use a TopicNameExtractor. This allows for a more dynamic topic selection at runtime based on the message key, value, or even the entire record context. However, it requires topics to be created in advance. This method affords more granular control over topic selection and is more adaptive to complex use cases.
  • Custom Processors: For scenarios requiring complex routing logic or multiple chained operations, we can apply custom processor nodes in the Kafka Streams topology. This approach is the most flexible but also the most complex to implement.

Throughout this article, we’ll focus on implementing the first three approaches—KStream Conditional Branching, Branching with KafkaStreamBrancher, and Dynamic Routing with TopicNameExtractor.

3. Setting Up Environment

In our scenario, we have a network of IoT sensors streaming various types of data, such as temperature, humidity, and motion, to a centralized Kafka topic named iot_sensor_data. Each incoming message contains a JSON object with a field named sensorType that indicates the type of data the sensor is sending. Our aim is to dynamically route these messages to dedicated topics for each type of sensor data.

First, let’s establish a running Kafka instance. We can set up Kafka, Zookeeper, and Kafka UI using Docker, along with Docker Compose, by creating a docker-compose.yml file:

version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - 22181:2181
  kafka:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - zookeeper
    ports:
      - 9092:9092
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: "INTERNAL://:29092,EXTERNAL://:9092"
      KAFKA_ADVERTISED_LISTENERS: "INTERNAL://kafka:29092,EXTERNAL://localhost:9092"
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: "INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT"
      KAFKA_INTER_BROKER_LISTENER_NAME: "INTERNAL"
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  kafka_ui:
    image: provectuslabs/kafka-ui:latest
    depends_on:
      - kafka
    ports:
      - 8082:8080
    environment:
      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:29092
  kafka-init-topics:
    image: confluentinc/cp-kafka:latest
    depends_on:
      - kafka
    command: "bash -c 'echo Waiting for Kafka to be ready... && \
               cub kafka-ready -b kafka:29092 1 30 && \
               kafka-topics --create --topic iot_sensor_data --partitions 1 --replication-factor 1 --if-not-exists --bootstrap-server kafka:29092'"

Here we set all required environmental variables and dependencies between services. Furthermore, we are creating the iot_sensor_data topic by using specific commands in the kafka-init-topics service.

Now we can run Kafka inside Docker by executing docker-compose up -d.

Next, we have to add the Kafka Streams dependencies to the pom.xml file:

<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-streams</artifactId>
    <version>3.6.0</version>
</dependency>
<dependency>
    <groupId>org.springframework.kafka</groupId>
    <artifactId>spring-kafka</artifactId>
    <version>3.0.12</version>
</dependency>

The first dependency is the org.apache.kafka.kafka-streams package, which provides Kafka Streams functionality. The subsequent Maven package, org.springframework.kafka.spring-kafka, facilitates the configuration and integration of Kafka with Spring Boot.

Another essential aspect is configuring the address of the Kafka broker. This is generally done by specifying the broker details in the application’s properties file. Let’s add this configuration along with other properties to our application.properties file:

spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.streams.application-id=baeldung-streams
spring.kafka.consumer.group-id=baeldung-group
spring.kafka.streams.properties[default.key.serde]=org.apache.kafka.common.serialization.Serdes$StringSerde
kafka.topics.iot=iot_sensor_data

Next, let’s define a sample data class IotSensorData:

public class IotSensorData {
    private String sensorType;
    private String value;
    private String sensorId;
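    // standard getters and setters omitted for brevity – getSensorType() is used in the routing examples below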
}

Lastly, we need to configure Serde for the serialization and deserialization of typed messages in Kafka:

@Bean
public Serde<IotSensorData> iotSerde() {
    return Serdes.serdeFrom(new JsonSerializer<>(), new JsonDeserializer<>(IotSensorData.class));
}

4. Implementing Dynamic Routing in Kafka Streams

After setting up the environment and installing the required dependencies, let’s focus on implementing dynamic routing logic in Kafka Streams.

Dynamic message routing can be an essential part of an event-driven application, as it enables the system to adapt to various types of data flows and conditions without requiring code changes.

4.1. KStream Conditional Branching

Branching in Kafka Streams allows us to take a single stream of data and split it into multiple streams based on some conditions. These conditions are provided as predicates that evaluate each message as it passes through the stream.

In recent versions of Kafka Streams, the branch() method has been deprecated in favor of the newer split().branch() method, which is designed to improve the API’s overall usability and flexibility. Nevertheless, we can apply it in the same way to split a KStream into multiple streams based on certain predicates.

Here we define the configuration that utilizes the split().branch() method for dynamic topic routing:

@Bean
public KStream<String, IotSensorData> iotStream(StreamsBuilder streamsBuilder) {
   KStream<String, IotSensorData> stream = streamsBuilder.stream(iotTopicName, Consumed.with(Serdes.String(), iotSerde()));
   stream.split()
     .branch((key, value) -> "temp".equals(value.getSensorType()), Branched.withConsumer((ks) -> ks.to(iotTopicName + "_temp")))
     .branch((key, value) -> "move".equals(value.getSensorType()), Branched.withConsumer((ks) -> ks.to(iotTopicName + "_move")))
     .branch((key, value) -> "hum".equals(value.getSensorType()), Branched.withConsumer((ks) -> ks.to(iotTopicName + "_hum")))
     .noDefaultBranch();
   return stream;
}

In the example above, we split the initial stream from the iot_sensor_data topic into multiple streams based on the sensorType property and route them to other topics accordingly.
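For instance, a message like the following (the field values are purely illustrative) would be routed to the iot_sensor_data_temp topic:

{"sensorId": "sensor-1", "sensorType": "temp", "value": "22.5"}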

If a target topic name can be generated based on the message content, we can use a lambda function within the to() method for more dynamic topic routing:

@Bean
public KStream<String, IotSensorData> iotStreamDynamic(StreamsBuilder streamsBuilder) {
    KStream<String, IotSensorData> stream = streamsBuilder.stream(iotTopicName, Consumed.with(Serdes.String(), iotSerde()));
    stream.split()
      .branch((key, value) -> value.getSensorType() != null, 
        Branched.withConsumer(ks -> ks.to((key, value, recordContext) -> "%s_%s".formatted(iotTopicName, value.getSensorType()))))
      .noDefaultBranch();
    return stream;
}

This approach provides greater flexibility for routing messages dynamically whenever a topic name can be derived from the message's content.

4.2. Routing With KafkaStreamBrancher

The KafkaStreamBrancher class provides a builder-style API that allows easier chaining of branching conditions, making code more readable and maintainable.

The primary benefit is the removal of the complexities associated with managing an array of branched streams, which is how the original KStream.branch() method works. Instead, KafkaStreamBrancher lets us define each branch along with the operations that should happen to that branch, removing the need for magic numbers or complex indexing to identify the correct branch. This approach is closely related to the previous one because of the introduction of the split().branch() method.

Let’s apply this approach to a stream:

@Bean
public KStream<String, IotSensorData> kStream(StreamsBuilder streamsBuilder) {
    KStream<String, IotSensorData> stream = streamsBuilder.stream(iotTopicName, Consumed.with(Serdes.String(), iotSerde()));
    new KafkaStreamBrancher<String, IotSensorData>()
      .branch((key, value) -> "temp".equals(value.getSensorType()), (ks) -> ks.to(iotTopicName + "_temp"))
      .branch((key, value) -> "move".equals(value.getSensorType()), (ks) -> ks.to(iotTopicName + "_move"))
      .branch((key, value) -> "hum".equals(value.getSensorType()), (ks) -> ks.to(iotTopicName + "_hum"))
      .defaultBranch(ks -> ks.to("%s_unknown".formatted(iotTopicName)))
      .onTopOf(stream);
    return stream;
}

We’ve applied the fluent API to route the message to a specific topic. Similarly, we can use a single branch() call to route to multiple topics by using the message content as part of a topic name:

@Bean
public KStream<String, IotSensorData> iotBrancherStream(StreamsBuilder streamsBuilder) {
    KStream<String, IotSensorData> stream = streamsBuilder.stream(iotTopicName, Consumed.with(Serdes.String(), iotSerde()));
    new KafkaStreamBrancher<String, IotSensorData>()
      .branch((key, value) -> value.getSensorType() != null, (ks) ->
        ks.to((key, value, recordContext) -> String.format("%s_%s", iotTopicName, value.getSensorType())))
      .defaultBranch(ks -> ks.to("%s_unknown".formatted(iotTopicName)))
      .onTopOf(stream);
    return stream;
}

By providing a higher level of abstraction for branching logic, KafkaStreamBrancher not only makes the code cleaner but also enhances its manageability, especially for applications with complex routing requirements.

4.3. Dynamic Topic Routing With TopicNameExtractor

Another approach to manage conditional branching in Kafka Streams is by using a TopicNameExtractor which, as the name suggests, extracts the topic name dynamically for each message in the stream. This method can be more straightforward for certain use cases compared to the previously discussed split().branch() and KafkaStreamBrancher approaches.

Here’s a sample configuration using TopicNameExtractor in a Spring Boot application:

@Bean
public KStream<String, IotSensorData> kStream(StreamsBuilder streamsBuilder) {
    KStream<String, IotSensorData> stream = streamsBuilder.stream(iotTopicName, Consumed.with(Serdes.String(), iotSerde()));
    TopicNameExtractor<String, IotSensorData> sensorTopicExtractor = (key, value, recordContext) -> "%s_%s".formatted(iotTopicName, value.getSensorType());
    stream.to(sensorTopicExtractor);
    return stream;
}

While the TopicNameExtractor method is proficient in its primary function of routing records to specific topics, it has some limitations when compared to other approaches like split().branch() and KafkaStreamBrancher. Specifically, TopicNameExtractor doesn’t provide the option to perform additional transformations like mapping or filtering within the same routing step.

5. Conclusion

In this article, we’ve seen different approaches for dynamic topic routing using Kafka Streams and Spring Boot.

We began by exploring the modern branching mechanisms like the split().branch() method and the KafkaStreamBrancher class. Furthermore, we examined the dynamic topic routing capabilities offered by TopicNameExtractor.

Each technique presents its advantages and challenges. For instance, the split().branch() can be cumbersome when handling numerous conditions, whereas the TopicNameExtractor provides a structured flow but restricts certain inline data processes. As a result, grasping the subtle differences of each approach is vital for creating an effective routing implementation.

As always, the full source code is available over on GitHub.

       

Comparing the Values of Two Generic Numbers in Java


1. Introduction

Java’s versatility is evident in its ability to handle generic Number objects.

In this tutorial, we’ll delve into the nuances of comparing these objects, offering detailed insights and code examples for each strategy.

2. Using doubleValue() Method

Converting both Number objects to their double representation is a foundational technique in Java.

While this approach is intuitive and straightforward, it’s not without its caveats.

When converting numbers to their double form, there’s a potential for precision loss. This is especially true for large floating-point numbers or numbers with many decimal places:

public int compareDouble(Number num1, Number num2) {
    return Double.compare(num1.doubleValue(), num2.doubleValue());
}

We must be vigilant and consider the implications of this conversion, ensuring that the results remain accurate and reliable.
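For instance, we could exercise the method above as follows (the values are illustrative):

assertEquals(0, compareDouble(5, 5.0)); // an Integer and a Double holding the same value compare as equal
assertTrue(compareDouble(2, 3.5) < 0);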

3. Using compareTo() Method

Java’s wrapper classes are more than just utility classes for primitive types. The abstract class Number doesn’t implement the compareTo() method, but classes like Integer, Double, or BigInteger have a built-in compareTo() method.

Let’s create our custom compareTo() for type-specific comparisons, ensuring both type safety and precision:

// we create a method that compares Integer, but this could also be done for other types e.g. Double, BigInteger
public int compareTo(Integer int1, Integer int2) {
    return int1.compareTo(int2);
}

However, when working with several different types, we might encounter challenges.

It’s essential to understand the nuances of each wrapper class and how they interact with one another to ensure accurate comparisons.

4. Using BiFunction and Map

Java’s ability to seamlessly integrate functional programming with traditional data structures is remarkable.

Let’s create a dynamic comparison mechanism using BiFunction by mapping each Number subclass to a specific comparison function using maps:

// for this example, we create a function that compares Integer, but this could also be done for other types e.g. Double, BigInteger
Map<Class<? extends Number>, BiFunction<Number, Number, Integer>> comparisonMap
  = Map.ofEntries(entry(Integer.class, (num1, num2) -> ((Integer) num1).compareTo((Integer) num2)));
public int compareUsingMap(Number num1, Number num2) {
    return comparisonMap.get(num1.getClass())
      .apply(num1, num2);
}

This approach offers both versatility and adaptability, allowing for comparisons across various number types. It’s a testament to Java’s flexibility and its commitment to providing us with powerful tools.

5. Using Proxy and InvocationHandler

Let’s look into Java’s more advanced features, like proxies combined with InvocationHandlers, which offer a world of possibilities.

This strategy allows us to craft dynamic comparators that can adapt on the fly:

public interface NumberComparator {
    int compare(Number num1, Number num2);
}
NumberComparator proxy = (NumberComparator) Proxy
  .newProxyInstance(NumberComparator.class.getClassLoader(), new Class[] { NumberComparator.class },
  (p, method, args) -> Double.compare(((Number) args[0]).doubleValue(), ((Number) args[1]).doubleValue()));

While this approach provides unparalleled flexibility, it also requires a deep understanding of Java’s inner workings. It’s a strategy best suited for those well-versed in Java’s advanced capabilities.

6. Using Reflection

Java’s Reflection API is a powerful tool, but it comes with its own set of challenges. It allows us to introspect and dynamically determine types and invoke methods:

public int compareUsingReflection(Number num1, Number num2) throws Exception {
    Method method = num1.getClass().getMethod("compareTo", num1.getClass());
    return (int) method.invoke(num1, num2);
}

We must be careful with using Java’s Reflection because not all the Number classes have the compareTo() method implemented, so we might encounter errors, e.g., when using AtomicInteger and AtomicLong.

However, reflection can be performance-intensive and may introduce potential security vulnerabilities. It’s a tool that demands respect and careful usage, ensuring its power is harnessed responsibly.

7. Using Functional Programming

Java’s evolution has seen a significant shift towards functional programming. This paradigm allows us to craft concise and expressive comparisons using transformation functions, predicates, and other functional constructs:

Function<Number, Double> toDouble = Number::doubleValue;
BiPredicate<Number, Number> isEqual = (num1, num2) -> toDouble.apply(num1).equals(toDouble.apply(num2));
@Test
void givenNumbers_whenUseIsEqual_thenWillExecuteComparison() {
    assertEquals(true, isEqual.test(5, 5.0));
}

It’s an approach that promotes cleaner code and offers a more intuitive way to handle number comparisons.

8. Using Dynamic Comparators with Function

Java’s Function interface is a cornerstone of its commitment to functional programming. By using this interface to craft dynamic comparators, we’re equipped with a flexible and type-safe tool:

private boolean someCondition;
Function<Number, ?> dynamicFunction = someCondition ? Number::doubleValue : Number::intValue;
Comparator<Number> dynamicComparator = (num1, num2) -> ((Comparable) dynamicFunction.apply(num1))
  .compareTo(dynamicFunction.apply(num2));
@Test
void givenNumbers_whenUseDynamicComparator_thenWillExecuteComparison() {
    assertEquals(0, dynamicComparator.compare(5, 5.0));
}

It’s an approach that showcases Java’s modern capabilities and its dedication to providing cutting-edge tools.

9. Conclusion

The diverse strategies for comparing generic Number objects in Java have unique characteristics and use cases.

Selecting the appropriate method depends on the context and requirements of our application, and a thorough understanding of each strategy is essential for making an informed decision.

As always, the complete code samples for this article can be found over on GitHub.

       

Switching Between Frames Using Selenium WebDriver in Java


1. Introduction

Managing frames and iframes is a crucial skill for test automation engineers. Selenium WebDriver allows us to work with both frames and iframes in the same way.

In this tutorial, we’ll explore a few distinct methods to switch between frames with Selenium WebDriver. These methods include using a WebElement, a name or ID, and an index.

By the end, we’ll be well-equipped to tackle iframe interactions confidently, enhancing the scope and effectiveness of our automation tests.

2. Difference Between Frame and Iframe

The terms frames and iframes are often encountered in web development. Each serves a distinct purpose in structuring and enhancing web content.

Frames, an older HTML feature, partition a web page into separate sections where each section has its own dedicated HTML document. Although frames are deprecated, they are still encountered on the web.

Iframes (inline frames) embed a separate HTML document within a single frame on a web page. They are widely used in web pages for various purposes, such as incorporating external content like maps, social media widgets, advertisements, or interactive forms seamlessly.

3. Switch to Frame Using a WebElement

Switching using a WebElement is the most flexible option. We can find the frame using any selector, like ID, name, CSS selector, or XPath, to find the specific iframe we want:

WebElement iframeElement = driver.findElement(By.cssSelector("#frame_selector"));
driver.switchTo().frame(iframeElement);

For a more reliable approach, it’s better to use explicit waits, such as ExpectedConditions.frameToBeAvailableAndSwitchToIt():

WebElement iframeElement = driver.findElement(By.cssSelector("#frame_selector"));
new WebDriverWait(driver, Duration.ofSeconds(10))
  .until(ExpectedConditions.frameToBeAvailableAndSwitchToIt(iframeElement));

This helps ensure that the iframe is fully loaded and ready for interaction, reducing potential timing issues and making our automation scripts more robust when working with iframes.

4. Switch to Frame Using a Name or ID

Another method to navigate into a frame is by leveraging its name or ID attribute. This approach is straightforward and particularly useful when these attributes are unique:

driver.switchTo().frame("frame_name_or_id");

Using explicit wait ensures that the frame is fully loaded and prepared for interaction:

new WebDriverWait(driver, Duration.ofSeconds(10))
  .until(ExpectedConditions.frameToBeAvailableAndSwitchToIt("frame_name_or_id"));

5. Switch to Frame Using an Index

Selenium allows us to switch to a frame using a simple numerical index. The first frame has an index of 0, the second has an index of 1, and so on. Switching to frames using an index offers a flexible and convenient approach, especially when an iframe lacks a distinct name or ID.

By specifying the index of the frame, we can seamlessly navigate through the frames within a web page:

driver.switchTo().frame(0);

Explicit wait makes code more robust:

new WebDriverWait(driver, Duration.ofSeconds(10))
  .until(ExpectedConditions.frameToBeAvailableAndSwitchToIt(0));

However, it’s important to use frame indexes with caution because the order of frames can change on a web page. If a frame is added or removed, it can disrupt the index order, leading to potential failures in our automated tests.

6. Switching to a Nested Frame

When frames are nested, it means that one or more frames are embedded within other frames, forming a parent-child relationship. This hierarchy can continue to multiple levels, resulting in complex nested frame structures:

<!DOCTYPE html>
<html>
<head>
    <title>Frames Example</title>
</head>
<body>
    <h1>Main Content</h1>
    <p>This is the main content of the web page.</p>
    <iframe id="outer_frame" width="400" height="300">
        <h2>Outer Frame</h2>
        <p>This is the content of the outer frame.</p>
        <iframe id="inner_frame" width="300" height="200">
            <h3>Inner Frame</h3>
            <p>This is the content of the inner frame.</p>
        </iframe>
    </iframe>
    <p>More content in the main page.</p>
</body>
</html>

Selenium provides a straightforward method for handling them. To access an inner frame within a nested frame structure, we should switch from the outermost to the inner one sequentially. This allows us to access the elements within each frame as we go deeper into the hierarchy:

driver.switchTo().frame("outer_frame");
driver.switchTo().frame("inner_frame");

7. Switching Back From Frame or Nested Frame

Selenium provides a mechanism to switch back from frames and nested frames with distinct methods. For returning to the main content, we can use the method defaultContent():

driver.switchTo().defaultContent();

It essentially exits all frames and ensures that our subsequent interactions take place in the main context of the web page. This is particularly useful when we’ve completed tasks within frames and need to continue our actions in the main content.

For moving to the parent frame, we can use the parentFrame() method:

driver.switchTo().parentFrame();

This method allows us to transition from a child frame back to its immediate parent frame. It’s particularly valuable when we’re working with nested frames, each embedded within another, and we need to move between them.
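Putting it together, a typical flow through the nested frames from the sample HTML above and back to the main page might look like this:

driver.switchTo().frame("outer_frame");
driver.switchTo().frame("inner_frame");
// ... interact with elements inside the inner frame ...
driver.switchTo().parentFrame();     // back to outer_frame
driver.switchTo().defaultContent();  // back to the main page content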

8. Conclusion

In this article, we’ve explored frames and how to work with them using Selenium WebDriver. We’ve learned different methods to switch between them using WebElements, names or IDs, and numerical indices. These methods offer flexibility and precision.

By using explicit waits, we’ve ensured reliable interactions with frames, reducing potential issues and making our automation scripts more robust.

We’ve learned how to handle nested frames by sequentially switching from the outermost frame to the inner ones, allowing us to access elements within complex nested frame structures. We also learned how to switch back to the main content as well as move to the parent frame.

In conclusion, mastering frame and iframe handling with Selenium WebDriver is vital for test automation engineers. With the knowledge and techniques, we’re well-prepared to confidently deal with frames.

As always, the code presented in this article is available over on GitHub.

       

Why Is sun.misc.Unsafe.park Actually Unsafe?


1. Overview

Java provides certain APIs for internal use and discourages their unnecessary use elsewhere. The JVM developers gave the packages and classes names such as Unsafe, which should warn developers. However, this often doesn't stop developers from using these classes.

In this tutorial, we’ll learn why Unsafe.park() is actually unsafe. The goal isn’t to scare but to educate and provide better insight into the inner workings of the park() and unpark(Thread) methods.

2. Unsafe

The Unsafe class contains a low-level API that is intended to be used only by internal libraries. However, sun.misc.Unsafe is still accessible even after the introduction of JPMS. This was done to maintain backward compatibility and support all the libraries and frameworks that might use this API. The reasons are explained in more detail in JEP 260.

In this article, we won’t use Unsafe directly but rather the LockSupport class from the java.util.concurrent.locks package that wraps calls to Unsafe:

public static void park() {
    UNSAFE.park(false, 0L);
}
public static void unpark(Thread thread) {
    if (thread != null)
        UNSAFE.unpark(thread);
}

3. park() vs. wait()

The park() and unpark(Thread) functionality is similar to that of wait() and notify(). Let's review their differences and understand the danger of using the former instead of the latter.

3.1. Lack of Monitors

Unlike wait() and notify(), park() and unpark(Thread) don’t require a monitor. Any code that can get a reference to the parked thread can unpark it. This might be useful in low-level code but can introduce additional complexity and hard-to-debug problems. 

Monitors are designed in Java so that a thread cannot use one if it hasn't acquired it in the first place. This is done to prevent race conditions and simplify the synchronization process. Let's try to notify a thread without acquiring its monitor:

@Test
@Timeout(3)
void giveThreadWhenNotifyWithoutAcquiringMonitorThrowsException() {
    Thread thread = new Thread() {
        @Override
        public void run() {
            synchronized (this) {
                try {
                    this.wait();
                } catch (InterruptedException e) {
                    // The thread was interrupted
                }
            }
        }
    };
    assertThrows(IllegalMonitorStateException.class, () -> {
        thread.start();
        Thread.sleep(TimeUnit.SECONDS.toMillis(1));
        thread.notify();
        thread.join();
    });
}

Trying to notify a thread without acquiring a monitor results in IllegalMonitorStateException. This mechanism enforces better coding standards and prevents possible hard-to-debug problems.

Now, let’s check the behavior of park() and unpark(Thread):

@Test
@Timeout(3)
void giveThreadWhenUnparkWithoutAcquiringMonitor() {
    Thread thread = new Thread(LockSupport::park);
    assertTimeoutPreemptively(Duration.of(2, ChronoUnit.SECONDS), () -> {
        thread.start();
        LockSupport.unpark(thread);
    });
}

We can control threads with little work. The only thing required is the reference to the thread. This provides us with more power over locking, but at the same time, it exposes us to many more problems.

It’s clear why park() and unpark(Thread) might be helpful for low-level code, but we should avoid this in our usual application code because it might introduce too much complexity and unclear code.

3.2. Information About the Context

The fact that no monitors are involved might also reduce the information about the context. In other words, the thread is parked, and it's unclear why, when, and whether other threads are parked for the same reason. Let's run two threads:

public class ThreadMonitorInfo {
    private static final Object MONITOR = new Object();
    public static void main(String[] args) throws InterruptedException {
        Thread waitingThread = new Thread(() -> {
            try {
                synchronized (MONITOR) {
                    MONITOR.wait();
                }
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }, "Waiting Thread");
        Thread parkedThread = new Thread(LockSupport::park, "Parked Thread");
        waitingThread.start();
        parkedThread.start();
        waitingThread.join();
        parkedThread.join();
    }
}

Let’s check the thread dump using jstack:

"Parked Thread" #12 prio=5 os_prio=31 tid=0x000000013b9c5000 nid=0x5803 waiting on condition [0x000000016e2ee000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at com.baeldung.park.ThreadMonitorInfo$$Lambda$2/284720968.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:750)
"Waiting Thread" #11 prio=5 os_prio=31 tid=0x000000013b9c4000 nid=0xa903 in Object.wait() [0x000000016e0e2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000007401811d8> (a java.lang.Object)
        at java.lang.Object.wait(Object.java:502)
        at com.baeldung.park.ThreadMonitorInfo.lambda$main$0(ThreadMonitorInfo.java:12)
        - locked <0x00000007401811d8> (a java.lang.Object)
        at com.baeldung.park.ThreadMonitorInfo$$Lambda$1/1595428806.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:750)

While analyzing the thread dump, it’s clear that the parked thread contains less information. Thus, it might create a situation when a certain thread problem, even with a thread dump, would be hard to debug.

An additional benefit of using specific concurrent structures or dedicated locks is that they provide even more context in the thread dump, giving more information about the application state. Many JVM concurrency mechanisms use park() internally. However, if a thread dump explains that the thread is waiting, for example, on a CyclicBarrier, it's waiting for other threads.

3.3. Interrupted Flag

Another interesting thing is the difference in handling interrupts. Let’s review the behavior of a waiting thread:

@Test
@Timeout(3)
void givenWaitingThreadWhenNotInterruptedShouldNotHaveInterruptedFlag() throws InterruptedException {
    Thread thread = new Thread() {
        @Override
        public void run() {
            synchronized (this) {
                try {
                    this.wait();
                } catch (InterruptedException e) {
                    // The thread was interrupted
                }
            }
        }
    };
    thread.start();
    Thread.sleep(TimeUnit.SECONDS.toMillis(1));
    thread.interrupt();
    thread.join();
    assertFalse(thread.isInterrupted(), "The thread shouldn't have the interrupted flag");
}

If we’re interrupting a thread from its waiting state, the wait() method would immediately throw an InterruptedException and clear the interrupted flag. That’s why the best practice is to use while loops checking the waiting conditions instead of the interrupted flag.

In contrast, a parked thread isn't interrupted immediately and instead handles the interrupt on its own terms. Also, the interrupt doesn't cause an exception; the thread just returns from the park() method. Subsequently, the interrupted flag isn't reset, as it is when interrupting a waiting thread:

@Test
@Timeout(3)
void givenParkedThreadWhenInterruptedShouldNotResetInterruptedFlag() throws InterruptedException {
    Thread thread = new Thread(LockSupport::park);
    thread.start();
    thread.interrupt();
    assertTrue(thread.isInterrupted(), "The thread should have the interrupted flag");
    thread.join();
}

Not accounting for this behavior may cause problems while handling the interruption. For example, if we don’t reset the flag after the interrupt on a parked thread, it may cause subtle bugs.
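For example, here's a minimal sketch (our own illustration, not JDK code) of checking the flag explicitly after returning from park():

LockSupport.park();
if (Thread.currentThread().isInterrupted()) {
    // park() returned because of an interrupt: the flag is still set,
    // so we must decide explicitly how to react to it here
}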

3.4. Preemptive Permits

Parking and unparking work on the idea of a binary semaphore. Thus, we can provide a thread with a preemptive permit. For example, we can unpark a thread, which would give it a permit, and the subsequent park won’t suspend it but would take the permit and proceed:

private final Thread parkedThread = new Thread() {
    @Override
    public void run() {
        LockSupport.unpark(this);
        LockSupport.park();
    }
};
@Test
void givenThreadWhenPreemptivePermitShouldNotPark()  {
    assertTimeoutPreemptively(Duration.of(1, ChronoUnit.SECONDS), () -> {
        parkedThread.start();
        parkedThread.join();
    });
}

This technique can be used in some complex synchronization scenarios. As the parking uses a binary semaphore, we cannot add up permits, and two unpark calls wouldn’t produce two permits:

private final Thread parkedThread = new Thread() {
    @Override
    public void run() {
        LockSupport.unpark(this);
        LockSupport.unpark(this);
        LockSupport.park();
        LockSupport.park();
    }
};
@Test
void givenThreadWhenRepeatedPreemptivePermitShouldPark()  {
    Callable<Boolean> callable = () -> {
        parkedThread.start();
        parkedThread.join();
        return true;
    };
    boolean result = false;
    Future<Boolean> future = Executors.newSingleThreadExecutor().submit(callable);
    try {
        result = future.get(1, TimeUnit.SECONDS);
    } catch (InterruptedException | ExecutionException | TimeoutException e) {
        // Expected the thread to be parked
    }
    assertFalse(result, "The thread should be parked");
}

In this case, the thread would have only one permit, and the second call to the park() method would park the thread. This might produce some undesired behavior if not appropriately handled.

4. Conclusion

In this article, we learned why the park() method is considered unsafe. JVM developers hide or suggest not to use internal APIs for specific reasons. This is not only because it might be dangerous and produce unexpected results at the moment but also because these APIs might be subject to change in the future, and their support isn’t guaranteed.

Additionally, these APIs require extensive learning about underlying systems and techniques, which may differ from platform to platform. Not following this might result in fragile code and hard-to-debug problems.

As always, the code in this article is available over on GitHub.

       

HashSet toArray() Method in Java


1. Introduction

HashSet is one of the common data structures that we can utilize in Java Collections.

In this tutorial, we’ll dive into the toArray() method of the HashSet class, illustrating how to convert a HashSet to an array.

2. Converting HashSet to Array

Let’s look at a set of examples that illustrate how to apply the toArray() method to convert a HashSet into an array.

2.1. HashSet to an Array of Strings

In the following method, we are seeking to convert a HashSet of strings into an array of strings:

@Test
public void givenStringHashSet_whenConvertedToArray_thenArrayContainsStringElements() {
    HashSet<String> stringSet = new HashSet<>();
    stringSet.add("Apple");
    stringSet.add("Banana");
    stringSet.add("Cherry");
    // Convert the HashSet of Strings to an array of Strings
    String[] stringArray = stringSet.toArray(new String[0]);
    // Test that the array is of the correct length
    assertEquals(3, stringArray.length);
    for (String str : stringArray) {
        assertTrue(stringSet.contains(str));
    }
}

Here, a HashSet named stringSet is initialized with three String elements: “Apple”, “Banana”, and “Cherry”. To be specific, the test method ensures that the resulting array has a length of 3, matching the number of elements in the HashSet.

Then, it iterates through the stringArray and checks if each element is contained within the original stringSet, asserting that the array indeed contains the String elements, confirming the successful conversion of the HashSet to a String array. 

2.2. HashSet to an Array of Integers

Additionally, we can utilize the toArray() method to convert an Integer HashSet into an array of Integers as follows:

@Test
public void givenIntegerHashSet_whenConvertedToArray_thenArrayContainsIntegerElements() {
    HashSet<Integer> integerSet = new HashSet<>();
    integerSet.add(5);
    integerSet.add(10);
    integerSet.add(15);
    // Convert the HashSet of Integers to an array of Integers
    Integer[] integerArray = integerSet.toArray(new Integer[0]);
    // Test that the array is of the correct length
    assertEquals(3, integerArray.length);
    for (Integer num : integerArray) {
        assertTrue(integerSet.contains(num));
    }
    assertTrue(integerSet.contains(5));
    assertTrue(integerSet.contains(10));
    assertTrue(integerSet.contains(15));
}

Here, we create a HashSet named integerSet with three Integer elements: (5, 10, and 15). The test method is responsible for verifying the conversion of this Integer HashSet into an array of Integers, referred to as integerArray.

Moreover, it confirms that the resulting array has a length of 3, corresponding to the number of elements in the original HashSet. Subsequently, the method iterates through integerArray, ensuring each element is contained within the original integerSet.
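As a side note, the no-argument toArray() overload returns an Object[] rather than a typed array, which is why the examples above pass a typed array such as new Integer[0]:

Object[] rawArray = integerSet.toArray();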

3. Conclusion

In conclusion, it is easy to convert a HashSet into an array using the toArray() method of the HashSet class. This can also be useful while handling array-based data structures or some other components in our Java apps.

As always, the complete code samples for this article can be found over on GitHub.

       

Convert ResultSet Into Map


1. Introduction

Java applications widely use the Java Database Connectivity (JDBC) API to connect and execute queries on a database. ResultSet is a tabular representation of the data extracted by these queries.

In this tutorial, we’ll learn how to convert the data of a JDBC ResultSet into a Map.

2. Setup

We’ll write a few test cases to achieve our goal. Our data source will be an H2 database. H2 is a fast, open-source, in-memory database that supports the JDBC API. Let’s add the relevant Maven dependency:

<dependency>
    <groupId>com.h2database</groupId>
    <artifactId>h2</artifactId>
</dependency>

Once the database connection is ready, we'll write a method to do the initial data setup for our test cases. To achieve this, we first create a JDBC Statement and subsequently create a database table named employee using it. The employee table consists of columns named empId, empName, and empCity that hold information about the ID, name, and city of the employee. We can now insert sample data into the table using the Statement.execute() method:

void initialDataSetup() throws SQLException {
    Statement statement = connection.createStatement();
    String sql = "CREATE TABLE employee ( " +
      "empId INTEGER not null, " +
      "empName VARCHAR(50), " +
      "empCity VARCHAR(50), " +
      "PRIMARY KEY (empId))";
    statement.execute(sql);
    List<String> sqlQueryList = Arrays.asList(
      "INSERT INTO employee VALUES (1, 'Steve','London')", 
      "INSERT INTO employee VALUES (2, 'John','London')", 
      "INSERT INTO employee VALUES (3, 'David', 'Sydney')",
      "INSERT INTO employee VALUES (4, 'Kevin','London')", 
      "INSERT INTO employee VALUES (5, 'Jade', 'Sydney')");
    
    for (String query: sqlQueryList) {
        statement.execute(query);
    }
}

3. ResultSet to Map

Now that the sample data is present in the database, we can query it for extraction. Querying the database gives the output in the form of a ResultSet. Our goal is to transform the data from this ResultSet into a Map where the key is the city name, and the value is the list of employee names in that city.

3.1. Using Java 7

We’ll first create a PreparedStatement from the database connection and provide an SQL query to it. Then, we can use the PreparedStatement.executeQuery() method to get the ResultSet.

We can now iterate over the ResultSet data and fetch the column data individually. In order to do this, we can use the ResultSet.getString() method by passing the column name of the employee table into it. After that, we can use the Map.containsKey() method to check if the map already contains an entry for that city name. If there’s no key found for that city, we’ll add an entry with the city name as the key and an empty ArrayList as the value. Then, we add the employee’s name to the list of employee names for that city:

@Test
void whenUsingContainsKey_thenConvertResultSetToMap() throws SQLException {
    ResultSet resultSet = connection.prepareStatement(
        "SELECT * FROM employee").executeQuery();
    Map<String, List<String>> valueMap = new HashMap<>();
    while (resultSet.next()) {
        String empCity = resultSet.getString("empCity");
        String empName = resultSet.getString("empName");
        if (!valueMap.containsKey(empCity)) {
            valueMap.put(empCity, new ArrayList<>());
        }
        valueMap.get(empCity).add(empName);
    }
    assertEquals(3, valueMap.get("London").size());
}

3.2. Using Java 8

Java 8 introduced the concept of lambda expressions and default methods. We can leverage them in our implementation to simplify the entry of new keys in the output map. We can use the computeIfAbsent() method of the Map interface, which takes two parameters: a key and a mapping function. If the key is found, it returns the relevant value; otherwise, it uses the mapping function to create the default value and stores it in the map as a new key-value pair. We can add the employee's name to the list afterward.

Here’s the modified version of the previous test case using Java 8:

@Test
void whenUsingComputeIfAbsent_thenConvertResultSetToMap() throws SQLException {
    ResultSet resultSet = connection.prepareStatement(
        "SELECT * FROM employee").executeQuery();
    Map<String, List<String>> valueMap = new HashMap<>();
    while (resultSet.next()) {
        String empCity = resultSet.getString("empCity");
        String empName = resultSet.getString("empName");
        valueMap.computeIfAbsent(empCity, data -> new ArrayList<>()).add(empName);
    }
    assertEquals(3, valueMap.get("London").size());
}

3.3. Using Apache Commons DbUtils

Apache Commons DbUtils is a third-party library that provides additional and simplified functionalities for JDBC operations. It provides an interesting interface named ResultSetHandler that consumes a JDBC ResultSet as input and allows us to transform it into the desired object that the application expects. Moreover, this library uses the QueryRunner class to run SQL queries on the database table. The QueryRunner.query() method takes the database connection, SQL query, and ResultSetHandler as input and directly returns the expected format.

Let’s look at an example of how to create a Map from a ResultSet using ResultSetHandler:

@Test
void whenUsingDbUtils_thenConvertResultSetToMap() throws SQLException {
    ResultSetHandler<Map<String, List<String>>> handler = new ResultSetHandler<Map<String, List<String>>>() {
        public Map<String, List<String>> handle(ResultSet resultSet) throws SQLException {
            Map<String, List<String>> result = new HashMap<>();
            while (resultSet.next()) {
                String empCity = resultSet.getString("empCity");
                String empName = resultSet.getString("empName");
                result.computeIfAbsent(empCity, data -> new ArrayList<>()).add(empName);
            }
            return result;
        }
    };
    QueryRunner run = new QueryRunner();
    Map<String, List<String>> valueMap = run.query(connection, "SELECT * FROM employee", handler);
    assertEquals(3, valueMap.get("London").size());
}
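Since ResultSetHandler declares a single abstract method, the anonymous class above can also be written as a lambda. Here's a sketch with unchanged behavior:

ResultSetHandler<Map<String, List<String>>> handler = resultSet -> {
    Map<String, List<String>> result = new HashMap<>();
    while (resultSet.next()) {
        result.computeIfAbsent(resultSet.getString("empCity"), city -> new ArrayList<>())
          .add(resultSet.getString("empName"));
    }
    return result;
};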

4. Conclusion

To summarize, we took a look at several ways we can aggregate data from ResultSet and convert it into a Map using Java 7, Java 8, and the Apache DbUtils library.

As always, the full code for this article can be found over on GitHub.

       

MongoDB Atlas Search Using the Java Driver and Spring Data


1. Introduction

In this tutorial, we’ll learn how to use Atlas Search functionalities using the Java MongoDB driver API. By the end, we’ll have a grasp on creating queries, paginating results, and retrieving meta-information. Also, we’ll cover refining results with filters, adjusting result scores, and selecting specific fields to be displayed.

2. Scenario and Setup

MongoDB Atlas has a free forever cluster that we can use to test all features. To showcase Atlas Search functionalities, we’ll only need a service class. We’ll connect to our collection using MongoTemplate.

2.1. Dependencies

First, to connect to MongoDB, we’ll need spring-boot-starter-data-mongodb:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
    <version>3.1.2</version>
</dependency>

2.2. Sample Dataset

Throughout this tutorial, we’ll use the movies collection from MongoDB Atlas’s sample_mflix sample dataset to simplify examples. It contains data about movies since the 1900s, which will help us showcase the filtering capabilities of Atlas Search.

2.3. Creating an Index With Dynamic Mapping

For Atlas Search to work, we need indexes. These can be static or dynamic. A static index is helpful for fine-tuning, while a dynamic one is an excellent general-purpose solution. So, let’s start with a dynamic index.

There are a few ways to create search indexes (including programmatically); we’ll use the Atlas UI. There, we can do this by accessing Search from the menu, selecting our cluster, then clicking Go to Atlas Search:

Creating an index

 

After clicking on Create Search Index, we’ll choose the JSON Editor to create our index, then click Next:

JSON editor

Finally, on the next screen, we choose our target collection, a name for our index, and input our index definition:

{
    "mappings": {
        "dynamic": true
    }
}

We’ll use the name idx-queries for this index throughout this tutorial. Note that if we name our index default, we don’t need to specify its name when creating queries. Most importantly, dynamic mappings are a simple choice for more flexible, frequently changing schemas.

By setting mappings.dynamic to true, Atlas Search automatically indexes all dynamically indexable and supported field types in a document. While dynamic mappings provide convenience, especially when the schema is unknown, they tend to consume more disk space and might be less efficient compared to static ones.

2.4. Our Movie Search Service

We’ll base our examples on a service class containing some search queries for our movies, extracting interesting information from them. We’ll slowly build them up to more complex queries:

@Service
public class MovieAtlasSearchService {
    private final MongoCollection<Document> collection;
    public MovieAtlasSearchService(MongoTemplate mongoTemplate) {
        MongoDatabase database = mongoTemplate.getDb();
        this.collection = database.getCollection("movies");
    }
    // ...
}

All we need is a reference to our collection for future methods.

3. Constructing a Query

Atlas Search queries are created via pipeline stages, represented by a List<Bson>. The most essential stage is Aggregates.search(), which receives a SearchOperator and, optionally, a SearchOptions object. Since we called our index idx-queries instead of default, we must include its name with SearchOptions.searchOptions().index(). Otherwise, we’ll get no errors and no results.

Many search operators are available to define how we want to conduct our query. In this example, we’ll find movies by tags using SearchOperator.text(), which performs a full-text search. We’ll use it to search the contents of the fullplot field with SearchPath.fieldPath(). We’ll omit static imports for readability:

public Collection<Document> moviesByKeywords(String keywords) {
    List<Bson> pipeline = Arrays.asList(
        search(
          text(
            fieldPath("fullplot"), keywords
          ),
          searchOptions()
            .index("idx-queries")
        ),
        project(fields(
          excludeId(),
          include("title", "year", "fullplot", "imdb.rating")
        ))
    );
    return collection.aggregate(pipeline)
      .into(new ArrayList<>());
}

Also, the second stage in our pipeline is Aggregates.project(), which represents a projection. If not specified, our query results will include all the fields in our documents. But we can set it and choose which fields we want (or don’t want) to appear in our results. Note that specifying a field for inclusion implicitly excludes all other fields except the _id field. So, in this case, we’re excluding the _id field and passing a list of the fields we want. Note we can also specify nested fields, like imdb.rating.

To execute the pipeline, we call aggregate() on our collection. This returns an object we can use to iterate on results. Finally, for simplicity, we call into() to iterate over results and add them to a collection, which we return. Note that a big enough collection can exhaust the memory in our JVM. We’ll see how to eliminate this concern by paginating our results later on.

Most importantly, pipeline stage order matters. We’ll get an error if we put the project() stage before search().

Let’s take a look at the first two results of calling moviesByKeywords(“space cowboy”) on our service:

[
    {
        "title": "Battle Beyond the Stars",
        "fullplot": "Shad, a young farmer, assembles a band of diverse mercenaries in outer space to defend his peaceful planet from the evil tyrant Sador and his armada of aggressors. Among the mercenaries are Space Cowboy, a spacegoing truck driver from Earth; Gelt, a wealthy but experienced assassin looking for a place to hide; and Saint-Exmin, a Valkyrie warrior looking to prove herself in battle.",
        "year": 1980,
        "imdb": {
            "rating": 5.4
        }
    },
    {
        "title": "The Nickel Ride",
        "fullplot": "Small-time criminal Cooper manages several warehouses in Los Angeles that the mob use to stash their stolen goods. Known as \"the key man\" for the key chain he always keeps on his person that can unlock all the warehouses. Cooper is assigned by the local syndicate to negotiate a deal for a new warehouse because the mob has run out of storage space. However, Cooper's superior Carl gets nervous and decides to have cocky cowboy button man Turner keep an eye on Cooper.",
        "year": 1974,
        "imdb": {
            "rating": 6.7
        }
    },
    (...)
]

3.1. Combining Search Operators

It’s possible to combine search operators using SearchOperator.compound(). In this example, we’ll use it to include must and should clauses. A must clause contains one or more conditions for matching documents. On the other hand, a should clause contains one or more conditions that we’d prefer our results to include.

This alters the score so the documents that meet these conditions appear first:

public Collection<Document> late90sMovies(String keywords) {
    List<Bson> pipeline = asList(
        search(
          compound()
            .must(asList(
              numberRange(
                fieldPath("year"))
                .gteLt(1995, 2000)
            ))
            .should(asList(
              text(
                fieldPath("fullplot"), keywords
              )
            )),
          searchOptions()
            .index("idx-queries")
        ),
        project(fields(
          excludeId(),
          include("title", "year", "fullplot", "imdb.rating")
        ))
    );
    return collection.aggregate(pipeline)
      .into(new ArrayList<>());
}

We kept the same searchOptions() and projected fields from our first query. But, this time, we moved text() to a should clause because we want the keywords to represent a preference, not a requirement.

Then, we created a must clause, including SearchOperator.numberRange(), to only match movies from 1995 (inclusive) to 2000 (exclusive) by restricting the values of the year field. This way, we only return movies from that era.

Let’s see the first two results for hacker assassin:

[
    {
        "title": "Assassins",
        "fullplot": "Robert Rath is a seasoned hitman who just wants out of the business with no back talk. But, as things go, it ain't so easy. A younger, peppier assassin named Bain is having a field day trying to kill said older assassin. Rath teams up with a computer hacker named Electra to defeat the obsessed Bain.",
        "year": 1995,
        "imdb": {
            "rating": 6.3
        }
    },
    {
        "fullplot": "Thomas A. Anderson is a man living two lives. By day he is an average computer programmer and by night a hacker known as Neo. Neo has always questioned his reality, but the truth is far beyond his imagination. Neo finds himself targeted by the police when he is contacted by Morpheus, a legendary computer hacker branded a terrorist by the government. Morpheus awakens Neo to the real world, a ravaged wasteland where most of humanity have been captured by a race of machines that live off of the humans' body heat and electrochemical energy and who imprison their minds within an artificial reality known as the Matrix. As a rebel against the machines, Neo must return to the Matrix and confront the agents: super-powerful computer programs devoted to snuffing out Neo and the entire human rebellion.",
        "imdb": {
            "rating": 8.7
        },
        "year": 1999,
        "title": "The Matrix"
    },
    (...)
]

4. Scoring the Result Set

When we query documents with search(), the results appear in order of relevance. This relevance is based on the calculated score, from highest to lowest. This time, we’ll modify late90sMovies() to receive a SearchScore modifier to boost the relevance of the plot keywords in our should clause:

public Collection<Document> late90sMovies(String keywords, SearchScore modifier) {
    List<Bson> pipeline = asList(
        search(
          compound()
            .must(asList(
              numberRange(
                fieldPath("year"))
                .gteLt(1995, 2000)
            ))
            .should(asList(
              text(
                fieldPath("fullplot"), keywords
              )
              .score(modifier)
            )),
          searchOptions()
            .index("idx-queries")
        ),
        project(fields(
          excludeId(),
          include("title", "year", "fullplot", "imdb.rating"),
          metaSearchScore("score")
        ))
    );
    return collection.aggregate(pipeline)
      .into(new ArrayList<>());
}

Also, we include metaSearchScore(“score”) in our fields list to see the score for each document in our results. For example, we can now multiply the relevance of our “should” clause by the value of the imdb.votes field like this:

late90sMovies(
  "hacker assassin", 
  SearchScore.boost(fieldPath("imdb.votes"))
)

And this time, we can see that The Matrix comes first, thanks to the boost:

[
    {
        "fullplot": "Thomas A. Anderson is a man living two lives (...)",
        "imdb": {
            "rating": 8.7
        },
        "year": 1999,
        "title": "The Matrix",
        "score": 3967210.0
    },
    {
        "fullplot": "(...) Bond also squares off against Xenia Onatopp, an assassin who uses pleasure as her ultimate weapon.",
        "imdb": {
            "rating": 7.2
        },
        "year": 1995,
        "title": "GoldenEye",
        "score": 462604.46875
    },
    (...)
]

4.1. Using a Score Function

We can achieve greater control by using a function to alter the score of our results. Let’s pass a function to our method that adds the value of the year field to the natural score. This way, newer movies end up with a higher score:

late90sMovies(keywords, function(
  addExpression(asList(
    pathExpression(
      fieldPath("year"))
      .undefined(1), 
    relevanceExpression()
  ))
));

That code starts with a SearchScore.function(), which is a SearchScoreExpression.addExpression() since we want an add operation. Then, since we want to add a value from a field, we use a SearchScoreExpression.pathExpression() and specify the field we want: year. Also, we call undefined() to determine a fallback value for year in case it’s missing. In the end, we call relevanceExpression() to return the document’s relevance score, which is added to the value of year.

When we execute that, we’ll see “The Matrix” now appears first, along with its new score:

[
    {
        "fullplot": "Thomas A. Anderson is a man living two lives (...)",
        "imdb": {
            "rating": 8.7
        },
        "year": 1999,
        "title": "The Matrix",
        "score": 2003.67138671875
    },
    {
        "title": "Assassins",
        "fullplot": "Robert Rath is a seasoned hitman (...)",
        "year": 1995,
        "imdb": {
            "rating": 6.3
        },
        "score": 2003.476806640625
    },
    (...)
]

That’s useful for defining what should have greater weight when scoring our results.
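As another illustration, assuming the driver also exposes SearchScoreExpression.multiplyExpression() alongside addExpression(), we could weight the relevance score by the IMDB rating instead:

late90sMovies(keywords, function(
  multiplyExpression(asList(
    pathExpression(
      fieldPath("imdb.rating"))
      .undefined(1),
    relevanceExpression()
  ))
));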

5. Getting Total Rows Count From Metadata

If we need to get the total number of results in a query, we can use Aggregates.searchMeta() instead of search() to retrieve metadata information only. With this method, no documents are returned. So, we’ll use it to count the number of movies from the late 90s that also contain our keywords.

For meaningful filtering, we’ll also include the keywords in our must clause:

public Document countLate90sMovies(String keywords) {
    List<Bson> pipeline = asList(
        searchMeta(
          compound()
            .must(asList(
              numberRange(
                fieldPath("year"))
                .gteLt(1995, 2000),
              text(
                fieldPath("fullplot"), keywords
              )
            )),
          searchOptions()
            .index("idx-queries")
            .count(total())
        )
    );
    return collection.aggregate(pipeline)
      .first();
}

This time, searchOptions() includes a call to SearchOptions.count(SearchCount.total()), which ensures we get an exact total count (instead of a lower bound, which is faster depending on the collection size). Also, since we expect a single object in the results, we call first() on aggregate().
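If an approximate count is acceptable, a lower bound is cheaper to compute. A minimal variation, assuming the driver’s SearchCount.lowerBound() factory (static import omitted, as before):

searchOptions()
  .index("idx-queries")
  .count(lowerBound())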

Finally, let’s see what is returned for countLate90sMovies(“hacker assassin”):

{
    "count": {
        "total": 14
    }
}

This is useful for getting information about our collection without including documents in our results.

6. Faceting on Results

In MongoDB Atlas Search, a facet query is a feature that allows retrieving aggregated and categorized information about our search results. It helps us analyze and summarize data based on different criteria, providing insights into the distribution of search results.

Also, it enables grouping search results into different categories or buckets and retrieving counts or additional information about each category. This helps answer questions like “How many documents match a specific category?” or “What are the most common values for a certain field within the results?”

6.1. Creating a Static Index

In our first example, we’ll create a facet query to give us information about genres from movies since the 1900s and how these relate. We’ll need an index with facet types, which we can’t have when using dynamic indexes.

So, let’s start by creating a new search index in our collection, which we’ll call idx-facets. Note that we’ll keep dynamic as true so we can still query the fields that are not explicitly defined:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "genres": [
        {
          "type": "stringFacet"
        },
        {
          "type": "string"
        }
      ],
      "year": [
        {
          "type": "numberFacet"
        },
        {
          "type": "number"
        }
      ]
    }
  }
}

We kept dynamic mappings enabled so that fields without explicit mappings remain searchable, then selected the fields we were interested in for faceted indexing. Since we also want to use filters in our query, for each field, we specify an index of a standard type (like string) and one of a faceted type (like stringFacet).

6.2. Running a Facet Query

Creating a facet query involves using searchMeta() and starting a SearchCollector.facet() method to include our facets and an operator for filtering results. When defining the facets, we have to choose a name and use a SearchFacet method that corresponds to the type of index we created. In our case, we define a stringFacet() and a numberFacet():

public Document genresThroughTheDecades(String genre) {
    List<Bson> pipeline = asList(
      searchMeta(
        facet(
          text(
            fieldPath("genres"), genre
          ), 
          asList(
            stringFacet("genresFacet", 
              fieldPath("genres")
            ).numBuckets(5),
            numberFacet("yearFacet", 
              fieldPath("year"), 
              asList(1900, 1930, 1960, 1990, 2020)
            )
          )
        ),
        searchOptions()
          .index("idx-facets")
      )
    );
    return collection.aggregate(pipeline)
      .first();
}

We filter movies with a specific genre with the text() operator. Since films generally contain multiple genres, the stringFacet() will also show five (specified by numBuckets()) related genres ranked by frequency. For the numberFacet(), we must set the boundaries separating our aggregated results. We need at least two, with the last one being exclusive.

Finally, we return only the first result. Let’s see what it looks like if we filter by the “horror” genre:

{
    "count": {
        "lowerBound": 1703
    },
    "facet": {
        "genresFacet": {
            "buckets": [
                {
                    "_id": "Horror",
                    "count": 1703
                },
                {
                    "_id": "Thriller",
                    "count": 595
                },
                {
                    "_id": "Drama",
                    "count": 395
                },
                {
                    "_id": "Mystery",
                    "count": 315
                },
                {
                    "_id": "Comedy",
                    "count": 274
                }
            ]
        },
        "yearFacet": {
            "buckets": [
                {
                    "_id": 1900,
                    "count": 5
                },
                {
                    "_id": 1930,
                    "count": 47
                },
                {
                    "_id": 1960,
                    "count": 409
                },
                {
                    "_id": 1990,
                    "count": 1242
                }
            ]
        }
    }
}

Since we didn’t specify a total count, we get a lower bound count, followed by our facet names and their respective buckets.

6.3. Including a Facet Stage to Paginate Results

Let’s return to our late90sMovies() method and include a $facet stage in our pipeline. We’ll use it for pagination and a total rows count. The search() and project() stages will remain unmodified:

public Document late90sMovies(int skip, int limit, String keywords) {
    List<Bson> pipeline = asList(
        search(
          // ...
        ),
        project(fields(
          // ...
        )),
        facet(
          new Facet("rows",
            skip(skip),
            limit(limit)
          ),
          new Facet("totalRows",
            replaceWith("$$SEARCH_META"),
            limit(1)
          )
        )
    );
    return collection.aggregate(pipeline)
      .first();
}

We start by calling Aggregates.facet(), which receives one or more facets. Then, we instantiate a Facet to include skip() and limit() from the Aggregates class. While skip() defines our offset, limit() will restrict the number of documents retrieved. Note that we can name our facets anything we like.

Also, we call replaceWith(“$$SEARCH_META”) to get metadata info in this field. Most importantly, so that our metadata information is not repeated for each result, we include a limit(1). Finally, when our query has metadata, the result becomes a single document instead of an array, so we only return the first result.
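As a rough usage sketch of the paginated result (assuming a movieAtlasSearchService instance of our service; since we didn’t request an exact count, the metadata exposes a lowerBound value):

Document page = movieAtlasSearchService.late90sMovies(0, 5, "hacker assassin");
List<Document> rows = page.getList("rows", Document.class);
Document meta = page.getList("totalRows", Document.class).get(0);
Document count = meta.get("count", Document.class);
System.out.println("page size: " + rows.size() + ", total (lower bound): " + count.get("lowerBound"));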

7. Conclusion

In this article, we saw how MongoDB Atlas Search provides developers with a versatile and potent toolset. Integrating it with the Java MongoDB driver API can enhance search functionalities, data aggregation, and result customization. Our hands-on examples have aimed to provide a practical understanding of its capabilities. Whether implementing a simple search or seeking intricate data analytics, Atlas Search is an invaluable tool in the MongoDB ecosystem.

Remember to leverage the power of indexes, facets, and dynamic mappings to make our data work for us. As always, the source code is available over on GitHub.

       

Sharing Memory Between JVMs


1. Introduction

In this tutorial, we’ll show how to share memory between two or more JVMs running on the same machine. This capability enables very fast inter-process communication since we can move data blocks around without any I/O operation.

2. How Does Shared Memory Work?

A process running in any modern operating system gets what’s called a virtual memory space. We call it virtual because, although it looks like a large, continuous, and private addressable memory space, in fact, it’s made of pages spread all over the physical RAM. Here, page is just OS slang for a block of contiguous memory, whose size depends on the particular CPU architecture in use. For x86-64, a page can be as small as 4 KB or as large as 1 GB.

At a given time, only a fraction of this virtual space is actually mapped to physical pages. As time passes and the process starts to consume more memory for its tasks, the OS starts to allocate more physical pages and map them to the virtual space. When the demand for memory exceeds what’s physically available, the OS will start to swap out pages that are not being used at that moment to secondary storage to make room for the request.

A shared memory block behaves just like regular memory, but, in contrast with regular memory, it is not private to a single process. When a process changes the contents of any byte within this block, any other process with access to this same shared memory “sees” this change instantly.

This is a list of common uses for shared memory:

  • Debuggers (ever wondered how a debugger can inspect variables in another process?)
  • Inter-process communication
  • Read-only content sharing between processes (ex: dynamic library code)
  • Hacks of all sorts ;^)

3. Shared Memory and Memory-Mapped Files

A memory-mapped file, as the name suggests, is a regular file whose contents are directly mapped to a contiguous area in the virtual memory of a process. This means that we can read and/or change its contents without explicit use of I/O operations. The OS will detect any writes to the mapped area and will schedule a background I/O operation to persist the modified data.

Since there are no guarantees on when this background operation will happen, the OS also offers a system call to flush any pending changes. This is important for use cases like database redo logs, but not needed for our inter-process communication (IPC, for short) scenario.
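In Java, this explicit flush is exposed as MappedByteBuffer.force(). A minimal sketch, just to show where it would fit:

import java.nio.MappedByteBuffer;

public class FlushExample {
    // Forces any pending changes in the mapped region to be written to the backing file.
    // Not required for the IPC scenario in this tutorial, since both processes read the
    // same in-memory pages regardless of when the file itself gets written.
    static void flushToDisk(MappedByteBuffer buffer) {
        buffer.force();
    }
}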

Memory-mapped files are commonly used by database servers to achieve high throughput I/O operations, but we can also use them to bootstrap a shared-memory-based IPC mechanism. The basic idea is that all processes that need to share data map the same file and, voilà, they now have a shared memory area.

4. Creating Memory-Mapped Files in Java

In Java, we use FileChannel’s map() method to map a region of a file into memory, which returns a MappedByteBuffer that allows us to access its contents:

MappedByteBuffer createSharedMemory(String path, long size) {
    try (FileChannel fc = (FileChannel)Files.newByteChannel(new File(path).toPath(),
      EnumSet.of(
        StandardOpenOption.CREATE,
        StandardOpenOption.SPARSE,
        StandardOpenOption.WRITE,
        StandardOpenOption.READ))) {
        return fc.map(FileChannel.MapMode.READ_WRITE, 0, size);
    } catch (IOException ioe) {
        throw new RuntimeException(ioe);
    }
}

The use of the SPARSE option here is quite relevant. As long as the underlying OS and file system support it, we can map a sizable memory area without actually consuming disk space.

Now, let’s create a simple demo application. The Producer application will allocate a shared memory buffer large enough to hold 64KB of data plus a SHA1 hash (20 bytes). Next, it will start a loop where it will fill the buffer with random data, followed by its SHA1 hash. We’ll repeat this operation continuously for 30 seconds and then exit:

// ... SHA1 digest initialization omitted
MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
Random rnd = new Random();
long start = System.currentTimeMillis();
long iterations = 0;
int capacity = shm.capacity();
System.out.println("Starting producer iterations...");
while(System.currentTimeMillis() - start < 30000) {
    for (int i = 0; i < capacity - hashLen; i++) {
        byte value = (byte) (rnd.nextInt(256) & 0x00ff);
        digest.update(value);
        shm.put(i, value);
    }
    // Write hash at the end
    byte[] hash = digest.digest();
    shm.put(capacity - hashLen, hash);
    iterations++;
}
System.out.printf("%d iterations run\n", iterations);

To test that we can indeed share memory, we’ll also create a Consumer app that, for 30 seconds, will repeatedly read the buffer’s content, compute its hash, and compare it with the one the Producer wrote at the buffer’s end:

// ... digest initialization omitted
MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
long start = System.currentTimeMillis();
long iterations = 0;
int capacity = shm.capacity();
System.out.println("Starting consumer iterations...");
long matchCount = 0;
long mismatchCount = 0;
byte[] expectedHash = new byte[hashLen];
while (System.currentTimeMillis() - start < 30000) {
    for (int i = 0; i < capacity - 20; i++) {
        byte value = shm.get(i);
        digest.update(value);
    }
    byte[] hash = digest.digest();
    shm.get(capacity - hashLen, expectedHash);
    if (Arrays.equals(hash, expectedHash)) {
        matchCount++;
    } else {
        mismatchCount++;
    }
    iterations++;
}
System.out.printf("%d iterations run. matches=%d, mismatches=%d\n", iterations, matchCount, mismatchCount);

To test our memory-sharing scheme, let’s start both programs at the same time. This is their output when running on a 3 GHz, quad-core Intel i7 machine:

# Producer output
Starting producer iterations...
11722 iterations run
# Consumer output
Starting consumer iterations...
18893 iterations run. matches=11714, mismatches=7179

We can see that, in many cases, the consumer detects that the computed hash differs from the expected one. Welcome to the wonderful world of concurrency issues!

5. Synchronizing Shared Memory Access

The root cause for the issue we’ve seen is that we need to synchronize access to the shared memory buffer. The Consumer must wait for the Producer to finish writing the hash before it starts reading the data. On the other hand, the Producer also must wait for the Consumer to finish consuming the data before writing to it again.

For a regular multithreaded application, solving this issue is no big deal. The standard library offers several synchronization primitives that allow us to control who can write to the shared memory at a given time.

However, ours is a multi-JVM scenario, so none of those standard methods apply. So, what should we do? Well, the short answer is that we’ll have to cheat. We could resort to OS-specific mechanisms like semaphores, but this would hinder our application’s portability. Also, this implies using JNI or JNA, which also complicates things.

Enter Unsafe. Despite its somewhat scary name, this standard library class offers exactly what we need to implement a simple lock mechanism: the compareAndSwapInt() method.

This method implements an atomic test-and-set primitive that takes four arguments. Although not clearly stated in the documentation, it can target not only Java objects but also a raw memory address. For the latter, we pass null in the first argument, which makes it treat the offset argument as a virtual memory address.

When we call this method, it will first check the value at the target address and compare it with the expected value. If they’re equal, then it will modify the location’s content to the new value and return true indicating success. If the value at the location is different from expected, nothing happens, and the method returns false.

More importantly, this atomic operation is guaranteed to work even in multicore architectures, which is a critical feature for synchronizing multiple executing threads.

Let’s create a SpinLock class that takes advantage of this method to implement a (very!) simple lock mechanism:

//... package and imports omitted
public class SpinLock {
    private static final Unsafe unsafe;
    // ... unsafe initialization omitted
    private final long addr;
    public SpinLock(long addr) {
        this.addr = addr;
    }
    public boolean tryLock(long maxWait) {
        long deadline = System.currentTimeMillis() + maxWait;
        while (System.currentTimeMillis() < deadline) {
            if (unsafe.compareAndSwapInt(null, addr, 0, 1)) {
                return true;
            }
        }
        return false;
    }
    public void unlock() {
        unsafe.putInt(addr, 0);
    }
}

This implementation lacks key features, like checking whether it owns the lock before releasing it, but it will suffice for our purpose.
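For reference, here’s a hypothetical variation (not used in the rest of this tutorial) that stores a non-zero owner token instead of 1, so that unlock() only succeeds for the process that acquired the lock:

//... package and imports omitted
public class OwnedSpinLock {
    private static final Unsafe unsafe;
    // ... unsafe initialization omitted, same as in SpinLock
    private final long addr;
    private final int ownerToken; // must be non-zero, e.g. derived from the process PID

    public OwnedSpinLock(long addr, int ownerToken) {
        this.addr = addr;
        this.ownerToken = ownerToken;
    }

    public boolean tryLock(long maxWait) {
        long deadline = System.currentTimeMillis() + maxWait;
        while (System.currentTimeMillis() < deadline) {
            // Atomically claim the lock by writing our token over the "unlocked" value 0
            if (unsafe.compareAndSwapInt(null, addr, 0, ownerToken)) {
                return true;
            }
        }
        return false;
    }

    public boolean unlock() {
        // Only release the lock if we are its current owner
        return unsafe.compareAndSwapInt(null, addr, ownerToken, 0);
    }
}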

Okay, so how do we get the memory address that we’ll use to store the lock status? This must be an address within the shared memory buffer so both processes can use it, but the MappedByteBuffer class does not expose the actual memory address.

Inspecting the object that map() returns, we can see that it is a DirectByteBuffer. This class has a public method called address() that returns exactly what we want. Unfortunately, this class is package-private, so we can’t use a simple cast to access this method.

To bypass this limitation, we’ll cheat a little again and use reflection to invoke this method:

private static long getBufferAddress(MappedByteBuffer shm) {
    try {
        Class<?> cls = shm.getClass();
        Method maddr = cls.getMethod("address");
        maddr.setAccessible(true);
        Long addr = (Long) maddr.invoke(shm);
        if (addr == null) {
            throw new RuntimeException("Unable to retrieve buffer's address");
        }
        return addr;
    } catch (NoSuchMethodException | InvocationTargetException | IllegalAccessException ex) {
        throw new RuntimeException(ex);
    }
}

Here, we’re using setAccessible() to make the address() method callable through the Method handle. However, be aware that, from Java 17 onwards, this technique won’t work unless we explicitly use the runtime --add-opens flag.
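For example, assuming the Producer’s main class is com.baeldung.sharedmem.Producer (a hypothetical name), opening the java.nio package is typically what the reflective address() call needs:

java --add-opens java.base/java.nio=ALL-UNNAMED com.baeldung.sharedmem.Producer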

6. Adding Synchronization to Producer and Consumer

Now that we have a lock mechanism, let’s apply it to the Producer first. For the purposes of this demo, we’ll assume that the Producer will always start before the Consumer. We need this so we can initialize the buffer, clearing its content including the area we’ll use with the SpinLock:

public static void main(String[] args) throws Exception {
    // ... digest initialization omitted
    MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
    // Cleanup lock area 
    shm.putInt(0, 0);
    long addr = getBufferAddress(shm);
    System.out.println("Starting producer iterations...");
    long start = System.currentTimeMillis();
    long iterations = 0;
    Random rnd = new Random();
    int capacity = shm.capacity();
    SpinLock lock = new SpinLock(addr);
    while(System.currentTimeMillis() - start < 30000) {
        if (!lock.tryLock(5000)) {
            throw new RuntimeException("Unable to acquire lock");
        }
        try {
            // Skip the first 4 bytes, as they're used by the lock
            for (int i = 4; i < capacity - hashLen; i++) {
                byte value = (byte) (rnd.nextInt(256) & 0x00ff);
                digest.update(value);
                shm.put(i, value);
            }
            // Write hash at the end
            byte[] hash = digest.digest();
            shm.put(capacity - hashLen, hash);
            iterations++;
        }
        finally {
            lock.unlock();
        }
    }
    System.out.printf("%d iterations run\n", iterations);
}

Compared to the unsynchronized version, there are just minor changes:

  • Retrieve the memory address associated with the MappedByteBuffer
  • Create a SpinLock instance using this address. The lock uses an int, so it will take the four initial bytes of the buffer
  • Use the SpinLock instance to protect the code that fills the buffer with random data and its hash

Now, let’s apply similar changes to the Consumer side:

public static void main(String[] args) throws Exception {
    // ... digest initialization omitted
    MappedByteBuffer shm = createSharedMemory("some_path.dat", 64*1024 + 20);
    long addr = getBufferAddress(shm);
    System.out.println("Starting consumer iterations...");
    Random rnd = new Random();
    long start = System.currentTimeMillis();
    long iterations = 0;
    int capacity = shm.capacity();
    long matchCount = 0;
    long mismatchCount = 0;
    byte[] expectedHash = new byte[hashLen];
    SpinLock lock = new SpinLock(addr);
    while (System.currentTimeMillis() - start < 30000) {
        if (!lock.tryLock(5000)) {
            throw new RuntimeException("Unable to acquire lock");
        }
        try {
            for (int i = 4; i < capacity - hashLen; i++) {
                byte value = shm.get(i);
                digest.update(value);
            }
            byte[] hash = digest.digest();
            shm.get(capacity - hashLen, expectedHash);
            if (Arrays.equals(hash, expectedHash)) {
                matchCount++;
            } else {
                mismatchCount++;
            }
            iterations++;
        } finally {
            lock.unlock();
        }
    }
    System.out.printf("%d iterations run. matches=%d, mismatches=%d\n", iterations, matchCount, mismatchCount);
}

With those changes, we can now run both sides and compare them with the previous result:

# Producer output
Starting producer iterations...
8543 iterations run
# Consumer output
Starting consumer iterations...
8607 iterations run. matches=8607, mismatches=0

As expected, the reported iteration count is lower than in the non-synchronized version. The main reason is that we spend most of the time within the critical section of the code, holding the lock. Whichever program holds the lock prevents the other side from doing anything.

If we compare the average iteration count reported in the first case, it is approximately the same as the sum of iterations both programs completed this time. This shows that the overhead added by the lock mechanism itself is minimal.

7. Conclusion

In this tutorial, we’ve explored how to share a memory area between two JVMs running on the same machine. We can use the technique presented here as the foundation for high-throughput, low-latency inter-process communication libraries.

As usual, all code is available over on GitHub.

       