Channel: Baeldung

Efficient Word Frequency Calculator in Java


1. Overview

In this tutorial, we’ll show various ways of implementing a word counter in Java.

2. Counter Implementations

Let’s start by counting how many times each word occurs in this array:

static String[] COUNTRY_NAMES 
  = { "China", "Australia", "India", "USA", "USSR", "UK", "China", 
  "France", "Poland", "Austria", "India", "USA", "Egypt", "China" };

If we want to process huge files, we need to go for other options described here.

2.1. Map with Integers

One of the simplest solutions would be to create a Map that stores words as keys and their number of occurrences as values:

Map<String, Integer> counterMap = new HashMap<>();

for (String country : COUNTRY_NAMES) { 
    counterMap.compute(country, (k, v) -> v == null ? 1 : v + 1); 
}

assertEquals(3, counterMap.get("China").intValue());
assertEquals(2, counterMap.get("India").intValue());

We simply used Map’s handy compute() method, which initializes the counter to 1 if the key isn’t present yet, or increments it otherwise.

However, this way of counting isn’t efficient. Integer is immutable, so every increment boxes a new Integer value, and for counts outside the default Integer cache range of -128 to 127, that means allocating a new object.
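As a side note, Map’s merge() method expresses the same initialize-or-increment logic in a single call. The sketch below uses a hypothetical MergeWordCounter class of our own. It still stores Integer values, so it doesn’t avoid the boxing cost discussed above. It only shortens the code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the article's code.
class MergeWordCounter {

    // Map.merge() stores 1 for a key that is absent; otherwise it combines the
    // existing count with 1 using Integer::sum and stores the result.
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counterMap = new HashMap<>();
        for (String word : words) {
            counterMap.merge(word, 1, Integer::sum);
        }
        return counterMap;
    }

    public static void main(String[] args) {
        String[] countries = { "China", "Australia", "India", "USA", "USSR", "UK", "China",
            "France", "Poland", "Austria", "India", "USA", "Egypt", "China" };
        Map<String, Integer> counts = count(countries);
        System.out.println(counts.get("China")); // prints 3
        System.out.println(counts.get("India")); // prints 2
    }
}
```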

2.2. Stream API

Now, let’s leverage the Java 8 Stream API, parallel streams, and the groupingBy() collector:

@Test
public void whenMapWithLambdaAndWrapperCounter_runsSuccessfully() {
    Map<String, Long> counterMap = new HashMap<>();

    // group equal words together and count each group into the supplied map
    Stream.of(COUNTRY_NAMES)
      .collect(Collectors.groupingBy(k -> k, () -> counterMap,
        Collectors.counting()));

    assertEquals(3, counterMap.get("China").intValue());
    assertEquals(2, counterMap.get("India").intValue());
}

Similarly, we could use a parallel stream:

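@Test
public void whenMapWithParallelStreamAndWrapperCounter_runsSuccessfully() {
    // in a parallel stream, groupingBy() calls the map supplier once per chunk,
    // so a supplier that returns one shared HashMap is unsafe; groupingByConcurrent()
    // fills a single ConcurrentMap instead
    ConcurrentMap<String, Long> counterMap = new ConcurrentHashMap<>();

    Stream.of(COUNTRY_NAMES).parallel()
      .collect(Collectors.groupingByConcurrent(k -> k, () -> counterMap,
        Collectors.counting()));

    assertEquals(3, counterMap.get("China").intValue());
    assertEquals(2, counterMap.get("India").intValue());
}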
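If we don’t need to pass in our own map, a simpler option is to use the map that collect() returns. Here’s a minimal sketch with a hypothetical StreamWordCounter class. Because groupingBy() creates a separate map for each parallel chunk and merges those maps itself, no map is shared between threads:

```java
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class for illustration; not part of the article's code.
class StreamWordCounter {

    // Uses groupingBy()'s default map factory: each parallel chunk gets its own
    // map, and the collector combines them into the map that collect() returns.
    static Map<String, Long> countInParallel(String[] words) {
        return Stream.of(words)
          .parallel()
          .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        String[] countries = { "China", "Australia", "India", "USA", "USSR", "UK", "China",
            "France", "Poland", "Austria", "India", "USA", "Egypt", "China" };
        Map<String, Long> counts = countInParallel(countries);
        System.out.println(counts.get("China")); // prints 3
        System.out.println(counts.get("India")); // prints 2
    }
}
```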

2.3. Map with an int Array

Next, let’s use a Map whose values are single-element int arrays that hold the counter:

@Test
public void whenMapWithPrimitiveArrayCounter_runsSuccessfully() {
    Map<String, int[]> counterMap = new HashMap<>();

    counterWithPrimitiveArray(counterMap);

    assertEquals(3, counterMap.get("China")[0]);
    assertEquals(2, counterMap.get("India")[0]);
}
 
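private void counterWithPrimitiveArray(Map<String, int[]> counterMap) {
    for (String country : COUNTRY_NAMES) {
        // compute() returns the array now stored for the key (a new { 0 }
        // on the first occurrence), so [0]++ bumps the count in place
        counterMap.compute(country, (k, v) -> v == null ? 
          new int[] { 0 } : v)[0]++;
    }
}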

Note how we created a simple HashMap with int arrays as values.

In the counterWithPrimitiveArray method, while iterating over each country in the array, we:

  • invoke compute on the counterMap, passing the country name as the key
  • check whether the key is already present. If the entry is absent, we create a new primitive int array holding a single 0; if it’s present, we reuse the existing array. Either way, we then increment the value at index 0

This method is better than the wrapper implementation because it creates fewer objects. Each int array is created once per distinct country and then updated in place.
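The same idea can be written with computeIfAbsent(). That method only creates the array the first time a key is seen, and it returns the stored array in both cases. Here’s a minimal sketch using a hypothetical IntArrayWordCounter class:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the article's code.
class IntArrayWordCounter {

    // computeIfAbsent() runs the lambda only for a new key, storing a fresh
    // int[1] (its element defaults to 0); it returns the stored array either
    // way, so the increment mutates the counter in place without boxing.
    static Map<String, int[]> count(String[] words) {
        Map<String, int[]> counterMap = new HashMap<>();
        for (String word : words) {
            counterMap.computeIfAbsent(word, k -> new int[1])[0]++;
        }
        return counterMap;
    }

    public static void main(String[] args) {
        Map<String, int[]> counts = count(new String[] { "China", "India", "China", "China", "India" });
        System.out.println(counts.get("China")[0]); // prints 3
        System.out.println(counts.get("India")[0]); // prints 2
    }
}
```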

2.4. Map with a MutableInteger

Next, let’s create a wrapper class that embeds a primitive int counter:

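private static class MutableInteger {
    int count;

    public MutableInteger(int count) {
        this.count = count;
    }

    public void increment() {
        this.count++;
    }

    public int getCount() {
        return count;
    }

    // setter omitted for brevity
}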

Let’s see how we can use the above class as a counter:

@Test
public void whenMapWithMutableIntegerCounter_runsSuccessfully() {
    Map<String, MutableInteger> counterMap = new HashMap<>();

    counterWithMutableInteger(counterMap);

    assertEquals(3, counterMap.get("China").getCount());
    assertEquals(2, counterMap.get("India").getCount());
}

private void counterWithMutableInteger(
  Map<String, MutableInteger> counterMap) {
    for (String country : COUNTRY_NAMES) {
        // reuse the stored counter, or start a new one at 0, then increment it
        counterMap.compute(country, (k, v) -> v == null 
          ? new MutableInteger(0) : v).increment();
    }
}

In the counterWithMutableInteger method, while iterating over each country in the COUNTRY_NAMES array, we:

  • invoke compute on the counterMap, passing the country name as the key
  • check whether the key is already present. If the entry is absent, we create a MutableInteger with its counter set to 0; if it’s present, we reuse the existing MutableInteger. Either way, we then call increment() on it

Like the int array, each MutableInteger is created once per distinct country and then updated in place. This avoids the per-increment boxing of the Integer version, and the code reads more clearly than indexing into an array.

Apache Commons Collections’ HashMultiSet works in a similar way: internally, it embeds a HashMap whose values are MutableInteger counters.
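To apply this to actual text rather than a prepared array, we first need to split the input into words. Here’s a minimal sketch with a hypothetical TextWordFrequency class. It ignores case and assumes words are runs of ASCII letters, digits, or underscores, which is what the regex \W+ splits around by default. It also redefines a small counter class so it compiles on its own:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical example: word frequencies in free text, using a mutable counter
// in the spirit of the MutableInteger class above (redefined here).
class TextWordFrequency {

    static final class Counter {
        private int count;

        void increment() {
            count++;
        }

        int getCount() {
            return count;
        }
    }

    // Lower-cases the text, splits it on runs of non-word characters, and
    // keeps one Counter per distinct word, incrementing it in place.
    static Map<String, Counter> frequencies(String text) {
        Map<String, Counter> counterMap = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields a leading "" when text starts with a delimiter
            }
            counterMap.computeIfAbsent(word, k -> new Counter()).increment();
        }
        return counterMap;
    }

    public static void main(String[] args) {
        Map<String, Counter> freq = frequencies("The cat and the hat. THE END!");
        System.out.println(freq.get("the").getCount()); // prints 3
        System.out.println(freq.get("hat").getCount()); // prints 1
    }
}
```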

3. Performance Analysis

Here’s a chart that compares the performance of each method listed above.

The chart above was created using JMH, and here’s the code that produced those statistics:

// fields of the benchmark class, which JMH requires to be annotated with @State;
// helpers not shown in this article live in the GitHub project
Map<String, Integer> counterMap = new HashMap<>();
Map<String, MutableInteger> counterMutableIntMap = new HashMap<>();
Map<String, int[]> counterWithIntArrayMap = new HashMap<>();
Map<String, Long> counterWithLongWrapperMap = new HashMap<>();
Map<String, Long> counterWithLongWrapperStreamMap = new HashMap<>();

@Benchmark
public void wrapperAsCounter() {
    counterWithWrapperObject(counterMap);
}

@Benchmark
public void lambdaExpressionWithWrapper() {
    counterWithLambdaAndWrapper(counterWithLongWrapperMap);
}

@Benchmark
public void parallelStreamWithWrapper() {
    counterWithParallelStreamAndWrapper(counterWithLongWrapperStreamMap);
}

@Benchmark
public void mutableIntegerAsCounter() {
    counterWithMutableInteger(counterMutableIntMap);
}

@Benchmark
public void mapWithPrimitiveArray() {
    counterWithPrimitiveArray(counterWithIntArrayMap);
}

4. Conclusion

In this quick article, we illustrated various ways of creating word counters using Java.

The implementation of these examples can be found in the GitHub project – this is a Maven-based project, so it should be easy to import and run as is.

