1. Introduction
In this article, we’ll see how the groupingBy collector works using various examples.
To understand the material covered in this article, a basic knowledge of Java 8 features is needed. You can have a look at intro to Java 8 Streams and the guide to Java 8’s Collectors.
2. GroupingBy Collectors
The Java 8 Stream API lets us process collections of data in a declarative way.
The static factory methods Collectors.groupingBy() and Collectors.groupingByConcurrent() provide us with functionality similar to the ‘GROUP BY’ clause in the SQL language. They are used for grouping objects by some property and storing results in a Map instance.
The overloaded methods of groupingBy:
- With a classification function as the method parameter:
static <T,K> Collector<T,?,Map<K,List<T>>> groupingBy(Function<? super T,? extends K> classifier)
- With a classification function and a second collector as method parameters:
static <T,K,A,D> Collector<T,?,Map<K,D>> groupingBy(Function<? super T,? extends K> classifier, Collector<? super T,A,D> downstream)
- With a classification function, a supplier method (that provides the Map implementation that will contain the end result), and a second collector as method parameters:
static <T,K,D,A,M extends Map<K,D>> Collector<T,?,M> groupingBy(Function<? super T,? extends K> classifier, Supplier<M> mapFactory, Collector<? super T,A,D> downstream)
2.1. Example Code Setup
To demonstrate the usage of groupingBy(), let’s define a BlogPost class (we will use a stream of BlogPost objects):
class BlogPost { String title; String author; BlogPostType type; int likes; }
The BlogPostType:
enum BlogPostType { NEWS, REVIEW, GUIDE }
The List of BlogPost objects:
List<BlogPost> posts = Arrays.asList( ... );
Let’s also define a Tuple class that will be used to group posts by the combination of their type and author attributes:
class Tuple { BlogPostType type; String author; }
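The snippets in the following sections call getters such as BlogPost::getType, which the abbreviated class definitions above leave out. A minimal sketch of the setup could look like this, assuming a conventional constructor and getters. The sample posts are invented for illustration, since the contents of the original list are elided.

```java
import java.util.Arrays;
import java.util.List;

class BlogPostSetup {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    static class BlogPost {
        private final String title;
        private final String author;
        private final BlogPostType type;
        private final int likes;

        BlogPost(String title, String author, BlogPostType type, int likes) {
            this.title = title;
            this.author = author;
            this.type = type;
            this.likes = likes;
        }

        // Getters referenced by the method references in the examples.
        String getTitle() { return title; }
        String getAuthor() { return author; }
        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
    }

    // Hypothetical sample data, not the article's original list.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("News item 1", "Author 1", BlogPostType.NEWS, 15),
            new BlogPost("Tech review 1", "Author 2", BlogPostType.REVIEW, 5),
            new BlogPost("Programming guide", "Author 1", BlogPostType.GUIDE, 20),
            new BlogPost("News item 2", "Author 2", BlogPostType.NEWS, 35),
            new BlogPost("Tech review 2", "Author 1", BlogPostType.REVIEW, 15));
    }

    public static void main(String[] args) {
        for (BlogPost post : samplePosts()) {
            System.out.println(post.getType() + " | " + post.getAuthor() + " | " + post.getTitle());
        }
    }
}
```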
2.2. Simple Grouping by a Single Column
Let’s start with the simplest groupingBy method, which only takes a classification function as its parameter. The classification function is applied to each element of the stream, and the value it returns becomes the key in the map produced by the groupingBy collector.
To group the blog posts in the blog post list by their type:
Map<BlogPostType, List<BlogPost>> postsPerType = posts.stream() .collect(groupingBy(BlogPost::getType));
2.3. Grouping by a Complex Map Key Type
The classification function is not limited to returning only a scalar or String value. The key of the resulting map can be any object, as long as its class implements the equals and hashCode methods consistently.
To group the blog posts in the list by type and author combined in a Tuple instance:
Map<Tuple, List<BlogPost>> postsPerTypeAndAuthor = posts.stream() .collect(groupingBy(post -> new Tuple(post.getType(), post.getAuthor())));
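Because the resulting map looks keys up through equals and hashCode, two Tuple instances with the same type and author must compare equal, otherwise every post ends up in its own group. The sketch below shows one way to write such a Tuple. The constructor, the trimmed-down BlogPost and the sample data are assumptions made for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class TupleKeyExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final String author;
        private final BlogPostType type;

        BlogPost(String author, BlogPostType type) {
            this.author = author;
            this.type = type;
        }

        String getAuthor() { return author; }
        BlogPostType getType() { return type; }
    }

    static class Tuple {
        private final BlogPostType type;
        private final String author;

        Tuple(BlogPostType type, String author) {
            this.type = type;
            this.author = author;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Tuple)) return false;
            Tuple other = (Tuple) o;
            return type == other.type && Objects.equals(author, other.author);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, author);
        }
    }

    // Hypothetical sample data.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("Author 1", BlogPostType.NEWS),
            new BlogPost("Author 1", BlogPostType.NEWS),
            new BlogPost("Author 2", BlogPostType.REVIEW));
    }

    static Map<Tuple, List<BlogPost>> groupByTypeAndAuthor(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(post -> new Tuple(post.getType(), post.getAuthor())));
    }

    public static void main(String[] args) {
        Map<Tuple, List<BlogPost>> grouped = groupByTypeAndAuthor(samplePosts());
        System.out.println(grouped.size()); // 2
        // A fresh but equal Tuple finds the existing group.
        System.out.println(grouped.get(new Tuple(BlogPostType.NEWS, "Author 1")).size()); // 2
    }
}
```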
2.4. Modifying the Returned Map Value Type
The second overload of groupingBy takes an additional collector (the downstream collector), which is applied to the elements of each group produced by the classification function.
When we specify only a classification function and not a downstream collector, the toList() collector is used behind the scenes.
Let’s use the toSet() collector as the downstream collector and get a Set of blog posts (instead of a List):
Map<BlogPostType, Set<BlogPost>> postsPerType = posts.stream() .collect(groupingBy(BlogPost::getType, toSet()));
2.5. Providing a Secondary Group By Collector
A different application of the downstream collector is to perform a secondary grouping on the results of the first grouping.
To group the List of BlogPosts first by author and then by type:
Map<String, Map<BlogPostType, List<BlogPost>>> map = posts.stream() .collect(groupingBy(BlogPost::getAuthor, groupingBy(BlogPost::getType)));
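To make the shape of the nested result concrete, the following sketch groups a few posts by author and then by type, and navigates the two map levels. The trimmed-down BlogPost and the sample data are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedGroupingExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final String author;
        private final BlogPostType type;

        BlogPost(String author, BlogPostType type) {
            this.author = author;
            this.type = type;
        }

        String getAuthor() { return author; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("Author 1", BlogPostType.NEWS),
            new BlogPost("Author 1", BlogPostType.NEWS),
            new BlogPost("Author 1", BlogPostType.GUIDE),
            new BlogPost("Author 2", BlogPostType.REVIEW));
    }

    static Map<String, Map<BlogPostType, List<BlogPost>>> groupByAuthorThenType(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getAuthor,
                Collectors.groupingBy(BlogPost::getType)));
    }

    public static void main(String[] args) {
        Map<String, Map<BlogPostType, List<BlogPost>>> grouped = groupByAuthorThenType(samplePosts());
        // Outer key: author. Inner key: post type.
        System.out.println(grouped.get("Author 1").get(BlogPostType.NEWS).size()); // 2
        // An author without posts of a given type has no entry for that type.
        System.out.println(grouped.get("Author 2").containsKey(BlogPostType.NEWS)); // false
    }
}
```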
2.6. Getting the Average from Grouped Results
By using the downstream collector, we can apply aggregation functions to the results of the classification function.
To find the average number of likes for each blog post type:
Map<BlogPostType, Double> averageLikesPerType = posts.stream() .collect(groupingBy(BlogPost::getType, averagingInt(BlogPost::getLikes)));
2.7. Getting the Sum from Grouped Results
To calculate the total sum of likes for each type:
Map<BlogPostType, Integer> likesPerType = posts.stream() .collect(groupingBy(BlogPost::getType, summingInt(BlogPost::getLikes)));
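The averaging and summing collectors can be checked against a small data set. This is a minimal sketch with a trimmed-down BlogPost and invented like counts, so the numbers are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LikesAggregationExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final BlogPostType type;
        private final int likes;

        BlogPost(BlogPostType type, int likes) {
            this.type = type;
            this.likes = likes;
        }

        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
    }

    // Hypothetical sample data.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost(BlogPostType.NEWS, 15),
            new BlogPost(BlogPostType.NEWS, 35),
            new BlogPost(BlogPostType.REVIEW, 5),
            new BlogPost(BlogPostType.REVIEW, 15),
            new BlogPost(BlogPostType.GUIDE, 20));
    }

    static Map<BlogPostType, Double> averageLikesPerType(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                Collectors.averagingInt(BlogPost::getLikes)));
    }

    static Map<BlogPostType, Integer> totalLikesPerType(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                Collectors.summingInt(BlogPost::getLikes)));
    }

    public static void main(String[] args) {
        List<BlogPost> posts = samplePosts();
        System.out.println(averageLikesPerType(posts).get(BlogPostType.NEWS)); // 25.0
        System.out.println(totalLikesPerType(posts).get(BlogPostType.NEWS));   // 50
    }
}
```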
2.8. Getting the Maximum or Minimum from Grouped Results
Another aggregation that we can perform is to get, for each type, the blog post with the maximum number of likes:
Map<BlogPostType, Optional<BlogPost>> maxLikesPerPostType = posts.stream() .collect(groupingBy(BlogPost::getType, maxBy(comparingInt(BlogPost::getLikes))));
Similarly, we can apply the minBy downstream collector to get the blog post with the minimum number of likes.
Note that the maxBy and minBy collectors take into account the possibility that the collection to which they are applied could be empty. This is why the value type in the map is Optional<BlogPost>.
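Reading a result out of the Optional values might look as follows. This is a sketch with a trimmed-down BlogPost and invented data. It uses map and orElse rather than calling get directly, which keeps the code safe even for a type that has no entry in the map.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

class MaxLikesExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final String title;
        private final BlogPostType type;
        private final int likes;

        BlogPost(String title, BlogPostType type, int likes) {
            this.title = title;
            this.type = type;
            this.likes = likes;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
    }

    // Hypothetical sample data; note that there is no GUIDE post.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("News item 1", BlogPostType.NEWS, 15),
            new BlogPost("News item 2", BlogPostType.NEWS, 35),
            new BlogPost("Tech review 1", BlogPostType.REVIEW, 5));
    }

    static Map<BlogPostType, Optional<BlogPost>> maxLikesPerType(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                Collectors.maxBy(Comparator.comparingInt(BlogPost::getLikes))));
    }

    // Title of the most liked post of a type, or a placeholder if there is none.
    static String mostLikedTitle(List<BlogPost> posts, BlogPostType type) {
        return maxLikesPerType(posts)
            .getOrDefault(type, Optional.empty())
            .map(BlogPost::getTitle)
            .orElse("(no posts)");
    }

    public static void main(String[] args) {
        List<BlogPost> posts = samplePosts();
        System.out.println(mostLikedTitle(posts, BlogPostType.NEWS));  // News item 2
        System.out.println(mostLikedTitle(posts, BlogPostType.GUIDE)); // (no posts)
    }
}
```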
2.9. Getting a Summary for an Attribute of Grouped Results
The Collectors API offers a summarizing collector that can be used in cases when we need to calculate the count, sum, minimum, maximum and average of a numerical attribute at the same time.
Let’s calculate a summary for the likes attribute of the blog posts for each different type:
Map<BlogPostType, IntSummaryStatistics> likeStatisticsPerType = posts.stream() .collect(groupingBy(BlogPost::getType, summarizingInt(BlogPost::getLikes)));
The IntSummaryStatistics object for each type contains the count, sum, average, min and max values for the likes attribute. Additional summary objects exist for double and long values.
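The individual values are read from IntSummaryStatistics through its getters. A short sketch, using a trimmed-down BlogPost and invented like counts:

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LikeStatisticsExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final BlogPostType type;
        private final int likes;

        BlogPost(BlogPostType type, int likes) {
            this.type = type;
            this.likes = likes;
        }

        BlogPostType getType() { return type; }
        int getLikes() { return likes; }
    }

    // Hypothetical sample data.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost(BlogPostType.NEWS, 15),
            new BlogPost(BlogPostType.NEWS, 35),
            new BlogPost(BlogPostType.REVIEW, 5));
    }

    static Map<BlogPostType, IntSummaryStatistics> likeStatisticsPerType(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                Collectors.summarizingInt(BlogPost::getLikes)));
    }

    public static void main(String[] args) {
        IntSummaryStatistics news = likeStatisticsPerType(samplePosts()).get(BlogPostType.NEWS);
        System.out.println(news.getCount());   // 2
        System.out.println(news.getSum());     // 50
        System.out.println(news.getMin());     // 15
        System.out.println(news.getMax());     // 35
        System.out.println(news.getAverage()); // 25.0
    }
}
```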
2.10. Mapping Grouped Results to a Different Type
More complex aggregations can be achieved by applying a mapping downstream collector to the results of the classification function.
Let’s get a concatenation of the titles of the posts for each blog post type:
Map<BlogPostType, String> postsPerType = posts.stream() .collect(groupingBy(BlogPost::getType, mapping(BlogPost::getTitle, joining(", ", "Post titles: [", "]"))));
What we have done here is to map each BlogPost instance to its title and then reduce the stream of post titles to a concatenated String. In this example, the type of the Map value is also different from the default List type.
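The effect of the prefix, suffix and delimiter passed to joining is easiest to see on concrete data. The sketch below uses a trimmed-down BlogPost and invented titles. With a sequential stream over a List, the titles appear in the order of the source list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class JoinedTitlesExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final String title;
        private final BlogPostType type;

        BlogPost(String title, BlogPostType type) {
            this.title = title;
            this.type = type;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("News item 1", BlogPostType.NEWS),
            new BlogPost("Tech review 1", BlogPostType.REVIEW),
            new BlogPost("News item 2", BlogPostType.NEWS));
    }

    static Map<BlogPostType, String> titlesPerType(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                Collectors.mapping(BlogPost::getTitle,
                    Collectors.joining(", ", "Post titles: [", "]"))));
    }

    public static void main(String[] args) {
        Map<BlogPostType, String> titles = titlesPerType(samplePosts());
        System.out.println(titles.get(BlogPostType.NEWS));   // Post titles: [News item 1, News item 2]
        System.out.println(titles.get(BlogPostType.REVIEW)); // Post titles: [Tech review 1]
    }
}
```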
2.11. Modifying the Return Map Type
When using the groupingBy collector, we cannot make assumptions about the type of the returned Map: the API documentation gives no guarantees about its type, mutability, serializability or thread safety. If we need a specific Map implementation, we can use the third overload of groupingBy, which accepts a Map supplier function.
Let’s retrieve an EnumMap by passing an EnumMap supplier function to the groupingBy method:
EnumMap<BlogPostType, List<BlogPost>> postsPerType = posts.stream() .collect(groupingBy(BlogPost::getType, () -> new EnumMap<>(BlogPostType.class), toList()));
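One visible consequence of choosing EnumMap is that its keys iterate in the declaration order of the enum constants, regardless of the order in which the posts are encountered. A minimal sketch, with a trimmed-down BlogPost and invented data:

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.stream.Collectors;

class EnumMapGroupingExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final String title;
        private final BlogPostType type;

        BlogPost(String title, BlogPostType type) {
            this.title = title;
            this.type = type;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data, deliberately not in enum declaration order.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("Tech review 1", BlogPostType.REVIEW),
            new BlogPost("Programming guide", BlogPostType.GUIDE),
            new BlogPost("News item 1", BlogPostType.NEWS));
    }

    static EnumMap<BlogPostType, List<BlogPost>> groupIntoEnumMap(List<BlogPost> posts) {
        return posts.stream()
            .collect(Collectors.groupingBy(BlogPost::getType,
                () -> new EnumMap<>(BlogPostType.class),
                Collectors.toList()));
    }

    public static void main(String[] args) {
        EnumMap<BlogPostType, List<BlogPost>> grouped = groupIntoEnumMap(samplePosts());
        // Keys follow the enum declaration order, not the encounter order.
        System.out.println(grouped.keySet()); // [NEWS, REVIEW, GUIDE]
    }
}
```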
3. Concurrent Grouping By Collector
Similar to groupingBy, there is the groupingByConcurrent collector, which can take advantage of multi-core architectures when used with a parallel stream. This collector has three overloaded methods that take the same arguments as the respective overloads of the groupingBy collector. The map produced by groupingByConcurrent, however, is a ConcurrentMap.
To do a grouping operation concurrently, the stream needs to be parallel:
ConcurrentMap<BlogPostType, List<BlogPost>> postsPerType = posts.parallelStream() .collect(groupingByConcurrent(BlogPost::getType));
If we choose to pass a Map supplier function to the groupingByConcurrent collector, then we need to make sure that the function returns a ConcurrentMap implementation, such as ConcurrentHashMap or ConcurrentSkipListMap.
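The three-argument overload of groupingByConcurrent declares its map type as M extends ConcurrentMap<K, D>, so any ConcurrentMap implementation can be supplied. The sketch below passes a ConcurrentSkipListMap, which additionally keeps its keys sorted. The trimmed-down BlogPost and the sample data are assumptions, and the order of the posts inside each group is not guaranteed for a parallel stream.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.stream.Collectors;

class ConcurrentGroupingExample {

    enum BlogPostType { NEWS, REVIEW, GUIDE }

    // Trimmed-down BlogPost: only the fields this example needs.
    static class BlogPost {
        private final String title;
        private final BlogPostType type;

        BlogPost(String title, BlogPostType type) {
            this.title = title;
            this.type = type;
        }

        String getTitle() { return title; }
        BlogPostType getType() { return type; }
    }

    // Hypothetical sample data.
    static List<BlogPost> samplePosts() {
        return Arrays.asList(
            new BlogPost("News item 1", BlogPostType.NEWS),
            new BlogPost("Tech review 1", BlogPostType.REVIEW),
            new BlogPost("News item 2", BlogPostType.NEWS),
            new BlogPost("Programming guide", BlogPostType.GUIDE));
    }

    static ConcurrentMap<BlogPostType, List<BlogPost>> groupConcurrently(List<BlogPost> posts) {
        return posts.parallelStream()
            .collect(Collectors.groupingByConcurrent(BlogPost::getType,
                ConcurrentSkipListMap::new,
                Collectors.toList()));
    }

    public static void main(String[] args) {
        ConcurrentMap<BlogPostType, List<BlogPost>> grouped = groupConcurrently(samplePosts());
        System.out.println(grouped.get(BlogPostType.NEWS).size()); // 2
        System.out.println(grouped instanceof ConcurrentSkipListMap); // true
    }
}
```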
4. Conclusion
In this article, we have seen several examples of the usage of the groupingBy collector that is offered by the Java 8 Collectors API.
We saw how groupingBy can be used to classify a stream of elements based on one of their attributes and how the results of the classification can be further collected, mutated and reduced to final containers.
The complete implementation of the examples for this article can be found in the GitHub project.