1. Introduction
In this article, we’ll be looking at how Stream implementations differ in Java and Vavr.
This article assumes familiarity with the basics of both the Java Stream API and the Vavr library.
2. Comparison
Both implementations represent the same concept of lazy sequences but differ in details.
Java Streams were built with robust parallelism in mind and provide easy support for parallelization. Vavr’s implementation, on the other hand, favors convenient work with sequences of data and provides no native support for parallelism (although it can be achieved by converting an instance to a Java implementation).
This is why Java Streams are backed by Spliterator instances, an upgrade to the much older Iterator, while Vavr’s implementation is backed by the aforementioned Iterator (at least in one of the latest implementations).
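To make the difference between the two backing abstractions more concrete, here is a minimal sketch that uses only the JDK. The class and method names (SpliteratorSketch, splitSizes, countWithIterator) are made up for illustration. It shows that a Spliterator can hand off part of its elements via trySplit(), which is what parallel processing builds on, while an Iterator can only be walked from front to back:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Spliterator;

// Hypothetical helper class, not part of the JDK or Vavr.
class SpliteratorSketch {

    // Splits the list's Spliterator once and returns the estimated sizes
    // of both parts as {prefix, remainder}.
    static long[] splitSizes(List<Integer> source) {
        Spliterator<Integer> remainder = source.spliterator();
        // trySplit() may return null if the source cannot be split any further
        Spliterator<Integer> prefix = remainder.trySplit();
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, remainder.estimateSize() };
    }

    // An Iterator offers no way to split; it is consumed strictly in sequence.
    static int countWithIterator(List<Integer> source) {
        int count = 0;
        for (Iterator<Integer> it = source.iterator(); it.hasNext(); it.next()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            numbers.add(i);
        }
        long[] sizes = splitSizes(numbers);
        System.out.println("prefix=" + sizes[0] + ", remainder=" + sizes[1]);
        System.out.println("iterated=" + countWithIterator(numbers));
    }
}
```

For an ArrayList, the two parts should come out roughly equal in size; other sources may split differently or not at all.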
Both implementations are loosely tied to their backing data structures and are essentially facades on top of the data source that the stream traverses. However, since Vavr’s implementation is Iterator-based, it doesn’t tolerate concurrent modification of the source collection.
Java’s handling of stream sources makes it possible for well-behaved stream sources to be modified before the terminal stream operation gets executed.
The fundamental design difference notwithstanding, Vavr provides a very robust API that converts its streams (and other data structures) to their Java counterparts.
3. Additional Functionality
The approach to dealing with streams and their elements leads to interesting differences in the ways we can work with them in Java and Vavr.
3.1. Random Element Access
Providing a convenient API and access methods to elements is one area where Vavr truly outshines the Java API. For example, Vavr has some methods that provide random element access:
- get() provides index-based access to elements of a stream.
- indexOf() provides the same index location functionality as in the standard Java List.
- insert() provides the ability to add an element to a stream at a specified position.
- intersperse() will insert the provided argument in between all the elements of the stream.
- find() will locate and return an item from within the stream. Java’s anyMatch() and noneMatch(), by comparison, only check whether a matching element exists.
- update() will replace the element at a given index. This also accepts a function to compute the replacement.
- search() will locate an item in a sorted stream (unsorted streams will yield an undefined result).
It’s important to remember that this functionality is still backed by a data structure that has linear performance for searches.
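For comparison, the following sketch shows roughly what index-based access and element lookup require on a plain Java stream. The helper names (ElementAccessSketch, elementAt, firstMatching) are hypothetical and exist only for this illustration. Both helpers walk the stream from the start, so they are linear as well:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical helper class, not part of the JDK or Vavr.
class ElementAccessSketch {

    // Approximates an index-based lookup: skip the first `index` elements,
    // then take the next one, if there is one.
    static <T> Optional<T> elementAt(List<T> source, long index) {
        return source.stream().skip(index).findFirst();
    }

    // Approximates a lookup by predicate: filter, then take the first match.
    static <T> Optional<T> firstMatching(List<T> source, Predicate<? super T> predicate) {
        return source.stream().filter(predicate).findFirst();
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("foo", "bar", "baz");
        System.out.println(elementAt(items, 1));
        System.out.println(elementAt(items, 5));
        System.out.println(firstMatching(items, s -> s.startsWith("ba")));
    }
}
```

Each call has to create a fresh stream, since a Java stream can be consumed only once.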
3.2. Parallelism and Concurrent Modification
While Vavr’s Streams don’t natively support parallelism like Java’s parallel() method, there is the toJavaParallelStream method that provides a parallelized Java-based copy of the source Vavr stream.
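Once we have a Java stream in hand, parallel execution is a matter of switching the pipeline to parallel mode. The sketch below uses only the JDK and a made-up helper (ParallelSketch.sumOfSquares) to show that the sequential and parallel variants of the same pipeline produce the same result for an associative reduction:

```java
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the JDK or Vavr.
class ParallelSketch {

    // Sums the squares of 1..upTo, optionally in parallel.
    static long sumOfSquares(int upTo, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, upTo);
        if (parallel) {
            range = range.parallel(); // switches the whole pipeline to parallel mode
        }
        return range.mapToLong(i -> (long) i * i).sum();
    }

    public static void main(String[] args) {
        System.out.println("sequential: " + sumOfSquares(1000, false));
        System.out.println("parallel:   " + sumOfSquares(1000, true));
    }
}
```

Whether the parallel variant is actually faster depends on the workload and the data source, so it is worth measuring before relying on it.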
One area of relative weakness in Vavr streams is the principle of non-interference.
Simply put, Java streams allow us to modify the underlying data source right up until a terminal operation is called. As long as a terminal operation hasn’t been called on a given Java stream, the stream can pick up any changes to the underlying data source:
    List<Integer> intList = new ArrayList<>();
    intList.add(1);
    intList.add(2);
    intList.add(3);
    Stream<Integer> intStream = intList.stream(); // form the stream (java.util.stream.Stream)
    intList.add(5); // modify the underlying list
    intStream.forEach(i -> System.out.println("in a Java stream: " + i));
Since the modification happens before the terminal operation starts, the last addition is reflected in the output from the stream (modifying a non-concurrent source while the pipeline is executing would count as interference):
    in a Java stream: 1
    in a Java stream: 2
    in a Java stream: 3
    in a Java stream: 5
We find that a Vavr stream won’t tolerate this:
    Stream<Integer> vavrStream = Stream.ofAll(intList); // io.vavr.collection.Stream
    intList.add(5); // modify the underlying list
    vavrStream.forEach(i -> System.out.println("in a Vavr Stream: " + i));
What we get:
    Exception in thread "main" java.util.ConcurrentModificationException
      at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
      at java.util.ArrayList$Itr.next(ArrayList.java:851)
      at io.vavr.collection.StreamModule$StreamFactory.create(Stream.java:2078)
Vavr streams are not “well-behaved”, by Java standards. Vavr is better-behaved with primitive backing data structures:
    int[] aStream = new int[]{1, 2, 4};
    Stream<Integer> wrapped = Stream.ofAll(aStream);
    aStream[2] = 5;
    wrapped.forEach(i -> System.out.println("Vavr looped " + i));
Giving us:
    Vavr looped 1
    Vavr looped 2
    Vavr looped 5
3.3. Short-circuiting Operations and flatMap()
Like map, flatMap is an intermediate operation in stream processing. Both implementations follow the contract of intermediate stream operations: processing of the underlying data structure shouldn’t occur until a terminal operation has been called.
JDK 8 and 9, however, feature a bug that causes the flatMap implementation to break this contract and evaluate eagerly when combined with short-circuiting operations like findFirst or limit.
A simple example:
    Stream.of(42)
      .flatMap(i -> Stream.generate(() -> {
          System.out.println("nested call");
          return 42;
      }))
      .findAny();
In the above snippet, we will never get a result from findAny because flatMap will be evaluated eagerly, instead of simply taking a single element from the nested Stream.
A fix for this bug was provided in Java 10.
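One way to check how a given JDK behaves, without risking an endless loop, is to use a finite nested stream and count how many of its elements are actually pulled. The following is only a sketch under that assumption; the class and method names (FlatMapLazinessSketch, firstAndPulled) are hypothetical:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the JDK or Vavr.
class FlatMapLazinessSketch {

    // Returns {first element found, number of nested elements pulled}.
    // nestedSize is expected to be at least 1.
    static int[] firstAndPulled(int nestedSize) {
        AtomicInteger pulled = new AtomicInteger();
        Optional<Integer> first = Stream.of(42)
            .flatMap(i -> IntStream.range(0, nestedSize)
                .boxed()
                .peek(n -> pulled.incrementAndGet())) // counts every nested element that is traversed
            .findFirst();
        return new int[] { first.orElse(-1), pulled.get() };
    }

    public static void main(String[] args) {
        int[] result = firstAndPulled(1000);
        System.out.println("first element: " + result[0]);
        System.out.println("nested elements pulled: " + result[1]);
    }
}
```

On a JDK that includes the fix, the counter would be expected to stay at 1, while an affected release would be expected to traverse the entire nested stream before returning.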
Vavr’s flatMap doesn’t have the same problem and a functionally similar operation completes in O(1):
    Stream.of(42)
      .flatMap(i -> Stream.continually(() -> {
          System.out.println("nested call");
          return 42;
      }))
      .get(0);
3.4. Core Vavr Functionality
In some areas, there just isn’t a one-to-one comparison between Java and Vavr; Vavr enhances the streaming experience with functionality that has no direct counterpart in Java (or at least requires a fair amount of manual work):
- zip() pairs up items in the stream with those from a supplied Iterable. This operation was supported in early JDK 8 builds but was removed after build 93.
- partition() will split the content of a stream into two streams, given a predicate.
- permutations(), as the name suggests, will compute the permutations (all possible unique orderings) of the elements of the stream.
- combinations() gives the combinations (i.e., the possible selections of items) of the stream.
- groupBy() will return a Map of streams containing elements from the original stream, categorized by a supplied classifier.
- distinct() in Vavr improves on the Java version with a variant, distinctBy(), that accepts a Comparator (for example, a compareTo-style lambda expression).
While the support for advanced functionality is somewhat uninspired in Java SE streams, Expression Language 3.0 oddly provides support for far more functionality than standard JDK streams.
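To give a feel for the manual work involved on the Java side, here is a sketch of rough JDK-only approximations of partitioning, grouping, and zipping. All names in it (JavaEquivalentsSketch, partitionEven, groupByLength, zip) are invented for this example. Unlike the Vavr operations, these collect eagerly into Lists and Maps instead of returning streams:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class, not part of the JDK or Vavr.
class JavaEquivalentsSketch {

    // Rough counterpart of partitioning by a predicate.
    static Map<Boolean, List<Integer>> partitionEven(List<Integer> numbers) {
        return numbers.stream().collect(Collectors.partitioningBy(n -> n % 2 == 0));
    }

    // Rough counterpart of grouping by a classifier.
    static Map<Integer, List<String>> groupByLength(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(String::length));
    }

    // Java streams have no zip, so we pair elements by index and
    // stop at the end of the shorter list.
    static List<String> zip(List<String> left, List<Integer> right) {
        int size = Math.min(left.size(), right.size());
        return IntStream.range(0, size)
            .mapToObj(i -> left.get(i) + "=" + right.get(i))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(partitionEven(Arrays.asList(1, 2, 3, 4)));
        System.out.println(groupByLength(Arrays.asList("a", "bb", "cc")));
        System.out.println(zip(Arrays.asList("a", "b", "c"), Arrays.asList(1, 2)));
    }
}
```

The index-based zip only makes sense for sources with cheap random access, such as an ArrayList.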
4. Stream Manipulation
Vavr allows direct manipulation of the content of a stream:
- Insert into an existing Vavr stream
    Stream<String> vavredStream = Stream.of("foo", "bar", "baz");
    vavredStream.forEach(item -> System.out.println("List items: " + item));
    Stream<String> vavredStream2 = vavredStream.insert(2, "buzz");
    vavredStream2.forEach(item -> System.out.println("List items: " + item));
- Remove an item from a stream
    Stream<String> removed = vavredStream2.remove("buzz");
- Queue-Based Operations
Because Vavr’s stream is backed by a queue, it provides constant-time prepend and append operations.
However, changes made to the Vavr stream don’t propagate back to the data source that the stream was created from.
5. Conclusion
Vavr and Java both have their strengths, and we’ve demonstrated each library’s commitment to its design objectives: Java to cheap parallelism and Vavr to convenient stream operations.
With Vavr’s support for converting back and forth between its own stream and Java’s, one can derive the benefits of both libraries in the same project without a lot of overhead.
The source code for this tutorial is available over on GitHub.