1. Introduction
Splitting a string is a very common task, and most of the time we split on just one delimiter.
In this tutorial, we'll discuss in detail different options for splitting a string by multiple delimiters.
2. Splitting a Java String by Multiple Delimiters
In order to show how each of the solutions below performs splitting, we'll use the same example string:
String example = "Mary;Thomas:Jane-Kate";
String[] expectedArray = new String[]{"Mary", "Thomas", "Jane", "Kate"};
2.1. Regex Solution
Programmers often use different regular expressions to define a search pattern for strings. They're also a very popular solution when it comes to splitting a string. So, let's see how we can use a regular expression to split a string by multiple delimiters in Java.
First, we don't need to add a new dependency: String.split accepts a regular expression directly and is backed by the java.util.regex package, which ships with the JDK. We just have to define the input string we want to split and a pattern.
The next step is to define the pattern. To split on any of several delimiters, we combine them with the alternation (OR) operator, |. The resulting pattern matches wherever any one of the listed delimiters occurs in the input string, and the string is split at every match.
We'll write a simple test to demonstrate this approach:
String[] names = example.split(";|:|-");
Assertions.assertEquals(4, names.length);
Assertions.assertArrayEquals(expectedArray, names);
We've defined a test string with names that should be split by characters in the pattern. The pattern itself contains a semicolon, a colon, and a hyphen. When applied to the example string, we'll get four names in the array.
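Because all three delimiters are single characters, the same split can also be expressed with a character class instead of alternation. The following is a minimal sketch using only the JDK; the class name CharacterClassSplitDemo and the method splitNames are hypothetical names chosen for illustration. It also precompiles the pattern, which may be worthwhile when the same split runs many times:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original examples.
class CharacterClassSplitDemo {

    // Compiled once and reused. Inside [...] a trailing hyphen is a literal
    // character, not a range operator.
    private static final Pattern DELIMITERS = Pattern.compile("[;:-]");

    static String[] splitNames(String input) {
        return DELIMITERS.split(input);
    }

    public static void main(String[] args) {
        String example = "Mary;Thomas:Jane-Kate";

        // prints [Mary, Thomas, Jane, Kate]
        System.out.println(Arrays.toString(splitNames(example)));

        // The character class and the alternation form give the same result here.
        // prints true
        System.out.println(Arrays.equals(splitNames(example), example.split(";|:|-")));
    }
}
```

The alternation form remains the more general choice, since it also works when a delimiter is longer than one character.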
2.2. Guava Solution
Guava also offers a solution for splitting a string by multiple delimiters. Its solution is based on a Splitter class. This class extracts the substrings from an input string using the separator sequence. We can define this sequence in multiple ways:
- as a single character
- a fixed string
- a regular expression
- a CharMatcher instance
The Splitter class provides two static factory methods for defining the delimiters, on and onPattern. So, let's test both of them.
First, we'll add the Guava dependency to our pom.xml file:
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>31.0.1-jre</version>
</dependency>
Then, we'll start with the on method: public static Splitter on(Pattern separatorPattern)
It takes the pattern for defining the delimiters for splitting. First, we'll define the combination of the delimiters and compile the pattern. After that, we can split the string.
In our example, we'll use a regular expression to specify the delimiters:
Iterable<String> names = Splitter.on(Pattern.compile(";|:|-")).split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);
The other method is the onPattern method: public static Splitter onPattern(String separatorPattern)
The difference between this and the previous method is that the onPattern method takes the pattern as a string. There is no need to compile it like in the on method. We'll define the same combination of the delimiters for testing the onPattern method:
Iterable<String> names = Splitter.onPattern(";|:|-").split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);
In both tests, we managed to split the string and get an Iterable with four names.
Since all of our delimiters are single characters, we can also use the anyOf method in the CharMatcher class:
Iterable<String> names = Splitter.on(CharMatcher.anyOf(";:-")).split(example);
Assertions.assertEquals(4, Iterators.size(names.iterator()));
Assertions.assertIterableEquals(Arrays.asList(expectedArray), names);
This option comes only with the on method in the Splitter class. The outcome is the same as for the previous two tests.
2.3. Apache Commons Solution
The last option we'll discuss is available in the Apache Commons Lang 3 library.
We'll start by adding the Apache Commons Lang dependency to our pom.xml file:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>
Next, we'll use the split method from the StringUtils class:
String[] names = StringUtils.split(example, ";:-");
Assertions.assertEquals(4, names.length);
Assertions.assertArrayEquals(expectedArray, names);
The second argument isn't a regular expression. It's simply the set of characters to split on, so we only have to list all the delimiter characters. Calling the split method divides the example string into four names.
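One difference from the regex approach in Section 2.1 is worth keeping in mind. According to the Commons Lang Javadoc, StringUtils.split treats adjacent separators as one separator, so it doesn't return empty tokens. String.split, by contrast, returns an empty string between two consecutive delimiters unless the pattern itself collapses them. The sketch below uses only the JDK to illustrate the String.split side of that comparison; the class name, the method names, and the input string are made up for this example:

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original examples.
class ConsecutiveDelimiterDemo {

    // Splits at every single delimiter, so consecutive delimiters
    // produce empty strings in the result.
    static String[] splitKeepingEmpty(String input) {
        return input.split("[;:-]");
    }

    // The + quantifier makes a run of delimiters count as one separator.
    static String[] splitCollapsingRuns(String input) {
        return input.split("[;:-]+");
    }

    public static void main(String[] args) {
        String messy = "Mary;;Thomas:-Jane"; // assumed input with doubled delimiters

        // prints [Mary, , Thomas, , Jane]
        System.out.println(Arrays.toString(splitKeepingEmpty(messy)));

        // prints [Mary, Thomas, Jane]
        System.out.println(Arrays.toString(splitCollapsingRuns(messy)));
    }
}
```

Which behavior is preferable depends on whether an empty field between two delimiters carries meaning in the input.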
3. Conclusion
In this article, we've seen different options for splitting an input string by multiple delimiters. First, we discussed a solution based on regular expressions and plain Java. Later, we showed different options available in Guava. Finally, we wrapped up our examples with a solution based on the Apache Commons Lang 3 library.
As always, the code for these examples is available over on GitHub.