1. Overview
Simply put, the Apache Commons Text library contains a number of useful utility methods for working with Strings, beyond what core Java offers.
In this quick introduction, we’ll see what Apache Commons Text is and what it is used for, and we’ll walk through some practical examples of using the library.
2. Maven Dependency
Let’s start by adding the following Maven dependency to our pom.xml:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.1</version>
</dependency>
You can find the latest version of the library at the Maven Central Repository.
3. Package Overview
The root package org.apache.commons.text is divided into different sub-packages:
- org.apache.commons.text.diff – diffs between Strings
- org.apache.commons.text.similarity – similarities and distances between Strings
- org.apache.commons.text.translate – translating text
Let’s look at what each package can be used for in more detail.
3. Handling Text
The org.apache.commons.text package contains multiple tools for working with Strings.
For instance, WordUtils has APIs capable of capitalizing the first letter of each word in a String, swapping the case of a String, and checking if a String contains all words in a given array.
Let’s see how we can capitalize the first letter of each word in a String:
@Test
public void whenCapitalized_thenCorrect() {
    String toBeCapitalized = "to be capitalized!";
    String result = WordUtils.capitalize(toBeCapitalized);

    assertEquals("To Be Capitalized!", result);
}
Here is how we can check if a string contains all words in an array:
@Test
public void whenContainsWords_thenCorrect() {
    boolean containsWords = WordUtils
      .containsAllWords("String to search", "to", "search");

    assertTrue(containsWords);
}
StrSubstitutor provides a convenient way to build Strings from templates. Note that as of version 1.3 of the library, StrSubstitutor is deprecated in favor of StringSubstitutor:
@Test
public void whenSubstituted_thenCorrect() {
    Map<String, String> substitutes = new HashMap<>();
    substitutes.put("name", "John");
    substitutes.put("college", "University of Stanford");
    String templateString = "My name is ${name} and I am a student at the ${college}.";
    StrSubstitutor sub = new StrSubstitutor(substitutes);
    String result = sub.replace(templateString);

    assertEquals("My name is John and I am a student at the University of Stanford.", result);
}
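To make the substitution step more concrete, here is a minimal plain-Java sketch of the same idea using a regular expression. The TemplateSketch class and its replace() method are hypothetical names made up for this illustration; this is not how StrSubstitutor is implemented, and it assumes placeholders always have the simple ${key} form:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Commons Text.
final class TemplateSketch {

    // Matches a placeholder of the form ${key} and captures the key.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    static String replace(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String key = matcher.group(1);
            // In this sketch, a key with no value keeps its placeholder as-is.
            String value = values.getOrDefault(key, matcher.group());
            // quoteReplacement stops '$' and '\' in the value being read as group references.
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Calling TemplateSketch.replace() with the template and map from the test above should produce the same sentence that the test asserts.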
StrBuilder is an alternative to java.lang.StringBuilder. It provides some features that StringBuilder lacks. As of version 1.3 of the library, it is deprecated in favor of TextStringBuilder.
For example, we can replace all occurrences of a String in another String, or clear a String without assigning a new object to its reference.
Here’s a quick example of replacing part of a String:
@Test
public void whenReplaced_thenCorrect() {
    StrBuilder strBuilder = new StrBuilder("example StrBuilder!");
    strBuilder.replaceAll("example", "new");

    assertEquals(new StrBuilder("new StrBuilder!"), strBuilder);
}
To clear a String, we simply call the clear() method on the builder:
strBuilder.clear();
4. Calculating the Diff between Strings
The package org.apache.commons.text.diff implements Myers’ algorithm for calculating diffs between two Strings.
The diff between two Strings is defined as a sequence of modifications that converts one String into the other.
Three types of commands can be used in this conversion: InsertCommand, KeepCommand, and DeleteCommand.
An EditScript object holds the script that should be run in order to convert one String into another. Only insertions and deletions count as modifications; keeping a character does not. Let’s calculate the number of single-char modifications needed to convert one String into another:
@Test
public void whenEditScript_thenCorrect() {
    StringsComparator cmp = new StringsComparator("ABCFGH", "BCDEFG");
    EditScript<Character> script = cmp.getScript();
    int mod = script.getModifications();

    assertEquals(4, mod);
}
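To see where the value 4 comes from, it can help to list the individual commands. The following plain-Java sketch builds a keep/insert/delete script from a longest-common-subsequence table. This is a simpler dynamic-programming technique swapped in for illustration, not the Myers algorithm the library uses, and the EditScriptSketch class and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Commons Text.
final class EditScriptSketch {

    // Returns a list of "keep x", "insert x" and "delete x" commands turning 'from' into 'to'.
    static List<String> script(String from, String to) {
        int n = from.length();
        int m = to.length();
        // lcs[i][j] = length of the longest common subsequence of from[i..] and to[j..]
        int[][] lcs = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                lcs[i][j] = from.charAt(i) == to.charAt(j)
                    ? lcs[i + 1][j + 1] + 1
                    : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        List<String> commands = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < n && j < m) {
            if (from.charAt(i) == to.charAt(j)) {
                commands.add("keep " + from.charAt(i));
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                commands.add("delete " + from.charAt(i));
                i++;
            } else {
                commands.add("insert " + to.charAt(j));
                j++;
            }
        }
        // Whatever is left over on either side is deleted or inserted.
        while (i < n) {
            commands.add("delete " + from.charAt(i++));
        }
        while (j < m) {
            commands.add("insert " + to.charAt(j++));
        }
        return commands;
    }

    // Only insertions and deletions count as modifications.
    static long modifications(List<String> commands) {
        return commands.stream().filter(c -> !c.startsWith("keep")).count();
    }
}
```

For "ABCFGH" and "BCDEFG", this sketch yields delete A, keep B, keep C, insert D, insert E, keep F, keep G, delete H: two deletions and two insertions, which matches the four modifications asserted above.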
5. Similarities and Distances between Strings
The org.apache.commons.text.similarity package contains algorithms useful for finding similarities and distances between Strings.
For example, LongestCommonSubsequence can be used to find the length of the longest common subsequence of two Strings, that is, the largest number of characters that appear in both Strings in the same order:
@Test
public void whenCompare_thenCorrect() {
    LongestCommonSubsequence lcs = new LongestCommonSubsequence();
    int countLcs = lcs.apply("New York", "New Hampshire");

    assertEquals(5, countLcs);
}
Similarly, LongestCommonSubsequenceDistance can be used to find the number of characters in two Strings that are not part of their longest common subsequence:
@Test
public void whenCalculateDistance_thenCorrect() {
    LongestCommonSubsequenceDistance lcsd = new LongestCommonSubsequenceDistance();
    int countLcsd = lcsd.apply("New York", "New Hampshire");

    assertEquals(11, countLcsd);
}
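The two results are related: the distance is the sum of both lengths minus twice the length of the longest common subsequence. Here that is 8 + 13 - 2 * 5 = 11. The following plain-Java sketch shows that relationship with a textbook dynamic-programming computation; the LcsSketch class and its method names are hypothetical and not part of the library:

```java
// Hypothetical helper for illustration only; not part of Commons Text.
final class LcsSketch {

    // Length of the longest common subsequence, keeping only two rows of the DP table.
    static int lcsLength(CharSequence left, CharSequence right) {
        int[] prev = new int[right.length() + 1];
        for (int i = 1; i <= left.length(); i++) {
            int[] curr = new int[right.length() + 1];
            for (int j = 1; j <= right.length(); j++) {
                curr[j] = left.charAt(i - 1) == right.charAt(j - 1)
                    ? prev[j - 1] + 1
                    : Math.max(prev[j], curr[j - 1]);
            }
            prev = curr;
        }
        return prev[right.length()];
    }

    // Characters of both inputs that are not part of the common subsequence.
    static int lcsDistance(CharSequence left, CharSequence right) {
        return left.length() + right.length() - 2 * lcsLength(left, right);
    }
}
```

For "New York" and "New Hampshire", the common subsequence is "New " followed by "r", which gives a length of 5 and a distance of 11.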
6. Text Translation
The org.apache.commons.text.translate package was initially created to allow us to customize the rules provided by StringEscapeUtils.
The package has a set of classes responsible for translating text into different escaped representations, such as Unicode escapes and numeric character references. We can also create our own customized translation routines.
Let’s see how we can convert a String to its equivalent Unicode text:
@Test
public void whenTranslate_thenCorrect() {
    UnicodeEscaper ue = UnicodeEscaper.above(0);
    String result = ue.translate("ABCD");

    assertEquals("\\u0041\\u0042\\u0043\\u0044", result);
}
Here, the argument to above() is a code point threshold: only characters whose code point is above that value (exclusive) are escaped. Since every character in "ABCD" has a code point greater than 0, all of them are translated.
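The effect of the threshold can be sketched in plain Java. The UnicodeEscapeSketch class and its escapeAbove() method below are hypothetical names for this illustration only. The sketch works on UTF-16 code units and assumes input within the Basic Multilingual Plane, so it is a simplification rather than a copy of what UnicodeEscaper does:

```java
// Hypothetical helper for illustration only; not part of Commons Text.
final class UnicodeEscapeSketch {

    // Escapes every char whose value is strictly greater than the threshold.
    static String escapeAbove(String input, int threshold) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c > threshold) {
                // Backslash, the letter u, then four uppercase hex digits.
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With a threshold of 0, "ABCD" becomes the same escaped text as in the test above. With a threshold of 0x41, the letter A is left alone and only B, C and D are escaped.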
LookupTranslator enables us to define our own lookup table in which each character sequence has a corresponding replacement, and then translate any text using that table.
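The idea of a lookup table can be sketched in plain Java as follows. The LookupSketch class and its translate() method are hypothetical names, and trying the longest candidate key first at each position is a design choice of this sketch, not a statement about the library’s internals:

```java
import java.util.Map;

// Hypothetical helper for illustration only; not part of Commons Text.
final class LookupSketch {

    static String translate(String input, Map<String, String> table) {
        // The longest key bounds how far ahead we need to look at each position.
        int longest = 0;
        for (String key : table.keySet()) {
            longest = Math.max(longest, key.length());
        }
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            boolean matched = false;
            // Try the longest candidate first so that "ab" wins over "a".
            for (int len = Math.min(longest, input.length() - i); len >= 1; len--) {
                String replacement = table.get(input.substring(i, i + len));
                if (replacement != null) {
                    out.append(replacement);
                    i += len;
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No table entry starts here, so the character is copied unchanged.
                out.append(input.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

For example, with a table that maps "<" to "&lt;" and "&" to "&amp;", the text "a < b & c" is translated to "a &lt; b &amp; c".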
7. Conclusion
In this quick tutorial, we’ve seen an overview of what Apache Commons Text is all about and some of its common features.
The code samples can be found over on GitHub.