1. Introduction
We can find use (or abuse) of regular expressions in pretty much every kind of software, from quick scripts to incredibly complex applications.
In this article, we’ll see how to use regular expressions in Kotlin.
We won’t discuss regular expression syntax here. Following the article requires a general familiarity with regular expressions, and knowledge of the Java Pattern syntax in particular is recommended.
2. Setup
While regular expressions aren’t part of the Kotlin language, they do come with its standard library.
We probably already have it as a dependency of our project:
<dependency>
    <groupId>org.jetbrains.kotlin</groupId>
    <artifactId>kotlin-stdlib</artifactId>
    <version>1.2.21</version>
</dependency>
We can find the latest version of kotlin-stdlib on Maven Central.
3. Creating a Regular Expression Object
Regular expressions are instances of the kotlin.text.Regex class. We can create one in several ways.
One possibility is to call the Regex constructor:
Regex("a[bc]+d?")
or we can call the toRegex method on a String:
"a[bc]+d?".toRegex()
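Finally, there is a static factory method, which behaves differently:
Regex.fromLiteral("a[bc]+d?")
The first two options are equivalent, and choosing between them is a matter of personal preference. Just remember to be consistent! Regex.fromLiteral, instead, creates a regular expression that matches the given String literally: metacharacters and escape sequences in it have no special meaning, as with the LITERAL option described in the next section.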
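To make the difference with fromLiteral concrete, here is a minimal sketch that builds a Regex from the same String in both ways. The helper names patternMatches and literalMatches are hypothetical and exist only for this illustration:

```kotlin
// Hypothetical helpers, for illustration only.

// Interprets the String as a pattern: 'a', then one or more of 'b'/'c', then an optional 'd'.
fun patternMatches(input: String): Boolean =
    Regex("a[bc]+d?").matches(input)

// Interprets the String literally: only the exact text a[bc]+d? matches.
fun literalMatches(input: String): Boolean =
    Regex.fromLiteral("a[bc]+d?").matches(input)
```

With these definitions, patternMatches("abcd") is true while literalMatches("abcd") is false, and the opposite holds for the input "a[bc]+d?".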
Tip: regular expressions often contain characters that would be interpreted as escape sequences in String literals. We can thus use raw Strings to forget about multiple levels of escaping:
"""a[bc]+d?\W""".toRegex()
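As a small sketch of what raw Strings save us, the two declarations below should produce the same pattern. The names escapedDecimal and rawDecimal are assumptions made up for this example:

```kotlin
// Regular String literal: every backslash meant for the regex engine must be doubled.
val escapedDecimal = "\\d+\\.\\d+".toRegex()

// Raw String literal: backslashes are passed through as they are written.
val rawDecimal = """\d+\.\d+""".toRegex()
```

Both expressions expose the same text through their pattern property, so the choice only affects readability of the source code.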
3.1. Matching Options
Both the Regex constructor and the toRegex method allow us to specify a single additional option or a set:
Regex("a(b|c)+d?", CANON_EQ)
Regex("a(b|c)+d?", setOf(DOT_MATCHES_ALL, COMMENTS))
"a(b|c)+d?".toRegex(MULTILINE)
"a(b|c)+d?".toRegex(setOf(IGNORE_CASE, COMMENTS, UNIX_LINES))
Options are enumerated in the RegexOption class, which we conveniently imported statically in the example above:
- IGNORE_CASE – enables case-insensitive matching
- MULTILINE – changes the meaning of ^ and $ (see Pattern)
- LITERAL – causes metacharacters or escape sequences in the pattern to be given no special meaning
- UNIX_LINES – in this mode, only the \n is recognized as a line terminator
- COMMENTS – permits whitespace and comments in the pattern
- DOT_MATCHES_ALL – causes the dot to match any character, including a line terminator
- CANON_EQ – enables equivalence by canonical decomposition (see Pattern)
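The effect of an option is easiest to see by running the same pattern with and without it. The following is a minimal sketch for MULTILINE and IGNORE_CASE; the function names and the sample pattern are hypothetical:

```kotlin
// Counts how many times "item" appears at a position matched by ^.
// Without MULTILINE, ^ only matches at the start of the whole input;
// with MULTILINE, it also matches right after each line terminator.
fun countLineStarts(text: String, multiline: Boolean): Int {
    val regex = if (multiline) Regex("^item", RegexOption.MULTILINE) else Regex("^item")
    return regex.findAll(text).count()
}

// IGNORE_CASE makes the comparison case-insensitive.
fun isKotlin(word: String): Boolean =
    Regex("kotlin", RegexOption.IGNORE_CASE).matches(word)
```

For the two-line input "item a\nitem b", countLineStarts returns 1 without MULTILINE and 2 with it.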
4. Matching
We use regular expressions primarily to match input Strings, and sometimes to extract or replace parts of them.
We’ll now look in detail at the methods offered by Kotlin’s Regex class for matching Strings.
4.1. Checking Partial or Total Matches
In these use cases, we’re interested in knowing whether a String or a portion of a String satisfies our regular expression.
If we only need a partial match, we can use containsMatchIn:
val regex = """a([bc]+)d?""".toRegex()
assertTrue(regex.containsMatchIn("xabcdy"))
If we want the whole String to match instead, we use matches:
assertTrue(regex.matches("abcd"))
Note that we can use matches as an infix operator as well:
assertFalse(regex matches "xabcdy")
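A typical place where the distinction matters is input validation, where we usually want a total match, as opposed to searching, where a partial match is enough. The sketch below assumes a made-up hexColor pattern and two hypothetical helpers:

```kotlin
// Hypothetical pattern for a six-digit hexadecimal color such as #1a2B3c.
val hexColor = """#[0-9a-fA-F]{6}""".toRegex()

// Validation: the whole input must be a color and nothing else.
fun isHexColor(input: String): Boolean = hexColor.matches(input)

// Search: it is enough that a color appears somewhere in the input.
fun mentionsHexColor(input: String): Boolean = hexColor.containsMatchIn(input)
```

Using containsMatchIn for validation is a common mistake, since it accepts any input that merely contains a valid value.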
4.2. Extracting Matching Components
In these use cases, we want to match a String against a regular expression and extract parts of the String.
We might want to match the entire String:
val matchResult = regex.matchEntire("abbccbbd")
Or we might want to find the first substring that matches:
val matchResult = regex.find("abcbabbd")
Or maybe to find all the matching substrings at once, as a Sequence:
val matchResults = regex.findAll("abcb abbd")
In any case, if the match is successful, the result will be one or more instances of the MatchResult class. In the next section, we’ll see how to use it.
If the match is not successful, matchEntire and find return null, while findAll returns an empty Sequence.
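Since find and matchEntire return a nullable MatchResult and findAll returns a lazily evaluated Sequence<MatchResult>, calling code typically handles the two shapes differently. Here is a minimal sketch, with hypothetical helper names, of one way to do that:

```kotlin
val sampleRegex = """a([bc]+)d?""".toRegex()

// findAll is lazy; converting to a List forces evaluation of all the matches.
fun allMatches(input: String): List<String> =
    sampleRegex.findAll(input).map { it.value }.toList()

// The safe-call operator turns "no match" into a null result instead of an exception.
fun firstMatchOrNull(input: String): String? =
    sampleRegex.find(input)?.value
```

For the input "abcb abbd", allMatches yields the two substrings "abcb" and "abbd"; for an input with no match it yields an empty list.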
4.3. The MatchResult Class
Instances of the MatchResult class represent successful matches of some input string against a regular expression; either complete or partial matches (see the previous section).
As such, they have a value, which is the matched String or substring:
val regex = """a([bc]+)d?""".toRegex()
val matchResult = regex.find("abcb abbd")
assertEquals("abcb", matchResult!!.value)
And they have a range of indices to indicate what portion of the input was matched:
assertEquals(IntRange(0, 3), matchResult.range)
4.4. Groups and Destructuring
We can also extract groups (matched substrings) from MatchResult instances.
We can obtain them as Strings:
assertEquals(listOf("abcb", "bcb"), matchResult.groupValues)
Or we can also view them as MatchGroup objects consisting of a value and a range:
assertEquals(IntRange(1, 3), matchResult.groups[1]!!.range)
The group with index 0 is always the entire matched String. Indices greater than 0, instead, represent groups in the regular expression, delimited by parentheses, such as ([bc]+) in our example.
We can also destructure MatchResult instances in an assignment statement:
val regex = """([\w\s]+) is (\d+) years old""".toRegex()
val matchResult = regex.find("Mickey Mouse is 95 years old")
val (name, age) = matchResult!!.destructured
assertEquals("Mickey Mouse", name)
assertEquals("95", age)
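The !! operator in the example above throws an exception when the input doesn't match. Outside of tests, we may prefer to return null instead. The following sketch shows one possible approach; the Person class and the parsePerson function are hypothetical names introduced only for this illustration:

```kotlin
// Hypothetical target type for the extracted values.
data class Person(val name: String, val age: Int)

val personRegex = """([\w\s]+) is (\d+) years old""".toRegex()

// Returns null when the text doesn't match or the age doesn't fit in an Int.
fun parsePerson(text: String): Person? {
    val match = personRegex.find(text) ?: return null
    // destructured exposes the groups starting from index 1, skipping the whole match.
    val (name, age) = match.destructured
    val years = age.toIntOrNull() ?: return null
    return Person(name, years)
}
```

Note that destructured only covers the groups with index 1 and greater, so the number of variables on the left-hand side corresponds to the number of groups in the pattern.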
4.5. Multiple Matches
MatchResult also has a next method that we can use to obtain the next match of the input String against the regular expression, if there is any:
val regex = """a([bc]+)d?""".toRegex()
var matchResult = regex.find("abcb abbd")
assertEquals("abcb", matchResult!!.value)
matchResult = matchResult.next()
assertEquals("abbd", matchResult!!.value)
matchResult = matchResult.next()
assertNull(matchResult)
As we can see, next returns null when there are no more matches.
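Because next returns null at the end, it combines naturally with the standard generateSequence function, which stops as soon as its generator returns null. This is only a sketch to show how the pieces fit together, and matchesViaNext is a hypothetical name:

```kotlin
// Walks the chain of matches: starts from the first one and follows next() until it returns null.
fun matchesViaNext(regex: Regex, input: String): List<String> =
    generateSequence(regex.find(input)) { it.next() }
        .map { it.value }
        .toList()
```

In practice, findAll is usually the simpler choice when we want every match; next is more useful when we decide after each match whether to continue.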
5. Replacing
Another common use of regular expressions is replacing matching substrings with other Strings.
For this purpose, we have two methods readily available in the standard library.
One, replace, is for replacing all occurrences of a matching String:
val regex = """(red|green|blue)""".toRegex()
val beautiful = "Roses are red, Violets are blue"
val grim = regex.replace(beautiful, "dark")
assertEquals("Roses are dark, Violets are dark", grim)
The other, replaceFirst, is for replacing only the first occurrence:
val shiny = regex.replaceFirst(beautiful, "rainbow")
assertEquals("Roses are rainbow, Violets are blue", shiny)
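One detail to keep in mind is that the replacement String is not taken entirely literally: on the JVM, a dollar sign followed by a group index refers to the corresponding captured group, and a backslash escapes the following character. When the replacement must be inserted as it is, we can pass it through Regex.escapeReplacement first. The sketch below illustrates both cases with made-up patterns and hypothetical function names:

```kotlin
val isoDate = """(\d{4})-(\d{2})-(\d{2})""".toRegex()

// Group references: $1 is the year, $2 the month, $3 the day.
// The dollar signs are written as \$ so they are clearly not Kotlin string templates.
fun toDayFirst(text: String): String =
    isoDate.replace(text, "\$3/\$2/\$1")

// Literal replacement: escapeReplacement prevents "$5" from being read as a reference to group 5.
fun maskAmounts(text: String): String =
    """\d+ EUR""".toRegex().replace(text, Regex.escapeReplacement("\$5"))
```

Without the call to escapeReplacement, the second function would fail at runtime, because the pattern has no group with index 5.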
5.1. Complex Replacements
For more advanced scenarios, when we don’t want to replace matches with constant Strings, but we want to apply a transformation instead, Regex still gives us what we need.
Enter the replace overload taking a lambda:
val reallyBeautiful = regex.replace(beautiful) { m -> m.value.toUpperCase() + "!" }
assertEquals("Roses are RED!, Violets are BLUE!", reallyBeautiful)
As we can see, for each match, we can compute a replacement String using that match.
6. Splitting
Finally, we might want to split a String into a list of substrings according to a regular expression. Again, Kotlin’s Regex has got us covered:
val regex = """\W+""".toRegex()
val beautiful = "Roses are red, Violets are blue"
assertEquals(listOf("Roses", "are", "red", "Violets", "are", "blue"), regex.split(beautiful))
Here, the regular expression matches one or more non-word characters, so the result of the split operation is a list of words.
We can also put a limit on the length of the resulting list:
assertEquals(listOf("Roses", "are", "red", "Violets are blue"), regex.split(beautiful, 4))
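When the input starts or ends with a separator, split can return empty Strings at the corresponding end of the list. If we only care about the words, we can filter them out. This is a minimal sketch, and the words function is a hypothetical helper:

```kotlin
val nonWord = """\W+""".toRegex()

// Splits on runs of non-word characters and drops the empty Strings
// that appear when the text begins or ends with such a run.
fun words(text: String): List<String> =
    nonWord.split(text).filter { it.isNotEmpty() }
```

For example, splitting "Roses are red!" directly leaves an empty String after the final exclamation mark, while words returns only the three words.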
7. Java Interoperability
If we need to pass our regular expression to Java code, or some other JVM language API that expects an instance of java.util.regex.Pattern, we can simply convert our Regex:
regex.toPattern()
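As a sketch of what this looks like in practice, the first function below uses the java.util.regex.Matcher API on a converted Regex, and the second goes the other way with the toRegex extension that the standard library provides for Pattern on the JVM. Both function names are hypothetical:

```kotlin
// Kotlin to Java: obtain the underlying Pattern and use the classic Matcher loop.
fun countMatchesWithJavaApi(regex: Regex, input: String): Int {
    val matcher = regex.toPattern().matcher(input)
    var count = 0
    while (matcher.find()) count++
    return count
}

// Java to Kotlin: wrap an existing Pattern so we can use the Regex API on it.
fun fromJava(pattern: java.util.regex.Pattern): Regex = pattern.toRegex()
```

This makes it possible to keep a single pattern definition even when part of the code base expects the Java type.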
8. Conclusions
In this article, we’ve examined the regular expression support in the Kotlin standard library.
For further information, see the Kotlin reference.
The implementation of all these examples and code snippets can be found in the GitHub project. This is a Maven project, so it should be easy to import and run as is.