1. Overview
When working with regular expressions in Java, we typically want to search a character sequence for a given Pattern. To facilitate this, the Java Regular Expressions API provides the Matcher class, which we can use to match a given regular expression against a text.
As a general rule, we'll almost always want to use one of two popular methods of the Matcher class:
- find()
- matches()
In this quick tutorial, we'll learn about the differences between these methods using a simple set of examples.
2. The find() Method
Put simply, the find() method tries to find an occurrence of a regex pattern within a given string. If the string contains multiple occurrences, the first call to find() stops at the first one. Each subsequent call to find() then moves on to the next occurrence, one by one.
Let's imagine we want to search the provided string “goodbye 2019 and welcome 2020” for four-digit numbers only.
For this, we'll use the pattern “\\d\\d\\d\\d”:
@Test
public void whenFindFourDigitWorks_thenCorrect() {
    Pattern stringPattern = Pattern.compile("\\d\\d\\d\\d");
    Matcher m = stringPattern.matcher("goodbye 2019 and welcome 2020");

    assertTrue(m.find());
    assertEquals(8, m.start());
    assertEquals("2019", m.group());
    assertEquals(12, m.end());

    assertTrue(m.find());
    assertEquals(25, m.start());
    assertEquals("2020", m.group());
    assertEquals(29, m.end());

    assertFalse(m.find());
}
As we have two occurrences in this example – 2019 and 2020 – the find() method will return true twice, and once it reaches the end of the match region, it'll return false.
Once we find any match, we can then use methods like start(), group(), and end() to get more details about the match, as shown above.
The start() method gives the index of the first character of the match, end() returns the index just past the last matched character (so end() minus start() is the length of the match), and group() returns the matched text itself.
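Because find() returns false once no occurrences remain, it is commonly used as a loop condition to visit every match. The following is a minimal sketch of that idiom. The class name FindAllMatches and the helper findFourDigitNumbers are hypothetical names introduced only for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllMatches {

    // Hypothetical helper: lists every four-digit match as "text@start-end".
    static List<String> findFourDigitNumbers(String text) {
        Pattern pattern = Pattern.compile("\\d\\d\\d\\d");
        Matcher matcher = pattern.matcher(text);
        List<String> matches = new ArrayList<>();
        // Each successful find() moves to the next occurrence;
        // the loop ends when find() returns false.
        while (matcher.find()) {
            matches.add(matcher.group() + "@" + matcher.start() + "-" + matcher.end());
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(findFourDigitNumbers("goodbye 2019 and welcome 2020"));
        // prints [2019@8-12, 2020@25-29]
    }
}
```

The start and end values in the output are the same ones asserted in the test above.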
3. The find(int) Method
We also have an overloaded version of the find method: find(int). It takes a start index as a parameter, resets the matcher, and then looks for the next occurrence beginning at that index.
Let's see how to use this method in the same example as before:
@Test
public void givenStartIndex_whenFindFourDigitWorks_thenCorrect() {
    Pattern stringPattern = Pattern.compile("\\d\\d\\d\\d");
    Matcher m = stringPattern.matcher("goodbye 2019 and welcome 2020");

    assertTrue(m.find(20));
    assertEquals(25, m.start());
    assertEquals("2020", m.group());
    assertEquals(29, m.end());
}
Since we provided a start index of 20, only one occurrence is found: 2020, which lies after this index as expected. And, as with find(), we can use methods like start(), group(), and end() to extract more details about the match.
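It may also help to see how find(int) interacts with later calls. The sketch below assumes the same pattern and input as above. The class FindFromIndex and the helper firstFourDigitsFrom are hypothetical names used only for illustration. Note that find(int) throws an IndexOutOfBoundsException if the index is negative or greater than the length of the input:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindFromIndex {

    // Hypothetical helper: returns the first four-digit match that starts
    // at or after fromIndex, or null when there is none.
    static String firstFourDigitsFrom(String text, int fromIndex) {
        Matcher matcher = Pattern.compile("\\d\\d\\d\\d").matcher(text);
        return matcher.find(fromIndex) ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String text = "goodbye 2019 and welcome 2020";
        Matcher matcher = Pattern.compile("\\d\\d\\d\\d").matcher(text);

        // Searching from index 20 skips 2019 and lands on 2020.
        System.out.println(matcher.find(20) + " " + matcher.group()); // true 2020
        // A plain find() continues after the previous match, so nothing is left.
        System.out.println(matcher.find()); // false
        // find(int) resets the matcher first, so index 0 starts over.
        System.out.println(matcher.find(0) + " " + matcher.group()); // true 2019

        // Starting inside "2019" (index 9) leaves only "019", so the
        // next four-digit run is 2020.
        System.out.println(firstFourDigitsFrom(text, 9)); // 2020
    }
}
```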
4. The matches() Method
On the other hand, the matches() method tries to match the whole string against the pattern.
For the same example, matches() will return false:
@Test
public void whenMatchFourDigitWorks_thenFail() {
    Pattern stringPattern = Pattern.compile("\\d\\d\\d\\d");
    Matcher m = stringPattern.matcher("goodbye 2019 and welcome 2020");

    assertFalse(m.matches());
}
This is because it will try to match “\\d\\d\\d\\d” against the whole string “goodbye 2019 and welcome 2020” — unlike the find() and find(int) methods, both of which will find the occurrence of the pattern anywhere within the string.
If we change the string to the four-digit number “2019”, then matches() will return true:
@Test
public void whenMatchFourDigitWorks_thenCorrect() {
    Pattern stringPattern = Pattern.compile("\\d\\d\\d\\d");
    Matcher m = stringPattern.matcher("2019");

    assertTrue(m.matches());
    assertEquals(0, m.start());
    assertEquals("2019", m.group());
    assertEquals(4, m.end());
    assertTrue(m.matches());
}
As shown above, we can also use methods like start(), group(), and end() to gather more details about the match. One interesting point is that each call to find() can move on to a different match, so its return value and the values reported by start(), group(), and end() may change from call to call, as we saw in our first example. Repeated calls to matches(), on the other hand, always return the same value.
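To put the two behaviors side by side, here is a small sketch using the same four-digit pattern. The class FindVersusMatches and the helpers containsFourDigits and isExactlyFourDigits are hypothetical names chosen for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {

    private static final Pattern FOUR_DIGITS = Pattern.compile("\\d\\d\\d\\d");

    // Hypothetical helper: true if the pattern occurs anywhere in the text.
    static boolean containsFourDigits(String text) {
        return FOUR_DIGITS.matcher(text).find();
    }

    // Hypothetical helper: true only if the whole text is four digits.
    static boolean isExactlyFourDigits(String text) {
        return FOUR_DIGITS.matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "goodbye 2019 and welcome 2020";
        System.out.println(containsFourDigits(text));    // true
        System.out.println(isExactlyFourDigits(text));   // false
        System.out.println(isExactlyFourDigits("2019")); // true

        // matches() always tests the whole input, so repeating it is stable.
        Matcher whole = FOUR_DIGITS.matcher("2019");
        System.out.println(whole.matches()); // true
        System.out.println(whole.matches()); // true

        // find() advances past each match, so a second call can differ.
        Matcher scan = FOUR_DIGITS.matcher("2019");
        System.out.println(scan.find()); // true
        System.out.println(scan.find()); // false
    }
}
```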
5. Conclusion
In this article, we’ve seen how find(), find(int), and matches() differ from each other with a practical example. We've also seen how various methods like start(), group(), and end() can help us extract more details about a given match.
As always, the full source code of the article is available over on GitHub.