1. Overview
When we work with Java, we often encounter tasks that require precision and a collaborative effort between elements. Removing characters from a string based on their presence in another string is one such problem.
In this tutorial, we’ll explore various techniques to achieve this task.
2. Introduction to the Problem
As usual, an example can help us understand the problem quickly. Let’s say we have two strings:
String STRING = "a b c d e f g h i j";
String OTHER = "bdfhj";
Our goal is to eliminate characters from the STRING string if they are present in the string OTHER. Thus, we expect to get this string as the result:
"a c e g i "
We’ll learn various approaches to solving this problem in this tutorial. Also, we’ll unit test these solutions to verify whether they produce the expected result.
3. Using Nested Loops
We know a string can be easily split into a char array using the standard toCharArray() method. So, a straightforward and classic approach is first converting the two strings to two char arrays. Then, for each character in STRING, we decide whether to remove it or not by checking if it’s present in OTHER.
We can use nested for loops to implement this logic:
String nestedLoopApproach(String theString, String other) {
StringBuilder sb = new StringBuilder();
for (char c : theString.toCharArray()) {
boolean found = false;
for (char o : other.toCharArray()) {
if (c == o) {
found = true;
break;
}
}
if (!found) {
sb.append(c);
}
}
return sb.toString();
}
It’s worth noting since Java strings are immutable objects, we use StringBuilder instead of the ‘+’ operator to concatenate strings to gain better performance.
Next, let’s create a test:
String result = nestedLoopApproach(STRING, OTHER);
assertEquals("a c e g i ", result);
The test passes if we give it a run, so the method does the job.
Since for each character in STRING, we check through the string OTHER, the time complexity of this solution is O(n2).
4. Replacing the Inner Loop With the indexOf() Method
In the nested loops solution, we created the boolean flag found to store if the current character has been found in the OTHER String and then decided if we need to keep or discard this character by checking the found flag.
Java provides the String.indexOf() method that allows us to locate a given character in a string. Further, if the string doesn’t contain the given character, the method returns -1.
So, if we make use of the String.indexOf() method, the inner loop and the found flag aren’t required:
String loopAndIndexOfApproach(String theString, String other) {
StringBuilder sb = new StringBuilder();
for (char c : theString.toCharArray()) {
if (other.indexOf(c) == -1) {
sb.append(c);
}
}
return sb.toString();
}
As we can see, this method’s code is easier to understand than the nested loops one, and it passes the test as well:
String result = loopAndIndexOfApproach(STRING, OTHER);
assertEquals("a c e g i ", result);
Although this implementation is compact and easy to read, as the String.indexOf() method internally searches the target character through the string by a loop, its time complexity is still O(n2).
Next, let’s see if we can find a solution with lower time complexity.
5. Using a HashSet
HashSet is a commonly used collection data structure. It stores the elements in an internal HashMap.
Since the hash function’s time complexity is O(1), HashSet‘s contains() method is an O(1) operation.
Therefore, we can first store all characters in the OTHER string in a HashSet and then check each character from STRING in the HashSet:
String hashSetApproach(String theString, String other) {
StringBuilder sb = new StringBuilder();
Set<Character> set = new HashSet<>(other.length());
for (char c : other.toCharArray()) {
set.add(c);
}
for (char i : theString.toCharArray()) {
if (set.contains(i)) {
continue;
}
sb.append(i);
}
return sb.toString();
}
As the code above shows, the implementation is quite straightforward. Now, let’s delve into its performance.
Initially, we iterate through one string to populate the Set object, making it an O(n) operation. Subsequently, for each character in the other string, we utilize the set.contains() method. This results in n times O(1), becoming another O(n) complexity. Therefore, the entire solution comprises two O(n) operations.
However, since the factor of two is a constant, the overall time complexity of the solution remains O(n). This stands out as a significant improvement compared to previous O(n2) solutions, demonstrating a considerably faster execution.
Finally, if we test the hashSetApproach() method, it gives the expected result:
String result = hashSetApproach(STRING, OTHER);
assertEquals("a c e g i ", result);
6. Conclusion
In this article, we explored three different approaches to removing characters from one string based on their presence in another.
Furthermore, we conducted a performance analysis, explicitly focusing on time complexity. The results revealed that both nested loops and loops utilizing indexOf() exhibit equivalent time complexities, while solutions employing HashSet to be the most efficient.
As always, the complete source code for the examples is available over on GitHub.