1. Overview
In this tutorial, we’ll discuss techniques in Java to remove repeated characters from a string.
For each technique, we’ll also talk briefly about its time and space complexity.
2. Using distinct
Let’s start by removing the duplicates from our string using the distinct method introduced in Java 8.
In the code below we are obtaining an instance of an IntStream from a given string object. Then, we’re using the distinct method to remove the duplicates. Then, we use forEach to loop over the distinct characters and append them to our StringBuilder:
StringBuilder sb = new StringBuilder(); str.chars().distinct().forEach(c -> sb.append((char) c));
Time Complexity: O(n) – runtime of the loop is directly proportional to the size of the input string
Auxiliary Space: O(n) – since distinct creates an intermediate LinkedHashSet to maintain the order
Maintains Order: Yes
And, while it’s nice that Java 8 buttons up this task for us so nicely, let’s compare it to efforts to roll our own.
3. Using indexOf
The naive approach to removing duplicates from a string simply involves looping over the input and using the method indexOf to check whether the character exists in the resulting string already:
StringBuilder sb = new StringBuilder(); int idx; for (int i = 0; i < str.length(); i++) { char c = str.charAt(i); idx = str.indexOf(c, i + 1); if (idx == -1) { sb.append(c); } }
Time Complexity: O(n * n) – runtime of the loop is directly proportional to the square of the size of the input data set
Auxiliary Space: O(1) – constant space is required to store our index and character values within the loop
Maintains Order: Yes
This does have the benefit of not having auxiliary space but performs much slower than the Core Java approach.
4. Using a Character Array
We can also remove duplicates from our string by converting it into a char array and then looping over each character and comparing it to all subsequent characters.
As we can see below, we create two for loops and are checking whether each element is repeated in the string. If a duplicate is found we increment repeatedCtr so that we don’t append it to the StringBuilder:
char[] chars = str.toCharArray(); StringBuilder sb = new StringBuilder(); boolean repeatedChar; for (int i = 0; i < chars.length; i++) { repeatedChar = false; for (int j = i + 1; j < chars.length; j++) { if (chars[i] == chars[j]) { repeatedChar = true; break; } } if (!repeatedChar) { sb.append(chars[i]); } }
Time Complexity: O(n * n) – we have an inner and outer loop both required n iterations
Auxiliary Space: O(1) – constant space is required to store repeatedCtr which depends on the number of repeated characters in the input string
Maintains Order: No
Again, our second attempt performs poorly compared to the Core Java offering, but let’s see where we get with our next attempt.
5. Using Sorting
Alternatively, repeated characters can be eliminated by sorting our input string to group duplicates. In order to do that, we have to convert the string to a char array and sort it using the Arrays static method sort. Finally, we will iterate over the sorted char array.
During every iteration, we’re going to compare each element of the array with the previous element. If the elements are different then we will append the current character to the StringBuilder:
StringBuilder sb = new StringBuilder(); if(!str.isEmpty()) { char[] chars = str.toCharArray(); Arrays.sort(chars); sb.append(chars[0]); for (int i = 1; i < chars.length; i++) { if (chars[i] != chars[i - 1]) { sb.append(chars[i]); } } }
Time Complexity: O(n log n) – comparison sorting on an array has a worst case time complexity of O(n log n)
Auxiliary Space: O(n) – the sort uses Quicksort which uses O(n) auxiliary space, and toCharArray makes a copy of the String anyway
Maintains Order: No
So, we are starting to see a space-time tradeoff. Here, we traded some space for some better performance. Let’s try that again with our final attempt.
6. Using a Set
Another way to remove repeated characters from a string is through the use of a Set. If we do not care about the order of characters in our output string we can use a HashSet. Otherwise, we can use a LinkedHashSet to maintain insertion order.
In both cases, we’ll loop over the input string and add each character to the Set. Once the characters are inserted into the set, we’ll iterate over it to add them to the StringBuilder and return our resulting string:
StringBuilder sb = new StringBuilder(); Set<Character> linkedHashSet = new LinkedHashSet<>(); for (int i = 0; i < str.length(); i++) { linkedHashSet.add(str.charAt(i)); } for (Character c : linkedHashSet) { sb.append(c); }
Time Complexity: O(n) – runtime of the loop is directly proportional to the size of the input string
Auxiliary Space: O(n) – space required for the Set depends on the size of the input string
Maintains Order: LinkedHashSet – Yes, HashSet – No
And now, we’ve matched the Core Java approach! It’s not very shocking to find out this is very similar to what distinct does already.
7. Conclusion
In this article, we covered a few ways to remove repeated characters from a string in Java. We also looked at the time and space complexity of these methods.
As always, code snippets can be found over on GitHub.