Channel: Baeldung

Sorting Alphanumeric Strings in Java


1. Introduction

Sorting alphanumeric strings is a common task in Java when dealing with mixed sequences of numbers and letters. This is particularly useful in file sorting, database indexing, and UI display formatting applications.

In this tutorial, we’ll explore different ways to sort an alphanumeric string and an array containing alphanumeric strings using Java. We’ll start with a simple lexicographic sort and move to more advanced natural sorting techniques involving arrays of strings.

2. Problem Definition

Given a string that contains letters and digits, we aim to sort it while maintaining a logical order. A purely lexicographic sort compares strings character by character, so digits sort before letters based on their ASCII values and multi-digit numbers are compared one digit at a time. A more intuitive approach treats each run of digits as a whole number. This distinction matters in use cases such as sorting filenames: for example, “file1”, “file10”, “file3” should be ordered as “file1”, “file3”, “file10”.
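To make the difference concrete, here is a minimal sketch that sorts those filenames with the default String ordering. The class name LexicographicPitfall and the method defaultSort() are hypothetical and used only for illustration:

```java
import java.util.Arrays;

public class LexicographicPitfall {

    // Sorts a copy of the input using String's natural (lexicographic) ordering
    static String[] defaultSort(String[] input) {
        String[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] files = {"file1", "file10", "file3"};
        // Compared character by character, "file10" comes before "file3" because '1' < '3'
        System.out.println(Arrays.toString(defaultSort(files))); // prints [file1, file10, file3]
    }
}
```

The default ordering never looks at “10” as a number, which is why “file10” ends up ahead of “file3”.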

For the sake of simplicity, in all our cases involving arrays of strings, we assume an ALPHANUMERIC pattern, meaning the letters come first and digits follow for each string.

We’ll explore different sorting strategies by first addressing how to sort individual alphanumeric strings lexicographically. Then, we’ll transition to array sorting techniques, which respect natural numeric order and case insensitivity.

3. Sorting a String Alphanumerically (Lexicographically)

The simplest approach involves converting the string to a character array and sorting it using Java’s built-in sorting methods:

public static String lexicographicSort(String input) {
    char[] stringChars = input.toCharArray();
    Arrays.sort(stringChars);
    return new String(stringChars);
}

Here, we take a string as input and sort its characters by their character values, which places digits before uppercase letters and uppercase letters before lowercase ones. We can test our implementation using the string “C4B3A21” as input:

@Test
void givenAlphanumericString_whenLexicographicSort_thenSortedLettersFirst() {
    String stringToSort = "C4B3A21";
    String sorted = AlphanumericSort.lexicographicSort(stringToSort);
    assertThat(sorted).isEqualTo("1234ABC");
}

As we can see above, the output is “1234ABC”, as expected. While effective for ordering the characters of a single string, this method treats every digit as a separate character, so it can’t order multi-digit numbers, such as those in filenames, by their value.
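Since a char[] sort orders purely by character value, mixed-case input ends up with all uppercase letters ahead of all lowercase ones. As a hedged illustration, assuming we’d rather keep each letter’s uppercase and lowercase forms together, one option is to sort boxed characters with a custom comparator. The class CaseInsensitiveCharSort and its method are hypothetical names, not part of the article’s code:

```java
import java.util.Arrays;

public class CaseInsensitiveCharSort {

    // Hypothetical helper: sorts characters so that letters group together regardless of case
    static String sortIgnoringCase(String input) {
        Character[] boxed = new Character[input.length()];
        for (int i = 0; i < input.length(); i++) {
            boxed[i] = input.charAt(i);
        }
        Arrays.sort(boxed, (a, b) -> {
            // Compare lowercase forms first, then raw values so that 'A' precedes 'a'
            int byLetter = Character.compare(Character.toLowerCase(a), Character.toLowerCase(b));
            return byLetter != 0 ? byLetter : Character.compare(a, b);
        });
        StringBuilder result = new StringBuilder(boxed.length);
        for (Character c : boxed) {
            result.append(c);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        char[] plain = "bA1a2B".toCharArray();
        Arrays.sort(plain);
        System.out.println(new String(plain));          // prints 12ABab
        System.out.println(sortIgnoringCase("bA1a2B")); // prints 12AaBb
    }
}
```

Boxing every character costs more than sorting a primitive char[], so this variant only makes sense when the grouping actually matters.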

4. Custom Sorting for Natural Alphanumeric Order

To overcome the issue of lexicographic sorting treating numbers as individual characters, we introduce a custom Comparator that extracts and compares numeric values properly:

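public static String[] naturalAlphanumericSort(String[] arrayToSort) {
    // Sorts the given array in place; requires java.util.Arrays and java.util.Comparator
    Arrays.sort(arrayToSort, new Comparator<String>() {
        @Override
        public int compare(String s1, String s2) {
            // Integer.compare() is the idiomatic way to order two int values
            return Integer.compare(extractInt(s1), extractInt(s2));
        }

        // Strips every non-digit character and parses what remains; no digits means 0
        private int extractInt(String str) {
            String num = str.replaceAll("\\D+", "");
            return num.isEmpty() ? 0 : Integer.parseInt(num);
        }
    });
    return arrayToSort;
}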

For each string, the extractInt() helper method removes all non-numeric characters and parses the remaining digits as an integer. The comparator then orders the strings by these extracted integers alone, so the letter portion plays no part in the result. For a hypothetical list of filenames that share the same prefix, we get a natural ordering:

@Test
void givenAlphanumericArrayOfStrings_whenNaturalAlphanumericSort_thenSortNaturalOrder() {
    String[] arrayToSort = {"file2", "file10", "file0", "file1", "file20"};
    String[] sorted = AlphanumericSort.naturalAlphanumericSort(arrayToSort);
    assertThat(Arrays.toString(sorted)).isEqualTo("[file0, file1, file2, file10, file20]");
}

With this approach, we ensure an ordering that is numerical rather than lexicographic. However, it ignores the letters altogether, so it can’t order strings with different prefixes or handle case variations, which we address in the next section.
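One further limit of parsing with Integer.parseInt() is that it throws a NumberFormatException when the digits don’t fit into an int. As a rough sketch, assuming our inputs may carry very long numbers, we could compare BigInteger values instead. The class LargeNumberNaturalSort and its methods are hypothetical names introduced for this example:

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.Comparator;

public class LargeNumberNaturalSort {

    // Collects all digits of the string; strings without digits count as zero
    static BigInteger extractNumber(String str) {
        String digits = str.replaceAll("\\D+", "");
        return digits.isEmpty() ? BigInteger.ZERO : new BigInteger(digits);
    }

    // Returns a sorted copy, leaving the caller's array untouched
    static String[] sortByNumber(String[] input) {
        String[] copy = Arrays.copyOf(input, input.length);
        Arrays.sort(copy, Comparator.comparing(LargeNumberNaturalSort::extractNumber));
        return copy;
    }

    public static void main(String[] args) {
        String[] files = {"file99999999999", "file2", "file10"};
        // 99999999999 is larger than Integer.MAX_VALUE, so Integer.parseInt() would fail on it
        System.out.println(Arrays.toString(sortByNumber(files))); // prints [file2, file10, file99999999999]
    }
}
```

Like the comparator above, this sketch still looks only at the digits, so it carries the same limitation regarding the letter portion.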

5. Sorting Mixed Case Alphanumeric Strings

To support case-insensitive sorting while still keeping the natural numeric order, we refine our Comparator:

public static String[] naturalAlphanumericCaseInsensitiveSort(String[] arrayToSort) {
    Arrays.sort(arrayToSort, Comparator.comparing((String s) -> s.replaceAll("\\d", "").toLowerCase())
      .thenComparingInt(s -> {
          String num = s.replaceAll("\\D+", "");
          return num.isEmpty() ? 0 : Integer.parseInt(num);
      }).thenComparing(Comparator.naturalOrder()));
    return arrayToSort;
}

In this implementation, we sort an array of alphanumeric strings in a case-insensitive natural order using three keys. First, we compare the non-numeric part of each string, ignoring case. Next, we compare the numeric part as an integer rather than character by character. Finally, we use the strings’ natural (lexicographic) order as a tiebreaker, which places an uppercase variant such as “A2” before its lowercase counterpart “a2”.

We test our solution with appropriate input, as well:

@Test
void givenAlphanumericArrayOfStrings_whenAlphanumericCaseInsensitiveSort_thenSortNaturalOrder() {
    String[] arrayToSort = {"a2", "A10", "b1", "B3", "A2"};
    String[] sorted = AlphanumericSort.naturalAlphanumericCaseInsensitiveSort(arrayToSort);
    assertThat(Arrays.toString(sorted)).isEqualTo("[A2, a2, A10, b1, B3]");
}

We obtain a correctly ordered array: the strings are grouped by their letters regardless of case, and within each group, the numeric values follow their natural order.
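The same three-step ordering can also be kept in a reusable constant and applied to lists. The following is only a sketch under a few assumptions: the class NaturalOrderComparators, the constant CASE_INSENSITIVE_NATURAL, and the helper methods are hypothetical names, and we use the JDK’s String.CASE_INSENSITIVE_ORDER for the letter part instead of toLowerCase():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NaturalOrderComparators {

    // Parses the digits of the string as an int; strings without digits count as zero
    static int numericPart(String s) {
        String num = s.replaceAll("\\D+", "");
        return num.isEmpty() ? 0 : Integer.parseInt(num);
    }

    // Letters ignoring case, then numeric value, then the exact string as a tiebreaker
    static final Comparator<String> CASE_INSENSITIVE_NATURAL =
      Comparator.comparing((String s) -> s.replaceAll("\\d", ""), String.CASE_INSENSITIVE_ORDER)
        .thenComparingInt(NaturalOrderComparators::numericPart)
        .thenComparing(Comparator.naturalOrder());

    // Returns a sorted copy so that the caller's list stays unchanged
    static List<String> sortedCopy(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(CASE_INSENSITIVE_NATURAL);
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a2", "A10", "b1", "B3", "A2");
        System.out.println(sortedCopy(values)); // prints [A2, a2, A10, b1, B3]
    }
}
```

Sorting a copy avoids the side effect of Arrays.sort(), which reorders the array passed in by the caller.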

6. Conclusion

We explored three approaches to sorting alphanumeric strings in Java: a basic lexicographic sort of a string’s characters, a natural sort using a custom comparator, and a case-insensitive natural sort. The best method depends on the specific requirements, such as whether numbers must be compared as whole values and whether letter case should affect the order.

The complete source code for this article can be found over on GitHub.

The post Sorting Alphanumeric Strings in Java first appeared on Baeldung.
       
