1. Overview
In this quick article, we’ll explore a fundamental class in Java – the StringTokenizer.
2. StringTokenizer
The StringTokenizer class helps us split Strings into multiple tokens.
StreamTokenizer provides similar functionality, but the tokenization method of StringTokenizer is much simpler than the one used by the StreamTokenizer class. Methods of StringTokenizer do not distinguish among identifiers, numbers, and quoted strings, nor do they recognize and skip comments.
The set of delimiters (the characters that separate tokens) may be specified either at creation time or on a per-token basis.
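To make the "no special treatment" point concrete, here is a minimal sketch. The class name PlainTokenizerDemo and the sample input are assumptions made for illustration. It tokenizes a string that contains a quoted phrase and a number, using the default delimiters:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the name is not taken from the article.
class PlainTokenizerDemo {

    // Splits on the default delimiters: space, tab, newline, carriage return, form feed.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [say, "hello, world", 42]
        System.out.println(tokenize("say \"hello world\" 42"));
    }
}
```

The quoted phrase is split at the space and 42 comes back as an ordinary String, since StringTokenizer only looks at delimiter characters.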
3. Using the StringTokenizer
The simplest example of using StringTokenizer will be to split a String based on specified delimiters.
In this quick example, we’re going to split the argument String and add the tokens into a list:
public List<String> getTokens(String str) {
    List<String> tokens = new ArrayList<>();
    StringTokenizer tokenizer = new StringTokenizer(str, ",");
    while (tokenizer.hasMoreElements()) {
        tokens.add(tokenizer.nextToken());
    }
    return tokens;
}
Notice how we break the String into tokens based on the delimiter ','. Then, in the loop, we add each token to the ArrayList using the tokens.add() method.
For example, if a user gives the input "Welcome,to,baeldung.com", this method should return a list containing three tokens: "Welcome", "to" and "baeldung.com".
3.1. Java 8 Approach
Since StringTokenizer implements the Enumeration<Object> interface, we can use it with the helper methods of Java's Collections class.
If we consider the earlier example, we can retrieve the same set of tokens using the Collections.list() method and the Stream API:
public List<String> getTokensWithCollection(String str) {
    return Collections.list(new StringTokenizer(str, ",")).stream()
      .map(token -> (String) token)
      .collect(Collectors.toList());
}
Here, we pass the StringTokenizer itself as a parameter to the Collections.list() method.
The point to note here is that, since StringTokenizer is an Enumeration of Object, we need to cast each token to String. The cast is always safe, because nextElement() returns the same String values as nextToken(). If we need other types such as Integer or Float, a cast won't work; we have to parse each String token instead, for example with Integer.parseInt().
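As a small illustration of getting numeric values out of a tokenizer, the following sketch parses each token rather than casting it to a number. The class and method names (NumericTokensDemo, parseInts) are hypothetical, and the input is assumed to contain only well-formed integers:

```java
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;

// Hypothetical demo class; the names are not taken from the article.
class NumericTokensDemo {

    // Every element of the enumeration is a String at runtime,
    // so we cast to String first and then parse the text as an int.
    static List<Integer> parseInts(String csv) {
        return Collections.list(new StringTokenizer(csv, ",")).stream()
          .map(token -> Integer.parseInt(((String) token).trim()))
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [1, 2, 3]
        System.out.println(parseInts("1, 2,3"));
    }
}
```

Casting a token directly to Integer would compile, but it would fail at runtime with a ClassCastException.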
3.2. Variants of StringTokenizer
Besides the two-argument constructor we've used so far, StringTokenizer comes with two more constructors: StringTokenizer(String str) and StringTokenizer(String str, String delim, boolean returnDelims).
StringTokenizer(String str, String delim, boolean returnDelims) takes an extra boolean input. If the boolean value is true, then StringTokenizer treats each delimiter character as a token of its own and returns it along with the other tokens. If it's false, delimiters are skipped and only serve to separate tokens.
StringTokenizer(String str) is a shortcut; it internally calls the three-argument constructor with the hard-coded delimiters " \t\n\r\f" (space, tab, newline, carriage return and form feed) and the boolean value false.
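The constructors can be compared side by side with a short sketch. The class ConstructorVariantsDemo, the helper drain() and the sample inputs are assumptions made for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are not taken from the article.
class ConstructorVariantsDemo {

    // Collects all remaining tokens of the given tokenizer into a list.
    static List<String> drain(StringTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Default delimiters: the space and the tab both separate tokens.
        System.out.println(drain(new StringTokenizer("Welcome to\tbaeldung.com")));

        // returnDelims = false: delimiters are dropped.
        System.out.println(drain(new StringTokenizer("a,b;c", ",;", false)));

        // returnDelims = true: each delimiter character is returned as a one-character token.
        System.out.println(String.join(" | ", drain(new StringTokenizer("a,b;c", ",;", true))));
    }
}
```

With returnDelims set to true, the last call yields five tokens: a, the comma, b, the semicolon, and c.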
3.3. Token Customization
StringTokenizer also comes with an overloaded nextToken() method which takes a String of delimiters as input. This String replaces the current set of delimiters before the next token is read, and it remains the delimiter set for all subsequent calls.
For example, we can pass "e" to the nextToken() method to break the string on the character 'e':
tokens.add(tokenizer.nextToken("e"));
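Because "e" replaces the original "," delimiter, the comma becomes an ordinary character. Hence, for the string "Hello,baeldung.com", calling nextToken("e") in the loop produces the following tokens:
H
llo,ba
ldung.com
To split on both characters, we'd pass ",e" instead, which yields H, llo, ba and ldung.com.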
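The delimiter-switching behavior of nextToken(String) can be observed with a small sketch. The class NextTokenDelimDemo and the method tokenizeSwitching() are hypothetical names. The tokenizer is created with one delimiter set, and every token is then read with another:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are not taken from the article.
class NextTokenDelimDemo {

    // Creates the tokenizer with initialDelims, then reads every token with newDelims.
    // nextToken(String) replaces the delimiter set rather than adding to it.
    static List<String> tokenizeSwitching(String input, String initialDelims, String newDelims) {
        StringTokenizer tokenizer = new StringTokenizer(input, initialDelims);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken(newDelims));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints H | llo,ba | ldung.com
        System.out.println(String.join(" | ", tokenizeSwitching("Hello,baeldung.com", ",", "e")));

        // prints H | llo | ba | ldung.com
        System.out.println(String.join(" | ", tokenizeSwitching("Hello,baeldung.com", ",", ",e")));
    }
}
```

In the first call the comma stays inside the token "llo,ba", because only 'e' is a delimiter once nextToken("e") has been called.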
3.4. Token Length
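To count the tokens that are still available, we can use StringTokenizer's countTokens() method. It reports how many times nextToken() can be called before it throws an exception, without advancing the tokenizer:
int tokenLength = tokenizer.countTokens();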
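The following sketch records the value of countTokens() before and after each call to nextToken(), to show that it counts the remaining tokens rather than the total. The class CountTokensDemo and the method countsWhileConsuming() are hypothetical names:

```java
import java.util.Arrays;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are not taken from the article.
class CountTokensDemo {

    // Returns the countTokens() value before any token is read
    // and again after each token has been consumed.
    static int[] countsWhileConsuming(String input, String delims) {
        StringTokenizer tokenizer = new StringTokenizer(input, delims);
        int[] counts = new int[tokenizer.countTokens() + 1];
        int i = 0;
        counts[i++] = tokenizer.countTokens();
        while (tokenizer.hasMoreTokens()) {
            tokenizer.nextToken();
            counts[i++] = tokenizer.countTokens();
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints [3, 2, 1, 0]
        System.out.println(Arrays.toString(countsWhileConsuming("Welcome,to,baeldung.com", ",")));
    }
}
```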
3.5. Reading From CSV File
Now, let’s try using StringTokenizer in a real use case.
There are scenarios where we try to read data from CSV files and parse the data based on the user-given delimiter.
Using StringTokenizer, we can easily get there:
public List<String> getTokensFromFile(String path, String delim) {
    List<String> tokens = new ArrayList<>();
    String currLine = "";
    StringTokenizer tokenizer;
    try (BufferedReader br = new BufferedReader(
      new InputStreamReader(Application.class.getResourceAsStream("/" + path)))) {
        while ((currLine = br.readLine()) != null) {
            tokenizer = new StringTokenizer(currLine, delim);
            while (tokenizer.hasMoreElements()) {
                tokens.add(tokenizer.nextToken());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return tokens;
}
Here, the function takes two arguments: the CSV file name (the file is read from the resources folder, src/main/resources) and the delimiter.
Based on these two arguments, the CSV data is read line by line, and each line gets tokenized using StringTokenizer.
For example, we've put the following content in the CSV:
1|IND|India
2|MY|Malaysia
3|AU|Australia
Hence, the following tokens should be generated:
1
IND
India
2
MY
Malaysia
3
AU
Australia
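To try the same line-by-line logic without a file on the classpath, here is a self-contained sketch that reads from any Reader. The class ReaderTokenizerDemo, the method tokenizeLines() and the "4||Unknown" sample line are assumptions made for illustration. It also shows a caveat worth knowing for CSV-like data: StringTokenizer skips runs of consecutive delimiters, so empty fields produce no token.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are not taken from the article.
class ReaderTokenizerDemo {

    // Reads the source line by line and tokenizes each line with the given delimiters.
    static List<String> tokenizeLines(Reader source, String delim) {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokenizer = new StringTokenizer(line, delim);
                while (tokenizer.hasMoreTokens()) {
                    tokens.add(tokenizer.nextToken());
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers of this sketch need no checked-exception handling.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String data = "1|IND|India\n2|MY|Malaysia\n3|AU|Australia";
        // prints [1, IND, India, 2, MY, Malaysia, 3, AU, Australia]
        System.out.println(tokenizeLines(new StringReader(data), "|"));

        // The empty middle field is skipped, so only two tokens come back.
        // prints [4, Unknown]
        System.out.println(tokenizeLines(new StringReader("4||Unknown"), "|"));
    }
}
```

If empty fields must be preserved, String.split() with a negative limit or a dedicated CSV library is usually a better fit than StringTokenizer.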
3.6. Testing
Now, let’s create a quick test case:
public class TokenizerTest {

    private MyTokenizer myTokenizer = new MyTokenizer();
    private List<String> expectedTokensForString = Arrays.asList(
      "Welcome", "to", "baeldung.com");
    private List<String> expectedTokensForFile = Arrays.asList(
      "1", "IND", "India",
      "2", "MY", "Malaysia",
      "3", "AU", "Australia");

    @Test
    public void givenString_thenGetListOfString() {
        String str = "Welcome,to,baeldung.com";
        List<String> actualTokens = myTokenizer.getTokens(str);

        assertEquals(expectedTokensForString, actualTokens);
    }

    @Test
    public void givenFile_thenGetListOfString() {
        List<String> actualTokens = myTokenizer.getTokensFromFile("data.csv", "|");

        assertEquals(expectedTokensForFile, actualTokens);
    }
}
4. Conclusion
In this quick tutorial, we had a look at some practical examples of using the core Java StringTokenizer.
As always, the full source code is available over on GitHub.