1. Introduction
In this tutorial, we’ll consider how we can localize and format messages based on Locale.
We’ll use both Java’s MessageFormat and the third-party library, ICU.
2. Localization Use Case
When our application acquires a wide audience of users from all over the world, we may naturally want to show different messages based on the user’s preferences.
The first and most important aspect is the language that the user speaks. Others might include currency, number and date formats. Last but not least are cultural preferences: what is acceptable for users from one country might be intolerable for others.
Suppose that we have an email client and we want to show notifications when a new message arrives.
A simple example of such a message might be this one:
Alice has sent you a message.
It’s fine for English-speaking users, but speakers of other languages might not be as happy. For example, French-speaking users would prefer to see this message:
Alice vous a envoyé un message.
Polish speakers, in turn, would rather see this one:
Alice wysłała ci wiadomość.
What if we want a properly formatted notification even when Alice sends not just one message, but several?
We might be tempted to address the issue by concatenating various pieces in a single string, like this:
String message = "Alice has sent " + quantity + " messages";
The situation can easily get out of control when not only Alice but also Bob can send messages:
Bob has sent two messages.
Bob a envoyé deux messages.
Bob wysłał dwie wiadomości.
Notice how the verb changes in Polish (wysłała vs. wysłał). This illustrates why plain string concatenation is rarely acceptable for localizing messages.
As we see, we get two types of issues: one is related to translations and the other is related to formats. Let’s address them in the following sections.
3. Message Localization
We may define the localization, or l10n, of an application as the process of adapting the application to the user’s language and conventions. The related term internationalization, or i18n, refers to designing the application so that it can be localized.
To localize the application, let’s first eliminate all hardcoded messages by moving them into properties files in our resources folder, one file per language.
Each file should contain key-value pairs with the messages in the corresponding language. For example, the file messages_en.properties should contain the following pair:
label=Alice has sent you a message.
messages_pl.properties should contain the following pair:
label=Alice wysłała ci wiadomość.
Similarly, other files assign appropriate values to the key label. Now, in order to pick up the English version of the notification, we can use ResourceBundle:
ResourceBundle bundle = ResourceBundle.getBundle("messages", Locale.UK);
String message = bundle.getString("label");
The value of the variable message will be “Alice has sent you a message.”
Java’s Locale class contains shortcuts to frequently used languages and countries.
In the case of the Polish language, we might write the following:
ResourceBundle bundle = ResourceBundle.getBundle("messages", Locale.forLanguageTag("pl-PL"));
String message = bundle.getString("label");
If we provide no locale, the system uses the default one. We can find more details on this issue in our article “Internationalization and Localization in Java 8”. Among the available translations, the system then chooses the one that most closely matches the currently active locale.
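The order in which translations are considered can be inspected with the JDK’s own ResourceBundle.Control. The following is a minimal sketch, not the tutorial’s code: the class name BundleLookupSketch and the in-memory properties text are assumptions, introduced so that the example runs without any files on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper class, used only for illustration.
class BundleLookupSketch {

    // Lists the locales that the default lookup tries, most specific first.
    static List<Locale> candidates(Locale locale) {
        ResourceBundle.Control control =
            ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales("messages", locale);
    }

    // Parses properties-style text into a bundle without touching the classpath.
    static ResourceBundle inMemoryBundle(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For pl-PL the candidates are pl_PL, then pl, then the root locale.
        for (Locale candidate : candidates(Locale.forLanguageTag("pl-PL"))) {
            System.out.println("candidate: '" + candidate + "'");
        }
        ResourceBundle english = inMemoryBundle("label=Alice has sent you a message.");
        System.out.println(english.getString("label"));
    }
}
```

In a real project, ResourceBundle.getBundle walks this candidate list for us and falls back to the default locale and the base file when no closer match exists.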
Placing the messages in the resource files is a good step towards rendering the application more user-friendly. It makes it easier to translate the whole application for the following reasons:
- a translator does not have to look through the application in search of the messages
- a translator can see the whole phrase which helps to grasp the context and hence facilitates a better translation
- we don’t have to recompile the whole application when a translation for a new language is ready
4. Message Format
Even though we have moved the messages from the code into a separate location, they still contain some hardcoded information. It would be nice to be able to customize the names and numbers in the messages in such a way that they remain grammatically correct.
We may define formatting as the process of rendering a string template by substituting values for its placeholders.
In the following sections, we’ll consider two solutions that allow us to format the messages.
4.1. Java’s MessageFormat
In order to format strings, Java defines format methods in java.lang.String. But we can get even more support via java.text.MessageFormat.
To illustrate, let’s create a pattern and feed it to a MessageFormat instance:
String pattern = "On {0, date}, {1} sent you "
  + "{2, choice, 0#no messages|1#a message|2#two messages|2<{2, number, integer} messages}.";
MessageFormat formatter = new MessageFormat(pattern, Locale.UK);
The pattern string has slots for three placeholders.
If we supply each value:
String message = formatter.format(new Object[] {date, "Alice", 2});
Then MessageFormat will fill in the template and render our message:
On 27-Apr-2019, Alice sent you two messages.
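The snippets above can be assembled into a runnable sketch. The class name NotificationFormatSketch and the render helper are assumptions made for illustration. The date is built with GregorianCalendar, and its exact rendering depends on the locale data shipped with the JDK, so it may differ from the output shown above.

```java
import java.text.MessageFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical wrapper around the pattern from this section.
class NotificationFormatSketch {

    static final String PATTERN = "On {0, date}, {1} sent you "
        + "{2, choice, 0#no messages|1#a message|2#two messages|2<{2, number, integer} messages}.";

    // Renders the notification for a given date, sender and message count.
    static String render(Date date, String sender, int count) {
        MessageFormat formatter = new MessageFormat(PATTERN, Locale.UK);
        return formatter.format(new Object[] {date, sender, count});
    }

    public static void main(String[] args) {
        Date date = new GregorianCalendar(2019, Calendar.APRIL, 27).getTime();
        // The date portion varies with the JDK's locale data.
        System.out.println(render(date, "Alice", 2));
        System.out.println(render(date, "Bob", 7));
    }
}
```

Creating the formatter inside render keeps the sketch simple. MessageFormat is not thread-safe, so a new instance per call also avoids sharing one between threads.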
4.2. MessageFormat Syntax
From the example above, we see that the message pattern:
pattern = "On {...}, {..} sent you {...}.";
contains placeholders, written in curly brackets {…}, each with a required argument index and two optional parameters, type and style:
{index}
{index, type}
{index, type, style}
The placeholder’s index corresponds to the position of an element from the array of objects that we want to insert.
When present, the type and style may take the following values:
type | style |
---|---|
number | integer, currency, percent, custom format |
date | short, medium, long, full, custom format |
time | short, medium, long, full, custom format |
choice | custom format |
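To make the table more concrete, here is a small sketch that applies a few number styles to single values. The class name PlaceholderStylesSketch is an assumption, and Locale.US is chosen only so that the separators and the currency symbol are predictable. The custom pattern is written without spaces, because whitespace in the style part may be treated as part of the DecimalFormat pattern.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class for the number styles in the table.
class PlaceholderStylesSketch {

    // Formats a single argument with the given pattern, using US conventions.
    static String render(String pattern, Object value) {
        return new MessageFormat(pattern, Locale.US).format(new Object[] {value});
    }

    public static void main(String[] args) {
        // Built-in styles are selected by keyword.
        System.out.println(render("{0, number, integer}", 1234.5));
        System.out.println(render("{0, number, percent}", 0.25));
        System.out.println(render("{0, number, currency}", 1234.5));
        // Anything else is treated as a custom DecimalFormat pattern.
        System.out.println(render("{0,number,#.##}", 3.14159));
    }
}
```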
The names of the types and styles largely speak for themselves, but we can consult the official documentation for more details.
Let’s take a closer look, though, at custom format.
In the example above, we used the following format expression:
{2, choice, 0#no messages|1#a message|2#two messages|2<{2, number, integer} messages}
In general, the choice style is a list of options separated by the vertical bar (or pipe).
Inside each option, the match value ki and the string vi are separated by #, which includes ki itself, or by <, which excludes it, as in the last option of our example. Notice that we may nest other patterns into the string vi, as we did for the last option:
{2, choice, ...|2<{2, number, integer} messages}
The choice type is numeric, so the match values ki have a natural ordering and split the number line into intervals.
If we give a value k that belongs to the interval [ki, ki+1) (the left end is included, the right one is excluded), then the value vi is selected. A value below the first match value selects the first option.
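The interval rule can be observed directly on java.text.ChoiceFormat, which is what the choice type relies on. This is a minimal sketch: the class name ChoiceIntervalsSketch and the simplified last option ("many messages") are assumptions made for illustration.

```java
import java.text.ChoiceFormat;

// Hypothetical demo class showing how choice limits map to intervals.
class ChoiceIntervalsSketch {

    static final ChoiceFormat CHOICE =
        new ChoiceFormat("0#no messages|1#a message|2#two messages|2<many messages");

    public static void main(String[] args) {
        // Each limit is the inclusive left end of an interval.
        double[] limits = CHOICE.getLimits();
        Object[] texts = CHOICE.getFormats();
        for (int i = 0; i < limits.length; i++) {
            System.out.println(limits[i] + " -> " + texts[i]);
        }
        // "2<" is stored as the smallest double greater than 2.
        System.out.println(limits[3] == ChoiceFormat.nextDouble(2.0));
        // 1.5 falls into [1, 2), so the second option is selected.
        System.out.println(CHOICE.format(1.5));
    }
}
```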
Let’s consider the ranges of the choice type in more detail. To this end, we take this pattern:
pattern = "You''ve got "
  + "{0, choice, 0#no messages|1#a message|2#two messages|2<{0, number, integer} messages}.";
and pass various values for its only placeholder:
n | message |
---|---|
-1, 0, 0.5 | You’ve got no messages. |
1, 1.5 | You’ve got a message. |
2 | You’ve got two messages. |
2.5 | You’ve got 2 messages. |
5 | You’ve got 5 messages. |
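The table can be reproduced with a short program. The class name ChoiceRangesSketch and the render helper are assumptions. Note the row for 2.5: the value falls into the last interval, and the nested integer style then rounds it for display, which is why the message shows 2.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical demo class that replays the rows of the table.
class ChoiceRangesSketch {

    static final String PATTERN = "You''ve got "
        + "{0, choice, 0#no messages|1#a message|2#two messages|2<{0, number, integer} messages}.";

    // Formats one value; the doubled apostrophe in the pattern yields a single one.
    static String render(Number n) {
        return new MessageFormat(PATTERN, Locale.UK).format(new Object[] {n});
    }

    public static void main(String[] args) {
        Number[] samples = {-1, 0, 0.5, 1, 1.5, 2, 2.5, 5};
        for (Number n : samples) {
            System.out.println(n + " -> " + render(n));
        }
    }
}
```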
4.3. Making Things Better
So, we’re now formatting our messages. But, the message itself remains hardcoded.
From the previous section, we know that we should extract the string patterns to the resources. To separate our concerns, let’s create another set of resource files called formats.
In those, we’ll create a key called label with language-specific content.
For example, in the English version, we’ll put the following string:
label=On {0, date, full} {1} has sent you {2, choice, 0#nothing|1#a message|2#two messages|2<{2,number,integer} messages}.
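One way to wire such a pattern into code is to read it from the bundle and hand it to MessageFormat together with the locale. The sketch below makes several assumptions: the class name BundlePatternSketch is hypothetical, the bundle is parsed from an in-memory string instead of a formats file so that the example is self-contained, and the date placeholder is left out to keep the output independent of the JDK’s locale data.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.MessageFormat;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Hypothetical helper that combines a resource bundle with MessageFormat.
class BundlePatternSketch {

    // Stand-in for the English formats file (simplified: no date placeholder).
    static final String ENGLISH_FORMATS = "label={0} has sent you "
        + "{1, choice, 0#nothing|1#a message|2#two messages|2<{1,number,integer} messages}.";

    // Parses properties-style text into a bundle.
    static ResourceBundle load(String properties) {
        try {
            return new PropertyResourceBundle(new StringReader(properties));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks up the pattern under the key "label" and fills in the arguments.
    static String render(ResourceBundle bundle, Locale locale, String sender, int count) {
        MessageFormat formatter = new MessageFormat(bundle.getString("label"), locale);
        return formatter.format(new Object[] {sender, count});
    }

    public static void main(String[] args) {
        ResourceBundle bundle = load(ENGLISH_FORMATS);
        for (int count : new int[] {0, 1, 3}) {
            System.out.println(render(bundle, Locale.UK, "Alice", count));
        }
    }
}
```

With real files, load would be replaced by a call to ResourceBundle.getBundle with the base name formats and the user’s locale.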
We should slightly modify the French version because of the zero message case:
label={0, date, short}, {1}{2, choice, 0# ne|0<} vous a envoyé {2, choice, 0#aucun message|1#un message|2#deux messages|2<{2,number,integer} messages}.
We’d need to make similar modifications in the Polish and Italian versions as well.
In fact, the Polish version exhibits yet another problem. According to the grammar of the Polish language (and many others), the verb has to agree in gender with the subject. We could resolve this problem by using the choice type, but let’s consider another solution.
4.4. ICU’s MessageFormat
Let’s use the International Components for Unicode (ICU) library. We have already mentioned it in our Convert a String to Title Case tutorial. It’s a mature and widely-used solution that allows us to customize the application for various languages.
Here, we’re not going to explore it in full detail. We’ll limit ourselves to what our toy application needs. For the most comprehensive and up-to-date information, we should check ICU’s official site.
At the time of writing, the latest version of ICU for Java (ICU4J) is 64.2. As usual, in order to start using it, we should add it as a dependency to our project:
<dependency>
    <groupId>com.ibm.icu</groupId>
    <artifactId>icu4j</artifactId>
    <version>64.2</version>
</dependency>
Suppose that we want to have a properly formed notification in various languages and for different numbers of messages:
N | English | Polish |
---|---|---|
0 | Alice has sent you no messages. Bob has sent you no messages. | Alice nie wysłała ci żadnej wiadomości. Bob nie wysłał ci żadnej wiadomości. |
1 | Alice has sent you a message. Bob has sent you a message. | Alice wysłała ci wiadomość. Bob wysłał ci wiadomość. |
> 1 | Alice has sent you N messages. Bob has sent you N messages. | Alice wysłała ci N wiadomości. Bob wysłał ci N wiadomości. |
First of all, we should create a pattern in the locale-specific resource files.
Let’s reuse the file formats.properties and add a key label-icu with the following content:
label-icu={0} has sent you {2, plural, =0 {no messages} =1 {a message} other {{2, number, integer} messages}}.
We feed the pattern with a three-element array that holds the sender’s name, the sender’s gender, and the number of messages:
Object[] data = new Object[] { "Alice", "female", 0 };
We see that in the English version, the gender-valued placeholder is of no use, while in the Polish one:
label-icu={0} {2, plural, =0 {nie} other {}} {1, select, male {wysłał} female {wysłała} other {wysłało}} ci {2, plural, =0 {żadnych wiadomości} =1 {wiadomość} other {{2, number, integer} wiadomości}}.
we use it in order to distinguish between wysłał/wysłała/wysłało.
5. Conclusion
In this tutorial, we considered how to localize and format the messages that we display to the users of our applications.
As always, the code snippets for this tutorial are on our GitHub repository.