Channel: Baeldung

Creating Unicode Character From Its Code Point Hex String


1. Overview

Java’s support for Unicode makes it straightforward to work with characters from diverse languages and scripts.

In this tutorial, we’ll learn how to obtain a Unicode character in Java from the hexadecimal string of its code point.

2. Introduction to the Problem

Java’s Unicode support allows us to build internationalized applications quickly. Let’s look at a couple of examples:

static final String U_CHECK = "✅"; // U+2705
static final String U_STRONG = "强"; // U+5F3A

In the example above, both the check mark “✅” and “强” (“Strong” in Chinese) are Unicode characters.

We know that Unicode characters can be represented correctly in Java if the string literal uses a Unicode escape: a backslash, the letter ‘u’, and a four-digit hexadecimal number, for example:

String check = "\u2705";
assertEquals(U_CHECK, check);
String strong = "\u5F3A";
assertEquals(U_STRONG, strong);

In some scenarios, we’re given only the hexadecimal number after “\u” and need to get the corresponding Unicode character. For instance, we should produce the check mark “✅” when we receive the string “2705”.

The first idea we might come up with is concatenating “\\u” and the number. However, this doesn’t do the job:

String check = "\\u" + "2705";
assertEquals("\\u2705", check);
String strong = "\\u" + "5F3A";
assertEquals("\\u5F3A", strong);

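As the test shows, concatenating “\\u” and a number such as “2705” produces the string literal “\\u2705”, that is, a backslash followed by “u2705”, instead of the check mark “✅”. That’s because the compiler translates “\u” escapes while it reads the source code; a string assembled at runtime is just a sequence of ordinary characters.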
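To make the difference concrete, here’s a small sketch (the class name UnicodeEscapeDemo is only for illustration) that compares the two strings: the compiler turns the escape into a single char, while the runtime concatenation keeps all six characters.

```java
public class UnicodeEscapeDemo {
    public static void main(String[] args) {
        // The compiler turns this escape into a single char at compile time.
        String compiled = "\u2705";

        // At runtime, these are just ordinary characters: backslash, 'u', '2', '7', '0', '5'.
        String concatenated = "\\u" + "2705";

        System.out.println(compiled.length());              // 1
        System.out.println(concatenated.length());          // 6
        System.out.println(concatenated.charAt(0) == '\\'); // true
    }
}
```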

Next, let’s explore how to convert the given number to the Unicode string.

3. Understanding the Hexadecimal Number After “\u”

Unicode assigns a unique code point to every character, providing a universal way to represent text across different languages and scripts. A code point is a numerical value that uniquely identifies a character in the Unicode standard.

To create a Unicode character in Java, we need to know the code point of the desired character. Once we have the code point, we can use the ‘\u’ escape sequence to write it in a char or String literal. Note that a single ‘\u’ escape holds exactly four hexadecimal digits, so it covers code points up to U+FFFF.
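Going the other way can help us check which digits belong after “\u” for a character we already have. As a minimal sketch, assuming we only care about the first character of the string, String.codePointAt() returns the code point as an int, and formatting it with %04X yields the four (or more) upper-case hex digits. The helper name codePointHex() is hypothetical.

```java
import java.util.Locale;

public class CodePointLookup {
    // Hypothetical helper: returns the code point of the first character in text
    // as at least four upper-case hex digits.
    static String codePointHex(String text) {
        int codePoint = text.codePointAt(0);
        // Locale.ROOT keeps the hex formatting independent of the default locale.
        return String.format(Locale.ROOT, "%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(codePointHex("\u2705")); // 2705
        System.out.println(codePointHex("\u5F3A")); // 5F3A
    }
}
```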

In the “\uxxxx” notation, “xxxx” is the character’s code point in hexadecimal. For example, the ASCII code of ‘A’ is 41 in hexadecimal (65 in decimal). Therefore, we get the string “A” if we use the Unicode notation “\u0041”:

assertEquals("A", "\u0041");
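The hex-to-decimal arithmetic behind this example is 0x41 = 4 × 16 + 1 = 65. A quick sketch confirming the equivalences in Java (the class name HexNotationCheck is illustrative):

```java
public class HexNotationCheck {
    public static void main(String[] args) {
        // 0x41 = 4 * 16 + 1 = 65
        System.out.println(0x41 == 65);               // true
        System.out.println((int) 'A');                // 65
        System.out.println(Integer.toHexString('A')); // 41
        System.out.println('\u0041' == 'A');          // true
    }
}
```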

So next, let’s see how to get the desired Unicode character from the hexadecimal number after “\u”.

4. Using the Character.toChars() Method

Now we understand what the hexadecimal number after “\u” indicates: when we receive “2705”, it’s the hexadecimal representation of a character’s code point.

Once we have the code point as an integer, the Character.toChars(int codePoint) method gives us a char array holding the code point’s UTF-16 representation: one char for characters up to U+FFFF, or two (a surrogate pair) for supplementary characters. Finally, we can call String.valueOf() to get the target string:

Given "2705"
 |_ Hex(codePoint) = 0x2705
     |_ codePoint = 9989 (decimal)
         |_ char[] chars = Character.toChars(9989) 
            |_ String.valueOf(chars)
               |_"✅"

As we can see, to obtain our target character, we must find the code point first.

We can obtain the code point integer by parsing the given string with the Integer.parseInt(String s, int radix) method, using radix 16 (hexadecimal).
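For instance, 0x2705 = 2 × 16³ + 7 × 16² + 0 × 16 + 5 = 8192 + 1792 + 5 = 9989. Here’s a short sketch of how Integer.parseInt() behaves with radix 16. Note that a “U+” prefix, as written in Unicode charts, isn’t accepted; if our input might carry one, we’d have to strip it first.

```java
public class HexParsing {
    public static void main(String[] args) {
        // 2 * 16^3 + 7 * 16^2 + 0 * 16 + 5 = 9989
        System.out.println(Integer.parseInt("2705", 16)); // 9989

        // Hex digits are case-insensitive.
        System.out.println(Integer.parseInt("5f3a", 16) == Integer.parseInt("5F3A", 16)); // true

        try {
            Integer.parseInt("U+2705", 16); // 'U' is not a hex digit
        } catch (NumberFormatException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```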

So next, let’s put everything together:

int codePoint = Integer.parseInt("2705", 16); // Decimal int: 9989
char[] checkChar = Character.toChars(codePoint);
String check = String.valueOf(checkChar);
assertEquals(U_CHECK, check);
codePoint = Integer.parseInt("5F3A", 16); // Decimal int: 24378
char[] strongChar = Character.toChars(codePoint);
String strong = String.valueOf(strongChar);
assertEquals(U_STRONG, strong);
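Both “✅” and “强” fit in a single char. For a supplementary character, one beyond U+FFFF, Character.toChars() returns a surrogate pair instead. The sketch below uses U+1F600 (GRINNING FACE) purely as an example:

```java
public class SupplementaryCharDemo {
    public static void main(String[] args) {
        // U+2705 is in the Basic Multilingual Plane: one UTF-16 char.
        System.out.println(Character.toChars(0x2705).length); // 1

        // U+1F600 is a supplementary character: toChars() returns a surrogate pair.
        char[] grin = Character.toChars(0x1F600);
        System.out.println(grin.length);                        // 2
        System.out.println(Character.isHighSurrogate(grin[0])); // true
        System.out.println(Character.isLowSurrogate(grin[1]));  // true

        // The resulting string has two chars but only one code point.
        String s = String.valueOf(grin);
        System.out.println(s.length());                         // 2
        System.out.println(s.codePointCount(0, s.length()));    // 1
    }
}
```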

It’s worth noting that on Java 11 or later, we can get the string directly from the code point integer using the Character.toString(int codePoint) method, for example:

// For Java 11 and later versions:
assertEquals(U_STRONG, Character.toString(codePoint));
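Before Java 11, Character.toString(int) isn’t available, but the JDK offers other ways to build a string from a code point. This sketch compares two of them, StringBuilder.appendCodePoint() and the String(int[] codePoints, int offset, int count) constructor (both available since Java 5), with the toChars() approach; the class name is illustrative.

```java
public class PreJava11Conversion {
    public static void main(String[] args) {
        int codePoint = Integer.parseInt("5F3A", 16);

        // toChars() plus String.valueOf(), as used in this article.
        String viaToChars = String.valueOf(Character.toChars(codePoint));

        // StringBuilder.appendCodePoint() appends one or two chars as needed.
        String viaBuilder = new StringBuilder().appendCodePoint(codePoint).toString();

        // String(int[] codePoints, int offset, int count) builds a string from code points.
        String viaConstructor = new String(new int[] { codePoint }, 0, 1);

        System.out.println(viaToChars.equals(viaBuilder));     // true
        System.out.println(viaToChars.equals(viaConstructor)); // true
    }
}
```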

Of course, we can wrap the parsing and conversion steps in a method:

String stringFromCodePointHex(String codePointHex) {
    int codePoint = Integer.parseInt(codePointHex, 16);
    // For Java 11 and later versions: return Character.toString(codePoint)
    char[] chars = Character.toChars(codePoint);
    return String.valueOf(chars);
}
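Integer.parseInt() throws a NumberFormatException for non-hex input, and Character.toChars() throws an IllegalArgumentException for values outside U+0000 to U+10FFFF. If we want clearer error messages, a hypothetical stringFromCodePointHexChecked() variant could validate explicitly with Character.isValidCodePoint(). This is a sketch, not a drop-in replacement; note that lone surrogates such as U+D800 still pass this check.

```java
public class CheckedCodePointConverter {
    // Hypothetical variant of stringFromCodePointHex() with explicit validation.
    static String stringFromCodePointHexChecked(String codePointHex) {
        int codePoint;
        try {
            codePoint = Integer.parseInt(codePointHex, 16);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a hexadecimal number: " + codePointHex, e);
        }
        // Valid code points range from U+0000 to U+10FFFF.
        // Lone surrogates such as U+D800 still pass this check.
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePointHex);
        }
        return String.valueOf(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(stringFromCodePointHexChecked("0041")); // A
        try {
            stringFromCodePointHexChecked("110000"); // one past the maximum code point
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Not a Unicode code point: 110000
        }
    }
}
```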

Finally, let’s test stringFromCodePointHex() to make sure it produces the expected results:

assertEquals("A", stringFromCodePointHex("0041"));
assertEquals(U_CHECK, stringFromCodePointHex("2705"));
assertEquals(U_STRONG, stringFromCodePointHex("5F3A"));
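Sometimes we may receive several code points at once. Assuming they arrive as whitespace-separated hex strings (a format chosen here only for illustration), the hypothetical helper below parses each one and builds the result in one go with the String(int[] codePoints, int offset, int count) constructor:

```java
import java.util.Arrays;

public class CodePointSequence {
    // Hypothetical helper: converts whitespace-separated hex code points,
    // e.g. "0041 2705", into a single string.
    static String stringFromCodePointHexes(String hexes) {
        int[] codePoints = Arrays.stream(hexes.trim().split("\\s+"))
                .mapToInt(hex -> Integer.parseInt(hex, 16))
                .toArray();
        // Throws IllegalArgumentException if any value is not a valid code point.
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String result = stringFromCodePointHexes("0041 2705 5F3A");
        System.out.println(result.equals("A\u2705\u5F3A")); // true
        System.out.println(result.length());                // 3
    }
}
```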

5. Conclusion

In this article, we first learned the meaning of “xxxx” in the “\uxxxx” notation, and then explored how to obtain the target Unicode string from the hexadecimal representation of a given code point.

As always, the complete source code for the examples is available over on GitHub.

       
