1. Overview
Comments can be useful when we need additional notes in our code. They can help us make our code more understandable. Additionally, they can be especially useful in methods that perform complex operations.
In this tutorial, we’ll explore cases where comments in our code appear to be executable, and we’ll see why they aren’t.
2. Comments
Before we dive in, let’s revisit comments in Java. They are part of the Java syntax and come in two basic formats:
- Single-line comments
- Multiline comments
The text from the “//” characters to the end of the line represents a single-line comment:
// This is a single-line comment.
Additionally, a multiline comment starts with “/*” and ends with “*/”. Everything in between is treated as a comment:
/* This is a
* multiline
* comment.
*/
3. Comments and Unicode
Now, let’s start with an example. The following line prints “Baeldung” to the standard output:
// \u000d System.out.println("Baeldung");
Because the line begins with the “//”, which represents the start of a single-line comment, we might conclude the “System.out.println(“Baeldung”);” statement is part of that comment as well.
However, this isn’t accurate. It’s important to note Java doesn’t allow comment execution.
With that in mind, let’s examine our example in detail and see the reasons why the code prints “Baeldung” in the console.
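To try this ourselves, we can place the same trick inside a complete class. The following is a minimal sketch: the class name CommentDemo and the message() method are our own illustrative names, and we assign to a variable instead of printing directly so the effect is easy to check:

```java
class CommentDemo {

    // Returns the value left in "result" after the suspicious comment line.
    public static String message() {
        String result = "comment was ignored";
        // \u000d result = "Baeldung";
        return result;
    }

    public static void main(String[] args) {
        System.out.println(message());
    }
}
```

If the second line of the method body were an ordinary comment, message() would return the initial value. Instead, it returns “Baeldung”.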
3.1. Unicode Escapes
The code from the example isn’t treated as a comment because of the “\u000d” Unicode escape sequence we placed before it.
Java programs are written using the Unicode character set. However, not every editor, tool, or file encoding can handle characters outside ASCII, so Java also lets us write any Unicode character using only ASCII characters. Such a Unicode escape can appear anywhere in the source: in comments, identifiers, keywords, literals, and separators.
To use it, we embed the character through a Unicode escape sequence. It starts with a backslash (“\”) followed by one or more “u” letters, which are then followed by the four-digit hexadecimal code of a specific character.
Using this convention, the CR (or Carriage return) becomes “\u000d“.
Additionally, the compiler replaces each Unicode escape sequence with the character it denotes during the lexical translation defined in the Java Language Specification.
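To make the escape format more concrete, here’s a small sketch that uses escapes in an identifier and in string literals. The class and method names are hypothetical and exist only for illustration:

```java
class UnicodeEscapeBasics {

    // The identifier below starts with the escape for 'a', so the compiler sees "age".
    public static int escapedIdentifier() {
        int \u0061ge = 30;
        return age;
    }

    // The first letter of the literal is written as the escape for 'B'.
    public static String escapedLiteral() {
        return "\u0042aeldung";
    }

    // The marker may repeat the letter 'u'; this is still the escape for 'B'.
    public static String repeatedMarker() {
        return "\uu0042aeldung";
    }

    // The carriage return character has the code 0x000d.
    public static boolean carriageReturnCode() {
        return (int) '\r' == 0x000d;
    }

    public static void main(String[] args) {
        System.out.println(escapedIdentifier());
        System.out.println(escapedLiteral());
        System.out.println(repeatedMarker());
        System.out.println(carriageReturnCode());
    }
}
```

In every case, the compiler works with the translated character, so the escaped and the plain spelling are interchangeable.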
Moving forward, let’s take a closer look at how Java performs the lexical transformation.
3.2. Lexical Translation
During lexical translation, Unicode escapes are processed before anything else, even when they appear inside a comment. To put it differently, Java first translates all Unicode escape sequences and only then moves on to the remaining steps.
Simply put, the compiler first translates each Unicode escape into the corresponding Unicode character. Then, it splits the resulting character stream into lines by recognizing line terminators. Finally, it breaks the input into tokens, discarding white space and comments along the way.
As a side effect, our code won’t compile if we put an invalid Unicode escape inside a comment. Java treats a “\u” as the start of a Unicode escape wherever it appears, unless the backslash itself is preceded by an odd number of backslashes.
Thanks to this transformation, we can use Unicode escapes to include any Unicode characters using only ASCII characters. This way, ASCII-based programs and tools can still process the code written in Unicode.
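We can approximate the first translation step in plain Java. The sketch below is our own simplified model, not the compiler’s actual implementation: the UnicodeEscapeTranslator class and its translate() method are hypothetical names, and the code covers only the replacement of escapes, including the rule about preceding backslashes:

```java
class UnicodeEscapeTranslator {

    // Replaces every eligible Unicode escape in the input with the character it denotes.
    public static String translate(String source) {
        StringBuilder out = new StringBuilder();
        int backslashes = 0; // contiguous raw backslashes seen just before index i
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            boolean eligible = c == '\\' && backslashes % 2 == 0;
            if (eligible && i + 1 < source.length() && source.charAt(i + 1) == 'u') {
                int j = i + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++; // the marker may contain more than one 'u'
                }
                if (j + 4 > source.length()) {
                    throw new IllegalArgumentException("illegal escape at index " + i);
                }
                int value = 0;
                for (int k = j; k < j + 4; k++) {
                    int digit = hexValue(source.charAt(k));
                    if (digit < 0) {
                        throw new IllegalArgumentException("illegal escape at index " + i);
                    }
                    value = value * 16 + digit;
                }
                out.append((char) value);
                backslashes = 0;
                i = j + 4;
            } else {
                out.append(c);
                backslashes = (c == '\\') ? backslashes + 1 : 0;
                i++;
            }
        }
        return out.toString();
    }

    // Returns the numeric value of an ASCII hexadecimal digit, or -1 for anything else.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as raw text inside this string literal.
        String line = "// \\u000d System.out.println(\"Baeldung\");";
        String translated = translate(line);
        // Position of the carriage return that now ends the comment.
        System.out.println(translated.indexOf('\r'));
    }
}
```

In this model, an escape with non-hexadecimal digits makes translate() throw, which mirrors the compile error for an invalid escape inside a comment. An escape whose backslash follows an odd number of backslashes is left untouched.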
Now, back to our example. We used the Unicode escape sequence “\u000d”, which represents a carriage return, one of the line terminators Java recognizes.
When we compile our code, the translation of Unicode escapes happens first. Therefore, the “\u000d” becomes a real carriage return that ends the line. Since, by definition, a single-line comment ends at the end of the line, the code we put after the Unicode escape is no longer part of the comment.
As a result of the transformation, our statement ends up on a new line:
//
System.out.println("Baeldung");
3.3. Unicode and IDEs
Nowadays, we often use an IDE as a development tool. Additionally, we frequently rely on it and expect it’ll warn us if something in our code seems suspicious.
However, when it comes to Unicode escapes, some IDEs display the code incorrectly. Depending on the IDE we’re using, it might not interpret Unicode escape sequences and, thus, might apply misleading code highlighting.
Since we can use Unicode escapes instead of ASCII characters, nothing prevents us from substituting other parts of the code with Unicode escapes:
\u002f\u002f This is a comment
\u0053ystem.out.println("Baeldung");
Here, we replaced the “//” and the letter “S” with Unicode escapes. The code still prints “Baeldung” in the console.
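For completeness, here’s how the same substitution might look inside a compilable class. This is only a sketch; EscapedSourceDemo and message() are names we made up for the example:

```java
class EscapedSourceDemo {

    // Returns the value assigned by a statement whose first letter is an escape.
    public static String message() {
        \u002f\u002f This is a comment
        \u0053tring text = "Baeldung";
        return text;
    }

    public static void main(String[] args) {
        \u0053ystem.out.println(message());
    }
}
```

After the escapes are translated, the first line of the method body is a regular single-line comment, and the next one is a regular String declaration.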
4. Conclusion
In this tutorial, we learned how comments and Unicode escape sequences work together.
To sum up, Java doesn’t allow executable comments. When we use Unicode escapes in our code, Java translates them into the corresponding Unicode characters before any other transformation, including the recognition of comments.
Unicode escapes are useful when we need a character that our editor, file encoding, or tooling can’t represent directly. Although it’s perfectly legal to write an entire codebase using just Unicode escapes, doing so hurts readability, so we should use them only when necessary.