Channel: Baeldung

Remove Emojis from a Java String

1. Overview

Emojis are increasingly common in text messages, and sometimes we need to strip them, along with other symbols, from our text.

In this tutorial, we’ll discuss different ways to remove emojis from a String in Java.

2. Using Emoji Library

First, we’ll use an emoji library to remove the emojis from our String.

We’ll use emoji-java in the following example, so we need to add this dependency to our pom.xml:

<dependency>
    <groupId>com.vdurmont</groupId>
    <artifactId>emoji-java</artifactId>
    <version>4.0.0</version>
</dependency>

The latest version can be found on Maven Central.

Now let’s see how to use emoji-java to remove emojis from our String:

@Test
public void whenRemoveEmojiUsingLibrary_thenSuccess() {
    String text = "la conférence, commencera à 10 heures 😅";
    String result = EmojiParser.removeAllEmojis(text);

    assertEquals("la conférence, commencera à 10 heures ", result);
}

Here, we’re calling the removeAllEmojis() method of EmojiParser.

We can also use EmojiParser to replace emojis with their aliases using the parseToAliases() method:

@Test
public void whenReplaceEmojiUsingLibrary_thenSuccess() {
    String text = "la conférence, commencera à 10 heures 😅";
    String result = EmojiParser.parseToAliases(text);

    assertEquals(
      "la conférence, commencera à 10 heures :sweat_smile:",
      result);
}

This library is especially useful when we need to replace emojis with their aliases.

However, emoji-java only detects emojis; it won’t detect other symbols or special characters.

3. Using a Regular Expression

Next, we can use a regular expression to remove emojis and other symbols. We’ll allow only specific types of characters:

@Test
public void whenRemoveEmojiUsingMatcher_thenSuccess() {
    String text = "la conférence, commencera à 10 heures 😅";
    String regex = "[^\\p{L}\\p{N}\\p{P}\\p{Z}]";
    Pattern pattern = Pattern.compile(
      regex, 
      Pattern.UNICODE_CHARACTER_CLASS);
    Matcher matcher = pattern.matcher(text);
    String result = matcher.replaceAll("");

    assertEquals("la conférence, commencera à 10 heures ", result);
}

Let’s break down our regular expression:

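  • \p{L} – matches letters from any language
  • \p{N} – matches numbers
  • \p{P} – matches punctuation
  • \p{Z} – matches separators: spaces, line separators, and paragraph separators
  • ^ negates the character class, so the pattern matches every character outside these categories

Since we replace every match with an empty string, only letters, numbers, punctuation, and separators are kept. Note that tabs and line breaks are control characters (category Cc), not separators, so they’re removed as well, as are currency and math symbols such as € or +. We can customize the expression to allow or remove more character types.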
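As a rough sketch of such a customization, the snippet below compares the expression above with a relaxed variant that also keeps currency symbols (\p{Sc}), math symbols (\p{Sm}), and the whitespace matched by \s. The SymbolFilter class, its method names, and the choice of extra categories are assumptions made for illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the article's code base.
final class SymbolFilter {

    // The expression from above: keep letters, numbers, punctuation, separators.
    static final Pattern STRICT =
      Pattern.compile("[^\\p{L}\\p{N}\\p{P}\\p{Z}]");

    // Assumed variant: additionally keep currency symbols (Sc), math symbols (Sm)
    // and regex whitespace such as tabs and line breaks.
    static final Pattern RELAXED =
      Pattern.compile("[^\\p{L}\\p{N}\\p{P}\\p{Z}\\p{Sc}\\p{Sm}\\s]");

    static String strict(String text) {
        return STRICT.matcher(text).replaceAll("");
    }

    static String relaxed(String text) {
        return RELAXED.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The last two chars are the surrogate pair for U+1F605 (the sample emoji).
        String text = "prix: 5 + 5 = 10 \u20AC\t\uD83D\uDE05";
        System.out.println(strict(text));  // drops +, =, the euro sign, the tab, the emoji
        System.out.println(relaxed(text)); // drops only the emoji
    }
}
```

Compiling each pattern once as a constant also avoids recompiling the regex on every call, which can matter when cleaning many strings.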

We can also use String.replaceAll() with the same regex:

@Test
public void whenRemoveEmojiUsingRegex_thenSuccess() {
    String text = "la conférence, commencera à 10 heures 😅";
    String regex = "[^\\p{L}\\p{N}\\p{P}\\p{Z}]";
    String result = text.replaceAll(regex, "");

    assertEquals("la conférence, commencera à 10 heures ", result);
}

5. Using Code Points

Next, we’ll detect emojis by their code points. We can use the \x{hexadecimal value} expression to match a specific Unicode code point.

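In the following example, we remove two ranges of emoji code points: U+1F300 to U+1F64F (the Miscellaneous Symbols and Pictographs block plus the Emoticons block) and U+1F680 to U+1F6FF (the Transport and Map Symbols block):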

@Test
public void whenRemoveEmojiUsingCodepoints_thenSuccess() {
    String text = "la conférence, commencera à 10 heures 😅";
    String result = text.replaceAll("[\\x{0001f300}-\\x{0001f64f}]|[\\x{0001f680}-\\x{0001f6ff}]", "");

    assertEquals("la conférence, commencera à 10 heures ", result);
}

The full list of currently available emojis and their code points is published in the Unicode Consortium’s emoji charts.
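To see why the sample emoji is caught by the first range, we can inspect its code point directly. The following is a minimal sketch that applies the same two ranges with String.codePoints() instead of a regex; the CodePointFilter class and its method names are hypothetical.

```java
// Hypothetical helper illustrating the same two ranges without a regex.
final class CodePointFilter {

    // True if the code point lies in one of the two ranges used in the example.
    static boolean inEmojiRange(int cp) {
        return (cp >= 0x1F300 && cp <= 0x1F64F)
          || (cp >= 0x1F680 && cp <= 0x1F6FF);
    }

    // Rebuilds the string from every code point outside those ranges.
    static String removeByCodePoint(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints()
          .filter(cp -> !inEmojiRange(cp))
          .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The last two chars are the surrogate pair for U+1F605.
        String text = "10 heures \uD83D\uDE05";
        int cp = text.codePointAt(text.length() - 2);
        System.out.println(Integer.toHexString(cp)); // prints 1f605
        System.out.println(removeByCodePoint(text));
    }
}
```

Emojis outside these two ranges, such as U+2764 (heavy black heart), pass through unchanged, so the ranges need to be extended to match the emojis we actually expect.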

6. Using Unicode Range

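Finally, we’ll match the same ranges again, this time using the \u escape.

The catch is that \u takes exactly four hexadecimal digits, so it can only express a single 16-bit Java char. Code points above U+FFFF, which include these emojis, must therefore be written as a surrogate pair of two \u escapes; for example, U+1F300 becomes \ud83c\udf00.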

Here’s the corresponding expression using \u:

@Test
public void whenRemoveEmojiUsingUnicode_thenSuccess() {
    String text = "la conférence, commencera à 10 heures 😅";
    String result = text.replaceAll("[\ud83c\udf00-\ud83d\ude4f]|[\ud83d\ude80-\ud83d\udeff]", "");

    assertEquals("la conférence, commencera à 10 heures ", result);
}
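The mapping between a code point and its two escapes can be checked with the JDK’s Character helpers. Here is a small sketch, assuming a hypothetical SurrogateDemo class, that derives the escape pairs used as range bounds in the expression above:

```java
// Hypothetical demo class; only standard JDK calls are used.
final class SurrogateDemo {

    // Formats a code point as the UTF-16 escape sequence(s) Java uses for it:
    // one escape for a BMP character, two for a supplementary one.
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F605));
        System.out.println(emoji.length());                          // 2 chars
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        System.out.println(toEscapes(0x1F300)); // lower bound of the first range
        System.out.println(toEscapes(0x1F64F)); // upper bound of the first range
    }
}
```

The last two lines print \ud83c\udf00 and \ud83d\ude4f, which are the bounds of the first range in the expression.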

7. Conclusion

In this quick article, we learned different ways to remove emojis from a Java String. We used an emoji library, regular expressions, code points, and Unicode ranges.

The full source code for the examples can be found over on GitHub.

