1. Overview
In this tutorial, we'll learn how to validate email addresses in Java using regular expressions.
2. Email Validation in Java
Email validation is required in nearly every application that has user registration in place.
An email address is divided into three main parts: the local-part, an @ symbol, and a domain. For example, if “username@domain.com” is an email address, then:
- local-part = username
- @ = @
- domain = domain.com
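To make the three parts concrete, here's a minimal sketch that splits an address at its last @ symbol. The class and method names (EmailParts, localPart, domain) are hypothetical. The sketch does no validation at all and assumes the input contains an @.

```java
// Hypothetical helper: splits an email address into its local-part and domain.
// It performs no validation; it only locates the separating @ symbol.
class EmailParts {

    // Everything before the last @ is the local-part
    static String localPart(String emailAddress) {
        return emailAddress.substring(0, separatorIndex(emailAddress));
    }

    // Everything after the last @ is the domain
    static String domain(String emailAddress) {
        return emailAddress.substring(separatorIndex(emailAddress) + 1);
    }

    // Uses the last @ because a quoted local-part may itself contain an @
    private static int separatorIndex(String emailAddress) {
        int index = emailAddress.lastIndexOf('@');
        if (index < 0) {
            throw new IllegalArgumentException("No @ symbol in: " + emailAddress);
        }
        return index;
    }

    public static void main(String[] args) {
        String emailAddress = "username@domain.com";
        System.out.println(localPart(emailAddress)); // username
        System.out.println(domain(emailAddress));    // domain.com
    }
}
```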
It can take a lot of effort to validate an email address through string manipulation techniques, as we typically need to count and check all the character types and lengths. But in Java, by using a regular expression, it can be much easier.
As we know, a regular expression is a sequence of characters that defines a search pattern. In the following sections, we'll see how to perform email validation with several different regular expressions.
3. Simple Regular Expression Validation
The simplest regular expression to validate an email address is ^(.+)@(\S+)$.
It only checks for the presence of the @ symbol, with at least one character before it and at least one non-whitespace character after it. If the @ symbol is present, the validation returns true; otherwise, it returns false. However, this regular expression doesn't check the local-part or the domain of the email.
For example, according to this regular expression, username@domain.com will pass the validation, but username#domain.com will fail the validation.
Let's define a simple helper method to match the regex pattern:
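import java.util.regex.Pattern;

public class EmailValidation {
    // Returns true only if the whole emailAddress matches regexPattern
    public static boolean patternMatches(String emailAddress, String regexPattern) {
        return Pattern.compile(regexPattern)
          .matcher(emailAddress)
          .matches();
    }
}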
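The helper compiles the pattern on every call, which is fine for a tutorial but wasteful if the same regex is checked many times. One possible variation, sketched below, compiles the pattern once and reuses it. Pattern instances are immutable and safe to share between threads, whereas Matcher instances are not. The class name CompiledEmailValidation and the constant SIMPLE_PATTERN are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical variant of patternMatches that compiles the regex only once
class CompiledEmailValidation {

    // Compiled once when the class loads; Pattern is immutable and thread-safe
    static final Pattern SIMPLE_PATTERN = Pattern.compile("^(.+)@(\\S+)$");

    static boolean isValid(String emailAddress) {
        // A new Matcher is created per call because Matcher is not thread-safe
        return SIMPLE_PATTERN.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("username@domain.com")); // true
        System.out.println(isValid("username#domain.com")); // false
    }
}
```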
Now, let's also write the code to validate the email address using this regular expression:
@Test
public void testUsingSimpleRegex() {
    String emailAddress = "username@domain.com";
    String regexPattern = "^(.+)@(\\S+)$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
The absence of the @ symbol in the email address will also fail the validation.
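Because the simple regex only requires something before the @ and some non-whitespace characters after it, it also accepts many strings that aren't usable addresses. The following self-contained sketch (the class name SimpleRegexDemo is hypothetical) runs a few inputs through the same pattern using Pattern.matches.

```java
import java.util.regex.Pattern;

// Hypothetical demo class showing how permissive the simple regex is
class SimpleRegexDemo {

    static final String SIMPLE_REGEX = "^(.+)@(\\S+)$";

    static boolean isValid(String emailAddress) {
        // Pattern.matches compiles the regex and tests the whole input against it
        return Pattern.matches(SIMPLE_REGEX, emailAddress);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "username@domain.com", // true
            "username#domain.com", // false: no @ symbol
            "@domain.com",         // false: nothing before the @
            "username@",           // false: nothing after the @
            "a@b",                 // true: no dot or TLD required
            "user name@domain"     // true: spaces are allowed before the @
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + isValid(input));
        }
    }
}
```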
4. Strict Regular Expression Validation
Now let's write a stricter regular expression that checks the local-part as well as the domain-part of the email:
^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$
This regex imposes the following restrictions on the local-part of an email address:
- It allows digits from 0 to 9
- It allows both uppercase and lowercase letters from a to z
- It allows the underscore “_”, hyphen “-”, and dot “.”
- A dot isn't allowed at the start or end of the local-part
- Consecutive dots aren't allowed
- The local-part can be at most 64 characters long
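This regex imposes the following restrictions on the domain-part:
- It allows digits from 0 to 9
- It allows both uppercase and lowercase letters from a to z
- A hyphen “-” isn't allowed at the start of the domain-part, and neither a hyphen nor a dot is allowed at its end
- Consecutive dots aren't allowed
Note that [^-] matches any single character except a hyphen, so it doesn't reject other characters, such as a dot, at the start of the domain-part.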
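We can check the length limit and the leading-character rule directly. This sketch (the class name StrictRegexLimits is hypothetical) builds local-parts of 64 and 65 characters and also shows that [^-] rejects only a leading hyphen, so a leading dot in the domain still gets through. It uses String.repeat, which assumes Java 11 or later.

```java
import java.util.regex.Pattern;

// Hypothetical demo: exercises the 64-character lookahead and the [^-] domain check
class StrictRegexLimits {

    static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValid(String emailAddress) {
        return STRICT.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        String local64 = "a".repeat(64); // String.repeat requires Java 11+
        String local65 = "a".repeat(65);

        System.out.println(isValid(local64 + "@domain.com")); // true: exactly 64 characters
        System.out.println(isValid(local65 + "@domain.com")); // false: the lookahead fails
        System.out.println(isValid("username@-domain.com"));  // false: [^-] rejects a leading hyphen
        System.out.println(isValid("username@.domain.com"));  // true: [^-] accepts a leading dot
    }
}
```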
Let's also write the code to test out this regular expression:
@Test
public void testUsingStrictRegex() {
    String emailAddress = "username@domain.com";
    String regexPattern = "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
      + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
Some of the email addresses that pass this validation are:
- username@domain.com
- user.name@domain.com
- user-name@domain.com
- username@domain.co.in
- user_name@domain.com
And here are a few email addresses that fail it:
- username.@domain.com
- .user.name@domain.com
- user-name@domain.com.
- username@.com
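To confirm both lists, this sketch runs every sample address through the strict regex. The class name StrictRegexSamples and its fields are hypothetical; List.of assumes Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo: runs the sample addresses through the strict regex
class StrictRegexSamples {

    static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    static final List<String> VALID = List.of(
        "username@domain.com", "user.name@domain.com", "user-name@domain.com",
        "username@domain.co.in", "user_name@domain.com");

    static final List<String> INVALID = List.of(
        "username.@domain.com", ".user.name@domain.com",
        "user-name@domain.com.", "username@.com");

    static boolean isValid(String emailAddress) {
        return STRICT.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        // Every address in VALID should print true, every one in INVALID false
        VALID.forEach(e -> System.out.println(e + " -> " + isValid(e)));
        INVALID.forEach(e -> System.out.println(e + " -> " + isValid(e)));
    }
}
```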
5. Regular Expression for Validating Non-Latin or Unicode Email Addresses
The regex that we just saw in the previous section works well for email addresses written in English, but it won't work for non-Latin email addresses.
So let's write a regular expression that validates Unicode characters as well:
^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$
We can use this regex to validate Unicode or non-Latin email addresses, since it accepts letters from any script.
As we can see, this regex is similar to the strict regex that we built in the previous section, except that we've replaced the “A-Za-z” part with “\\p{L}”, which matches any Unicode letter. This is what enables support for Unicode characters.
Let's check this regex by writing the test:
@Test
public void testUsingUnicodeRegex() {
    String emailAddress = "用户名@领域.电脑";
    String regexPattern = "^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@"
      + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
Now, this regex not only takes a stricter approach to validating email addresses but also supports non-Latin characters.
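To see the difference between the two regexes, this sketch (the class name UnicodeRegexDemo is hypothetical) checks the same addresses against the ASCII-only strict regex and its \p{L} variant. It assumes the source file is compiled as UTF-8 so that the non-Latin literals survive.

```java
import java.util.regex.Pattern;

// Hypothetical demo: compares the ASCII-only strict regex with its \p{L} variant
class UnicodeRegexDemo {

    static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    // \p{L} matches a letter from any script
    static final Pattern UNICODE = Pattern.compile(
        "^(?=.{1,64}@)[\\p{L}0-9_-]+(\\.[\\p{L}0-9_-]+)*@"
        + "[^-][\\p{L}0-9-]+(\\.[\\p{L}0-9-]+)*(\\.[\\p{L}]{2,})$");

    public static void main(String[] args) {
        String[] inputs = {"username@domain.com", "用户名@领域.电脑", "müller@beispiel.de"};
        for (String input : inputs) {
            // Only the first address passes STRICT; all three pass UNICODE
            System.out.println(input
                + " strict=" + STRICT.matcher(input).matches()
                + " unicode=" + UNICODE.matcher(input).matches());
        }
    }
}
```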
6. Regular Expression by RFC 5322 for Email Validation
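Instead of writing a custom regex to validate email addresses, we can base one on the RFC standards.
RFC 5322, the successor to RFC 2822 and the original RFC 822, defines the syntax of email addresses. It specifies that syntax as a grammar rather than as a regular expression, so the regex below is a common simplification of its rules.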
Let's check it out:
^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$
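As we can see, it's a very simple regex. Its local-part character class covers every special character that RFC 5322 permits in an unquoted local-part, plus the dot, while the domain part accepts letters, digits, dots, and hyphens in any order.
Note that this character class does include the pipe character (|) and the single quote ('). Since these characters can be abused for SQL injection when passed from the client side to the server, such input should be handled safely on the server, for example with parameterized queries.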
Let's write the code to validate an email with this regex:
@Test
public void testUsingRFC5322Regex() {
    String emailAddress = "username@domain.com";
    String regexPattern = "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
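Since this regex is permissive in both directions, it helps to see what it actually accepts. In this sketch (the class name Rfc5322RegexDemo is hypothetical), special characters in the local-part pass, and so does a domain with consecutive dots, because the domain class places no restriction on where dots appear.

```java
import java.util.regex.Pattern;

// Hypothetical demo: the RFC 5322-based regex accepts special characters
// in the local-part but only loosely checks the domain
class Rfc5322RegexDemo {

    static final Pattern RFC_5322 =
        Pattern.compile("^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^.-]+@[a-zA-Z0-9.-]+$");

    static boolean isValid(String emailAddress) {
        return RFC_5322.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user+tag@domain.com"));  // true
        System.out.println(isValid("a!b#c$d@domain.com"));   // true
        System.out.println(isValid("user@domain..com"));     // true: consecutive dots in the domain
        System.out.println(isValid("user name@domain.com")); // false: a space isn't allowed
    }
}
```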
7. Regular Expression to Check Characters in the Top-Level Domain
We've written regexes to verify the local and domain parts of an email address. Now let's also write a regex that checks the top-level domain of the email.
The regular expression below validates the top-level domain of the email address:
^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$
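This regex checks that the domain consists of one or more labels, each followed by a dot, and that it ends with a top-level domain of two to six letters. Keep in mind that some real top-level domains, such as .technology, are longer than six letters and will be rejected.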
Let's also write some code to verify the email address by using this regex:
@Test
public void testTopLevelDomain() {
    String emailAddress = "username@domain.com";
    String regexPattern = "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*"
      + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
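The following sketch (the class name TopLevelDomainDemo is hypothetical) shows how the domain part behaves with subdomains and with top-level domains of different lengths.

```java
import java.util.regex.Pattern;

// Hypothetical demo: subdomains are allowed, and the TLD must have 2 to 6 letters
class TopLevelDomainDemo {

    static final Pattern TLD = Pattern.compile(
        "^[\\w!#$%&'*+/=?`{|}~^-]+(?:\\.[\\w!#$%&'*+/=?`{|}~^-]+)*"
        + "@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,6}$");

    static boolean isValid(String emailAddress) {
        return TLD.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("username@domain.com"));        // true
        System.out.println(isValid("username@mail.domain.com"));   // true: several dots are fine
        System.out.println(isValid("username@domain.museum"));     // true: six-letter TLD
        System.out.println(isValid("username@domain.c"));          // false: TLD too short
        System.out.println(isValid("username@domain.technology")); // false: TLD too long
    }
}
```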
8. Regular Expression to Restrict Consecutive, Trailing, and Leading Dots
Now let's write a regex that restricts the usage of dots in email addresses:
^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$
The above regular expression restricts consecutive, leading, and trailing dots. Thus, both the local and domain parts can contain more than one dot, but not consecutive ones, and neither part can start or end with a dot.
Let's take a look at the code:
@Test
public void testRestrictDots() {
    String emailAddress = "username@domain.com";
    String regexPattern = "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@"
      + "[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
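To see the dot rules in action, this sketch (the class name DotRulesDemo is hypothetical) tries dots in various positions. It also shows that this regex doesn't require any dot in the domain, so a bare host name such as localhost passes.

```java
import java.util.regex.Pattern;

// Hypothetical demo: dots may separate characters but can't lead, trail, or repeat
class DotRulesDemo {

    static final Pattern DOTS = Pattern.compile(
        "^[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+(?:\\.[a-zA-Z0-9_!#$%&'*+/=?`{|}~^-]+)*@"
        + "[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$");

    static boolean isValid(String emailAddress) {
        return DOTS.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user.name@mail.domain.com")); // true
        System.out.println(isValid("user..name@domain.com"));     // false: consecutive dots
        System.out.println(isValid(".username@domain.com"));      // false: leading dot
        System.out.println(isValid("username.@domain.com"));      // false: trailing dot
        System.out.println(isValid("username@domain..com"));      // false: consecutive dots
        System.out.println(isValid("username@localhost"));        // true: no dot required
    }
}
```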
9. OWASP Validation Regular Expression
The OWASP Validation Regex Repository provides this regular expression for email validation:
^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$
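This regex covers most of the checks for the standard email structure: a local-part made of letters, digits, and the characters _ + & * - separated by single dots, followed by one or more domain labels and a top-level domain of two to seven letters.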
Let's also verify the email address by using the below code:
@Test
public void testOwaspValidation() {
    String emailAddress = "username@domain.com";
    String regexPattern = "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
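The following sketch (the class name OwaspRegexDemo is hypothetical) shows which characters the OWASP regex lets through in the local-part and how it limits the top-level domain.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the OWASP email regex
class OwaspRegexDemo {

    static final Pattern OWASP = Pattern.compile(
        "^[a-zA-Z0-9_+&*-]+(?:\\.[a-zA-Z0-9_+&*-]+)*@"
        + "(?:[a-zA-Z0-9-]+\\.)+[a-zA-Z]{2,7}$");

    static boolean isValid(String emailAddress) {
        return OWASP.matcher(emailAddress).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("user+tag@domain.com"));      // true: '+' is allowed
        System.out.println(isValid("username@domain.co.uk"));    // true: subdomains are fine
        System.out.println(isValid("user!name@domain.com"));     // false: '!' isn't allowed
        System.out.println(isValid("username@domain.abcdefgh")); // false: TLD longer than 7
    }
}
```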
10. Gmail Special Case for Emails
One special case comes up with the Gmail domain: the + character is allowed in the local-part of the email. Gmail ignores everything from the + up to the @, so the two email addresses username+something@gmail.com and username@gmail.com are the same.
Likewise, user+name@gmail.com is delivered to user@gmail.com.
To pass email validation for this special case as well, we need a slightly different regex:
^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$
Let's also write an example to test this use case:
@Test
public void testGmailSpecialCase() {
    String emailAddress = "username+something@domain.com";
    String regexPattern = "^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@"
      + "[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$";
    assertTrue(EmailValidation.patternMatches(emailAddress, regexPattern));
}
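The sketch below compares the strict regex from section 4 with the + variant, and adds a hypothetical helper, stripPlusTag, that reduces a gmail.com address to the form Gmail delivers to. The helper handles only the + rule, not any other Gmail-specific normalization, and assumes its input has already passed validation, since it expects an @. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

// Hypothetical demo: strict regex vs. the '+'-friendly variant, plus tag stripping
class GmailPlusDemo {

    // Strict regex from section 4: no '+' in the local-part
    static final Pattern STRICT = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9_-]+(\\.[A-Za-z0-9_-]+)*@"
        + "[^-][A-Za-z0-9-]+(\\.[A-Za-z0-9-]+)*(\\.[A-Za-z]{2,})$");

    // Variant from this section that also allows '+'
    static final Pattern WITH_PLUS = Pattern.compile(
        "^(?=.{1,64}@)[A-Za-z0-9\\+_-]+(\\.[A-Za-z0-9\\+_-]+)*@"
        + "[^-][A-Za-z0-9\\+-]+(\\.[A-Za-z0-9\\+-]+)*(\\.[A-Za-z]{2,})$");

    // Hypothetical helper: drops the "+tag" part of a gmail.com local-part.
    // Assumes the address is already validated and contains an @.
    static String stripPlusTag(String emailAddress) {
        int at = emailAddress.lastIndexOf('@');
        String local = emailAddress.substring(0, at);
        String domain = emailAddress.substring(at + 1);
        int plus = local.indexOf('+');
        if (domain.equalsIgnoreCase("gmail.com") && plus >= 0) {
            local = local.substring(0, plus);
        }
        return local + "@" + domain;
    }

    public static void main(String[] args) {
        String tagged = "username+something@gmail.com";
        System.out.println(STRICT.matcher(tagged).matches());    // false
        System.out.println(WITH_PLUS.matcher(tagged).matches()); // true
        System.out.println(stripPlusTag(tagged));                // username@gmail.com
        System.out.println(stripPlusTag("user+name@gmail.com")); // user@gmail.com
    }
}
```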
11. Apache Commons Validator for Email
The Apache Commons Validator is a validation package that contains standard validation rules. So, by importing this package, we can apply email validation.
We can use its EmailValidator class, which follows RFC 822 standards, to validate email addresses. This validator combines custom code and regular expressions, and it supports not only the special characters but also the Unicode characters we've discussed.
Let's add the commons-validator dependency in our project:
<dependency>
<groupId>commons-validator</groupId>
<artifactId>commons-validator</artifactId>
<version>${validator.version}</version>
</dependency>
Now, we can validate email addresses using the below code:
@Test
public void testUsingEmailValidator() {
    String emailAddress = "username@domain.com";
    assertTrue(EmailValidator.getInstance()
      .isValid(emailAddress));
}
12. Which Regex Should I Use?
In this tutorial, we've looked at a variety of regex-based solutions for email address validation. Which one we should use depends on how strict we want our validation to be and on our exact requirements.
For example, we can use the simple regex from section 3 if we only need to check for the presence of an @ symbol in an email. However, for more detailed validation, we can opt for a stricter solution, such as the regex from section 6 based on the RFC 5322 standard.
Finally, if we are dealing with Unicode characters in an email, we can go for the regex solution provided in section 5 of this tutorial.
13. Conclusion
In this tutorial, we've learned various ways to validate email addresses in Java using regular expressions.
The complete code for this tutorial is available over on GitHub.