1. Introduction
Simply put, URL encoding translates special characters from the URL to a representation that adheres to the spec and can be correctly understood and interpreted.
In this article, we’ll focus on how to encode/decode the URL or form data so that it adheres to the spec and transmits over the network properly.
2. Analyze the URL
A basic URI syntax can be generalized as:
scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
The first step in encoding a URI is to examine its parts and then encode only the relevant portions.
Let us look at an example of a URI:
String testUrl = "http://www.baeldung.com?key1=value+1&key2=value%40%21%242&key3=value%253";
One way of analyzing the URI is to load its String representation into a java.net.URI instance:

    // assumes a static import of org.hamcrest.CoreMatchers.is
    @Test
    public void givenURL_whenAnalyze_thenCorrect() throws Exception {
        URI uri = new URI(testUrl);

        Assert.assertThat(uri.getScheme(), is("http"));
        Assert.assertThat(uri.getHost(), is("www.baeldung.com"));
        Assert.assertThat(uri.getRawQuery(),
          is("key1=value+1&key2=value%40%21%242&key3=value%253"));
    }

The URI class parses the string representation of the URL and exposes its parts through getter methods such as getScheme, getHost and getRawQuery.
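To see how the general syntax maps onto those getters, here is a minimal sketch that parses a URI using every optional component. The class name, the parts helper and the example.com address are assumptions made for illustration; URI.create is used as the unchecked counterpart of new URI(String).

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UriPartsSketch {

    // Hypothetical helper: collects the components of a URI into an ordered map.
    static Map<String, String> parts(String text) {
        URI uri = URI.create(text); // throws IllegalArgumentException on a malformed URI
        Map<String, String> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("userInfo", uri.getUserInfo());
        parts.put("host", uri.getHost());
        parts.put("port", String.valueOf(uri.getPort()));
        parts.put("rawPath", uri.getRawPath());   // still percent-encoded
        parts.put("path", uri.getPath());         // decoded
        parts.put("rawQuery", uri.getRawQuery()); // still percent-encoded
        parts.put("query", uri.getQuery());       // decoded
        parts.put("fragment", uri.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        // Made-up URI, chosen only because it contains every component
        parts("https://user:secret@www.example.com:8443/docs/a%20b?q=x%26y#top")
            .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Note that the raw getters return the text exactly as it appears in the URI, while getPath and getQuery return decoded values. This distinction matters in the decoding section later on.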
3. Encode the URL
When encoding a URI, a common pitfall is to encode the complete URI. Typically, we need to encode only the query portion of the URI.
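The pitfall is easy to reproduce. The following sketch, with a made-up base URL and parameter value, contrasts encoding the whole string with encoding only the value; the class and helper names are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class WholeUrlPitfallSketch {

    // Hypothetical helper wrapping URLEncoder.encode(String, String)
    static String encode(String text) {
        try {
            return URLEncoder.encode(text, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/search"; // made-up address
        String value = "tom & jerry";

        // Pitfall: the delimiters ":", "/", "?" and "=" get encoded as well
        System.out.println(encode(base + "?q=" + value));
        // prints http%3A%2F%2Fwww.example.com%2Fsearch%3Fq%3Dtom+%26+jerry

        // Intended: only the parameter value is encoded
        System.out.println(base + "?q=" + encode(value));
        // prints http://www.example.com/search?q=tom+%26+jerry
    }
}
```

The first result is no longer a usable URL, because the characters that separate its parts have been escaped along with the data.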
Let’s encode the data using the encode(data, encodingScheme) method of the URLEncoder class:
    // assumes static imports of CoreMatchers.is and Collectors.joining
    private String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available; rethrow unchecked so the method works inside a lambda
            throw new IllegalStateException(e);
        }
    }

    @Test
    public void givenRequestParam_whenUTF8Scheme_thenEncode() throws Exception {
        // LinkedHashMap keeps insertion order, so the parameters are joined predictably
        Map<String, String> requestParams = new LinkedHashMap<>();
        requestParams.put("key1", "value 1");
        requestParams.put("key2", "value@!$2");
        requestParams.put("key3", "value%3");

        String encodedURL = requestParams.keySet().stream()
          .map(key -> key + "=" + encodeValue(requestParams.get(key)))
          .collect(joining("&", "http://www.baeldung.com?", ""));

        Assert.assertThat(encodedURL, is(testUrl));
    }
The encode method accepts two parameters:
- data – string to be translated
- encodingScheme – name of the character encoding
This encode method converts the string into application/x-www-form-urlencoded format.
The method leaves letters, digits and the characters ".", "-", "*" and "_" unchanged and converts a space into "+". Every other character is converted into one or more bytes using the given encoding scheme, and each byte is written as “%xy”, where xy is its two-digit hexadecimal value. Whenever we build a URL from dynamic values, we need to encode those values before sending the request to the server.
Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities. (Reference: https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html)
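As a rough illustration of the “%xy” form, the sketch below compares the output of URLEncoder with a hand-rolled helper that writes each UTF-8 byte as a percent sign followed by two hexadecimal digits. The class and the percentBytes helper are hypothetical and exist only to make the byte-level view visible.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PercentEncodingSketch {

    static String encode(String text) {
        try {
            return URLEncoder.encode(text, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Hypothetical helper: writes every UTF-8 byte of the text as %XY
    static String percentBytes(String text) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%%%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a-b_c.d*e")); // prints a-b_c.d*e (left unchanged)
        System.out.println(encode("a b"));       // prints a+b
        System.out.println(encode("@"));         // prints %40
        System.out.println(encode("\u00e9"));    // e with acute accent, prints %C3%A9
        System.out.println(encode("\u20ac"));    // euro sign, prints %E2%82%AC

        // The escaped form is the UTF-8 byte sequence, one %XY group per byte
        System.out.println(percentBytes("\u00e9").equals(encode("\u00e9"))); // prints true
    }
}
```

A character outside ASCII therefore produces several “%xy” groups, one per byte, which is why the choice of encoding scheme affects the result.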
4. Decode the URL
Let us now decode the previous URL using the decode method of the URLDecoder:
    private String decode(String value) {
        try {
            return URLDecoder.decode(value, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available; rethrow unchecked so the method works inside a lambda
            throw new IllegalStateException(e);
        }
    }

    @Test
    public void givenRequestParam_whenUTF8Scheme_thenDecodeRequestParams() throws Exception {
        URI uri = new URI(testUrl);

        String scheme = uri.getScheme();
        String host = uri.getHost();
        String query = uri.getRawQuery();

        // Split the raw query first, then decode each value
        String decodedQuery = Arrays.stream(query.split("&"))
          .map(param -> param.split("=")[0] + "=" + decode(param.split("=")[1]))
          .collect(Collectors.joining("&"));

        Assert.assertEquals(
          "http://www.baeldung.com?key1=value 1&key2=value@!$2&key3=value%3",
          scheme + "://" + host + "?" + decodedQuery);
    }
The two important bits here are:
- analyze the URL before decoding
- use the same encoding scheme for encoding and decoding
If we were to decode and then analyze, the URL portions might not be parsed properly, because a decoded value can itself contain delimiters such as "&" or "=". If we used a different encoding scheme to decode the data, the result would be garbage.
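Both points can be demonstrated with a small sketch. The query string and the helper names below are made up for illustration: the first pair of helpers counts the parameters found when splitting before or after decoding, and the decode helper takes the charset name so that a mismatch can be shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class DecodeOrderSketch {

    static String decode(String text, String charsetName) {
        try {
            return URLDecoder.decode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Analyze first: split the raw query, decoding would happen per value afterwards
    static int paramsWhenSplitFirst(String rawQuery) {
        return rawQuery.split("&").length;
    }

    // Decode first: the decoded "&" is mistaken for a parameter separator
    static int paramsWhenDecodedFirst(String rawQuery) {
        return decode(rawQuery, "UTF-8").split("&").length;
    }

    public static void main(String[] args) {
        String rawQuery = "q=tom+%26+jerry&page=2"; // two parameters

        System.out.println(paramsWhenSplitFirst(rawQuery));   // prints 2
        System.out.println(paramsWhenDecodedFirst(rawQuery)); // prints 3

        // Same bytes, different charset: the two UTF-8 bytes of the accented e
        // are read back as two separate Latin-1 characters
        System.out.println(decode("caf%C3%A9", "UTF-8").length());      // prints 4
        System.out.println(decode("caf%C3%A9", "ISO-8859-1").length()); // prints 5
    }
}
```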
5. Encode a Path Segment
URLEncoder should not be used to encode a path segment of the URL. The path component is the hierarchical part of the URL: it represents a directory-like structure, or otherwise locates a resource, with segments separated by “/”.
The reserved characters in a path segment differ from those in query parameter values. For example, a “+” sign is a valid character in a path segment and therefore should not be encoded.
To encode the path segment, we use the URI class instead. The URI class provides two methods to get the path:
- getRawPath – returns the path component in its encoded (raw) form
- getPath – returns the decoded path component
Let’s see how it is different:
    @Test
    public void givenPath_thenEncodeDecodePath() throws Exception {
        URI uri = new URI(null, null, "/Path 1/Path+2", null);

        Assert.assertEquals("/Path%201/Path+2", uri.getRawPath());
        Assert.assertEquals("/Path 1/Path+2", uri.getPath());
    }

In the above code snippet, getRawPath returns the encoded value, and the + is not encoded because it is a valid character in the path component. The getPath method, in contrast, simply returns the decoded value.
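To see why URLEncoder is the wrong tool here, the following sketch runs the same input through both approaches. The class and helper names are assumptions for illustration; the viaUri helper uses the same four-argument URI constructor as the test above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PathVersusQuerySketch {

    // Form encoding, meant for query parameter values
    static String viaUrlEncoder(String segment) {
        try {
            return URLEncoder.encode(segment, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Path encoding through the URI(scheme, host, path, fragment) constructor
    static String viaUri(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reads a raw path back the way a URI parser would
    static String readBack(String rawPath) {
        return URI.create(rawPath).getPath();
    }

    public static void main(String[] args) {
        System.out.println(viaUrlEncoder("Path 1")); // prints Path+1
        System.out.println(viaUri("/Path 1"));       // prints /Path%201

        System.out.println(viaUrlEncoder("Path+2")); // prints Path%2B2
        System.out.println(viaUri("/Path+2"));       // prints /Path+2

        // In a path, "+" is a literal plus, so the original space is lost
        System.out.println(readBack("/" + viaUrlEncoder("Path 1"))); // prints /Path+1
        System.out.println(readBack(viaUri("/Path 1")));             // prints /Path 1
    }
}
```

Only the URI-based version survives the round trip, because a “+” in a path is read back as a plus sign rather than as a space.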
Let us add a path variable to our test URL:
String testUrl = "http://www.baeldung.com/path+1?key1=value+1&key2=value%40%21%242&key3=value%253";
and a method for encoding the path:

    private String encodePath(String path) throws URISyntaxException {
        return new URI(null, null, path, null).getRawPath();
    }

To assemble and assert a properly encoded URL, let us change the test from Section 3:

    String path = "path+1";

    String encodedURL = requestParams.keySet().stream()
      .map(key -> key + "=" + encodeValue(requestParams.get(key)))
      .collect(joining("&", "http://www.baeldung.com/" + encodePath(path) + "?", ""));

    Assert.assertThat(encodedURL, CoreMatchers.is(testUrl));
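For readers who want to run the assembly outside a test class, here is a self-contained sketch that combines both helpers. The buildUrl and sampleUrl methods are hypothetical names introduced for this example, and checked exceptions are rethrown unchecked to keep the stream code compact.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class UrlAssemblySketch {

    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.toString());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    static String encodePath(String path) {
        try {
            return new URI(null, null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Hypothetical helper: path and query values are encoded separately, then joined
    static String buildUrl(String base, String path, Map<String, String> params) {
        String query = params.entrySet().stream()
            .map(entry -> entry.getKey() + "=" + encodeValue(entry.getValue()))
            .collect(Collectors.joining("&"));
        return base + "/" + encodePath(path) + "?" + query;
    }

    // Rebuilds the article's test URL from its unencoded parts
    static String sampleUrl() {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("key1", "value 1");
        params.put("key2", "value@!$2");
        params.put("key3", "value%3");
        return buildUrl("http://www.baeldung.com", "path+1", params);
    }

    public static void main(String[] args) {
        System.out.println(sampleUrl());
        // prints http://www.baeldung.com/path+1?key1=value+1&key2=value%40%21%242&key3=value%253
    }
}
```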
6. Conclusion
In this tutorial, we have seen how to encode and decode the data so that it can be transferred and interpreted correctly. While the article focused on encoding/decoding URI query parameter values, the approach is applicable to HTML form parameters as well.
You can find the source code over on GitHub.