Java URL FAQ: Using Java, how can I encode a String that I want to safely pass in a URL to a servlet or CGI program running on a web server?
Answer: As the question implies, you can't put just any string of characters into a URL. Characters such as spaces, ampersands, percent signs, and slashes are either not allowed in a URL or have a special meaning there, so you have to encode the String first. With Java, you can use the URLEncoder encode method to safely encode a String, as shown in the following sample Java source code example:
    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;

    /**
     * Test the Java URLEncoder class, passing it a few different strings
     * to see what the resulting output looks like.
     *
     * @author alvin alexander, devdaily.com.
     *
     */
    public class JavaUrlEncoderTest
    {
      public static void main(String[] args) throws UnsupportedEncodingException
      {
        // one easy string, one that's a little bit harder
        String[] testStrings = {"All fall gala hall", "this\\is/a%test\t_~!@#$%^&*()dude"};
        for (String s : testStrings)
        {
          String encodedString = URLEncoder.encode(s, "UTF-8");
          System.out.format("'%s'\n", encodedString);
        }
      }
    }
As you can see, this program loops through an array of strings, passes each string to the URLEncoder.encode method, and then prints the encoded string that the method returns.
Here's what the output from this sample program looks like:
    'All+fall+gala+hall'
    'this%5Cis%2Fa%25test%09_%7E%21%40%23%24%25%5E%26*%28%29dude'
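As you can see from both lines of output, the encode method modifies the input strings considerably: spaces become plus signs, and most punctuation characters are replaced by percent-encoded byte values. In short, encoding your strings like this makes them safe to use as parameter names and values in a URL when you make a call to a web server. Note that you should encode each parameter on its own rather than the complete URL, because the encode method would also escape the ":", "/", "?", and "&" characters that give the URL its structure.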
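To show how the encoded strings typically end up in a real request, here is a minimal sketch that builds a query string from a map of parameters. The class name QueryStringSketch, the buildQuery helper, the parameter names, and the example.com address are all my own assumptions for illustration; only the URLEncoder.encode call comes from the example above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * A sketch (not a definitive implementation) of building a URL query
 * string by encoding each parameter name and value separately.
 */
public class QueryStringSketch {

    // Hypothetical helper: encodes every name and value, then joins
    // the pairs with '=' and '&', which must stay unencoded.
    public static String buildQuery(Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        try {
            for (Map.Entry<String, String> entry : params.entrySet()) {
                String name = URLEncoder.encode(entry.getKey(), "UTF-8");
                String value = URLEncoder.encode(entry.getValue(), "UTF-8");
                query.add(name + "=" + value);
            }
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "All fall gala hall");
        params.put("path", "this\\is/a%test");

        // example.com and /search are placeholders, not a real service
        String url = "http://example.com/search?" + buildQuery(params);
        System.out.println(url);
        // prints: http://example.com/search?q=All+fall+gala+hall&path=this%5Cis%2Fa%25test
    }
}
```

The important design choice here is that the base URL and the "=" and "&" separators are never passed through the encode method; only the individual names and values are.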
The URLEncoder class
For more information on URL encoding, I recommend reading the Javadoc for the URLEncoder class.
The following text from the URLEncoder Javadoc is especially helpful:
This class contains static methods for converting a String to the application/x-www-form-urlencoded MIME format.
When encoding a String, the following rules apply:
- The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
- The special characters ".", "-", "*", and "_" remain the same.
- The blank space character " " is converted into a plus sign "+".
- All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte. The recommended encoding scheme to use is UTF-8. However, for compatibility reasons, if an encoding is not specified, then the default encoding of the platform is used.
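The rules quoted above are easy to check for yourself. The following sketch runs one small input through the encoder for each rule, and then decodes a string with URLDecoder to confirm that the original text comes back. The class name UrlEncoderRulesSketch and the enc and dec helper methods are hypothetical names I made up for this example; they simply wrap the standard URLEncoder.encode and URLDecoder.decode calls with UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/**
 * A sketch that exercises each URLEncoder rule from the Javadoc,
 * always using UTF-8 as the encoding scheme.
 */
public class UrlEncoderRulesSketch {

    // Hypothetical wrapper around URLEncoder.encode with UTF-8
    public static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical wrapper around URLDecoder.decode with UTF-8
    public static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Rule 1: alphanumeric characters remain the same
        System.out.println(enc("abcXYZ019"));   // prints: abcXYZ019

        // Rule 2: ".", "-", "*", and "_" remain the same
        System.out.println(enc(".-*_"));        // prints: .-*_

        // Rule 3: a blank space becomes a plus sign
        System.out.println(enc("a b"));         // prints: a+b

        // Rule 4: everything else becomes %xy, one triplet per byte.
        // A literal plus sign is one byte in UTF-8 ...
        System.out.println(enc("a+b"));         // prints: a%2Bb
        // ... while e-acute (U+00E9) is two bytes in UTF-8
        System.out.println(enc("\u00e9"));      // prints: %C3%A9

        // Decoding reverses the process
        String original = "caf\u00e9 & cr\u00e8me";
        System.out.println(original.equals(dec(enc(original))));  // prints: true
    }
}
```

Notice that a literal plus sign is encoded as %2B, which is how the decoder can tell it apart from a plus sign that stands for a space.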