By Alvin Alexander. Last updated: June 4, 2016
Java URL download FAQ: Can you share some source code for a Java URL example, specifically a Java class to download and parse the contents of a URL?
This example is a little rough, but it's a working program that downloads the contents of a given URL and strips them down to plain text. The real point isn't the URL handling; it's the parsing, which I plan to use in an anti-spam program that I am working on.
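To make the parsing goal a little more concrete, here is a minimal sketch that runs three regex passes over a hard-coded HTML string and prints the result of each pass. The class name, method names, and sample string are made up for illustration, and the tag pattern `<[^>]*>` is just a common simple choice, not a real HTML parser (a `>` inside an attribute value would confuse it).

```java
import java.util.regex.Pattern;

// Hypothetical sketch class; the names here are illustrative only.
class StripHtmlSketch {
    // Compile each pattern once so repeated calls don't recompile it.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");
    private static final Pattern HTML_TAG = Pattern.compile("<[^>]*>");
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^0-9a-zA-Z ]");

    // Pass 1: turn every whitespace character (tabs, newlines) into a plain space.
    static String collapseWhitespace(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ");
    }

    // Pass 2: drop anything that looks like <...>.
    static String removeTags(String s) {
        return HTML_TAG.matcher(s).replaceAll("");
    }

    // Pass 3: keep only ASCII letters, digits, and spaces.
    static String keepAlphanumerics(String s) {
        return NON_ALPHANUMERIC.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>Buy <b>cheap</b>\nwatches!</p>";
        String step1 = collapseWhitespace(html);
        String step2 = removeTags(step1);
        String step3 = keepAlphanumerics(step2);
        System.out.println(step1); // <p>Buy <b>cheap</b> watches!</p>
        System.out.println(step2); // Buy cheap watches!
        System.out.println(step3); // Buy cheap watches
    }
}
```

What's left after the last pass is a flat string of words, which is a convenient shape for something like keyword matching in a spam filter.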
Java URL download and parse HTML source example
Here then is my Java URL download and parse source code:
//JavaUrlExample.java
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;

public class JavaUrlExample
{
    public static void main(String[] args)
    {
        new JavaUrlExample();
    }

    public JavaUrlExample()
    {
        String contents = downloadURL("http://www.devdaily.com/"); // a sample URL
        System.out.println("RAW CONTENTS:" + "\n" + contents);
        String strippedContents = parseString(contents);
        System.out.println("\n\nSTRIPPED CONTENTS:" + "\n" + strippedContents);
    }

    // Downloads the given URL and returns its contents as one String,
    // with each line terminated by "\n".
    private String downloadURL(String theURL)
    {
        URL u;
        InputStream is = null;
        BufferedReader reader;
        String s;
        StringBuilder sb = new StringBuilder();
        try
        {
            u = new URL(theURL);
            is = u.openStream();
            // DataInputStream.readLine() is deprecated and does not convert
            // bytes to characters properly, so read through a BufferedReader.
            // Assumption: the page is encoded as UTF-8.
            reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            while ((s = reader.readLine()) != null)
            {
                sb.append(s).append("\n");
            }
        }
        catch (MalformedURLException mue)
        {
            System.out.println("Ouch - a MalformedURLException happened.");
            mue.printStackTrace();
            System.exit(1);
        }
        catch (IOException ioe)
        {
            System.out.println("Oops - an IOException happened.");
            ioe.printStackTrace();
            System.exit(1);
        }
        finally
        {
            try
            {
                // 'is' is still null if openStream() failed
                if (is != null)
                {
                    is.close();
                }
            }
            catch (IOException ioe)
            {
                // nothing useful to do if close() fails
            }
        }
        return sb.toString();
    }

    // Reduces HTML source to plain alphanumeric text.
    public String parseString(String s)
    {
        String output;
        Matcher matcher;

        // 1) replace every whitespace character (tab, newline, ...) with a space
        Pattern replaceWhitespacePattern = Pattern.compile("\\s");
        matcher = replaceWhitespacePattern.matcher(s);
        output = matcher.replaceAll(" ");

        // 2) remove anything that looks like an HTML tag
        Pattern removeHTMLTagsPattern = Pattern.compile("<[^>]*>");
        matcher = removeHTMLTagsPattern.matcher(output);
        output = matcher.replaceAll("");

        // 3) remove everything except letters, digits, and spaces
        Pattern leaveOnlyAlphaNumericCharactersPattern = Pattern.compile("[^0-9a-zA-Z ]");
        matcher = leaveOnlyAlphaNumericCharactersPattern.matcher(output);
        output = matcher.replaceAll("");

        return output;
    }
}
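If you're on Java 7 or newer, the download step can also be written with try-with-resources, which closes the stream for you even when an exception is thrown. This is only a sketch of that alternative, not a replacement for the class above: the class and method names are made up, it assumes the content is UTF-8 text, and it lets the caller deal with any IOException instead of calling System.exit().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; the names here are illustrative only.
class UrlDownloadSketch {
    // Reads everything the URL returns into one String, ending each line with '\n'.
    // Assumption: the content is UTF-8 text.
    static String download(String address) throws IOException {
        URL url = new URL(address); // throws MalformedURLException (an IOException) on a bad URL
        StringBuilder sb = new StringBuilder();
        // The reader, and the stream it wraps, are closed automatically
        // when this block exits, normally or via an exception.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Use the first command-line argument if given, else the sample URL from above.
        String address = args.length > 0 ? args[0] : "http://www.devdaily.com/";
        System.out.println(download(address));
    }
}
```

Because the method only needs a URL string, it works with file: URLs as well, which is handy for trying it out without a network connection.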
Java URL download and parse example - Summary
I hope this Java URL download/parse example has been helpful.

