By Alvin Alexander. Last updated: June 4, 2016
Java URL download FAQ: Can you share some source code for a Java URL example, specifically a Java class to download and parse the contents of a URL?
This example is a little weak, but it's a program that downloads and parses the contents of a given URL. Its real purpose has less to do with URLs than with the parsing I am trying to achieve: the parsing code is going to be used in an anti-spam program that I am working on.
Java URL download and parse HTML source example
Here then is my Java URL download and parse source code:
// JavaUrlExample.java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaUrlExample {

  public static void main(String[] args) {
    new JavaUrlExample();
  }

  public JavaUrlExample() {
    String contents = downloadURL("http://www.devdaily.com/"); // a sample URL
    System.out.println("RAW CONTENTS:" + "\n" + contents);
    String strippedContents = parseString(contents);
    System.out.println("\n\nSTRIPPED CONTENTS:" + "\n" + strippedContents);
  }

  // Downloads the document at the given URL and returns it as one String.
  private String downloadURL(String theURL) {
    StringBuilder sb = new StringBuilder();
    // try-with-resources closes the reader (and the stream under it) when the block ends.
    // BufferedReader.readLine() replaces the deprecated DataInputStream.readLine();
    // InputStreamReader uses the platform default charset here.
    try (BufferedReader reader = new BufferedReader(
             new InputStreamReader(new URL(theURL).openStream()))) {
      String s;
      while ((s = reader.readLine()) != null) {
        sb.append(s).append("\n");
      }
    } catch (MalformedURLException mue) {
      System.out.println("Ouch - a MalformedURLException happened.");
      mue.printStackTrace();
      System.exit(1);
    } catch (IOException ioe) {
      System.out.println("Oops - an IOException happened.");
      ioe.printStackTrace();
      System.exit(1);
    }
    return sb.toString();
  }

  // Reduces raw HTML to plain alphanumeric text in three regex passes.
  public String parseString(String s) {
    String output = null;
    Matcher matcher = null;

    // 1) replace every whitespace character (newlines, tabs) with a single space
    Pattern replaceWhitespacePattern = Pattern.compile("\\s");
    matcher = replaceWhitespacePattern.matcher(s);
    output = matcher.replaceAll(" ");

    // 2) remove HTML tags: a '<', anything that is not '>', then '>'
    Pattern removeHTMLTagsPattern = Pattern.compile("<[^>]*>");
    matcher = removeHTMLTagsPattern.matcher(output);
    output = matcher.replaceAll("");

    // 3) remove everything that is not a digit, an ASCII letter, or a space
    Pattern leaveOnlyAlphaNumericCharactersPattern = Pattern.compile("[^0-9a-zA-Z ]");
    matcher = leaveOnlyAlphaNumericCharactersPattern.matcher(output);
    output = matcher.replaceAll("");

    return output;
  }
}
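To see what the parsing step does without touching the network, here is a minimal sketch that runs the same three regex passes on an inline string. The class name `ParseStringDemo`, the method name `strip`, and the sample input are made up for this illustration, and it assumes the tag-removal pattern is meant to be `<[^>]*>`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class (not part of the program above): applies the same
// three regex passes as parseString to a String, so no download is needed.
final class ParseStringDemo {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");
    private static final Pattern HTML_TAGS = Pattern.compile("<[^>]*>"); // assumed tag pattern
    private static final Pattern NON_ALPHANUMERIC = Pattern.compile("[^0-9a-zA-Z ]");

    static String strip(String html) {
        // 1) each whitespace character becomes a single space
        String output = WHITESPACE.matcher(html).replaceAll(" ");
        // 2) tags are deleted outright (no space is put in their place)
        output = HTML_TAGS.matcher(output).replaceAll("");
        // 3) anything that is not a digit, an ASCII letter, or a space is deleted
        output = NON_ALPHANUMERIC.matcher(output).replaceAll("");
        return output;
    }

    public static void main(String[] args) {
        String sample = "<p>Hello,\n<b>World</b>!</p>";
        System.out.println("[" + strip(sample) + "]"); // prints [Hello World]
    }
}
```

Two side effects of this approach are worth knowing about. Because tags are deleted rather than replaced with a space, `<p>one</p><p>two</p>` comes out as `onetwo`. And because HTML entities are not decoded before the last pass, `a &amp; b` comes out as `a amp b`. That may be fine for rough text matching, but it is something to keep in mind.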
Java URL download and parse example - Summary
I hope this Java URL download/parse example has been helpful.