Java URL example - A Java class to download and parse URL contents

Java URL download FAQ: Can you share some source code for a Java URL example, specifically a Java class to download and parse the contents of a URL?

This example is a little weak, but it shows a program that downloads the contents of a given URL and then parses them. The URL handling isn't really the point; what I care about is the parsing, which I'm going to reuse in an anti-spam program I'm working on.

Java URL download and parse HTML source example

Here then is my Java URL download and parse source code:

//JavaUrlExample.java

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.io.*;
import java.net.MalformedURLException;
import java.net.URL;

public class JavaUrlExample
{
  public static void main(String[] args)
  {
    new JavaUrlExample();
  }

  public JavaUrlExample()
  {
    String contents = downloadURL("http://www.devdaily.com/");  // a sample URL
    System.out.println("RAW CONTENTS:" + "\n" + contents);

    String strippedContents = parseString(contents);
    System.out.println("\n\nSTRIPPED CONTENTS:" + "\n" + strippedContents);

  }

  private String downloadURL(String theURL)
  {
    StringBuilder sb = new StringBuilder();

    // try-with-resources closes the reader (and the underlying stream) for us.
    // This replaces the old finally block, which threw a NullPointerException
    // whenever the URL was malformed and openStream() was never reached.
    // BufferedReader replaces the deprecated DataInputStream.readLine(), which
    // did not decode bytes into characters correctly. The page is assumed to be UTF-8.
    try (BufferedReader reader = new BufferedReader(
           new InputStreamReader(new URL(theURL).openStream(), "UTF-8")))
    {
      String line;
      while ((line = reader.readLine()) != null)
      {
        sb.append(line).append('\n');
      }
    }
    catch (MalformedURLException mue)
    {
      System.out.println("Ouch - a MalformedURLException happened.");
      mue.printStackTrace();
      System.exit(1);
    }
    catch (IOException ioe)
    {
      System.out.println("Oops - an IOException happened.");
      ioe.printStackTrace();
      System.exit(1);
    }
    return sb.toString();
  }

  public String parseString(String s)
  {
    // 1. Turn every whitespace character (newline, tab, etc.) into a plain space.
    Pattern replaceWhitespacePattern = Pattern.compile("\\s");
    Matcher matcher = replaceWhitespacePattern.matcher(s);
    String output = matcher.replaceAll(" ");

    // 2. Remove HTML tags: a '<', then any run of characters other than '>', then '>'.
    Pattern removeHTMLTagsPattern = Pattern.compile("<[^>]*>");
    matcher = removeHTMLTagsPattern.matcher(output);
    output = matcher.replaceAll("");

    // 3. Keep only ASCII letters, digits, and spaces.
    Pattern leaveOnlyAlphaNumericCharactersPattern = Pattern.compile("[^0-9a-zA-Z ]");
    matcher = leaveOnlyAlphaNumericCharactersPattern.matcher(output);
    output = matcher.replaceAll("");

    return output;
  }
}
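If you're on Java 9 or newer, the same two steps can be written more compactly. The sketch below is my own condensed variant, not a drop-in replacement: the class name UrlTextSketch is hypothetical, it assumes the downloaded page is UTF-8, it reads the whole response with InputStream.readAllBytes(), and it runs the same three regular-expression passes as parseString() using String.replaceAll().

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical condensed variant of JavaUrlExample (requires Java 9+).
public class UrlTextSketch
{
  // Reads everything behind the URL into one String.
  // Assumes the content is UTF-8 encoded.
  public static String download(URL url) throws IOException
  {
    try (InputStream in = url.openStream())
    {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  // The same three passes as parseString():
  // whitespace -> space, drop HTML tags, keep only letters, digits and spaces.
  public static String strip(String html)
  {
    return html.replaceAll("\\s", " ")
               .replaceAll("<[^>]*>", "")
               .replaceAll("[^0-9a-zA-Z ]", "");
  }

  public static void main(String[] args) throws IOException
  {
    if (args.length > 0)
    {
      // Download and strip the URL given on the command line.
      System.out.println(strip(download(new URL(args[0]))));
    }
    else
    {
      // No URL given: strip a small inline sample instead.
      String sample = "<p>Buy <b>cheap</b>\nmeds!</p>";
      System.out.println(strip(sample));
    }
  }
}
```

Run with no arguments, it prints `Buy cheap meds` for the inline sample; pass a URL (for example `java UrlTextSketch http://www.devdaily.com/`) to download and strip a real page. Because it works with any java.net.URL, you can also point it at a local `file:` URL while testing your spam filter offline.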

Java URL download and parse example - Summary

I hope this Java URL download/parse example has been helpful.
