Problem: In a Java program, you want a way to extract a simple HTML tag from a String, and you don't want to use a more complicated approach.
Solution: Use the Java Pattern and Matcher classes, and supply a regular expression (regex) to the Pattern class that defines the tag you want to extract. Then use the find method of the Matcher class to see if there is a match, and if so, use the group method to extract the actual group of characters from the String that matches your regular expression.
In the following source code I demonstrate how to extract the contents of a code tag from a longer HTML string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A complete Java program that demonstrates how to
 * extract a tag from a line of HTML using the Pattern
 * and Matcher classes.
 */
public class PatternMatcherGroupHtml {

    public static void main(String[] args) {
        String stringToSearch = "<p>Yada yada yada <code>StringBuffer</code> yada yada ...</p>";

        // the pattern we want to search for
        Pattern p = Pattern.compile("<code>(\\S+)</code>");
        Matcher m = p.matcher(stringToSearch);

        // if we find a match, get the group
        if (m.find()) {
            // get the matching group
            String codeGroup = m.group(1);

            // print the group
            System.out.format("'%s'\n", codeGroup);
        }
    }
}
By using a group to extract the contents between the HTML opening and closing code tags, the output from this program is:
'StringBuffer'
Discussion
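In this example, the regex "<code>(\\S+)</code>" lets me extract the text between the opening and closing code tags as a group. Note that \S+ matches only a run of non-whitespace characters, so this pattern finds a match only when the tag contents contain no spaces. I then access this group using this line of code: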
String codeGroup = m.group(1);
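Because \S+ stops at whitespace, it helps to see what happens when the tag contents include a space. The following is a minimal sketch, not part of the original example: the class name CodeTagContents, the helper firstGroup, and the alternative pattern "<code>(.*?)</code>" are assumptions added for illustration. The reluctant quantifier .*? matches as few characters as possible, so it also accepts spaces and stops at the first closing tag.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CodeTagContents {

    // the article's pattern: one or more non-whitespace characters
    static final Pattern NON_WHITESPACE = Pattern.compile("<code>(\\S+)</code>");

    // assumed alternative: as few characters as possible, spaces included
    static final Pattern LAZY = Pattern.compile("<code>(.*?)</code>");

    // returns the first captured group, or null when nothing matches
    static String firstGroup(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<p>Call <code>new StringBuffer()</code> first.</p>";

        // no match, because the tag contents contain a space
        System.out.println(firstGroup(NON_WHITESPACE, html));

        // matches the full tag contents, space included
        System.out.println(firstGroup(LAZY, html));
    }
}
```

Which pattern is appropriate depends on the HTML you expect. Neither one handles nested tags, attributes on the code tag, or contents that span multiple lines.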
Finding all matching groups
It’s important to note that this example is hard-coded to look for only one occurrence of this group. In a more robust example, where you want to find and extract the contents of every code tag, your code would look more like this, using a while loop with the find method:
while (m.find()) {
    String codeGroup = m.group(1);
    System.out.format("'%s'\n", codeGroup);
}
This code repeatedly calls the find method and prints the contents of the matching group until find doesn't locate any more matching patterns in the given String.
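To put the loop in context, here is a minimal sketch of a complete program that collects every match into a List instead of printing it right away. The class name CodeTagFinder, the method findAll, and the sample HTML string are assumptions made for this illustration. The pattern is the same one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class, for illustration only.
class CodeTagFinder {

    // the same pattern used earlier; compiled once and reused
    private static final Pattern CODE_TAG = Pattern.compile("<code>(\\S+)</code>");

    // returns the contents of every code tag found in the given HTML
    static List<String> findAll(String html) {
        List<String> contents = new ArrayList<>();
        Matcher m = CODE_TAG.matcher(html);

        // each call to find() resumes searching after the previous match
        while (m.find()) {
            contents.add(m.group(1));
        }
        return contents;
    }

    public static void main(String[] args) {
        String html = "<p>Use <code>StringBuffer</code> or <code>StringBuilder</code> here.</p>";
        for (String codeGroup : findAll(html)) {
            System.out.format("'%s'%n", codeGroup);
        }
    }
}
```

One caveat with this pattern: \S+ is greedy, so two code tags with no whitespace between them would be captured as a single group. Separate the tags with whitespace, or use a reluctant quantifier, if that matters for your input.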
In summary, if you wanted to see a simple way to extract the contents of an HTML tag from a String in Java using a regular expression, I hope this is helpful.