Problem: In a Java program, you need to match a pattern against a multiline String or, in a more advanced case, you want to extract one or more regular expression groups from a multiline String.
Solution: Use the Java Pattern and Matcher classes, and define the regular expression (regex) you want to look for when creating your Pattern instance. Also specify the Pattern.MULTILINE flag when creating your Pattern instance if your regex uses the ^ or $ anchors and they should match at the start and end of each line. As usual with groups, place the parts of the regex you want to capture inside grouping parentheses so you can extract the actual text that matches them from the String.
In the following source code example I demonstrate how to extract the text between the opening and closing HTML code tags from a given multiline String:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * A complete Java program to demonstrate how to extract multiple
 * HTML tags from a String that contains multiple lines. The
 * Pattern is compiled with the Pattern.MULTILINE flag.
 */
public class PatternMatcherGroupHtmlMultiline {

    public static void main(String[] args) {
        String stringToSearch = "<p>Yada yada yada <code>foo</code> yada yada ...\n"
            + "more here <code>bar</code> etc etc\n"
            + "and still more <code>baz</code> and now the end</p>\n";

        // the pattern we want to search for
        Pattern p = Pattern.compile(" <code>(\\w+)</code> ", Pattern.MULTILINE);
        Matcher m = p.matcher(stringToSearch);

        // print all the matches that we find
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
The output from this program is:
foo
bar
baz
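As a variation on the program above, a regex can contain more than one group, and each group is read back by its number. The following is only a sketch under assumptions: the class name MultipleGroupsDemo, the extractTags helper, the sample input, and the use of a \1 backreference to match the closing tag are all mine, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultipleGroupsDemo {

    // Group 1 captures the tag name, group 2 captures the text inside the tag.
    // The \1 backreference requires the closing tag to repeat whatever group 1 matched.
    static final Pattern TAG = Pattern.compile("<(\\w+)>(\\w+)</\\1>");

    // Hypothetical helper: returns one "tag=text" entry per match.
    static List<String> extractTags(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            results.add(m.group(1) + "=" + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        String input =
            "<p>Yada <code>foo</code> and <em>bar</em>\n"
            + "then <code>baz</code> and the end</p>\n";

        for (String entry : extractTags(input)) {
            System.out.println(entry);
        }
    }
}
```

With this input the sketch prints code=foo, em=bar, and code=baz on separate lines. The outer p tag is skipped because its content is not a single run of word characters.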
Discussion
The stringToSearch is created with several newline characters (\n) to simulate the multiple lines you might get when reading a file (or input stream) that contains HTML.
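The example creates its Pattern object with the Pattern.MULTILINE flag. Despite what the name suggests, this flag does not make the Pattern class search across multiple lines; the find method already scans the entire String, newline characters included. What the flag changes is the behavior of the ^ and $ anchors: with Pattern.MULTILINE they match at the start and end of every line, not only at the start and end of the whole input. The pattern in this example uses neither anchor, so it would find the same matches without the flag, but the flag becomes essential as soon as you anchor a regex to line boundaries.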
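A minimal sketch of what the flag changes in practice, namely where the ^ anchor is allowed to match. The class name MultilineFlagDemo, the findAll helper, and the sample inputs are assumptions made for illustration; only Pattern.compile, Pattern.MULTILINE, Matcher.find, and Matcher.group come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineFlagDemo {

    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> findAll(String regex, int flags, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String lines = "alpha one\nbeta two\ngamma three\n";

        // Without the flag, ^ matches only at the start of the whole input.
        System.out.println(findAll("^(\\w+)", 0, lines));

        // With the flag, ^ also matches right after each line terminator.
        System.out.println(findAll("^(\\w+)", Pattern.MULTILINE, lines));

        // A pattern with no anchors finds matches on every line either way.
        String html = "a <code>foo</code>\nb <code>bar</code>\n";
        System.out.println(findAll("<code>(\\w+)</code>", 0, html));
    }
}
```

The three lines printed are [alpha], then [alpha, beta, gamma], then [foo, bar]. Passing 0 as the flags argument means no flags are set.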
Another important part of the solution is to use a while loop with the find method to make sure you find all occurrences of your regex pattern in the input String. If you only use an if statement with the find method, you will only get the first match.
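The difference between the two approaches can be sketched side by side. The class name FindFirstVersusAll and the firstMatch and allMatches helpers are hypothetical names chosen for this illustration, and the pattern is written without the surrounding spaces for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindFirstVersusAll {

    static final Pattern CODE_TAG = Pattern.compile("<code>(\\w+)</code>");

    // An if statement calls find() once, so only the first match is seen.
    static String firstMatch(String input) {
        Matcher m = CODE_TAG.matcher(input);
        if (m.find()) {
            return m.group(1);
        }
        return null; // no match at all
    }

    // A while loop keeps calling find(), which resumes after the previous match.
    static List<String> allMatches(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = CODE_TAG.matcher(input);
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String stringToSearch = "<p>Yada yada yada <code>foo</code> yada yada ...\n"
            + "more here <code>bar</code> etc etc\n"
            + "and still more <code>baz</code> and now the end</p>\n";

        System.out.println(firstMatch(stringToSearch));
        System.out.println(allMatches(stringToSearch));
    }
}
```

Run against the same stringToSearch as the main example, this prints foo for the if version and [foo, bar, baz] for the while version.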