How to use multiple regex patterns with replaceAll (Java String class)

Java FAQ: How can I use multiple regular expression patterns with the replaceAll method in the Java String class?

Here’s a little example that shows how to replace many regular expression patterns with one replacement string in Scala and Java. I’ll show all of this code in Scala’s interactive interpreter environment, but in this case Scala is very similar to Java, so the initial solution can easily be converted to Java.


1) A simple string

First, I create a simple String:

scala> val s = "My dog ate all of the cheese; why, I don't know."
s: String = My dog ate all of the cheese; why, I don't know.

2) Replace multiple patterns in that string

In my case I want to remove periods, commas, semicolons, and apostrophes from a string, so I use the String class replaceAll method with one regex pattern that removes all of those characters in a single method call:

scala> val result = s.replaceAll("[\\.$|,|;|']", "")
result: String = My dog ate all of the cheese why I dont know

Because the replacement string is "", every character the pattern matches is removed from the resulting string.
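Since this is a Java FAQ, here’s a minimal sketch of the same call in plain Java. The class and method names (ReplaceAllDemo, clean) are made up for illustration. This sketch uses the shorter character class [.,;'], which removes the same four characters from this sentence; inside square brackets a period is already literal, so it doesn’t need a backslash.

```java
public class ReplaceAllDemo {
    // Removes every period, comma, semicolon, and apostrophe in one replaceAll call.
    // Inside [...] the period is a literal character, so no escaping is needed.
    static String clean(String s) {
        return s.replaceAll("[.,;']", "");
    }

    public static void main(String[] args) {
        String s = "My dog ate all of the cheese; why, I don't know.";
        // replaceAll returns a new String; s itself is not modified.
        System.out.println(clean(s)); // My dog ate all of the cheese why I dont know
    }
}
```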


3) More explanation

In the last few days I’ve been working on parsing sentences into words, so the first thing I do is use the String split method to break a String into an array of strings, which are essentially the words in the sentence:

scala> val words = s.split(" ")
words: Array[String] = Array(My, dog, ate, all, of, the, cheese;, why,, I, don't, know.)

Next, what I want to do is remove all of these characters from the strings in that array:

. , ; '

In Scala I do this with the map method that is available on an Array:

scala> val cleanWords = words.map(_.replaceAll("[\\.$|,|;|']", ""))
cleanWords: Array[String] = Array(My, dog, ate, all, of, the, cheese, why, I, dont, know)
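Java arrays don’t have a map method, so one way to write the same split-then-clean step in Java is with a stream. This is a minimal sketch assuming Java 8 or newer; CleanWords and cleanWords are hypothetical names, and it uses the simpler character class [.,;'] rather than the bracketed pattern above.

```java
import java.util.Arrays;

public class CleanWords {
    // Split the sentence on single spaces, then strip . , ; ' from each word.
    static String[] cleanWords(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .map(word -> word.replaceAll("[.,;']", ""))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "My dog ate all of the cheese; why, I don't know.";
        System.out.println(Arrays.toString(cleanWords(s)));
        // [My, dog, ate, all, of, the, cheese, why, I, dont, know]
    }
}
```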

Multiple search patterns

As that example shows, this is my search pattern:

"[\\.$|,|;|']"

It is intended to find the following regex patterns in the given string:

  • A period at the end of a word
  • Commas
  • Semi-colons
  • Single apostrophes

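That’s not quite what the pattern does, though. Square brackets create a character class, which matches exactly one character from a set, and inside a character class the pipe and dollar sign are ordinary characters, not alternation and an end-of-input anchor. So this pattern really matches any single one of these characters, anywhere in the string:

. $ | , ; '

That’s why it works on my sentence, but it removes every period (not only trailing ones), and it would also delete any $ or | characters. When each pattern is a single character, the simpler class [.,;'] does the same job. To combine real regex patterns, leave out the brackets and separate the patterns with pipe characters, which is regex alternation:

pattern1|pattern2|pattern3|pattern4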

where:

pattern1  =  \\.$
pattern2  =  ,
pattern3  =  ;
pattern4  =  '
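To see the difference between the bracketed pattern and plain alternation, here’s a small Java sketch that runs both on a made-up test string containing a mid-word period, dollar signs, and a pipe. The class and constant names are hypothetical. Note that without brackets, $ matches at the end of the input, so \\.$ removes only a period at the very end of the string (or at the end of each word if you apply it after split, as in the map example).

```java
public class CharClassVsAlternation {
    // Brackets make a character class: it matches ONE character from the set
    // . $ | , ; '   (inside [...], '$' and '|' are literal characters).
    static final String BRACKETED = "[\\.$|,|;|']";

    // No brackets: '|' is alternation, so these are four separate patterns,
    // and '$' anchors the escaped period to the end of the input.
    static final String ALTERNATION = "\\.$|,|;|'";

    public static void main(String[] args) {
        String input = "v2.0 costs $5|$6; ok.";
        System.out.println(input.replaceAll(BRACKETED, ""));   // v20 costs 56 ok
        System.out.println(input.replaceAll(ALTERNATION, "")); // v2.0 costs $5|$6 ok
    }
}
```

The bracketed version strips the version number’s period, both dollar signs, and the pipe, while the alternation version only removes the semicolon and the final period.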

Summary

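In summary, if you wanted to see how to use multiple regex patterns with one call to the replaceAll method, I hope this is helpful. The general solution is to join your patterns with pipe symbols, without square brackets, like this:

pattern1|pattern2|pattern3|pattern4

If every pattern is a single character, a character class such as [.,;'] is simpler, but keep in mind that inside brackets | and $ are literal characters rather than operators.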
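If you have a list of patterns, one way to build the combined regex is a small helper that joins them with pipes (no brackets, so the pipes act as alternation). This is a sketch under my own assumptions: combine is a hypothetical helper, not part of the JDK, and List.of needs Java 9 or newer. It also compiles the Pattern once, because String.replaceAll compiles its regex again on every call, which adds up when you clean many words.

```java
import java.util.List;
import java.util.regex.Pattern;

public class CombinePatterns {
    // Hypothetical helper: joins several regexes with '|' (alternation, no brackets)
    // and compiles the result once so it can be reused.
    static Pattern combine(List<String> patterns) {
        return Pattern.compile(String.join("|", patterns));
    }

    public static void main(String[] args) {
        Pattern junk = combine(List.of("\\.$", ",", ";", "'"));
        System.out.println(junk.pattern()); // \.$|,|;|'

        String s = "My dog ate all of the cheese; why, I don't know.";
        // Reuse the compiled Pattern instead of calling s.replaceAll(...) repeatedly.
        System.out.println(junk.matcher(s).replaceAll("")); // My dog ate all of the cheese why I dont know
    }
}
```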

As for the actual regular expressions, I’ll leave that up to you. ;)
