How to use multiple regex patterns with replaceAll (Java String class)

Java FAQ: How can I use multiple regular expression patterns with the replaceAll method in the Java String class?

Here’s a little example that shows how to replace many regular expression (regex) patterns with one replacement string in Scala and Java. I’ll show all of this code in Scala’s interactive interpreter environment, but in this case Scala is very similar to Java, so the initial solution can easily be converted to Java.

1) A simple string

First, I create a simple String:

scala> val s = "My dog ate all of the cheese; why, I don't know."
s: String = My dog ate all of the cheese; why, I don't know.

2) Replace multiple patterns in that string

In my case I want to remove all periods, commas, semi-colons, and apostrophes from a string, so I use the String class replaceAll method with my regex pattern to remove all of those characters with one method call:

scala> val result = s.replaceAll("[\\.$|,|;|']", "")
result: String = My dog ate all of the cheese why I dont know

Because the replacement string is "" (an empty string), all of those characters are removed from the resulting string.
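Since the question is about Java, here is a minimal sketch of the same call in plain Java. The class name RemovePunctuation and the helper method strip are illustrative names of my own, not part of any library; the pattern is the one from the Scala session above, unchanged.

```java
class RemovePunctuation {
    // Hypothetical helper: removes every character matched by the article's pattern.
    static String strip(String input) {
        return input.replaceAll("[\\.$|,|;|']", "");
    }

    public static void main(String[] args) {
        String s = "My dog ate all of the cheese; why, I don't know.";
        System.out.println(strip(s));
        // prints: My dog ate all of the cheese why I dont know
    }
}
```

Keep in mind that String is immutable in Java, so replaceAll returns a new string and leaves the original untouched.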

3) More explanation

In the last few days I’ve been working on parsing sentences into words, so the first thing I do is use the string split method to convert a String into an array of strings, which are essentially words in a sentence:

scala> val words = s.split(" ")
words: Array[String] = Array(My, dog, ate, all, of, the, cheese;, why,, I, don't, know.)

Next, what I want to do is remove all of these characters from the strings in that array:

. , ; '

In Scala I do this with the map method that is available on an Array:

scala> val cleanWords = words.map(_.replaceAll("[\\.$|,|;|']", ""))
cleanWords: Array[String] = Array(My, dog, ate, all, of, the, cheese, why, I, dont, know)
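For comparison, the split-then-map step can be sketched in Java with a stream. This is one possible translation rather than the only one; the class name CleanWords and the method clean are names I made up for the example.

```java
import java.util.Arrays;

class CleanWords {
    // Hypothetical helper: splits on single spaces, then strips the
    // punctuation characters from each word with the article's pattern.
    static String[] clean(String sentence) {
        return Arrays.stream(sentence.split(" "))
                     .map(word -> word.replaceAll("[\\.$|,|;|']", ""))
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String s = "My dog ate all of the cheese; why, I don't know.";
        System.out.println(Arrays.toString(clean(s)));
        // prints: [My, dog, ate, all, of, the, cheese, why, I, dont, know]
    }
}
```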

Multiple search patterns

As that example shows, this is my search pattern:

"[\\.$|,|;|']"

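The square brackets create a character class, which matches any single character listed inside them. With this pattern, replaceAll removes every occurrence of these characters:

  • Periods
  • Commas
  • Semi-colons
  • Apostrophes

Be aware that the pipe and dollar sign have no special meaning inside brackets. They are treated as literal characters, so this pattern also removes any | or $ it finds, and it removes periods everywhere, not only at the end of a word. For my sample sentence, a class that lists just the four characters gives the same result:

"[.,;']"

To combine patterns that are longer than one character, or that rely on an anchor like $, leave the brackets off and separate the patterns with pipe characters. It’s often hard to read regex patterns, so it may help to look at that multiple search pattern regex more generally, like this:

pattern1|pattern2|pattern3|pattern4

where:

pattern1  =  \\.$
pattern2  =  ,
pattern3  =  ;
pattern4  =  '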
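To see what the brackets and pipes actually do in Java's regex engine, here is a small check that runs both forms against the same input. Inside brackets the | and $ are literal characters, while outside brackets | means "or" and $ anchors to the end of the input. The class name, method names, and test string are all made up for this sketch.

```java
class ClassVersusAlternation {
    // Character class: matches any ONE of the listed characters,
    // including the literal '$' and '|'.
    static String withCharClass(String input) {
        return input.replaceAll("[\\.$|,|;|']", "");
    }

    // Alternation: matches a period at the end of the input,
    // OR a comma, OR a semi-colon, OR an apostrophe.
    static String withAlternation(String input) {
        return input.replaceAll("\\.$|,|;|'", "");
    }

    public static void main(String[] args) {
        String s = "a|b costs $5.50, ok.";
        System.out.println(withCharClass(s));   // prints: ab costs 550 ok
        System.out.println(withAlternation(s)); // prints: a|b costs $5.50 ok
    }
}
```

The alternation form keeps the pipe, the dollar sign, and the period inside 5.50, and only drops the final period and the comma.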

Summary

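In summary, if you wanted to see how to match multiple regex patterns with one call to the replaceAll method, I hope this is helpful. To match any one of several single characters, list them inside brackets, like [.,;']. To match any one of several longer patterns, separate them with pipe symbols and leave the brackets off, because inside brackets the pipe is just another literal character:

pattern1|pattern2|pattern3|pattern4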
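If you keep your patterns in a list, one way to build the combined regex is to join them with pipes and compile the result once. This is only a sketch under my own assumptions: MultiPattern, anyOf, and removeAll are hypothetical names, and the non-capturing group (?: ... ) is there just to keep the joined alternatives together if you later add more to the expression.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MultiPattern {
    // Hypothetical helper: joins the sub-patterns with '|' inside a non-capturing group.
    static Pattern anyOf(List<String> patterns) {
        return Pattern.compile("(?:" + String.join("|", patterns) + ")");
    }

    // Hypothetical helper: removes every match of any of the given patterns.
    static String removeAll(String input, List<String> patterns) {
        return anyOf(patterns).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        List<String> patterns = Arrays.asList("\\.$", ",", ";", "'");
        String s = "My dog ate all of the cheese; why, I don't know.";
        System.out.println(removeAll(s, patterns));
        // prints: My dog ate all of the cheese why I dont know
    }
}
```

The sub-patterns are inserted as-is, so each one has to be a valid regex on its own; literal text would need to be escaped first, for example with Pattern.quote.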

As for the actual regular expressions, I’ll leave that up to you. ;)