Java FAQ: How can I use multiple regular expression patterns with the replaceAll method in the Java String class?
Here’s a little example that shows how to replace many regular expression (regex) patterns with one replacement string in Scala and Java. I’ll show all of this code in Scala’s interactive interpreter environment, but in this case Scala is very similar to Java, so the initial solution can easily be converted to Java.
1) A simple string
First, I create a simple String:
scala> val s = "My dog ate all of the cheese; why, I don't know."
s: String = My dog ate all of the cheese; why, I don't know.
2) Replace multiple patterns in that string
In my case I want to remove all trailing periods, commas, semi-colons, and apostrophes from a string, so I use the String class replaceAll method with my regex pattern to remove all of those characters with one method call:
scala> val result = s.replaceAll("[\\.$|,|;|']", "")
result: String = My dog ate all of the cheese why I dont know
Because the replacement string is "", all of those characters are removed in the resulting string.
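For readers who want the plain Java form, here is a minimal sketch of the same call. The class name RemovePunctuation and the method name stripPunctuation are names I made up for this example, and I've written the pattern as the simpler character class [.,;'], which gives the same result on this string.

```java
public class RemovePunctuation {

    // Removes every period, comma, semi-colon, and apostrophe from the input.
    // The brackets form a character class: any one of the listed characters matches.
    static String stripPunctuation(String s) {
        return s.replaceAll("[.,;']", "");
    }

    public static void main(String[] args) {
        String s = "My dog ate all of the cheese; why, I don't know.";
        System.out.println(stripPunctuation(s));
        // prints: My dog ate all of the cheese why I dont know
    }
}
```

Note that replaceAll returns a new String; the original string is left unchanged.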
3) More explanation
In the last few days I’ve been working on parsing sentences into words, so the first thing I do is use the string split method to convert a String into an array of strings, which are essentially words in a sentence:
scala> val words = s.split(" ")
words: Array[String] = Array(My, dog, ate, all, of, the, cheese;, why,, I, don't, know.)
Next, what I want to do is remove all of these characters from the strings in that array:
. , ; '
In Scala I do this with the map method that is available on an Array:
scala> val cleanWords = words.map(_.replaceAll("[\\.$|,|;|']", ""))
cleanWords: Array[String] = Array(My, dog, ate, all, of, the, cheese, why, I, dont, know)
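The split-then-map steps above can be sketched in Java with a stream. This is only one way to do it, assuming Java 8 or newer; the class name CleanWords and the method name cleanWords are hypothetical, and the pattern is again written as the plain character class [.,;'].

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CleanWords {

    // Splits a sentence on single spaces, then strips punctuation from each word.
    static List<String> cleanWords(String sentence) {
        return Arrays.stream(sentence.split(" "))
                .map(word -> word.replaceAll("[.,;']", ""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "My dog ate all of the cheese; why, I don't know.";
        System.out.println(cleanWords(s));
        // prints: [My, dog, ate, all, of, the, cheese, why, I, dont, know]
    }
}
```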
Multiple search patterns
As that example shows, this is my search pattern:
"[\\.$|,|;|']"
It is intended to find the following regex patterns in the given string:
- A period at the end of a word
- Commas
- Semi-colons
- Single apostrophes
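The brackets are what make this work, but not quite in the way the pipe characters suggest. Brackets define a character class, which matches any single character listed inside them, and inside brackets the pipe and dollar-sign characters are treated as literal characters. As a result this pattern removes every period, not only one at the end of a word, and it would also remove a literal | or $ if the string contained one. On my example string the simpler class "[.,;']" gives the same result.
To combine whole patterns, including an anchored one like a trailing period, put the pipe characters inside parentheses instead. It’s often hard to read regex patterns, so it may help to look at that multiple search pattern form more generally, like this:
(pattern1|pattern2|pattern3|pattern4)
where:
pattern1 = \\.$
pattern2 = ,
pattern3 = ;
pattern4 = '
Written out, that is "(\\.$|,|;|')", and in that form the first alternative matches a period only at the end of the string.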
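How brackets and a parenthesized group differ is easy to check with a short Java sketch. The class and method names below are hypothetical, and the sample string is one I chose because it has a dollar sign and a period in the middle of the text.

```java
public class BracketsVersusGroup {

    // Character class: matches any single '.', '$', '|', ',', ';' or apostrophe.
    // Inside brackets, '$' and '|' are literal characters, not an anchor or an "or".
    static String withBrackets(String s) {
        return s.replaceAll("[\\.$|,|;|']", "");
    }

    // Alternation inside a non-capturing group: here '$' anchors the period
    // to the end of the input, and '|' separates the alternatives.
    static String withGroup(String s) {
        return s.replaceAll("(?:\\.$|,|;|')", "");
    }

    public static void main(String[] args) {
        String s = "Total: $3.50, that's all.";
        System.out.println(withBrackets(s)); // prints: Total: 350 thats all
        System.out.println(withGroup(s));    // prints: Total: $3.50 thats all
    }
}
```

The bracket version strips the dollar sign and the decimal point along with the punctuation, while the group version leaves them alone and removes only the final period.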
Summary
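In summary, if you wanted to see how to use multiple regex substitution patterns with one call to the replaceAll method, I hope this is helpful. The general solution depends on what you need to match. To match any one of several single characters, list them inside brackets, like [.,;']. To match any one of several full patterns, separate them with pipe symbols inside parentheses, like this:
(pattern1|pattern2|pattern3|pattern4)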
As for the actual regular expressions, I’ll leave that up to you. ;)