How to find regex patterns in Scala strings

This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 1.7, “Finding Patterns in Scala Strings.”

Problem

You need to determine whether a Scala String contains a match for a regular expression pattern.

Solution

Create a Regex object by invoking the .r method on a String, and then use that pattern with findFirstIn when you’re looking for one match, and findAllIn when looking for all matches.

To demonstrate this, first create a Regex for the pattern you want to search for, in this case a sequence of one or more numeric characters:

scala> val numPattern = "[0-9]+".r
numPattern: scala.util.matching.Regex = [0-9]+

Next, create a sample String you can search:

scala> val address = "123 Main Street Suite 101"
address: String = 123 Main Street Suite 101

The findFirstIn method finds the first match:

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)

(Notice that this method returns an Option[String]. I’ll dig into that in the Discussion.)
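If all you need is the yes/no answer posed in the Problem, one possible approach is to test whether findFirstIn produced a value. The following is only a sketch: the ContainsPattern object and the containsMatch helper are hypothetical names invented for this illustration, not part of the Scala library.

```scala
import scala.util.matching.Regex

object ContainsPattern {
  val numPattern: Regex = "[0-9]+".r

  // Hypothetical helper (not from the Cookbook): true when `pattern`
  // matches anywhere inside `s`, false otherwise.
  def containsMatch(pattern: Regex, s: String): Boolean =
    pattern.findFirstIn(s).isDefined
}
```

With these definitions, containsMatch(numPattern, "123 Main Street Suite 101") is true, while the same call on "No address given" is false.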

When looking for multiple matches, use the findAllIn method:

scala> val matches = numPattern.findAllIn(address)
matches: scala.util.matching.Regex.MatchIterator = non-empty iterator

As you can see, findAllIn returns an iterator, which lets you loop over the results:

scala> matches.foreach(println)
123
101

If findAllIn doesn’t find any matches, it returns an empty iterator, so the same code still works. You don’t need to check whether the result is null.
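To make the empty-iterator behavior concrete, here is a small sketch. The EmptyIteratorDemo object and its two helper methods are hypothetical names used only for illustration. Keep in mind that an iterator can be traversed only once, so each helper calls findAllIn again rather than reusing an earlier result.

```scala
object EmptyIteratorDemo {
  val numPattern = "[0-9]+".r

  // Hypothetical helper: counts the matches by consuming the iterator.
  // An input with no digits produces an empty iterator, so the count is 0.
  def countMatches(s: String): Int = numPattern.findAllIn(s).size

  // foreach on an empty iterator runs its body zero times, so no null
  // check or emptiness check is needed before looping.
  def printMatches(s: String): Unit = numPattern.findAllIn(s).foreach(println)
}
```

Calling printMatches("No address given") prints nothing and does not throw an exception.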

If you’d rather have the results as an Array, add the toArray method after the findAllIn call:

scala> val matches = numPattern.findAllIn(address).toArray
matches: Array[String] = Array(123, 101)

If there are no matches, this approach yields an empty Array. Other methods like toList, toSeq, and toVector are also available.
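As a rough illustration of those other conversions, the sketch below collects the same matches into a List and a Vector, and also converts each match to an Int before collecting. The ConvertMatchesDemo object and its field names are made up for this example.

```scala
object ConvertMatchesDemo {
  val numPattern = "[0-9]+".r
  val address = "123 Main Street Suite 101"

  // findAllIn is called once per conversion because each iterator
  // can only be consumed a single time.
  val asList: List[String] = numPattern.findAllIn(address).toList
  val asVector: Vector[String] = numPattern.findAllIn(address).toVector

  // Every match consists only of digits, so toInt succeeds for these short
  // values (a very long run of digits could still overflow an Int).
  val asInts: List[Int] = numPattern.findAllIn(address).map(_.toInt).toList
}
```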

Discussion

Using the .r method on a String is the easiest way to create a Regex object. Another approach is to import the Regex class, create a Regex instance, and then use the instance in the same way:

scala> import scala.util.matching.Regex
import scala.util.matching.Regex

scala> val numPattern = new Regex("[0-9]+")
numPattern: scala.util.matching.Regex = [0-9]+

scala> val address = "123 Main Street Suite 101"
address: String = 123 Main Street Suite 101

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)

Although this is a bit more work, it’s also more obvious. I’ve found that it can be easy to overlook the .r at the end of a String (and then spend a few minutes wondering how the code I saw could possibly work).
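As a quick check that the two styles are interchangeable, the following sketch builds the same pattern both ways and compares them. The RegexCreationDemo object and its field names are hypothetical. Because Regex does not define value-based equality, the sketch compares the underlying pattern strings and the search results instead of the Regex instances themselves.

```scala
import scala.util.matching.Regex

object RegexCreationDemo {
  val viaR: Regex = "[0-9]+".r               // created with the .r method
  val viaNew: Regex = new Regex("[0-9]+")    // created with the constructor

  val address = "123 Main Street Suite 101"

  // Both instances wrap the same regular expression string...
  val samePattern: Boolean = viaR.regex == viaNew.regex

  // ...and produce the same result when searching the same input.
  val sameResult: Boolean =
    viaR.findFirstIn(address) == viaNew.findFirstIn(address)
}
```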

Handling the Option returned by findFirstIn

As mentioned in the Solution, the findFirstIn method finds the first match in the String and returns an Option[String]:

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)

The Option/Some/None pattern is discussed in detail in Recipe 20.6, but the simple way to think about an Option is that it’s a container that holds either zero or one value. In the case of findFirstIn, if it succeeds, it returns the string “123” wrapped in a Some, displayed as Some(123) in this example. However, if it fails to find the pattern in the string it’s searching, it returns None, as shown here:

scala> val address = "No address given"
address: String = No address given

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = None

To summarize, a method defined to return an Option[String] returns either a Some(String) or a None.

The normal way to work with an Option is to use one of these approaches:

  • Use the Option in a match expression
  • Use the Option in a foreach loop
  • Call getOrElse on the value

Recipe 20.6 describes those approaches in detail, but they’re demonstrated here for your convenience.

A match expression provides a very readable solution to the problem, and is generally the preferred solution, especially by functional programmers, who routinely take advantage of pattern-matching:

match1 match {
    case Some(s) => println(s"Found: $s")
    case None => // no match was found, so there is nothing to print
}
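When you need a value rather than a side effect, the same kind of match expression can serve as the body of a method. This is a minimal sketch under that assumption; the MatchDemo object and the describe method are hypothetical names, not from the Cookbook.

```scala
object MatchDemo {
  val numPattern = "[0-9]+".r

  // Hypothetical helper: turns the Option returned by findFirstIn
  // into a descriptive String, handling both the Some and None cases.
  def describe(address: String): String =
    numPattern.findFirstIn(address) match {
      case Some(s) => s"Found: $s"
      case None    => "No match"
    }
}
```

Because both cases return a String, the compiler can verify that the method always produces a result.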

Because an Option can be treated as a collection holding zero or one element, an experienced Scala developer will often use a foreach loop in this situation:

numPattern.findFirstIn(address).foreach { e =>
    // perform the next step in your algorithm,
    // operating on the value 'e'
}
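To see that the body of such a foreach runs once for a Some and not at all for a None, consider the sketch below. The ForeachDemo object and the firstMatchAsList helper are invented for this illustration; the mutable buffer is there only to record whether the body ran.

```scala
import scala.collection.mutable.ListBuffer

object ForeachDemo {
  val numPattern = "[0-9]+".r

  // Hypothetical helper: records each value that foreach passes to its body.
  // The result has one element when a match exists and is empty otherwise.
  def firstMatchAsList(address: String): List[String] = {
    val seen = ListBuffer.empty[String]
    numPattern.findFirstIn(address).foreach { e =>
      seen += e   // runs at most once, only when a match was found
    }
    seen.toList
  }
}
```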

With the getOrElse approach you attempt to “get” the result, while also specifying a default value to use if no match is found. Here address again holds the original "123 Main Street Suite 101" string:

scala> val result = numPattern.findFirstIn(address).getOrElse("no match")
result: String = 123
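The default value only comes into play when there is no match. The sketch below shows both outcomes; the GetOrElseDemo object and the firstNumberOr helper are hypothetical names chosen for this example.

```scala
object GetOrElseDemo {
  val numPattern = "[0-9]+".r

  // Hypothetical helper: returns the first run of digits in `address`,
  // or `default` when the pattern does not match anywhere.
  def firstNumberOr(address: String, default: String): String =
    numPattern.findFirstIn(address).getOrElse(default)
}
```

For instance, firstNumberOr("No address given", "no match") returns the default string, since that input contains no digits.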

See Recipe 20.6 for more information.

Summary

To summarize this approach, the following REPL example shows the complete process of creating a Regex, searching a String with findFirstIn, and then using a foreach loop on the resulting match:

scala> val numPattern = "[0-9]+".r
numPattern: scala.util.matching.Regex = [0-9]+

scala> val address = "123 Main Street Suite 101"
address: String = 123 Main Street Suite 101

scala> val match1 = numPattern.findFirstIn(address)
match1: Option[String] = Some(123)

scala> match1.foreach { e =>
     |   println(s"Found a match: $e")
     | }
Found a match: 123

See Also