Scala “split string” examples (field separator, delimiter)

Scala String FAQ: How do I split a String in Scala based on a field separator, such as a String I get from a comma-separated value (CSV) file or pipe-delimited file?

Solution

Use one of the split methods that are available on Scala/Java String objects. For instance, this example in the Scala REPL shows how to split a string based on a blank space:

scala> "hello world".split(" ")
res0: Array[java.lang.String] = Array(hello, world)

Notice that the split method returns an Array of String elements — Array[String] — which you can then work with as a normal Scala Array, such as printing the array elements in this example:

scala> "hello world".split(" ").foreach(println) 
hello
world

Real-world split-string example

Here’s a real-world example that shows how to split a URL-encoded string you might receive in a web application. Here I split this string by specifying the & character as the field separator or delimiter:

scala> val result = "oauth_token=FOO&oauth_token_secret=BAR&oauth_expires_in=3600"
result: java.lang.String = oauth_token=FOO&oauth_token_secret=BAR&oauth_expires_in=3600

scala> val nameValuePairs = result.split("&")
nameValuePairs: Array[java.lang.String] = Array(oauth_token=FOO, oauth_token_secret=BAR, oauth_expires_in=3600)

In the second statement I call the split method, telling it to use the & character to split my string into multiple strings. As the REPL output shows, the variable nameValuePairs is an Array of String type, and in this case its elements are the name/value pairs I wanted.
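If you want to look up values by name, one possible next step is to split each pair again on the = character and collect the results into a Map. The following is only a minimal sketch: the QueryStringExample object and the parsePairs helper are hypothetical names introduced for illustration, and the code does not URL-decode anything.

```scala
object QueryStringExample {
  // Hypothetical helper (not part of any library): turns "k1=v1&k2=v2" into a Map.
  def parsePairs(s: String): Map[String, String] =
    s.split("&")
      .map(_.split("=", 2))                    // limit 2: split on the first '=' only
      .collect { case Array(k, v) => k -> v }  // skip fragments that have no '='
      .toMap

  def main(args: Array[String]): Unit = {
    val result = "oauth_token=FOO&oauth_token_secret=BAR&oauth_expires_in=3600"
    val params = parsePairs(result)
    println(params("oauth_token"))       // FOO
    println(params("oauth_expires_in"))  // 3600
  }
}
```

The limit argument of 2 keeps a value such as b=c intact when the value itself contains an = character.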

Splitting a CSV string

Here’s an example that shows how to split a CSV string into a string array:

scala> val s = "eggs, milk, butter, Coco Puffs"
s: java.lang.String = eggs, milk, butter, Coco Puffs

// 1st attempt
scala> s.split(",")
res0: Array[java.lang.String] = Array(eggs, " milk", " butter", " Coco Puffs")

Notice that every element after the first still has a leading blank space, so when using this approach it’s best to trim each string. This is shown in the following code, where I use the map method to call trim on each string before returning the array:

// 2nd attempt, cleaned up
scala> s.split(",").map(_.trim)
res1: Array[java.lang.String] = Array(eggs, milk, butter, Coco Puffs)

NOTE: This isn’t a perfect solution because CSV rows can have additional commas inside of quotes. I’m just trying to show how this approach generally works, and how it works for simple CSV files.
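A related wrinkle with simple CSV rows is that the one-argument split drops trailing empty strings, so a row that ends in empty columns comes back shorter than expected. Passing a negative limit as the second argument keeps them. This is a small sketch under the same "no quoted commas" assumption as above; the CsvFieldsExample object and the fields helper are hypothetical names.

```scala
object CsvFieldsExample {
  // Hypothetical helper: splits a simple, unquoted CSV row and trims each field.
  // The -1 limit tells String.split to keep trailing empty fields.
  def fields(row: String): Array[String] =
    row.split(",", -1).map(_.trim)

  def main(args: Array[String]): Unit = {
    val row = "eggs, milk,,"
    println(row.split(",").length)  // 2 (the two trailing empty fields are dropped)
    println(fields(row).length)     // 4
  }
}
```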

Splitting a String using regular expressions

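You can also split a string based on a regular expression (regex). In fact, the String argument to split is always treated as a regex. This example shows how to split a string on runs of one or more whitespace characters; note that the comma stays attached to "world" because only whitespace is treated as the delimiter: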

scala> "hello world, this is Al".split("\\s+")
res0: Array[java.lang.String] = Array(hello, world,, this, is, Al)

For more examples of regular expressions, see the Java Pattern class, or see my common Java regular expression examples.
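Because the String argument is a regex, two patterns come up often with delimited data. The sketch below first splits on a comma together with any surrounding whitespace, which removes the need for a separate trim step, and then splits on a literal pipe character, which has to be quoted because | is a regex metacharacter. The RegexSplitExample object and its value names are just illustrative.

```scala
import java.util.regex.Pattern

object RegexSplitExample {
  // Split on a comma plus any whitespace around it.
  val items: Array[String] = "eggs, milk, butter, Coco Puffs".split("\\s*,\\s*")

  // Pattern.quote turns "|" into a regex that matches a literal pipe.
  val cols: Array[String] = "a|b|c".split(Pattern.quote("|"))

  def main(args: Array[String]): Unit = {
    println(items.mkString("/"))  // eggs/milk/butter/Coco Puffs
    println(cols.mkString(","))   // a,b,c
  }
}
```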

Details: Where the ‘split’ method comes from

The split method is overloaded, with some versions of the method coming from the Java String class and some coming from the Scala StringLike class. For instance, if you call split with a Char argument instead of a String argument, you’re using the split method from StringLike:

// split with a String argument
scala> "hello world".split(" ")
res0: Array[java.lang.String] = Array(hello, world)

// split with a Char argument
scala> "hello world".split(' ')
res1: Array[String] = Array(hello, world)

The subtle difference in that output (Array[java.lang.String] versus Array[String]) is a hint that something is different, but as a practical matter, this isn’t important. Also, with the Scala IDE project integrated into Eclipse, you can see where each method comes from when the Eclipse “code assist” dialog is displayed. (IntelliJ IDEA and NetBeans may show similar information.)
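One place where the choice of overload does show up is with separators that are regex metacharacters. The Char version treats its argument as a literal character, while the String version treats it as a regex. Here is a short sketch of that behavior, along with the Scala overload that accepts an Array[Char] of separators; the CharVsStringSplit object and its value names are made up for this example.

```scala
object CharVsStringSplit {
  // Char argument: the dot is a literal separator.
  val byChar: Array[String] = "a.b.c".split('.')

  // String argument: a regex, so the dot must be escaped to be literal.
  val byRegex: Array[String] = "a.b.c".split("\\.")

  // Unescaped "." matches any character, so every piece is empty
  // and the result is an empty array.
  val unescaped: Array[String] = "a.b.c".split(".")

  // Array[Char] argument: split on any of several literal characters.
  val byChars: Array[String] = "a,b;c".split(Array(',', ';'))

  def main(args: Array[String]): Unit = {
    println(byChar.mkString(" "))   // a b c
    println(byRegex.mkString(" "))  // a b c
    println(unescaped.length)       // 0
    println(byChars.mkString(" "))  // a b c
  }
}
```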

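Note: The actual Scala class that contains the split method may change over time. At the time of this writing the split method is in the StringLike class, but as the Scala libraries are reorganized this may change. (In Scala 2.13 and later, for example, StringLike no longer exists and the Char-based split methods are defined in StringOps.)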