How to split strings in Scala

This is an excerpt from the Scala Cookbook (partially modified for the internet): Recipe 1.3, “How to Split Strings in Scala.”

Problem

You want to split a Scala string into parts based on a field separator, such as a string you get from a CSV or pipe-delimited file.

Solution

Use one of the split methods that are available on String objects:

scala> "hello world".split(" ")
res0: Array[java.lang.String] = Array(hello, world)

The split method returns an array of String elements, which you can then treat as a normal Scala Array:

scala> "hello world".split(" ").foreach(println)
hello
world

Discussion

The string you pass to the split method is treated as a regular expression. For a simple separator, such as the comma in a CSV file, you can pass the character itself:

scala> val s = "eggs, milk, butter, Coco Puffs"
s: java.lang.String = eggs, milk, butter, Coco Puffs

// 1st attempt
scala> s.split(",")
res0: Array[java.lang.String] = Array(eggs, " milk", " butter", " Coco Puffs")

As the output shows, every field after the first keeps its leading space, so it’s best to trim each string. Use the map method to call trim on each element of the resulting array:

// 2nd attempt, cleaned up
scala> s.split(",").map(_.trim)
res1: Array[java.lang.String] = Array(eggs, milk, butter, Coco Puffs)
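
One detail worth checking when you split CSV-style lines is how empty fields are handled. The sketch below uses a hypothetical helper named parseFields (it is not part of the original recipe). It relies on the two-argument split(regex, limit) method that Scala strings get from java.lang.String, where a negative limit keeps trailing empty strings instead of dropping them:

```scala
object CsvFields {
  // Hypothetical helper: split on commas and trim each field.
  // A limit of -1 tells java.lang.String.split to keep trailing empty fields.
  def parseFields(line: String): Array[String] =
    line.split(",", -1).map(_.trim)

  def main(args: Array[String]): Unit = {
    val s = "eggs, milk, butter, Coco Puffs"
    println(parseFields(s).mkString("[", "|", "]"))

    // The one-argument split drops trailing empty fields
    println("a,b,,".split(",").length)    // 2
    println(parseFields("a,b,,").length)  // 4
  }
}
```

Whether you want those trailing empty fields depends on your data; for fixed-width records, keeping them usually makes column positions easier to reason about.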

Because the argument is a regular expression, you can also split on patterns. This example splits a string on runs of whitespace characters:

scala> "hello world, this is Al".split("\\s+")
res0: Array[java.lang.String] = Array(hello, world,, this, is, Al)
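
Notice that splitting on whitespace alone leaves the comma attached to “world,” in that result. If that isn’t what you want, one option is a character class that treats both commas and whitespace as separators. This is only a minimal sketch, and the object and method names are made up for illustration:

```scala
object SplitOnCommaOrSpace {
  // One or more commas or whitespace characters act as a single separator
  def words(s: String): Array[String] = s.split("[,\\s]+")

  def main(args: Array[String]): Unit =
    words("hello world, this is Al").foreach(println)
}
```

Keep in mind that if the input begins with a separator, the first element of the resulting array is an empty string.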

About that split method ...

The split method is overloaded, with some versions of the method coming from the Java String class and some coming from the Scala StringLike class (StringOps in Scala 2.13 and later). For instance, if you call split with a Char argument instead of a String argument, you’re using the split method that Scala adds:

// split with a String argument
scala> "hello world".split(" ")
res0: Array[java.lang.String] = Array(hello, world)

// split with a Char argument
scala> "hello world".split(' ')
res1: Array[String] = Array(hello, world)

The subtle difference in that output (Array[java.lang.String] versus Array[String]) is a hint that something is different, but as a practical matter this isn’t important, because both names refer to the same type. (If you’re interested in these details, you can learn more about them with the “code assist” dialogs in Scala IDEs like Eclipse and IntelliJ IDEA.)
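
The choice between a Char and a String argument matters more when the separator is a regular expression metacharacter, such as the pipe in a pipe-delimited file. As a rough sketch (the object and value names here are illustrative, not from the original recipe), the String version needs the pipe escaped, while the Char version treats it literally, and both results can be assigned to an Array[String]:

```scala
object PipeSplit {
  val line = "eggs|milk|butter"

  // The String argument is a regex, so the pipe must be escaped
  val viaString: Array[String] = line.split("\\|")

  // The Char argument is taken literally, so no escaping is needed
  val viaChar: Array[String] = line.split('|')

  def main(args: Array[String]): Unit = {
    println(viaString.mkString(", "))
    println(viaChar.mkString(", "))
  }
}
```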