How to transform Scala collections with for/yield

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.13, “How to Transform One Scala Collection to Another with for/yield.”

Problem

You want to create a new collection from an existing collection by transforming the elements with an algorithm.

Solution

Use the for/yield construct and your algorithm to create the new collection. For instance, starting with a basic collection:

scala> val a = Array(1, 2, 3, 4, 5)
a: Array[Int] = Array(1, 2, 3, 4, 5)

You can create a copy of that collection by just “yielding” each element (with no algorithm):

scala> for (e <- a) yield e
res0: Array[Int] = Array(1, 2, 3, 4, 5)

You can create a new collection where each element is twice the value of the original:

scala> for (e <- a) yield e * 2
res1: Array[Int] = Array(2, 4, 6, 8, 10)

You can determine the modulus of each element:

scala> for (e <- a) yield e % 2
res2: Array[Int] = Array(1, 0, 1, 0, 1)

This example converts a Vector of strings to uppercase:

scala> val fruits = Vector("apple", "banana", "lime", "orange")
fruits: Vector[String] = Vector(apple, banana, lime, orange)

scala> val ucFruits = for (e <- fruits) yield e.toUpperCase
ucFruits: Vector[String] = Vector(APPLE, BANANA, LIME, ORANGE)

Your algorithm can return whatever collection is needed. This approach converts the original collection into a sequence of Tuple2 elements:

scala> for (i <- 0 until fruits.length) yield (i, fruits(i))
res0: scala.collection.immutable.IndexedSeq[(Int, String)] =
      Vector((0,apple), (1,banana), (2,lime), (3,orange))
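As a hedged alternative to indexing by hand, the same pairs can be produced by iterating over the collection itself. This is a minimal sketch that uses the standard zipWithIndex method; the name indexedFruits is only for illustration. Because the generator is now a Vector rather than a Range, the result is a Vector as well:

```scala
val fruits = Vector("apple", "banana", "lime", "orange")

// zipWithIndex pairs each element with its position: (element, index).
// The pattern (f, i) pulls the pair apart so it can be yielded as (index, element).
val indexedFruits: Vector[(Int, String)] =
  for ((f, i) <- fruits.zipWithIndex) yield (i, f)
// indexedFruits: Vector((0,apple), (1,banana), (2,lime), (3,orange))
```

This form also avoids calling fruits(i) for each index, which matters for collections such as List where indexed access is slow.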

This algorithm yields a sequence of Tuple2 elements that contains each original string along with its length:

scala> for (f <- fruits) yield (f, f.length)
res1: Vector[(String, Int)] = Vector((apple,5), (banana,6), (lime,4), (orange,6))

If your algorithm takes multiple lines, include it in a block after the yield:

scala> val x = for (e <- fruits) yield {
     |     // imagine this required multiple lines
     |     val s = e.toUpperCase
     |     s
     | }
x: Vector[String] = Vector(APPLE, BANANA, LIME, ORANGE)

Given a Person class and a collection of friends’ names like this:

case class Person (name: String)
val friends = Vector("Mark", "Regina", "Matt")

a for/yield loop can yield a collection of Person instances:

scala> for (f <- friends) yield Person(f)
res0: Vector[Person] = Vector(Person(Mark), Person(Regina), Person(Matt))

You can include if statements (guards) in a for-comprehension to filter elements:

scala> val x = for (e <- fruits if e.length < 6) yield e.toUpperCase
x: Vector[String] = Vector(APPLE, LIME)

Discussion

This combination of a for loop and yield statement is known as a for comprehension or sequence comprehension. It yields a new collection from an existing collection.

For readability, I’ll refer to “for comprehension” as “for-comprehension.”

If you’re new to using the for/yield construct, it can help to think of it as having a bucket, or temporary holding area, on the side. As each element from the original collection is operated on with yield and your algorithm, the result is added to that bucket. Then, when the for loop has finished iterating over the entire collection, all of the elements in the bucket are returned (yielded) by the expression.
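The bucket idea can be written out by hand. The following is only a sketch of the mental model, not of what the compiler generates (a for/yield expression is actually rewritten into method calls such as map); the function name upperWithBucket is made up for this illustration:

```scala
import scala.collection.mutable.ArrayBuffer

val fruits = Vector("apple", "banana", "lime", "orange")

// The for/yield version
val viaYield = for (e <- fruits) yield e.toUpperCase

// The same work with an explicit "bucket" (illustrative only)
def upperWithBucket(in: Vector[String]): Vector[String] = {
  val bucket = ArrayBuffer.empty[String]      // the temporary holding area
  for (e <- in) bucket += e.toUpperCase       // each transformed element goes into the bucket
  bucket.toVector                             // when the loop ends, hand back the whole bucket
}
// upperWithBucket(fruits) == viaYield
```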

In general, the collection type that’s returned by a for-comprehension will be the same type that you begin with. If you begin with an ArrayBuffer, you’ll end up with an ArrayBuffer:

scala> val fruits = scala.collection.mutable.ArrayBuffer("apple", "banana")
fruits: scala.collection.mutable.ArrayBuffer[java.lang.String] = ArrayBuffer(apple, banana)

scala> val x = for (e <- fruits) yield e.toUpperCase
x: scala.collection.mutable.ArrayBuffer[java.lang.String] = ArrayBuffer(APPLE, BANANA)

A List returns a List:

scala> val fruits = "apple" :: "banana" :: "orange" :: Nil
fruits: List[java.lang.String] = List(apple, banana, orange)

scala> val x = for (e <- fruits) yield e.toUpperCase
x: List[java.lang.String] = List(APPLE, BANANA, ORANGE)

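However, as shown in the Solution, this isn’t always the case: iterating over the Range 0 until fruits.length and yielding tuples returned an IndexedSeq (a Vector), not a Range.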
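A few cases where the result cannot have the same type as the input are sketched below. The variable names are illustrative, and the exact type the REPL prints may differ between Scala versions, so the annotations use general types that I expect to hold across versions:

```scala
// A Range can only describe evenly spaced numbers, so the result is an IndexedSeq
val fromRange: IndexedSeq[Int] = for (i <- 0 until 3) yield i * 2
// fromRange contains 0, 2, 4

// A String can only hold Chars, so yielding Ints produces an IndexedSeq[Int]
val fromString: IndexedSeq[Int] = for (c <- "abc") yield c.toInt
// fromString contains 97, 98, 99

// A Map needs key/value pairs, so yielding plain Strings produces an Iterable[String]
val fromMap: Iterable[String] =
  for ((k, v) <- Map("a" -> 1, "b" -> 2)) yield s"$k$v"
```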

Using guards

When you add guards to a for-comprehension and want to write it as a multiline expression, the recommended coding style is to use curly braces rather than parentheses:

for {
    file <- files
    if hasSoundFileExtension(file)
    if !soundFileIsLong(file)
} yield file

This makes the code more readable, especially when the list of guards becomes long. See Recipe 3.3, “Using a for Loop with Embedded if Statements (Guards)”, for more information on using guards.
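The snippet above doesn’t define files or the two helper methods. To make it runnable, here is a sketch with hypothetical stand-ins: the SoundFile class, the helper implementations, the 300-second threshold, and the sample data are all assumptions made for this example, not part of the original recipe:

```scala
// Hypothetical stand-in for a real file: a name plus a duration in seconds
case class SoundFile(name: String, seconds: Int)

// Hypothetical helpers; the recipe does not show their implementations
def hasSoundFileExtension(f: SoundFile): Boolean =
  f.name.endsWith(".mp3") || f.name.endsWith(".wav")

def soundFileIsLong(f: SoundFile): Boolean = f.seconds > 300

val files = Vector(
  SoundFile("intro.mp3", 30),
  SoundFile("notes.txt", 0),
  SoundFile("lecture.wav", 3600)
)

// An element is yielded only if it passes every guard
val shortSoundFiles = for {
  file <- files
  if hasSoundFileExtension(file)
  if !soundFileIsLong(file)
} yield file
// shortSoundFiles: Vector(SoundFile(intro.mp3,30))
```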

When using guards, the resulting collection can end up being a different size than the input collection:

scala> val cars = Vector("Mercedes", "Porsche", "Tesla")
cars: Vector[String] = Vector(Mercedes, Porsche, Tesla)

scala> for {
     |     c <- cars
     |     if c.startsWith("M")
     | } yield c
res0: Vector[String] = Vector(Mercedes)

In fact, if none of the car names had matched the startsWith test, that code would return an empty Vector.
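As a quick check of that claim, this sketch uses a prefix that matches none of the names (the "Z" prefix and the name noMatch are just for illustration):

```scala
val cars = Vector("Mercedes", "Porsche", "Tesla")

// No car name starts with "Z", so every element is filtered out
val noMatch = for {
  c <- cars
  if c.startsWith("Z")
} yield c
// noMatch is an empty Vector, so noMatch.isEmpty is true
```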

When I first started working with Scala I always used a for/yield expression to do this kind of work, but one day the light bulb went on and I realized that I could achieve the same result more concisely using the map method. The next recipe demonstrates how to use map to create a new collection from an existing collection.
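As a brief preview, here is a sketch of the equivalence using examples from the Solution. The variable names are illustrative. Note that for a guard the compiler uses withFilter rather than filter; filter is shown here because it gives the same result and is the form you would normally write by hand:

```scala
val fruits = Vector("apple", "banana", "lime", "orange")

// A plain for/yield corresponds to map
val viaFor = for (e <- fruits) yield e.toUpperCase
val viaMap = fruits.map(_.toUpperCase)

// A for/yield with a guard corresponds to filtering first, then mapping
val guardedFor = for (e <- fruits if e.length < 6) yield e.toUpperCase
val guardedMap = fruits.filter(_.length < 6).map(_.toUpperCase)
// guardedFor and guardedMap are both Vector(APPLE, LIME)
```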

See Also