How to create a lazy view on a Scala collection

This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 10.24, “How to Create a Lazy View on a Scala Collection.”

Problem

You’re working with a large collection and want to create a “lazy” version of it so it will only compute and return results as they are actually needed.

Solution

Except for the Stream class, whenever you create an instance of a Scala collection class, you’re creating a strict version of the collection. This means that if you create a collection that contains one million elements, memory is allocated for all of those elements immediately. This is the way things normally work in a language like Java.

In Scala you can optionally create a view on a collection. A view makes the result non-strict, or lazy: when a transformer method is called on the view, the elements are calculated only as they are accessed, and not “eagerly,” as they normally would be.

A transformer method is a method that transforms an input collection into a new output collection, as described in the Discussion.

You can see the effect of creating a view on a collection by creating one Range without a view, and a second one with a view:

scala> 1 to 100
res0: scala.collection.immutable.Range.Inclusive =
      Range(1, 2, 3, 4, ... 98, 99, 100)

scala> (1 to 100).view
res0: java.lang.Object with 
      scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] =
      SeqView(...)

Creating the Range without a view shows what you expect: a Range with 100 elements. However, the Range with the view produces different output in the REPL: something called a SeqView. The signature of the SeqView tells you two things:

  • Int is the type of the view’s elements.
  • The scala.collection.immutable.IndexedSeq[Int] portion of the output indicates the type you’ll get if you force the collection back to a “normal,” strict collection.

You can see this when you force the view back to a normal collection:

scala> val view = (1 to 100).view
view: java.lang.Object with 
      scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] =
      SeqView(...)

scala> val x = view.force
x: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, ... 98, 99, 100)

There are several ways to see the effect of adding a view to a collection. First, you’ll see that using a method like foreach doesn’t seem to change when using a view:

(1 to 100).foreach(println)
(1 to 100).view.foreach(println)

Both of those expressions will print 100 elements to the console. Because foreach isn’t a transformer method, the result is unaffected.

However, calling a map method with and without a view has dramatically different results:

scala> (1 to 100).map { _ * 2 }
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6, ... 196, 198, 200)

scala> (1 to 100).view.map { _ * 2 }
res0: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)

These results are different because map is a transformer method. A fun way to further demonstrate this difference is with the following code:

val x = (1 to 1000).view.map { e =>
    Thread.sleep(10)
    e * 2
}

If you run that code as shown, it will return immediately, returning a SeqView as before. But if you remove the view method call, the code block will take about 10 seconds to run.
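A quicker way to observe the same laziness, without waiting on Thread.sleep, is to count how many times the function passed to map actually runs. The following is a minimal sketch rather than code from the recipe; the object name LazyMapCount and its method names are made up for illustration.

```scala
// Hypothetical helper (not from the recipe): counts calls to the mapping function.
object LazyMapCount {

  // Maps over a view, then asks for only the first `n` results.
  // Returns (number of times the mapping function ran, the results).
  def viaView(n: Int): (Int, List[Int]) = {
    var calls = 0
    val doubled = (1 to 1000).view.map { e => calls += 1; e * 2 }
    val firstN = doubled.take(n).toList // toList is what demands the elements
    (calls, firstN)
  }

  // Same pipeline without the view: map runs over the whole Range up front.
  def viaStrict(n: Int): (Int, List[Int]) = {
    var calls = 0
    val doubled = (1 to 1000).map { e => calls += 1; e * 2 }
    val firstN = doubled.take(n).toList
    (calls, firstN)
  }

  def main(args: Array[String]): Unit = {
    println(viaView(3))   // the mapping function runs only for the demanded elements
    println(viaStrict(3)) // the mapping function runs for all 1,000 elements
  }
}
```

Both methods return the same three values; only the amount of work done to produce them differs.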

Discussion

The Scala documentation states that a view “constructs only a proxy for the result collection, and its elements get constructed only as one demands them ... A view is a special kind of collection that represents some base collection, but implements all transformers lazily.”

A transformer method is a method that constructs a new collection from an existing collection. This includes methods like map, filter, reverse, and many more. When you use these methods, you’re transforming the input collection to a new output collection.

This helps to explain why the foreach method prints the same result for a strict collection and its view: it’s not a transformer method. But the map method, and other transformer methods like reverse, treat the view in a lazy manner:

scala> val l = List(1, 2, 3)
l: List[Int] = List(1, 2, 3)

scala> l.reverse
res0: List[Int] = List(3, 2, 1)

scala> l.view.reverse
res1: scala.collection.SeqView[Int,List[Int]] = SeqViewR(...)

At the end of the Solution you saw this block of code:

val x = (1 to 1000).view.map { e =>
    Thread.sleep(10)
    e * 2
}

As mentioned, that code returns a SeqView immediately. But when you go to print the elements in x, like this:

x.foreach(print)

there will be a 10-millisecond pause before each element is printed. The elements are being “demanded” in this line of code, so the penalty of the Thread.sleep method call is paid as each element is yielded.
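One consequence of paying the cost on demand is that a view does not appear to remember results between traversals, so the cost is paid again each time the elements are demanded. The sketch below illustrates this with a call counter instead of Thread.sleep; the object name ViewRecompute and its methods are hypothetical, and toList stands in here as one way to build a strict collection from the view.

```scala
// Hypothetical helper (not from the recipe): shows that a view re-runs its transformer.
object ViewRecompute {

  // Traverses the same mapped view `times` times and returns how often
  // the mapping function ran in total.
  def callsAfterTraversals(times: Int): Int = {
    var calls = 0
    val x = (1 to 10).view.map { e => calls += 1; e * 2 }
    for (_ <- 1 to times) x.foreach(_ => ()) // each foreach demands every element again
    calls
  }

  // Converts the view to a strict List once, then traverses the List.
  // The mapping function runs only during that single conversion.
  def callsAfterForcingOnce(times: Int): Int = {
    var calls = 0
    val strict = (1 to 10).view.map { e => calls += 1; e * 2 }.toList
    for (_ <- 1 to times) strict.foreach(_ => ())
    calls
  }

  def main(args: Array[String]): Unit = {
    println(callsAfterTraversals(2))  // 10 elements, traversed twice
    println(callsAfterForcingOnce(2)) // 10 elements, computed once
  }
}
```

If the mapped results will be traversed more than once, converting the view to a strict collection first may be the cheaper choice.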

Use cases

There are two primary use cases for using a view:

  • Performance
  • To treat a collection like a database view

Regarding performance, assume that you may (or may not) have to operate on a collection of a billion elements. You certainly want to avoid running an algorithm on all billion elements if you don’t have to. Because a view applies transformer methods only to the elements that are actually accessed, using one makes sense here.
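As a small-scale illustration of the performance case, the sketch below searches a mapped collection for the first match and counts how much work was done with and without a view. It is an assumption-laden example, not code from the recipe: the object name EarlyExit, the collection size, and the search condition are all chosen for illustration.

```scala
// Hypothetical helper (not from the recipe): compares work done before an early exit.
object EarlyExit {

  // With a view, map is applied only until find locates its first match.
  def withView: (Int, Option[Int]) = {
    var calls = 0
    val hit = (1 to 100000).view.map { e => calls += 1; e * 2 }.find(_ > 100)
    (calls, hit)
  }

  // Without a view, map processes the entire Range before find even starts.
  def withoutView: (Int, Option[Int]) = {
    var calls = 0
    val hit = (1 to 100000).map { e => calls += 1; e * 2 }.find(_ > 100)
    (calls, hit)
  }

  def main(args: Array[String]): Unit = {
    println(withView)    // few calls: stops at the first doubled value above 100
    println(withoutView) // 100,000 calls for the same answer
  }
}
```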

The second use case lets you use a Scala view on a collection just like a database view. The following examples show how a collection view works like a database view:

// create a normal array
scala> val arr = (1 to 10).toArray
arr: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// create a view on the array
scala> val view = arr.view.slice(2, 5)
view: scala.collection.mutable.IndexedSeqView[Int,Array[Int]] = SeqViewS(...)

// modify the array
scala> arr(2) = 42

// the view is affected:
scala> view.foreach(println)
42
4
5

// change the elements in the view
scala> view(0) = 10

scala> view(1) = 20

scala> view(2) = 30

// the array is affected:
scala> arr
res0: Array[Int] = Array(1, 2, 10, 20, 30, 6, 7, 8, 9, 10)

Changing the elements in the array updates the view, and changing the elements referenced by the view changes the elements in the array. When you need to modify a subset of elements in a collection, creating a view on the original collection and modifying the elements in the view can be a powerful way to achieve this goal.

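As a final note, don’t confuse using a view with saving memory when creating a collection. If the heap is too small to hold the array, both of the following approaches generate a “java.lang.OutOfMemoryError: Java heap space” error in the REPL, because the array is fully allocated before the view is created: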

val a = Array.range(0,123456789)
val a = Array.range(0,123456789).view

The benefit of using a view with regard to performance comes from how the view works with transformer methods.
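To make that concrete, here is a hedged sketch in which the base collection is cheap to create, so the only potential cost is the transformer. It assumes a Range as the base collection (a Range keeps its start, end, and step rather than one slot per element); the object name LargeRangeView is made up for illustration.

```scala
// Hypothetical helper (not from the recipe): transformer over a large Range via a view.
object LargeRangeView {

  // Uses the same upper bound as the Array.range example, but no array is built:
  // the view applies map only to the `n` elements that take/toList demand.
  def firstDoubled(n: Int): List[Int] =
    (0 until 123456789).view.map(_ * 2).take(n).toList

  def main(args: Array[String]): Unit = {
    println(firstDoubled(3))
  }
}
```

Calling map directly on the Range, without the view, would instead try to build a strict collection holding every doubled element.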

See Also