This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 10.24, “How to Create a Lazy View on a Scala Collection”
Problem
You’re working with a large collection and want to create a “lazy” version of it so it will only compute and return results as they are actually needed.
Solution
Except for the Stream class, whenever you create an instance of a Scala collection class, you’re creating a strict version of the collection. This means that if you create a collection that contains one million elements, memory is allocated for all of those elements immediately. This is the way things normally work in a language like Java.
In Scala you can optionally create a view on a collection. A view makes the result non-strict, or lazy. This changes the resulting collection, so when it’s used with a transformer method, the elements will only be calculated as they are accessed, and not “eagerly,” as they normally would be.
A transformer method is a method that transforms an input collection into a new output collection, as described in the Discussion.
You can see the effect of creating a view on a collection by creating one Range without a view, and a second one with a view:

scala> 1 to 100
res0: scala.collection.immutable.Range.Inclusive = Range(1, 2, 3, 4, ... 98, 99, 100)

scala> (1 to 100).view
res0: java.lang.Object with scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] = SeqView(...)
Creating the Range without a view shows what you expect: a Range with 100 elements. However, the Range with the view shows different output in the REPL, showing something called a SeqView. The signature of the SeqView shows:
- Int is the type of the view’s elements.
- The scala.collection.immutable.IndexedSeq[Int] portion of the output indicates the type you’ll get if you force the collection back to a “normal,” strict collection.
You can see this when you force the view back to a normal collection:
scala> val view = (1 to 100).view
view: java.lang.Object with scala.collection.SeqView[Int,scala.collection.immutable.IndexedSeq[Int]] = SeqView(...)

scala> val x = view.force
x: scala.collection.immutable.IndexedSeq[Int] = Vector(1, 2, 3, ... 98, 99, 100)
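As a small sketch of the same round trip, the snippet below builds a view and then materializes it. It uses toVector instead of force, on the assumption that you may be running a newer Scala version where, as far as I know, force is deprecated. The object and value names are made up for illustration.

```scala
// Hypothetical names; a sketch, not the book's own example.
object MaterializeView {
  // Nothing is doubled yet: this is only a lazy description of the work.
  val lazyDoubled = (1 to 5).view.map(_ * 2)

  // Converting to a strict collection runs the mapping for every element.
  val strictDoubled: Vector[Int] = lazyDoubled.toVector

  def main(args: Array[String]): Unit =
    println(strictDoubled) // Vector(2, 4, 6, 8, 10)
}
```

Any of the usual conversion methods (toList, toVector, toArray) should work the same way here; the choice only affects the type of the strict collection you get back.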
There are several ways to see the effect of adding a view to a collection. First, you’ll see that using a method like foreach doesn’t seem to change when using a view:

(1 to 100).foreach(println)
(1 to 100).view.foreach(println)
Both of those expressions will print 100 elements to the console. Because foreach isn’t a transformer method, the result is unaffected.
However, calling a map method with and without a view has dramatically different results:

scala> (1 to 100).map { _ * 2 }
res1: scala.collection.immutable.IndexedSeq[Int] = Vector(2, 4, 6, ... 196, 198, 200)

scala> (1 to 100).view.map { _ * 2 }
res0: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)
These results are different because map is a transformer method. A fun way to further demonstrate this difference is with the following code:

val x = (1 to 1000).view.map { e =>
  Thread.sleep(10)
  e * 2
}
If you run that code as shown, it will return immediately, returning a SeqView as before. But if you remove the view method call, the code block will take about 10 seconds to run.
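If you would rather not wait on Thread.sleep, one way to observe the same behavior is to count how often the mapping function runs. The sketch below swaps the sleep for a counter; the object name, the demo method, and the counter are illustrative assumptions, not part of the original recipe.

```scala
// Hypothetical sketch: counts calls to the mapping function instead of sleeping.
object ViewEvaluationCount {
  // Returns (calls right after building the view,
  //          calls after taking three elements,
  //          the three elements themselves).
  def demo(): (Int, Int, List[Int]) = {
    var calls = 0
    val doubled = (1 to 1000).view.map { e =>
      calls += 1
      e * 2
    }
    // The view has been built, but the function has not run yet.
    val callsAfterBuild = calls

    // Only the elements that are actually demanded get computed.
    val firstThree = doubled.take(3).toList
    (callsAfterBuild, calls, firstThree)
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

The first number in the result stays at zero because creating the view does no work; the second reflects only the elements that take and toList actually pulled through the pipeline.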
Discussion
The Scala documentation states that a view “constructs only a proxy for the result collection, and its elements get constructed only as one demands them ... A view is a special kind of collection that represents some base collection, but implements all transformers lazily.”
A transformer method is a method that constructs a new collection from an existing collection. This includes methods like map, filter, reverse, and many more. When you use these methods, you’re transforming the input collection to a new output collection.
This helps to explain why the foreach method prints the same result for a strict collection and its view: it’s not a transformer method. But the map method, and other transformer methods like reverse, treat the view in a lazy manner:
scala> val l = List(1, 2, 3)
l: List[Int] = List(1, 2, 3)

scala> l.reverse
res0: List[Int] = List(3, 2, 1)

scala> l.view.reverse
res1: scala.collection.SeqView[Int,List[Int]] = SeqViewR(...)
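To make the point that a view changes when the work happens rather than what the result is, here is a minimal sketch that runs the same chain of transformers strictly and through a view. The names are hypothetical, and reverse is deliberately placed before filter on the assumption that you may be using a newer collections library, where a filtered view may no longer offer sequence-only methods like reverse.

```scala
// Hypothetical sketch comparing a strict pipeline with the same pipeline on a view.
object StrictVsLazyPipeline {
  val numbers = List(1, 2, 3, 4, 5, 6)

  // Strict: each transformer builds a complete intermediate List.
  val strictResult: List[Int] =
    numbers.map(_ * 10).reverse.filter(_ > 20)

  // Lazy: the transformers are only recorded; toList triggers the actual work.
  val lazyResult: List[Int] =
    numbers.view.map(_ * 10).reverse.filter(_ > 20).toList

  def main(args: Array[String]): Unit = {
    println(strictResult) // List(60, 50, 40, 30)
    println(lazyResult)   // List(60, 50, 40, 30)
  }
}
```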
At the end of the Solution you saw this block of code:
val x = (1 to 1000).view.map { e =>
  Thread.sleep(10)
  e * 2
}
As mentioned, that code returns a SeqView immediately. But when you go to print the elements in x, like this:
x.foreach(print)
there will be a 10 millisecond pause before each element is printed. The elements are being “demanded” in this line of code, so the penalty of the Thread.sleep method call is paid as each element is yielded.
Use cases
There are two primary use cases for using a view:
- Performance
- To treat a collection like a database view
Regarding performance, assume that you get into a situation where you may (or may not) have to operate on a collection of a billion elements. You certainly want to avoid running an algorithm on a billion elements if you don’t have to, so using a view makes sense here.
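As a rough sketch of the performance case, the helper below looks for the first few matching values in a very large range. Because the pipeline runs on a view, it can stop as soon as enough results are found instead of transforming every element first. The helper name, the divisibility test, and the limit are all invented for illustration.

```scala
// Hypothetical sketch: stop early instead of processing a billion elements.
object FirstMatches {
  // Finds the first `n` squares divisible by 7 among the squares of 1 to `limit`.
  def firstSquaresDivisibleBy7(limit: Int, n: Int): List[Long] =
    (1 to limit).view
      .map(i => i.toLong * i) // Long avoids overflow for large i
      .filter(_ % 7 == 0)
      .take(n)
      .toList                 // forces only as many elements as take needs

  def main(args: Array[String]): Unit =
    println(firstSquaresDivisibleBy7(1000000000, 3)) // List(49, 196, 441)
}
```

Without the view call, the map step alone would try to build a collection of a billion Long values before filter or take ever ran.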
The second use case lets you use a Scala view on a collection just like a database view. The following examples show how a collection view works like a database view:
// create a normal array
scala> val arr = (1 to 10).toArray
arr: Array[Int] = Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

// create a view on the array
scala> val view = arr.view.slice(2, 5)
view: scala.collection.mutable.IndexedSeqView[Int,Array[Int]] = SeqViewS(...)

// modify the array
scala> arr(2) = 42

// the view is affected:
scala> view.foreach(println)
42
4
5

// change the elements in the view
scala> view(0) = 10
scala> view(1) = 20
scala> view(2) = 30

// the array is affected:
scala> arr
res0: Array[Int] = Array(1, 2, 10, 20, 30, 6, 7, 8, 9, 10)
Changing the elements in the array updates the view, and changing the elements referenced by the view changes the elements in the array. When you need to modify a subset of elements in a collection, creating a view on the original collection and modifying the elements in the view can be a powerful way to achieve this goal.
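The read-through half of this behavior can be sketched in a self-contained form as follows. It sticks to reading through the view, on the assumption that you may be using a newer Scala version: the write-through calls such as view(0) = 10 rely on the mutable IndexedSeqView type seen in the REPL output, which, as far as I know, later collections libraries no longer provide. The names are hypothetical.

```scala
// Hypothetical sketch: a sliced view reflects later changes to the underlying array.
object ArrayWindow {
  // Returns the window's contents before and after the array is modified.
  def demo(): (List[Int], List[Int]) = {
    val arr = (1 to 10).toArray
    val window = arr.view.slice(2, 5) // lazy window over indices 2, 3, and 4

    val before = window.toList        // List(3, 4, 5)
    arr(2) = 42                       // modify the underlying array
    val after = window.toList         // List(42, 4, 5)

    (before, after)
  }

  def main(args: Array[String]): Unit =
    println(demo())
}
```

The window holds no copy of the data; each read goes back to the array, which is why the change shows up the next time the view is traversed.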
As a final note, don’t confuse using a view with saving memory when creating a collection. Both of the following approaches will generate a “java.lang.OutOfMemoryError: Java heap space” error in the REPL:
val a = Array.range(0,123456789)
val a = Array.range(0,123456789).view
The performance benefit of a view comes from how the view works with transformer methods, not from how the underlying collection is created.