Note: This is an excerpt from the Scala Cookbook (partially re-worded and re-formatted for the internet). This is Recipe 10.4, “Understanding the performance of Scala collections.”
When choosing a collection for an application where performance is extremely important, you want to choose the right Scala collection for the algorithm.
In many cases, you can reason about the performance of a collection by understanding its basic structure. For instance, a
List is a singly-linked list. It’s not indexed, so if you need to access the one-millionth element of a
list(1000000), that will be slower than accessing the one-millionth element of an
Array, because the
Array is indexed, whereas accessing the element in the
List requires traversing 999,999 elements before getting to the last one. In other cases, it may help to look at the tables. For instance, Table 10-13 shows that the “append” operation on a
eC, “effectively constant time.” As a result, I know I can create a large
Vector in the REPL very quickly like this:
var v = Vector[Int]() for (i <- 1 to 50000) v = v :+ i
However, as the table shows, the “append” operation on a
List requires linear time, so attempting to create a
List of the same size takes a much (much!) longer time.
Before looking at the performance tables, Table 10-12 shows the performance characteristic keys that are used in the other tables that follow.
Performance characteristics keys
Table 10-12. Performance characteristic keys for the subsequent tables
|C||The operation takes (fast) constant time.|
|eC||The operation takes effectively constant time, but this might depend on some assumptions, such as maximum length of a vector, or distribution of hash keys.|
|aC||The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed, on average only constant time per operation is taken.|
|Log||The operation takes time proportional to the logarithm of the collection size.|
|L||The operation is linear, so the time is proportional to the collection size.|
|-||The operation is not supported.|
Performance characteristics for operations on immutable and mutable sequential collections
Table 10-13 shows the performance characteristics for operations on immutable and mutable sequential collections.
Table 10-13. Performance characteristics for sequential collections
Table 10-14 describes the column headings used in Table 10-13.
Table 10-14. Descriptions of the column headings for Table 10-13
|head||Selecting the first element of the sequence.|
|tail||Producing a new sequence that consists of all elements of the sequence except the first one.|
|update||Functional update for immutable sequences, side-effecting update (with update) for mutable sequences.|
|prepend||Adding an element to the front of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence.|
|append||Adding an element at the end of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence.|
|insert||Inserting an element at an arbitrary position in the sequence. This is supported directly only for mutable sequences.|
Map and set performance characteristics
Table 10-15 shows the performance characteristics for maps and sets.
Table 10-15. The performance characteristics for maps and sets
Table 10-16 provides descriptions for the column headings used in Table 10-15.
Table 10-16. Descriptions of the column headings used in Table 10-15
|lookup||Testing whether an element is contained in a set, or selecting a value associated with a map key.|
|add||Adding a new element to a set or key/value pair to a map.|
|remove||Removing an element from a set or a key from a map.|
|min||The smallest element of the set, or the smallest key of a map.|