Note: This is an excerpt from the Scala Cookbook (partially re-worded and re-formatted for the internet). This is Recipe 10.4, “Understanding the performance of Scala collections.”
Problem
When choosing a collection for an application where performance is extremely important, you want to choose the right Scala collection for the algorithm.
Solution
In many cases, you can reason about the performance of a collection by understanding its basic structure. For instance, a List
is a singly-linked list. It’s not indexed, so if you need to access the one-millionth element of a List
as list(1000000)
, that will be slower than accessing the one-millionth element of an Array
, because the Array
is indexed, whereas accessing the element in the List
requires traversing 999,999 elements before getting to the last one. In other cases, it may help to look at the tables. For instance, Table 10-13 shows that the “append” operation on a Vector
is eC
, “effectively constant time.” As a result, I know I can create a large Vector
in the REPL very quickly like this:
var v = Vector[Int]() for (i <- 1 to 50000) v = v :+ i
However, as the table shows, the “append” operation on a List
requires linear time, so attempting to create a List
of the same size takes a much (much!) longer time.
With permission from EFPL, the tables in this recipe have been reproduced from this scala-lang.org URL.
Before looking at the performance tables, Table 10-12 shows the performance characteristic keys that are used in the other tables that follow.
Performance characteristics keys
Table 10-12. Performance characteristic keys for the subsequent tables
Key | Description |
---|---|
C | The operation takes (fast) constant time. |
eC | The operation takes effectively constant time, but this might depend on some assumptions, such as maximum length of a vector, or distribution of hash keys. |
aC | The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed, on average only constant time per operation is taken. |
Log | The operation takes time proportional to the logarithm of the collection size. |
L | The operation is linear, so the time is proportional to the collection size. |
- | The operation is not supported. |
Performance characteristics for operations on immutable and mutable sequential collections
Table 10-13 shows the performance characteristics for operations on immutable and mutable sequential collections.
Table 10-13. Performance characteristics for sequential collections
head | tail | apply | update | prepend | append | insert | |
---|---|---|---|---|---|---|---|
Immutable |
|||||||
List | C | C | L | L | C | L | - |
Stream | C | C | L | L | C | L | - |
Vector | eC | eC | eC | eC | eC | eC | - |
Stack | C | C | L | L | C | C | L |
Queue | aC | aC | L | L | L | C | - |
Range | C | C | C | - | - | - | - |
String | C | L | C | L | L | L | - |
Mutable |
|||||||
ArrayBuffer | C | L | C | C | L | aC | L |
ListBuffer | C | L | L | L | C | C | L |
StringBuilder | C | L | C | X | L | aC | L |
MutableList | C | L | L | L | C | C | L |
Queue | C | L | L | L | C | C | L |
ArraySeq | C | L | C | X | - | - | - |
Stack | C | L | L | L | C | L | L |
ArrayStack | C | L | C | C | aC | L | L |
Array | C | L | C | C | - | - | - |
Table 10-14 describes the column headings used in Table 10-13.
Table 10-14. Descriptions of the column headings for Table 10-13
Operation | Description |
---|---|
head | Selecting the first element of the sequence. |
tail | Producing a new sequence that consists of all elements of the sequence except the first one. |
apply | Indexing. |
update | Functional update for immutable sequences, side-effecting update (with update) for mutable sequences. |
prepend | Adding an element to the front of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence. |
append | Adding an element at the end of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence. |
insert | Inserting an element at an arbitrary position in the sequence. This is supported directly only for mutable sequences. |
Map and set performance characteristics
Table 10-15 shows the performance characteristics for maps and sets.
Table 10-15. The performance characteristics for maps and sets
lookup | add | remove | min | |
---|---|---|---|---|
Immutable |
||||
HashSet/HashMap | eC | eC | eC | L |
TreeSet/TreeMap | Log | Log | Log | Log |
BitSet | C | L | L | eC1 |
ListMap | L | L | L | L |
Mutable |
||||
HashSet/HashMap | eC | eC | eC | L |
WeakHashMap | eC | eC | eC | L |
BitSet | C | aC | C | eC1 |
TreeSet | Log | Log | Log | Log |
Table 10-16 provides descriptions for the column headings used in Table 10-15.
Table 10-16. Descriptions of the column headings used in Table 10-15
Operation | Description |
---|---|
lookup | Testing whether an element is contained in a set, or selecting a value associated with a map key. |
add | Adding a new element to a set or key/value pair to a map. |
remove | Removing an element from a set or a key from a map. |
min | The smallest element of the set, or the smallest key of a map. |
this post is sponsored by my books: | |||
#1 New Release |
FP Best Seller |
Learn Scala 3 |
Learn FP Fast |
See Also
- The tables in this recipe have been reproduced from the following URL, with permission from the Programming Methods Laboratory of EFPL: The Programming Methods Laboratory of EFPL
- Although it may be a little dated now, Li Haoyi’s blog has some benchmarks on Scala collections