Understanding the performance of Scala collections classes

Note: This is an excerpt from the Scala Cookbook (partially re-worded and re-formatted for the internet). This is Recipe 10.4, “Understanding the performance of Scala collections.”

Problem

When choosing a collection for an application where performance is extremely important, you want to choose the right Scala collection for the algorithm.

Solution

In many cases, you can reason about the performance of a collection by understanding its basic structure. For instance, a List is a singly-linked list. It’s not indexed, so if you need to access the one-millionth element of a List as list(1000000), that will be slower than accessing the one-millionth element of an Array, because the Array is indexed, whereas accessing the element in the List requires traversing 999,999 elements before getting to the last one. In other cases, it may help to look at the tables. For instance, Table 10-13 shows that the “append” operation on a Vector is eC, “effectively constant time.” As a result, I know I can create a large Vector in the REPL very quickly like this:

var v = Vector[Int]()
for (i <- 1 to 50000) v = v :+ i

However, as the table shows, the “append” operation on a List requires linear time, so attempting to create a List of the same size takes a much (much!) longer time.

With permission from EFPL, the tables in this recipe have been reproduced from this scala-lang.org URL.

Before looking at the performance tables, Table 10-12 shows the performance characteristic keys that are used in the other tables that follow.

Performance characteristics keys

Table 10-12. Performance characteristic keys for the subsequent tables

Key Description
C The operation takes (fast) constant time.
eC The operation takes effectively constant time, but this might depend on some assumptions, such as maximum length of a vector, or distribution of hash keys.
aC The operation takes amortized constant time. Some invocations of the operation might take longer, but if many operations are performed, on average only constant time per operation is taken.
Log The operation takes time proportional to the logarithm of the collection size.
L The operation is linear, so the time is proportional to the collection size.
- The operation is not supported.

Performance characteristics for operations on immutable and mutable sequential collections

Table 10-13 shows the performance characteristics for operations on immutable and mutable sequential collections.

Table 10-13. Performance characteristics for sequential collections

  head tail apply update prepend append insert

Immutable
List C C L L C L -
Stream C C L L C L -
Vector eC eC eC eC eC eC -
Stack C C L L C C L
Queue aC aC L L L C -
Range C C C - - - -
String C L C L L L -

Mutable
ArrayBuffer C L C C L aC L
ListBuffer C L L L C C L
StringBuilder C L C X L aC L
MutableList C L L L C C L
Queue C L L L C C L
ArraySeq C L C X - - -
Stack C L L L C L L
ArrayStack C L C C aC L L
Array C L C C - - -

Table 10-14 describes the column headings used in Table 10-13.

Table 10-14. Descriptions of the column headings for Table 10-13

Operation Description
head Selecting the first element of the sequence.
tail Producing a new sequence that consists of all elements of the sequence except the first one.
apply Indexing.
update Functional update for immutable sequences, side-effecting update (with update) for mutable sequences.
prepend Adding an element to the front of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence.
append Adding an element at the end of the sequence. For immutable sequences, this produces a new sequence. For mutable sequences, it modifies the existing sequence.
insert Inserting an element at an arbitrary position in the sequence. This is supported directly only for mutable sequences.

Map and set performance characteristics

Table 10-15 shows the performance characteristics for maps and sets.

Table 10-15. The performance characteristics for maps and sets

  lookup add remove min

Immutable
HashSet/HashMap eC eC eC L
TreeSet/TreeMap Log Log Log Log
BitSet C L L eC1
ListMap L L L L

Mutable
HashSet/HashMap eC eC eC L
WeakHashMap eC eC eC L
BitSet C aC C eC1
TreeSet Log Log Log Log

Table 10-16 provides descriptions for the column headings used in Table 10-15.

Table 10-16. Descriptions of the column headings used in Table 10-15

Operation Description
lookup Testing whether an element is contained in a set, or selecting a value associated with a map key.
add Adding a new element to a set or key/value pair to a map.
remove Removing an element from a set or a key from a map.
min The smallest element of the set, or the smallest key of a map.

See Also