How to choose a collection class in Scala

Note: This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 10.2.

Problem

Scala FAQ: How do I choose a Scala collection class to solve a particular problem?

Solution

To being with, there are three main categories of collection classes to choose from:

  • Sequence
  • Map
  • Set

As its name implies, a sequence is a sequential collection of elements and may be indexed or linear (a linked list). A map contains a collection of key/value pairs, like a Java Map, Ruby Hash, or Python dictionary. A set is a collection that contains no duplicate elements.

In addition to these three main categories, there are other useful collection types, including Stack, Queue, and Range. There are a few other classes that act like collections, including tuples, enumerations, and the Option/Some/None and Try/Success/Failure classes.

Choosing a sequence

When choosing a sequence — a sequential collection of elements — you have two main decisions:

  • Should the sequence be indexed (like an array), allowing rapid access to any elements, or should it be implemented as a linked list?
  • Do you want a mutable or immutable collection?

As of Scala 2.10, the recommended, general-purpose, “go to” sequential collections for the combinations of mutable/immutable and indexed/linear are shown in Table 10-1.

Table 10-1. Scala’s general-purpose sequential collections

  Immutable Mutable
Indexed Vector ArrayBuffer
Linear (Linked lists) List ListBuffer

As an example of how to read that table, if you want an immutable, indexed collection, in general you should use a Vector; if you want a mutable, indexed collection, use an ArrayBuffer (and so on).

While those are the general-purpose recommendations, there are many more sequence alternatives. The most common immutable sequence choices are shown in Table 10-2.

Table 10-2. Main immutable sequence choices

  IndexedSeq LinearSeq Description
List   A singly linked list. Suited for recursive algorithms that work by splitting the head from the remainder of the list.
Queue   A first-in, first-out data structure.
Range   A range of integer values.
Stack   A last-in, first-out data structure.
Stream   Similar to List, but it’s lazy and persistent. Good for a large or infinite sequence, similar to a Haskell List.
String   Can be treated as an immutable, indexed sequence of characters.
Vector   The “go to” immutable, indexed sequence. The Scaladoc describes it as, “Implemented as a set of nested arrays that’s efficient at splitting and joining.”

The most common mutable sequence choices are shown in Table 10-3. Queue and Stack are also in this table because there are mutable and immutable versions of these classes.

Table 10-3. Main mutable sequence choices

  IndexedSeq LinearSeq Description
Array   Backed by a Java array, its elements are mutable, but it can’t change in size.
ArrayBuffer   The “go to” class for a mutable, sequential collection. The amortized cost for appending elements is constant.
ArrayStack   A last-in, first-out data structure. Prefer over Stack when performance is important.
DoubleLinkedList   Like a singly linked list, but with a prev method as well. The documentation states, “the additional links make element removal very fast.”
LinkedList   A mutable, singly linked list.
ListBuffer   Like an ArrayBuffer, but backed by a list. The documentation states, “If you plan to convert the buffer to a list, use ListBuffer instead of ArrayBuffer.” Offers constant-time prepend and append; most other operations are linear.
MutableList   A mutable, singly linked list with constant-time append.
Queue   A first-in, first-out data structure.
Stack   A last-in, first-out data structure. (The documentation suggests that an ArrayStack is slightly more efficient.)
StringBuilder   Used to build strings, as in a loop. Like the Java StringBuilder.

In addition to the information shown in these tables, performance can be a consideration. See Recipe 10.4, “Understanding the Performance of Collections”, if performance is important to your selection process.

When creating an API for a library, you may want to refer to your sequences in terms of their superclasses. Table 10-4 shows the traits that are often used when referring generically to a collection in an API.

Table 10-4. Traits commonly used in library APIs

Trait Description
IndexedSeq Implies that random access of elements is efficient.
LinearSeq Implies that linear access to elements is efficient.
Seq Used when it isn’t important to indicate that the sequence is indexed or linear in nature.

Of course if the collection you’re returning can be even more generic, you can also refer to the collections as Iterable or Traversable.

If you’re coming to Scala from Java, this is the rough equivalent of declaring that a Java method returns List or Collection.

You can also learn more about declaring the type a method returns by looking at the “code assist” tool in your IDE. For instance, when I create a new Vector in Eclipse and then look at the methods available on a Vector instance, I see that the methods return types like GenSeqLike, IndexedSeqLike, IterableLike, TraversableLike, and TraversableOnce. You don’t have to be this specific with the types your methods return -- certainly not initially -- but it’s usually a good practice to identify the intent of what you’re returning, so you can declare these more general types once you get used to them.

Choosing a map

Choosing a map class is easier than choosing a sequence. There are the base mutable and immutable Map classes, a SortedMap trait to keep elements in sorted order by key, a LinkedHashMap to store elements in insertion order, and a few other maps for special purposes. These options are shown in Table 10-5. (Quotes in the descriptions come from the Scaladoc for each class.)

Table 10-5. Common map choices, including whether immutable or mutable versions are available

  Immutable Mutable Description
HashMap The immutable version “implements maps using a hash trie”; the mutable version “implements maps using a hashtable.”
LinkedHashMap   “Implements mutable maps using a hashtable.” Returns elements by the order in which they were inserted.
ListMap A map implemented using a list data structure. Returns elements in the opposite order by which they were inserted, as though each element is inserted at the head of the map.
Map The base map, with both mutable and immutable implementations.
SortedMap   A base trait that stores its keys in sorted order. (Creating a variable as a SortedMap currently returns a TreeMap.)
TreeMap   An immutable, sorted map, implemented as a red-black tree.
WeakHashMap   A hash map with weak references, it’s a wrapper around java.util.WeakHashMap.

You can also create a thread-safe mutable map by mixing the SynchronizedMap trait into the map implementation you want. See the map discussion in the “Scala Collections Overview” for more information.

Choosing a set

Choosing a set is similar to choosing a map. There are base mutable and immutable set classes, a SortedSet to return elements in sorted order by key, a LinkedHashSet to store elements in insertion order, and a few other sets for special purposes. The common classes are shown in Table 10-6. (Quotes in the descriptions come from the Scaladoc for each class.)

Table 10-6. Common set choices, including whether immutable or mutable versions are available

  Immutable Mutable Description
BitSet A set of “non-negative integers represented as variable-size arrays of bits packed into 64-bit words.” Used to save memory when you have a set of integers.
HashSet The immutable version “implements sets using a hash trie”; the mutable version “implements sets using a hashtable.”
LinkedHashSet   A mutable set implemented using a hashtable. Returns elements in the order in which they were inserted.
ListSet   A set implemented using a list structure.
TreeSet The immutable version “implements immutable sets using a tree.” The mutable version is a mutable SortedSet with “an immutable AVL Tree as underlying data structure.”
Set Generic base traits, with both mutable and immutable implementations.
SortedSet A base trait. (Creating a variable as a SortedSet returns a TreeSet.)

You can also create a thread-safe mutable set by mixing the SynchronizedSet trait into the set implementation you want. See the “Scala Collections Overview” discussion of maps and sets for more information.

Types that act like collections

Scala offers many other collection types, and some types that act like collections.

Table 10-7 provides descriptions of several types that act somewhat like collections, even though they aren’t.

Table 10-7. Other collections classes (and types that act like collections)

Class Description
Enumeration A finite collection of constant values (i.e., the days in a week or months in a year).
Iterator An iterator isn’t a collection; instead, it gives you a way to access the elements in a collection. It does, however, define many of the methods you’ll see in a normal collection class, including foreach, map, flatMap, etc. You can also convert an iterator to a collection when needed.
Option Acts as a collection that contains zero or one elements. The Some class and None object extend Option.
Some, None Some is a container for one element, and None holds zero elements.
Tuple Supports a heterogeneous collection of elements. There is no one “Tuple” class; tuples are implemented as case classes ranging from Tuple1 to Tuple22, which support 1 to 22 elements.

Transformer methods

To understand strict and lazy collections, it helps to first understand the concept of a transformer method:

A transformer method is a method that constructs a new collection from an existing collection, typically by applying an algorithm to the input collection.

This includes methods like map, filter, reverse, etc. -- any method that transforms the input collection to a new output collection.

Strict and lazy collections

Given that definition, collections can also be thought of in terms of being strict or lazy. In a strict collection, memory for the elements is allocated immediately, and all of its elements are immediately evaluated when a transformer method is invoked. In a lazy collection, memory for the elements is not allocated immediately, and transformer methods do not construct new elements until they are demanded.

All of the collection classes except Stream are strict, but the other collection classes can be converted to a lazy collection by creating a view on the collection. See Recipe 10.24, “Creating a Lazy View on a Collection”, for more information on this approach.

See Also

  • In addition to my own experience using the collections, most of the information used to create these tables comes from the Scaladoc of each type, and the Scala Collections Overview documentation.
  • Recipe 10.1, “Understanding the Collections Hierarchy”
  • Recipe 10.4, “Understanding the Performance of Collections”