Note: This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.2.Back to top
Scala FAQ: How do I choose a Scala collection class to solve a particular problem?Back to top
To being with, there are three main categories of collection classes to choose from:
As its name implies, a sequence is a sequential collection of elements and may be indexed or linear (a linked list). A map contains a collection of key/value pairs, like a Java
Hash, or Python dictionary. A
set is a collection that contains no duplicate elements.
In addition to these three main categories, there are other useful collection types, including
Range. There are a few other classes that act like collections, including tuples, enumerations, and the Option/Some/None and Try/Success/Failure classes.
Choosing a sequence
When choosing a sequence — a sequential collection of elements — you have two main decisions:
- Should the sequence be indexed (like an array), allowing rapid access to any elements, or should it be implemented as a linked list?
- Do you want a mutable or immutable collection?
As of Scala 2.10, the recommended, general-purpose, “go to” sequential collections for the combinations of mutable/immutable and indexed/linear are shown in Table 10-1.
Table 10-1. Scala’s general-purpose sequential collections
|Linear (Linked lists)||List||ListBuffer|
As an example of how to read that table, if you want an immutable, indexed collection, in general you should use a
Vector; if you want a mutable, indexed collection, use an
ArrayBuffer (and so on).
While those are the general-purpose recommendations, there are many more sequence alternatives. The most common immutable sequence choices are shown in Table 10-2.
Table 10-2. Main immutable sequence choices
|List||✓||A singly linked list. Suited for recursive algorithms that work by splitting the head from the remainder of the list.|
|Queue||✓||A first-in, first-out data structure.|
|Range||✓||A range of integer values.|
|Stack||✓||A last-in, first-out data structure.|
|Stream||✓||Similar to List, but it’s lazy and persistent. Good for a large or infinite sequence, similar to a Haskell List.|
|String||✓||Can be treated as an immutable, indexed sequence of characters.|
|Vector||✓||The “go to” immutable, indexed sequence. The Scaladoc describes it as, “Implemented as a set of nested arrays that’s efficient at splitting and joining.”|
The most common mutable sequence choices are shown in Table 10-3.
Stack are also in this table because there are mutable and immutable versions of these classes.
Table 10-3. Main mutable sequence choices
|Array||✓||Backed by a Java array, its elements are mutable, but it can’t change in size.|
|ArrayBuffer||✓||The “go to” class for a mutable, sequential collection. The amortized cost for appending elements is constant.|
|ArrayStack||✓||A last-in, first-out data structure. Prefer over Stack when performance is important.|
|DoubleLinkedList||✓||Like a singly linked list, but with a
|LinkedList||✓||A mutable, singly linked list.|
|ListBuffer||✓||Like an ArrayBuffer, but backed by a list. The documentation states, “If you plan to convert the buffer to a list, use
|MutableList||✓||A mutable, singly linked list with constant-time append.|
|Queue||✓||A first-in, first-out data structure.|
|Stack||✓||A last-in, first-out data structure. (The documentation suggests that an ArrayStack is slightly more efficient.)|
|StringBuilder||✓||Used to build strings, as in a loop. Like the Java
In addition to the information shown in these tables, performance can be a consideration. See Recipe 10.4, “Understanding the Performance of Collections”, if performance is important to your selection process.
When creating an API for a library, you may want to refer to your sequences in terms of their superclasses. Table 10-4 shows the traits that are often used when referring generically to a collection in an API.
Table 10-4. Traits commonly used in library APIs
||Implies that random access of elements is efficient.|
||Implies that linear access to elements is efficient.|
||Used when it isn’t important to indicate that the sequence is indexed or linear in nature.|
Of course if the collection you’re returning can be even more generic, you can also refer to the collections as
If you’re coming to Scala from Java, this is the rough equivalent of declaring that a Java method returns
You can also learn more about declaring the type a method returns by looking at the “code assist” tool in your IDE. For instance, when I create a new
Vector in Eclipse and then look at the methods available on a
Vector instance, I see that the methods return types like
TraversableOnce. You don’t have to be this specific with the types your methods return -- certainly not initially -- but it’s usually a good practice to identify the intent of what you’re returning, so you can declare these more general types once you get used to them.
Choosing a map
Choosing a map class is easier than choosing a sequence. There are the base mutable and immutable
Map classes, a
SortedMap trait to keep elements in sorted order by key, a
LinkedHashMap to store elements in insertion order, and a few other maps for special purposes. These options are shown in Table 10-5. (Quotes in the descriptions come from the Scaladoc for each class.)
Table 10-5. Common map choices, including whether immutable or mutable versions are available
||✓||✓||The immutable version “implements maps using a hash trie”; the mutable version “implements maps using a hashtable.”|
||✓||“Implements mutable maps using a hashtable.” Returns elements by the order in which they were inserted.|
||✓||✓||A map implemented using a list data structure. Returns elements in the opposite order by which they were inserted, as though each element is inserted at the head of the map.|
||✓||✓||The base map, with both mutable and immutable implementations.|
||✓||A base trait that stores its keys in sorted order. (Creating a variable as a
||✓||An immutable, sorted map, implemented as a red-black tree.|
||✓||A hash map with weak references, it’s a wrapper around java.util.WeakHashMap.|
You can also create a thread-safe mutable map by mixing the
SynchronizedMap trait into the map implementation you want. See the map discussion in the “Scala Collections Overview” for more information.
Choosing a set
Choosing a set is similar to choosing a map. There are base mutable and immutable set classes, a
SortedSet to return elements in sorted order by key, a
LinkedHashSet to store elements in insertion order, and a few other sets for special purposes. The common classes are shown in Table 10-6. (Quotes in the descriptions come from the Scaladoc for each class.)
Table 10-6. Common set choices, including whether immutable or mutable versions are available
||✓||✓||A set of “non-negative integers represented as variable-size arrays of bits packed into 64-bit words.” Used to save memory when you have a set of integers.|
||✓||✓||The immutable version “implements sets using a hash trie”; the mutable version “implements sets using a hashtable.”|
||✓||A mutable set implemented using a hashtable. Returns elements in the order in which they were inserted.|
||✓||A set implemented using a list structure.|
||✓||✓||The immutable version “implements immutable sets using a tree.” The mutable version is a mutable SortedSet with “an immutable AVL Tree as underlying data structure.”|
||✓||✓||Generic base traits, with both mutable and immutable implementations.|
||✓||✓||A base trait. (Creating a variable as a SortedSet returns a TreeSet.)|
You can also create a thread-safe mutable set by mixing the
SynchronizedSet trait into the set implementation you want. See the “Scala Collections Overview” discussion of maps and sets for more information.
Types that act like collections
Scala offers many other collection types, and some types that act like collections.
Table 10-7 provides descriptions of several types that act somewhat like collections, even though they aren’t.
Table 10-7. Other collections classes (and types that act like collections)
||A finite collection of constant values (i.e., the days in a week or months in a year).|
||An iterator isn’t a collection; instead, it gives you a way to access the elements in a collection. It does, however, define many of the methods you’ll see in a normal collection class, including
||Acts as a collection that contains zero or one elements. The
||Supports a heterogeneous collection of elements. There is no one “Tuple” class; tuples are implemented as case classes ranging from
To understand strict and lazy collections, it helps to first understand the concept of a transformer method:
A transformer method is a method that constructs a new collection from an existing collection, typically by applying an algorithm to the input collection.
This includes methods like
reverse, etc. -- any method that transforms the input collection to a new output collection.
Strict and lazy collections
Given that definition, collections can also be thought of in terms of being strict or lazy. In a strict collection, memory for the elements is allocated immediately, and all of its elements are immediately evaluated when a transformer method is invoked. In a lazy collection, memory for the elements is not allocated immediately, and transformer methods do not construct new elements until they are demanded.
All of the collection classes except
Stream are strict, but the other collection classes can be converted to a lazy collection by creating a view on the collection. See Recipe 10.24, “Creating a Lazy View on a Collection”, for more information on this approach.
- In addition to my own experience using the collections, most of the information used to create these tables comes from the Scaladoc of each type, and the Scala Collections Overview documentation.
- Recipe 10.1, “Understanding the Collections Hierarchy”
- Recipe 10.4, “Understanding the Performance of Collections”
The Scala Cookbook
This tutorial is sponsored by the Scala Cookbook, which I wrote for O’Reilly:
You can find the Scala Cookbook at these locations:
All the best,