Note: This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 10.2.
Problem
Scala FAQ: How do I choose a Scala collection class to solve a particular problem?
Solution
To being with, there are three main categories of collection classes to choose from:
- Sequence
- Map
- Set
As its name implies, a sequence is a sequential collection of elements and may be indexed or linear (a linked list). A map contains a collection of key/value pairs, like a Java Map
, Ruby Hash
, or Python dictionary. A set
is a collection that contains no duplicate elements.
In addition to these three main categories, there are other useful collection types, including Stack
, Queue
, and Range
. There are a few other classes that act like collections, including tuples, enumerations, and the Option/Some/None and Try/Success/Failure classes.
Choosing a sequence
When choosing a sequence — a sequential collection of elements — you have two main decisions:
- Should the sequence be indexed (like an array), allowing rapid access to any elements, or should it be implemented as a linked list?
- Do you want a mutable or immutable collection?
As of Scala 2.10, the recommended, general-purpose, “go to” sequential collections for the combinations of mutable/immutable and indexed/linear are shown in Table 10-1.
Table 10-1. Scala’s general-purpose sequential collections
Immutable | Mutable | |
---|---|---|
Indexed | Vector | ArrayBuffer |
Linear (Linked lists) | List | ListBuffer |
As an example of how to read that table, if you want an immutable, indexed collection, in general you should use a Vector
; if you want a mutable, indexed collection, use an ArrayBuffer
(and so on).
While those are the general-purpose recommendations, there are many more sequence alternatives. The most common immutable sequence choices are shown in Table 10-2.
Table 10-2. Main immutable sequence choices
IndexedSeq | LinearSeq | Description | |
---|---|---|---|
List | ✓ | A singly linked list. Suited for recursive algorithms that work by splitting the head from the remainder of the list. | |
Queue | ✓ | A first-in, first-out data structure. | |
Range | ✓ | A range of integer values. | |
Stack | ✓ | A last-in, first-out data structure. | |
Stream | ✓ | Similar to List, but it’s lazy and persistent. Good for a large or infinite sequence, similar to a Haskell List. | |
String | ✓ | Can be treated as an immutable, indexed sequence of characters. | |
Vector | ✓ | The “go to” immutable, indexed sequence. The Scaladoc describes it as, “Implemented as a set of nested arrays that’s efficient at splitting and joining.” |
The most common mutable sequence choices are shown in Table 10-3. Queue
and Stack
are also in this table because there are mutable and immutable versions of these classes.
Table 10-3. Main mutable sequence choices
IndexedSeq | LinearSeq | Description | |
---|---|---|---|
Array | ✓ | Backed by a Java array, its elements are mutable, but it can’t change in size. | |
ArrayBuffer | ✓ | The “go to” class for a mutable, sequential collection. The amortized cost for appending elements is constant. | |
ArrayStack | ✓ | A last-in, first-out data structure. Prefer over Stack when performance is important. | |
DoubleLinkedList | ✓ | Like a singly linked list, but with a prev method as well. The documentation states, “the additional links make element removal very fast.” |
|
LinkedList | ✓ | A mutable, singly linked list. | |
ListBuffer | ✓ | Like an ArrayBuffer, but backed by a list. The documentation states, “If you plan to convert the buffer to a list, use ListBuffer instead of ArrayBuffer.” Offers constant-time prepend and append; most other operations are linear. |
|
MutableList | ✓ | A mutable, singly linked list with constant-time append. | |
Queue | ✓ | A first-in, first-out data structure. | |
Stack | ✓ | A last-in, first-out data structure. (The documentation suggests that an ArrayStack is slightly more efficient.) | |
StringBuilder | ✓ | Used to build strings, as in a loop. Like the Java StringBuilder . |
In addition to the information shown in these tables, performance can be a consideration. See Recipe 10.4, “Understanding the Performance of Collections”, if performance is important to your selection process.
When creating an API for a library, you may want to refer to your sequences in terms of their superclasses. Table 10-4 shows the traits that are often used when referring generically to a collection in an API.
Table 10-4. Traits commonly used in library APIs
Trait | Description |
---|---|
IndexedSeq |
Implies that random access of elements is efficient. |
LinearSeq |
Implies that linear access to elements is efficient. |
Seq |
Used when it isn’t important to indicate that the sequence is indexed or linear in nature. |
Of course if the collection you’re returning can be even more generic, you can also refer to the collections as Iterable
or Traversable
.
If you’re coming to Scala from Java, this is the rough equivalent of declaring that a Java method returns
List
orCollection
.
You can also learn more about declaring the type a method returns by looking at the “code assist” tool in your IDE. For instance, when I create a new Vector
in Eclipse and then look at the methods available on a Vector
instance, I see that the methods return types like GenSeqLike
, IndexedSeqLike
, IterableLike
, TraversableLike
, and TraversableOnce
. You don’t have to be this specific with the types your methods return -- certainly not initially -- but it’s usually a good practice to identify the intent of what you’re returning, so you can declare these more general types once you get used to them.
Choosing a map
Choosing a map class is easier than choosing a sequence. There are the base mutable and immutable Map
classes, a SortedMap
trait to keep elements in sorted order by key, a LinkedHashMap
to store elements in insertion order, and a few other maps for special purposes. These options are shown in Table 10-5. (Quotes in the descriptions come from the Scaladoc for each class.)
Table 10-5. Common map choices, including whether immutable or mutable versions are available
Immutable | Mutable | Description | |
---|---|---|---|
HashMap |
✓ | ✓ | The immutable version “implements maps using a hash trie”; the mutable version “implements maps using a hashtable.” |
LinkedHashMap |
✓ | “Implements mutable maps using a hashtable.” Returns elements by the order in which they were inserted. | |
ListMap |
✓ | ✓ | A map implemented using a list data structure. Returns elements in the opposite order by which they were inserted, as though each element is inserted at the head of the map. |
Map |
✓ | ✓ | The base map, with both mutable and immutable implementations. |
SortedMap |
✓ | A base trait that stores its keys in sorted order. (Creating a variable as a SortedMap currently returns a TreeMap .) |
|
TreeMap |
✓ | An immutable, sorted map, implemented as a red-black tree. | |
WeakHashMap |
✓ | A hash map with weak references, it’s a wrapper around java.util.WeakHashMap. |
You can also create a thread-safe mutable map by mixing the SynchronizedMap
trait into the map implementation you want. See the map discussion in the “Scala Collections Overview” for more information.
Choosing a set
Choosing a set is similar to choosing a map. There are base mutable and immutable set classes, a SortedSet
to return elements in sorted order by key, a LinkedHashSet
to store elements in insertion order, and a few other sets for special purposes. The common classes are shown in Table 10-6. (Quotes in the descriptions come from the Scaladoc for each class.)
Table 10-6. Common set choices, including whether immutable or mutable versions are available
Immutable | Mutable | Description | |
---|---|---|---|
BitSet |
✓ | ✓ | A set of “non-negative integers represented as variable-size arrays of bits packed into 64-bit words.” Used to save memory when you have a set of integers. |
HashSet |
✓ | ✓ | The immutable version “implements sets using a hash trie”; the mutable version “implements sets using a hashtable.” |
LinkedHashSet |
✓ | A mutable set implemented using a hashtable. Returns elements in the order in which they were inserted. | |
ListSet |
✓ | A set implemented using a list structure. | |
TreeSet |
✓ | ✓ | The immutable version “implements immutable sets using a tree.” The mutable version is a mutable SortedSet with “an immutable AVL Tree as underlying data structure.” |
Set |
✓ | ✓ | Generic base traits, with both mutable and immutable implementations. |
SortedSet |
✓ | ✓ | A base trait. (Creating a variable as a SortedSet returns a TreeSet.) |
You can also create a thread-safe mutable set by mixing the SynchronizedSet
trait into the set implementation you want. See the “Scala Collections Overview” discussion of maps and sets for more information.
Types that act like collections
Scala offers many other collection types, and some types that act like collections.
Table 10-7 provides descriptions of several types that act somewhat like collections, even though they aren’t.
Table 10-7. Other collections classes (and types that act like collections)
Class | Description |
---|---|
Enumeration |
A finite collection of constant values (i.e., the days in a week or months in a year). |
Iterator |
An iterator isn’t a collection; instead, it gives you a way to access the elements in a collection. It does, however, define many of the methods you’ll see in a normal collection class, including foreach , map , flatMap , etc. You can also convert an iterator to a collection when needed. |
Option |
Acts as a collection that contains zero or one elements. The Some class and None object extend Option . |
Some , None |
Some is a container for one element, and None holds zero elements. |
Tuple |
Supports a heterogeneous collection of elements. There is no one “Tuple” class; tuples are implemented as case classes ranging from Tuple1 to Tuple22 , which support 1 to 22 elements. |
Transformer methods
To understand strict and lazy collections, it helps to first understand the concept of a transformer method:
A transformer method is a method that constructs a new collection from an existing collection, typically by applying an algorithm to the input collection.
This includes methods like map
, filter
, reverse
, etc. -- any method that transforms the input collection to a new output collection.
Strict and lazy collections
Given that definition, collections can also be thought of in terms of being strict or lazy. In a strict collection, memory for the elements is allocated immediately, and all of its elements are immediately evaluated when a transformer method is invoked. In a lazy collection, memory for the elements is not allocated immediately, and transformer methods do not construct new elements until they are demanded.
All of the collection classes except Stream
are strict, but the other collection classes can be converted to a lazy collection by creating a view on the collection. See Recipe 10.24, “Creating a Lazy View on a Collection”, for more information on this approach.
See Also
- In addition to my own experience using the collections, most of the information used to create these tables comes from the Scaladoc of each type, and the Scala Collections Overview documentation.
- Recipe 10.1, “Understanding the Collections Hierarchy”
- Recipe 10.4, “Understanding the Performance of Collections”
this post is sponsored by my books: | |||
#1 New Release |
FP Best Seller |
Learn Scala 3 |
Learn FP Fast |