How to split sequences into subsets in Scala (groupBy, partition, splitAt, span)

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.19, “How to Split Scala Sequences into Subsets (groupBy, partition, etc.)”

Problem

You want to partition a Scala sequence into two or more different sequences (subsets) based on an algorithm or location you define.

Solution

Use the groupBy, partition, span, or splitAt methods to partition a sequence into subsequences. The sliding and unzip methods can also be used to split sequences into subsequences, though sliding can generate many subsequences, and unzip primarily works on a sequence of Tuple2 elements.

The groupBy, partition, and span methods let you split a sequence into subsets according to a function, whereas splitAt lets you split a collection into two sequences by providing an index number, as shown in these examples:

scala> val x = List(15, 10, 5, 8, 20, 12)
x: List[Int] = List(15, 10, 5, 8, 20, 12)

scala> val y = x.groupBy(_ > 10)
y: Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val y = x.partition(_ > 10)
y: (List[Int], List[Int]) = (List(15, 20, 12), List(10, 5, 8))

scala> val y = x.span(_ < 20)
y: (List[Int], List[Int]) = (List(15, 10, 5, 8), List(20, 12))

scala> val y = x.splitAt(2)
y: (List[Int], List[Int]) = (List(15, 10), List(5, 8, 20, 12))

The groupBy method partitions the collection into a Map of sub-collections, keyed by the value your function returns. With a Boolean predicate like this one, the true key maps to the elements for which the predicate returned true, and the false key maps to the elements for which it returned false.
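The function passed to groupBy does not have to return a Boolean. Any key type works, and you get one Map entry per distinct key, which is how groupBy produces more than two subsets. A minimal sketch, where the object name GroupByKeys and the word list are made up for illustration:

```scala
object GroupByKeys {
  // Hypothetical sample data
  val words = List("apple", "avocado", "banana", "blueberry", "cherry")

  // Key each word by its first character; one Map entry per distinct key
  val byInitial: Map[Char, List[String]] = words.groupBy(_.head)

  def main(args: Array[String]): Unit = {
    // Map iteration order is not guaranteed, so sort by key before printing
    byInitial.toList.sortBy(_._1).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

Within each group, the elements keep the order they had in the original list.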

The partition, span, and splitAt methods create a Tuple2 of sequences that are of the same type as the original collection. The partition method creates two lists, one containing the values for which your predicate returned true, and the other containing the values for which it returned false. The span method returns a Tuple2 based on your predicate p, consisting of “the longest prefix of this list whose elements all satisfy p, and the rest of this list.” The splitAt method splits the original list at the index you supply: splitAt(n) puts the first n elements in the first list and the remaining elements in the second.
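The difference between span and partition is easy to miss when both are given the same predicate: span stops at the first element that fails the test, while partition tests every element. A small sketch using the same list as the examples above (the object name SpanVsPartition and the value names are only illustrative):

```scala
object SpanVsPartition {
  val x = List(15, 10, 5, 8, 20, 12)

  // partition tests every element
  val (big, small) = x.partition(_ > 10)   // (List(15, 20, 12), List(10, 5, 8))

  // span stops at the first failure: 15 passes, 10 fails, so the prefix is just 15
  val (prefix, rest) = x.span(_ > 10)      // (List(15), List(10, 5, 8, 20, 12))

  // span gives the same result as takeWhile paired with dropWhile
  val viaWhile = (x.takeWhile(_ > 10), x.dropWhile(_ > 10))

  // splitAt(n) gives the same result as take(n) paired with drop(n)
  val viaTakeDrop = (x.take(2), x.drop(2))

  def main(args: Array[String]): Unit = {
    println(s"partition: ($big, $small)")
    println(s"span:      ($prefix, $rest)")
  }
}
```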

Handling the results

When a Tuple2 of sequences is returned, its two sequences can be accessed like this:

scala> val (a, b) = x.partition(_ > 10)
a: List[Int] = List(15, 20, 12)
b: List[Int] = List(10, 5, 8)

The sequences in the Map that groupBy creates can be accessed like this:

scala> val groups = x.groupBy(_ > 10)
groups: scala.collection.immutable.Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val trues = groups(true)
trues: List[Int] = List(15, 20, 12)

scala> val falses = groups(false)
falses: List[Int] = List(10, 5, 8)
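One caveat: groupBy only creates keys that actually occur, so groups(true) throws a NoSuchElementException when no element satisfies the predicate. Assuming you would rather get an empty list in that case, getOrElse is one option. The object and value names in this sketch are hypothetical:

```scala
object SafeGroups {
  // No element is greater than 10, so the Map has no `true` key
  val small = List(1, 2, 3)
  val groups: Map[Boolean, List[Int]] = small.groupBy(_ > 10)

  // getOrElse falls back to an empty list instead of throwing
  val trues: List[Int] = groups.getOrElse(true, Nil)
  val falses: List[Int] = groups.getOrElse(false, Nil)

  def main(args: Array[String]): Unit = {
    println(s"trues = $trues, falses = $falses")
  }
}
```

The partition method does not have this problem, because it always returns both lists, even when one of them is empty.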

sliding

The sliding(size, step) method is an interesting creature that can be used to break a sequence into many groups. It can be called with just a size, or both a size and step. It returns an Iterator, which is why these examples call toList on the result:

scala> val nums = (1 to 5).toArray
nums: Array[Int] = Array(1, 2, 3, 4, 5)

// size = 2
scala> nums.sliding(2).toList
res0: List[Array[Int]] = List(Array(1, 2), Array(2, 3), Array(3, 4), Array(4, 5))

// size = 2, step = 2
scala> nums.sliding(2,2).toList
res1: List[Array[Int]] = List(Array(1, 2), Array(3, 4), Array(5))

// size = 2, step = 3
scala> nums.sliding(2,3).toList
res2: List[Array[Int]] = List(Array(1, 2), Array(4, 5))

As shown, sliding works by passing a “sliding window” over the original sequence, returning sequences of a length given by size (the final window can be shorter, as Array(5) shows in the second example). The step parameter sets how far the window advances each time; when step is larger than size, elements are skipped, as in the last example, where 3 never appears. In my experience, the first two examples are the most useful: first with the default step of 1, and then with step equal to size.
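To make those two cases concrete, here is a rough sketch of a typical use of each: a step of 1 to compare neighboring elements, and a step equal to size to cut a list into non-overlapping chunks. The object name, value names, and sample data are assumptions for illustration. The grouped method is part of the standard collections and gives the same chunks as sliding(n, n):

```scala
object SlidingExamples {
  // Hypothetical sample data
  val nums = List(1, 4, 9, 16, 25)

  // Default step of 1: each window holds two neighbors,
  // so last - head is the difference between consecutive elements
  val diffs: List[Int] = nums.sliding(2).map(w => w.last - w.head).toList

  // step == size: non-overlapping chunks; the last chunk may be shorter
  val chunks: List[List[Int]] = nums.sliding(2, 2).toList

  // grouped(2) produces the same chunks as sliding(2, 2)
  val sameChunks: List[List[Int]] = nums.grouped(2).toList

  def main(args: Array[String]): Unit = {
    println(diffs)
    println(chunks)
  }
}
```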

unzip

The unzip method is also interesting. It can be used to take a sequence of Tuple2 values and create two resulting lists: one that contains the first element of each tuple, and another that contains the second element from each tuple:

scala> val listOfTuple2s = List((1,2), ('a', 'b'))
listOfTuple2s: List[(AnyVal, AnyVal)] = List((1,2), (a,b))

scala> val x = listOfTuple2s.unzip
x: (List[AnyVal], List[AnyVal]) = (List(1, a),List(2, b))

For instance, given a list of couples, you can unzip the list to create a list of women and a list of men:

scala> val couples = List(("Kim", "Al"), ("Julia", "Terry"))
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))

scala> val (women, men) = couples.unzip
women: List[String] = List(Kim, Julia)
men: List[String] = List(Al, Terry)

As you might guess from its name, the unzip method is the opposite of zip:

scala> val women = List("Kim", "Julia")
women: List[String] = List(Kim, Julia)

scala> val men = List("Al", "Terry")
men: List[String] = List(Al, Terry)

scala> val couples = women zip men
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))
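One detail the round trip hides: zip stops at the end of the shorter list, so unzip only recovers the elements that were actually paired. A sketch, with an extra name (Sam) and the object name ZipUnzip added purely for illustration:

```scala
object ZipUnzip {
  val women = List("Kim", "Julia")
  val men = List("Al", "Terry", "Sam")   // one more element than `women`

  // zip pairs elements by position and stops at the shorter list, so Sam is dropped
  val couples: List[(String, String)] = women zip men

  // unzip recovers only the elements that were paired
  val (w, m) = couples.unzip

  def main(args: Array[String]): Unit = {
    println(couples)
    println(s"women = $w, men = $m")
  }
}
```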