How to split sequences into subsets in Scala (groupBy, partition, splitAt, span)

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.19, “How to Split Scala Sequences into Subsets (groupBy, partition, etc.)”

Problem

You want to partition a Scala sequence into two or more different sequences (subsets) based on an algorithm or location you define.

Solution

Use the groupBy, partition, span, or splitAt methods to partition a sequence into subsequences. The sliding and unzip methods can also be used to split sequences into subsequences, though sliding can generate many subsequences, and unzip primarily works on a sequence of Tuple2 elements.

The groupBy, partition, and span methods let you split a sequence into subsets according to a function, whereas splitAt lets you split a collection into two sequences by providing an index number, as shown in these examples:

scala> val x = List(15, 10, 5, 8, 20, 12)
x: List[Int] = List(15, 10, 5, 8, 20, 12)

scala> val y = x.groupBy(_ > 10)
y: Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val y = x.partition(_ > 10)
y: (List[Int], List[Int]) = (List(15, 20, 12), List(10, 5, 8))

scala> val y = x.span(_ < 20)
y: (List[Int], List[Int]) = (List(15, 10, 5, 8), List(20, 12))

scala> val y = x.splitAt(2)
y: (List[Int], List[Int]) = (List(15, 10), List(5, 8, 20, 12))

The groupBy method partitions the collection into a single Map of sub-collections, keyed by the value your function returns. Because the function in this example is a predicate, the Map has two keys: the key true maps to the elements for which the predicate returned true, and the key false maps to the elements for which it returned false.
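
The function you pass to groupBy does not have to return a Boolean. As a minimal sketch (the word list and the names words and byLength are made up for this illustration and are not part of the recipe), grouping by string length produces one group per distinct length:

```scala
// Hypothetical sample data, not from the recipe.
val words = List("apple", "fig", "kiwi", "pear", "plum", "date")

// The key is the word length (an Int), so there is one group per distinct length.
val byLength: Map[Int, List[String]] = words.groupBy(_.length)

// Within a group, elements keep the order they had in the original list.
val fourLetterWords = byLength(4)

// Lengths 3, 4, and 5 occur in the data, so the Map has three keys.
val groupCount = byLength.size
```

Here fourLetterWords holds the four words of length 4, and groupCount is 3.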

The partition, span, and splitAt methods each return a Tuple2 of sequences that are of the same type as the original collection. The partition method creates two lists, one containing the elements for which your predicate returned true, and the other containing the elements for which it returned false. The span method returns a Tuple2 based on your predicate p, consisting of “the longest prefix of this list whose elements all satisfy p, and the rest of this list.” The splitAt method splits the original list at the index you supply: with splitAt(n), the first list holds the first n elements and the second list holds the rest.
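
The difference between partition and span is easiest to see when both are given the same predicate. The following sketch reuses the list from the examples above; the names big, small, prefix, rest, and sameAsTakeDrop are only illustrative:

```scala
val x = List(15, 10, 5, 8, 20, 12)

// partition tests every element, wherever it appears in the list.
val (big, small) = x.partition(_ > 10)

// span stops at the first element that fails the predicate (10 here),
// so 20 and 12 land in the second list even though they are greater than 10.
val (prefix, rest) = x.span(_ > 10)

// splitAt(n) gives the same result as pairing take(n) with drop(n).
val viaTakeDrop = (x.take(2), x.drop(2))
val sameAsTakeDrop = x.splitAt(2) == viaTakeDrop
```

With this input, partition yields (List(15, 20, 12), List(10, 5, 8)), while span yields (List(15), List(10, 5, 8, 20, 12)).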

Handling the results

When a Tuple2 of sequences is returned, its two sequences can be accessed like this:

scala> val (a, b) = x.partition(_ > 10)
a: List[Int] = List(15, 20, 12)
b: List[Int] = List(10, 5, 8)

The sequences in the Map that groupBy creates can be accessed like this:

scala> val groups = x.groupBy(_ > 10)
groups: scala.collection.immutable.Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val trues = groups(true)
trues: List[Int] = List(15, 20, 12)

scala> val falses = groups(false)
falses: List[Int] = List(10, 5, 8)
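
One caveat with the Map approach: a key exists only if at least one element produced it, so looking up a missing key with groups(true) throws a NoSuchElementException. A minimal sketch of a safer lookup, using a made-up list named small in which no element is greater than 10:

```scala
// Hypothetical input: none of these values is greater than 10.
val small = List(3, 7, 9)

// The predicate never returns true, so the Map has no `true` key.
val groups = small.groupBy(_ > 10)

// groups(true) would throw a NoSuchElementException here.
// getOrElse returns the supplied default (an empty list) for a missing key.
val trues = groups.getOrElse(true, Nil)
val falses = groups.getOrElse(false, Nil)
```

If you always want both halves, partition may be the simpler choice, because it returns an empty list rather than omitting it.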

sliding

The sliding(size, step) method is an interesting creature that can be used to break a sequence into many groups. It returns an Iterator, which is why the examples below call toList on the result. It can be called with just a size, or both a size and step:

scala> val nums = (1 to 5).toArray
nums: Array[Int] = Array(1, 2, 3, 4, 5)

// size = 2
scala> nums.sliding(2).toList
res0: List[Array[Int]] = List(Array(1, 2), Array(2, 3), Array(3, 4), Array(4, 5))

// size = 2, step = 2
scala> nums.sliding(2,2).toList
res1: List[Array[Int]] = List(Array(1, 2), Array(3, 4), Array(5))

// size = 2, step = 3
scala> nums.sliding(2,3).toList
res2: List[Array[Int]] = List(Array(1, 2), Array(4, 5))

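As shown, sliding works by passing a “sliding window” over the original sequence, returning sequences of a length given by size. The step parameter sets how far the window advances each time: when step equals size the windows do not overlap, and when step is larger than size some elements are skipped, as in the last example. The final window can be shorter than size, as Array(5) shows in the second example. In my experience, the first two examples are the most useful, first with a default step size of 1, and then when step matches size.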
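
As a rough sketch of those two common cases (the readings list and the other names here are invented for illustration), a step of 1 is handy for comparing neighbors, and a step equal to size breaks a sequence into chunks:

```scala
// Hypothetical readings, not from the recipe.
val readings = List(3, 7, 12, 20)

// Default step of 1: each window is a pair of neighbors, so this computes
// the difference between consecutive readings. collect ignores any window
// that does not hold exactly two elements.
val deltas = readings.sliding(2).collect { case List(a, b) => b - a }.toList

// step == size: non-overlapping chunks.
val chunks = readings.sliding(2, 2).toList

// The grouped method gives the same chunks in this case.
val sameAsGrouped = chunks == readings.grouped(2).toList
```

Here deltas is List(4, 5, 8) and chunks is List(List(3, 7), List(12, 20)).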

unzip

The unzip method is also interesting. It can be used to take a sequence of Tuple2 values and create two resulting lists: one that contains the first element of each tuple, and another that contains the second element from each tuple:

scala> val listOfTuple2s = List((1,2), ('a', 'b'))
listOfTuple2s: List[(AnyVal, AnyVal)] = List((1,2), (a,b))

scala> val x = listOfTuple2s.unzip
x: (List[AnyVal], List[AnyVal]) = (List(1, a),List(2, b))

For instance, given a list of couples, you can unzip the list to create a list of women and a list of men:

scala> val couples = List(("Kim", "Al"), ("Julia", "Terry"))
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))

scala> val (women, men) = couples.unzip
women: List[String] = List(Kim, Julia)
men: List[String] = List(Al, Terry)

As you might guess from its name, the unzip method is the opposite of zip:

scala> val women = List("Kim", "Julia")
women: List[String] = List(Kim, Julia)

scala> val men = List("Al", "Terry")
men: List[String] = List(Al, Terry)

scala> val couples = women zip men
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))
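
Note that zip pairs elements only while both lists have one available, so zip followed by unzip returns the original lists only when they have the same length. A small sketch, assuming a third, made-up name is added to the first list:

```scala
// "Ana" is a hypothetical extra entry with no counterpart in the second list.
val women = List("Kim", "Julia", "Ana")
val men = List("Al", "Terry")

// zip stops at the end of the shorter list, so "Ana" is dropped.
val couples = women.zip(men)

// Unzipping therefore returns only the elements that were paired.
val (pairedWomen, pairedMen) = couples.unzip
```

After this runs, pairedWomen is List(Kim, Julia) and pairedMen is List(Al, Terry).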

See Also

  • See the Scaladoc for any sequence (List, Array, etc.) for more methods.

The Scala Cookbook

This tutorial is sponsored by the Scala Cookbook, which I wrote for O’Reilly.
