# How to split sequences into subsets in Scala (groupBy, partition, splitAt, span)

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.19, “How to Split Scala Sequences into Subsets (`groupBy`, `partition`, etc.).”

## Problem

You want to partition a Scala sequence into two or more different sequences (subsets) based on an algorithm or location you define.

## Solution

Use the `groupBy`, `partition`, `span`, or `splitAt` methods to partition a sequence into subsequences. The `sliding` and `unzip` methods can also be used to split sequences into subsequences, though `sliding` can generate many subsequences, and `unzip` primarily works on a sequence of `Tuple2` elements.

The `groupBy`, `partition`, and `span` methods let you split a sequence into subsets according to a function, whereas `splitAt` lets you split a collection into two sequences by providing an index number, as shown in these examples:

```scala
scala> val x = List(15, 10, 5, 8, 20, 12)
x: List[Int] = List(15, 10, 5, 8, 20, 12)

scala> val y = x.groupBy(_ > 10)
y: Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val y = x.partition(_ > 10)
y: (List[Int], List[Int]) = (List(15, 20, 12), List(10, 5, 8))

scala> val y = x.span(_ < 20)
y: (List[Int], List[Int]) = (List(15, 10, 5, 8), List(20, 12))

scala> val y = x.splitAt(2)
y: (List[Int], List[Int]) = (List(15, 10), List(5, 8, 20, 12))
```

The `groupBy` method partitions the collection into a `Map` of sub-collections based on your function. With a Boolean predicate like the one shown, the `true` key maps to the elements for which your predicate returned `true`, and the `false` key maps to the elements for which it returned `false`.
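
The function you pass to `groupBy` does not have to return a `Boolean`. As a minimal sketch (the `words` list and the `GroupByKeys` object are made up for illustration), grouping by string length yields one `Map` entry per distinct length:

```scala
object GroupByKeys {
  // Hypothetical sample data, not from the recipe.
  val words = List("ant", "bee", "crow", "deer", "eagle")

  // The key is whatever the function returns; here, the length of each word.
  // Elements keep their original relative order inside each group.
  val byLength: Map[Int, List[String]] = words.groupBy(_.length)
  // byLength(3) == List("ant", "bee")
  // byLength(4) == List("crow", "deer")
  // byLength(5) == List("eagle")
}
```

The iteration order of the resulting `Map` is not something to rely on, so look groups up by key rather than by position.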

The `partition`, `span`, and `splitAt` methods create a `Tuple2` of sequences that are of the same type as the original collection. The `partition` method tests every element and creates two lists, one containing the elements for which your predicate returned `true`, and the other containing the elements for which it returned `false`. The `span` method returns a `Tuple2` based on your predicate `p`, consisting of “the longest prefix of this list whose elements all satisfy `p`, and the rest of this list.” In other words, `span` stops at the first element that fails the predicate, which is why `12` lands in the second list in the example above even though it is less than `20`. The `splitAt` method splits the original list at the index you supply: the first list holds the elements before that index, and the second holds the rest.
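
To make the difference between `partition` and `span` concrete, the sketch below applies the same predicate with both methods to the list used earlier. It also shows that `span` and `splitAt` give the same results as pairing `takeWhile`/`dropWhile` and `take`/`drop`. The `SpanVsPartition` object name is only there so the snippet compiles on its own:

```scala
object SpanVsPartition {
  val x = List(15, 10, 5, 8, 20, 12)

  // partition checks every element against the predicate.
  val partitioned = x.partition(_ > 10)  // (List(15, 20, 12), List(10, 5, 8))

  // span stops at the first element that fails the predicate (10 here),
  // so 20 and 12 stay in the second list.
  val spanned = x.span(_ > 10)           // (List(15), List(10, 5, 8, 20, 12))

  // Equivalent results built from simpler methods.
  val viaTakeWhile = (x.takeWhile(_ > 10), x.dropWhile(_ > 10))
  val viaTakeDrop  = (x.take(2), x.drop(2))
  val split        = x.splitAt(2)        // (List(15, 10), List(5, 8, 20, 12))
}
```

Reach for `span` when the data is ordered so that the matching elements come first; otherwise `partition` is usually what you want.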

## Handling the results

When a `Tuple2` of sequences is returned, its two sequences can be accessed like this:

```scala
scala> val (a, b) = x.partition(_ > 10)
a: List[Int] = List(15, 20, 12)
b: List[Int] = List(10, 5, 8)
```

The sequences in the `Map` that `groupBy` creates can be accessed like this:

```scala
scala> val groups = x.groupBy(_ > 10)
groups: scala.collection.immutable.Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val trues = groups(true)
trues: List[Int] = List(15, 20, 12)

scala> val falses = groups(false)
falses: List[Int] = List(10, 5, 8)
```
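
One caveat with direct lookups: `groupBy` only creates keys that at least one element actually produced, so `groups(true)` throws a `NoSuchElementException` when nothing matched the predicate. A minimal sketch of safer access, using a made-up `small` list and a `SafeGroupAccess` wrapper object:

```scala
object SafeGroupAccess {
  // Hypothetical input where no element is greater than 10.
  val small = List(1, 2, 3)

  // Only the `false` key exists: Map(false -> List(1, 2, 3))
  val groups: Map[Boolean, List[Int]] = small.groupBy(_ > 10)

  // getOrElse supplies a default instead of throwing for a missing key.
  val trues: List[Int]  = groups.getOrElse(true, Nil)   // List()
  val falses: List[Int] = groups.getOrElse(false, Nil)  // List(1, 2, 3)

  // get returns an Option, which makes the missing case explicit.
  val maybeTrues: Option[List[Int]] = groups.get(true)  // None
}
```

If you always need both halves of a Boolean split, `partition` sidesteps the problem because it always returns two (possibly empty) sequences.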

## sliding

The `sliding(size, step)` method is an interesting creature that can be used to break a sequence into many groups. It can be called with just a size, or both a size and step:

```scala
scala> val nums = (1 to 5).toArray
nums: Array[Int] = Array(1, 2, 3, 4, 5)

// size = 2
scala> nums.sliding(2).toList
res0: List[Array[Int]] = List(Array(1, 2), Array(2, 3), Array(3, 4), Array(4, 5))

// size = 2, step = 2
scala> nums.sliding(2,2).toList
res1: List[Array[Int]] = List(Array(1, 2), Array(3, 4), Array(5))

// size = 2, step = 3
scala> nums.sliding(2,3).toList
res2: List[Array[Int]] = List(Array(1, 2), Array(4, 5))
```

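As shown, `sliding` works by passing a “sliding window” over the original sequence, returning sequences of a length given by `size`. It returns an `Iterator`, which is why the examples call `toList` on the result. The `step` parameter lets you skip over elements, as shown in the last two examples; note that the final window can be shorter than `size` when the elements run out, as with `Array(5)`. In my experience, the first two examples are the most useful, first with a default step size of `1`, and then when `step` matches `size`.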
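
Those two common cases can be sketched as follows. The `readings` values and the `SlidingDemo` object are invented for illustration. With a step of `1`, `sliding(2)` gives you each adjacent pair, which is handy for computing differences; when `step` matches `size`, the result is the same as calling `grouped(size)`:

```scala
object SlidingDemo {
  val nums = List(1, 2, 3, 4, 5)

  // step == size: non-overlapping chunks, equivalent to grouped(size).
  val chunksViaSliding = nums.sliding(2, 2).toList  // List(List(1, 2), List(3, 4), List(5))
  val chunksViaGrouped = nums.grouped(2).toList

  // Default step of 1: every adjacent pair. Hypothetical sample readings.
  val readings = List(3, 7, 12, 20)

  // collect skips any window that is not exactly two elements long,
  // which happens when the input has fewer than two elements.
  val deltas = readings.sliding(2).collect { case List(a, b) => b - a }.toList
  // deltas == List(4, 5, 8)
}
```

The sketch uses `List` rather than `Array` so that the two chunked results can be compared with `==`; arrays compare by reference.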

## unzip

The `unzip` method is also interesting. It can be used to take a sequence of `Tuple2` values and create two resulting lists: one that contains the first element of each tuple, and another that contains the second element from each tuple:

```scala
scala> val listOfTuple2s = List((1,2), ('a', 'b'))
listOfTuple2s: List[(AnyVal, AnyVal)] = List((1,2), (a,b))

scala> val x = listOfTuple2s.unzip
x: (List[AnyVal], List[AnyVal]) = (List(1, a),List(2, b))
```

For instance, given a list of couples, you can unzip the list to create a list of women and a list of men:

```scala
scala> val couples = List(("Kim", "Al"), ("Julia", "Terry"))
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))

scala> val (women, men) = couples.unzip
women: List[String] = List(Kim, Julia)
men: List[String] = List(Al, Terry)
```
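
When your elements are not tuples to begin with, one option is to `map` each element to a pair first and then `unzip`. The sketch below assumes a hypothetical `Person` case class that is not part of the recipe, and also shows `unzip3`, the three-element counterpart for sequences of `Tuple3` values:

```scala
object UnzipDemo {
  // Hypothetical type, used only for this illustration.
  final case class Person(name: String, age: Int)

  val people = List(Person("Kim", 34), Person("Al", 36))

  // Turn each Person into a (name, age) pair, then split the pairs.
  val (names, ages) = people.map(p => (p.name, p.age)).unzip
  // names == List("Kim", "Al"), ages == List(34, 36)

  // unzip3 does the same for a sequence of three-element tuples.
  val (ints, words, doubles) = List((1, "one", 1.0), (2, "two", 2.0)).unzip3
  // ints == List(1, 2), words == List("one", "two"), doubles == List(1.0, 2.0)
}
```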

As you might guess from its name, the `unzip` method is the opposite of `zip`:

```scala
scala> val women = List("Kim", "Julia")
women: List[String] = List(Kim, Julia)

scala> val men = List("Al", "Terry")
men: List[String] = List(Al, Terry)

scala> val couples = women zip men
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))
```
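
The round trip is only lossless when both lists have the same length, because `zip` stops at the end of the shorter list. A small sketch of that behavior, with an extra name (`"Pat"`) and a `"?"` placeholder added purely for illustration:

```scala
object ZipRoundTrip {
  val women = List("Kim", "Julia", "Pat")  // one more element than `men`
  val men   = List("Al", "Terry")

  // zip truncates to the shorter list, so "Pat" has no partner and is dropped.
  val couples = women.zip(men)             // List((Kim,Al), (Julia,Terry))

  // unzip therefore returns only the elements that were paired.
  val (pairedWomen, pairedMen) = couples.unzip
  // pairedWomen == List("Kim", "Julia"), pairedMen == List("Al", "Terry")

  // zipAll pads the shorter side with a placeholder instead of truncating.
  val padded = women.zipAll(men, "?", "?") // List((Kim,Al), (Julia,Terry), (Pat,?))
}
```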

