This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.19, “How to Split Scala Sequences into Subsets (groupBy, partition, etc.)”
Problem
You want to partition a Scala sequence into two or more different sequences (subsets) based on an algorithm or location you define.
Solution
Use the groupBy, partition, span, or splitAt methods to partition a sequence into subsequences. The sliding and unzip methods can also be used to split sequences into subsequences, though sliding can generate many subsequences, and unzip primarily works on a sequence of Tuple2 elements.
The groupBy, partition, and span methods let you split a sequence into subsets according to a function, whereas splitAt lets you split a collection into two sequences by providing an index number, as shown in these examples:
scala> val x = List(15, 10, 5, 8, 20, 12)
x: List[Int] = List(15, 10, 5, 8, 20, 12)

scala> val y = x.groupBy(_ > 10)
y: Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val y = x.partition(_ > 10)
y: (List[Int], List[Int]) = (List(15, 20, 12), List(10, 5, 8))

scala> val y = x.span(_ < 20)
y: (List[Int], List[Int]) = (List(15, 10, 5, 8), List(20, 12))

scala> val y = x.splitAt(2)
y: (List[Int], List[Int]) = (List(15, 10), List(5, 8, 20, 12))
The groupBy method partitions the collection into a Map of sub-collections based on your function. With a Boolean predicate like the one shown, the value stored under the true key contains the elements for which your predicate returned true, and the value stored under the false key contains the elements for which it returned false.
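The function passed to groupBy does not have to return a Boolean. As a minimal sketch (the object name GroupByKeys and the choice of a remainder as the key are illustrative assumptions, not part of the original recipe), grouping the same list by its remainder modulo 3 yields one Map entry per distinct result:

```scala
object GroupByKeys {
  val x = List(15, 10, 5, 8, 20, 12)

  // The key function may return any type, not only Boolean;
  // each distinct result becomes one key in the resulting Map.
  val byRemainder: Map[Int, List[Int]] = x.groupBy(_ % 3)

  def main(args: Array[String]): Unit = {
    // Elements within each group keep their original relative order.
    println(byRemainder(0)) // List(15, 12)
    println(byRemainder(1)) // List(10)
    println(byRemainder(2)) // List(5, 8, 20)
  }
}
```

This is how groupBy covers the "two or more" subsets case, whereas partition, span, and splitAt always produce exactly two.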
The partition
, span
, and splitAt
methods create a Tuple2
of sequences that are of the same type as the original collection. The partition
method creates two lists, one containing values for which your predicate returned true
, and the other containing the elements that returned false
. The span
method returns a Tuple2
based on your predicate p
, consisting of “the longest prefix of this list whose elements all satisfy p
, and the rest of this list.” The splitAt
method splits the original list according to the element index value you supplied.
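The difference between partition and span is easiest to see by applying the same predicate to the same list. The following sketch (the object and value names are hypothetical) also shows two equivalences that may help when reasoning about these methods: span(p) gives the same result as the pair (takeWhile(p), dropWhile(p)), and splitAt(n) the same result as (take(n), drop(n)).

```scala
object SpanVsPartition {
  val x = List(15, 10, 5, 8, 20, 12)

  // partition tests every element, so 12 ends up in the first list.
  val partitioned = x.partition(_ < 20) // (List(15, 10, 5, 8, 12), List(20))

  // span stops at the first element that fails the predicate (20),
  // so 12 stays in the second list even though it is less than 20.
  val spanned = x.span(_ < 20)          // (List(15, 10, 5, 8), List(20, 12))

  // Equivalent formulations built from simpler methods.
  val sameAsSpan    = (x.takeWhile(_ < 20), x.dropWhile(_ < 20))
  val sameAsSplitAt = (x.take(2), x.drop(2))

  def main(args: Array[String]): Unit = {
    println(partitioned)
    println(spanned)
    println(spanned == sameAsSpan)          // true
    println(x.splitAt(2) == sameAsSplitAt)  // true
  }
}
```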
Handling the results
When a Tuple2 of sequences is returned, its two sequences can be accessed like this:
scala> val (a, b) = x.partition(_ > 10)
a: List[Int] = List(15, 20, 12)
b: List[Int] = List(10, 5, 8)
The sequences in the Map that groupBy creates can be accessed like this:
scala> val groups = x.groupBy(_ > 10)
groups: scala.collection.immutable.Map[Boolean,List[Int]] = Map(false -> List(10, 5, 8), true -> List(15, 20, 12))

scala> val trues = groups(true)
trues: List[Int] = List(15, 20, 12)

scala> val falses = groups(false)
falses: List[Int] = List(10, 5, 8)
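One caveat when reading from the Map: groupBy only creates keys for results that actually occur, so groups(true) throws a NoSuchElementException when no element satisfies the predicate. A minimal sketch of safer access, assuming a hypothetical list named small in which nothing is greater than 10:

```scala
object SafeGroupAccess {
  val small = List(1, 2, 3)

  // No element is greater than 10, so the Map has only a false key.
  val groups: Map[Boolean, List[Int]] = small.groupBy(_ > 10)

  // getOrElse supplies a default instead of throwing on a missing key.
  val trues: List[Int]  = groups.getOrElse(true, List.empty[Int])
  val falses: List[Int] = groups.getOrElse(false, List.empty[Int])

  // get returns an Option, leaving the missing-key decision to the caller.
  val maybeTrues: Option[List[Int]] = groups.get(true)

  def main(args: Array[String]): Unit = {
    println(trues)      // List()
    println(falses)     // List(1, 2, 3)
    println(maybeTrues) // None
  }
}
```

When the key is always a Boolean, partition avoids the issue entirely, because it returns both lists even when one of them is empty.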
sliding
The sliding(size, step) method is an interesting creature that can be used to break a sequence into many groups. It can be called with just a size, or both a size and step:
scala> val nums = (1 to 5).toArray
nums: Array[Int] = Array(1, 2, 3, 4, 5)

// size = 2
scala> nums.sliding(2).toList
res0: List[Array[Int]] = List(Array(1, 2), Array(2, 3), Array(3, 4), Array(4, 5))

// size = 2, step = 2
scala> nums.sliding(2,2).toList
res1: List[Array[Int]] = List(Array(1, 2), Array(3, 4), Array(5))

// size = 2, step = 3
scala> nums.sliding(2,3).toList
res2: List[Array[Int]] = List(Array(1, 2), Array(4, 5))
As shown, sliding works by passing a “sliding window” over the original sequence, returning sequences of a length given by size (the final window can be shorter, as in the second example). The step parameter controls how far the window advances each time; when step is larger than size, elements are skipped, as in the last example. In my experience, the first two examples are the most useful: first with the default step size of 1, and then when step matches size.
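As a rough illustration of those two common cases (the object name, the sample numbers, and the difference calculation are assumptions chosen for the example), a step of 1 pairs each element with its neighbor, while a step equal to size produces non-overlapping chunks, which is also what the grouped method returns:

```scala
object SlidingUses {
  val nums = List(1, 4, 9, 16, 25)

  // Default step of 1: each window holds two neighboring elements,
  // so the difference between consecutive elements is last - head.
  val diffs: List[Int] = nums.sliding(2).map(w => w.last - w.head).toList

  // step == size: non-overlapping chunks; the last chunk may be shorter.
  val chunks: List[List[Int]] = nums.sliding(2, 2).toList

  // grouped(n) produces the same chunks as sliding(n, n).
  val sameChunks: List[List[Int]] = nums.grouped(2).toList

  def main(args: Array[String]): Unit = {
    println(diffs)                // List(3, 5, 7, 9)
    println(chunks)               // List(List(1, 4), List(9, 16), List(25))
    println(chunks == sameChunks) // true
  }
}
```

Note that sliding and grouped return an Iterator, which is why the examples call toList before printing.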
unzip
The unzip method is also interesting. It can be used to take a sequence of Tuple2 values and create two resulting lists: one that contains the first element of each tuple, and another that contains the second element from each tuple:
scala> val listOfTuple2s = List((1,2), ('a', 'b'))
listOfTuple2s: List[(AnyVal, AnyVal)] = List((1,2), (a,b))

scala> val x = listOfTuple2s.unzip
x: (List[AnyVal], List[AnyVal]) = (List(1, a),List(2, b))
For instance, given a list of couples, you can unzip the list to create a list of women and a list of men:
scala> val couples = List(("Kim", "Al"), ("Julia", "Terry"))
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))

scala> val (women, men) = couples.unzip
women: List[String] = List(Kim, Julia)
men: List[String] = List(Al, Terry)
As you might guess from its name, the unzip method is the opposite of zip:
scala> val women = List("Kim", "Julia")
women: List[String] = List(Kim, Julia)

scala> val men = List("Al", "Terry")
men: List[String] = List(Al, Terry)

scala> val couples = women zip men
couples: List[(String, String)] = List((Kim,Al), (Julia,Terry))
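The two methods round-trip only when the lists being zipped have the same length. In this sketch (the object name and the extra name "Ann" are hypothetical additions for illustration), unzipping and re-zipping returns the original list, while zipping lists of unequal length silently drops the unmatched element:

```scala
object ZipUnzip {
  val couples = List(("Kim", "Al"), ("Julia", "Terry"))

  // unzip splits the pairs; zipping the halves rebuilds the original list.
  val (women, men) = couples.unzip
  val roundTrip = women.zip(men)

  // zip stops at the end of the shorter list, so "Ann" has no partner
  // and does not appear in the result.
  val truncated = List("Kim", "Julia", "Ann").zip(List("Al", "Terry"))

  def main(args: Array[String]): Unit = {
    println(roundTrip == couples) // true
    println(truncated)            // List((Kim,Al), (Julia,Terry))
  }
}
```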