This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.21, “How to Extract Unique Elements from a Scala Sequence.”
Problem
You have a sequential collection that contains duplicate elements, and you want to remove the duplicates.
Solution
Call the distinct method on the collection:
scala> val x = Vector(1, 1, 2, 3, 3, 4)
x: scala.collection.immutable.Vector[Int] = Vector(1, 1, 2, 3, 3, 4)

scala> val y = x.distinct
y: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4)
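The distinct method returns a new collection with the duplicate values removed, keeping the first occurrence of each element in its original order. Remember to assign the result to a new variable. This is required for both immutable and mutable collections, because distinct does not modify the original collection in place.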
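As a quick check of the mutable case, the following sketch uses an ArrayBuffer; the wrapper object MutableDistinctDemo and its sample values are illustrative only. It shows that distinct returns a new buffer and leaves the original one unchanged, keeping the first occurrence of each element in its original order.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical demo object; the values are arbitrary sample data
object MutableDistinctDemo {
  // A mutable buffer containing duplicates
  val buf = ArrayBuffer(3, 1, 3, 2, 1)

  // distinct does not change buf; it returns a new ArrayBuffer
  val unique = buf.distinct

  def main(args: Array[String]): Unit = {
    println(buf)     // ArrayBuffer(3, 1, 3, 2, 1)
    println(unique)  // ArrayBuffer(3, 1, 2)
  }
}
```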
If you happen to need a Set as a result, converting the collection to a Set is another way to remove the duplicate elements:
scala> val s = x.toSet s: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
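The two approaches are not interchangeable. In this sketch (the object name DistinctVsToSet and its values are hypothetical), distinct keeps the Vector type and the order in which elements first appear, while toSet produces a Set whose iteration order isn't guaranteed to match the original sequence, so the sketch only checks the Set's contents and size.

```scala
// Hypothetical demo object comparing distinct with toSet
object DistinctVsToSet {
  val x = Vector(5, 3, 5, 1, 3, 9, 7)

  // distinct keeps the Vector type and the order of first occurrences
  val viaDistinct: Vector[Int] = x.distinct

  // toSet returns a Set; its iteration order is not guaranteed to match x
  val viaSet: Set[Int] = x.toSet

  def main(args: Array[String]): Unit = {
    println(viaDistinct)  // Vector(5, 3, 1, 9, 7)
    println(viaSet.size)  // 5
  }
}
```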
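By definition a Set can only contain unique elements, so converting an Array, List, Vector, or other sequence to a Set removes the duplicates. Note, however, that the result is a Set rather than a sequence, and its iteration order is not guaranteed to match the original. The distinct method relies on the same idea: the source code for the distinct method in SeqLike shows that it uses an instance of mutable.HashSet to keep track of the elements it has already seen.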
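The general idea can be sketched as follows. This is not the library's actual source code; distinctSketch and DistinctSketch are hypothetical names. The helper walks the input once, records each element in a mutable.HashSet, and keeps an element only the first time it's seen, which is why the result preserves the original order.

```scala
import scala.collection.mutable

object DistinctSketch {
  // Hypothetical helper illustrating the "seen set" technique;
  // it is not the standard library's implementation
  def distinctSketch[A](xs: Seq[A]): List[A] = {
    val seen = mutable.HashSet.empty[A]  // elements already emitted
    val out  = List.newBuilder[A]        // result, in first-occurrence order
    for (x <- xs) {
      // add returns true only if x was not already in the set
      if (seen.add(x)) out += x
    }
    out.result()
  }

  def main(args: Array[String]): Unit =
    println(distinctSketch(Vector(1, 1, 2, 3, 3, 4)))  // List(1, 2, 3, 4)
}
```

This also explains why your own classes need correct equals and hashCode methods: the HashSet uses them to decide whether an element has already been seen.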
Using distinct with your own classes
To use distinct with your own class, you’ll need to implement the equals and hashCode methods. For example, the following class will work with distinct because it implements those methods:
class Person(val firstName: String, val lastName: String) {

  override def toString = s"$firstName $lastName"

  def canEqual(a: Any) = a.isInstanceOf[Person]

  // two Person instances are equal when both name fields are equal
  override def equals(that: Any): Boolean =
    that match {
      case that: Person =>
        that.canEqual(this) &&
        this.firstName == that.firstName &&
        this.lastName == that.lastName
      case _ => false
    }

  // equal objects must produce equal hash codes; null fields hash to 0
  override def hashCode: Int = {
    val prime = 31
    var result = 1
    result = prime * result + (if (lastName == null) 0 else lastName.hashCode)
    result = prime * result + (if (firstName == null) 0 else firstName.hashCode)
    result
  }

}

object Person {
  def apply(firstName: String, lastName: String) = new Person(firstName, lastName)
}
You can demonstrate that this class works with distinct by placing the following code in the REPL:
val dale1 = new Person("Dale", "Cooper")
val dale2 = new Person("Dale", "Cooper")
val ed = new Person("Ed", "Hurley")
val list = List(dale1, dale2, ed)
val uniquePeople = list.distinct
The last two lines look like this in the REPL:
scala> val list = List(dale1, dale2, ed)
list: List[Person] = List(Dale Cooper, Dale Cooper, Ed Hurley)

scala> val uniquePeople = list.distinct
uniquePeople: List[Person] = List(Dale Cooper, Ed Hurley)
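If you don't need custom equality rules, a case class is a simpler alternative to writing equals and hashCode by hand, since the compiler generates both from the constructor fields. This is a different technique from the hand-written class above; the wrapper object CaseClassDistinct is an illustrative name.

```scala
// Hypothetical wrapper object for the case class alternative
object CaseClassDistinct {
  // A case class gets compiler-generated equals and hashCode based on its fields
  case class Person(firstName: String, lastName: String)

  val list = List(Person("Dale", "Cooper"), Person("Dale", "Cooper"), Person("Ed", "Hurley"))

  // The two "Dale Cooper" values are structurally equal, so one is dropped
  val uniquePeople = list.distinct

  def main(args: Array[String]): Unit =
    println(uniquePeople)  // List(Person(Dale,Cooper), Person(Ed,Hurley))
}
```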
If you remove either the equals method or the hashCode method from the Person class, you’ll see that distinct no longer removes the duplicate Dale Cooper entry.
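For comparison, here is a sketch of a class with neither method overridden (PlainPerson and NoEqualityDemo are hypothetical names). It inherits reference equality from AnyRef, so two instances with identical field values are still treated as different elements.

```scala
// Hypothetical demo object showing distinct with reference equality
object NoEqualityDemo {
  // No equals/hashCode overrides: equality is object identity
  class PlainPerson(val firstName: String, val lastName: String)

  val a = new PlainPerson("Dale", "Cooper")
  val b = new PlainPerson("Dale", "Cooper")

  // a and b hold the same values but are different objects, so distinct keeps both
  val uniques = List(a, b).distinct

  def main(args: Array[String]): Unit =
    println(uniques.size)  // 2
}
```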
See Also
- You can find the source code for the SeqLike trait (and its distinct method) by following the Source link on its Scaladoc page. In Scala 2.13 and later, distinct is defined in the SeqOps trait instead.