How to extract unique elements from a Scala sequence

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 10.21, “How to Extract Unique Elements from a Scala Sequence.”

Problem

You have a sequential collection that contains duplicate elements, and you want to remove the duplicates.

Solution

Call the distinct method on the collection:

scala> val x = Vector(1, 1, 2, 3, 3, 4)
x: scala.collection.immutable.Vector[Int] = Vector(1, 1, 2, 3, 3, 4)

scala> val y = x.distinct
y: scala.collection.immutable.Vector[Int] = Vector(1, 2, 3, 4)

The distinct method returns a new collection with the duplicate values removed; it does not change the original collection. Remember to assign the result to a new variable. This is required for both immutable and mutable collections.
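As a small illustration of that point, the following sketch calls distinct on a mutable ArrayBuffer (a collection type chosen here only as an example) and keeps the result in a new variable. The names buf and uniqueBuf are made up for this example.

```scala
import scala.collection.mutable.ArrayBuffer

// a mutable collection that contains duplicates
val buf = ArrayBuffer(1, 1, 2, 3, 3, 4)

// distinct returns a new collection; it does not modify buf
val uniqueBuf = buf.distinct

// buf still holds all six elements; uniqueBuf holds 1, 2, 3, 4
```

Even though ArrayBuffer is mutable, buf is left untouched, which is why the result has to be captured.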

If you happen to need a Set as a result, converting the collection to a Set is another way to remove the duplicate elements:

scala> val s = x.toSet
s: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)

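By definition a Set can only contain unique elements, so converting an Array, List, Vector, or other sequence to a Set removes the duplicates. Keep in mind that a Set does not guarantee element order, whereas distinct keeps the first occurrence of each element in its original position. Internally, distinct relies on the same idea: the source code for the distinct method in SeqLike shows that it uses an instance of mutable.HashSet to keep track of the elements it has already seen.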
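The idea of tracking already-seen elements in a mutable.HashSet can be sketched roughly as follows. This is a simplified illustration, not the library's actual source; distinctSketch is a hypothetical helper name, and returning a Vector is an arbitrary choice made for this example.

```scala
import scala.collection.mutable

// hypothetical helper for illustration; not the standard library implementation
def distinctSketch[A](xs: Seq[A]): Vector[A] = {
    val seen = mutable.HashSet[A]()      // elements encountered so far
    val result = Vector.newBuilder[A]    // collects first occurrences in order
    for (x <- xs) {
        // add returns true only when x was not already in the set
        if (seen.add(x)) result += x
    }
    result.result()
}

val d = distinctSketch(Seq(3, 1, 3, 2, 1))   // Vector(3, 1, 2)
```

Because elements are appended in the order they are first seen, the result keeps the original ordering, which a plain conversion to a Set does not promise.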

Using distinct with your own classes

To use distinct with your own class, you’ll need to implement the equals and hashCode methods. For example, the following class will work with distinct because it implements those methods:

class Person(val firstName: String, val lastName: String) {
    override def toString = s"$firstName $lastName"
    def canEqual(a: Any) = a.isInstanceOf[Person]
    // compare the fields themselves; equal hash codes alone don't prove equality
    override def equals(that: Any): Boolean = that match {
        case that: Person =>
            that.canEqual(this) &&
            this.firstName == that.firstName &&
            this.lastName == that.lastName
        case _ => false
    }
    // built from the same fields that equals compares; null fields hash to 0
    override def hashCode: Int = {
        val prime = 31
        var result = 1
        result = prime * result + (if (lastName == null) 0 else lastName.hashCode)
        result = prime * result + (if (firstName == null) 0 else firstName.hashCode)
        result
    }
}

object Person {
    def apply(firstName: String, lastName: String) = new Person(firstName, lastName)
}

You can demonstrate that this class works with distinct by placing the following code in the REPL:

val dale1 = new Person("Dale", "Cooper")
val dale2 = new Person("Dale", "Cooper")
val ed = new Person("Ed", "Hurley")
val list = List(dale1, dale2, ed)
val uniquePeople = list.distinct

The last two lines look like this in the REPL:

scala> val list = List(dale1, dale2, ed)
list: List[Person] = List(Dale Cooper, Dale Cooper, Ed Hurley)

scala> val uniquePeople = list.distinct
uniquePeople: List[Person] = List(Dale Cooper, Ed Hurley)

If you remove either the equals method or hashCode method, you’ll see that distinct won’t work as desired.
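To see the failure case, here is a minimal sketch using a class that overrides neither method. PlainPerson is a hypothetical class introduced only for this illustration.

```scala
// hypothetical class for illustration: no equals or hashCode overrides,
// so instances are compared by reference
class PlainPerson(firstName: String, lastName: String) {
    override def toString = s"$firstName $lastName"
}

val p1 = new PlainPerson("Dale", "Cooper")
val p2 = new PlainPerson("Dale", "Cooper")

// p1 and p2 are different instances, so distinct keeps both of them
val stillTwo = List(p1, p2).distinct
```

Here stillTwo still contains two elements, because without an equals override the two instances are never considered equal.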

See Also

  • You can find the source code for the SeqLike trait (and its distinct method) by following the Source link on its Scaladoc page.