Algebraic Data Type: “A type defined by providing several alternatives, each of which comes with its own constructor. It usually comes with a way to decompose the type through pattern matching. The concept is found in specification languages and functional programming languages. Algebraic data types can be emulated in Scala with case classes.”
Introduction
I debated for a long time about how to introduce Algebraic Data Types (ADTs) in this book. I finally decided that ADTs aren’t a way of proactively designing FP code; instead, they’re a way of categorizing the FP code you’ve already written, specifically your FP data models. That is, you don’t sit down and say, “I want to write my Customer class as a ‘sum of the products’ ADT.” Instead, if you do anything, you might look at some code and say, “Hey, Mary, did you notice that the Customer class is a ‘sum of the products’ ADT?”
Because I look at ADTs as a way of categorizing or classifying code rather than designing code, I decided it was best to include this topic as an appendix in this book, rather than as a lesson in the front of the book.
The “TL;DR” version of this lesson goes like this: If you create your data models using (a) case classes with immutable fields and (b) case objects, and (c) those data types have no methods, you’re already writing ADTs.
Surprise: Almost every
caseclass andcaseobject in this book is some form of ADT. I just didn’t think it was important to categorize them when I wrote them.
Goals, motivation
The goals of this lesson are:
- To define basic ADT terminology
- To show examples of the different ADT types
- To help demonstrate another way to see your Scala/FP code as algebra
In regards to that last point, a main benefit of being aware of ADTs is that you begin to see your code as being even more like algebra. You specifically begin to see your case classes and case objects as sets of objects, and the functions that operate on them as operators on those data sets. To understand what that means, it will help to formally define algebra.
What is “Algebra”?
To understand ADTs, you first have to understand what is meant by the word “algebra.” Informally, an algebra can be thought of as consisting of two things:
- A set of objects
- The operations that can be applied to those objects to create new objects
Technically an algebra also consists of a third item, the laws that govern the algebra, but I’m not going to cover laws in this lesson.
If you’re like me, you never thought of algebra as anything other than high school algebra, but it turns out that any concept that can be thought of as (a) a set of objects, and (b) operators on those objects is a form of algebra. Here are a few examples.
Numeric algebra
“High school algebra” is the algebra we learned back in high school (or possibly earlier). It’s more formally known as “numeric algebra,” the algebra of numbers. You can think of it like this:
- A set of objects, such as whole numbers
- The operations that can be used on those objects:
+,-,*(and/)
One thing I never thought about is that the operators are used on existing numbers (objects) to create new numbers (objects):
1 + 1 = 2
3 * 3 = 9
Notice that even in basic math, the numbers 2 and 9 are “created” from the numbers 1 and 3 by using the + and * operators.
Relational algebra
Another type of algebra is known as “relational algebra.” This is the algebra of relational databases, and in this algebra the database tables are the “set of objects,” and query commands like SELECT, UPDATE, and JOIN are the “operators” that let you create new objects from the existing objects.
Algebra in programming
Throughout this book you’ve been using algebra, possibly without knowing it. (I certainly didn’t know it when I started working with FP.) For example, take a look at this case class:
case class Pair (
a: Int,
b: Int
)
This code creates a new type Pair from two instances of the existing type Int. The class constructor itself is an “operator” that lets you create new types from existing Scala types, just like + and * let you create new numbers from existing numbers.
Here’s another example of how you can use Scala operators to create new data types from existing ones:
sealed trait Direction
case object North extends Direction
case object South extends Direction
case object East extends Direction
case object West extends Direction
To learn about the “algebra” of these two examples, read on ...
Three types of Algebraic Data Types
ADTs fall into three main categories:
- Sum type
- Product type
- Hybrid types
You actually just saw the first two types, and I’ll explain all three of them in the following sections.
The Sum type
The Direction example I just showed is called a “Sum type,” or Sum ADT. The Sum type is also referred to as an “enumerated type” because you simply enumerate all of the possible instances of the type. A few important points about this are:
- Sum types are typically created with a
sealed traitas the base type, with instances created ascaseobjects. You use a sealed trait because you don’t want them to be extended. - The number of enumerated types you list are the only possible instances of the base type. In this example,
Directionhas four possible values:North,South,East, orWest. - We use the phrases “is a” and “or” when talking about Sum types. For example,
Northis a type ofDirection, andDirectionis aNorthor aSouthor anEastor aWest.
People use different names for the concrete instances in a Sum type, including value constructors, alternates, and cases.
Another example
As another example, imagine that you need to write your own “boolean” type for Scala. You can write them as a Sum type like this:
sealed trait Bool
case object True extends Bool
case object False extends Bool
Just like the Direction example, the base type is defined as a sealed trait and the two possible values are defined as case object. Also notice that this approach uses the sealed trait and case object syntax as operators to create new data types.
Why use sealed trait?
A great feature of using sealed trait is that it lets the compiler perform “exhaustiveness checking.” What happens is that a sealed trait can only be extended in the file in which it was defined; because it can’t be extended anywhere else, the compiler knows all of the subtypes of the trait that can possibly exist. Because of this, the compiler can exhaustively check the possible cases in match expressions, and it will emit a warning if the match expression isn’t exhaustive. This makes your programming life easier, and your code safer.
Why use case object?
The reason Sum types use the case object declaration is that they only require singleton instances, and the Scala object provides that functionality. For instance, with the Bool example it makes sense to have only one True instance in all of your code. There’s no need to create new True and False instances every time you work with boolean values. Scala’s object gives you this singleton functionality.
As Scala/FP developers, we further use case object — as opposed to object — because it provides important additional functionality, with the most important feature being support for pattern matching; its automatically-generated unapply method lets case objects work easily in match expressions. (case object also provides default equals and hashCode methods, extends Serializable, has a good default toString method, etc.)
The Product type
The second type of ADT is known as a “Product type.” It’s name comes from the fact that you use the Scala case class constructor to create a data type whose number of possible concrete instances can be determined by multiplying the number of possibilities of all of its constructor fields.
Take this class for example:
case class DoubleBoo (
b1: Bool,
b2: Bool
)
How many possible instances of this class can you have? Well, each field can either be True or False, so the possibilities are:
DoubleBoo(True, True)
DoubleBoo(True, False)
DoubleBoo(False, True)
DoubleBoo(False, False)
Therefore, the correct answer is that there are four possible instances. You can also derive this answer mathematically:
b1has two possibilitiesb2has two possibilities- The total number of possible instances is determined by multiplying the number of possibilities of each constructor field, and
2multiplied by2is4
Because the number of possible instances of Product ADTs can be calculated by multiplying the number of possible values of every constructor parameter, what do you think the number of possibilities of this Pair type are:
case class Pair (
a: Int,
b: Int
)
If you answered, “A lot,” that’s close enough. An Int has 2^32 possible values, so if you multiply the number of possible Int values by itself, you get a very large number.
Next, what do you think the number of possibilities are for this class:
case class Person (
firstName: String,
lastName: String,
mother: Person,
father: Person
)
If you answered, “Infinite,” that’s a good answer. Because a String has an infinite number of possibilities, Person can have an infinite number of concrete instances.
While I don’t concern myself with ADTs too much, this particular point had a significant impact on me. When I first saw this, I realized that any time a function accepted a
String, thatStringhad an infinite number of possibilities. That’s a lot to account for. Similarly, a boolean value has two possibilities, aBytehas 256 possible values, and theDirectionSum type has four possibilities. The lesson for me is that the fewer possibilities you have to deal with, the simpler your code will be. (At the very least, this was the last time I ever used a series of string constants instead of enumerations.)
Before we move on, here are a few important points about the Product type:
- Writing
case classand defining the constructor parameters is essentially the “product” operator. - The number of possible values of a Product type is the product of all possible combinations of the constructor parameters (i.e., a Cartesian product).
- We use the phrases “has a” and “and” when talking about Product types.
Pairhas aaand ab;Personhas afirstName,lastName,mother, andfather. - As shown in the
Personexample, Product types can be recursive;motherandfatherare declared asPersontypes inside thePersondefinition.
Hybrid types
The Sum and Product types are the two base ADTs; all other ADTs are hybrids created from those base types. As the book Essential Scala states, “An algebraic data type is any data that uses the above two patterns” (Sum and Product).
One formally-defined hybrid type is known as the “Sum of Products” type. With a few minor changes to reflect modern Scala practices, Mario Gleichmann created a good example of this in 2011:
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rectangle(width: Double, height: Double) extends Shape
These types represent a Sum type because Shape is a Circle or a Rectangle; Circle is a Product type because it has a radius; and Rectangle is also Product type because it has a width and a height.
There are other variations of these possibilities, which is why I refer to all other combinations as “hybrid” types. For instance, the Pizza class in the domain modeling lessons is a Product type that contains three Sum types:
case class Pizza (
crustSize: CrustSize,
crustType: CrustType,
toppings: Seq[Topping]
)
Sum and Product types can be combined in any ways that are needed to solve the problem at hand. Hopefully this demonstrates the point I made at the beginning of this lesson: ADTs are just a way of formally categorizing the data types in your data model.
Pattern matching
A great benefit of ADTs is that they simplify and encourage the use of pattern matching in your code. For instance, given these Shape types:
sealed trait Shape
final case class Circle(radius: Double) extends Shape
final case class Rectangle(width: Double, height: Double) extends Shape
you can easily write an isRound function using pattern matching:
def isRound(s: Shape): Boolean = s match {
case Circle(_) => true
case _ => false
}
Similarly, using the Bool type I created earlier:
sealed trait Bool
case object True extends Bool
case object False extends Bool
you can define and and or functions with pattern matching:
def and(a: Bool, b: Bool): Bool = (a,b) match {
case (True, True) => True
case (False, False) => True
case (True, False) => False
case (False, True) => False
}
def or(a: Bool, b: Bool): Bool = (a,b) match {
case (True, _) => True
case (_, True) => True
case (_, _) => False
}
This is what those last two functions look like in the Scala REPL:
scala> or(True,False)
res0: Bool = True
scala> and(True,False)
res1: Bool = False
As demonstrated in these examples, using pattern matching with ADTs is a common programming pattern ... a Scala/FP idiom.
Key points
The key points of this lesson are:
- If you create your data models using (a)
caseclasses with immutable fields and (b)caseobjects, and (c) those data types have no methods, you’re already writing ADTs (whether you knew it or not). - I view ADTs as a way of categorizing or observing code, not designing code.
- An “algebra” is a set of objects, the operators that can be used on those objects, and laws governing their behavior.
- The two main types of ADTs are the Sum type and Product type. Other hybrid ADTs are derived from these base types.
- ADTs encourage a pattern-matching style of programming.
See also
- If you don’t mind a little Haskell, Chris Taylor’s The Algebra of ADTs is an excellent resource, and discusses ADT “laws”
- Mario Gleichmann wrote a series of articles on ADTs in Scala, starting with this one
- Daniel Eklund wrote a good article, What the heck are Algebraic Data Types?
- Susan Potter has a nice example of ADTs in Scala
- Tim Perrett uses ADTs well in this “traffic signal” example
- ADTs are described here on Wikipedia
- ADTs are described here on Haskell.org
- Wikipedia has this definition of Universal Algebra
- Martin Odersky mentions ADTs in this acm.org article
- In underscore.io’s article on sealed traits they have a nice discussion of ADTs
- The best book I know on Scala and functional programming :)