Writing Scala code like this gets to be so much like algebra that you can imagine walking in on Einstein writing at a chalkboard, and following his equations one at a time until he joins them all together at the end.
For example, let’s walk through the process of writing a “word occurrence” algorithm using only pure functions. I think you’ll see that we can simply talk about this as we sketch out a series of function signatures, just like writing algebra on a chalkboard.
This example was inspired by a Haskell book. I’d like to give it credit, but at the moment I can’t remember which Haskell book I saw it in.
Problem statement
The first thing to do is to write a problem statement:
Given a document that’s represented as a
String
, count the number of occurrences of each word in that document.
The output of our algorithm should look like this:
the 543
an 210
or 99
...
...
So, how do we start?
Getting started
Well, the first thing we can say is that it looks like that output should be stored in a map, specifically a map where the key is a string, and the value is an integer:
Map[String, Int]
Note that the output is printed in sorted order, so depending on the details of the requirements, you may also want to return a LinkedHashMap
or VectorMap
indicate to others that there’s something unique about this map:
LinkedHashMap[String, Int]
In fact, because of Scala’s type system, it’s easy to create your own type to express exactly what’s going on:
DudeThisIsAMapSortedByValue[String, Int]
For the purposes of this example I’ll just leave the type as a Map
, but you get the idea.
Getting back to our problem, we also know that the input our word occurrence algorithm is given is a string, so we can further state the type signature for our word count function should look like this:
def wordOccurrences(s: String): Map[String, Int]
So there we have it, that’s the signature for our main function. Now all we have to do is sketch out some more pure functions that this function will use.
From a document to words
The next thing I would think about is that I have this document represented by a string, and I need to convert it into a series of words. So now I’m thinking about a document-to-words function:
def convertDocumentToWords ...
Well, “a series of words” is just a sequence of strings, so I can sketch my function like this:
def convertDocumentToWords(): Seq[String] = ???
And what I’m calling a “document” as input is really just a string, so now I have this:
def convertDocumentToWords(s: String): Seq[String] = ???
A lot of programmers like to use aliases to make their code more readable, so using Scala 3 opaque types we can write this:
opaque type Document = String
opaque type Word = String
and now this:
def convertDocumentToWords(doc: Document): Seq[Word] = ???
Cool. It looks like we’re done with that function.
What do we need next? Well, now we have a list of all of the words in the document, so now we need to convert that list into a Map[Word, Int]
, i.e., a map of (a) each word that was found in the document to (b) the number of occurrences of that word.
Counting word occurrences
Following our previous thought process, let’s sketch another pure function. Right now we have a series of words from this function as our input:
def documentToWords(doc: Document): Seq[Word] =
So now we need to convert that Seq[Word]
into something else:
def countWordOccurrences(words: Seq[Word]) ...
I stated above that what we want is a Map[Word, Int]
, so now we can completely sketch this function:
def countWordOccurrences(words: Seq[Word]): Map[Word, Int]
Given a series of words, this function returns a map of each word that was found, and how many times it was found.
Hmmm, I think we’re done ... all we have to do is glue our pure functions together.
Putting it all together
At this point all we have to do is glue our pure functions together. Assuming that I use our opaque types, the main function signature looks like this:
def wc(d: Document): Map[Word, Int] = ???
Then the first then we do is convert the document into a list of words:
def wc(d: Document): Map[Word, Int] =
val listOfWords = documentToWords(d)
...
After that we convert the list of words into a map of word occurrences:
def wc(d: Document): Map[Word, Int] =
val listOfWords = documentToWords(d)
val wcMap = countWordOccurrences(listOfWords)
Then, ignoring sorting, we return that map:
def wc(d: Document): Map[Word, Int] =
val listOfWords = documentToWords(d)
val wcMap = countWordOccurrences(listOfWords)
wcMap
If you prefer, you can shorten that code to this:
def wc(d: Document): Map[Word, Int] =
val listOfWords = documentToWords(d)
countWordOccurrences(listOfWords)
and then this:
def wc(d: Document): Map[Word, Int] =
countWordOccurrences(documentToWords(d))
I often find that when I write my code with pure functions it ends up looking a little like Lisp, and that’s true again here.
That’s pretty cool. We almost completely solved this problem by (a) sketching out a series of pure function signatures, and then (b) combining those signatures like a series of algebraic equations. All that’s left to do is write some code inside those function bodies.
For the detail-oriented
You may have noticed that I skipped a few functions to get to that solution. For instance, documentToWords
is going to need a few helper functions:
- Convert a document into paragraphs
- Convert a paragraph into words
If you follow the same thought process that I used above, you’ll end up sketching a couple of pure function signatures like this:
splitDocumentIntoParagraphs(d: Document): Seq[Paragraph]
splitParagraphIntoWords(p: Paragraph): Seq[Word]
You’re a translator
One thing to note in all of this is that you, the programmer, are a translator. When someone says “a series of words”, you think:
Seq[String]
or
Seq[Word]
And when they say they need “a list of words and the number of times they occur,” you think:
Map[String, Int]
or
Map[Word, Count]
That may seem obvious, but sometimes the obvious bears pointing out. And interestingly, we’re already working with more complicated types like Map[Word, Count]
, and we’re creating aliases to make our types more meaningful.
this post is sponsored by my books: | |||
#1 New Release |
FP Best Seller |
Learn Scala 3 |
Learn FP Fast |
Let’s read that document from a file
So far our solution looks like this:
def wordOccurrences(s: String): Map[Word, Count]
or this, if you prefer:
def wordOccurrences(s: String): DudeThisIsAMapSortedByValue[Word, Count]
What happens now if our requirement changes and we’re told:
You don’t get a string for input, you have to read the document from a file.
Well, right away that changes the input to our pure function:
def wordOccurrences(f: File) ...
What about our result?
Well, we know that we still want a Map[Word, Count]
as a result, but when we’re dealing with a file, things can go wrong. For instance, the file might not exist. That might make you want to write this:
@throws FileNotFoundException
def wordOccurrences(f: File) ...
Don’t do that!
Oops, sorry, I’ll calm down. :)
When you’re using the standard tools that are built into Scala and you’re dealing with external resources like files, databases, and network I/O, you want to return a Try
type:
def wordOccurrences(f: File): Try[?]
You might prefer Option
or Either
, but you get the idea, you want to return an error-handling type to account for the possibility that the file doesn’t exist, you don’t have permission to read it, or other errors that might occur.
So this leads to the question, what goes inside that Try
declaration?
The answer is that the function’s answer goes inside Try
. Because the function returns a Map[Word, Count]
when it succeeds, the function’s return type now looks like this:
def wordOccurrences(f: File): Try[Map[Word, Count]]
So, when you’re told that you have to read a the document from a file, that’s the new solution. If you prefer Either
, the solution looks like this:
def wordOccurrences(f: File): Either[Throwable, Map[Word, Count]]
This means that if everything goes fine, you’ll get a Map[Word, Count]
in the Either
, but if something fails, you’ll get a Throwable
.