A Quick Look at Parallel Programming (Scala 3 Video)
Introduction
A first reason that algebraic substitution is important is that, as I mentioned, code written like this is easier to write, read, and test. That’s a simple sentence to write, but in practice it’s an enormous gain that makes your life simpler. (I could write several chapters about this alone.)
A second reason it’s important is that you can use algebraic substitution to write your code in any of the ways I just showed. The second approach I showed, where your code looks like this:
doE(
  doD(
    doC(
      doB(
        doA(someData)
      )
    )
  )
)
can also be formatted to look like this:
doE(doD(doC(doB(doA(someData)))))
With either approach, this is interesting because it’s like writing Lisp code, and you read it from the inside out. (In small doses I like this style.)
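To make that concrete, here’s a small sketch with hypothetical pure functions named doA through doE and a hypothetical someData value; because every function is pure, the nested call, the step-by-step version with intermediate values, and a composed pipeline are all interchangeable by algebraic substitution:

// a small sketch; doA through doE and someData are hypothetical,
// and every function is pure
val doA: String => String = _.trim
val doB: String => String = _.toLowerCase
val doC: String => String = _.replace(' ', '-')
val doD: String => String = s => s"item-$s"
val doE: String => String = _.take(20)

val someData = "  Functional Programming  "

// nested calls, read from the inside out
val r1 = doE(doD(doC(doB(doA(someData)))))

// the same steps, one per line, with intermediate immutable values
val a  = doA(someData)
val b  = doB(a)
val c  = doC(b)
val d  = doD(c)
val r2 = doE(d)

// the same pipeline expressed as one composed function
val pipeline = doA.andThen(doB).andThen(doC).andThen(doD).andThen(doE)
val r3 = pipeline(someData)

// r1, r2, and r3 are all "item-functional-prog"

Which of those formats you prefer is purely a matter of style; substitution guarantees they all mean the same thing.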
But there’s an even more important point about EOP and algebraic substitution:
When your expressions (equations) don’t have to be evaluated in a specific order, the compiler is completely free to evaluate them in parallel, and then join them into one result some time later. I only worked with Haskell for a few weeks, but I’ve read that this is how the Haskell compiler works, and it can do that because Haskell code is guaranteed to be FP code.
Enabling parallel programming
To demonstrate what I mean, here’s an example, where again you can assume that all functions are pure, and the variables and data types are immutable:
val a = f1(x)
val b = f2(y)
val c = f3(z)
val d = f(a, b, c)
Because the first three expressions have no dependencies on each other, a smart compiler is perfectly free to run those functions on different threads (or, more recently, fibers). So if they each take 20 seconds to run, you don’t have to wait 60 seconds for them to run in serial; they can be run in parallel, and you get the final result in 20 seconds.
Visually, that means that rather than having this slow serial code:
// code runs in serial, 20 seconds each, 60 seconds total
val a = f1(x)
─────────────>
              val b = f2(y)
              ─────────────>
                            val c = f3(z)
                            ─────────────>
                                          val d = f(a, b, c)
you can have this high-performance code that runs in parallel:
// code runs in parallel, 20 seconds total
val a = f1(x)
─────────────>
val b = f2(y)
─────────────>
val c = f3(z)
─────────────>
              val d = f(a, b, c)
When you want high-performance code (and who doesn’t?), this is a huge win. And, as the sketch after this list shows, all you have to do to achieve these gains is to use:
- Immutable variables
- Immutable data structures
- Pure functions
- Expressions (and not statements)
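Here’s a minimal sketch of that idea using Scala’s Future; the functions f1, f2, f3, and f are hypothetical stand-ins for your own pure functions, and their bodies are placeholders for work that would really take ~20 seconds each:

// a minimal sketch: run three independent pure computations in parallel,
// then combine their results (f1, f2, f3, and f are hypothetical stand-ins)
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.*

def f1(x: Int): Int = x + 1                     // imagine each of these
def f2(y: Int): Int = y * 2                     // taking ~20 seconds
def f3(z: Int): Int = z - 3
def f(a: Int, b: Int, c: Int): Int = a + b + c

@main def parallelSketch(): Unit =
  val (x, y, z) = (1, 2, 3)

  // each Future starts running as soon as it’s created,
  // potentially on its own thread
  val fa = Future(f1(x))
  val fb = Future(f2(y))
  val fc = Future(f3(z))

  // this for expression only combines Futures that are already running,
  // so the total time is roughly that of the longest single computation
  val fd: Future[Int] =
    for
      a <- fa
      b <- fb
      c <- fc
    yield f(a, b, c)

  // blocking here only to print the result in a small demo
  println(Await.result(fd, 1.minute))

The key point is that nothing about f1, f2, and f3 requires them to run in any particular order, so starting all three before combining their results is safe.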
This doesn’t work with mutable things
It’s important to be clear that a compiler can’t do this for you when you use mutable variables, mutable data structures, and impure functions.
Perhaps the simplest way to demonstrate this is to imagine that you have these three lines of code that consist of two statements and a function with a side effect:
println("What’s your name?") // statement
val name = readLine() // side effect
println(s"You said your name is $name") // statement
They make sense when run serially, but if you try to run them in parallel, you can imagine the mess you might end up with:
println(s"You said your name is $name")
val name = readLine()
println("What’s your name?")
As another example, assume that you have three functions named g1, g2, and g3, and they each modify some shared global state. If g1 writes a variable in that state that g2 reads, and g2 writes a variable that g3 reads, those operations really need to happen in that order, just like the println/readLine example.
But if g1, g2, and g3 are run in parallel ... well, there’s no guarantee about the order in which things will happen. We say that the result you get from this situation is indeterminate.
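To see what “indeterminate” looks like in practice, here’s a minimal sketch, assuming three hypothetical impure functions g1, g2, and g3 that all read and write a single shared var:

// a minimal sketch of shared mutable state; g1, g2, and g3 are
// hypothetical impure functions that read and write the same var
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.*

var sharedState = 0                               // shared, mutable

def g1(): Unit = sharedState = sharedState + 1    // writes what g2 reads
def g2(): Unit = sharedState = sharedState * 10   // writes what g3 reads
def g3(): Unit = sharedState = sharedState - 5

@main def indeterminateSketch(): Unit =
  // run the three impure functions in parallel
  val all = Future.sequence(List(Future(g1()), Future(g2()), Future(g3())))
  Await.result(all, 1.minute)

  // run serially in the order g1, g2, g3, this always prints 5;
  // run in parallel it can print 5, -4, -40, -49, or other values,
  // depending on how the threads interleave
  println(sharedState)

Because the three functions share that var, their relative order matters, and parallel execution throws that order away.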
One time I was working with a client, and they had a problem where they’d occasionally end up with inconsistent customer data. They might have a user’s name and street address correct, but somehow the city, state, or zip code wouldn’t match up with the name and street address. We found that, for some reason, one or more developers had written that portion of the code to run in parallel. Apparently 99 times out of 100 it would work properly (perhaps because customers don’t change their addresses often), but when it failed, customers might find out that they lived in “Chicago, Alaska” instead of “Chicago, Illinois.”
This can’t happen with immutable variables, immutable data structures, and pure functions.
Reasoning about code
You’ll sometimes hear FPers say that they like to “reason about their code.” Of course we do that in OOP as well, but when FPers say this, it means that we’re looking at lines of code like algebra, and we’re making sure they can be joined the way we want them to be.
If they can be run in parallel, cool, that’s awesome. But if they can’t, you’ll see that we ensure that they’re run in serial, typically with a for expression:
val result =
  for
    a <- g1(x)
    b <- g2(y)
    c <- g3(z)
  yield
    // some use of a, b, and c here
    g4(a, b, c)
That’s a bit of a contrived example, but I demonstrate real uses of this technique later in this book.
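In the meantime, here’s a minimal, self-contained sketch of that sequencing, assuming hypothetical functions g1 through g3 that each return a Future, and a pure g4 that combines their results; because each Future is created inside the for expression, every step starts only after the previous one finishes:

// a minimal sketch, assuming hypothetical g1..g4; because each Future is
// created inside the for expression, the steps run in serial, in the
// order written
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.*

def g1(x: Int): Future[Int] = Future { println("g1"); x + 1 }
def g2(y: Int): Future[Int] = Future { println("g2"); y * 2 }
def g3(z: Int): Future[Int] = Future { println("g3"); z - 3 }
def g4(a: Int, b: Int, c: Int): Int = a + b + c

@main def serialSketch(): Unit =
  val result: Future[Int] =
    for
      a <- g1(10)
      b <- g2(20)
      c <- g3(30)
    yield
      g4(a, b, c)

  // always prints "g1", "g2", "g3" in that order, then the combined result
  println(Await.result(result, 1.minute))

Contrast this with the earlier Future sketch, where the Futures were created before the for expression and therefore ran in parallel.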