This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 20.1, “Scala best practice: Create methods that have no side effects (pure functions).”
Problem
In keeping with the best practices of Functional Programming (FP), you want to write “pure functions.”
Solution
In general, when writing a function (or method), your goal should be to write it as a pure function. This raises the question, “What is a pure function?” Before we tackle that question we need to look at another term, referential transparency, because it’s part of the description of a pure function.
Referential transparency
If you like algebra, you’ll like referential transparency. An expression is referentially transparent (RT) if it can be replaced by its resulting value without changing the behavior of the program. This must be true regardless of where the expression is used in the program.
For instance, assume that `x` and `y` are immutable variables within some scope of an application, and within that scope they’re used to form this expression:
x + y
You can assign this expression to a third variable, like this:
val z = x + y
Now, throughout the given scope of your program, anywhere the expression `x + y` is used, it can be replaced by `z` without affecting the result of the program.
Note that although I stated that `x` and `y` are immutable variables, they can also be the result of RT functions. For instance, `"hello".length + "world".length` will always be 10. This result could be assigned to `z`, and then `z` could be used everywhere instead of this expression.
Although this is a simple example, this is referential transparency in a nutshell.
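As a minimal sketch of that substitution, the following object (the names `RtDemo`, `viaExpression`, `viaValue`, and `nanoNow` are hypothetical and not part of the recipe) computes the same result with the expression and with the value that replaces it:

```scala
object RtDemo {
  // Immutable values within one scope
  val x = 2
  val y = 3

  // The expression x + y, named once
  val z = x + y

  // Using the expression directly, and using the value that replaces it
  val viaExpression = (x + y) * (x + y)
  val viaValue      = z * z

  // By contrast, this call is not referentially transparent: replacing it
  // with the value one call produced would change the program, because
  // the system clock can give a different result on each call.
  def nanoNow(): Long = System.nanoTime()
}
```

Because `x` and `y` never change, `viaExpression` and `viaValue` are always equal. The same substitution would not be safe for `nanoNow()`.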
Pure functions
Wikipedia defines a pure function as follows:
- The function always evaluates to the same result value given the same argument value(s). It cannot depend on any hidden state or value, and it cannot depend on any I/O.
- Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
The book Functional Programming in Scala, by Chiusano and Bjarnason (Manning Publications), states this a little more precisely:
“A function `f` is pure if the expression `f(x)` is referentially transparent for all referentially transparent values `x`.”
To summarize, a pure function is referentially transparent and has no side effects.
Regarding side effects, the authors of the book, Programming in Scala, make a great observation:
“A telltale sign of a function with side effects is that its result type is `Unit`.”
From these definitions, we can make these statements about pure functions:
- A pure function is given one or more input parameters.
- Its result is based solely on those parameters and its algorithm. The algorithm will not be based on any hidden state in the class or object it’s contained in.
- It won’t mutate the parameters it’s given.
- It won’t mutate the state of its class or object.
- It doesn’t perform any I/O operations, such as reading from disk, writing to disk, prompting for input, or reading input.
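To make these statements concrete, here is a small sketch that contrasts a pure function with an impure one. The object and method names (`PurityDemo`, `addTax`, `addToTotal`) are assumptions made up for this illustration:

```scala
object PurityDemo {
  // Pure: the result depends only on the two parameters, nothing is
  // mutated, and no I/O is performed.
  def addTax(price: BigDecimal, rate: BigDecimal): BigDecimal =
    price + price * rate

  // Impure: the result depends on hidden mutable state, and the call
  // also changes that state.
  private var total: BigDecimal = 0
  def addToTotal(price: BigDecimal): BigDecimal = {
    total += price
    total
  }
}
```

Calling `addTax` twice with the same arguments gives the same answer both times. Calling `addToTotal` twice with the same argument does not.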
These are some examples of pure functions:
- Mathematical functions, such as addition, subtraction, multiplication.
- Methods like `split` and `length` on the `String` class.
- The `to*` methods on the `String` class (`toInt`, `toDouble`, etc.)
- Methods on immutable collections, including `map`, `drop`, `take`, `filter`, etc.
- The functions that extract values from an HTML string in Recipe 20.3.
The following functions are not pure functions:
- Methods like `getDayOfWeek`, `getHour`, or `getMinute`. They return a different value depending on when they are called.
- A `getRandomNumber` function.
- A function that reads user input or prints output.
- A function that writes to an external data store, or reads from a data store.
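The two lists above can be sketched with standard library calls. The wrapper objects and method names (`PureExamples`, `ImpureExamples`, `currentHour`, `randomNumber`, `greet`) are hypothetical; only the library calls inside them are real:

```scala
import java.time.LocalTime
import scala.util.Random

object PureExamples {
  // Same input, same output, every time
  val parts: List[String] = "a,b,c".split(",").toList
  val number: Int         = "42".toInt
  val doubled: List[Int]  = List(1, 2, 3).map(_ * 2)
  val firstTwo: List[Int] = List(1, 2, 3).take(2)
}

object ImpureExamples {
  // Depends on the system clock, so the result varies with the time of the call
  def currentHour(): Int = LocalTime.now().getHour

  // Depends on (and advances) the hidden state of a random number generator
  def randomNumber(): Int = Random.nextInt(100)

  // Writes to standard output; note the Unit result type
  def greet(name: String): Unit = println(s"Hello, $name")
}
```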
If you’re coming to Scala from a pure OOP background, it can be difficult to write pure functions. Speaking for myself, historically my code has followed the OOP paradigm of encapsulating data and behavior in classes, and as a result, my methods often mutated the internal state of objects.
At this point you may be wondering how you can get anything done in a program consisting only of pure functions. If you can’t read input from a user or database, and can’t write output, how will your application ever work?
The best advice I can share about FP is to follow the 80/20 rule: write 80% of your program using pure functions (the “cake”), then create a 20% layer of other code on top of the functional base (the “icing”) to handle the user interface, printing, database interactions, and other methods that have “side effects”.
I read about this cake/icing philosophy somewhere, but haven’t been able to find the original reference.
Obviously any interesting application will have I/O, and this balanced approach lets you have the best of both worlds.
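One way to picture the cake and the icing is a small program whose logic lives in pure functions, with a thin `main` method doing the printing. This is only a sketch: `QuoteApp`, `percentChange`, `summary`, and the sample numbers are all made up for illustration.

```scala
object QuoteApp {
  // The "cake": pure functions that are easy to test in isolation
  def percentChange(open: BigDecimal, close: BigDecimal): BigDecimal =
    (close - open) / open * 100

  def summary(symbol: String, open: BigDecimal, close: BigDecimal): String =
    s"$symbol: ${percentChange(open, close)}%"

  // The "icing": the only place in this sketch that performs I/O
  def main(args: Array[String]): Unit =
    println(summary("AAPL", BigDecimal(100), BigDecimal(110)))
}
```

Everything except `main` can be tested without touching the console, a network, or a database.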
The Java/OOP approach
To see how to write pure functions, you’ll convert the methods in an OOP class into pure functions. The following code shows how you might create a `Stock` class that follows the Java/OOP paradigm. The class intentionally has a few flaws: it not only stores information about a `Stock`, but it can also access the Internet to get the current stock price, and it further maintains a list of historical prices for the stock:
// a poorly written class
import scala.collection.mutable.ArrayBuffer
class Stock (var symbol: String,
             var company: String,
             var price: BigDecimal,
             var volume: Long) {
  var html: String = _
  // fields assigned by setHighFromHtml and setLowFromHtml
  var high: BigDecimal = _
  var low: BigDecimal = _
  def buildUrl(stockSymbol: String): String = { ... }
  def getUrlContent(url: String): String = { ... }
  def setPriceFromHtml(html: String): Unit = { this.price = ... }
  def setVolumeFromHtml(html: String): Unit = { this.volume = ... }
  def setHighFromHtml(html: String): Unit = { this.high = ... }
  def setLowFromHtml(html: String): Unit = { this.low = ... }
  // some dao-like functionality
  private val _history: ArrayBuffer[Stock] = { ... }
  val getHistory = _history
}
Beyond attempting to do too many things, from an FP perspective, it has these other problems:
- All of its fields are mutable.
- All of the `set` methods mutate the class fields.
- The `getHistory` method returns a mutable data structure.
The `getHistory` method is easily fixed by only sharing an immutable data structure, but this class has deeper problems. Let’s fix them.
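A minimal sketch of that first fix, assuming a hypothetical `PriceHistory` class that stores prices rather than `Stock` objects: the mutable buffer stays private, and callers only ever receive an immutable copy.

```scala
import scala.collection.mutable.ArrayBuffer

class PriceHistory {
  // The mutable buffer is never exposed directly
  private val _history = ArrayBuffer[BigDecimal]()

  def add(price: BigDecimal): Unit = { _history += price }

  // Callers get an immutable snapshot; later calls to add do not affect it
  def history: Vector[BigDecimal] = _history.toVector
}
```

The class still mutates its own state internally, so this addresses only the sharing problem, not the deeper ones discussed next.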
Fixing the problems
The first fix is to separate two concepts that are buried in the class. First, there should be a concept of a `Stock`, where a `Stock` consists only of a `symbol` and `company` name. You can make this a `case` class:
case class Stock(symbol: String, company: String)
Examples of this are `Stock("AAPL", "Apple")` and `Stock("GOOG", "Google")`.
Second, at any moment in time there is information related to a stock’s performance on the stock market. You can call this data structure a `StockInstance`, and also define it as a `case` class:
case class StockInstance(
  symbol: String,
  datetime: String,
  price: BigDecimal,
  volume: Long)
A `StockInstance` example looks like this:
StockInstance("AAPL", "Nov. 2, 2012 5:00pm", 576.80, 20431707)
Going back to the original class, the `getUrlContent` method isn’t specific to a stock, and should be moved to a different object, such as a general-purpose `NetworkUtils` object:
object NetworkUtils {
  def getUrlContent(url: String): String = { ... }
}
This method takes a URL as a parameter and returns the HTML content from that URL.
Similarly, the ability to build a URL from a stock symbol should be moved to an object. Because this behavior is specific to a stock, you’ll put it in an object named `StockUtils`:
object StockUtils {
  def buildUrl(stockSymbol: String): String = { ... }
}
The ability to extract the stock price from the HTML can also be written as a pure function and should be moved into the same object:
object StockUtils {
  def buildUrl(stockSymbol: String): String = { ... }
  def getPrice(html: String): String = { ... }
}
In fact, all of the methods named `set*` in the previous class should be `get*` methods in `StockUtils`:
object StockUtils {
  def buildUrl(stockSymbol: String): String = { ... }
  def getPrice(symbol: String, html: String): String = { ... }
  def getVolume(symbol: String, html: String): String = { ... }
  def getHigh(symbol: String, html: String): String = { ... }
  def getLow(symbol: String, html: String): String = { ... }
}
The methods `getPrice`, `getVolume`, `getHigh`, and `getLow` are all pure functions: given the same HTML string and stock symbol, they will always return the same values, and they don’t have side effects.
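The recipe leaves the bodies of these methods elided. Purely as an illustration, a `getPrice` with the same signature might look like the sketch below. The object name `StockUtilsSketch` and the HTML markup it expects (a `span` whose `id` is the symbol followed by `-price`) are assumptions invented for this example, not the format of any real site.

```scala
import java.util.regex.Pattern

object StockUtilsSketch {
  // Assumed (hypothetical) markup: <span id="AAPL-price">576.80</span>
  // Returns the price text, or an empty string when no match is found.
  def getPrice(symbol: String, html: String): String = {
    val pattern =
      ("<span id=\"" + Pattern.quote(symbol) + "-price\">([0-9.]+)</span>").r
    pattern.findFirstMatchIn(html).map(_.group(1)).getOrElse("")
  }
}
```

The function reads only its two parameters and returns a new `String`, so calling it repeatedly with the same arguments always yields the same result.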
Following this thought process, the date and time methods are moved to a `DateUtils` object:
object DateUtils {
  def currentDate: String = { ... }
  def currentTime: String = { ... }
}
With this new design, you create a `StockInstance` for the current date and time as a simple series of expressions. First, retrieve the HTML that describes the stock from a web page:
val stock = Stock("AAPL", "Apple")
val url = StockUtils.buildUrl(stock.symbol)
val html = NetworkUtils.getUrlContent(url)
Once you have the HTML, extract the desired stock information, get the date, and create the `StockInstance`:
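val price = StockUtils.getPrice(stock.symbol, html)
val volume = StockUtils.getVolume(stock.symbol, html)
val high = StockUtils.getHigh(stock.symbol, html)
val low = StockUtils.getLow(stock.symbol, html)
val date = DateUtils.currentDate
// the get* methods return Strings, so convert them to the StockInstance field types;
// high and low are available here, but StockInstance as defined earlier has no fields for them
val stockInstance = StockInstance(stock.symbol, date, BigDecimal(price), volume.toLong)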
Notice that all of the variables are immutable, and each line is an expression.
The code is simple, so you can eliminate all the intermediate variables, if desired:
val html = NetworkUtils.getUrlContent(url)
val stockInstance = StockInstance(
  stock.symbol,
  DateUtils.currentDate,
  BigDecimal(StockUtils.getPrice(stock.symbol, html)),
  StockUtils.getVolume(stock.symbol, html).toLong)
As mentioned earlier, the methods `getPrice`, `getVolume`, `getHigh`, and `getLow` are all pure functions. But what about methods like `currentDate`? It’s not a pure function, but the fact is, you need the date and time to solve the problem. This is part of what’s meant by having a healthy, balanced attitude about pure functions.
As a final note about this example, there’s no need for the `Stock` class to maintain a mutable list of stock instances. Assuming that the stock information is stored in a database, you can create a `StockDao` to retrieve the data:
object StockDao {
  def getStockInstances(symbol: String): Vector[StockInstance] = { ... }
  // other code ...
}
Though `getStockInstances` isn’t a pure function, the `Vector` class is immutable, so you can feel free to share it without worrying that it might be modified somewhere else in your application.
Although I use the prefix `get` in many of those method names, it’s not at all necessary to follow a JavaBeans-like naming convention. In fact, in part because you write “setter” methods in Scala without beginning their names with `set`, and also to follow the Uniform Access Principle, many Scala APIs don’t use `get` or `set` at all.
For example, think of `case` classes. The accessors and mutators they generate don’t use `get` or `set` (a mutator is generated only for a constructor parameter declared as a `var`):
case class Person(var name: String)
val p = Person("Mark")
p.name            // accessor
p.name = "Bubba"  // mutator
That being said, although it’s best to follow the Scala standards, use whatever method names best fit your API.
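For a hand-written class, the Scala convention for a “setter” is a method whose name ends in `_=`. The following is a minimal sketch (this `Person` class and its private `_name` field are made up for illustration) showing that callers use the same syntax whether `name` is a field or a pair of methods:

```scala
class Person(private var _name: String) {
  // Accessor: called as p.name
  def name: String = _name

  // Mutator: called as p.name = "..."; no "set" prefix is needed
  def name_=(newName: String): Unit = { _name = newName }
}
```

With this definition, `p.name = "Bubba"` compiles to a call to `name_=`, so the calling code looks the same as it would with a plain `var` field.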
Discussion
A benefit of this coding style is that pure functions are easier to test. For instance, attempting to test the `set*` methods in the original code is harder than it needs to be. For each field (`price`, `volume`, `high`, and `low`), you have to follow these steps:
- Set the `html` field in the object.
- Call the current `set` method, such as `setPriceFromHtml`.
- Internally, this method reads the private `html` class field.
- When the method runs, it mutates a field in the class (`price`).
- You have to “get” that field to verify that it was changed.
- In more complicated classes, it’s possible that the `html` and `price` fields may be mutated by other methods in the class.
The test code for the original class looks like this:
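val stock = new Stock("AAPL", "Apple", 0, 0)
val url = stock.buildUrl(stock.symbol)
stock.html = stock.getUrlContent(url)  // set the html field
stock.setPriceFromHtml(stock.html)     // mutates stock.price
assert(stock.price == 500.0)           // read the field back to verify it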
This is a simple example of testing one method that has side effects, but of course this can get much more complicated in a large application.
By contrast, testing a pure function is easier:
- Call the function, passing in a known value.
- Get a result back from the function.
- Verify that the result is what you expected.
The functional approach results in test code like this:
val url = StockUtils.buildUrl("AAPL")
val html = NetworkUtils.getUrlContent(url)
val price = StockUtils.getPrice("AAPL", html)
// getPrice returns a String, so compare numerically
assert(BigDecimal(price) == BigDecimal("500.0"))
Although the code shown isn’t much shorter, it is much simpler.
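Because a pure function needs nothing but its arguments, a test can also skip the network entirely and pass in a known HTML string. The sketch below follows the three steps listed above. `VolumeParser`, its one-argument `getVolume`, and the `td class="volume"` markup are hypothetical stand-ins, since the recipe elides the real method bodies.

```scala
object VolumeParser {
  // Assumed (hypothetical) markup: <td class="volume">20431707</td>
  private val VolumePattern = """<td class="volume">([0-9]+)</td>""".r

  def getVolume(html: String): String =
    VolumePattern.findFirstMatchIn(html).map(_.group(1)).getOrElse("")
}

object VolumeParserTest {
  // A known input, so the expected result is fixed
  val knownHtml = """<tr><td class="volume">20431707</td></tr>"""

  def run(): Unit = {
    val volume = VolumeParser.getVolume(knownHtml) // call the function
    assert(volume == "20431707")                   // verify the result
  }
}
```

No object has to be constructed or put into a particular state first, and the test gives the same answer no matter when or where it runs.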
StockUtils or Stock object?
The methods that were moved to the `StockUtils` object in the previous examples could instead be placed in the companion object of the `Stock` class. That is, you could have placed the `Stock` class and object in a file named Stock.scala:
case class Stock(symbol: String, company: String)
object Stock {
  def buildUrl(stockSymbol: String): String = { ... }
  def getPrice(symbol: String, html: String): String = { ... }
  def getVolume(symbol: String, html: String): String = { ... }
  def getHigh(symbol: String, html: String): String = { ... }
  def getLow(symbol: String, html: String): String = { ... }
}
For the purposes of this example, I put these methods in a `StockUtils` object to be clear about separating the concerns of the `Stock` class and object. In your own practice, use whichever approach you prefer.