Scala best practice: Create methods that have no side effects (pure functions)

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 20.1, “Scala best practice: Create methods that have no side effects (pure functions).”

Problem

In keeping with the best practices of Functional Programming (FP), you want to write “pure functions.”

Solution

In general, when writing a function (or method), your goal should be to write it as a pure function. This raises the question, “What is a pure function?” Before tackling that question, we need to look at another term, referential transparency, because it’s part of the definition of a pure function.

Referential transparency

If you like algebra, you’ll like referential transparency. An expression is referentially transparent (RT) if it can be replaced by its resulting value without changing the behavior of the program. This must be true regardless of where the expression is used in the program.

For instance, assume that `x` and `y` are immutable variables within some scope of an application, and within that scope they’re used to form this expression:

`x + y`

You can assign this expression to a third variable, like this:

`val z = x + y`

Now, throughout the given scope of your program, anywhere the expression `x + y` is used, it can be replaced by `z` without affecting the result of the program.

Note that although I stated that `x` and `y` are immutable variables, they can also be the result of RT functions. For instance, `"hello".length + "world".length` will always be 10. This result could be assigned to `z`, and then `z` could be used everywhere instead of this expression.

Although this is a simple example, this is referential transparency in a nutshell.
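A small sketch makes the substitution idea concrete. The names `lengthSum` and `nextCount` are hypothetical: the first is referentially transparent, while the second depends on hidden mutable state, so replacing a call with an earlier result changes the program.

```scala
object ReferentialTransparency {
  // Referentially transparent: the result depends only on the arguments
  def lengthSum(a: String, b: String): Int = a.length + b.length

  // Not referentially transparent: every call updates hidden state
  private var counter = 0
  def nextCount(): Int = {
    counter += 1
    counter
  }
}
```

Because `lengthSum("hello", "world")` always returns 10, a `val z` holding that result can replace every call. The same substitution fails for `nextCount()`: on a fresh object the first call returns 1 and the next returns 2, so `z + z` and `nextCount() + nextCount()` are different values.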

Pure functions

Wikipedia defines a pure function as follows:

1. The function always evaluates to the same result value given the same argument value(s). It cannot depend on any hidden state or value, and it cannot depend on any I/O.
2. Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.

The book Functional Programming in Scala by Chiusano and Bjarnason (Manning Publications) states this a little more precisely:

“A function `f` is pure if the expression `f(x)` is referentially transparent for all referentially transparent values `x`.”

To summarize, a pure function is referentially transparent and has no side effects.

Regarding side effects, the authors of Programming in Scala make a great observation:

“A telltale sign of a function with side effects is that its result type is `Unit`.”

From these definitions, we can make these statements about pure functions:

• A pure function is given one or more input parameters.
• Its result is based solely on those parameters and its algorithm. The algorithm will not be based on any hidden state in the class or object it’s contained in.
• It won’t mutate the parameters it’s given.
• It won’t mutate the state of its class or object.
• It doesn’t perform any I/O operations, such as reading from disk, writing to disk, prompting for input, or reading input.
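These rules can be seen side by side in a short sketch. The tax-calculation functions below are hypothetical examples, not part of the recipe: `addTax` follows every rule, `addCurrentTax` reads hidden state, and `applyTaxInPlace` mutates its parameter and returns `Unit`, the telltale sign mentioned above.

```scala
import scala.collection.mutable.ArrayBuffer

object PurityExamples {
  // Pure: the output depends only on the inputs, and nothing else is touched
  def addTax(price: BigDecimal, rate: BigDecimal): BigDecimal =
    price * (BigDecimal(1) + rate)

  // Impure: depends on hidden state that can change between calls
  var currentRate: BigDecimal = BigDecimal("0.05")
  def addCurrentTax(price: BigDecimal): BigDecimal =
    price * (BigDecimal(1) + currentRate)

  // Impure: mutates the parameter it is given and returns Unit
  def applyTaxInPlace(prices: ArrayBuffer[BigDecimal], rate: BigDecimal): Unit =
    for (i <- prices.indices) prices(i) = addTax(prices(i), rate)
}
```

Calling `addTax(BigDecimal("100"), BigDecimal("0.05"))` always yields 105, whereas `addCurrentTax(BigDecimal("100"))` changes its answer whenever some other code reassigns `currentRate`.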

These are some examples of pure functions:

• Mathematical functions, such as addition, subtraction, and multiplication.
• Methods like `split` and `length` on the `String` class.
• The `to*` methods on the `String` class (`toInt`, `toDouble`, etc.)
• Methods on immutable collections, including `map`, `drop`, `take`, `filter`, etc.
• The functions that extract values from an HTML string in Recipe 20.3.
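The collection and `String` examples can be checked directly. In this sketch (values made up for illustration), every method returns a new value and leaves `prices` exactly as it was.

```scala
object CollectionPurity {
  val prices: List[Int] = List(10, 20, 30)

  // map, filter, take, and drop return new collections; `prices` is unchanged
  val doubled: List[Int] = prices.map(_ * 2)
  val large: List[Int] = prices.filter(_ > 15)
  val firstTwo: List[Int] = prices.take(2)
  val rest: List[Int] = prices.drop(1)

  // String methods are pure too: the same input always gives the same output
  val parts: List[String] = "AAPL,GOOG".split(",").toList
  val symbolLength: Int = "AAPL".length
}
```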

The following functions are not pure functions:

• Methods like `getDayOfWeek`, `getHour`, or `getMinute`. They return a different value depending on when they are called.
• A `getRandomNumber` function.
• A function that reads user input or prints output.
• A function that writes to an external data store, or reads from a data store.
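Time-dependent methods like `getDayOfWeek` and `getHour` become testable when the current time is passed in rather than read inside the function. A sketch using a hypothetical `MarketHours` object with made-up trading hours; `getDayOfWeek` and `getHour` here are the real `java.time.LocalDateTime` methods.

```scala
import java.time.{DayOfWeek, LocalDateTime}

object MarketHours {
  // Pure: the answer depends only on the timestamp passed in
  // (hypothetical rule: open on weekdays from 9:00 until 16:00)
  def isOpen(at: LocalDateTime): Boolean = {
    val day = at.getDayOfWeek
    val weekend = day == DayOfWeek.SATURDAY || day == DayOfWeek.SUNDAY
    !weekend && at.getHour >= 9 && at.getHour < 16
  }

  // Impure: reads the system clock, so the result changes over time
  def isOpenNow(): Boolean = isOpen(LocalDateTime.now())
}
```

Only the one-line `isOpenNow` wrapper touches the clock; all of the logic lives in `isOpen`, which can be tested with fixed dates.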

If you’re coming to Scala from a pure OOP background, it can be difficult to write pure functions. Speaking for myself, historically my code has followed the OOP paradigm of encapsulating data and behavior in classes, and as a result, my methods often mutated the internal state of objects.

At this point you may be wondering how you can get anything done in a program consisting only of pure functions. If you can’t read input from a user or database, and can’t write output, how will your application ever work?

The best advice I can share about FP is to follow the 80/20 rule: write 80% of your program using pure functions (the “cake”), then create a 20% layer of other code on top of the functional base (the “icing”) to handle the user interface, printing, database interactions, and other methods that have “side effects”.

Obviously any interesting application will have I/O, and this balanced approach lets you have the best of both worlds.
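A minimal sketch of the cake-and-icing split, using a hypothetical greeting app: the pure `greeting` function holds the logic, and `GreeterApp.run` is the thin layer that reads from standard input and prints.

```scala
import scala.io.StdIn

object Greeter {
  // The "cake": a pure function that is trivial to test
  def greeting(name: String): String = {
    val trimmed = name.trim
    if (trimmed.isEmpty) "Hello, stranger!" else s"Hello, $trimmed!"
  }
}

object GreeterApp {
  // The "icing": performs the I/O and delegates all decisions to pure code
  def run(): Unit = {
    // readLine returns null at end of input, so wrap it in an Option
    val name = Option(StdIn.readLine("Name? ")).getOrElse("")
    println(Greeter.greeting(name))
  }
}
```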

The Java/OOP approach

To look at how to write pure functions, you’ll convert the methods in an OOP class into pure functions. The following code shows how you might create a `Stock` class that follows the Java/OOP paradigm. This class intentionally has a few flaws. Besides storing information about a `Stock`, it also accesses the Internet to get the current stock price, and it maintains a list of historical prices for the stock:

```scala
import scala.collection.mutable.ArrayBuffer

// a poorly written class
class Stock (var symbol: String, var company: String,
             var price: BigDecimal, var volume: Long) {

  var html: String = _
  var high: BigDecimal = _
  var low: BigDecimal = _

  def buildUrl(stockSymbol: String): String = { ... }
  def getUrlContent(url: String): String = { ... }

  def setPriceFromHtml(html: String): Unit = { this.price = ... }
  def setVolumeFromHtml(html: String): Unit = { this.volume = ... }
  def setHighFromHtml(html: String): Unit = { this.high = ... }
  def setLowFromHtml(html: String): Unit = { this.low = ... }

  // some dao-like functionality
  private val _history: ArrayBuffer[Stock] = { ... }
  val getHistory = _history
}
```

Beyond attempting to do too many things, from an FP perspective, it has these other problems:

• All of its fields are mutable
• All of the `set` methods mutate the class fields
• The `getHistory` method returns a mutable data structure

The `getHistory` method is easily fixed by only sharing an immutable data structure, but this class has deeper problems. Let’s fix them.
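A minimal sketch of the `getHistory` fix, using a hypothetical `PriceHistory` class instead of the full `Stock` class: the mutable buffer stays private, and callers receive an immutable `Vector` snapshot.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical holder class; only the history-sharing part is shown
class PriceHistory {
  private val _history = ArrayBuffer.empty[BigDecimal]

  def add(price: BigDecimal): Unit = _history += price

  // Share an immutable snapshot rather than the mutable buffer itself
  def history: Vector[BigDecimal] = _history.toVector
}
```

The class is still mutable internally, which is why this only addresses the sharing problem and not the deeper design issues.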

Fixing the problems

The first fix is to separate two concepts that are buried in the class. First, there should be a concept of a `Stock`, where a `Stock` consists only of a `symbol` and `company` name. You can make this a `case` class:

`case class Stock(symbol: String, company: String)`

Examples of this are `Stock("AAPL", "Apple")` and `Stock("GOOG", "Google")`.

Second, at any moment in time there is information related to a stock’s performance on the stock market. You can call this data structure a `StockInstance`, and also define it as a `case` class:

```scala
case class StockInstance(
  symbol: String,
  datetime: String,
  price: BigDecimal,
  volume: Long)
```

A `StockInstance` example looks like this:

`StockInstance("AAPL", "Nov. 2, 2012 5:00pm", 576.80, 20431707)`
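Because case class fields are immutable `val`s, a new price means a new value, typically created with `copy`. A short sketch (the 580.00 price is made up for illustration):

```scala
case class StockInstance(
  symbol: String,
  datetime: String,
  price: BigDecimal,
  volume: Long)

object StockInstanceDemo {
  val original = StockInstance("AAPL", "Nov. 2, 2012 5:00pm", BigDecimal("576.80"), 20431707L)

  // `copy` builds a new instance; `original` is left unchanged
  val updated = original.copy(price = BigDecimal("580.00"))
}
```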

Going back to the original class, the `getUrlContent` method isn’t specific to a stock, and should be moved to a different object, such as a general-purpose `NetworkUtils` object:

```scala
object NetworkUtils {
  def getUrlContent(url: String): String = { ... }
}
```

This method takes a URL as a parameter and returns the HTML content from that URL.
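One possible body for this method, sketched with the standard library’s `scala.io.Source.fromURL` and `scala.util.Using` (Scala 2.13 or later). This is an assumption about how it could be written, not the book’s implementation, and production code would also want timeouts and error handling.

```scala
import scala.io.Source
import scala.util.Using

object NetworkUtils {
  // Reads the whole resource at `url` into a String, closing the source afterward.
  // Impure: the result depends on the network (or file system) and may throw.
  def getUrlContent(url: String): String =
    Using.resource(Source.fromURL(url))(_.mkString)
}
```

Because the method only needs a URL string, it also works with `file:` URLs, which is handy for trying it out without a network connection.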

Similarly, the ability to build a URL from a stock symbol should be moved to an object. Because this behavior is specific to a stock, you’ll put it in an object named `StockUtils`:

```scala
object StockUtils {
  def buildUrl(stockSymbol: String): String = { ... }
}
```

The ability to extract the stock price from the HTML can also be written as a pure function and should be moved into the same object:

```scala
object StockUtils {
  def buildUrl(stockSymbol: String): String = { ... }
  def getPrice(symbol: String, html: String): BigDecimal = { ... }
}
```

In fact, all of the methods named `set*` in the previous class should be `get*` methods in `StockUtils`:

```scala
object StockUtils {
  def buildUrl(stockSymbol: String): String = { ... }
  def getPrice(symbol: String, html: String): BigDecimal = { ... }
  def getVolume(symbol: String, html: String): Long = { ... }
  def getHigh(symbol: String, html: String): BigDecimal = { ... }
  def getLow(symbol: String, html: String): BigDecimal = { ... }
}
```

The methods `getPrice`, `getVolume`, `getHigh`, and `getLow` are all pure functions: given the same HTML string and stock symbol, they will always return the same values, and they don’t have side effects.
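The recipe leaves these method bodies elided. As one possible sketch, assume a hypothetical page where each value sits in a `<span>` whose `id` combines the symbol and the field name (for example `<span id="AAPL-price">576.80</span>`); the URL scheme and markup are made up. Unlike the signatures above, this version returns `Option`, so a missing or malformed field becomes `None` instead of an exception. A real implementation would more likely use an HTML parser than a regex.

```scala
import java.util.regex.Pattern
import scala.util.Try

object StockUtils {
  // Hypothetical URL scheme, for illustration only
  def buildUrl(stockSymbol: String): String =
    s"https://quotes.example.com/${stockSymbol.toUpperCase}"

  // Finds the text inside <span id="SYMBOL-name">...</span> (hypothetical markup)
  private def field(symbol: String, name: String, html: String): Option[String] = {
    val pattern = (Pattern.quote(s"""<span id="$symbol-$name">""") + "([^<]+)</span>").r
    pattern.findFirstMatchIn(html).map(_.group(1).trim)
  }

  private def decimal(s: String): Option[BigDecimal] =
    Try(BigDecimal(s.replace(",", ""))).toOption

  def getPrice(symbol: String, html: String): Option[BigDecimal] =
    field(symbol, "price", html).flatMap(decimal)
  def getVolume(symbol: String, html: String): Option[Long] =
    field(symbol, "volume", html).flatMap(s => s.replace(",", "").toLongOption)
  def getHigh(symbol: String, html: String): Option[BigDecimal] =
    field(symbol, "high", html).flatMap(decimal)
  def getLow(symbol: String, html: String): Option[BigDecimal] =
    field(symbol, "low", html).flatMap(decimal)
}
```

Each method reads nothing but its two arguments, so calling it twice with the same symbol and HTML always gives the same answer.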

Following this thought process, methods that get the current date and time go in a `DateUtils` object:

```scala
object DateUtils {
  def currentDate: String = { ... }
  def currentTime: String = { ... }
}
```
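A sketch of how `DateUtils` might be written with `java.time`, under the assumption that an ISO date and an `HH:mm:ss` time are acceptable formats. The hypothetical `dateAt` and `timeAt` helpers take a `java.time.Clock`, so tests can pass `Clock.fixed` and get deterministic results; only `currentDate` and `currentTime` read the system clock.

```scala
import java.time.{Clock, LocalDate, LocalTime}
import java.time.format.DateTimeFormatter

object DateUtils {
  private val timeFormat = DateTimeFormatter.ofPattern("HH:mm:ss")

  // Deterministic when given a fixed Clock
  def dateAt(clock: Clock): String = LocalDate.now(clock).toString
  def timeAt(clock: Clock): String = LocalTime.now(clock).format(timeFormat)

  // Impure: each call reads the system clock
  def currentDate: String = dateAt(Clock.systemDefaultZone())
  def currentTime: String = timeAt(Clock.systemDefaultZone())
}
```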

With this new design, you create a `StockInstance` for the current date and time as a simple series of expressions. First, retrieve the HTML that describes the stock from a web page:

```scala
val stock = Stock("AAPL", "Apple")
val url = StockUtils.buildUrl(stock.symbol)
val html = NetworkUtils.getUrlContent(url)
```

Once you have the HTML, extract the desired stock information, get the date, and create the `StockInstance`:

```scala
val price = StockUtils.getPrice(stock.symbol, html)
val volume = StockUtils.getVolume(stock.symbol, html)
val date = DateUtils.currentDate
// StockInstance, as defined earlier, holds only symbol, datetime, price, and volume
val stockInstance = StockInstance(stock.symbol, date, price, volume)
```

Notice that all of the variables are immutable, and each line is an expression.

The code is simple, so you can eliminate most of the intermediate variables, if desired:

```scala
val html = NetworkUtils.getUrlContent(StockUtils.buildUrl(stock.symbol))
val stockInstance = StockInstance(
  stock.symbol,
  DateUtils.currentDate,
  StockUtils.getPrice(stock.symbol, html),
  StockUtils.getVolume(stock.symbol, html))
```

As mentioned earlier, the methods `getPrice`, `getVolume`, `getHigh`, and `getLow` are all pure functions. But what about a method like `DateUtils.currentDate`? It isn’t a pure function, but the fact is, you need the date and time to solve the problem. This is part of what’s meant by having a healthy, balanced attitude about pure functions.
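One way to keep those unavoidable impure calls at the edge is to pass them in as functions. In this sketch, `StockService.fetchInstance` and its parameters are hypothetical: with stub functions it is fully deterministic, and in the application’s outer layer you would pass the real fetching and clock functions.

```scala
case class StockInstance(symbol: String, datetime: String, price: BigDecimal, volume: Long)

object StockService {
  // Combines the results of injected functions. It is only as pure as the
  // functions passed to it: stubs make it deterministic, real I/O does not.
  def fetchInstance(
      symbol: String,
      fetchHtml: String => String,
      now: () => String,
      parsePrice: String => BigDecimal,
      parseVolume: String => Long
  ): StockInstance = {
    val html = fetchHtml(symbol)
    StockInstance(symbol, now(), parsePrice(html), parseVolume(html))
  }
}
```

In production the caller might pass `s => NetworkUtils.getUrlContent(StockUtils.buildUrl(s))` and `() => DateUtils.currentDate`, while a test passes fixed stubs.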

As a final note about this example, there’s no need for the `Stock` class to maintain a mutable list of stock instances. Assuming that the stock information is stored in a database, you can create a `StockDao` to retrieve the data:

```scala
object StockDao {
  def getStockInstances(symbol: String): Vector[StockInstance] = { ... }
  // other code ...
}
```

Though `getStockInstances` isn’t a pure function, the `Vector` class is immutable, so you can feel free to share it without worrying that it might be modified somewhere else in your application.
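To see why sharing a `Vector` is safe, note that “adding” to one returns a new `Vector`, so no caller can change the copy someone else holds. A tiny sketch with made-up prices:

```scala
object SharedVector {
  val history: Vector[Int] = Vector(570, 575, 576)

  // :+ returns a new Vector; `history` itself never changes
  val extended: Vector[Int] = history :+ 580
}
```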

Although I use the prefix `get` in many of those method names, it’s not at all necessary to follow a JavaBeans-like naming convention. In fact, in part because you write “setter” methods in Scala without beginning their names with `set`, and also to follow the Uniform Access Principle, many Scala APIs don’t use `get` or `set` at all.

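For example, think of `case` classes. The accessors they generate, and the mutators they generate for `var` fields, don’t use `get` or `set`:

```scala
// case class fields are vals by default; `var` is needed to get a mutator
case class Person(var name: String)
val p = Person("Mark")
p.name             // accessor
p.name = "Bubba"   // mutator
```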

That being said, although it’s best to follow the Scala standards, use whatever method names best fit your API.

Discussion

A benefit of this coding style is that pure functions are easier to test. For instance, attempting to test the `set*` methods in the original code is harder than it needs to be. For each field (`price`, `volume`, `high`, and `low`), you have to follow these steps:

1. Obtain the HTML, either by setting the `html` field in the object or by calling `buildUrl` and `getUrlContent`.
2. Call the corresponding `set` method, such as `setPriceFromHtml`, passing in the HTML.
3. When the method runs, it mutates a field in the class (`price`).
4. You have to “get” that field to verify that it was changed.
5. In more complicated classes, the `html` and `price` fields may also be mutated by other methods in the class, so a test’s outcome can depend on the order in which those methods are called.

The test code for the original class looks like this:

```scala
val stock = new Stock("AAPL", "Apple", 0, 0)
val url = stock.buildUrl(stock.symbol)
val html = stock.getUrlContent(url)
stock.setPriceFromHtml(html)
assert(stock.price == 500.0)
```

This is a simple example of testing one method that has side effects, but of course this can get much more complicated in a large application.

By contrast, testing a pure function is easier:

1. Call the function, passing in a known value.
2. Get a result back from the function.
3. Verify that the result is what you expected.
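Those three steps can be sketched with a hypothetical pure `parsePrice` helper, which is not part of the original code. It needs no network, HTML page, or object setup, so each test case is just an input and an expected output. Plain `assert` is used here instead of a test framework.

```scala
object PriceParser {
  // Hypothetical pure function: parses text such as "$576.80" or "1,024.50"
  def parsePrice(text: String): Option[BigDecimal] =
    scala.util.Try(BigDecimal(text.trim.stripPrefix("$").replace(",", ""))).toOption
}

object PriceParserTest {
  // 1. call with a known value, 2. get the result, 3. compare with the expectation
  def run(): Unit = {
    val cases = List(
      "$576.80"  -> Some(BigDecimal("576.80")),
      "1,024.50" -> Some(BigDecimal("1024.50")),
      "n/a"      -> None
    )
    for ((input, expected) <- cases)
      assert(PriceParser.parsePrice(input) == expected, s"failed for $input")
  }
}
```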

The functional approach results in test code like this:

```scala
val url = StockUtils.buildUrl("AAPL")
val html = NetworkUtils.getUrlContent(url)
val price = StockUtils.getPrice("AAPL", html)
assert(price == 500.0)
```

Although the code shown isn’t much shorter, it is much simpler.

StockUtils or Stock object?

The methods that were moved to the `StockUtils` object in the previous examples could instead be placed in the companion object of the `Stock` class. That is, you could put the `Stock` class and its companion object in a file named Stock.scala:

```scala
case class Stock(symbol: String, company: String)

object Stock {
  def buildUrl(stockSymbol: String): String = { ... }
  def getPrice(symbol: String, html: String): BigDecimal = { ... }
  def getVolume(symbol: String, html: String): Long = { ... }
  def getHigh(symbol: String, html: String): BigDecimal = { ... }
  def getLow(symbol: String, html: String): BigDecimal = { ... }
}
```

For the purposes of this example, I put these methods in a `StockUtils` object to keep them clearly separate from the concerns of the `Stock` class and its companion object. In your own practice, use whichever approach you prefer.
