Scala best practice: Create methods that have no side effects (pure functions)

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 20.1, “Scala best practice: Create methods that have no side effects (pure functions).”

Problem

In keeping with the best practices of Functional Programming (FP), you want to write “pure functions.”

Solution

In general, when writing a function (or method), your goal should be to write it as a pure function. This raises the question, “What is a pure function?” Before we tackle that question we need to look at another term, referential transparency, because it’s part of the description of a pure function.

Referential transparency

If you like algebra, you’ll like referential transparency. An expression is referentially transparent (RT) if it can be replaced by its resulting value without changing the behavior of the program. This must be true regardless of where the expression is used in the program.

For instance, assume that x and y are immutable variables within some scope of an application, and within that scope they’re used to form this expression:

x + y

You can assign this expression to a third variable, like this:

val z = x + y

Now, throughout the given scope of your program, anywhere the expression x + y is used, it can be replaced by z without affecting the result of the program.

Note that although I stated that x and y are immutable variables, they can also be the result of RT functions. For instance, "hello".length + "world".length will always be 10. This result could be assigned to z, and then z could be used everywhere instead of this expression.

Although this is a simple example, this is referential transparency in a nutshell.
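As a minimal sketch of the idea, the following snippet shows the substitution working for an RT expression, followed by a counterexample that reads mutable state. The names viaExpression, viaValue, counter, and next are my own, made up for illustration:

```scala
// x and y are immutable, so x + y always evaluates to the same value
val x = 2
val y = 3
val z = x + y

// replacing the expression x + y with z does not change the result
val viaExpression = (x + y) * 10
val viaValue = z * 10

// counterexample: next() reads and mutates hidden state, so the
// expression next() can't be replaced by the value it returned earlier
var counter = 0
def next(): Int = { counter += 1; counter }
val first = next()    // 1
val second = next()   // 2
```

Because next() gives a different answer each time it's called, swapping a call to next() for the value first would change the behavior of the program.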

Pure functions

Wikipedia defines a pure function as follows:

  1. The function always evaluates to the same result value given the same argument value(s). It cannot depend on any hidden state or value, and it cannot depend on any I/O.
  2. Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.

The book Functional Programming in Scala by Chiusano and Bjarnason (Manning Publications) states this a little more precisely:

“A function f is pure if the expression f(x) is referentially transparent for all referentially transparent values x.”

To summarize, a pure function is referentially transparent and has no side effects.

Regarding side effects, the authors of the book Programming in Scala make a great observation:

“A telltale sign of a function with side effects is that its result type is Unit.”

From these definitions, we can make these statements about pure functions:

  • A pure function is given one or more input parameters.
  • Its result is based solely on those parameters and its algorithm. The algorithm will not be based on any hidden state in the class or object it’s contained in.
  • It won’t mutate the parameters it’s given.
  • It won’t mutate the state of its class or object.
  • It doesn’t perform any I/O operations, such as reading from disk, writing to disk, prompting for input, or reading input.

These are some examples of pure functions:

  • Mathematical functions, such as addition, subtraction, multiplication.
  • Methods like split and length on the String class.
  • The to* methods on the String class (toInt, toDouble, etc.)
  • Methods on immutable collections, including map, drop, take, filter, etc.
  • The functions that extract values from an HTML string in Recipe 20.3.

The following functions are not pure functions:

  • Methods like getDayOfWeek, getHour, or getMinute. They return a different value depending on when they are called.
  • A getRandomNumber function.
  • A function that reads user input or prints output.
  • A function that writes to an external data store, or reads from a data store.

If you’re coming to Scala from a pure OOP background, it can be difficult to write pure functions. Speaking for myself, historically my code has followed the OOP paradigm of encapsulating data and behavior in classes, and as a result, my methods often mutated the internal state of objects.

At this point you may be wondering how you can get anything done in a program consisting only of pure functions. If you can’t read input from a user or database, and can’t write output, how will your application ever work?

The best advice I can share about FP is to follow the 80/20 rule: write 80% of your program using pure functions (the “cake”), then create a 20% layer of other code on top of the functional base (the “icing”) to handle the user interface, printing, database interactions, and other methods that have “side effects”.

I read about this cake/icing philosophy somewhere, but haven’t been able to find the original reference.

Obviously any interesting application will have I/O, and this balanced approach lets you have the best of both worlds.
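The cake/icing split might look something like the following sketch. The PureCore and IoShell objects, and everything in them, are assumptions I made up for this example; the point is only that the calculations live in pure functions, and a thin outer layer handles the reading and printing:

```scala
// the "cake": pure functions with no I/O and no hidden state
object PureCore {
    def average(prices: Seq[BigDecimal]): Option[BigDecimal] =
        if (prices.isEmpty) None
        else Some(prices.sum / BigDecimal(prices.size))

    def report(symbol: String, prices: Seq[BigDecimal]): String =
        average(prices) match {
            case Some(avg) => s"$symbol average price: $avg"
            case None      => s"$symbol: no prices"
        }
}

// the "icing": a thin layer that handles the side effects
object IoShell {
    def run(): Unit = {
        val line = scala.io.StdIn.readLine("Prices (comma-separated): ")
        val prices = line.split(",").toSeq.map(s => BigDecimal(s.trim))
        println(PureCore.report("AAPL", prices))
    }
}
```

Everything in PureCore can be tested by passing in values and checking results, while IoShell stays small enough to verify by hand.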

The Java/OOP approach

To see how to write pure functions, you’ll convert the methods in an OOP class into pure functions. The following code shows how you might create a Stock class that follows the Java/OOP paradigm. The class intentionally has a few flaws: it not only stores information about a stock, but it can also access the Internet to get the current stock price, and it maintains a list of historical prices for the stock:

// a poorly written class
import scala.collection.mutable.ArrayBuffer

class Stock (var symbol: String, var company: String,
             var price: BigDecimal, var volume: Long) {
    var high: BigDecimal = _
    var low: BigDecimal = _
    var html: String = _
    def buildUrl(stockSymbol: String): String = { ... }
    def getUrlContent(url: String): String = { ... }
    def setPriceFromHtml(html: String): Unit = { this.price = ... }
    def setVolumeFromHtml(html: String): Unit = { this.volume = ... }
    def setHighFromHtml(html: String): Unit = { this.high = ... }
    def setLowFromHtml(html: String): Unit = { this.low = ... }
    // some dao-like functionality
    private val _history: ArrayBuffer[Stock] = { ... }
    def getHistory: ArrayBuffer[Stock] = _history
}

Beyond attempting to do too many things, from an FP perspective, it has these other problems:

  • All of its fields are mutable
  • All of the set methods mutate the class fields
  • The getHistory method returns a mutable data structure

The getHistory method is easily fixed by only sharing an immutable data structure, but this class has deeper problems. Let’s fix them.
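As a rough sketch of that first, easy fix, a class can keep its mutable buffer private and hand out only an immutable copy. The PriceHistory class below is hypothetical, and it's simplified to hold prices rather than Stock objects:

```scala
import scala.collection.mutable.ArrayBuffer

class PriceHistory {
    // the mutable buffer never leaves the class
    private val _history = ArrayBuffer[BigDecimal]()

    def add(price: BigDecimal): Unit = { _history += price; () }

    // callers get an immutable Vector snapshot, not the buffer itself
    def history: Vector[BigDecimal] = _history.toVector
}
```

A caller that holds a Vector returned by history won't see it change, even if add is called later. The class still mutates its own state, though, which is one of the deeper problems.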

Fixing the problems

The first fix is to separate two concepts that are buried in the class. First, there should be a concept of a Stock, where a Stock consists only of a symbol and company name. You can make this a case class:

case class Stock(symbol: String, company: String)

Examples of this are Stock("AAPL", "Apple") and Stock("GOOG", "Google").

Second, at any moment in time there is information related to a stock’s performance on the stock market. You can call this data structure a StockInstance, and also define it as a case class:

case class StockInstance(
    symbol: String,
    datetime: String,
    price: BigDecimal,
    volume: Long)

A StockInstance example looks like this:

StockInstance("AAPL", "Nov. 2, 2012 5:00pm", 576.80, 20431707)

Going back to the original class, the getUrlContent method isn’t specific to a stock, and should be moved to a different object, such as a general-purpose NetworkUtils object:

object NetworkUtils {
    def getUrlContent(url: String): String = { ... }
}

This method takes a URL as a parameter and returns the HTML content from that URL.

Similarly, the ability to build a URL from a stock symbol should be moved to an object. Because this behavior is specific to a stock, you’ll put it in an object named StockUtils:

object StockUtils {
    def buildUrl(stockSymbol: String): String = { ... }
}

The ability to extract the stock price from the HTML can also be written as a pure function and should be moved into the same object:

object StockUtils {
    def buildUrl(stockSymbol: String): String = { ... }
    def getPrice(symbol: String, html: String): String = { ... }
}

In fact, all of the methods named set* in the previous class should be get* methods in StockUtils:

object StockUtils {
    def buildUrl(stockSymbol: String): String = { ... }
    def getPrice(symbol: String, html: String): String = { ... }
    def getVolume(symbol: String, html: String): String = { ... }
    def getHigh(symbol: String, html: String): String = { ... }
    def getLow(symbol: String, html: String): String = { ... }
}

The methods getPrice, getVolume, getHigh, and getLow are all pure functions: given the same HTML string and stock symbol, they will always return the same values, and they don’t have side effects.

Following this thought process, any methods that return the current date and time go in a DateUtils object:

object DateUtils {
    def currentDate: String = { ... }
    def currentTime: String = { ... }
}

With this new design, you create a StockInstance for the current date and time as a simple series of expressions. First, retrieve the HTML that describes the stock from a web page:

val stock = Stock("AAPL", "Apple")
val url = StockUtils.buildUrl(stock.symbol)
val html = NetworkUtils.getUrlContent(url)

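Once you have the HTML, extract the desired stock information, get the date, and create the StockInstance:

// the get* methods return Strings, so convert them to the field types
val price = BigDecimal(StockUtils.getPrice(stock.symbol, html))
val volume = StockUtils.getVolume(stock.symbol, html).toLong
val high = StockUtils.getHigh(stock.symbol, html)
val low = StockUtils.getLow(stock.symbol, html)
val date = DateUtils.currentDate
// StockInstance as defined above stores only the price and volume
val stockInstance = StockInstance(stock.symbol, date, price, volume)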

Notice that all of the variables are immutable, and each line is an expression.

The code is simple, so you can eliminate all the intermediate variables, if desired:

val html = NetworkUtils.getUrlContent(url)
val stockInstance = StockInstance(
      stock.symbol,
      DateUtils.currentDate,
      BigDecimal(StockUtils.getPrice(stock.symbol, html)),
      StockUtils.getVolume(stock.symbol, html).toLong)

As mentioned earlier, the methods getPrice, getVolume, getHigh, and getLow are all pure functions. But what about methods like currentDate and currentTime? They’re not pure functions, because they return a different value depending on when they’re called, but the fact is, you need the date and time to solve the problem. This is part of what’s meant by having a healthy, balanced attitude about pure functions.

As a final note about this example, there’s no need for the Stock class to maintain a mutable list of stock instances. Assuming that the stock information is stored in a database, you can create a StockDao to retrieve the data:

object StockDao {
    def getStockInstances(symbol: String): Vector[StockInstance] = { ... }
    // other code ...
}

Though getStockInstances isn’t a pure function, the Vector class is immutable, so you can feel free to share it without worrying that it might be modified somewhere else in your application.

Although I use the prefix get in many of those method names, it’s not at all necessary to follow a JavaBeans-like naming convention. In fact, in part because you write “setter” methods in Scala without beginning their names with set, and also to follow the Uniform Access Principle, many Scala APIs don’t use get or set at all.

For example, think of case classes. The accessors they generate, and the mutators they generate for fields declared as var, don’t use get or set:

case class Person(var name: String)
val p = Person("Mark")
p.name             // accessor
p.name = "Bubba"   // mutator

That being said, although it’s best to follow the Scala standards, use whatever method names best fit your API.

Discussion

A benefit of this coding style is that pure functions are easier to test. For instance, attempting to test the set* methods in the original code is harder than it needs to be. For each field (price, volume, high, and low), you have to follow these steps:

  1. Set the html field in the object.
  2. Call the current set method, such as setPriceFromHtml.
  3. Internally, this method reads the private html class field.
  4. When the method runs, it mutates a field in the class (price).
  5. You have to “get” that field to verify that it was changed.
  6. In more complicated classes, it’s possible that the html and price fields may be mutated by other methods in the class.

The test code for the original class looks like this:

val stock = new Stock("AAPL", "Apple", 0, 0)
val url = stock.buildUrl(stock.symbol)
stock.html = stock.getUrlContent(url)
stock.setPriceFromHtml(stock.html)
assert(stock.price == 500.0)

This is a simple example of testing one method that has side effects, but of course this can get much more complicated in a large application.

By contrast, testing a pure function is easier:

  1. Call the function, passing in a known value.
  2. Get a result back from the function.
  3. Verify that the result is what you expected.

The functional approach results in test code like this:

val url = StockUtils.buildUrl("AAPL")
val html = NetworkUtils.getUrlContent(url)
val price = StockUtils.getPrice("AAPL", html)
assert(price == "500.0")

Although the code shown isn’t much shorter, it is much simpler.
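Because a pure function depends only on its arguments, you can go one step further and test it against a fixed HTML string, with no network access at all. The following is only a sketch under stated assumptions: the StockUtilsSketch object, the regular expression, and the span markup are all made up for this example, and real pages will need a different extraction approach:

```scala
object StockUtilsSketch {
    // assumed, hypothetical markup: <span id="price">500.0</span>
    private val PricePattern = """<span id="price">([0-9.]+)</span>""".r

    // pure: the result depends only on the html argument
    // (symbol is unused in this simplified sketch)
    def getPrice(symbol: String, html: String): String =
        PricePattern.findFirstMatchIn(html).map(_.group(1)).getOrElse("")
}

// the three testing steps: pass in a known value, get a result, verify it
val knownHtml = """<html><body><span id="price">500.0</span></body></html>"""
val parsedPrice = StockUtilsSketch.getPrice("AAPL", knownHtml)
assert(parsedPrice == "500.0")
```

A test like this runs the same way every time, because nothing in it depends on the state of an object or on what a website returns today.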

StockUtils or Stock object?

The methods that were moved to the StockUtils object in the previous examples could instead be placed in the companion object of the Stock class. That is, you could place the Stock class and object in a file named Stock.scala:

case class Stock(symbol: String, company: String)

object Stock {
    def buildUrl(stockSymbol: String): String = { ... }
    def getPrice(symbol: String, html: String): String = { ... }
    def getVolume(symbol: String, html: String): String = { ... }
    def getHigh(symbol: String, html: String): String = { ... }
    def getLow(symbol: String, html: String): String = { ... }
}

For the purposes of this example, I put these methods in a StockUtils object to be clear about separating the concerns of the Stock class and object. In your own practice, use whichever approach you prefer.

See Also

  • Pure Functions
  • Referential Transparency
  • The Uniform Access Principle

The Scala Cookbook

This tutorial is sponsored by the Scala Cookbook, which I wrote for O’Reilly.
