One More Thing: Algebraic Substitution

Algebraic Substitution

Before we move on from this Word Count example I need to add one very important point: Because I wrote wc using only pure functions and immutable values, both you and a compiler are free to reduce this code.

What this means is that at the end of the previous lesson I had this function:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks = 
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    val lcListOfWords = 
        lowerCaseAllWords(listOfWords)
    val initialMap = 
        convertWordListToWordCount(lcListOfWords)
    val sortedMap = 
        sortMapByHighestValue(initialMap)
    sortedMap

But because initialMap (near the end of the function) is exactly equal to convertWordListToWordCount(lcListOfWords) — and always will be, because the values are immutable and the functions are pure — you can replace initialMap in the second-to-last line with its equivalent function call, like this:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks = 
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    val lcListOfWords = 
        lowerCaseAllWords(listOfWords)
    // ELIMINATED 'initialMap'
    val sortedMap = 
        sortMapByHighestValue(convertWordListToWordCount(lcListOfWords))
    sortedMap

By the same argument, lcListOfWords is exactly equal to:

lowerCaseAllWords(listOfWords)

so I can eliminate that intermediate variable:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks = 
        squeezeBlankSpaces(stringWithoutFormatting)
    val listOfWords = 
        convertStringToListOfWords(stringWithSingleBlanks)
    // ELIMINATED 'lcListOfWords'
    val sortedMap = 
        sortMapByHighestValue(
            convertWordListToWordCount(
                lowerCaseAllWords(listOfWords)
            )
        )
    sortedMap

Next, listOfWords is exactly equal to:

convertStringToListOfWords(stringWithSingleBlanks)

and always will be — so I can remove listOfWords as an intermediate value:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    val stringWithSingleBlanks = 
        squeezeBlankSpaces(stringWithoutFormatting)
    // ELIMINATED 'listOfWords'
    val sortedMap = 
        sortMapByHighestValue(
            convertWordListToWordCount(
                lowerCaseAllWords(
                    convertStringToListOfWords(stringWithSingleBlanks)
                )
            )
        )
    sortedMap

The same is true of stringWithSingleBlanks, so I reduce the code again:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    val stringWithoutFormatting = 
        stripFormattingCharacters(stringWithoutNewlines)
    // ELIMINATED 'stringWithSingleBlanks'
    val sortedMap = 
        sortMapByHighestValue(
            convertWordListToWordCount(
                lowerCaseAllWords(
                    convertStringToListOfWords(
                        squeezeBlankSpaces(stringWithoutFormatting)
                    )
                )
            )
        )
    sortedMap

and then do the same with stringWithoutFormatting:

def wc(document: String): VectorMap[String, Int] = 
    val stringWithoutNewlines = 
        replaceNewlinesWithBlanks(document)
    // ELIMINATED 'stringWithoutFormatting'
    val sortedMap = 
        sortMapByHighestValue(
            convertWordListToWordCount(
                lowerCaseAllWords(
                    convertStringToListOfWords(
                        squeezeBlankSpaces(
                            stripFormattingCharacters(stringWithoutNewlines)
                        )
                    )
                )
            )
        )
    sortedMap

and then stringWithoutNewlines:

def wc(document: String): VectorMap[String, Int] = 
    // ELIMINATED 'stringWithoutNewlines'
    val sortedMap = 
        sortMapByHighestValue(
            convertWordListToWordCount(
                lowerCaseAllWords(
                    convertStringToListOfWords(
                        squeezeBlankSpaces(
                            stripFormattingCharacters(
                                replaceNewlinesWithBlanks(document)
                            )
                        )
                    )
                )
            )
        )
    sortedMap

And if you want to, you can eliminate sortedMap:

def wc(document: String): VectorMap[String, Int] = 
    // ELIMINATED 'sortedMap'
    sortMapByHighestValue(
        convertWordListToWordCount(
            lowerCaseAllWords(
                convertStringToListOfWords(
                    squeezeBlankSpaces(
                        stripFormattingCharacters(
                            replaceNewlinesWithBlanks(document)
                        )
                    )
                )
            )
        )
    )

To be clear, you DO NOT have to write code like this. I’m just showing it to demonstrate a possibility of this style of programming.

Algebraic substitution

We call this process algebraic substitution, and it works because:

  • The functions are pure
  • The variables are immutable
  • The data structures are immutable

When code is written like this, both you and the compiler are perfectly free to write it in any of these ways, and you’re always guaranteed to get the same result.

KEY: But if you use mutable variables, mutable data types, and impure methods, the same cannot be said.

Technical goo: “Referential Transparency”

If you like algebra, you’ll like referential transparency. We say that an expression is referentially transparent (RT) if it can be replaced by its resulting value without changing the behavior of the program. This must be true regardless of where the expression is used in the program. So this is basically a way of saying, “Algebraic substitution can be done here.”

What I did in the previous example is actually the opposite of that — I replaced the intermediate variables with their expressions — but that’s the cool part: all of those intermediate variables and their pure function calls are 100% equivalent.

Finally, notice that I’m not doing any “fancy FP” stuff in this code. I’m just writing EOP code with immutable variables, immutable data types, and pure functions.