Scala - Word counting on a file (iterator, memory, getLines, flatMap)

Quite often, you need to work through a file line by line, rather than reading the entire thing in as a single string as we did above. For example, you might need to process each line differently, so having it as a single String isn’t particularly convenient. Or you might be working with a large file that cannot easily fit into memory, which is what reading it in as a single string requires. You can obtain the lines in the file as an Iterator[String], in which each item is a single line from the file, using the getLines method.

scala> Source.fromFile("pg1661.txt").getLines
res4: Iterator[String] = non-empty iterator

This iterator is ready for you to consume lines, but it doesn’t read the whole file into memory right away. Instead, it buffers the input so that each line becomes available as you ask for it, essentially reading off disk as you demand more lines. You can think of this as streaming the file to your Scala program, much as modern audio and video content is streamed to your computer: the whole thing is never stored at once, but is transferred in parts to where it is needed, when it is needed.
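
To make the on-demand behavior concrete, here is a minimal sketch that takes only the first few lines of a file. The helper name firstLines and the temporary sample file are assumptions made up for illustration; they are not part of the original example. The underlying source reads ahead in buffered chunks, so slightly more than the requested lines may be pulled off disk, but the rest of the file is not read into memory.

```scala
import java.nio.file.{Files, Path}
import scala.io.Source

// Hypothetical helper (not from the original text): returns the first n lines.
def firstLines(path: Path, n: Int): List[String] = {
  val source = Source.fromFile(path.toFile, "UTF-8")
  try source.getLines().take(n).toList // lines are pulled only as take(n) asks for them
  finally source.close()               // release the file handle when done
}

// A small sample file, created here only so the sketch is self-contained.
val samplePath: Path = {
  val p = Files.createTempFile("sample", ".txt")
  Files.write(p, "first\nsecond\nthird\nfourth\n".getBytes("UTF-8"))
  p
}
```

Calling firstLines(samplePath, 2) consumes the first two lines and then closes the file. Closing the source explicitly is a design choice worth keeping in longer-running programs, since the iterator alone does not release the file handle.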

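Of course, Iterators share much with sequence data structures like Lists: once we have an Iterator, we can use foreach, for, map, etc. on it. Unlike a List, though, an Iterator can be traversed only once, so if you need a second pass over the lines you have to call getLines again.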
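
As a rough sketch of how those methods combine for word counting, the function below uses flatMap to turn each line into its words and then folds the words into a Map of counts. The name countWords and the choice to split on non-word characters and lowercase everything are assumptions made for illustration, not a definitive tokenizer.

```scala
// Hypothetical helper (not from the original text): counts words across an
// iterator of lines, consuming one line at a time.
def countWords(lines: Iterator[String]): Map[String, Int] =
  lines
    .flatMap(_.toLowerCase.split("\\W+")) // one line becomes many words
    .filter(_.nonEmpty)                   // drop empty tokens left by leading punctuation
    .foldLeft(Map.empty[String, Int]) { (counts, word) =>
      counts.updated(word, counts.getOrElse(word, 0) + 1)
    }

// With the file from the example above, this could be called as:
//   countWords(scala.io.Source.fromFile("pg1661.txt").getLines())
```

Because the input is an Iterator, only the current line and the running Map of counts are held in memory, not the full list of lines.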