By Alvin Alexander. Last updated: March 10, 2018
At the moment I can’t remember why I wrote the following Scala “line count” code, but without any introduction, I thought I’d share it here:
object LineCount extends App {

    def using[A <: { def close(): Unit }, B](resource: A)(f: A => B): B = {
        try {
            f(resource)
        } finally {
            resource.close()
        }
    }

    def timer[A](blockOfCode: => A) = {
        val startTime = System.nanoTime
        val result = blockOfCode
        val stopTime = System.nanoTime
        val delta = stopTime - startTime
        (result, delta/1000000d)
    }

    def countLines(filename: String): Long = {
        val NEWLINE = 10
        var newlineCount = 0L
        using(io.Source.fromFile(filename)) { source => {
            for {
                char <- source
                if char.toByte == NEWLINE
            } newlineCount += 1
            newlineCount
        }}
    }

    // took 87 secs (10M lines)
    def countLines2(filename: String): Option[Long] = {
        val NEWLINE = 10
        var newlineCount = 0L
        var source: io.BufferedSource = null
        try {
            source = io.Source.fromFile(filename)
            for {
                char <- source
                if char.toByte == 10
            } newlineCount += 1
            Some(newlineCount)
        } catch {
            case e: Exception => None
        } finally {
            if (source != null) source.close
        }
    }

    // took 27 secs (10M lines)
    def countLines3(filename: String): Option[Long] = {
        val NEWLINE = 10
        var newlineCount = 0L
        var source: io.BufferedSource = null
        try {
            source = io.Source.fromFile(filename)
            for (line <- source.getLines) {
                newlineCount += 1
            }
            Some(newlineCount)
        } catch {
            case e: Exception => None
        } finally {
            if (source != null) source.close
        }
    }

    val (lines, time) = timer{
        countLines3("tenmillionlines.txt")
    }
    println(s"Counted $lines in $time ms")

}
If I remember right, I was comparing the performance of various line count functions. The comments on the last two functions show how long each one took to count a ten-million-line file on a very old iMac: 87 seconds for countLines2, which reads the file one character at a time, and 27 seconds for countLines3, which uses getLines. I don’t remember the performance of the first function.
Notes: This code also shows the using and timer methods, which I use quite a bit.
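As a rough sketch of how those two methods can be used on their own, here is a small standalone example. It makes a few assumptions that are not in the original code: the `UsingTimerDemo` object name and the temporary file are made up for illustration, and `using` is bounded by `AutoCloseable` rather than the structural type shown above, which avoids reflective calls but only works for resources that implement that interface.

```scala
import java.nio.file.Files
import scala.io.Source

object UsingTimerDemo {

    // Variant of `using` (an assumption for this sketch): bounded by
    // AutoCloseable instead of a structural type. The resource is
    // closed whether or not `f` throws.
    def using[A <: AutoCloseable, B](resource: A)(f: A => B): B =
        try f(resource) finally resource.close()

    // Runs the block once and returns its result together with the
    // elapsed wall-clock time in milliseconds.
    def timer[A](blockOfCode: => A): (A, Double) = {
        val startTime = System.nanoTime
        val result = blockOfCode
        val stopTime = System.nanoTime
        (result, (stopTime - startTime) / 1000000d)
    }

    // Counts lines with getLines; the source is closed by `using`.
    def countLines(filename: String): Long =
        using(Source.fromFile(filename)) { source =>
            source.getLines().foldLeft(0L)((count, _) => count + 1)
        }

    def main(args: Array[String]): Unit = {
        // Hypothetical test file, created just for this demo.
        val tmp = Files.createTempFile("linecount-demo", ".txt")
        Files.write(tmp, "one\ntwo\nthree\n".getBytes("UTF-8"))

        val (lines, ms) = timer { countLines(tmp.toString) }
        println(s"Counted $lines lines in $ms ms")

        Files.delete(tmp)
    }
}
```

If you are on Scala 2.13 or newer, the standard library's scala.util.Using covers much the same ground as a hand-written using method, so it may be worth a look before rolling your own.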
Update: I just found my original post, which I titled “Scala file reading performance.”