bufferedsource

Five good ways (and two bad ways) to read large text files with Scala

I’m working on a small project to parse large Apache access log files, with the file this week weighing in at 9.2 GB and 33,444,922 lines. So I gave myself 90 minutes to try a few different ways to write a simple “line count” program in Scala. (Not my final goal, but something I could use to measure file-reading speed without applying my algorithm.)
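As a rough sketch of the kind of baseline being measured here (not the author's actual program), the following counts lines with `scala.io.Source` and reports elapsed time. The object name `LineCountTimer` and the helper `timed` are hypothetical names chosen for illustration, and the code assumes the file can be decoded with the platform's default character encoding.

```scala
import scala.io.Source

object LineCountTimer {
  // Counts lines by walking the lazy getLines() iterator, so the
  // file contents are never held in memory all at once.
  def countLines(filename: String): Long = {
    val source = Source.fromFile(filename)
    try {
      var count = 0L
      for (_ <- source.getLines()) count += 1
      count
    } finally {
      source.close()
    }
  }

  // Evaluates a block and returns its result together with the
  // elapsed wall-clock time in milliseconds.
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }
}
```

Calling `LineCountTimer.timed(LineCountTimer.countLines(path))` returns a pair of the line count and the elapsed milliseconds, which makes it easy to compare one reading strategy against another on the same file.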

Scala code to read a text file to an Array (or Seq)

As a quick note, I use code like this to read a text file into an Array, List, or Seq using Scala:

import scala.io.Source

def readFile(filename: String): Seq[String] = {
    val bufferedSource = Source.fromFile(filename)
    try {
        // toList forces the lazy iterator before the file is closed
        bufferedSource.getLines().toList
    } finally {
        bufferedSource.close()
    }
}
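If you are on Scala 2.13 or later (an assumption, since the original code does not state a version), `scala.util.Using` can manage the close step for you. This is a minimal sketch of the same idea; the object name `ReadFileUsing` is only for illustration.

```scala
import scala.io.Source
import scala.util.Using

object ReadFileUsing {
  // Using.resource closes the Source when the block finishes,
  // whether it returns normally or throws.
  def readFile(filename: String): Seq[String] =
    Using.resource(Source.fromFile(filename)) { source =>
      // toList materializes the lines before the Source is closed
      source.getLines().toList
    }
}
```

Because the result is a fully built `List`, this form suits small and medium files; for multi-gigabyte logs it is usually better to process lines inside the block rather than return them all.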

How to open and read text files in Scala

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is Recipe 12.1, “How to open and read a text file in Scala.”

Problem

You want to open a plain-text file in Scala and process the lines in that file.

Solution

There are two primary ways to open and read a text file:

Scala file reading performance: Line counting algorithms

Out of curiosity about Scala’s file-reading performance, I decided to write a “line count” program in Scala. One obvious approach was to count the newline characters in the file:
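The excerpt ends before showing the code, so the following is only a sketch of what counting newline characters could look like, not the author's version. It reads raw bytes through a `java.io.BufferedInputStream` and counts `\n` bytes; the object name `NewlineCount` and the 64 KB buffer size are assumptions made for illustration.

```scala
import java.io.{BufferedInputStream, FileInputStream}

object NewlineCount {
  // Counts '\n' bytes in the file. A final line that lacks a
  // trailing newline is therefore not included in the total.
  def countNewlines(filename: String): Long = {
    val in = new BufferedInputStream(new FileInputStream(filename))
    try {
      val newline = '\n'.toByte
      val buffer = new Array[Byte](64 * 1024) // buffer size is an arbitrary choice
      var count = 0L
      var bytesRead = in.read(buffer)
      while (bytesRead != -1) {
        var i = 0
        while (i < bytesRead) {
          if (buffer(i) == newline) count += 1
          i += 1
        }
        bytesRead = in.read(buffer)
      }
      count
    } finally {
      in.close()
    }
  }
}
```

Working on bytes avoids decoding the file into characters and allocating a `String` per line, which is the usual reason this approach is tried when only the count matters.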