Scala file-reading: How to open and read text files in Scala

This is an excerpt from the 1st Edition of the Scala Cookbook (partially modified for the internet). This is Recipe 12.1, “How to open and read a text file in Scala.”

Problem

You want to open a plain-text file in Scala and then read and process the lines in that file.

Solution

There are two primary ways to open and read a text file:

  • Use a concise, one-line syntax. This has the side effect of leaving the file open, but can be useful in short-lived programs, like shell scripts.
  • Use a slightly longer approach that properly closes the file.

This solution shows both approaches.

Solution 1) Using the concise “fromFile” syntax

In Scala shell scripts, where the JVM is started and stopped in a relatively short period of time, it may not matter that the file is never explicitly closed, so you can use the scala.io.Source.fromFile method as shown in the following examples.

One line at a time

To handle each line in the file as it’s read, use this approach:

import scala.io.Source

val filename = "fileopen.scala"
for (line <- Source.fromFile(filename).getLines()) {
    println(line)
}

Read the file into a list or array

As a variation of this, use the following approach to get all of the lines from the file as a List or Array:

// as a List
val lines = Source.fromFile("/Users/Al/.bash_profile").getLines().toList

// or, alternatively, as an Array
val lines = Source.fromFile("/Users/Al/.bash_profile").getLines().toArray

The fromFile method returns a BufferedSource, and its getLines method treats “any of \r\n, \r, or \n as a line separator (longest match),” so each element in the sequence is a line from the file.

Read the file into a string

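Use this approach to get all of the lines from the file as one String. Because getLines strips the line terminators, mkString with no argument joins the lines without any separator; pass "\n" to mkString, or call mkString directly on the source, if you want to keep the line breaks:

val fileContents = Source.fromFile(filename).getLines().mkString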

This approach has the side effect of leaving the file open as long as the JVM is running, but for short-lived shell scripts, this shouldn’t be an issue; the file is closed when the JVM shuts down.
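
How the lines are joined determines whether line breaks survive in the resulting String. The following is a minimal sketch that reads the same file three ways. The WholeFileDemo object, its method names, and the temporary file are hypothetical names used only for illustration, and each helper closes the source in a finally block rather than leaving it open:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.Source

// Hypothetical demo object; the names are illustrative, not part of the recipe.
object WholeFileDemo {

    // getLines strips line terminators, so this joins lines with no separator.
    def joinedWithoutSeparator(path: Path): String = {
        val source = Source.fromFile(path.toFile, "UTF-8")
        try source.getLines().mkString finally source.close()
    }

    // Joins the lines with "\n" between them.
    def joinedWithNewlines(path: Path): String = {
        val source = Source.fromFile(path.toFile, "UTF-8")
        try source.getLines().mkString("\n") finally source.close()
    }

    // Reads every character, keeping the file's own line terminators.
    def rawContents(path: Path): String = {
        val source = Source.fromFile(path.toFile, "UTF-8")
        try source.mkString finally source.close()
    }

    def main(args: Array[String]): Unit = {
        val path = Files.createTempFile("whole-file-demo", ".txt")
        Files.write(path, "first\nsecond\n".getBytes(StandardCharsets.UTF_8))
        println(joinedWithoutSeparator(path))   // firstsecond
        println(joinedWithNewlines(path))       // first, then second, on two lines
        Files.delete(path)
    }
}
```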

Solution 2) Properly closing the file

To properly close the file, get a reference to the BufferedSource when opening the file, and manually close it when you’re finished with the file:

val bufferedSource = Source.fromFile("example.txt")
for (line <- bufferedSource.getLines()) {
    println(line.toUpperCase)
}

bufferedSource.close()
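
Note that if the body of the loop throws an exception, the call to close is never reached. A minimal sketch of one way to guard against that is to wrap the loop in try/finally. The SafeRead object and its foreachLine method are hypothetical names chosen for this example:

```scala
import scala.io.Source

// Hypothetical helper; the names are illustrative, not part of the recipe.
object SafeRead {

    // Applies `handle` to each line and closes the file even if `handle` throws.
    def foreachLine(filename: String)(handle: String => Unit): Unit = {
        val bufferedSource = Source.fromFile(filename)
        try {
            for (line <- bufferedSource.getLines()) {
                handle(line)
            }
        } finally {
            bufferedSource.close()
        }
    }
}
```

It could be called as SafeRead.foreachLine("example.txt")(line => println(line.toUpperCase)).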

For automated methods of closing the file, see the “Loan Pattern” examples in the Discussion.

Discussion

The getLines method of the Source class returns a scala.collection.Iterator. The iterator returns each line without any newline characters. An iterator has many methods for working with a collection, and for the purposes of working with a file, it works well with the for loop, as shown.
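
Because getLines returns an Iterator, the usual collection methods can be chained onto it before anything is read into memory. The sketch below is one possible illustration, assuming a file in which blank lines and lines starting with "#" should be skipped. The LineIteratorDemo object and its method name are hypothetical:

```scala
import scala.io.Source

// Hypothetical demo object; the names are illustrative, not part of the recipe.
object LineIteratorDemo {

    // Returns (lineNumber, line) pairs, skipping blank lines and "#" comments.
    def numberedContentLines(filename: String): List[(Int, String)] = {
        val source = Source.fromFile(filename)
        try {
            source.getLines()
                .zipWithIndex
                .map { case (line, index) => (index + 1, line) }
                .filter { case (_, line) =>
                    line.trim.nonEmpty && !line.trim.startsWith("#")
                }
                .toList   // force evaluation before the file is closed
        } finally {
            source.close()
        }
    }
}
```

An Iterator can be traversed only once, which is why the result is converted to a List before the source is closed.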

Leaving files open

As mentioned, the first solution leaves the file open as long as the JVM is running:

// leaves the file open
for (line <- io.Source.fromFile("/etc/passwd").getLines()) {
    println(line)
}

// also leaves the file open
val contents = io.Source.fromFile("/etc/passwd").mkString

On Unix systems, you can show whether a file is left open by executing one of these fromFile statements in the REPL with a real file (like /etc/passwd), and then running an lsof (“list open files”) command like this at the Unix command line:

$ sudo lsof -u Al | grep '/etc/passwd'

That command lists all the open files for the user named Al, and then searches the output for the /etc/passwd file. If this filename is in the output, it means that it’s open. On my Mac OS X system I see a line of output like this when the file is left open:

java  17148  Al  40r  REG  14,2  1475 174214161 /etc/passwd

When I shut down the REPL — thereby stopping the JVM process — the file no longer appears in the lsof output. So while this approach has this flaw, it can be used in short-lived JVM processes, such as a shell script. (You can demonstrate the same result using a Scala shell script. Just add a Thread.sleep call after the for loop so you can keep the script running long enough to check the lsof command.)

Automatically closing the resource

When working with files and other resources that need to be properly closed, it’s best to use the “Loan Pattern.” According to the Loan Pattern web page, this pattern “ensures that a resource is deterministically disposed of once it goes out of scope.” In Scala, this can be ensured with a try/finally clause, which the Loan Pattern website shows like this:

// Resource stands in for any type that has a dispose() method
trait Resource { def dispose(): Unit }

def using[A](r : Resource)(f : Resource => A) : A =
    try {
        f(r)
    } finally {
        r.dispose()
    }

One way to implement the Loan Pattern when working with files is to use Joshua Suereth’s ARM library. To demonstrate this library, create an SBT project, and then add the following line to its build.sbt file to pull in the required dependencies:

libraryDependencies += "com.jsuereth" %% "scala-arm" % "1.3"

Next, create a file named TestARM.scala in the root directory of your SBT project with these contents:

import resource._

object TestARM extends App {
    for (source <- managed(scala.io.Source.fromFile("example.txt"))) {
        for (line <- source.getLines()) {
            println(line)
        }
    }
}

This code prints all of the lines from the file named example.txt. The managed method from the ARM library makes sure that the resource is closed automatically when the resource goes out of scope. The ARM website shows several other ways the library can be used.

David Pollak’s “using” control structure

A second way to demonstrate the Loan Pattern is with the using method described on the Loan Pattern website. The best implementation I’ve seen of a using method is in the book Beginning Scala (Apress), by David Pollak. Here’s a very slight modification of his code:

// required when using reflection, like `using` does
import scala.language.reflectiveCalls

object Control {
    def using[A <: { def close(): Unit }, B](resource: A)(f: A => B): B =
        try {
            f(resource)
        } finally {
            resource.close()
        }
}

This using method takes two parameters:

  • An object that has a close() method
  • A block of code to be executed, which transforms the input type A to the output type B

The body of this using method does exactly what’s shown on the Loan Pattern web page, wrapping the block of code it’s given in a try/finally block.

The following code demonstrates how to use this method when reading from a file:

import Control._

object TestUsing extends App {
    using(io.Source.fromFile("example.txt")) { source => {
        for (line <- source.getLines()) {
            println(line)
        }
    }}
}

Both the ARM library and the using method end up with the same result, implementing the Loan Pattern to make sure your resource is closed automatically.
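
If you are on Scala 2.13 or later, the standard library includes its own version of this pattern in scala.util.Using, so a hand-written using method may not be needed. The following is a minimal sketch under that version assumption. The UsingDemo object and its method names are hypothetical:

```scala
import scala.io.Source
import scala.util.{Try, Using}

// Hypothetical demo object; requires Scala 2.13 or later for scala.util.Using.
object UsingDemo {

    // Using.resource closes the source when the block ends and rethrows any exception.
    def readLines(filename: String): List[String] =
        Using.resource(Source.fromFile(filename)) { source =>
            source.getLines().toList
        }

    // Using(...) also closes the source, but captures any exception in a Try.
    def tryReadLines(filename: String): Try[List[String]] =
        Using(Source.fromFile(filename)) { source =>
            source.getLines().toList
        }
}
```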

Handling exceptions

Opening a file can throw an exception, for example when the file doesn’t exist. To handle such exceptions, use Scala’s try/catch syntax:

import scala.io.Source
import java.io.{FileNotFoundException, IOException}

val filename = "no-such-file.scala"
try {
    for (line <- Source.fromFile(filename).getLines()) {
        println(line)
    }
} catch {
    case e: FileNotFoundException => println("Couldn't find that file.")
    case e: IOException => println("Got an IOException!")
}

The following code demonstrates how the fromFile method can be used with using to create a method that returns the entire contents of a file as a List[String], wrapped in an Option:

import Control._

def readTextFile(filename: String): Option[List[String]] = {
    try {
        val lines = using(io.Source.fromFile(filename)) { source =>
            source.getLines().toList
        }
        Some(lines)
    } catch {
        case e: Exception => None
    }
}

This method returns a Some(List[String]) on success, and None if something goes wrong, such as a FileNotFoundException. It can be used in the following ways:

val filename = "/etc/passwd"

println("--- FOREACH ---")
val result = readTextFile(filename)
result foreach { strings =>
    strings.foreach(println)
}

println("\n--- MATCH ---")
readTextFile(filename) match {
    case Some(lines) => lines.foreach(println)
    case None => println("couldn't read file")
}

If the process of opening and reading a file fails, you may prefer to return a Try or an empty List[String]. See Recipes 20.5 and 20.6 for examples of those approaches.

Update: Reading a file with Try, Success, and Failure

This code shows how you can read a text file into a list of strings — List[String] — using Scala’s Try, Success, and Failure classes:

import scala.util.{Try, Success, Failure}
import Control._

def readTextFileWithTry(filename: String): Try[List[String]] = {
    Try {
        using(io.Source.fromFile(filename)) { source =>
            source.getLines().toList
        }
    }
}

The benefit of using Try is that you can get the cause of the exception back when you call this method and an exception occurs. Here’s one example of how you can use this method with a match/case expression:

val passwdFile = readTextFileWithTry("/etc/passwd")
passwdFile match {
    case Success(lines) => lines.foreach(println)
    case Failure(s) => println(s"Failed, message is: $s")
}

In the Success case you handle the condition where the file was opened properly, and in the Failure case you deal with the exception however you want to.
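
A Try can also be reduced to a default value when the caller doesn’t care why the read failed. The sketch below shows a fallback to an empty list alongside a match that reports the kind of failure. To keep the block self-contained it redefines readTextFileWithTry with a plain try/finally instead of the using method, and the TryDemo object, linesOrEmpty, and describe are hypothetical names:

```scala
import scala.io.Source
import scala.util.{Failure, Success, Try}

// Hypothetical demo object; the names are illustrative, not part of the recipe.
object TryDemo {

    // Compact stand-in for the method above, closing the file with try/finally.
    def readTextFileWithTry(filename: String): Try[List[String]] = Try {
        val source = Source.fromFile(filename)
        try source.getLines().toList finally source.close()
    }

    // Falls back to an empty list when the file cannot be read.
    def linesOrEmpty(filename: String): List[String] =
        readTextFileWithTry(filename).getOrElse(Nil)

    // Summarizes the outcome, naming the exception type on failure.
    def describe(filename: String): String =
        readTextFileWithTry(filename) match {
            case Success(lines) => s"read ${lines.size} lines"
            case Failure(e)     => s"failed: ${e.getClass.getSimpleName}"
        }
}
```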

Multiple fromFile methods

In Scala 2.10, there are eight variations of the fromFile method that let you specify a character encoding, buffer size, codec, and URI. For instance, you can specify an expected character encoding for a file like this:

// specify the encoding
Source.fromFile("example.txt", "UTF-8")

See the Scaladoc for the scala.io.Source object (not the Source class, which is an abstract class) for more information.
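
As a rough illustration of two of these variations, the sketch below writes a small file in ISO-8859-1 and reads it back, once by passing the encoding name and once by passing a Codec in the second parameter list. The EncodingDemo object, its method names, and the temporary file are hypothetical:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import scala.io.{Codec, Source}

// Hypothetical demo object; the names are illustrative, not part of the recipe.
object EncodingDemo {

    // Reads the first line using an encoding name such as "ISO-8859-1".
    def firstLine(path: Path, encoding: String): String = {
        val source = Source.fromFile(path.toFile, encoding)
        try source.getLines().next() finally source.close()
    }

    // Same idea, passing a Codec explicitly in the second parameter list.
    def firstLineWithCodec(path: Path, codec: Codec): String = {
        val source = Source.fromFile(path.toFile)(codec)
        try source.getLines().next() finally source.close()
    }

    def main(args: Array[String]): Unit = {
        val path = Files.createTempFile("encoding-demo", ".txt")
        // "caf\u00e9" is "café"; the escape avoids depending on the source file's encoding
        Files.write(path, "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1))
        println(firstLine(path, "ISO-8859-1"))
        println(firstLineWithCodec(path, Codec.ISO8859))
        Files.delete(path)
    }
}
```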

Because Scala works so well with Java, you can use the Java FileReader and BufferedReader classes, as well as other Java libraries, like the Apache Commons “FileUtils” library.
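
For instance, a minimal sketch of reading a file with the Java FileReader and BufferedReader classes might look like the following. The JavaReaderDemo object and its readLines method are hypothetical names:

```scala
import java.io.{BufferedReader, FileReader}

// Hypothetical demo object; the names are illustrative, not part of the recipe.
object JavaReaderDemo {

    // Reads all lines with the Java reader classes, closing the reader afterward.
    // Note: FileReader decodes the file with the JVM's default charset.
    def readLines(filename: String): List[String] = {
        val reader = new BufferedReader(new FileReader(filename))
        try {
            // readLine returns null at end of file, so stop at the first null
            Iterator.continually(reader.readLine())
                .takeWhile(_ != null)
                .toList
        } finally {
            reader.close()
        }
    }
}
```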

I hope it has been helpful. All the best, Al.

See Also