
Five good ways (and two bad ways) to read large text files with Scala

I’m working on a small project to parse large Apache access log files, with the file this week weighing in at 9.2 GB and 33,444,922 lines. So I gave myself 90 minutes to try a few different ways to write a simple “line count” program in Scala. (Not my final goal, but something I could use to measure file-reading speed without applying my algorithm.)
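A "line count" baseline of the kind described above might look like the following minimal sketch. It is not necessarily one of the article's five approaches. It assumes a plain java.io.BufferedReader with a 64 KB buffer (an arbitrary choice), reads one line at a time so memory use stays flat regardless of file size, and prints the elapsed time so different approaches can be compared. The object name LineCount is hypothetical.

```scala
import java.io.{BufferedReader, FileReader}

// Hypothetical line-count baseline: streams the file line by line,
// so only one line (plus the buffer) is in memory at any time.
object LineCount {
  def countLines(filename: String): Long = {
    // 64 KB buffer is an assumption, not a tuned value
    val reader = new BufferedReader(new FileReader(filename), 1 << 16)
    try {
      var count = 0L
      // readLine() returns null at end of file
      while (reader.readLine() != null) count += 1
      count
    } finally reader.close() // always release the file handle
  }

  def main(args: Array[String]): Unit = {
    val start = System.nanoTime()
    val n = countLines(args(0))
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"$n lines in $elapsedMs ms")
  }
}
```

Run it with the log file path as the only argument. FileReader decodes with the platform default charset, which is assumed to be acceptable for mostly-ASCII Apache access logs.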

Some Java file utilities

As a warning, this is some old Java code, but if you want to create your own Java file utilities (utility methods), this code might help you get started:

If you could give one tip for reaching heights in tech today, what would it be?

When asked, “If you could give one tip for reaching heights in tech today, what would it be?”, this was the initial response from Jonas Bonér, creator of Akka:

  • Work hard at minimizing your ego & attachment to identity
  • Learn deliberately, seek out weaknesses & work hard at them
  • Eliminate bad habits, replace them with good, one at a time
  • Read a lot, foundational stuff, not just latest hyped thing

How to read from two databases at the same time with ScalikeJdbc

This example shows how to connect to and read from multiple databases with ScalikeJdbc (a Scala JDBC library). I assume you already know how to use ScalikeJdbc with one database, so I’m only going to show the code and configuration file, without explaining the details.

The ScalikeJdbc configuration file

My ScalikeJdbc code is in an SBT project, so the ScalikeJdbc configuration file is at src/main/resources/application.conf:

“I have known no wise people who didn’t read all the time”

“In my whole life, I have known no wise people (over a broad subject matter area) who didn’t read all the time — none, zero. You’d be amazed at how much Warren reads — and at how much I read. My children laugh at me. They think I’m a book with a couple of legs sticking out.”

~ Charlie Munger, talking about Warren Buffett and himself

The beginning of a Scala “FileUtils” class

In production code I recommend that you use a good “Files” library like Apache Commons IO, but if you want to create your own Scala FileUtils class, here’s some source code that can help you get started.

First, here’s some code for the FileUtils class (an object, technically):
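A minimal sketch of what such a FileUtils object might contain, assuming Scala 2.13 or later and using only the JDK's java.nio.file.Files. The object and method names (fileExists, readLines, writeString, appendString) are hypothetical and are not taken from the author's class.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths, StandardOpenOption}
import scala.jdk.CollectionConverters._

// Hypothetical FileUtils object; method names are illustrative only.
object FileUtils {
  // True if the path exists and is a regular file (not a directory).
  def fileExists(filename: String): Boolean =
    Files.isRegularFile(Paths.get(filename))

  // Reads all lines (UTF-8) into an immutable List; loads the whole file into memory.
  def readLines(filename: String): List[String] =
    Files.readAllLines(Paths.get(filename), StandardCharsets.UTF_8).asScala.toList

  // Overwrites the file with the given text (UTF-8), creating it if needed.
  def writeString(filename: String, text: String): Unit = {
    Files.write(Paths.get(filename), text.getBytes(StandardCharsets.UTF_8))
    ()
  }

  // Appends text to the file, creating it if it does not exist yet.
  def appendString(filename: String, text: String): Unit = {
    Files.write(
      Paths.get(filename),
      text.getBytes(StandardCharsets.UTF_8),
      StandardOpenOption.CREATE,
      StandardOpenOption.APPEND
    )
    ()
  }
}
```

Delegating to java.nio.file.Files keeps the object small; for production code a library such as Apache Commons IO covers many more edge cases.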

How to process every line in a file with a Unix/Linux shell script

Unix/Linux shell script FAQ: How do I write a Unix or Linux shell script where I "do something" for every line in a text file?

Solution: An easy way to process every line in a text file is to use a Unix/Linux while loop in combination with the Linux cat command, like this:

Scala code to read a text file to an Array (or Seq)

As a quick note, I use code like this to read a text file into an Array, List, or Seq using Scala:

import scala.io.Source

def readFile(filename: String): Seq[String] = {
    val bufferedSource = Source.fromFile(filename)
    // toList reads every line before the source is closed;
    // try/finally closes the file even if reading throws
    try bufferedSource.getLines().toList
    finally bufferedSource.close()
}
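Because the heading mentions an Array as well as a Seq, here is a sketch of both variants. It assumes Scala 2.13 or later, where scala.util.Using is available; Using.resource closes the Source when the block finishes, whether it returns normally or throws. The object and method names are hypothetical.

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical helpers; Using.resource handles closing the file.
object ReadFileExample {
  // Reads every line of a text file into an Array[String].
  def readFileToArray(filename: String): Array[String] =
    Using.resource(Source.fromFile(filename)) { source =>
      source.getLines().toArray
    }

  // Same idea, returning an immutable Seq (a Vector here).
  def readFileToSeq(filename: String): Seq[String] =
    Using.resource(Source.fromFile(filename))(_.getLines().toVector)
}
```

Both versions load the whole file into memory, so they suit small and medium files; for multi-GB logs a streaming approach is the safer choice.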