As a brief note today, if you need to read a binary file with Scala, here’s an approach I just tested and used. It uses the Java FileInputStream and BufferedInputStream classes along with Iterator.continually:
package file_binary

import java.io.{FileInputStream, BufferedInputStream}

@main def readBinaryFile =
    val filename = "access.log"
    val bis = BufferedInputStream(FileInputStream(filename))
    Iterator.continually(bis.read())
        .takeWhile(_ != -1)
        .foreach(b => b)  // do whatever you want with each byte
    bis.close()
(As a note to self) this code is a replacement for reading a file with a while loop in Scala.
Discussion
This example uses some proposed Scala 3 (Dotty) significant indentation syntax, but it’s easily converted to Scala 2.
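As a rough sketch of that conversion, here is one way the same idea might look with braces and the `new` keyword, which Scala 2 requires. The object name `ReadBinaryFile` and the `countBytes` helper are my own illustrative names, and the `try`/`finally` is an addition so the stream is closed even if reading fails:

```scala
import java.io.{BufferedInputStream, FileInputStream}

object ReadBinaryFile {
  // Hypothetical helper: counts the bytes in a file.
  // Replace the foreach body with whatever you want to do with each byte.
  def countBytes(filename: String): Long = {
    val bis = new BufferedInputStream(new FileInputStream(filename))
    try {
      var count = 0L
      Iterator.continually(bis.read())
        .takeWhile(_ != -1)
        .foreach(_ => count += 1)
      count
    } finally bis.close()  // always release the file handle
  }

  def main(args: Array[String]): Unit = {
    val filename = args.headOption.getOrElse("access.log")
    println(countBytes(filename))
  }
}
```

This brace-based form should also compile under Scala 3, so it can be a reasonable choice if a project has to build with both versions.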
The Iterator.continually approach lets you loop over each byte in the file. When the end of the file is reached, the read method returns -1, which is why the takeWhile method is used as shown.
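To see the end-of-stream behavior without needing a file on disk, here is a small sketch that uses an in-memory ByteArrayInputStream in place of the file stream. The `TakeWhileDemo` object and `readAll` helper are names I made up for the illustration:

```scala
import java.io.ByteArrayInputStream

object TakeWhileDemo {
  // Collects every value read() returns until it signals end of stream with -1.
  def readAll(data: Array[Byte]): List[Int] = {
    val in = new ByteArrayInputStream(data)
    Iterator.continually(in.read())
      .takeWhile(_ != -1)
      .toList
  }
}
```

Calling `TakeWhileDemo.readAll(Array[Byte](10, 20, 30))` yields the three values and then stops, because the fourth call to read returns -1 and takeWhile ends the iteration there.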
This is something of a unique problem because the read method returns one value per call rather than a collection: an Int that holds the next byte as a value from 0 to 255, or -1 at the end of the file. It continues to return bytes as long as they exist in the file. As a result, the Iterator.continually method is a good way to handle this particular problem.
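One detail worth a quick sketch: the no-argument read method on java.io.InputStream hands back an Int in the range 0 to 255 (or -1 at end of stream), so a byte such as 0xFF arrives as 255 and is never confused with the end-of-file marker. If you need signed Byte values, calling toByte on each element is one option. The `ByteConversionDemo` and `rawAndSigned` names below are hypothetical:

```scala
import java.io.ByteArrayInputStream

object ByteConversionDemo {
  // Pairs each raw Int from read() with its signed Byte form.
  def rawAndSigned(data: Array[Byte]): List[(Int, Byte)] = {
    val in = new ByteArrayInputStream(data)
    Iterator.continually(in.read())
      .takeWhile(_ != -1)        // compare against -1 while the value is still an Int
      .map(i => (i, i.toByte))   // convert only after the end-of-stream check
      .toList
  }
}
```

The order matters here: converting to Byte before the takeWhile check would turn a legitimate 0xFF byte into -1 and cut the read short.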
Note that you can use LazyList.continually instead of Iterator.continually, if you prefer. (LazyList is a replacement for the older Scala Stream class.)
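Here is a minimal sketch of the LazyList variant, assuming Scala 2.13 or later (where LazyList is available). The `LazyListRead` object and `sumBytes` helper are illustrative names only. Because a LazyList is evaluated lazily, the sketch consumes it before the stream is closed:

```scala
import java.io.{BufferedInputStream, FileInputStream}

object LazyListRead {
  // Hypothetical helper: adds up the unsigned value of every byte in the file.
  def sumBytes(filename: String): Long = {
    val bis = new BufferedInputStream(new FileInputStream(filename))
    try {
      LazyList.continually(bis.read())
        .takeWhile(_ != -1)
        .foldLeft(0L)(_ + _)  // forces evaluation while the stream is still open
    } finally bis.close()
  }
}
```

One trade-off to keep in mind: a LazyList memoizes the elements it has already evaluated, so keeping a reference to its head for a large file can hold the whole file in memory, which an Iterator does not do.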
A performance note
Also note that I wrap FileInputStream with BufferedInputStream. If you only use FileInputStream, it takes about 181 seconds on my laptop to read an Apache access log file that has 650,000 lines, but it only takes about 1.6 seconds to read the same file if you wrap that with BufferedInputStream.
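If you want to try a comparison like this on your own machine, the following is a rough timing sketch, not the exact code behind the numbers above. The `ReadTiming` object and its `countBytes` and `timed` helpers are hypothetical names, and a single wall-clock measurement like this is only a coarse indicator, since results vary with the file, the disk, and JVM warm-up:

```scala
import java.io.{BufferedInputStream, FileInputStream, InputStream}

object ReadTiming {
  // Reads the stream one byte at a time and returns how many bytes were seen.
  def countBytes(in: InputStream): Long =
    try Iterator.continually(in.read()).takeWhile(_ != -1).foldLeft(0L)((n, _) => n + 1)
    finally in.close()

  // Runs the block once and returns its result with the elapsed milliseconds.
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000L)
  }

  def main(args: Array[String]): Unit = {
    val filename = args.headOption.getOrElse("access.log")
    val (n1, unbufferedMs) = timed(countBytes(new FileInputStream(filename)))
    val (n2, bufferedMs) =
      timed(countBytes(new BufferedInputStream(new FileInputStream(filename))))
    println(s"unbuffered: $n1 bytes in $unbufferedMs ms")
    println(s"buffered:   $n2 bytes in $bufferedMs ms")
  }
}
```

The gap comes from how each call to read is served: on a bare FileInputStream every single-byte read goes to the underlying file, while BufferedInputStream fills an internal buffer in larger chunks and serves most reads from memory.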