file

Scala file reading performance - Line counting algorithms

Out of curiosity about Scala’s file-reading performance, I decided to write a “line count” program in Scala. One obvious approach was to count the newline characters in the file:

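import scala.io.Source

// NEWLINE is not defined in this excerpt; assumed here to be the byte value of '\n'
val NEWLINE: Byte = 10

// took 101 secs (10M lines)
// works on one character at a time
def countLines1(source: Source): Long = {
  var newlineCount = 0L
  for {
    c <- source
    if c.toByte == NEWLINE
  } newlineCount += 1
  newlineCount
}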

As the comment shows, this took 101 seconds to read a file with 10M lines (an Apache access log file for this website).
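
For comparison, one alternative is to skip character decoding and count newline bytes straight from the file in large chunks. This is only a sketch, not the author's code: `LineCountSketch` and `countNewlines` are hypothetical names, the 64 KB buffer size is an arbitrary assumption, and no timing is claimed for it.

```scala
import java.io.FileInputStream

object LineCountSketch {
  private val Newline: Byte = '\n'.toByte

  // Hypothetical helper: counts '\n' bytes by reading the file in chunks.
  def countNewlines(path: String): Long = {
    val in = new FileInputStream(path)
    try {
      val buffer = new Array[Byte](64 * 1024) // assumed buffer size
      var count = 0L
      var n = in.read(buffer)
      while (n != -1) {
        var i = 0
        while (i < n) {
          if (buffer(i) == Newline) count += 1
          i += 1
        }
        n = in.read(buffer)
      }
      count
    } finally in.close()
  }
}
```

Like `countLines1`, this counts newline characters, so a final line without a trailing newline is not counted.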

A simple Scala Try, Success, Failure example (reading a file)

Sorry, not much free time these days, so without any discussion, here’s a simple Scala Try/Success/Failure example:
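
A minimal sketch of that pattern, not necessarily the author's original code: `TryFileDemo`, `readLines`, and `describe` are hypothetical names chosen for illustration. Wrapping the read in `Try` turns any exception (for example, a missing file) into a `Failure` value instead of letting it propagate.

```scala
import scala.io.Source
import scala.util.{Try, Success, Failure}

object TryFileDemo {
  // Hypothetical helper: read all lines, capturing any exception in a Try.
  def readLines(path: String): Try[List[String]] = Try {
    val source = Source.fromFile(path)
    try source.getLines().toList
    finally source.close()
  }

  // Pattern match on the result to handle both outcomes explicitly.
  def describe(path: String): String =
    readLines(path) match {
      case Success(lines) => s"read ${lines.size} lines"
      case Failure(e)     => s"failed: ${e.getClass.getSimpleName}"
    }
}
```

A caller can then use `TryFileDemo.describe("/some/path")` without writing its own try/catch block.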

20+ Unix/Linux find command examples

Linux/Unix FAQ: Can you share some find command examples?

Sure. The Unix/Linux find command is very powerful. It can search the entire filesystem to find files and directories according to the search criteria you specify. Besides locating files, you can also use find to execute other Linux commands (grep, mv, rm, etc.) on the files and directories it finds.

How to load (open and read) an XML file in Scala

Scala FAQ: How do I load an XML file in Scala? (How do I open and read an XML file in Scala?)

I demonstrated this in my earlier Scala XML - Searching XMLNS namespaces, XPath tutorial, but you can load an XML file in Scala like this:
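
If the scala-xml module is on the classpath, `scala.xml.XML.loadFile` is the usual route. As a dependency-free sketch, the example below swaps in the JDK's DOM parser instead, which is not the approach the post itself refers to; `XmlLoadSketch`, `load`, and `rootName` are hypothetical names.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Document

object XmlLoadSketch {
  // Hypothetical helper: parse an XML file into a JDK DOM Document.
  def load(path: String): Document = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    builder.parse(new File(path))
  }

  // Name of the document's root element, as a quick check that loading worked.
  def rootName(path: String): String =
    load(path).getDocumentElement.getNodeName
}
```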

How to search multiple jar files for a string or pattern

Here's a shell script I use that searches Java jar files for any type of pattern. You can use it to search for the name of a class, the name of a package, or any other string/pattern that would show up if you ran jar tvf manually on each jar file. The advantage of this script -- if you're a Unix, Linux, or Cygwin user -- is that it will search through all jar files in the current directory.

Tell Git not to track a file any more (remove from repo)

Git rm FAQ: How do I tell Git not to track a file (or files) any more? That is, how do I remove the file from the Git repo?

While working on an application named "sarah" yesterday (named for the house known as "SARAH" in the TV series Eureka), I accidentally checked some files into Git that I didn't mean to. These were primarily binary files in my project's "bin" and "target" directories.

Scala - How to open and read files in Scala

Scala file FAQ: How do I open and read files in Scala?

When you're writing Scala scripts, you often want to read text files. Fortunately it's pretty easy to open and read from a file in Scala. You can just use an approach like this:
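
One such approach, sketched under a few assumptions: the file is UTF-8 text, it is small enough to hold in memory, and `ReadFileDemo` and `linesOf` are hypothetical names. It uses `scala.io.Source` and closes the file once the lines have been read.

```scala
import scala.io.Source

object ReadFileDemo {
  // Hypothetical helper: return the file's lines so the caller decides what to do with them.
  def linesOf(path: String): Vector[String] = {
    val source = Source.fromFile(path, "UTF-8") // encoding is an assumption
    try source.getLines().toVector
    finally source.close()
  }
}
```

In a script, something like `ReadFileDemo.linesOf("/some/path").foreach(println)` would then print the file line by line.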

MySQL backup - How to back up a MySQL database

MySQL backup FAQ: How do I back up a MySQL database?

I can't speak about backing up MySQL databases that are modified twenty-four hours a day, seven days a week. But on all the MySQL databases I currently work with, there are always times when I can guarantee that no SQL INSERTs, DELETEs, or UPDATEs will occur, so I find it easy to perform a MySQL backup using the mysqldump utility program. Here's how it works.

MySQL restore - How to restore a MySQL database from a backup

MySQL database FAQ: How do I restore a MySQL backup? (Also written as, "How do I restore a MySQL database dump?")

Use sed to edit files in place (and make a backup copy)

Yesterday I ran into a situation where I had to edit 250,000 files, and of course I instantly thought of the Unix/Linux sed command. I knew what edit commands I wanted to run (simple swap/replace commands), but my bigger problem was how to edit the files in place.

A quick look at the sed man page showed that I needed to use the -i argument of the sed command:
