One million Scala developers

Yesterday I just crunched the numbers from the surveys, but last night I started thinking about how cool it is that there are one million Scala developers in the world.

When I was wandering around Alaska in 2011 and first stumbled upon Programming in Scala, very few people knew about Scala: maybe a few thousand, or tens of thousands at most. I hope Martin Odersky & Company are having a little celebration this year for their success. (And on to two million!)

Scala code to find (and move or remove) duplicate files

My MacBook recently told me I was running out of disk space. I knew that the way I was backing up my iPhone was leaving me with multiple copies of photos and videos, so I finally decided to fix the problem by getting rid of all the duplicate copies.

So I wrote a little Scala program to find all the duplicates and move them to another location, where I could check them before deleting them. The short story is that I started with over 28,000 photos and videos, and the code shown below helped me find nearly 5,000 duplicates under my ~/Pictures directory that were taking up over 18GB of storage space. (Put another way, deleting those files freed up more than 18GB.)
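The full program is not reproduced in this excerpt, so what follows is only a minimal sketch of the general idea, not the original code: hash every file's contents and treat files that share a hash as duplicates. The names `FindDuplicates`, `checksum`, `regularFiles`, and `duplicateGroups` are hypothetical, SHA-256 is an assumed choice of hash, and the sketch assumes Scala 2.13 or later (for `scala.jdk.CollectionConverters`).

```scala
import java.nio.file.{Files, Path}
import java.security.MessageDigest
import scala.jdk.CollectionConverters._

object FindDuplicates {

  // Hex-encoded SHA-256 of a file's contents. This reads the whole file
  // into memory, which is fine for a sketch but not ideal for large videos.
  def checksum(file: Path): String = {
    val digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file))
    digest.map(b => "%02x".format(b)).mkString
  }

  // All regular files under `root`, found recursively.
  def regularFiles(root: Path): List[Path] = {
    val stream = Files.walk(root)
    try stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList
    finally stream.close()
  }

  // Groups of two or more files whose contents hash to the same value.
  // Each group is sorted by path so the result is stable.
  def duplicateGroups(root: Path): Seq[Seq[Path]] =
    regularFiles(root)
      .groupBy(checksum)
      .values
      .filter(_.size > 1)
      .map(_.sortBy(_.toString))
      .toSeq
}
```

From there, one option is to keep the first file in each group and pass the rest to `java.nio.file.Files.move`, with a holding directory as the target, so they can be reviewed before anything is deleted. File-name collisions in the holding directory would need handling, which this sketch leaves out.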

How to convert an array of bytes to a hex string in Scala

If you need to convert an array of bytes to a hex string in Scala, I can confirm that this code works:

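def convertBytesToHex(bytes: Seq[Byte]): String = {
    val sb = new StringBuilder
    // The excerpt cuts off here; this loop body is an assumed completion.
    for (b <- bytes) sb.append("%02x".format(b))
    sb.toString
}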

I just used this code as part of a checksum routine (SHA-1, SHA-256, etc.), and I tested its output against command-line checksum tools to verify that it works properly.
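As a rough illustration of that checksum use case, here is a small self-contained sketch that hex-encodes a SHA-256 digest. The names `ChecksumDemo`, `toHex`, and `sha256Hex` are hypothetical, and `toHex` is just one compact way to write the conversion, not necessarily the original implementation.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object ChecksumDemo {

  // Hex-encode bytes as two lowercase digits per byte.
  // Negative bytes are rendered as their unsigned value (e.g. -1 becomes "ff").
  def toHex(bytes: Seq[Byte]): String =
    bytes.map(b => "%02x".format(b)).mkString

  // SHA-256 of a string's UTF-8 bytes, as a hex string.
  def sha256Hex(text: String): String = {
    val digest = MessageDigest.getInstance("SHA-256")
    toHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)).toSeq)
  }
}
```

To check it the same way, `ChecksumDemo.sha256Hex("abc")` can be compared with the output of `printf 'abc' | shasum -a 256` on macOS; both should give the standard test vector `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`.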

Scala version of Collective Intelligence Euclidean distance algorithm

While recently reading the excellent book Programming Collective Intelligence, I decided to code up the first algorithm in the book using Scala instead of Python (which the book uses). It is a Euclidean distance algorithm, which provides one way to compare two sets of data and score how similar they are.

Without any further introduction (and assuming you have the Collective Intelligence book), here's the Scala source code for the Euclidean distance algorithm as described in the book:
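The listing itself is cut off in this excerpt. As a stand-in, here is a minimal sketch of a Euclidean-distance similarity score, which is not the original code. It assumes each data set is a map from item name to numeric rating, compares only the items both sets share, and turns the distance d into a score with 1 / (1 + d), so identical ratings score 1.0. That scoring formula and the names `EuclideanSimilarity`, `Ratings`, and `similarity` are assumptions and may differ from the book's exact version.

```scala
object EuclideanSimilarity {

  // Hypothetical data shape: item name -> rating.
  type Ratings = Map[String, Double]

  // Similarity in (0, 1] over shared items; 0.0 when nothing is shared.
  def similarity(a: Ratings, b: Ratings): Double = {
    val shared = a.keySet.intersect(b.keySet)
    if (shared.isEmpty) 0.0
    else {
      // Convert to a Seq before mapping: mapping a Set would silently
      // drop repeated squared differences.
      val sumOfSquares = shared.toSeq.map { item =>
        val diff = a(item) - b(item)
        diff * diff
      }.sum
      1.0 / (1.0 + math.sqrt(sumOfSquares))
    }
  }
}
```

For example, with ratings `Map("x" -> 1.0, "y" -> 2.0)` and `Map("x" -> 4.0, "y" -> 6.0)` the differences are 3 and 4, the distance is 5, and the score is 1/6.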