Scala code to find (and move or remove) duplicate files

My MacBook recently told me I was running out of disk space. I knew that the way I was backing up my iPhone was leaving me with multiple copies of photos and videos, so I finally decided to fix the problem by getting rid of all the duplicate copies of those files.

So I wrote a little Scala program to find all the duplicates and move them to another location, where I could check them before deleting them. The short story is that I started with over 28,000 photos and videos, and the code shown below helped me find nearly 5,000 duplicates under my ~/Pictures directory that were taking up over 18GB of storage space. (Put another way, deleting those files freed up over 18GB of storage.)

The Kotlin forEach println syntax

It’s a little hard to move back and forth between Scala and Kotlin because of some of the differences between the languages. Skipping the long story, here’s an example of how to print every line in a list of strings in Kotlin using forEach and println. First the setup:

import java.io.File
// Read every line of the file into a list of strings.
fun readFile(filename: String): List<String> = File(filename).readLines()
val lines = readFile("/etc/passwd")

Then here are two different ways to use forEach with println:

A big collection of Unix/Linux ‘find’ command examples

Linux/Unix FAQ: Can you share some Linux find command examples?

Sure. The Unix/Linux find command is very powerful. It can search the entire filesystem for files and directories that match the search criteria you specify. Besides locating files, you can also have find execute other Linux commands (grep, mv, rm, etc.) on the files and directories it finds, which makes it even more useful.
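
As a rough sketch of the two uses described above (locating files, then running another command on the matches), here are a few find commands run against a throwaway directory. The directory layout and file names are made up for this example, and `demo_dir`, `scala_files`, and `todo_files` are just illustrative variable names.

```shell
# Build a small temporary directory tree to search (hypothetical names).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src" "$demo_dir/logs"
printf 'hello\n'     > "$demo_dir/src/Main.scala"
printf 'TODO: fix\n' > "$demo_dir/src/Notes.txt"
printf 'old\n'       > "$demo_dir/logs/app.log"

# 1) Locate regular files whose names match a pattern.
scala_files=$(find "$demo_dir" -type f -name '*.scala')
echo "$scala_files"

# 2) Run grep on every .txt file found, listing the files that contain "TODO".
todo_files=$(find "$demo_dir" -type f -name '*.txt' -exec grep -l 'TODO' {} \;)
echo "$todo_files"

# 3) Run rm on every .log file found.
find "$demo_dir" -type f -name '*.log' -exec rm {} \;
```

When you use `-exec` with a destructive command like rm, it can help to run the same find command without `-exec` first, so you can see exactly which files will be affected.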

A large collection of Unix/Linux ‘grep’ command examples

Linux grep commands FAQ: Can you share some Linux/Unix grep command examples?

Sure. The name grep comes from the ed editor command g/re/p ("globally search for a regular expression and print"), but you can think of the grep command as a "search" command for Unix and Linux systems: it's used to search for text strings and more-complicated "regular expressions" within one or more files.

I think it's easiest to learn how to use the grep command by showing examples, so let's dive right in.
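
As a minimal sketch of the kinds of searches grep handles, the commands below run against a small sample file. The file contents are invented for this example (loosely shaped like /etc/passwd entries), and the variable names are only illustrative.

```shell
# Create a small sample file to search (made-up contents).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'alvin:x:501:20:Alvin:/Users/alvin:/bin/zsh' \
  'Daemon:x:1:1:System Services:/usr/sbin:/usr/sbin/nologin' > "$sample"

# 1) Search for a plain text string.
bash_users=$(grep 'bash' "$sample")
echo "$bash_users"

# 2) Case-insensitive search (-i), counting matching lines (-c).
daemon_count=$(grep -ic 'daemon' "$sample")
echo "$daemon_count"

# 3) Invert the match (-v): count the lines that do NOT start with "root:".
non_root_count=$(grep -vc '^root:' "$sample")
echo "$non_root_count"

# 4) Extended regular expression (-E): lines ending in /bin/bash or /bin/zsh.
login_shells=$(grep -E '/bin/(bash|zsh)$' "$sample")
echo "$login_shells"
```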

How to use ‘awk’ to print columns from a text file (in any order)

alvin, March 26, 2018 - 4:03pm

One of my favorite ways to use the Unix awk command is to print columns of information from text files, including printing columns in a different order than they appear in the text file. Here are some examples of how awk works in this use case.
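
A short sketch of that use case: awk splits each line into fields ($1, $2, and so on), and you can print them in whatever order you like. The sample data and variable names here are made up for illustration.

```shell
# Create a small whitespace-separated sample file (made-up data).
data=$(mktemp)
printf '%s\n' 'alice 30 engineer' 'bob 25 designer' > "$data"

# 1) Print only the first column.
first_col=$(awk '{ print $1 }' "$data")
echo "$first_col"

# 2) Print the third column followed by the first (a different order than in the file).
reordered=$(awk '{ print $3, $1 }' "$data")
echo "$reordered"

# 3) Use -F to set a different field separator, here a colon.
passwd_line='alvin:x:501:20:Alvin:/Users/alvin:/bin/zsh'
shell_and_user=$(printf '%s\n' "$passwd_line" | awk -F: '{ print $7, $1 }')
echo "$shell_and_user"
```

The comma in the print statement tells awk to separate the fields with its output field separator, which is a single space by default.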

My “Scala Flat File Database” project now handles newline characters

alvin, January 15, 2018 - 9:26pm

I updated my Scala Flat File Database project so it now handles newline (\n) characters. The solution isn’t perfect, but it’s a start, and makes the approach much more usable. (I didn’t need this functionality until today, so I didn’t know it was a problem.) I also updated it to work with Scala 2.12.