Scala code to find (and move or remove) duplicate files

My MacBook recently told me I was running out of disk space. I knew that the way I was backing up my iPhone left me with multiple copies of photos and videos, so I finally decided to fix the problem by getting rid of the duplicates.

So I wrote a little Scala program to find all the duplicates and move them to another location, where I could check them before deleting them. The short story is that I started with over 28,000 photos and videos, and the code shown below helped me find nearly 5,000 duplicates under my ~/Pictures directory. Deleting them freed up over 18GB of storage space.

How to read and write binary files in Scala

This is an excerpt from the Scala Cookbook (partially modified for the internet): Recipe 12.3, “How to read and write binary files in Scala.”


You want to read data from a binary file or write data to a binary file.


Scala doesn’t offer any special conveniences for reading or writing binary files, so use the Java FileInputStream and FileOutputStream classes.
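Because these classes are plain Java, the pattern looks much the same in either language. The following is a minimal sketch in Java rather than Scala, and it is not the book's listing. The class name BinaryFileDemo and the helpers writeBytes and readBytes are hypothetical names chosen for illustration, and the code assumes Java 9 or later for readAllBytes.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative only: the class and method names are assumptions, not from the book.
class BinaryFileDemo {

    // Write the given bytes to the file, replacing any existing content.
    static void writeBytes(File file, byte[] data) throws IOException {
        // try-with-resources closes the stream even if write() throws
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(data);
        }
    }

    // Read the whole file into memory; reasonable for small files only.
    static byte[] readBytes(File file) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            return in.readAllBytes(); // requires Java 9+
        }
    }
}
```

From Scala, the same constructors and methods can be called directly, for example new FileOutputStream(file).write(data), with the stream closed in a finally block or a resource-management helper.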

How to copy a file in Java
alvin, February 2, 2010 - 7:10am

The easiest way to copy a file in Java is to use the Apache Commons IO library: download the library, then use the methods of its FileUtils class to copy a file.

However, if you're just as interested in the technical details of how to copy a file in Java, or just want a method to copy a file in Java, the method below, taken from my Java file utilities class, shows how this actually works:
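As a rough illustration of the stream-based approach, here is a minimal sketch. It is not the method from the author's utilities class: the class name FileCopySketch, the method name copyFile, and the 8 KB buffer size are all assumptions made for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative only: names and buffer size are assumptions, not the author's code.
class FileCopySketch {

    // Copy source to destination, overwriting destination if it already exists.
    static void copyFile(File source, File destination) throws IOException {
        try (InputStream in = new FileInputStream(source);
             OutputStream out = new FileOutputStream(destination)) {
            byte[] buffer = new byte[8192];
            int bytesRead;
            // read() returns -1 at end of file; otherwise the number of bytes placed in buffer
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        }
    }
}
```

A call such as FileCopySketch.copyFile(new File("a.jpg"), new File("b.jpg")) would copy the first file's bytes into the second. On Java 7 and later, java.nio.file.Files.copy offers a standard-library alternative that avoids writing the loop by hand.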