files

Scala code to find (and move or remove) duplicate files

My MacBook recently told me I was running out of disk space. I knew that the way I was backing up my iPhone left me with multiple copies of photos and videos, so I finally decided to fix that problem by getting rid of all of the duplicate copies of those files.

So I wrote a little Scala program to find all the duplicates and move them to another location, where I could check them before deleting them. The short story is that I started with over 28,000 photos and videos, and the code shown below helped me find nearly 5,000 duplicate photos and videos under my ~/Pictures directory that were taking up over 18GB of storage space. (Put another way, deleting those files saved me 18GB of storage.)
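As a rough illustration of the general idea (this is a shell sketch, not the Scala program described above), duplicates can be found by grouping files on a checksum. The function name find_dupes is hypothetical, and the sketch assumes that files with the same cksum value and byte count are identical, which is very likely but not guaranteed, so the candidates are still worth checking by hand before deleting anything.

```shell
#!/bin/sh
# find_dupes DIR (hypothetical helper): print every file under DIR whose
# checksum and size match a file listed earlier, i.e. the second and later
# copies in each group. Assumes filenames do not contain newlines.
find_dupes() {
    find "$1" -type f -exec cksum {} + |
        LC_ALL=C sort |
        awk '{
            key = $1 " " $2                    # checksum plus size in bytes
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)   # keep the path, spaces included
            if (key == prev) print name        # same key as the previous line
            prev = key
        }'
}

# Example: find_dupes "$HOME/Pictures"
```

The output is one path per line, which makes it easy to review the list before moving the files to a holding directory.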

Scala: How to list files and directories under a directory

When using Scala, if you ever need to list the subdirectories in a directory, or the files under a directory, I hope this example is helpful:

import java.io.File

object FileTests extends App {

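    // list only the folders directly under this directory (does not recurse);
    // listFiles returns null if the path is not a readable directory, so wrap it in Option
    val folders: Array[File] = Option(new File("/Users/al").listFiles)
        .getOrElse(Array.empty[File])
        .filter(_.isDirectory)  // use _.isFile instead to list files
    folders.foreach(println)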

}

If it helps to see it, a longer version of that solution looks like this:

How to show the largest files under a directory on Mac OS X (Unix)

Here’s an example that shows how to find the largest files under a directory on macOS and Linux/Unix systems.

A du/sort command to show the largest files under a directory on Mac OS X

The Unix/Linux command that worked for me on my macOS system is this:

$ du -a * | sort -r -n | head -10

du is the disk usage command, and the -a flag says, “Display an entry for each file in a file hierarchy.” Then I use the sort command to sort the du output numerically and in reverse. After that, head -10 shows only the first ten lines of output. In the Music folder on my Mac the command and output look like this:
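One caveat about that pipeline: du -a lists directories as well as files, and the default block size differs between platforms (512-byte blocks on macOS, 1024-byte blocks with GNU du). As a sketch of a files-only variant (an assumption about what is wanted, not the command from the original post), find can select regular files and du -k can report sizes in kilobytes on both systems. The function name largest_files is hypothetical.

```shell
#!/bin/sh
# largest_files DIR [COUNT] (hypothetical helper): list the COUNT largest
# regular files under DIR, biggest first. COUNT defaults to 10.
largest_files() {
    dir=$1
    count=${2:-10}
    # -type f skips directories; du -k prints "size-in-KB<TAB>path"
    find "$dir" -type f -exec du -k {} + | sort -rn | head -n "$count"
}

# Example: largest_files "$HOME/Music" 10
```

Because du reports allocated disk blocks rather than byte counts, very small files can show the same size.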

Linux: Recursive file searching with grep -r (like grep + find)

Linux grep FAQ: How can I perform a recursive search with the grep command in Linux?

Solution: find + grep

For years I always used variations of the following Linux find and grep commands to recursively search subdirectories for files that match a grep pattern:

find . -type f -exec grep -l 'alvin' {} \;

This command can be read as, “Search all files in all subdirectories of the current directory for the string ‘alvin’, and print the filenames that contain this pattern.” It’s an extremely powerful approach for recursively searching files in all subdirectories that match the pattern I specify.
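To make the comparison in the title concrete, here is a small sketch that wraps both forms in functions so they can be run side by side. The function names search_find and search_grep are hypothetical. The sketch assumes a grep that supports -r, which GNU and BSD grep both do, although POSIX does not require it.

```shell
#!/bin/sh
# search_find PATTERN DIR: the find + grep form. Ending -exec with "+"
# passes many files to each grep call, instead of one call per file as
# with "\;".
search_find() {
    find "$2" -type f -exec grep -l -- "$1" {} +
}

# search_grep PATTERN DIR: the same search using grep's own recursion.
search_grep() {
    grep -rl -- "$1" "$2"
}

# Example: search_find 'alvin' .    or    search_grep 'alvin' .
```

Both functions print the names of the files that contain the pattern, though the order of the results may differ. The find form is still useful when the set of files needs narrowing first, for example by name or modification time.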

A Linux shell script (and commands) to find large files

I made a mistake in configuring logrotate on a new Linux system, and almost ran into a problem because of that. Fortunately I saw the problem before it became a BIG problem, but as a result, I decided to add a script to my Linux system to check for large files, typically log files that have grown out of control for one reason or another.

Here then is a simple Linux shell script I named LargeFileCheck.sh, which searches the filesystem for files that are larger than 1GB in size:
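As a hedged sketch of how such a check might look (this is not the LargeFileCheck.sh script itself; the function name large_files and the exact byte threshold are assumptions), find can select files by size directly:

```shell
#!/bin/sh
# large_files DIR [MIN_BYTES] (hypothetical helper): print regular files
# under DIR that are larger than MIN_BYTES. The default of 1073741824
# bytes (1 GiB) is an assumed reading of "1GB".
large_files() {
    dir=$1
    min_bytes=${2:-1073741824}
    # -xdev stays on DIR's filesystem; -size +Nc means "more than N bytes";
    # errors such as unreadable directories are discarded
    find "$dir" -xdev -type f -size +"${min_bytes}"c 2>/dev/null
}

# Example: large_files /var/log
```

A function like this could be run from cron, with its output mailed to an administrator whenever it is not empty.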

A Linux shell script to rename files with a counter and copy them

As a brief note today, I was recently looking for all of the Messages/iMessage files that are stored on my Mac, and I used this shell script to copy all of those files (many of which have the same name) into a directory named tmpdir, giving them all new names during the copy process. The script reads the list of file paths from a file named myfiles:

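count=1
# read one path per line so that names containing spaces stay intact
while IFS= read -r i
do
    fname=$(basename "$i")
    # prefix each copy with the counter so identical names do not collide
    cp "$i" "tmpdir/${count}-${fname}"
    count=$((count + 1))
done < myfiles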

SBT errors summary plugin

The sbt-errors-summary plugin looks cool. Here’s a summary from its author:

“A simple plugin that makes the error reporter a bit more concise. I find it useful when doing refactoring: I get a lot of compilation errors, and I waste a lot of time switching between files and looking for line numbers in the error message, when I can immediately see what's wrong when looking at the faulty line.”