Scala: How to search a directory tree with SimpleFileVisitor and Files.walkFileTree alvin December 7, 2019 - 2:01pm

As a brief note to self, if you ever want to write some code using Scala that recursively descends through a directory tree, here’s a solution that uses the Java SimpleFileVisitor and Files.walkFileTree method. First, here’s a skeleton class for the SimpleFileVisitor part of the solution:

How to search for multiple regex patterns in many files with `ffx`

I recently created a command I named ffx that lets you search your filesystem for files that contain multiple strings or regular expressions. This post describes and demonstrates its capabilities. (There’s a little video down below if you want to see how it works before reading about it.)

A Scala “file find” utility

I wanted some specific features in a “find” utility, and when I couldn’t figure out how to get them with combinations of find, awk, and other Unix commands, I wrote what I wanted in Scala. Those features are (a) showing matching filenames, (b) showing the line that matches my search pattern, and underlining the pattern in the output, (c) showing the line numbers of the matches, and (d) showing an optional number of lines from the file before and after each match.

Unix/Linux: Find all files that contain multiple strings/patterns

When using Unix or Linux, if you ever need to find all files that contain multiple strings/patterns, — such as finding all Scala files that contain 'try', 'catch', and 'finally' — this find/awk command seems to do the trick:

find . -type f -name *scala -exec awk 'BEGIN {RS=""; FS="\n"} /try/ && /catch/ && /finally/ {print FILENAME}' {} \;

As shown in the image below, all of the matching filenames are printed out. As Monk says, you’ll thank me later. :)

(I should mention that I got part of the solution from this page.)

Update: My File Find utility

For a potentially better solution, see my File Find utility, which lets you search for multiple regex patterns in files.

Five good ways (and two bad ways) to read large text files with Scala

I’m working on a small project to parse large Apache access log files, with the file this week weighing in at 9.2 GB and 33,444,922 lines. So I gave myself 90 minutes to try a few different ways to write a simple “line count” program in Scala. (Not my final goal, but something I could use to measure file-reading speed without applying my algorithm.)

Linux: Recursive file searching with `grep -r` (like grep + find) alvin July 23, 2019 - 7:46pm

Linux grep FAQ: How can I perform a recursive search with the grep command in Linux?

Solution: find + grep

For years I always used variations of the following Linux find and grep commands to recursively search subdirectories for files that match a grep pattern:

find . -type f -exec grep -l 'alvin' {} \;

This command can be read as, “Search all files in all subdirectories of the current directory for the string ‘alvin’, and print the filenames that contain this pattern.” It’s an extremely powerful approach for recursively searching files in all subdirectories that match the pattern I specify.

My Scala Sed project: More features, returning strings

Table of Contents1 - Basic use2 - Using a Map3 - Match expressions4 - Sed limitations5 - My Sed project6 - Bonus: Factories and HOFs

My Scala Sed project is still a work in progress, but I made some progress on a new version this week. My initial need this week was to have Sed return a String rather than printing directly to STDOUT. This change gave me more ability to post-process a file. After that I realized it would really be useful if the custom function I pass to Sed had two more pieces of information available to it:

  • The line number of the string Sed passed to it
  • A Map of key/value pairs the helper function could use while processing the file

Note: In this article “Sed” refers to my project, and “sed” refers to the Unix command-line utility.

Back to top

Basic use

In a “basic use” scenario, this is how I use the new version of Sed in a Scala shell script to change the “layout:” lines in 55 Markdown files whose names are in the files-to-process.txt file:

Scala code to find (and move or remove) duplicate files

My MacBook recently told me I was running out of disk space. I knew that the way I was backing up my iPhone was resulting in me having multiple copies of photos and videos, so I finally decided to fix that problem by getting rid of all of the duplicate copies of those files.

So I wrote a little Scala program to find all the duplicates and move them to another location, where I could check them before deleting them. The short story is that I started with over 28,000 photos and videos, and the code shown below helped me find nearly 5,000 duplicate photos and videos under my ~/Pictures directory that were taking up over 18GB of storage space. (Put another way, deleting those files saved me 18GB of storage.)