A software “Code Bloat” script (lines of source code per file)

I just found this great little “Code Bloat” script on Ward Cunningham's Smallest Federated Wiki website:

wc -l `find . | perl -ne 'next if /jquery/; print if /\.(rb|haml|coffee)$/'`

If you’re familiar with those Linux commands (wc, find) and Perl, you can tell that the intent of the command is to count the lines of source code per file, for all Ruby, Haml, and CoffeeScript files beneath the current directory, skipping anything with "jquery" in its path.

Adding sorting

An improvement to that command might be to sort the results, so a quick little re-write (without using Perl) for a Java project might look like this if you want the largest files shown at the bottom of the list:

wc -l `find . -type f | grep -E '\.(java|xml)$'` | sort -n

Or it could look like this if you want the largest files shown at the top of the list:

wc -l `find . -type f | grep -E '\.(java|xml)$'` | sort -nr
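
One caveat with the backtick form: the shell splits the file list on whitespace, so a path containing a space is passed to wc as two bogus names, and if find matches nothing, wc sits waiting on standard input. A slightly more defensive sketch, assuming a find that supports the POSIX "-exec ... {} +" form, is shown here wrapped in a hypothetical count_lines function (the name is mine, not part of the original script):

```shell
# count_lines [DIR]: lines per .java/.xml file under DIR (default: .), smallest first.
# count_lines is a hypothetical helper name used only for illustration.
count_lines() {
    # find hands the names straight to wc, so paths containing spaces stay intact.
    find "${1:-.}" -type f \( -name '*.java' -o -name '*.xml' \) -exec wc -l {} + |
        sort -n
}
```

Swap sort -n for sort -nr to put the largest files first. On a very large tree, find may run wc more than once, in which case more than one "total" line can appear in the output.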

After changing the file extension to work with PHP, here is some sample output from the largest-first version of the command, run on a PHP project I'm currently working on:

    1249 total
     226 ./logfiles/php-tests/mk-sitemap.php
     171 ./drupaldata/backups/dump-drupal-nodes.php
     170 ./logfiles/php-tests/mk-new-links.php
     148 ./logfiles/2-insert-log-records.php
     118 ./logfiles/php-tests/5.php
     101 ./drupaldata/insert-node-records.php
      33 ./logfiles/php-tests/3.php

     (more here ...)

Of course you can also add the head command to this, or a little extra code to just print files that have more than X lines of code in them (500 lines, 1,000 lines, etc.).
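
As a rough sketch of both ideas, assuming a POSIX shell with find, wc, awk, and sort available, the hypothetical bloat_over function below prints only the PHP files that have more than a given number of lines, largest first. The function name and the 500-line threshold in the example are illustrative choices, not something from the original script.

```shell
# bloat_over MIN [DIR]: list .php files under DIR (default: .) that have
# more than MIN lines, largest first. bloat_over is a hypothetical name.
bloat_over() {
    min="$1"
    dir="${2:-.}"
    find "$dir" -type f -name '*.php' -exec wc -l {} + |
        # Drop wc's summary line, then keep rows whose count exceeds MIN.
        awk -v min="$min" '$2 != "total" && $1 > min' |
        sort -nr
}

# Example: the ten largest PHP files with more than 500 lines:
#   bloat_over 500 . | head -n 10
```

The awk filter compares the first column (the line count) numerically against the threshold, so changing 500 to 1000 is the only edit needed to raise the bar.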

Code bloat metric

Either way, it's an interesting way to think about software code: if a source code file gets too large, you might want to take a good, hard look at it.