I just found this great little “Code Bloat” script on Ward Cunningham's Smallest Federated Wiki website:
wc -l `find . | perl -ne 'next if /jquery/; print if /\.(rb|haml|coffee)$/'`
If you’re familiar with those Linux commands (wc, find) and Perl, you can tell that the intent of the command is to find the number of lines of source code per file, for all Ruby, Haml, and CoffeeScript files beneath the current directory, skipping any path that contains "jquery".
Adding sorting
An improvement to that command might be to sort the results, so a quick little re-write (without using Perl) for a Java project might look like this if you want the largest files shown at the bottom of the list:
wc -l `find . -type f | grep -E '\.(java|xml)$'` | sort -n
Or it could look like this if you want the largest files shown at the top of the list:
wc -l `find . -type f | grep -E '\.(java|xml)$'` | sort -nr
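One caveat with the backtick form: file names that contain spaces get split into separate arguments, and if nothing matches, wc sits waiting on standard input. Here is a minimal sketch of a variant that lets find hand the file names to wc directly. The function name line_counts is made up for this example, and it assumes a find that supports the POSIX "-exec ... +" form.

```shell
# line_counts [DIR]: print "<lines> <path>" for every .java and .xml file
# under DIR (default: the current directory), smallest first.
# The function name is hypothetical, chosen only for this example.
line_counts() {
    find "${1:-.}" -type f \( -name '*.java' -o -name '*.xml' \) \
        -exec wc -l {} + |
        sort -n
}
```

Running line_counts with no argument covers the current directory; swap sort -n for sort -nr to put the largest files first. On a very large tree find may run wc more than once, in which case more than one "total" line shows up.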
After changing the pattern to match PHP files, here is some sample output from this command on a PHP project I'm currently working on:
1249 total
226 ./logfiles/php-tests/mk-sitemap.php
171 ./drupaldata/backups/dump-drupal-nodes.php
170 ./logfiles/php-tests/mk-new-links.php
148 ./logfiles/2-insert-log-records.php
118 ./logfiles/php-tests/5.php
101 ./drupaldata/insert-node-records.php
33 ./logfiles/php-tests/3.php
(more here ...)
Of course you can also add the head command to this, or a little extra code to just print files that have more than X lines of code in them (500 lines, 1,000 lines, etc.).
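As a rough sketch of that "little extra code", a one-line awk filter can keep only the files above a given line count. The function name over_threshold is made up for this example; it just reads wc -l output on standard input.

```shell
# over_threshold MIN: read `wc -l` output on standard input and keep only
# the per-file lines whose count is greater than MIN, dropping "total" lines.
# The function name is hypothetical, chosen only for this example.
over_threshold() {
    awk -v min="$1" '$2 != "total" && $1 > min'
}
```

For example, to show at most the ten largest Java and XML files that have more than 500 lines:

```shell
wc -l `find . -type f | grep -E '\.(java|xml)$'` | sort -nr | over_threshold 500 | head -n 10
```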
Code bloat metric
Either way, it's an interesting way to think about software code: if a source code file gets too large, you might want to take a good, hard look at it.