Perl and Apache - How to parse Apache access log file records in Perl

Perl Apache log file FAQ: Can you demonstrate how to read and parse an Apache access log file in Perl?

I've provided Perl examples before that can be used to read and parse an Apache log file ("How many RSS feed readers do I have?", "A Perl program to read an Apache access log file"), but to make this code a little easier to find, I'm breaking that code out here.

A Perl Apache access log script - Parse Apache log file record into fields

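So, assuming you already know how to open an Apache access log and read its lines, but don't know how to parse each line into variables, these lines of Perl code will do the trick. They work on the current line in Perl's default variable, $_, and expect Apache's standard "combined" log format (the common log format plus the referer and user-agent fields):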

# assume we're processing a line from an Apache access log file,
# held in Perl's default variable $_ ...

# give the line a Perl chomp (remove the trailing newline)
chomp;

# condense each run of one or more whitespace characters to a single space
s/\s+/ /g;

# break each Apache access_log record into nine variables
# ("my" declares them, so this also works under "use strict")
my ($clientAddress,     $rfc1413,     $username,
    $localTime,         $httpRequest, $statusCode,
    $bytesSentToClient, $referer,     $clientSoftware) =
    /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/;

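At this point, if the line matched the pattern, the Perl variables on the left side of the equals sign contain the fields from the current Apache access log record. If the line doesn't match (a malformed record, or a log written in a different format), the match returns an empty list and all nine variables are undef, so it's worth checking something like defined $statusCode before using them.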
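To make the field mapping concrete, here's a minimal, self-contained sketch that applies essentially the same regular expression to a single made-up log line in the combined format (the IP address, timestamp, and URLs are invented) and prints each captured field next to its variable name. Storing the fields in a hash via a hash slice is just one convenient option, not something the original code does.

```perl
use strict;
use warnings;

# A made-up access log line in Apache's "combined" log format
my $line = '192.0.2.10 - - [10/Oct/2023:13:55:36 -0700] '
         . '"GET /index.html HTTP/1.1" 200 2326 '
         . '"http://example.com/start.html" "Mozilla/5.0 (X11; Linux x86_64)"';

# The nine field names, in the order the regex captures them
my @names = qw(clientAddress rfc1413 username localTime httpRequest
               statusCode bytesSentToClient referer clientSoftware);

# Collapse whitespace on a copy of the line, then capture the nine fields
(my $record = $line) =~ s/\s+/ /g;
my @values = $record =~
    /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/;

# Pair each field name with its captured value using a hash slice
my %field;
@field{@names} = @values;

# Print one "name value" line per field
printf "%-18s %s\n", $_, $field{$_} for @names;
```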

If you want to break the $httpRequest variable down further into its components (the HTTP method, the requested URI, and the protocol), you can also run this line of code next:

# split the request (e.g. "GET /index.html HTTP/1.1") into the
# HTTP method, the requested URI, and the protocol
my ($method, $uri, $protocol) = split(' ', $httpRequest, 3);
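Here's a small sketch of what that split produces, using a made-up request string. The second split, which separates the path from the query string at the first "?", is a hypothetical extra step added for illustration, not part of the code above.

```perl
use strict;
use warnings;

# A made-up request field, as it would be captured into $httpRequest
my $httpRequest = 'GET /search?q=perl&page=2 HTTP/1.1';

# Split on whitespace into at most three parts: method, URI, protocol
my ($method, $uri, $protocol) = split(' ', $httpRequest, 3);

# Hypothetical extra step: separate the path from the query string
my ($path, $query) = split(/\?/, $uri, 2);
$query = '' unless defined $query;    # no "?" means no query string

print "method:   $method\n";
print "uri:      $uri\n";
print "protocol: $protocol\n";
print "path:     $path\n";
print "query:    $query\n";
```

If Apache logged a malformed request as just "-", the first split returns a single element, so $uri and $protocol come back undef; check defined $uri before going further.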

Given these snippets of Perl code, you can now dig through your Apache access log file data to perform whatever Apache log file analytics you want.
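As one example of that kind of analysis, here's a minimal sketch that combines the parsing regex and the request split in a read loop, skips lines that don't match, and counts hits per status code and per URI. The four log lines are made up, and they're read through an in-memory filehandle only so the sketch runs on its own; with a real log you'd open your own access_log path instead, as the comment shows.

```perl
use strict;
use warnings;

# Made-up log data; in practice, open your real access_log instead:
#   open(my $fh, '<', '/path/to/access_log') or die "Can't open log: $!";
my $sample = <<'END_LOG';
192.0.2.10 - - [10/Oct/2023:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"
192.0.2.11 - - [10/Oct/2023:13:55:40 -0700] "GET /about.html HTTP/1.1" 200 1045 "-" "curl/8.0"
192.0.2.10 - - [10/Oct/2023:13:56:02 -0700] "GET /index.html HTTP/1.1" 304 - "-" "Mozilla/5.0"
192.0.2.12 - - [10/Oct/2023:13:57:11 -0700] "GET /missing.html HTTP/1.1" 404 209 "-" "Mozilla/5.0"
END_LOG
open(my $fh, '<', \$sample) or die "Can't open in-memory log: $!";

my (%hitsByStatus, %hitsByUri);
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\s+/ /g;    # collapse runs of whitespace
    my ($clientAddress, $rfc1413, $username, $localTime, $httpRequest,
        $statusCode, $bytesSentToClient, $referer, $clientSoftware) =
        $line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/;
    next unless defined $statusCode;    # skip lines that don't match

    # keep only the URI from "METHOD URI PROTOCOL"
    my (undef, $uri) = split(' ', $httpRequest, 3);
    $hitsByStatus{$statusCode}++;
    $hitsByUri{$uri}++ if defined $uri;
}
close($fh);

# Most-requested URIs first (ties broken alphabetically)
for my $uri (sort { $hitsByUri{$b} <=> $hitsByUri{$a} || $a cmp $b } keys %hitsByUri) {
    print "$hitsByUri{$uri}\t$uri\n";
}

# Hit counts per HTTP status code
for my $status (sort keys %hitsByStatus) {
    print "status $status: $hitsByStatus{$status}\n";
}
```

Sorting by hit count and then by name keeps the report stable when two URIs have the same count.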

Perl Apache log file reader - Create a subroutine

Of course, what really should be done here is to turn these lines of code into a Perl subroutine so they can easily be reused in other programs. I'll leave that as an exercise for the reader, with one hint: a Perl subroutine returns a list, so a single call can hand all nine fields back to the caller at once.
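If you'd like something to check your answer against, here's one possible sketch. The subroutine name parse_access_log_line is made up for this example. It returns the nine fields as a list, or an empty list when the line doesn't match, because a regex match evaluated in list context returns its captures (and return passes the caller's list context through to the match).

```perl
use strict;
use warnings;

# Hypothetical helper: parse one combined-format access log line.
# Returns the nine fields as a list, or an empty list if the line doesn't match.
sub parse_access_log_line {
    my ($line) = @_;          # work on a copy of the caller's line
    chomp $line;
    $line =~ s/\s+/ /g;       # collapse runs of whitespace
    return $line =~
        /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.+)" (\S+) (\S+) "(.*)" "(.*)"/;
}

# Usage with a made-up log line
my $logLine = '192.0.2.10 - - [10/Oct/2023:13:55:36 -0700] '
            . '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"';
my ($clientAddress, $rfc1413, $username, $localTime, $httpRequest,
    $statusCode, $bytesSentToClient, $referer, $clientSoftware)
    = parse_access_log_line($logLine);
print "$clientAddress requested \"$httpRequest\" -> $statusCode\n";
```

Inside a read loop, you can skip non-matching lines with my @fields = parse_access_log_line($line) or next; since a list assignment in that position evaluates to the number of values returned, which is zero when the match fails.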