Last week I wrote an Apache access log parser library in Scala to help me analyze my Apache HTTP access log records with Apache Spark. The source code for that project is hosted here on GitHub. You can use this library to parse Apache access log “combined” records from Scala, Java, and other JVM-based programming languages.
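The library's actual API isn't shown in this excerpt, but as a rough sketch of the underlying idea, a “combined” log record can be pulled apart with a regular expression along these lines. The `CombinedLogParser` object, the `LogRecord` class, and all field names here are my own illustrative choices, not necessarily the library's:

```scala
// A minimal sketch (not the library's actual API) of parsing one
// Apache "combined" log record with a regular expression.
object CombinedLogParser {

  // Combined format:
  // host ident user [date] "request" status bytes "referer" "user-agent"
  private val pattern =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$""".r

  // One parsed record; all fields kept as strings for simplicity.
  case class LogRecord(
    host: String, ident: String, user: String, dateTime: String,
    request: String, status: String, bytes: String,
    referer: String, userAgent: String)

  // Returns None when a line doesn't match the combined format.
  def parse(line: String): Option[LogRecord] = line match {
    case pattern(h, i, u, d, req, st, b, ref, ua) =>
      Some(LogRecord(h, i, u, d, req, st, b, ref, ua))
    case _ => None
  }
}
```

For example, feeding this parser a typical combined record yields a `LogRecord` whose `request` field holds a string like `GET /blog HTTP/1.1`, from which the URL can be extracted.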
Generating a list of URLs from Apache access log files, sorted by hit count, using Apache Spark (and Scala)
I don’t want to make my original Parsing Apache access log records with Spark and Scala article any longer, so I’m putting some new, better code here.
Assuming you’ve read that article, I’ll jump right in and say that I use this code to load my data into the Spark REPL:
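The author's own loading code follows in the original article. As a standalone sketch of the overall idea (count hits per URL, then sort descending by count), here is the same pipeline written against plain Scala collections; in the Spark REPL the equivalent shape uses `map`, `reduceByKey`, and `sortBy` on an RDD created with `sc.textFile`. The `topUrls` name and the input format are my own assumptions:

```scala
// A minimal, Spark-free sketch: given the "request" field of each record
// (e.g. "GET /blog HTTP/1.1"), count hits per URL and sort descending.
def topUrls(requests: Seq[String]): Seq[(String, Int)] = {
  // the URL is the second whitespace-separated token of the request field;
  // malformed requests are dropped rather than crashing the pipeline
  val urls = requests.flatMap { req =>
    req.split(" ") match {
      case Array(_, url, _*) => Some(url)
      case _                 => None
    }
  }
  urls.groupBy(identity)                          // url -> all its hits
      .map { case (url, hits) => (url, hits.size) } // url -> hit count
      .toSeq
      .sortBy(-_._2)                              // highest count first
}
```

In Spark, `groupBy(identity)` followed by `size` would instead be `map(url => (url, 1)).reduceByKey(_ + _)`, which avoids materializing every hit per key on one node.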
I want to analyze some Apache access log files for this website, and since those log files contain hundreds of millions (billions?) of lines, I thought I’d roll up my sleeves and dig into Apache Spark to see how it works and how well it works. I used Hadoop several years ago, and as a quick summary, I found the transition from Hadoop to Spark easy. Here are my notes.
There was some funky activity on a client's Drupal 6 website hosted at GoDaddy, and without access to an Apache access log file, I wanted to be able to see what was going on. So I wrote the following PHP code snippet to do some manual logging, and placed it in the Drupal theme's
Perl Apache log file FAQ: Can you demonstrate how to read (i.e., parse) an Apache access log file in Perl?
I've provided Perl examples before that can be used to read and parse an Apache log file ("How many RSS feed readers do I have?", "A Perl program to read an Apache access log file"), but to make this code a little easier to find, I'm breaking that code out here.