apache

My Scala Apache access log parser library

Last week I wrote an Apache access log parser library in Scala to help me analyze my Apache HTTP access log records using Apache Spark. The source code for that project is hosted here on GitHub. You can use this library to parse Apache access log “combined” records using Scala, Java, and other JVM-based programming languages.
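
As a rough illustration of what parsing a “combined” record involves, here is a minimal regex-based sketch. To be clear, this is not the library's actual code or API: the names `AccessLogRecord` and `CombinedLogParser` are made up for this example, and it assumes the standard combined format (host, identity, user, timestamp, request, status, bytes, referer, user agent).

```scala
// A sketch only: AccessLogRecord and CombinedLogParser are hypothetical names
// for this example, not the API of the library described above.
final case class AccessLogRecord(
    clientIp: String,
    identity: String,
    user: String,
    timestamp: String,     // kept as the raw text between the brackets
    request: String,       // e.g. "GET /index.html HTTP/1.1"
    status: Int,
    bytes: Option[Long],   // "-" in the log becomes None
    referer: String,
    userAgent: String
)

object CombinedLogParser {
  // host ident user [time] "request" status bytes "referer" "user-agent"
  private val Combined =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.*?)" (\d{3}) (\d+|-) "(.*?)" "(.*?)"$""".r

  /** Returns Some(record) when the line matches the assumed format, else None. */
  def parse(line: String): Option[AccessLogRecord] = line match {
    case Combined(ip, ident, user, time, req, status, bytes, referer, agent) =>
      val size = if (bytes == "-") None else Some(bytes.toLong)
      Some(AccessLogRecord(ip, ident, user, time, req, status.toInt, size, referer, agent))
    case _ =>
      None
  }
}
```

Returning an `Option` means malformed lines can simply be dropped when mapping over a large file, rather than throwing an exception partway through.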

Analyzing Apache access logs with Spark and Scala

I want to analyze some Apache access log files for this website, and since those log files contain hundreds of millions (billions?) of lines, I thought I’d roll up my sleeves and dig into Apache Spark to see how it works, and how well it works. I used Hadoop several years ago, and in short, I found the transition to Spark to be easy. Here are my notes.

Manual PHP and Drupal 6 web access logging

There was a little funky activity on a client's Drupal 6 website that was hosted at GoDaddy, and without having access to an Apache access log file, I wanted to be able to see what was going on. So I wrote the following PHP code snippet to do some manual logging, and placed it in the Drupal theme's page.tpl.php file:

Parsing 'real world' HTML with Scala and HTMLCleaner

While XML parsers work great for well-formed XML, out in the 'real world' internet, you can't count on HTML being XHTML, or even being well-formatted. As a result, various 'HTML cleaner' libraries for Java have appeared. They attempt to clean up the HTML so you can parse it.

A Scala REST 'get content' client function using Apache HttpClient

As a quick post here today, if you need a Scala REST client function, the following source code should work for you, or at least be a good starting point. I'm using it in several applications today, and the only thing I think it needs is the ability to set a connection timeout and socket timeout, and I share the code for that down below.

Here's my Scala REST 'get content' client function, using the Apache HttpClient library:

Apache NameVirtualHost configuration using MAMP on Mac OS X

Since I can't seem to ever remember this, here are some notes on how to configure a Name Virtual Host (NameVirtualHost) on an Apache web server. In particular, this is from the httpd.conf configuration file that I use with MAMP on one of my Mac OS X development systems.

In short, as I'm developing two different applications, one named "cato" and another named "zenf", these are the important name-based virtual host lines from my Apache configuration file:
