apache

How to access HTTP response headers after making an HTTP request with Apache HttpClient

This is an excerpt from the Scala Cookbook (partially modified for the internet). This is a very short recipe, Recipe 15.12, “How to access HTTP response headers after making an HTTP request with Apache HttpClient.”

Problem

You need to access the HTTP response headers after making an HTTP request in your Scala code.

Solution

Use the Apache HttpClient library, and get the headers from the HttpResponse object after making a request:
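As a minimal sketch of that approach (not the recipe's own listing), the following assumes Apache HttpClient 4.3 or newer on the classpath. The object name `ResponseHeaders`, the helper `formatHeaders`, and the localhost URL are placeholders introduced here for illustration:

```scala
import org.apache.http.Header
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients

object ResponseHeaders {
  // Render each header as a "Name: value" line (hypothetical helper).
  def formatHeaders(headers: Array[Header]): Seq[String] =
    headers.toSeq.map(h => s"${h.getName}: ${h.getValue}")

  def main(args: Array[String]): Unit = {
    // Placeholder URL; pass a real one as the first argument.
    val url = args.headOption.getOrElse("http://localhost:8080/")
    val client = HttpClients.createDefault()
    try {
      val response = client.execute(new HttpGet(url))
      try {
        println(response.getStatusLine)
        // getAllHeaders returns every response header as an Array[Header]
        formatHeaders(response.getAllHeaders).foreach(println)
      } finally response.close()
    } finally client.close()
  }
}
```

If you only need one header, `response.getFirstHeader("Content-Type")` returns it, or `null` when the header is absent.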

Drupal mobile website/multisite installation and robots.txt

While getting a mobile version of this site up and running, I ran into a problem with the robots.txt file on a Drupal multisite installation. The root of the problem is that your mobile site needs a robots.txt file like this:

User-agent: *
Disallow: /

That keeps search engines from crawling and indexing the mobile content, which would otherwise duplicate your main website.

Open source on the internet
alvin, March 15, 2014 - 9:57am

A nice graphic on the use of open source tools on the internet.

My Scala Apache access log parser library
alvin, March 12, 2014 - 12:53pm

Last week I wrote an Apache access log parser library in Scala to help me analyze my Apache HTTP access log records with Apache Spark. The source code for that project is hosted on GitHub. You can use this library to parse Apache access log “combined” records from Scala, Java, and other JVM-based programming languages.
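To give a feel for what parsing a “combined” record involves, here is a rough regex-based sketch. It is not the library's actual API: the `AccessLogRecord` fields, the `CombinedLogSketch` object, and the sample line in the usage note are all made up for illustration, and it assumes the quoted fields contain no escaped double quotes.

```scala
// Hypothetical record type; field names are illustrative only.
final case class AccessLogRecord(
  clientIp: String, identity: String, user: String, dateTime: String,
  request: String, status: Int, bytes: String,
  referer: String, userAgent: String
)

object CombinedLogSketch {
  // host ident user [time] "request" status bytes "referer" "user-agent"
  private val Combined =
    """^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"$""".r

  // Returns None for lines that do not match the combined format.
  def parse(line: String): Option[AccessLogRecord] = line match {
    case Combined(ip, id, user, ts, req, status, bytes, ref, ua) =>
      Some(AccessLogRecord(ip, id, user, ts, req, status.toInt, bytes, ref, ua))
    case _ => None
  }
}
```

For example, `CombinedLogSketch.parse` on a line such as `10.0.0.1 - - [12/Mar/2014:12:53:00 -0600] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"` yields a record whose `status` is 200. The byte count stays a `String` because Apache logs `-` when no body is sent.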

Analyzing Apache access logs with Spark and Scala (a tutorial)

I want to analyze some Apache access log files for this website, and since those log files contain hundreds of millions (billions?) of lines, I thought I’d roll up my sleeves and dig into Apache Spark to see how it works, and how well it works. I used Hadoop several years ago, and as a quick summary, I found the transition to be easy. Here are my notes.
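As a rough idea of what such an analysis can look like, the sketch below counts requests per HTTP status code. It assumes a Spark 2.x or newer dependency (the `SparkSession` entry point is newer than this post), a local master, and a hypothetical file named `access.log`; the `StatusCounts` object and `statusOf` helper are names invented for this example.

```scala
import org.apache.spark.sql.SparkSession

object StatusCounts {
  // In a "combined" record the status code is the first token
  // after the closing quote of the request field.
  def statusOf(line: String): Option[String] = {
    val parts = line.split("\"")
    if (parts.length >= 3) parts(2).trim.split(" ").headOption.filter(_.nonEmpty)
    else None
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StatusCounts")
      .master("local[*]")            // assumption: run locally on all cores
      .getOrCreate()
    try {
      val counts = spark.sparkContext
        .textFile("access.log")      // hypothetical input path
        .flatMap(line => statusOf(line).toList)
        .map(status => (status, 1L))
        .reduceByKey(_ + _)
        .collect()
      // Print the most frequent status codes first.
      counts.sortBy { case (_, n) => -n }
        .foreach { case (status, n) => println(s"$status\t$n") }
    } finally spark.stop()
  }
}
```

Malformed lines are skipped because `statusOf` returns `None` for them.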

Manual PHP and Drupal 6 web access logging

There was a little funky activity on a client's Drupal 6 website that was hosted at GoDaddy, and without having access to an Apache access log file, I wanted to be able to see what was going on. So I wrote the following PHP code snippet to do some manual logging, and placed it in the Drupal theme's page.tpl.php file:

Parsing 'real world' HTML with Scala and HTMLCleaner
alvin, February 3, 2012 - 8:13am

While XML parsers work great for well-formed XML, out in the 'real world' internet, you can't count on HTML being XHTML, or even being well-formatted. As a result, various 'HTML cleaner' libraries for Java have appeared. They attempt to clean up the HTML so you can parse it.
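A small sketch of that idea, assuming the HtmlCleaner library (`org.htmlcleaner`) is on the classpath: clean a possibly malformed HTML string, then pull the `href` values out of its anchor tags. The `LinkExtractor` object and `links` method are names chosen for this example.

```scala
import org.htmlcleaner.HtmlCleaner

object LinkExtractor {
  // Clean the HTML into a tag tree, then collect every <a href="..."> value.
  def links(html: String): Seq[String] = {
    val root = new HtmlCleaner().clean(html)
    root.getElementsByName("a", true)   // true = search recursively
      .toSeq
      .flatMap(a => Option(a.getAttributeByName("href")))  // skip anchors without href
  }
}
```

Calling `LinkExtractor.links` on markup with unclosed tags should still return the links, since the cleaner repairs the tree before it is searched.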

A Scala REST 'get content' client function using Apache HttpClient

As a quick post here today, if you need a Scala REST client function, the following source code should work for you, or at least be a good starting point. I’ve been using it in several applications today, and the only thing I think it needs is the ability to set a connection timeout and socket timeout, and I share the code for that down below.
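The post's own listing is not reproduced on this page, so the following is only a sketch of such a function, written against Apache HttpClient 4.3 or newer. The `RestClient` object, the `timeouts` helper, and the 5000 ms defaults are assumptions made for this example.

```scala
import org.apache.http.client.config.RequestConfig
import org.apache.http.client.methods.HttpGet
import org.apache.http.impl.client.HttpClients
import org.apache.http.util.EntityUtils

object RestClient {
  // Build a request configuration with the given timeouts (milliseconds).
  def timeouts(connectMs: Int, socketMs: Int): RequestConfig =
    RequestConfig.custom()
      .setConnectTimeout(connectMs)
      .setSocketTimeout(socketMs)
      .build()

  // GET the URL and return the response body, or "" if there is none.
  def getContent(url: String, connectMs: Int = 5000, socketMs: Int = 5000): String = {
    val client = HttpClients.custom()
      .setDefaultRequestConfig(timeouts(connectMs, socketMs))
      .build()
    try {
      val response = client.execute(new HttpGet(url))
      try {
        // UTF-8 is used only when the response does not declare a charset.
        Option(response.getEntity)
          .map(e => EntityUtils.toString(e, "UTF-8"))
          .getOrElse("")
      } finally response.close()
    } finally client.close()
  }
}
```

A call such as `RestClient.getContent("http://localhost:8080/api", connectMs = 2000)` (placeholder URL) throws an exception if either timeout is exceeded, so callers may want to wrap it in `scala.util.Try`.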