A Perl script to parse a CSV file, skip lines, pattern match

This is a little Perl script I wrote to parse a CSV file I periodically download from Google AdSense. It does the following things:

  • Opens the CSV file
  • Skips through a bunch of lines until it gets to the first line I’m interested in
  • In the rest of the file it skips other lines that I'm not interested in
  • In both of those cases it uses pattern matching to compare the current line in the file to the desired pattern
  • For the lines it keeps, it extracts the fields from each line using split, then prints those fields as CSV fields
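
The steps in that list can be tried on their own, without a real AdSense file. The following is only a minimal sketch: the helper name keep_lines, the sample rows, and the column names in the sample header are made up for illustration and are not taken from an actual AdSense export. Unlike the full script, it discards the header line explicitly instead of letting a later test reject it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): return only the
# lines that come after the first line matching $start_re, minus any line
# that matches one of the patterns in @$skip_res.
sub keep_lines {
    my ($lines, $start_re, $skip_res) = @_;
    my @kept;
    my $skipping = 1;
    LINE: for my $line (@$lines) {
        if ($skipping) {
            # Phase 1: discard everything up to and including the start line
            $skipping = 0 if $line =~ $start_re;
            next LINE;
        }
        # Phase 2: discard unwanted lines in the rest of the input
        for my $re (@$skip_res) {
            next LINE if $line =~ $re;
        }
        push @kept, $line;
    }
    return @kept;
}

# Made-up sample data, only for illustration
my @sample = (
    "Report title",
    "Some preamble line",
    "Page,Revenue,Clicks,Impressions,CTR,eCPM",
    "/perl/foo,1.50,3,100,3.00%,15.00",
    "/search/bar,0.10,5,200,2.50%,0.50",
    "# a comment line",
    "/java/baz,2.00,4,80,5.00%,25.00",
);

my @kept = keep_lines(\@sample, qr/^Page/, [qr/^#/, qr/search/]);
for my $line (@kept) {
    # Split the kept line into fields and print them comma-separated
    my @fields = split /,/, $line;
    print join(', ', @fields), "\n";
}
```

Passing the patterns in as qr// objects keeps the loop generic, so the same helper could be pointed at a different start marker or skip list.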

Given that brief introduction, here is a Perl script that I use to process Google AdSense CSV files:

#!/usr/bin/perl

# A program to parse a Google AdSense CSV file and convert it to
# a better format, with all of the undesirable lines removed
#
# Usage: ./parse.pl GoogleCsvFile.csv > BetterGoogleCsvFile.csv

use strict;
use warnings;

# exactly one argument (the CSV file name) is required
die "Usage: $0 GoogleCsvFile.csv\n" if @ARGV != 1;

my $file = $ARGV[0];
open(my $fh, '<', $file) or die "Could not open $file: $!";

my $do_skip_test = 1;
while (my $line = <$fh>)
{
    if ($do_skip_test)
    {
        # keep skipping until the header line that starts with "Page"
        next unless $line =~ /^Page/;

        # got to the starting point, don't skip lines any more;
        # the header line itself is not data, so skip it as well
        $do_skip_test = 0;
        next;
    }

    # skip these lines, i don't want/need them
    next if $line =~ /^#/;
    next if $line =~ /search/;
    next if $line =~ /\/node/;
    next if $line =~ /jwarehouse/;

    # remove the newline before splitting, so the last field is clean
    chomp($line);
    my ($uri, $rev, $clicked, $imps, $ctr, $ecpm) = split /,/, $line;

    # ignore blank or incomplete lines, then apply the thresholds
    next unless defined $ecpm;
    next if ($clicked < 2);
    next if ($imps < 50);

    print "$uri, $rev, $clicked, $imps, $ctr, $ecpm\n";
}

close($fh);
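
One caveat with a plain split on commas: it breaks if a field is wrapped in double quotes and contains a comma itself. Whether the AdSense export ever does that is an assumption on my part, but if it does, the core Text::ParseWords module can handle the simple cases. This is a minimal sketch, and csv_fields is a hypothetical helper name, not something from the script above.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical helper: split one CSV line into fields, honoring
# double-quoted fields that contain commas.
sub csv_fields {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # drop the line ending (Unix or Windows)
    # parse_line(delimiter, keep_quotes, text); 0 means strip the quotes
    return parse_line(',', 0, $line);
}

# Made-up sample line: the first field contains a comma inside quotes
my @fields = csv_fields(qq{"/perl/a,b",1.50,3,100,3.00%,15.00\n});
print scalar(@fields), " fields, first is $fields[0]\n";
```

This is not a full CSV parser. Text::ParseWords also treats single quotes and backslashes specially, so a field containing an apostrophe or a doubled quote may not survive intact. For arbitrary CSV input, the Text::CSV module from CPAN is the usual choice.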

Summary

In summary, I hope this example is helpful if you are looking for a Perl script that processes Google AdSense CSV files (or CSV files in general), loops over every line in a file, skips lines using pattern matching, or splits a line of text into CSV fields.
