Perl split function - how to read a CSV file

Perl FAQ: How do I read a CSV file in Perl?

Problem: You have a file that contains columns of data, and each column is separated by a comma (a CSV file). The width of each column can vary, but the column data is guaranteed not to contain any commas itself, so it's safe to think of the comma as truly being a column separator. So, how do you read the data?

Solution: The Perl split function

A comma-separated data file looks like this:

1, 2, 3, 4
22, 3.5,1777, 90120
1, 22, 333, 4444

You can easily split the data into four fields using the Perl split function. The following example shows how this works.

A split function example

Here’s a simple Perl program that reads a file named input.csv and splits each line it reads into separate fields (so this is also a Perl split string example):

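use strict;
use warnings;

my $file = 'input.csv';

# three-argument open with a lexical filehandle
open(my $fh, '<', $file) or die "Could not open $file: $!";

while (my $line = <$fh>)
{
  chomp $line;   # remove the trailing newline
  my ($field1,$field2,$field3,$field4) = split ',', $line;
  print "$field1 : $field2 : $field3 : $field4\n";
}

close($fh);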

Perl split function - discussion

In our Perl program, the line that looks like this:

($field1,$field2,$field3,$field4) = split ',', $line;

does the work of splitting each line of data (each string) into four separate fields. Simply stated, this line says: "Using the Perl split function, take the current line of input, represented by $line, break it apart wherever a comma appears, and assign the resulting list of values to $field1, $field2, $field3, and $field4."
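Because split returns a list, it helps to see exactly what that list contains. The first argument to split is treated as a regular expression, so ',' matches only the comma and leaves any space that follows it at the start of the next field. Here's a small sketch using the second line of the sample data; the /\s*,\s*/ pattern is an alternative I'm suggesting here, not part of the original program, and it does not trim whitespace at the very start or end of a line.

```perl
use strict;
use warnings;

# One line from the sample data (chomp has already removed the newline)
my $line = '22, 3.5,1777, 90120';

# split ',' splits on the comma only, so a space after a comma
# stays at the start of the next field
my @raw = split ',', $line;             # ('22', ' 3.5', '1777', ' 90120')

# A regex that also swallows whitespace on either side of each comma
my @trimmed = split /\s*,\s*/, $line;   # ('22', '3.5', '1777', '90120')

print join('|', @raw), "\n";            # prints: 22| 3.5|1777| 90120
print join('|', @trimmed), "\n";        # prints: 22|3.5|1777|90120
```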

Once you have the four fields, you can do whatever you want with the data. In this example I just print it, but in a real program you'll probably want to do something more useful with it.
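As one hypothetical example of "doing something else" with the fields, this sketch totals each of the four columns in the sample data. To keep it self-contained, it reads the sample data from a string through an in-memory filehandle (open on a reference to a scalar) instead of from input.csv; the @totals name and the column-summing task are my own illustration, not part of the original article.

```perl
use strict;
use warnings;

# The sample data from above, held in a string for this sketch.
# Opening a reference to a scalar gives an in-memory filehandle,
# so the loop below works the same way it would on input.csv.
my $csv = "1, 2, 3, 4\n22, 3.5,1777, 90120\n1, 22, 333, 4444\n";

open(my $fh, '<', \$csv) or die "Could not open in-memory file: $!";

my @totals = (0, 0, 0, 0);   # hypothetical: one running total per column
while (my $line = <$fh>)
{
  chomp $line;
  my @fields = split /\s*,\s*/, $line;
  # add each field to the total for its column
  $totals[$_] += $fields[$_] for 0 .. $#fields;
}
close($fh);

print join(', ', @totals), "\n";   # prints: 24, 27.5, 2113, 94568
```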

A warning about CSV files

As I note in one of my Scala books, you have to be careful with CSV files because some people have a loose definition of what “CSV” means to them. For instance, some people think this is a valid line in a CSV file:

foo, "bar, baz", bax

This is discussed further on Wikipedia. Files like this require a more complex parser than the one shown in this article, because a simple split on commas breaks the quoted field "bar, baz" into two pieces.
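One way to handle quoted fields without writing your own parser is the parse_line function from Text::ParseWords, a module that ships with Perl. This is my substitution, not something the original article uses. parse_line takes a delimiter (a regex, given here as a string), a "keep" flag (0 strips the quotes from the result), and the line to parse. It's a minimal sketch: parse_line does not follow every CSV rule (for example, it expects backslash escapes rather than doubled quotes inside a quoted field), so for files that follow the full CSV rules the CPAN module Text::CSV is the more complete option.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);   # core module, no CPAN install needed

my $line = 'foo, "bar, baz", bax';

# A plain split breaks the quoted field in two
my @naive = split /\s*,\s*/, $line;    # ('foo', '"bar', 'baz"', 'bax')

# parse_line($delimiter, $keep, $line): the delimiter is a regex,
# and $keep = 0 removes the quotes from the returned fields
my @fields = parse_line('\s*,\s*', 0, $line);   # ('foo', 'bar, baz', 'bax')

print scalar(@naive), " fields with split\n";        # prints: 4 fields with split
print scalar(@fields), " fields with parse_line\n";  # prints: 3 fields with parse_line
print join(' | ', @fields), "\n";                    # prints: foo | bar, baz | bax
```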

Comments

When a CSV file has many columns, or a varying number of them, you'll want to use a Perl array to receive the fields returned by the Perl split function. Here's an example of how to use a Perl array in this situation:

chomp $line;   # remove the trailing newline first
my @columns = split(',', $line);

foreach my $field (@columns)
{
  print "$field\n";
}
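One caveat with the array approach: by default, split drops trailing empty fields, so a row that ends with empty columns comes back shorter than you might expect. Passing a negative LIMIT as the third argument keeps them. The input line "a,b,," in this sketch is a hypothetical example, not from the article's data.

```perl
use strict;
use warnings;

# A hypothetical line whose last two columns are empty
my $line = "a,b,,";

# By default split drops trailing empty fields...
my @columns = split(',', $line);          # ('a', 'b')

# ...but a negative LIMIT keeps them
my @all_columns = split(',', $line, -1);  # ('a', 'b', '', '')

foreach my $field (@all_columns)
{
  print "[$field]\n";   # brackets make the empty fields visible
}
```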

If there are commas in your data, a simple split like the one above will not work.