Perl FAQ: How can I split a string in Perl, such as the strings in a pipe-delimited text file?
Many times you need a Perl script that can open a plain text file, and essentially treat that file as a database. Typically these files have variable-length fields and records, and the fields in each record are delimited by some special character, usually a : or | character. When processing these files, you can use the Perl split function, which I’ll demonstrate in two short programs here.
Perl split string - example #1
In this first “Perl split” example program, I’ll read all of the fields in each record into an array named @fields, and then I’ll show how to print out the first field from each row. This example shows several things, including how to split a record by the : character, which is the column delimiter in the Linux /etc/passwd file.
#!/usr/bin/perl
# perl split function example 1
# purpose: read the /etc/passwd file, whose columns are separated by ':'
# usage: perl read-passwd-file.pl
# sample /etc/passwd record:
# nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
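use strict;
use warnings;

my $filename = '/etc/passwd';
# three-argument open with a lexical filehandle; $! holds the system error message
open(my $fh, '<', $filename) or die "Could not read from $filename ($!), program halting.";
while (<$fh>)
{
    # get rid of the pesky newline character
    chomp;
    # skip blank lines and comment lines (some systems, such as macOS, start the file with them)
    next if /^\s*(?:#|$)/;
    # read the fields in the current record into an array
    my @fields = split(/:/, $_);
    # print the first field (the username)
    print "$fields[0]\n";
}
close $fh;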
As you can see from that code, each line is split on the : character, the resulting fields are stored in the @fields array, and the first field from each line is printed with the $fields[0] variable.
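To see what split returns without reading a real file, here is a minimal sketch that splits the sample record shown in the program comments. The split_record helper is a name I made up for this illustration; it is not part of the program above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper (illustration only): split one
# colon-delimited record into a list of fields
sub split_record {
    my ($record) = @_;
    chomp $record;
    return split(/:/, $record);
}

my @fields = split_record("nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n");

print scalar(@fields), " fields\n";   # number of fields in the record
print "$fields[0]\n";                 # first field (the username)
print "$fields[-1]\n";                # last field (the login shell)
```

For this record the script reports seven fields, with nobody as the first field and /usr/bin/false as the last.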
Perl split string - example #2
This second Perl split example, which also shows how to process a text file with variable-length, delimited fields, is almost identical to the first program. The only difference is the way I treat each line when I read it. Instead of reading the fields into a Perl array, I assign them to a fixed set of variables. I can use this approach because I know each record in the Linux /etc/passwd file has exactly seven fields.
The format of the /etc/passwd file is well-known, so hopefully the variable names I use here will make sense to you. The “junk” variables represent fields that I don’t care about.
#!/usr/bin/perl
# perl split function example #2
# purpose: read the /etc/passwd file, whose columns are separated by ':'
# usage: perl read-passwd-file.pl
# sample /etc/passwd record:
# nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
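use strict;
use warnings;

my $filename = '/etc/passwd';
# three-argument open with a lexical filehandle; $! holds the system error message
open(my $fh, '<', $filename) or die "Could not read from $filename ($!), program halting.";
while (<$fh>)
{
    # get rid of the pesky newline character
    chomp;
    # skip blank lines and comment lines (some systems, such as macOS, start the file with them)
    next if /^\s*(?:#|$)/;
    # read the fields in the current record as separate variables
    my ($username,$junk1,$junk2,$junk3,$description,$home,$shell) = split(/:/, $_);
    # print the interesting fields
    print "$username, $description, $home, $shell\n";
}
close $fh;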
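If you would rather not name the fields you don't care about, one alternative is to take a list slice of the value split returns. This is a sketch of a different technique from the program above, and parse_passwd_line is a hypothetical helper name used only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hypothetical helper (illustration only): return just the
# interesting fields of one /etc/passwd-style record
sub parse_passwd_line {
    my ($line) = @_;
    chomp $line;
    # indexes 0, 4, 5 and 6 are the username, description, home, and shell
    my ($username, $description, $home, $shell) = (split(/:/, $line))[0, 4, 5, 6];
    return ($username, $description, $home, $shell);
}

my ($username, $description, $home, $shell) =
    parse_passwd_line("nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n");

print "$username, $description, $home, $shell\n";
```

The slice keeps the code short, at the cost of using numeric positions in place of variable names for the skipped fields.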
The Perl split function delimiter character
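As you can see from the Perl split function examples above, I split each record using the : character as the field delimiter, because that is the delimiter the /etc/passwd file uses. As mentioned, I’ve seen other file formats use the | character as a delimiter, and of course CSV files use the “,” character. Any of those characters can be used with the split function, but keep in mind that the first argument to split is treated as a regular expression pattern, even when you write it as a quoted string. A comma works as-is, but characters that have a special meaning in a regular expression, such as | and ., must be escaped. To split on a pipe, use split(/\|/, $_), because an unescaped | matches the empty string and breaks the record into individual characters. Also note that a plain split on “,” does not handle CSV fields that contain quoted commas; for those files a CSV parsing module such as Text::CSV is a better fit.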
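Here is a minimal sketch of splitting a pipe-delimited record. The sample record is made up for this illustration. The pipe is escaped in the pattern because split treats its first argument as a regular expression, where an unescaped | means alternation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# made-up sample record, used only for illustration
my $record = "alice|Engineering|Boston";

# escape the pipe so it is matched literally
my @fields = split(/\|/, $record);
print join(", ", @fields), "\n";

# an unescaped '|' is an empty alternation that matches between
# every character, so the record is broken into single characters
my @chars = split('|', $record);
print scalar(@chars), " pieces\n";
```

The first split yields the three fields you would expect, while the second yields one piece per character of the record, which is almost never what you want.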

