Perl FAQ: How can I split a string in Perl, such as the strings in a pipe-delimited text file?
Many times you need a Perl script that can open a plain text file and essentially treat that file as a database. Typically these files have variable-length fields and records, and the fields in each record are delimited by some special character, usually a : or | character. When processing these files, you can use the Perl split function, which I’ll demonstrate in two short programs here.
Perl split string - example #1
In this first “Perl split” example program, I’ll read all of the fields in each record into an array named @fields, and then I’ll show how to print out the first field from each row. This example shows several things, including how to split a record by the : character, which is the column delimiter in the Linux /etc/passwd file.
#!/usr/bin/perl

# perl split function example 1
# purpose: read the /etc/passwd file, whose columns are separated by ':'
# usage: perl read-passwd-file.pl
# sample /etc/passwd record:
# nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false

use strict;
use warnings;

my $filename = '/etc/passwd';

# three-argument open with a lexical filehandle; $! holds the reason for a failure
open(my $fh, '<', $filename) or die "Could not read from $filename, program halting: $!";

while (<$fh>) {
    # get rid of the pesky newline character
    chomp;

    # read the fields in the current record into an array
    my @fields = split(/:/, $_);

    # print the first field (the username)
    print "$fields[0]\n";
}

close $fh;
As you can see from that code, each line is split on the : character, the resulting fields are stored in the @fields array, and the first field from each line is printed with the $fields[0] variable.
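To try the same split call without reading a file, here is a minimal sketch that works on a single in-memory string. It reuses the sample record from the program's comments; the variable name $record is an assumption introduced only for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample record taken from the comments in the example program.
my $record = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false';

# split returns a list of fields, stored here in an array.
my @fields = split(/:/, $record);

# An array in scalar context gives its element count.
print "number of fields: ", scalar(@fields), "\n";

# Index 0 is the first field; index -1 is the last one.
print "first field: $fields[0]\n";
print "last field: $fields[-1]\n";
```

Negative indexes count from the end of the array, which is handy when only the final field (the shell, in this record) is of interest.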
Perl split string - example #2
This second Perl split example, which also shows how to process a text file with variable-length, delimited fields, is almost identical to the first program. The only difference is the way I treat each line when I read it. Instead of reading each line into a Perl array, I assign its fields to a fixed list of variables. Because I know each record in the Linux /etc/passwd file has exactly seven fields, I can use this approach.
The format of the /etc/passwd file is well-known, so hopefully the variable names I use here will make sense to you. The “junk” variables represent fields that I don’t care about.
#!/usr/bin/perl

# perl split function example #2
# purpose: read the /etc/passwd file, whose columns are separated by ':'
# usage: perl read-passwd-file.pl
# sample /etc/passwd record:
# nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false

use strict;
use warnings;

my $filename = '/etc/passwd';

# three-argument open with a lexical filehandle; $! holds the reason for a failure
open(my $fh, '<', $filename) or die "Could not read from $filename, program halting: $!";

while (<$fh>) {
    # get rid of the pesky newline character
    chomp;

    # read the fields in the current record as separate variables
    my ($username, $junk1, $junk2, $junk3, $description, $home, $shell) = split(/:/, $_);

    # print the interesting fields
    print "$username, $description, $home, $shell\n";
}

close $fh;
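Two details of this list-assignment style are worth a short sketch. First, Perl accepts undef as a placeholder in the list, which skips a field without naming a throwaway variable. Second, split drops trailing empty fields by default, so a record whose last fields are empty yields fewer than seven values unless a negative LIMIT is passed as the third argument. The second record below is hypothetical and made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample record taken from the comments in the example programs.
my $record = 'nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false';

# undef in the list skips a field without a named "junk" variable.
my ($username, undef, undef, undef, $description, $home, $shell)
    = split(/:/, $record);

print "$username, $description, $home, $shell\n";

# Hypothetical record whose last three fields are empty.
my $sparse = 'guest:x:1001:1001:::';

# By default, split removes trailing empty fields ...
my @default = split(/:/, $sparse);

# ... while a negative LIMIT keeps them.
my @all = split(/:/, $sparse, -1);

print scalar(@default), " fields by default, ",
      scalar(@all), " fields with a LIMIT of -1\n";
```

Whether to pass -1 depends on the data: with fixed-width records such as /etc/passwd, keeping the empty fields means every variable in the list receives a defined (possibly empty) string.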
The Perl split function delimiter character
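As you can see from the Perl split function examples above, I split each record by using the : character as the field delimiter. This is what the /etc/passwd file uses as its delimiter, so my program also uses it. As mentioned, I’ve seen other file formats use the | character as a delimiter, and of course CSV files use the “,” character. Any of those characters can be specified with the split function, with one caveat: split treats its first argument as a regular expression, so a character that has a special meaning in a regex must be escaped. For a comma you can simply replace the : shown above with , but for the pipe character you need to write /\|/ rather than '|', because an unescaped | splits the string into individual characters. Also note that split on a comma only handles simple CSV data; fields that contain quoted commas need a real CSV parser, such as the Text::CSV module from CPAN.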
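Since split interprets its first argument as a regular expression, the | delimiter behaves differently from : unless it is escaped. Here is a minimal sketch of the difference, using a made-up pipe-delimited record; the variable names are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical pipe-delimited record, made up for this sketch.
my $line = 'Fred|Flintstone|Bedrock';

# An unescaped | is regex alternation between two empty patterns, so it
# matches between every character and yields one element per character.
my @wrong = split('|', $line);

# Escaping the pipe splits on the literal character.
my @right = split(/\|/, $line);

# \Q...\E escapes a delimiter that is held in a variable.
my $delimiter = '|';
my @also_right = split(/\Q$delimiter\E/, $line);

print scalar(@wrong), " elements without escaping\n";
print join(', ', @right), "\n";
print join(', ', @also_right), "\n";
```

The \Q...\E form is the safer choice when the delimiter comes from configuration or user input, because it escapes any regex metacharacter rather than just the pipe.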