Here's the complete source code for a simple Ruby script which performs the following tasks:
- Defines a simple Ruby Person class with the help of a Struct.
- Opens an input CSV file.
- Reads each record from the CSV file.
- Creates a Person object to represent that record.
- Adds the new Person object to an array of Person objects.
- Sorts the array by the last_name field.
- Prints the array back out (in sorted order) in CSV format.
This is actually a simplified version of a more complicated Ruby program I just wrote for a similar purpose.
I hope the program is self-documenting, so I'll just share the source code here without any further discussion. One thing to note though: This program expects to read a CSV file that has three columns that contain the last name, first name, and city of each person. To that end, I'm also sharing a simple example input file below.
#!/usr/bin/ruby

# program: SortCsvFile.rb
# usage:   ruby SortCsvFile.rb InputFilename > OutputFilename

# define a "Person" class to represent the three expected columns.
# a Person has a first name, last name, and city.
class Person < Struct.new(:first_name, :last_name, :city)

  # a method to print out a csv record for the current Person.
  # note that you can easily re-arrange columns here, if desired.
  # also note that this method compensates for blank fields.
  def print_csv_record
    last_name.length==0 ? printf(",") : printf("\"%s\",", last_name)
    first_name.length==0 ? printf(",") : printf("\"%s\",", first_name)
    city.length==0 ? printf("") : printf("\"%s\"", city)
    printf("\n")
  end
end

#------#
# MAIN #
#------#

# bail out unless we get the right number of command line arguments
unless ARGV.length == 1
  puts "Dude, not the right number of arguments."
  puts "Usage: ruby SortCsvFile.rb InputFile.csv > SortedOutputFile.csv\n"
  exit
end

# get the input filename from the command line
input_file = ARGV[0]

# define an array to hold the Person records
arr = Array.new

# loop through each record in the csv file, adding
# each record to our array.
f = File.open(input_file, "r")
f.each_line { |line|
  words = line.split(',')
  p = Person.new
  # do a little work here to get rid of double-quotes and blanks
  p.last_name = words[0].tr_s('"', '').strip
  p.first_name = words[1].tr_s('"', '').strip
  p.city = words[2].tr_s('"', '').strip
  arr.push(p)
}
# close the input file now that every record has been read
f.close

# sort the data by the last_name field
arr.sort! { |a,b| a.last_name.downcase <=> b.last_name.downcase }

# print out all the sorted records (just print to stdout)
arr.each { |p| p.print_csv_record }
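If the cleanup step inside the loop isn't obvious, here's a minimal sketch of what happens to a single line of input. Note that parse_person is a hypothetical helper I made up for this illustration (it isn't in the script), and I'm using String#delete to remove the double quotes, which should give the same result here as the tr_s call. Like the script, it assumes every line has at least three comma-separated fields.

```ruby
# Person mirrors the three fields used by the script.
Person = Struct.new(:first_name, :last_name, :city)

# parse_person is a hypothetical helper, not part of the original script.
# It assumes the line holds at least three comma-separated fields, in the
# order last name, first name, city.
def parse_person(line)
  # split on commas, then drop double quotes and surrounding whitespace
  last, first, city = line.split(',').map { |field| field.delete('"').strip }
  Person.new(first, last, city)
end

person = parse_person('"Rubble", "Barney", "Bedrock"' + "\n")
puts person.last_name   # Rubble
puts person.first_name  # Barney
puts person.city        # Bedrock
```

Keep in mind that a plain split(',') like this will break on a field that contains a comma inside the quotes, so it only suits simple files like the sample one.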
Sample CSV input file
Here's the sample CSV input file I used to test this program. Note that the first column has the last name for each person, and they are not in sorted order.
"Rubble", "Barney", "Bedrock" "Rubble", "Betty", "Bedrock" "Flinstone", "Wilma", "Bedrock" "Flinstone", "Fred", "Bedrock" "Simpson", "Homer", "Springfield" "Simpson", "Bart", "Springfield"
When I run my Ruby program, like this
./SortCsvFile.rb people.csv
or like this
ruby SortCsvFile.rb people.csv
the sorted output looks like this:
"Flinstone","Fred","Bedrock" "Flinstone","Wilma","Bedrock" "Rubble","Barney","Bedrock" "Rubble","Betty","Bedrock" "Simpson","Homer","Springfield" "Simpson","Bart","Springfield"
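Note that the data is sorted by the first column only. Ruby's sort is not guaranteed to be stable, so people who share a last name can come out in any order relative to each other. I'll demonstrate how to sort an array (and a CSV file) by multiple columns in a future tutorial.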
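In the meantime, here's a minimal sketch of one way a two-column sort could look, using sort_by with an array as the sort key. This is just an illustration under my own assumptions, not the code from the larger program: sort_people is a hypothetical helper, and the people are hard-coded instead of being read from a file.

```ruby
# Person mirrors the three fields used by the script.
Person = Struct.new(:first_name, :last_name, :city)

# sort_people is a hypothetical helper, not part of the original script.
# sort_by compares the key arrays element by element, so first_name only
# matters when two last names are equal (ignoring case).
def sort_people(people)
  people.sort_by { |p| [p.last_name.downcase, p.first_name.downcase] }
end

people = [
  Person.new("Homer", "Simpson", "Springfield"),
  Person.new("Bart",  "Simpson", "Springfield"),
  Person.new("Wilma", "Flinstone", "Bedrock"),
  Person.new("Fred",  "Flinstone", "Bedrock")
]

sort_people(people).each { |p| puts "#{p.last_name}, #{p.first_name}" }
# prints:
#   Flinstone, Fred
#   Flinstone, Wilma
#   Simpson, Bart
#   Simpson, Homer
```

Because the first name acts as a tie-breaker, the order no longer depends on how the sort happens to treat equal last names.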
The Ruby ternary operator
Thanks to a note from a recent reader, I'm reminded that this tutorial demonstrates the use of the Ruby ternary operator. If you're not used to that syntax, please see my Ruby ternary operator tutorial and examples.
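As a quick refresher, here's a minimal sketch of the pattern that print_csv_record relies on: condition ? value_if_true : value_if_false. The csv_field method is a hypothetical helper I'm using just for this example. It returns the formatted string instead of printing it, which is a small departure from the script.

```ruby
# csv_field is a hypothetical helper, not part of the original script.
# The ternary picks one of two values based on the condition:
#   condition ? value_if_true : value_if_false
def csv_field(value)
  # a blank field becomes an empty string; anything else gets double quotes
  value.empty? ? "" : "\"#{value}\""
end

puts csv_field("Bedrock")  # prints "Bedrock" (including the double quotes)
puts csv_field("")         # prints an empty line
```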