How to open and sort a CSV file with Ruby

Here's the complete source code for a simple Ruby script which performs the following tasks:

  1. Defines a simple Ruby Person class with the help of a Struct.
  2. Opens an input CSV file.
  3. Reads each record from the CSV file.
  4. Creates a Person object to represent that record.
  5. Adds the new Person object to an array of Person objects.
  6. Sorts the array by the last_name field.
  7. Prints the array back out (in sorted order) in CSV format.

This is actually a simplified version of a more complicated Ruby program I just wrote for a similar purpose.

I hope the program is self-documenting, so I'll just share the source code here without any further discussion. One thing to note, though: this program expects to read a CSV file with three columns containing the last name, first name, and city of each person. To show that format, I've also included a simple example input file below.

#!/usr/bin/env ruby

# program: SortCsvFile.rb
# usage:   ruby SortCsvFile.rb InputFilename > OutputFilename

# define a "Person" class to represent the three expected columns.
# a Person has a first name, last name, and city.
class Person < Struct.new(:first_name, :last_name, :city)

  # a method to print out a csv record for the current Person.
  # note that you can easily re-arrange columns here, if desired.
  # also note that this method compensates for blank fields by
  # printing an empty column rather than a pair of quotes.
  def print_csv_record
    last_name.empty?  ? printf(",") : printf("\"%s\",", last_name)
    first_name.empty? ? printf(",") : printf("\"%s\",", first_name)
    city.empty?       ? printf("")  : printf("\"%s\"", city)
    printf("\n")
  end
end

#------#
# MAIN #
#------#

# bail out unless we get the right number of command line arguments.
# write the messages to stderr so they don't end up in the
# redirected output file.
unless ARGV.length == 1
  $stderr.puts "Dude, not the right number of arguments."
  $stderr.puts "Usage: ruby SortCsvFile.rb InputFile.csv > SortedOutputFile.csv"
  exit 1
end

# get the input filename from the command line
input_file = ARGV[0]

# define an array to hold the Person records
arr = Array.new

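# loop through each record in the csv file, adding each record
# to our array. File.foreach closes the file when it's done.
File.foreach(input_file) do |line|
  # skip blank lines, such as an empty line at the end of the file
  next if line.strip.empty?
  words = line.split(',')
  person = Person.new
  # do a little work here to get rid of double-quotes and blanks
  # (to_s turns a missing column into an empty string)
  person.last_name  = words[0].to_s.delete('"').strip
  person.first_name = words[1].to_s.delete('"').strip
  person.city       = words[2].to_s.delete('"').strip
  arr.push(person)
end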

# sort the data by the last_name field
arr.sort! { |a,b| a.last_name.downcase <=> b.last_name.downcase }

# print out all the sorted records (just print to stdout)
arr.each { |person|
  person.print_csv_record
}
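Because the script splits each line on commas, it can't handle a quoted field that itself contains a comma (such as "Smith, Jr."). Here's a minimal sketch of the same sort using Ruby's standard csv library, which does handle that case. It's an alternative approach, not the program above, and the sort_people_csv method name is my own. One caveat: the csv library is strict about quoting, so it raises CSV::MalformedCSVError on lines like the sample file below, where a space sits between each comma and the next opening quote. This sketch assumes input with no spaces between fields.

```ruby
# A sketch of the same task using Ruby's standard 'csv' library
# instead of splitting each line on commas by hand.
require 'csv'

# Parses three-column CSV text (last name, first name, city),
# sorts the rows by last name (case-insensitive), and returns
# the sorted rows as CSV text with every field quoted.
def sort_people_csv(csv_text)
  rows = CSV.parse(csv_text, skip_blanks: true)
  # turn empty (nil) fields into "" and trim surrounding blanks
  rows = rows.map { |row| row.map { |field| field.to_s.strip } }
  sorted = rows.sort_by { |row| row[0].downcase }
  sorted.map { |row| CSV.generate_line(row, force_quotes: true) }.join
end

# when run directly with one filename, print the sorted CSV to stdout
if __FILE__ == $PROGRAM_NAME && ARGV.length == 1
  print sort_people_csv(File.read(ARGV[0]))
end
```

Like the original script, this sorts only by the last name column.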

Sample CSV input file

Here's the sample CSV input file I used to test this program. Note that the first column has the last name for each person, and the rows are not in sorted order.

"Rubble", "Barney", "Bedrock"
"Rubble", "Betty", "Bedrock"
"Flinstone", "Wilma", "Bedrock"
"Flinstone", "Fred", "Bedrock"
"Simpson", "Homer", "Springfield"
"Simpson", "Bart", "Springfield"

When I run my Ruby program like this (after making it executable with chmod +x SortCsvFile.rb)

./SortCsvFile.rb people.csv

or like this

ruby SortCsvFile.rb people.csv

the sorted output looks like this:

"Flinstone","Fred","Bedrock"
"Flinstone","Wilma","Bedrock"
"Rubble","Barney","Bedrock"
"Rubble","Betty","Bedrock"
"Simpson","Homer","Springfield"
"Simpson","Bart","Springfield"

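Note that the data is sorted only by the first column (last name), so people who share a last name are not also sorted by first name. Ruby's sort! isn't guaranteed to be stable either, so the order of those rows relative to each other may not match the input file. I'll demonstrate how to sort an array (and a CSV file) by multiple columns in a future tutorial.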
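Until then, here's a minimal sketch of one common way to do it: sort_by with an array key, so rows are compared by last name first and by first name only when the last names match. It assumes a Person struct with the same three fields as the script; the sort_by_last_then_first name and the sample data are just for illustration.

```ruby
# Same three fields as the Person class in the script.
Person = Struct.new(:first_name, :last_name, :city)

# Sort by last name, then by first name (both case-insensitive).
# Arrays compare element by element, so the second element only
# matters when the first elements are equal.
def sort_by_last_then_first(people)
  people.sort_by { |person| [person.last_name.downcase, person.first_name.downcase] }
end

people = [
  Person.new("Homer", "Simpson", "Springfield"),
  Person.new("Bart", "Simpson", "Springfield"),
  Person.new("Wilma", "Flinstone", "Bedrock"),
  Person.new("Fred", "Flinstone", "Bedrock")
]

sort_by_last_then_first(people).each do |person|
  puts [person.last_name, person.first_name, person.city].join(",")
end
# prints:
# Flinstone,Fred,Bedrock
# Flinstone,Wilma,Bedrock
# Simpson,Bart,Springfield
# Simpson,Homer,Springfield
```

Adding a third sort column (say, city) is just a matter of adding a third element to the key array.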

The Ruby ternary operator

Thanks to a note from a recent reader, I'm reminded that this tutorial demonstrates the Ruby ternary operator, which the print_csv_record method uses to handle blank fields. If you're not used to that syntax, please see my Ruby ternary operator tutorial and examples.
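As a quick refresher, the ternary operator has the form condition ? value_if_true : value_if_false. This sketch shows a hypothetical csv_field helper that quotes a non-empty value the way print_csv_record does, next to the equivalent if/else version.

```ruby
# Ternary form: condition ? value_if_true : value_if_false
def csv_field(value)
  value.empty? ? "" : "\"#{value}\""
end

# The same logic written as an if/else expression.
def csv_field_if_else(value)
  if value.empty?
    ""
  else
    "\"#{value}\""
  end
end

puts csv_field("Bedrock")          # prints "Bedrock" (with the quotes)
puts csv_field_if_else("Bedrock")  # prints the same thing
```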