How to sort an array of Ruby objects by multiple class fields

In a previous tutorial I wrote about how to sort an array of Ruby objects by one field in the object. In today's tutorial I'd like to demonstrate how to sort an array of Ruby objects by multiple attributes (or fields) of the objects in the array.

Sorting a Ruby array by two or more class attributes may sound difficult, but as a friend of mine likes to say, "It's not too hard once you know how to do it."

Some sample data

Before we get into the array-sorting code, the first thing we'll need is some good raw data for sorting. Here are the contents of a sample CSV file I've been working with for my sorting tests.

"Rubble", "Barney", "Bedrock"
"Rubble", "Betty", "Bedrock"
"Flinstone", "Wilma", "Bedrock"
"Flinstone", "Fred", "Bedrock"
"Simpson", "Homer", "Springfield"
"Simpson", "Bart", "Springfield"
"MadeUp", "Bart", "Springfield"

I added that last line so I could test my sorting algorithm by three fields, which I'll get to shortly. Now let's take a look at some Ruby code.

Assumptions

For all of the following examples, please assume that these records have been loaded into an array named people, and the elements in this array are in the order shown above. For completeness I'll share all of the code later, but for now it's easier to skip all those details.

Also, assume that we have a Ruby class named Person which is used to hold each of those records. My Person class specifically contains the following fields:

last_name
first_name
city

Given those assumptions, let's look at some examples of sorting a Ruby array of objects by multiple fields.
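
If you'd like to follow along in irb without a CSV file, the assumptions above can be sketched roughly as follows. This is only a minimal stand-in for the complete program: it defines Person with Struct and fills the people array by hand, in the same order as the sample data.

```ruby
# A minimal stand-in for the setup assumed above (not the complete program).
# Person holds the three fields listed in the article.
Person = Struct.new(:last_name, :first_name, :city)

# The records, in the same order as the sample CSV data.
people = [
  Person.new("Rubble",    "Barney", "Bedrock"),
  Person.new("Rubble",    "Betty",  "Bedrock"),
  Person.new("Flinstone", "Wilma",  "Bedrock"),
  Person.new("Flinstone", "Fred",   "Bedrock"),
  Person.new("Simpson",   "Homer",  "Springfield"),
  Person.new("Simpson",   "Bart",   "Springfield"),
  Person.new("MadeUp",    "Bart",   "Springfield")
]

puts people.length            # prints 7
puts people.first.last_name   # prints Rubble
```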

Sort by last_name, then first_name

My first example shows how to sort this array by two attributes (fields) of the Person class: last_name, and then first_name.

If you've never sorted a Ruby array by multiple attributes before, you may be thinking that it's very hard, but thanks to the sort_by method of the Enumerable module, it's not hard at all.

Here's the code needed to sort this array of Person objects by last_name, and then by first_name:

people = people.sort_by { |a| [ a.last_name, a.first_name ] }

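As you can see, all you have to do is pass sort_by a block that returns an array of the values to sort on, in order of priority. Ruby compares those arrays element by element, so first_name is only consulted when two last_name values are equal. After performing this sort operation, when I print the people array, like this: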

people.each { |p|
  p.print_csv_record
}

I get the sorted output, which looks like this:

"Flinstone","Fred","Bedrock"
"Flinstone","Wilma","Bedrock"
"MadeUp","Bart","Springfield"
"Rubble","Barney","Bedrock"
"Rubble","Betty","Bedrock"
"Simpson","Bart","Springfield"
"Simpson","Homer","Springfield"

Again, that sort was by last name, and then first name.
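
The reason an array works as a sort key is that Ruby compares arrays element by element with the <=> operator, moving on to the next element only when the earlier ones are equal. Here is a small sketch of that behavior, using plain arrays of strings in place of Person objects:

```ruby
# Arrays are compared element by element; later elements only break ties.

# Different last names: the first element decides ("Flinstone" < "Rubble").
different_last = ["Flinstone", "Wilma"] <=> ["Rubble", "Barney"]

# Same last name: the second element decides ("Barney" < "Betty").
same_last = ["Rubble", "Barney"] <=> ["Rubble", "Betty"]

# Identical keys compare as equal.
identical = ["Rubble", "Betty"] <=> ["Rubble", "Betty"]

puts different_last   # prints -1
puts same_last        # prints -1
puts identical        # prints 0
```

A result of -1 means the left-hand array sorts first, 0 means the two are equal, and 1 means the right-hand array sorts first.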

Sort by first_name, then last_name

As a second example, let's sort the same data, this time by first_name, and then by last_name. Here's the code for this multiple-attribute array sorting algorithm:

people = people.sort_by { |a| [ a.first_name, a.last_name ] }

And here's the sorted output from this algorithm:

"Rubble","Barney","Bedrock"
"MadeUp","Bart","Springfield"
"Simpson","Bart","Springfield"
"Rubble","Betty","Bedrock"
"Flinstone","Fred","Bedrock"
"Simpson","Homer","Springfield"
"Flinstone","Wilma","Bedrock"

Again, that data is sorted by first name, and then last name. Note specifically that "Bart MadeUp" appears in the sorted output before "Bart Simpson". They both have the same first name, but the MadeUp last name comes before Simpson, which is correct.

Three-attribute sorting: Sort by city, last_name, first_name

As a final example, here is the code to sort my array of Ruby Person objects by three attributes: city, then last_name, and then first_name:

people = people.sort_by { |a| [ a.city, a.last_name, a.first_name ] }

By now you're used to the syntax, and this is no big deal.

Here's the output from this sorting code, showing Bedrock before Springfield, then "Bart MadeUp" appearing before "Bart Simpson", which again is correct:

"Flinstone","Fred","Bedrock"
"Flinstone","Wilma","Bedrock"
"Rubble","Barney","Bedrock"
"Rubble","Betty","Bedrock"
"MadeUp","Bart","Springfield"
"Simpson","Bart","Springfield"
"Simpson","Homer","Springfield"

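To show what the array key is doing, the same three-attribute ordering can also be written with the sort method and an explicit <=> comparison. This is only a sketch for comparison, using a few hand-built records; the Person definition here is a stand-in for the one in the complete program.

```ruby
# Stand-in Person class and a few records, deliberately out of order.
Person = Struct.new(:last_name, :first_name, :city)

people = [
  Person.new("Simpson", "Bart",  "Springfield"),
  Person.new("Rubble",  "Betty", "Bedrock"),
  Person.new("MadeUp",  "Bart",  "Springfield")
]

# sort_by: the block returns one key array per object.
by_key = people.sort_by { |a| [a.city, a.last_name, a.first_name] }

# sort: the block compares the key arrays of two objects directly.
by_cmp = people.sort do |a, b|
  [a.city, a.last_name, a.first_name] <=> [b.city, b.last_name, b.first_name]
end

puts by_key == by_cmp                       # prints true
puts by_key.map(&:last_name).join(", ")     # prints Rubble, MadeUp, Simpson
```

Both forms give the same order here. sort_by is usually the more readable choice, because the list of attributes only has to be written once.
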
Comments

I have to tell you, I went into this research not knowing what to expect, but it turned out to be pretty easy, didn't it? I never cease to be amazed by the simplicity of the Ruby programming language.

The complete Ruby array-sorting source code

If you'd like to run some tests for yourself, here is the complete Ruby source code that I used to perform these tests. You can use this source code, and the sample data shown earlier, to run your own tests.

# Program: SortCsvMultipleFields.rb
# Usage:   ruby SortCsvMultipleFields.rb InputFilename > OutputFilename
# Author:  alvin alexander, devdaily.com

# define a "Person" class to represent the columns in the CSV file
class Person < Struct.new(:first_name, :last_name, :city)

  # a method to print out a csv record for the current Person.
  # note that you can easily re-arrange columns here, if desired.
  def print_csv_record
    last_name.length==0 ? printf(",") : printf("\"%s\",", last_name)
    first_name.length==0 ? printf(",") : printf("\"%s\",", first_name)
    city.length==0 ? printf("") : printf("\"%s\"", city)
    printf("\n")
  end
end

#------#
# MAIN #
#------#

# bail out unless we get the right number of command line arguments
unless ARGV.length == 1
  # write to stderr so the message doesn't end up in the redirected output file
  warn "Dude, not the right number of arguments."
  warn "Usage: ruby SortCsvMultipleFields.rb InputFile.csv > SortedOutputFile.csv"
  exit 1
end

# get the input filename from the command line
input_file = ARGV[0]
people = Array.new

# loop through each record in the csv file, adding
# each record to our array. File.foreach closes the
# file for us when the loop finishes.
File.foreach(input_file) { |line|
  words = line.split(',')
  p = Person.new
  p.last_name = words[0].tr_s('"', '').strip
  p.first_name = words[1].tr_s('"', '').strip
  p.city = words[2].tr_s('"', '').strip
  people.push(p)
}

# (1) sort by last_name, then first_name
people = people.sort_by { |a| [ a.last_name, a.first_name ] }

# (2) sort by first_name, then last_name
#people = people.sort_by { |a| [ a.first_name, a.last_name ] }

# (3a) sort by city, last_name, then first_name
#people = people.sort_by { |a| [ a.city, a.last_name, a.first_name ] }

# (3b) alternate syntax: Struct members can also be read with []
#people = people.sort_by { |a| [ a[:city], a[:last_name], a[:first_name] ] }

# print out all the sorted records
people.each { |p|
  p.print_csv_record
}