Ignoring Case When Sorting Strings
Problem
When you sort a list of strings, the strings beginning with uppercase letters sort before the strings beginning with lowercase letters.
list = ["Albania", "anteater", "zorilla", "Zaire"] list.sort # => ["Albania", "Zaire", "anteater", "zorilla"]
You want an alphabetical sort, regardless of case.
Solution
Use Array#sort_by. This is both the fastest and the shortest solution.
list.sort_by { |x| x.downcase } # => ["Albania", "anteater", "Zaire", "zorilla"]
Discussion
The Array#sort_by method was introduced in Recipe 4.5, but it's worth discussing in detail because it's so useful. It uses a technique called a Schwartzian Transform. This common technique is like writing the following Ruby code (but it's a lot faster, because it's implemented in C):
list.collect { |s| [s.downcase, s] }. sort.collect { |subarray| subarray[1] }
It works like this: Ruby creates a new array containing two-element subarrays. Each subarray contains a value of String#downcase, along with the original string. This new array is sorted, and then the original strings (now sorted by their values for String#downcase) are recovered from the subarrays. String#downcase is called only once for each string.
A sort is the most common occurance of this pattern, but it shows up whenever an algorithm calls a particular method on the same objects over and over again. If you're not sorting, you can't use Ruby's internal Schwartzian Transform, but you can save time by caching, or memoizing, the results of each distinct method call.
If you need to implement a Schwartzian Transform in Ruby, it's faster to use a hash than an array:
m = {} list.sort { |x,y| (m[x] ||= x.downcase) <=> (m[y] ||= y.downcase) }
This technique is especially important if the method you need to call has side effects. You certainly don't want to call such methods more than once!
See Also
- The Ruby FAQ, question 9.15
- Recipe 4.5, "Sorting an Array"