Making Sure a Sorted Array Stays Sorted
Problem
You want to make sure an array stays sorted, even as you replace its elements or add new elements to it.
Solution
Subclass Array and override the methods that add items to the array. The new implementations add every new item to a position that maintains the sortedness of the array.
As you can see below, there are a lot of these methods. If you can guarantee that a particular method will never be called, you can get away with not overriding it.
class SortedArray < Array def initialize(*args, &sort_by) @sort_by = sort_by || Proc.new { |x,y| x <=> y } super(*args) sort! &sort_by end def insert(i, v) # The next line could be further optimized to perform a # binary search. insert_before = index(find { |x| @sort_by.call(x, v) == 1 }) super(insert_before ? insert_before : -1, v) end def <<(v) insert(0, v) end alias push << alias unshift <<
Some methods, like collect!, can modify the items in an array, taking them out of sort order. Some methods, like flatten!, can add new elements to strange places in an array. Rather than figuring out a way to implement these methods in a way that preserves the sortedness of the array, we'll just let them run and then re-sort the array.[1]
[1] We can't use define_method to define these methods because in Ruby 1.8 you can't use define_method to create a method that takes a block argument. See Chapter 10 for more on this.
["collect!", "flatten!", "[]="].each do |method_name| module_eval %{ def #{method_name}(*args) super sort! &@sort_by end } end def reverse! #Do nothing; reversing the array would disorder it. end end
A SortedArray created from an unsorted array will end up sorted:
a = SortedArray.new([3,2,1]) # => [1, 2, 3]
Discussion
Many methods of Array are much faster on sorted arrays, so it's often useful to expend some overhead on keeping an array sorted over time. Removing items from a sorted array won't unsort it, but adding or modifying items can. Keeping a sorted array sorted means intercepting and reimplementing every sneaky way of putting objects into the array.
The SortedArray constructor accepts any code block you can pass into Array#sort, and keeps the array sorted according to that code block. The default code block uses the comparison operator (<=>) used by sort.
unsorted= ["b", "aa", "a", "cccc", "1", "zzzzz", "k", "z"] strings_by_alpha = SortedArray.new(unsorted) # => ["1", "a", "aa", "b", "cccc", "k", "z", "zzzzz"] strings_by_length = SortedArray.new(unsorted) do |x,y| x.length <=> y.length end # => ["b", "z", "a", "k", "1", "aa", "cccc", "zzzzz"]
The methods that add elements to an array specify where in the array they operate: push operates on the end of the array, and insert operates on a specified spot. SortedArray responds to these methods but it ignores the caller's request to put elements in a certain place. Every new element is inserted into a position that keeps the array sorted.
a << -1 # => [-1, 1, 2, 3] a << 1.5 # => [-1, 1, 1.5, 2, 3] a.push(2.5) # => [-1, 1, 1.5, 2, 2.5, 3] a.unshift(1.6) # => [-1, 1, 1.5, 1.6, 2, 2.5, 3]
For methods like collect! and array assignment ([]=)that allow complex changes to an array, the simplest solution is to allow the changes to go through and then re-sort:
a = SortedArray.new([10, 6, 4, -4, 200, 100]) # => [-4, 4, 6, 10, 100, 200] a.collect! { |x| x * -1 } # => [-200, -100, -10, -6, -4, 4] a[3] = 25 a # => [-200, -100, -10, -4, 4, 25] # That is, -6 has been replaced by 25 and the array has been re-sorted. a[1..2] = [6000, 10, 600, 6] a # => [-200, -4, 4, 6, 10, 25, 600, 6000] # That is, -100 and -10 have been replaced by 6000, 10, 600, and 6, # and the array has been re-sorted.
But with a little more work, we can write a more efficient implementation of array assignment that gives the same behavior. What happens when you run a command like a[0]= 10 on a SortedArray? The first element in the SortedArray is replaced by 10, and the SortedArray is re-sorted. This is equivalent to removing the first element in the array, then adding the value 10 to a place in the array that keeps it sorted.
Array#[]= implements three different types of array assignment, but all three can be modeled as a series of removals followed by a series of insertions. We can use this fact to implement a more efficient version of SortedArray#[]=:.
class SortedArray def []=(*args) if args.size == 3 #e.g. "a[6,3] = [1,2,3]" start, length, value = args slice! Range.new(start, start+length, true) (value.respond_to? :each) ? value.each { |x| self << x } : self << value elsif args.size == 2 index, value = args if index.is_a? Numeric #e.g. "a[0] = 10" (the most common form of array assignment) delete_at(index) self << value elsif index.is_a? Range #e.g. "a[0..3] = [1,2,3]" slice! index (value.respond_to? :each) ? value.each { |x| self << x } : self << value else #Not supported. Delegate to superclass; will probably give an error. super sort!(&sort_by) end else #Not supported. Delegate to superclass; will probably give an error. super sort!(&sort_by) end end end
Just as before, the sort will be maintained even when you use array assignment to replace some of a SortedArray's elements with other objects. But this implementation doesn't have to re-sort the array every time.
a = SortedArray.new([1,2,3,4,5,6]) a[0] = 10 a # => [2, 3, 4, 5, 6, 10] a[0, 2] = [100, 200] a # => [4, 5, 6, 10, 100, 200] a[1..2] = [-4, 6] a # => [-4, 4, 6, 10, 100, 200]
It's possible to subvert the sortedness of a SortedArray by modifying an object in place in a way that changes its sort order. Since the SortedArray never hears about the change to this object, it has no way of updating itself to move that object to its new sort position:[2]
[2] One alternative is to modify SortedArray[] so that when you look up an element of the array, you actually get a delegate object that intercepts all of the element's method calls, and re-sorts the array whenever the user calls a method that modifies the element in place. This is probably overkill.
stripes = SortedArray.new(["aardwolf", "zebrafish"]) stripes[1].upcase! stripes # => ["aardwolf", "ZEBRAFISH"] stripes.sort! # => ["ZEBRAFISH", "aardwolf"]
If this bothers you, you can make a SortedArray keep frozen copies of objects instead of the objects themselves. This solution hurts performance and uses more memory, but it will also prevent objects from being modified after being put into the SortedArray. This code adds a convenience method to Object that makes a frozen copy of the object:
class Object def to_frozen f = self unless frozen? begin f = dup.freeze rescue TypeError #This object can't be duped (e.g. Fixnum); fortunately, #it usually can't be modified either end end return f end end
The FrozenCopySortedArray stores frozen copies of objects instead of the objects themselves:
class FrozenCopySortedArray < SortedArray def insert(i, v) insert_before = index(find { |x| x > v }) super(insert_before ? insert_before : -1, v.to_frozen) end ["initialize", "collect!", "flatten!"].each do |method_name| define_method(method_name) do super each_with_index { |x, i| self[i] = x.to_frozen } # No need to sort; by doing an assignment to every element # in the array, we've made #insert keep the array sorted. end end end stripes = SortedArray.new(["aardwolf", "zebrafish"]) stripes[1].upcase! # TypeError: can't modify frozen string
Unlike a regular array, which can have elements of arbitrarily different data classes, all the elements of a SortedArray must be mutually comparable. For instance, you can mix integers and floating-point numbers within a SortedArray, but you can't mix integers and strings. Any data set that would cause Array# sort to fail makes an invalid SortedArray:
[1, "string"].sort # ArgumentError: comparison of Fixnum with String failed a = SortedArray.new([1]) a << "string" # ArgumentError: comparison of Fixnum with String failed
One other pitfall: operations that create a new object, such as |=, +=, and to_a will turn an SortedArray into a (possibly unsorted) array.
a = SortedArray.new([3, 2, 1]) # => [1, 2, 3] a += [1, -10] # => [1, 2, 3, 1, -10] a.class # => Array
The simplest way to avoid this is to override these methods to transform the resulting array back into a SortedArray:
class SortedArray def + (other_array) SortedArray.new(super) end end
See Also
- Recipe 4.11, "Getting the N Smallest Items of an Array," uses a SortedArray
- If you're going to do a lot of insertions and removals, a red-black tree may be faster than a SortedArray; you can choose from a pure Ruby implementation (http://www.germane-software.com/software/Utilities/RBTree/) and one that uses a C extension for speed (http://www.geocities.co.jp/SiliconValley-PaloAlto/3388/rbtree/README.html)