Computing Set Operations on Arrays
Problem
You want to find the union, intersection, difference, or Cartesian product of two arrays, or the complement of a single array with respect to some universe.
Solution
Array objects have overloaded arithmetic and logical operators to provide the three simplest set operations:
#Union [1,2,3] | [1,4,5] # => [1, 2, 3, 4, 5] #Intersection [1,2,3] & [1,4,5] # => [1] #Difference [1,2,3] - [1,4,5] # => [2, 3]
Set objects overload the same operators, as well as the exclusive-or operator (^).If you already have Arrays, though, it's more efficient to deconstruct the XOR operation into its three component operations.
require 'set' a = [1,2,3] b = [3,4,5] a.to_set ^ b.to_set # => # (a | b) - (a & b) # => [1, 2, 4, 5]
Discussion
Set objects are intended to model mathematical sets: where arrays are ordered and can contain duplicate entries, Sets model an unordered collection of unique items. Set not only overrides operators for set operations, it provides English-language aliases for the three most common operators: Set#union, Set#intersection, and Set#difference. An array can only perform a set operation on another array, but a Set can perform a set operation on any Enumerable.
array = [1,2,3] set = [3,4,5].to_s array & set # => TypeError: can't convert Set into Array set & array # => #
You might think that Set objects would be optimized for set operations, but they're actually optimized for constant-time membership checks (internally, a Set is based on a hash). Set union is faster when the left-hand object is a Set object, but intersection and difference are significantly faster when both objects are arrays. It's not worth it to convert arrays into Sets just so you can say you performed set operations on Set objects.
The union and intersection set operations remove duplicate entries from arrays. The difference operation does not remove duplicate entries from an array except as part of a subtraction.
[3,3] & [3,3] # => [3] [3,3] | [3,3] # => [3] [1,2,3,3] - [1] # => [2, 3, 3] [1,2,3,3] - [3] # => [1, 2] [1,2,3,3] - [2,2,3] # => [1]
Complement
If you want the complement of an array with respect to some small universe, create that universe and use the difference operation:
u = [:red, :orange, :yellow, :green, :blue, :indigo, :violet] a = [:red, :blue] u - a # => [:orange, :yellow, :green, :indigo, :violet]
More often, the relevant universe is infinite (the set of natural numbers)or extremely large (the set of three-letter strings). The best strategy here is to define a generator and use it to iterate through the complement. Be sure to break when you're done; you don't want to iterate over an infinite set.
def natural_numbers_except(exclude) exclude_map = {} exclude.each { |x| exclude_map[x] = true } x = 1 while true yield x unless exclude_map[x] x = x.succ end end natural_numbers_except([2,3,6,7]) do |x| break if x > 10 puts x end # 1 # 4 # 5 # 8 # 9 # 10
Cartesian product
To get the Cartesian product of two arrays, write a nested iteration over both lists and append each pair of items to a new array. This code is attached to Enumerable so you can also use it with Sets or any other Enumerable.
module Enumberable def cartesian(other) res = [] each { |x| other.each { |y| res << [x, y] } } return res end end [1,2,3].cartesian(["a",5,6]) # => [[1, "a"], [1, 5], [1, 6], # [2, "a"], [2, 5], [2, 6], # [3, "a"], [3, 5], [3, 6]
This version uses Enumerable#inject to make the code more concise; however, the original version is more efficient.
module Enumerable def cartesian(other) inject([]) { |res, x| other.inject(res) { |res, y| res << [x,y] } } end end
See Also
- See Recipe 2.5, "Generating Random Numbers," for an example (constructing a deck of cards from suits and ranks)that could benefit from a function to calculate the Cartesian product
- Recipe 2.10, "Multiplying Matrices"