Multiplying Matrices
Problem
You want to turn arrays of arrays of numbers into mathematical matrices, and multiply the matrices together.
Solution
You can create Matrix objects from arrays of arrays, and multiply them together with the * operator:
require 'matrix'

a1 = [[1, 1, 0, 1], [2, 0, 1, 2], [3, 1, 1, 2]]
m1 = Matrix[*a1]
# => Matrix[[1, 1, 0, 1], [2, 0, 1, 2], [3, 1, 1, 2]]

a2 = [[1, 0], [3, 1], [1, 0], [2, 2.5]]
m2 = Matrix[*a2]
# => Matrix[[1, 0], [3, 1], [1, 0], [2, 2.5]]

m1 * m2
# => Matrix[[6, 3.5], [7, 5.0], [11, 6.0]]
Note the unusual syntax for creating a Matrix object: you pass the rows of the matrix into the array indexing operator, not into Matrix#new (which is private).
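If the splat syntax bothers you, the standard library also provides Matrix.rows, which takes the array of arrays directly. A quick sketch showing the two constructors are equivalent:

```ruby
require 'matrix'

rows = [[1, 2], [3, 4]]

# Matrix[] splats each row in as a separate argument...
m1 = Matrix[*rows]

# ...while Matrix.rows accepts the whole array of arrays.
m2 = Matrix.rows(rows)

m1 == m2 # => true
```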
Discussion
Ruby's Matrix class overloads the arithmetic operators to support all the basic matrix arithmetic operations, including multiplication, between matrices of compatible dimension. If you perform an arithmetic operation on incompatible matrices, you'll get an ExceptionForMatrix::ErrDimensionMismatch.
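For instance, a 2 x 3 matrix can't be multiplied by another 2 x 3 matrix, because the left-hand side's column count (3) doesn't match the right-hand side's row count (2). A minimal sketch of catching the resulting exception:

```ruby
require 'matrix'

a = Matrix[[1, 2, 3], [4, 5, 6]]
begin
  a * a
rescue ExceptionForMatrix::ErrDimensionMismatch => e
  # The dimensions are incompatible, so the multiplication raises.
  puts "Caught #{e.class}"
end
```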
Multiplying one matrix by another is simple enough, but multiplying a chain of matrices together can be faster or slower depending on the order in which you do the multiplications. This follows from the fact that multiplying a matrix with dimensions K x M by a matrix with dimensions M x N requires K * M * N operations and gives a matrix with dimensions K x N. If K is large for some matrix, you can save time by postponing the multiplications involving that matrix until the end.
Consider three matrices A, B, and C, which you want to multiply together. A has 100 rows and 20 columns. B has 20 rows and 10 columns. C has 10 rows and one column.
Since matrix multiplication is associative, you'll get the same results whether you multiply A by B and then the result by C, or multiply B by C and then the result by A. But multiplying A by B requires 20,000 operations (100 * 20 * 10), and multiplying (AB) by C requires another 1,000 (100 * 10 * 1). Multiplying B by C only requires 200 operations (20 * 10 * 1), and multiplying the result by A requires 2,000 more (100 * 20 * 1). It's almost 10 times faster to multiply A(BC) instead of the naive order of (AB)C.
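The cost arithmetic above can be checked with a couple of lines of Ruby (the cost lambda is just an illustration, not part of the recipe's code):

```ruby
# Cost of multiplying a KxM matrix by an MxN matrix is K*M*N.
cost = ->(k, m, n) { k * m * n }

# (AB)C: multiply 100x20 by 20x10, then the 100x10 result by 10x1.
cost.(100, 20, 10) + cost.(100, 10, 1) # => 21000

# A(BC): multiply 20x10 by 10x1, then 100x20 by the 20x1 result.
cost.(20, 10, 1) + cost.(100, 20, 1)   # => 2200
```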
That kind of potential savings justifies doing some up-front work to find the best order for the multiplication. Here is a method that recursively figures out the most efficient multiplication order for a list of Matrix objects, and another method that actually carries out the multiplications. They share an array containing information about where to divide up the list of matrices: where to place the parentheses, if you will.
class Matrix
  def self.multiply(*matrices)
    cache = []
    matrices.size.times { cache << [nil] * matrices.size }
    best_split(cache, 0, matrices.size-1, *matrices)
    multiply_following_cache(cache, 0, matrices.size-1, *matrices)
  end
Because the methods that do the actual work pass around recursion arguments that the end user doesn't care about, I've created Matrix.multiply, a wrapper method for the methods that do the real work: Matrix.best_split and Matrix.multiply_following_cache, defined below. Matrix.multiply_following_cache assumes that the optimal way to multiply the list of Matrix objects has already been found and encoded in the cache variable. It recursively performs the matrix multiplications in the optimal order, as determined by the cache.
  def self.multiply_following_cache(cache, chunk_start, chunk_end, *matrices)
    if chunk_end == chunk_start
      # There's only one matrix in the list; no need to multiply.
      return matrices[chunk_start]
    elsif chunk_end - chunk_start == 1
      # There are only two matrices in the list; just multiply them together.
      lhs, rhs = matrices[chunk_start..chunk_end]
    else
      # There are more than two matrices in the list. Look in the
      # cache to see where the optimal split is located. Multiply
      # together all matrices to the left of the split (recursively,
      # in the optimal order) to get our equation's left-hand
      # side. Similarly for all matrices to the right of the split, to
      # get our right-hand side.
      split_after = cache[chunk_start][chunk_end][1]
      lhs = multiply_following_cache(cache, chunk_start, split_after, *matrices)
      rhs = multiply_following_cache(cache, split_after+1, chunk_end, *matrices)
    end

    # Begin debug code: this illustrates the order of multiplication,
    # showing the matrices in terms of their dimensions rather than their
    # (possibly enormous) contents.
    if $DEBUG
      lhs_dim = "#{lhs.row_size}x#{lhs.column_size}"
      rhs_dim = "#{rhs.row_size}x#{rhs.column_size}"
      cost = lhs.row_size * lhs.column_size * rhs.column_size
      puts "Multiplying #{lhs_dim} by #{rhs_dim}: cost #{cost}"
    end

    # Do a matrix multiplication of the two matrices, whether they are
    # the only two matrices in the list or whether they were obtained
    # through two recursive calls.
    return lhs * rhs
  end
Finally, here's the method that actually figures out the best way of splitting up the multiplications. It builds the cache used by the multiply_following_cache method defined above. It also uses the cache as it builds it, so that it doesn't solve the same subproblems over and over again.
  def self.best_split(cache, chunk_start, chunk_end, *matrices)
    if chunk_end == chunk_start
      cache[chunk_start][chunk_end] = [0, nil]
    end
    return cache[chunk_start][chunk_end] if cache[chunk_start][chunk_end]

    # Try splitting the chunk at each possible location and find the
    # minimum cost of doing the split there. Then pick the smallest of
    # the minimum costs: that's where the split should actually happen.
    minimum_costs = []
    chunk_start.upto(chunk_end-1) do |split_after|
      lhs_cost = best_split(cache, chunk_start, split_after, *matrices)[0]
      rhs_cost = best_split(cache, split_after+1, chunk_end, *matrices)[0]

      lhs_rows = matrices[chunk_start].row_size
      rhs_rows = matrices[split_after+1].row_size
      rhs_cols = matrices[chunk_end].column_size
      merge_cost = lhs_rows * rhs_rows * rhs_cols

      cost = lhs_cost + rhs_cost + merge_cost
      minimum_costs << cost
    end
    minimum = minimum_costs.min
    minimum_index = chunk_start + minimum_costs.index(minimum)
    return cache[chunk_start][chunk_end] = [minimum, minimum_index]
  end
end
A simple test confirms the example set of matrices spelled out earlier. Remember that we had a 100 x 20 matrix (A), a 20 x 10 matrix (B), and a 10 x 1 matrix (C). Our method should be able to figure out that it's faster to multiply A(BC) than the naive multiplication (AB)C. Since we don't care about the contents of the matrices, just the dimensions, we'll first define some helper methods that make it easy to generate matrices with specific dimensions but random contents.
class Matrix
  # Creates a randomly populated matrix with the given dimensions.
  def self.with_dimensions(rows, cols)
    a = []
    rows.times { a << []; cols.times { a[-1] << rand(10) } }
    return Matrix[*a]
  end

  # Creates an array of matrices that can be multiplied together.
  def self.multipliable_chain(*rows)
    matrices = []
    0.upto(rows.size-2) do |i|
      matrices << Matrix.with_dimensions(rows[i], rows[i+1])
    end
    return matrices
  end
end
After all that, the test is kind of anticlimactic:
# Create an array of matrices 100x20, 20x10, 10x1.
chain = Matrix.multipliable_chain(100, 20, 10, 1)

# Multiply those matrices two different ways, giving the same result.
Matrix.multiply(*chain) == (chain[0] * chain[1] * chain[2])
# Multiplying 20x10 by 10x1: cost 200
# Multiplying 100x20 by 20x1: cost 2000
# => true
We can use the Benchmark library to verify that matrix multiplication goes much faster when we do the multiplications in the right order:
# We'll generate the dimensions and contents of the matrices randomly,
# so no one can accuse us of cheating.
dimensions = []
10.times { dimensions << rand(90) + 10 }
chain = Matrix.multipliable_chain(*dimensions)

require 'benchmark'
result_1 = nil
result_2 = nil
Benchmark.bm(11) do |b|
  b.report("Unoptimized") do
    result_1 = chain[0]
    chain[1..-1].each { |c| result_1 *= c }
  end
  b.report("Optimized") { result_2 = Matrix.multiply(*chain) }
end
#                  user     system      total        real
# Unoptimized  4.350000   0.400000   4.750000 ( 11.104857)
# Optimized    1.410000   0.110000   1.520000 (  3.559470)

# Both multiplications give the same result.
result_1 == result_2
# => true
See Also
- Recipe 2.11, "Solving a System of Linear Equations," uses matrices to solve linear equations
- For more on benchmarking, see Recipe 17.13, "Benchmarking Competing Solutions"