I recently moved to Python 3.5 and noticed that the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. For example, for 3d arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b  # Python 3.5+
d = np.dot(a, b)
The @ operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot() function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
From the documentation:
matmul differs from dot in two important ways.
Multiplication by scalars is not allowed.
Stacks of matrices are broadcast together as if the matrices were elements.
The last point makes it clear that dot and matmul methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:
For matmul:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For np.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
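To connect the two results from the question, here is a minimal sketch. The names c_from_dot and stacked are mine, and the seeded generator is only there for reproducibility. It suggests that the matmul result is the "diagonal" of the dot result, i.e. the slices where the two stack indices coincide:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
a = rng.random((8, 13, 13))
b = rng.random((8, 13, 13))

c = a @ b         # (8, 13, 13): a[i] times b[i] for each i
d = np.dot(a, b)  # (8, 13, 8, 13): a[i] times b[j] for every pair (i, j)

# Pick the slices of d where both stack indices agree (i == j).
idx = np.arange(a.shape[0])
c_from_dot = d[idx, :, idx, :]

# Same result with dot, without building the large intermediate array.
stacked = np.stack([np.dot(a[i], b[i]) for i in range(a.shape[0])])

print(np.allclose(c, c_from_dot), np.allclose(c, stacked))
```

If all you need is the (8, 13, 13) result, matmul is the more direct choice, since the dot route computes 8 times as many products only to discard most of them.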
Just FYI, @ and its numpy equivalents dot and matmul are all equally fast. (Plot created with perfplot, a project of mine.)
Code to reproduce the plot:
import perfplot
import numpy

def setup(n):
    A = numpy.random.rand(n, n)
    x = numpy.random.rand(n)
    return A, x

def at(A, x):
    return A @ x

def numpy_dot(A, x):
    return numpy.dot(A, x)

def numpy_matmul(A, x):
    return numpy.matmul(A, x)

perfplot.show(
    setup=setup,
    kernels=[at, numpy_dot, numpy_matmul],
    n_range=[2 ** k for k in range(15)],
)
The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. A simple example shows clearly how the two behave differently when operating on 'stacks of matrices' or tensors.
To clarify the differences, take a 4x4 array and compute its dot product and matmul product with a 3x4x2 'stack of matrices' or tensor.
import numpy as np
fourbyfour = np.array([
    [1,2,3,4],
    [3,2,1,4],
    [5,4,6,7],
    [11,12,13,14]
])
threebyfourbytwo = np.array([
    [[2,3],[11,9],[32,21],[28,17]],
    [[2,3],[1,9],[3,21],[28,7]],
    [[2,3],[1,9],[3,21],[28,7]],
])
print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour,threebyfourbytwo)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour,threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is,
...a sum product over the last axis of a and the second-to-last of b
and how the matmul product is formed by broadcasting the 4x4 matrix across the stack.
4x4*3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4*3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
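Comparing the shapes makes the difference explicit. In this sketch, which reuses the arrays above (dot_result and matmul_result are names I picked), the two results hold the same numbers with the first two axes swapped:

```python
import numpy as np

fourbyfour = np.array([
    [1, 2, 3, 4],
    [3, 2, 1, 4],
    [5, 4, 6, 7],
    [11, 12, 13, 14],
])
threebyfourbytwo = np.array([
    [[2, 3], [11, 9], [32, 21], [28, 17]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
])

dot_result = np.dot(fourbyfour, threebyfourbytwo)
matmul_result = np.matmul(fourbyfour, threebyfourbytwo)

print(dot_result.shape)     # (4, 3, 2): rows of the 4x4 come first
print(matmul_result.shape)  # (3, 4, 2): the stack axis comes first

# Swapping the first two axes of the dot result gives the matmul result.
print(np.array_equal(dot_result.transpose(1, 0, 2), matmul_result))  # True
```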
Mathematically, I think the dot in numpy makes more sense:
dot(a,b)_{i,j,k,a,b,c} = \sum_m a_{i,j,k,m} b_{a,b,m,c}
since it gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices.
As for the matmul operation in numpy, its result consists of parts of the dot result, and it can be defined as
matmul(a,b)_{i,j,k,c} = \sum_m a_{i,j,k,m} b_{i,j,m,c}
So you can see that matmul(a,b) returns an array with a smaller shape,
which uses less memory and makes more sense in applications.
In particular, combined with broadcasting, you can get
matmul(a,b)_{i,j,k,l} = \sum_m a_{i,j,k,m} b_{j,m,l}
for example.
From the above two definitions, you can see the requirements to use those two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4)
To use dot(a,b) you need
t3=s4;
To use matmul(a,b) you need
t3=s4
t2=s2, or one of t2 and s2 is 1
t1=s1, or one of t1 and s1 is 1
Use the following piece of code to convince yourself.
import math
import numpy as np

for it in range(10000):
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)  # shape (5, 6, 2, 3)
    d = np.dot(a, b)     # shape (5, 6, 2, 6, 3)
    # print('c shape:', c.shape, 'd shape:', d.shape)
    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    # compare with a tolerance, since these are floats
                    if not math.isclose(c[i, j, k, l], d[i, j, k, j, l]):
                        print(it, i, j, k, l, c[i, j, k, l], d[i, j, k, j, l])  # you will not see them
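The same check can be written without Python loops. This is only a sketch: the einsum subscript strings are my transcription of the index relation tested by the loop above, and the flag names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
a = rng.random((5, 6, 2, 4))
b = rng.random((6, 4, 3))

c = np.matmul(a, b)  # shape (5, 6, 2, 3)
d = np.dot(a, b)     # shape (5, 6, 2, 6, 3)

# matmul: b is broadcast over a's first axis and shares the index j with a.
matmul_ok = np.allclose(c, np.einsum('ijkm,jml->ijkl', a, b))

# dot: b keeps its own leading index p, so every (j, p) pair appears.
dot_ok = np.allclose(d, np.einsum('ijkm,pml->ijkpl', a, b))

# c is the part of d where the two stack indices coincide (p == j).
diag_ok = np.allclose(c, np.einsum('ijkjl->ijkl', d))

print(matmul_ok, dot_ok, diag_ok)
```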
Here is a comparison with np.einsum to show how the indices are projected (with a and b both of shape (8, 13, 13), as in the question):
np.allclose(np.einsum('ijk,ijk->ijk', a,b), a*b) # True
np.allclose(np.einsum('ijk,ikl->ijl', a,b), a@b) # True
np.allclose(np.einsum('ijk,lkm->ijlm',a,b), a.dot(b)) # True
My experience with matmul and dot
I was constantly getting "ValueError: Shape of passed values is (200, 1), indices imply (200, 3)" when trying to use np.matmul with a pandas DataFrame. I wanted a quick workaround and found that np.dot delivers the same functionality: I don't get any error using np.dot, and I get the correct answer.
With matmul:
>>> X.shape
(200, 3)
>>> type(X)
pandas.core.frame.DataFrame
>>> w
array([0.37454012, 0.95071431, 0.73199394])
>>> YY = np.matmul(X, w)
ValueError: Shape of passed values is (200, 1), indices imply (200, 3)
With dot:
>>> YY = np.dot(X, w)  # no error message
>>> YY
array([ 2.59206877, 1.06842193, 2.18533396, 2.11366346, 0.28505879, …
>>> YY.shape
(200,)
Related
Have been completely stuck on a rather silly issue: I'm trying to compute the dot product of some attributes between objects, but keep getting a Value Error - Shape Mismatch - but the shapes are identical (2,1) and (2,1), since the arrays are just attributes of different instances of the same class:
class MyClass(object):
    def __init__(self, a, b, x, y):
        self.prop_1 = np.array((a, b))
        self.prop_2 = np.array((x, y))
where a, b, x, and y are all scalars. Then further down I'm trying
def MyFunction(Obj1, Obj2):
    results = np.dot(Obj1.prop_1 - Obj2.prop_1, Obj2.prop_2 - Obj2.prop_3)
which keeps throwing the Value Error
ValueError: shapes (2,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)
Mathematically, this dot product should be fine - but the final bit of the error message kind of suggests I have to transpose one of the arrays. I'd be very thankful for a short explanation of the numpy shape interpretation to avoid this kind of error!
EDIT:
Think I misphrased this a bit. When I initiate my objects via (case a)
a,b = np.random.rand(2)
x,y = np.random.rand(2)
MyClass(a, b, x, y)
Everything works like a charm. If instead however I initiate as (case b)
a = np.random.rand(1)
b = np.random.rand(1)
x = np.random.rand(1)
y = np.random.rand(1)
MyClass(a, b, x, y)
the dot product later on fails to work because of the shape mismatch.
I have noticed that in case b, each individual value is of shape (1,) and it's clear to me that combining two of these will result in shape (2,1) instead of shape () in case a - but why do these two ways of declaring a variable result in different shapes?
As you can tell, I'm relatively new to Python and thought this was just a neat way to perform multiple assignments. It turns out there is some further reasoning behind it, and I'd be interested to hear about that.
Part 1
The issue is that your arrays are full-blown 2-D matrices, not 1D "vectors" in the sense that np.dot understands them. To get your multiplication working, you need to either (a) flatten your column matrices into 1D vectors:
np.dot(a.reshape(-1), b.reshape(-1))
(b) set up the matrix multiplication so that the dimensions work. Remember that the dot product of two Nx1 matrices A and B is A^T B:
np.dot(a.T, b)
or (c), use np.einsum to explicitly set the dimension of the sum:
np.einsum('ij,ij->j', a, b).item()
For all of the examples using dot, you can use np.matmul (or equivalently the @ operator), or np.tensordot, because you have 2D arrays.
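As a quick sanity check, the three options can be run side by side. The sample values and the names flat, mat, and ein are just for illustration:

```python
import numpy as np

# Two (2, 1) column arrays, the shape that triggers the error.
a = np.array([[1.0], [2.0]])
b = np.array([[3.0], [4.0]])

flat = np.dot(a.reshape(-1), b.reshape(-1))  # (a) 1D vectors, scalar result
mat = np.dot(a.T, b)                         # (b) (1, 2) times (2, 1), a (1, 1) array
ein = np.einsum('ij,ij->j', a, b).item()     # (c) explicit sum over i, Python float

print(flat, mat, ein)
```

Note that option (b) leaves you with a (1, 1) array rather than a scalar, so you may still want to call .item() on it.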
In general, keep the following rules in mind when working with dot. Table cells are einsum subscripts.
                                          A
     | 1D                | 2D                  | ND                            |
-----+-------------------+---------------------+-------------------------------+
  1D | i,i->             | ij,j->i             | a...yz,z->a...y               |
-----+-------------------+---------------------+-------------------------------+
B 2D | i,ij->j           | ij,jk->ik           | a...xy,yz->a...xz             |
-----+-------------------+---------------------+-------------------------------+
  ND | y,a...xyz->a...xz | ay,b...xyz->ab...xz | a...mxy,n...wyz->a...mxn...wz |
-----+-------------------+---------------------+-------------------------------+
Basically, dot follows normal rules for matrix multiplication along the last two dimensions, but the leading dimensions are always combined. If you want the leading dimensions to be broadcast together for arrays > 2D (i.e., multiplying corresponding elements in a stack of matrices, rather than all possible combinations), use matmul or @ instead.
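Here is a small sketch of the ND/ND case next to the matmul alternative. The shapes are arbitrary choices of mine, and the subscripts are a concrete instance with one leading dimension on each side:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
A = rng.random((2, 3, 4))  # indices a, x, y
B = rng.random((5, 4, 6))  # indices n, y, z

# dot combines the leading dimensions: every A[a] meets every B[n].
d = np.dot(A, B)
print(d.shape)  # (2, 3, 5, 6)
dot_matches = np.allclose(d, np.einsum('axy,nyz->axnz', A, B))

# matmul pairs the matrices up instead, so the leading dimensions must broadcast.
B2 = rng.random((2, 4, 6))
m = A @ B2
print(m.shape)  # (2, 3, 6)
matmul_matches = np.allclose(m, np.einsum('axy,ayz->axz', A, B2))

print(dot_matches, matmul_matches)
```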
Part 2
When you initialize the inputs as a, b = np.random.rand(2), you are unpacking the two elements of the array into scalars:
>>> a, b = np.random.rand(2)
>>> a
0.595823752387523
>>> type(a)
numpy.float64
>>> a.shape
()
Note that the type is not numpy.ndarray in this case. However, when you do a = np.random.rand(1), the result is a 1D array of one element:
>>> a = np.random.rand(1)
>>> a
array([0.21983553])
>>> type(a)
numpy.ndarray
>>> a.shape
(1,)
When you create a numpy array from numpy arrays, the result is a 2D array:
>>> np.array([1, 2]).shape
(2,)
>>> np.array([np.array([1]), np.array([2])]).shape
(2, 1)
Going forward, you have two options. You can either be more careful with your inputs, or you can sanitize the array after you've created it.
You can expand the arrays that you feed in:
ab = np.random.rand(2)
xy = np.random.rand(2)
MyClass(*ab, *xy)
Or you can just flatten/ravel your arrays once you've created them:
def __init__(self, a, b, x, y):
    self.prop_1 = np.array([a, b]).ravel()
    self.prop_2 = np.array([x, y]).ravel()
You can use ...reshape(-1) instead of ...ravel().
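To see the whole chain in one place, here is a short sketch. The names s, v, and flat are mine, and it only mimics what happens inside the constructor:

```python
import numpy as np

s = np.array((0.1, 0.2))                              # built from two scalars
v = np.array((np.random.rand(1), np.random.rand(1)))  # built from two 1-element arrays
print(s.shape, v.shape)  # (2,) (2, 1)

# Two (2, 1) arrays are not aligned for dot.
dot_failed = False
try:
    np.dot(v, v)
except ValueError as err:
    dot_failed = True
    print("dot failed:", err)

# Flattening restores the 1D behaviour.
flat = v.ravel()
print(flat.shape, np.dot(flat, flat))
```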
I am trying to rotate some coordinates in numpy using a 2x2 matrix P and the coordinates internal (stored as an np.array in a row). However, I get weird behavior when calculating P @ internal. The code below reproduces the behavior:
>>> import numpy as np
>>> a = np.array([1, 0], dtype=float)
>>> c, s = np.cos(np.pi), np.sin(np.pi)
>>> p = np.matrix([[c, s], [-s, c]])
>>> b = p @ a
>>> b
matrix([[-1.0000000e+00, -1.2246468e-16]])
>>> b.shape
(1, 2)
>>> b[0].shape
(1, 2)
>>> b[0][0].shape
(1, 2)
>>> b[0][0][0].shape
(1, 2)
As can be seen, I cannot index into the matrix, and I suddenly have an extra dimension in what should be a 1D array. In the documentation for numpy it states "If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed." However, I am failing to see this behavior, instead just seeing the weird nested shape.
Why does this happen?
As you pointed out, b is a matrix. This is a deprecated subclass of ndarray that is always 2D. A matrix built from an (N,)-element vector gets a 1 prepended to its shape, turning it into a (1, N) row, which matches the (1, 2) shape of b. b[0] is also a matrix, and it is again a row. The first row of a row is still a row, so the shape stays the same no matter how many times you access the first row.
That being said, you can access individual matrix elements by using a row-column index:
>>> b[0, 0]
-1.0
TL;DR
Don't use matrix: it's deprecated and has issues/lack of support. Do p = np.array([[c, s], [-s, c]]) instead, and you will see the expected behavior.
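For comparison, here is the question's example with a plain ndarray in place of np.matrix (a minimal sketch, otherwise unchanged):

```python
import numpy as np

a = np.array([1, 0], dtype=float)
c, s = np.cos(np.pi), np.sin(np.pi)
p = np.array([[c, s], [-s, c]])  # plain ndarray instead of np.matrix

b = p @ a
print(b.shape)     # (2,): the result is 1D, as the documentation describes
print(b[0].shape)  # (): indexing now returns a scalar
```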
I have an n-by-3-by-3 numpy array A and an n-by-3 numpy array B. I'd now like to multiply every row of every one of the n 3-by-3 matrices with the corresponding scalar in B, i.e.,
import numpy as np
A = np.random.rand(10, 3, 3)
B = np.random.rand(10, 3)
for a, b in zip(A, B):
    a = (a.T * b).T
    print(a)
Can this be done without the loop as well?
You can use NumPy broadcasting to perform the elementwise multiplication in a vectorized manner: extend B to 3D by adding a singleton dimension at the end with np.newaxis (or its alias None). The implementation is then A*B[:,:,None] or simply A*B[...,None].
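A quick way to check that the broadcast version matches the loop from the question (looped and vectorized are names chosen for this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
A = rng.random((10, 3, 3))
B = rng.random((10, 3))

# Loop version from the question, collected into one array.
looped = np.stack([(a.T * b).T for a, b in zip(A, B)])

# B[..., None] has shape (10, 3, 1), so each row of each matrix
# is scaled by its own entry of B.
vectorized = A * B[..., None]

print(np.allclose(looped, vectorized))
```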
I am confused by matrix operations in Python NumPy.
It seems that the dot and outer operations don't behave like what I learned in linear algebra class.
import numpy
n = numpy.arange(-5, 6)
w = numpy.arange(-20, 21)
n.shape
w.shape
outer = numpy.outer(w, n)
outer.shape
dot = numpy.dot(n, outer.transpose())
dot.shape
Here n is an (11, 1) matrix and w is a (41, 1) matrix. I think the sizes of w and n don't match ((41, 1) outer (11, 1)).
Again, I think the dot is strange: n is an (11, 1) matrix and outer.transpose() is an (11, 41) matrix, so I think the sizes don't match either.
According to the documentation http://docs.scipy.org/doc/numpy/reference/generated/numpy.outer.html , the outer function of two row vectors A(1xn) and B(1xm) is a matrix M(nxm) - and the transpose will be of dimension mxn. This is exactly what you are seeing.
The dot product of a vector and a matrix is likewise described in the documentation: http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html#numpy.dot - where, for a 1-D first argument and a 2-D second argument, it amounts to the matrix multiplication of the row vector (first argument) with the second argument, which in your code is the transposed outer matrix.
When I print out the shapes of the various objects your code creates, I get:
n.shape: (11,)
w.shape: (41,)
outer.shape: (41, 11)
dot.shape: (41,)
Which is entirely consistent with the above. What is your confusion? What result is not what you were expecting?
What is the difference between
import numpy as np
np.dot(a,b)
and
import numpy as np
np.inner(a,b)
All the examples I tried returned the same result. Wikipedia has the same article for both?! The description of inner() says that its behavior is different in higher dimensions, but I couldn't produce any different output. Which one should I use?
numpy.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
numpy.inner:
Ordinary inner product of vectors for 1-D arrays (without complex conjugation), in higher dimensions a sum product over the last axes.
(Emphasis mine.)
As an example, consider these 2D arrays:
>>> a=np.array([[1,2],[3,4]])
>>> b=np.array([[11,12],[13,14]])
>>> np.dot(a,b)
array([[37, 40],
[85, 92]])
>>> np.inner(a,b)
array([[35, 41],
[81, 95]])
Thus, the one you should use is the one that gives the correct behaviour for your application.
Performance testing
(Note that I am testing only the 1D case, since that is the only situation where .dot and .inner give the same result.)
>>> import timeit
>>> setup = 'import numpy as np; a=np.random.random(1000); b = np.random.random(1000)'
>>> [timeit.timeit('np.dot(a,b)',setup,number=1000000) for _ in range(3)]
[2.6920320987701416, 2.676928997039795, 2.633111000061035]
>>> [timeit.timeit('np.inner(a,b)',setup,number=1000000) for _ in range(3)]
[2.588860034942627, 2.5845699310302734, 2.6556360721588135]
So maybe .inner is faster, but my machine is fairly loaded at the moment, so the timings are not consistent nor are they necessarily very accurate.
np.dot and np.inner are identical for 1-dimensional arrays, so that is probably why you aren't noticing any differences. For N-dimensional arrays, they correspond to common tensor operations.
np.inner is sometimes called a "vector product" between a higher and lower order tensor, particularly a tensor times a vector, and often leads to "tensor contraction". It includes matrix-vector multiplication.
np.dot corresponds to a "tensor product", and includes the case mentioned at the bottom of the Wikipedia page. It is generally used for multiplication of two similar tensors to produce a new tensor. It includes matrix-matrix multiplication.
If you're not using tensors, then you don't need to worry about these cases and they behave identically.
For 1- and 2-dimensional arrays, numpy.inner works by transposing the second matrix and then multiplying.
So for:
A = [[a1,b1],[c1,d1]]
B = [[a2,b2],[c2,d2]]
numpy.inner(A,B)
array([[a1*a2 + b1*b2, a1*c2 + b1*d2],
[c1*a2 + d1*b2, c1*c2 + d1*d2]])
I worked this out using examples like:
A=[[1 ,10], [100,1000]]
B=[[1,2], [3,4]]
numpy.inner(A,B)
array([[ 21, 43],
[2100, 4300]])
This also explains the behaviour in one dimension: numpy.inner([a,b],[c,d]) = ac+bd and numpy.inner([[a],[b]], [[c],[d]]) = [[ac,ad],[bc,bd]].
This is the extent of my knowledge, no idea what it does for higher dimensions.
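For higher dimensions, the NumPy documentation describes inner as a sum product over the last axes of both arguments, equivalent to np.tensordot(a, b, axes=(-1, -1)). A small sketch to check that, with shapes picked arbitrarily by me:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
A = rng.random((2, 3, 4))
B = rng.random((5, 4))

r = np.inner(A, B)
print(r.shape)  # (2, 3, 5): A's leading axes followed by B's leading axes

# Both arguments are contracted over their last axis.
same_as_tensordot = np.allclose(r, np.tensordot(A, B, axes=(-1, -1)))
same_as_einsum = np.allclose(r, np.einsum('ijk,lk->ijl', A, B))

# The 2D rule from above: transpose the second matrix, then multiply.
M = rng.random((3, 4))
N = rng.random((5, 4))
same_as_transpose = np.allclose(np.inner(M, N), M @ N.T)

print(same_as_tensordot, same_as_einsum, same_as_transpose)
```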
inner does not work as a matrix product with complex 2D arrays. Try to multiply
array([[ 1.+1.j, 2.+2.j, 3.+3.j],
[ 4.+4.j, 5.+5.j, 6.+6.j],
[ 7.+7.j, 8.+8.j, 9.+9.j]])
and its transpose
array([[ 1.+1.j, 4.+4.j, 7.+7.j],
[ 2.+2.j, 5.+5.j, 8.+8.j],
[ 3.+3.j, 6.+6.j, 9.+9.j]])
you will get
array([[ 0. +60.j, 0. +72.j, 0. +84.j],
[ 0.+132.j, 0.+162.j, 0.+192.j],
[ 0.+204.j, 0.+252.j, 0.+300.j]])
effectively multiplying the rows to rows rather than rows to columns
There is a big difference between the inner product and the dot product in higher-dimensional space. Below is an example with a 2x2 matrix and a 3x2 matrix:
x = [[a1,b1],[c1,d1]]
y = [[a2,b2],[c2,d2],[e2,f2]]
np.inner(x,y)
output = [[a1xa2+b1xb2, a1xc2+b1xd2, a1xe2+b1xf2],[c1xa2+d1xb2, c1xc2+d1xd2, c1xe2+d1xf2]]
But in the case of dot product the output shows the below error as you cannot multiply a 2x2 matrix with a 3x2.
ValueError: shapes (2,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)
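A runnable version of this comparison, with concrete numbers standing in for the symbols (the values and the dot_raised flag are mine):

```python
import numpy as np

x = np.arange(4).reshape(2, 2)  # 2x2
y = np.arange(6).reshape(3, 2)  # 3x2

# inner contracts the last axis of both arguments, so 2x2 with 3x2 is fine.
inner_result = np.inner(x, y)
print(inner_result.shape)  # (2, 3)

# dot contracts x's last axis with y's first axis: 2 != 3, so it raises.
dot_raised = False
try:
    np.dot(x, y)
except ValueError as err:
    dot_raised = True
    print("dot failed:", err)

# Transposing y makes dot agree with inner.
print(np.array_equal(inner_result, np.dot(x, y.T)))
```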
I made a quick script to practice inner and dot product math. It really helped me get a feel for the difference:
You can find the code here:
https://github.com/geofflangenderfer/practice_inner_dot