Numpy Dot Product - Shape Error with identical shape - python

I have been completely stuck on a rather silly issue: I'm trying to compute the dot product of some attributes between objects, but I keep getting a ValueError (shape mismatch) even though the shapes are identical, (2,1) and (2,1), since the arrays are just attributes of different instances of the same class:
class MyClass(object):
    def __init__(self, a, b, x, y):
        self.prop_1 = np.array((a, b))
        self.prop_2 = np.array((x, y))
where a, b, x, and y are all scalars. Then further down I'm trying
def MyFunction(Obj1, Obj2):
    results = np.dot(Obj1.prop_1 - Obj2.prop_1, Obj2.prop_2 - Obj2.prop_3)
which keeps throwing the ValueError
ValueError: shapes (2,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)
Mathematically, this dot product should be fine - but the final bit of the error message kind of suggests I have to transpose one of the arrays. I'd be very thankful for a short explanation of the numpy shape interpretation to avoid this kind of error!
EDIT:
I think I misphrased this a bit. When I instantiate my objects via (case a)
a,b = np.random.rand(2)
x,y = np.random.rand(2)
MyClass(a, b, x, y)
everything works like a charm. If, however, I instantiate as (case b)
a = np.random.rand(1)
b = np.random.rand(1)
x = np.random.rand(1)
y = np.random.rand(1)
MyClass(a, b, x, y)
the dot product later on fails to work because of the shape mismatch.
I have noticed that in case b, each individual value has shape (1,), and it's clear to me that combining two of these results in shape (2,1), whereas in case a each value has shape () and the combination has shape (2,). But why do these two ways of declaring a variable result in different shapes?
As you can tell, I'm relatively new to Python and thought this was just a neat way to perform multiple assignments. It turns out there is some further reasoning behind it, and I'd be interested to hear about that.

Part 1
The issue is that your arrays are full-blown 2-D matrices, not 1-D "vectors" in the sense that np.dot understands them. To get your multiplication working, you need to either (a) convert your matrices to actual 1-D vectors:
np.dot(a.reshape(-1), b.reshape(-1))
(b) set up the matrix multiplication so that the dimensions work. Remember that the dot product of two Nx1 matrices is AᵀB:
np.dot(a.T, b)
or (c), use np.einsum to explicitly set the dimension of the sum:
np.einsum('ij,ij->j', a, b).item()
For all of the examples using dot, you can use np.matmul (or equivalently the @ operator), or np.tensordot, because you have 2D arrays.
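A minimal sanity check of all three options, using a hypothetical pair of (2,1) arrays (not the OP's actual data):
import numpy as np

a = np.array([[1.0], [2.0]])   # shape (2, 1), like the props built from 1-element arrays
b = np.array([[3.0], [4.0]])

print(np.dot(a.reshape(-1), b.reshape(-1)))  # 11.0 -- (a): flatten both to shape (2,)
print(np.dot(a.T, b))                        # [[11.]] -- (b): (1,2) times (2,1) gives (1,1)
print(np.einsum('ij,ij->j', a, b).item())    # 11.0 -- (c): sum over i, then extract the scalar
print((a.T @ b).item())                      # 11.0 -- the matmul/@ spelling of option (b)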
In general, keep the following rules in mind when working with dot. Table cells are einsum subscripts, with A as the first argument and B as the second:

                                  A
      |        1D         |         2D          |              ND               |
   ---+-------------------+---------------------+-------------------------------+
   1D | i,i->             | ij,j->i             | a...yz,z->a...y               |
   ---+-------------------+---------------------+-------------------------------+
 B 2D | i,ij->j           | ij,jk->ik           | a...xy,yz->a...xz             |
   ---+-------------------+---------------------+-------------------------------+
   ND | y,a...xyz->a...xz | ay,b...xyz->ab...xz | a...mxy,n...wyz->a...mxn...wz |
   ---+-------------------+---------------------+-------------------------------+
Basically, dot follows the normal rules for matrix multiplication along the last two dimensions, but the leading dimensions are always combined. If you want the leading dimensions to be broadcast together for arrays > 2D (i.e., multiplying corresponding elements in a stack of matrices, rather than all possible combinations), use matmul or @ instead.
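A short sketch of that difference, with a made-up stack of three 2x2 matrices:
import numpy as np

a = np.random.rand(3, 2, 2)
b = np.random.rand(3, 2, 2)

print(np.dot(a, b).shape)  # (3, 2, 3, 2): dot pairs every matrix in a with every matrix in b
print((a @ b).shape)       # (3, 2, 2): matmul multiplies corresponding matrices only

# matmul's result sits on the "diagonal" of dot's result:
idx = np.arange(3)
print(np.allclose(a @ b, np.dot(a, b)[idx, :, idx, :]))  # True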
Part 2
When you initialize the inputs as a, b = np.random.rand(2), you are unpacking the two elements of the array into scalars:
>>> a, b = np.random.rand(2)
>>> a
0.595823752387523
>>> type(a)
numpy.float64
>>> a.shape
()
Note that the type is not numpy.ndarray in this case. However, when you do a = np.random.rand(1), the result is a 1D array of one element:
>>> a = np.random.rand(1)
>>> a
array([0.21983553])
>>> type(a)
numpy.ndarray
>>> a.shape
(1,)
When you create a numpy array from numpy arrays, the result is a 2D array:
>>> np.array([1, 2]).shape
(2,)
>>> np.array([np.array([1]), np.array([2])]).shape
(2, 1)
Going forward, you have two options. You can either be more careful with your inputs, or you can sanitize the array after you've created it.
You can unpack the arrays that you feed in:
ab = np.random.rand(2)
xy = np.random.rand(2)
MyClass(*ab, *xy)
Or you can just flatten/ravel your arrays once you've created them:
def __init__(self, a, b, x, y):
    self.prop_1 = np.array([a, b]).ravel()
    self.prop_2 = np.array([x, y]).ravel()
You can use .reshape(-1) instead of .ravel().
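Putting Part 1 and Part 2 together, a minimal sketch (assuming the ravel-based __init__ above) showing that both initialization styles now yield shape (2,) and a scalar dot product:
import numpy as np

class MyClass:
    def __init__(self, a, b, x, y):
        self.prop_1 = np.array([a, b]).ravel()  # collapses a (2,1) accident back to (2,)
        self.prop_2 = np.array([x, y]).ravel()

obj1 = MyClass(*np.random.rand(2), *np.random.rand(2))   # case a: unpacked scalars
obj2 = MyClass(np.random.rand(1), np.random.rand(1),
               np.random.rand(1), np.random.rand(1))     # case b: 1-element arrays

print(obj1.prop_1.shape, obj2.prop_1.shape)  # (2,) (2,)
print(np.dot(obj1.prop_1 - obj2.prop_1, obj1.prop_2 - obj2.prop_2))  # a plain scalar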

Related

Element wise divide like MATLAB's ./ operator?

I am trying to normalize some Nx3 data. If X is a Nx3 array and D is a Nx1 array, in MATLAB, I can do
Y = X./D
If I do the following in Python, I get an error
X = np.random.randn(100,3)
D = np.linalg.norm(X,axis=1)
Y = X/D
ValueError: operands could not be broadcast together with shapes (100,3) (100,)
Any suggestions?
Edit: Thanks to dm2.
Y = X/D.reshape((100,1))
Another way is to use scikit-learn:
from sklearn import preprocessing
Y = preprocessing.normalize(X)
From the numpy documentation on array broadcasting:
When operating on two arrays, NumPy compares their shapes
element-wise. It starts with the trailing (i.e. rightmost) dimensions
and works its way left. Two dimensions are compatible when
they are equal, or
one of them is 1
Both of your arrays have the same first dimension, but your X array is 2-dimensional, while your D array is 1-dimensional, which means the shapes of these two arrays do not meet the requirements to be broadcast together.
To make sure they do, you could reshape your D array into a 2-dimensional array of shape (100,1), which would satisfy the requirements to broadcast: rightmost dimensions are 3 and 1 (one of them is 1) and the other dimensions are equal (100 and 100).
So:
Y = X/D.reshape((-1,1))
or
Y = X/D.reshape((100,1))
or
Y = X/D[:,np.newaxis]
Should give you the result you're after.
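A quick way to convince yourself the reshaped division does what was asked: every row of Y should come out with unit norm.
import numpy as np

X = np.random.randn(100, 3)
D = np.linalg.norm(X, axis=1)
Y = X / D[:, np.newaxis]                            # (100, 3) broadcast against (100, 1)

print(np.allclose(np.linalg.norm(Y, axis=1), 1.0))  # True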

Does np.dot automatically transpose vectors?

I am trying to calculate the first and second order moments for a portfolio of stocks (i.e. expected return and standard deviation).
expected_returns_annual
Out[54]:
ticker
adj_close  CNP     0.091859
           F      -0.007358
           GE      0.095399
           TSLA    0.204873
           WMT    -0.000943
dtype: float64
type(expected_returns_annual)
Out[55]: pandas.core.series.Series
weights = np.random.random(num_assets)
weights /= np.sum(weights)
returns = np.dot(expected_returns_annual, weights)
So normally the expected return is calculated by
(x1,...,xn' * (R1,...,Rn)
where x1,...,xn are weights with the constraint that all the weights have to sum up to 1, and ' means that the vector is transposed.
Now I am wondering a bit about the numpy dot function, because
returns = np.dot(expected_returns_annual, weights)
and
returns = np.dot(expected_returns_annual, weights.T)
give the same results.
I also tested the shapes of weights.T and weights:
weights.shape
Out[58]: (5,)
weights.T.shape
Out[59]: (5,)
The shape of weights.T should be (,5) and not (5,), but numpy displays them as equal (I also tried np.transpose, but there is the same result)
Does anybody know why numpy behaves this way? My impression is that np.dot automatically reshapes the vectors the right way so that the vector product works. Is that correct?
Best regards
Tom
The semantics of np.dot are not great
As Dominique Paul points out, np.dot has very heterogeneous behavior depending on the shapes of the inputs. Adding to the confusion, as the OP points out in his question, given that weights is a 1D array, np.array_equal(weights, weights.T) is True (array_equal tests for equality of both value and shape).
Recommendation: use np.matmul or the equivalent @ instead
If you are someone just starting out with Numpy, my advice to you would be to ditch np.dot completely. Don't use it in your code at all. Instead, use np.matmul, or the equivalent operator @. The behavior of @ is more predictable than that of np.dot, while still being convenient to use. For example, you would get the same dot product for the two 1D arrays you have in your code like so:
returns = expected_returns_annual @ weights
You can prove to yourself that this gives the same answer as np.dot with this assert:
assert expected_returns_annual @ weights == expected_returns_annual.dot(weights)
Conceptually, @ handles this case by promoting the two 1D arrays to appropriate 2D arrays (though the implementation doesn't necessarily do this). For example, if you have x with shape (N,) and y with shape (M,), if you do x @ y the shapes will be promoted such that:
x.shape == (1, N)
y.shape == (M, 1)
Complete behavior of matmul/@
Here's what the docs have to say about matmul/@ and the shapes of inputs/outputs:
If both arguments are 2-D they are multiplied like conventional matrices.
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
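Each of those four rules is easy to verify with throwaway arrays:
import numpy as np

A = np.random.rand(2, 3)
B = np.random.rand(3, 4)
S = np.random.rand(5, 2, 3)   # a stack of five 2x3 matrices
v = np.random.rand(3)

print((A @ B).shape)  # (2, 4): plain 2-D matrix multiplication
print((S @ B).shape)  # (5, 2, 4): B is broadcast across the stack
print((v @ B).shape)  # (4,): v promoted to (1, 3), prepended 1 removed afterwards
print((A @ v).shape)  # (2,): v promoted to (3, 1), appended 1 removed afterwards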
Notes: the arguments for using @ over dot
As hpaulj points out in the comments, np.array_equal(x.dot(y), x @ y) for all x and y that are 1D or 2D arrays. So why do I (and why should you) prefer @? I think the best argument for using @ is that it helps to improve your code in small but significant ways:
@ is explicitly a matrix multiplication operator. x @ y will raise an error if y is a scalar, whereas dot will make the assumption that you actually just wanted elementwise multiplication. This can potentially result in a hard-to-localize bug in which dot silently returns a garbage result (I've personally run into that one). Thus, @ allows you to be explicit about your own intent for the behavior of a line of code.
Because @ is an operator, it has some nice short syntax for coercing various sequence types into arrays, without having to explicitly cast them. For example, [0,1,2] @ np.arange(3) is valid syntax.
To be fair, while [0,1,2].dot(arr) is obviously not valid, np.dot([0,1,2], arr) is valid (though more verbose than using @).
When you do need to extend your code to deal with many matrix multiplications instead of just one, the ND cases for @ are a conceptually straightforward generalization/vectorization of the lower-D cases.
I had the same question some time ago. It seems that when one of your matrices is one dimensional, then numpy will figure out automatically what you are trying to do.
The documentation for the dot function has a more specific explanation of the logic applied:
If both a and b are 1-D arrays, it is inner product of vectors
(without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using
matmul or a # b is preferred.
If either a or b is 0-D (scalar), it is equivalent to multiply and
using numpy.multiply(a, b) or a * b is preferred.
If a is an N-D array and b is a 1-D array, it is a sum product over
the last axis of a and b.
If a is an N-D array and b is an M-D array (where M>=2), it is a sum product over the last axis of a and the second-to-last axis of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
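Each of those rules in miniature:
import numpy as np

a1 = np.array([1.0, 2.0, 3.0])
b1 = np.array([4.0, 5.0, 6.0])
A2 = np.random.rand(2, 3)
B2 = np.random.rand(3, 4)
T3 = np.random.rand(5, 2, 3)

print(np.dot(a1, b1))        # 32.0: 1-D with 1-D is the inner product
print(np.dot(A2, B2).shape)  # (2, 4): 2-D with 2-D is matrix multiplication
print(np.dot(a1, 2.0))       # [2. 4. 6.]: a scalar operand behaves like multiply
print(np.dot(T3, a1).shape)  # (5, 2): sum product over the last axis of T3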
In NumPy, a transpose .T reverses the order of dimensions, which means that it doesn't do anything to your one-dimensional array weights.
This is a common source of confusion for people coming from Matlab, in which one-dimensional arrays do not exist. See Transposing a NumPy Array for some earlier discussion of this.
np.dot(x,y) has complicated behavior on higher-dimensional arrays, but its behavior when it's fed two one-dimensional arrays is very simple: it takes the inner product. If we wanted to get the equivalent result as a matrix product of a row and column instead, we'd have to write something like
(x @ y[:, np.newaxis]).item()
adding a trailing dimension to y to turn it into a "column", multiplying, and then converting our one-element array back into a scalar. But np.dot(x, y) is much faster and more efficient, so we just use that.
Edit: actually, this was dumb on my part. You can, of course, just write the matrix multiplication x @ y to get equivalent behavior to np.dot for one-dimensional arrays, as tel's excellent answer points out.
The shape of weights.T should be (,5) and not (5,),
suggests some confusion over the shape attribute. shape is an ordinary Python tuple, i.e. just a tuple of numbers, one for each dimension of the array. That's analogous to the size of a MATLAB matrix.
(5,) is just the way Python displays a 1-element tuple. The comma is required because of Python's older history of using () as simple grouping.
In [22]: tuple([5])
Out[22]: (5,)
Thus the , in (5,) does not have a special numpy meaning, and
In [23]: (,5)
File "<ipython-input-23-08574acbf5a7>", line 1
(,5)
^
SyntaxError: invalid syntax
A key difference between numpy and MATLAB is that arrays can have any number of dimensions (up to 32). MATLAB has a lower bound of 2.
The result is that a 5-element numpy array can have shapes (5,), (1,5), (5,1), (1,5,1), etc.
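The same five numbers can wear any of those shapes:
import numpy as np

w = np.arange(5)
print(w.shape)                   # (5,)
print(w.reshape(1, 5).shape)     # (1, 5): a "row"
print(w.reshape(5, 1).shape)     # (5, 1): a "column"
print(w.reshape(1, 5, 1).shape)  # (1, 5, 1): still the same five numbers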
The handling of a 1d weight array in your example is best explained by the np.dot documentation. Describing it as an inner product seems clear enough to me. But I'm also happy with the
sum product over the last axis of a and the second-to-last axis of b
description, adjusted for the case where b has only one axis.
(5,) with (5,n) => (n,) # 5 is the common dimension
(n,5) with (5,) => (n,)
(n,5) with (5,1) => (n,1)
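Those three cases, checked directly:
import numpy as np

n = 4
v = np.random.rand(5)

print(np.dot(v, np.random.rand(5, n)).shape)                # (4,)
print(np.dot(np.random.rand(n, 5), v).shape)                # (4,)
print(np.dot(np.random.rand(n, 5), v.reshape(5, 1)).shape)  # (4, 1)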
In:
(x1,...,xn' * (R1,...,Rn)
are you missing a )?
(x1,...,xn)' * (R1,...,Rn)
And does the * mean matrix product, not elementwise product (.* in MATLAB)? (R1,...,Rn) would have size (n,1) and (x1,...,xn)' size (1,n), so the product would be (1,1).
By the way, that raises another difference. MATLAB expands dimensions to the right, (n,1,1,...); numpy expands them to the left, (1,1,n) (if needed by broadcasting). The initial dimensions are the outermost ones. That's not as critical a difference as the lower bound of 2 dimensions, but it shouldn't be ignored.
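A small illustration of that left-expansion (hypothetical arrays, not from the question):
import numpy as np

a = np.ones((2, 3))
b = np.ones(3)         # shape (3,)

print((a + b).shape)   # (2, 3): b is treated as (1, 3), the new axis goes on the left
# a + np.ones(2) would raise an error: numpy will not append a dimension
# to make (2,) behave like (2, 1)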

Difference between numpy dot() and Python 3.5+ matrix multiplication @

I recently moved to Python 3.5 and noticed the new matrix multiplication operator (@) sometimes behaves differently from the numpy dot operator. For example, for 3d arrays:
import numpy as np
a = np.random.rand(8,13,13)
b = np.random.rand(8,13,13)
c = a @ b  # Python 3.5+
d = np.dot(a, b)
The @ operator returns an array of shape:
c.shape
(8, 13, 13)
while the np.dot() function returns:
d.shape
(8, 13, 8, 13)
How can I reproduce the same result with numpy dot? Are there any other significant differences?
The @ operator calls the array's __matmul__ method, not dot. This method is also present in the API as the function np.matmul.
>>> a = np.random.rand(8,13,13)
>>> b = np.random.rand(8,13,13)
>>> np.matmul(a, b).shape
(8, 13, 13)
From the documentation:
matmul differs from dot in two important ways.
Multiplication by scalars is not allowed.
Stacks of matrices are broadcast together as if the matrices were elements.
The last point makes it clear that dot and matmul methods behave differently when passed 3D (or higher dimensional) arrays. Quoting from the documentation some more:
For matmul:
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
For np.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b
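To answer the "how can I reproduce the same result" part concretely, here is one sketch: matmul's output is the diagonal of dot's output, pairing the i-th matrix of a with the i-th matrix of b, and einsum can express the same contraction:
import numpy as np

a = np.random.rand(8, 13, 13)
b = np.random.rand(8, 13, 13)

c = a @ b         # (8, 13, 13)
d = np.dot(a, b)  # (8, 13, 8, 13)

idx = np.arange(8)
print(np.allclose(c, d[idx, :, idx, :]))                # True
print(np.allclose(c, np.einsum('ijk,ikl->ijl', a, b)))  # True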
Just FYI, @ and its numpy equivalents dot and matmul are all equally fast. (Plot created with perfplot, a project of mine.)
Code to reproduce the plot:
import perfplot
import numpy
def setup(n):
A = numpy.random.rand(n, n)
x = numpy.random.rand(n)
return A, x
def at(A, x):
return A # x
def numpy_dot(A, x):
return numpy.dot(A, x)
def numpy_matmul(A, x):
return numpy.matmul(A, x)
perfplot.show(
setup=setup,
kernels=[at, numpy_dot, numpy_matmul],
n_range=[2 ** k for k in range(15)],
)
The answer by @ajcr explains how dot and matmul (invoked by the @ symbol) differ. By looking at a simple example, one clearly sees how the two behave differently when operating on 'stacks of matrices' or tensors.
To clarify the differences, take a 4x4 array and return the dot product and matmul product with a 3x4x2 'stack of matrices' or tensor.
import numpy as np

fourbyfour = np.array([
    [1, 2, 3, 4],
    [3, 2, 1, 4],
    [5, 4, 6, 7],
    [11, 12, 13, 14]
])
threebyfourbytwo = np.array([
    [[2, 3], [11, 9], [32, 21], [28, 17]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
    [[2, 3], [1, 9], [3, 21], [28, 7]],
])

print('4x4*3x4x2 dot:\n {}\n'.format(np.dot(fourbyfour, threebyfourbytwo)))
print('4x4*3x4x2 matmul:\n {}\n'.format(np.matmul(fourbyfour, threebyfourbytwo)))
The products of each operation appear below. Notice how the dot product is,
...a sum product over the last axis of a and the second-to-last of b
and how the matrix product is formed by broadcasting the matrices together.
4x4*3x4x2 dot:
[[[232 152]
[125 112]
[125 112]]
[[172 116]
[123 76]
[123 76]]
[[442 296]
[228 226]
[228 226]]
[[962 652]
[465 512]
[465 512]]]
4x4*3x4x2 matmul:
[[[232 152]
[172 116]
[442 296]
[962 652]]
[[125 112]
[123 76]
[228 226]
[465 512]]
[[125 112]
[123 76]
[228 226]
[465 512]]]
In mathematics, I think the dot in numpy makes more sense, since it can be defined as

dot(a,b)_{i,j,k,a,b,c} = Σ_m a_{i,j,k,m} b_{a,b,m,c}

It gives the dot product when a and b are vectors, or the matrix multiplication when a and b are matrices.
As for the matmul operation in numpy, it consists of parts of the dot result, and it can be defined as

matmul(a,b)_{i,j,k,c} = Σ_m a_{i,j,k,m} b_{i,j,m,c}

So, you can see that matmul(a,b) returns an array with a smaller shape, which has smaller memory consumption and makes more sense in applications. In particular, combining with broadcasting, you can get

matmul(a,b)_{i,j,k,l} = Σ_m a_{i,j,k,m} b_{j,m,l}

for example.
From the above two definitions, you can see the requirements to use those two operations. Assume a.shape=(s1,s2,s3,s4) and b.shape=(t1,t2,t3,t4)
To use dot(a,b) you need
t3=s4;
To use matmul(a,b) you need
t3=s4
t2=s2, or one of t2 and s2 is 1
t1=s1, or one of t1 and s1 is 1
Use the following piece of code to convince yourself:
import numpy as np

for it in range(10000):
    a = np.random.rand(5, 6, 2, 4)
    b = np.random.rand(6, 4, 3)
    c = np.matmul(a, b)
    d = np.dot(a, b)
    # print('c shape:', c.shape, 'd shape:', d.shape)
    for i in range(5):
        for j in range(6):
            for k in range(2):
                for l in range(3):
                    if not c[i, j, k, l] == d[i, j, k, j, l]:
                        print(it, i, j, k, l)  # you will not see any mismatches
Here is a comparison with np.einsum to show how the indices are projected (assuming a and b are 3-D arrays of the same shape, e.g. (8, 13, 13)):
np.allclose(np.einsum('ijk,ijk->ijk', a, b), a * b)       # True
np.allclose(np.einsum('ijk,ikl->ijl', a, b), a @ b)       # True
np.allclose(np.einsum('ijk,lkm->ijlm', a, b), a.dot(b))   # True
My experience with MATMUL and DOT
I was constantly getting "ValueError: Shape of passed values is (200, 1), indices imply (200, 3)" when trying to use MATMUL. I wanted a quick workaround and found DOT to deliver the same functionality. I don't get any error using DOT. I get the correct answer
with MATMUL
X.shape
>>>(200, 3)
type(X)
>>>pandas.core.frame.DataFrame
w
>>>array([0.37454012, 0.95071431, 0.73199394])
YY = np.matmul(X,w)
>>> ValueError: Shape of passed values is (200, 1), indices imply (200, 3)
with DOT
YY = np.dot(X,w)
# no error message
YY
>>>array([ 2.59206877, 1.06842193, 2.18533396, 2.11366346, 0.28505879, …
YY.shape
>>> (200, )
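The "indices imply" wording comes from pandas, not numpy, which suggests the real culprit here is passing a DataFrame to matmul (this is an assumption; the original X isn't shown). One workaround is to hand matmul a plain ndarray:
import numpy as np
import pandas as pd

X = pd.DataFrame(np.random.rand(200, 3))  # hypothetical stand-in for the original data
w = np.array([0.37454012, 0.95071431, 0.73199394])

YY = np.matmul(X.to_numpy(), w)           # bypass the DataFrame machinery entirely
print(YY.shape)                           # (200,)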

Divide an array of arrays by an array of scalars

If I have an array, A, with shape (n, m, o) and an array, B, with shape (n, m), is there a way to divide each array at A[n, m] by the scalar at B[n, m] without a list comprehension?
>>> A.shape
(4,173,1469)
>>> B.shape
(4,173)
>>> # Better way to do:
>>> np.array([[A[i, j] / B[i, j] for j in range(len(B[i]))] for i in range(len(B))])
The problem with a list comprehension is that it is slow, it doesn't return an array (so you have to np.array(_) it, which makes it even slower), it is hard to read, and the whole point of numpy was to move loops from Python to C++ or Fortran.
If A were of shape (n,) and B a scalar (of shape ()), then this would be trivial: A / B. But this property does not scale with dimensions:
>>> A / B
ValueError: operands could not be broadcast together with shapes (4,173,1469) (4,173)
I am looking for a fast way to do this (preferably not by tiling B to an array of shape (n, m, o), and preferably using native numpy tools).
You are absolutely right, there is a better way; I think you are getting the spirit of numpy.
The solution in your case is to add a new dimension to B that consists of one entry in that dimension: if your A is of shape (n,m,o), your B has to be of shape (n,m,1), and then you can use native broadcasting to get your operation "A/B" done.
You can add that dimension to B with np.newaxis:
import numpy as np
A = np.ones((10, 5, 3))
B = np.ones((10, 5))
Result = A / B[:, :, np.newaxis]
B[:, :, np.newaxis] --> this turns B into an array of shape (10, 5, 1)
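A check against the question's own list comprehension, using the shapes from the question:
import numpy as np

A = np.random.rand(4, 173, 1469)
B = np.random.rand(4, 173)

fast = A / B[:, :, np.newaxis]
slow = np.array([[A[i, j] / B[i, j] for j in range(B.shape[1])]
                 for i in range(B.shape[0])])
print(np.allclose(fast, slow))  # True: broadcasting matches the loop, without the loop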
From here, the rules of broadcasting are:
When operating on two arrays, NumPy compares their shapes
element-wise. It starts with the trailing dimensions, and works its
way forward. Two dimensions are compatible when
they are equal, or
one of them is 1
Your dimensions are n,m,o and n,m so not compatible.
The / division operator will work using broadcasting if you use:
o,n,m divided by n,m
n,m,o divided by n,m,1

numpy.shape gives inconsistent responses - why?

Why does the program
import numpy as np
c = np.array([1,2])
print(c.shape)
d = np.array([[1],[2]]).transpose()
print(d.shape)
give
(2,)
(1,2)
as its output? Shouldn't it be
(1,2)
(1,2)
instead? I got this in both python 2.7.3 and python 3.2.3
When you invoke the .shape attribute of an ndarray, you get a tuple with as many elements as dimensions of your array. The length, i.e. the number of rows, is the first dimension (shape[0]).
You start with an array: c = np.array([1,2]). That's a plain 1D array, so its shape is a 1-element tuple, and shape[0] is the number of elements, so c.shape = (2,).
Consider c=np.array([[1,2]]). That's a 2D array, with 1 row. The first and only row is [1,2], that gives us two columns. Therefore, c.shape=(1,2) and len(c)=1
Consider c=np.array([[1,],[2,]]). Another 2D array, with 2 rows, 1 column: c.shape=(2,1) and len(c)=2.
Consider d=np.array([[1,],[2,]]).transpose(): this array is the same as np.array([[1,2]]), therefore its shape is (1,2).
Another useful attribute is .size: that's the number of elements across all dimensions, so for an array c you have c.size == np.prod(c.shape).
More information on the shape in the documentation.
len(c.shape) is the "depth" of the array.
For c, the array is just a list (a vector), the depth is 1.
For d, the array is a list of lists, the depth is 2.
Note:
c.transpose()
# array([1, 2])
which is not d, so this behaviour is not inconsistent.
dt = d.transpose()
# array([[1],
# [2]])
dt.shape # (2,1)
Quick Fix: check the .ndim property. If it's 2, then the .shape property will work as you expect.
Reason Why: if the .ndim property is 2, then numpy reports a shape value that agrees with the convention. If the .ndim property is 1, then numpy just reports shape in a different way.
More talking: when you pass np.array a list of lists, the .shape property will agree with standard notions of the dimensions of a matrix: (rows, columns).
If you pass np.array just a list, then numpy doesn't think it has a matrix on its hands, and reports the shape in a different way.
The question is: does numpy think it has a matrix, or does it think it has something else on its hands.
transpose does not change the number of dimensions of the array. If c.ndim == 1, c.transpose() is the same as c. Try:
c = np.array([1, 2])
print(c.shape)
print(c.T.shape)
c = np.atleast_2d(c)
print(c.shape)
print(c.T.shape)
Coming from Matlab, I also find it difficult that a single-dimensional array is not organized as (row_count, column_count).
My function had to respond consistently on a single-dimensional ndarray like [x1, x2, x3] or a list of arrays [[x1, x2, x3], [x1, x2, x3], [x1, x2, x3]].
This worked for me:
dim = np.shape(subtract_matrix)[-1]
Picking the last dimension.
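For example (with made-up data), the last dimension is the same whether the input is a single vector or a stack of them:
import numpy as np

single = np.array([1.0, 2.0, 3.0])         # shape (3,)
stacked = np.array([[1.0, 2.0, 3.0]] * 4)  # shape (4, 3)

print(np.shape(single)[-1])   # 3
print(np.shape(stacked)[-1])  # 3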
