Numpy dot product problems

Numpy dot product problems - python

A=np.array([
[1,2],
[3,4]
])
B=np.ones(2)
A is clearly of shape 2X2
How does numpy allow me to compute a dot product np.dot(A,B)
[1,2] (dot) [1,1]
[3,4]
B has to have dimensions of 2X1 for a dot product or rather this
[1,2] (dot) [1]
[3,4] [1]
This is a very silly question but i am not able to figure out where i am going wrong here?
Earlier i used to think that np.ones(2) would give me this:
[1]
[1]
But it gives me this:
[1,1]

I'm copying part of an answer I wrote earlier today:
You should resist the urge to think of numpy arrays as having rows
and columns, but instead consider them as having dimensions and
shape. This is an important point which differentiates np.array and np.matrix:
x = np.array([1, 2, 3])
print(x.ndim, x.shape) # 1 (3,)
y = np.matrix([1, 2, 3])
print(y.ndim, y.shape) # 2 (1, 3)
An n-D array can only use n integer(s) to represent its shape.
Therefore, a 1-D array only uses 1 integer to specify its shape.
In practice, combining calculations between 1-D and 2-D arrays is not
a problem for numpy, and syntactically clean since # matrix
operation was introduced in Python 3.5. Therefore, there is rarely a
need to resort to np.matrix in order to satisfy the urge to see
expected row and column counts.

This behavior is by design. The NumPy docs state:
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.

Most of the rules for vector and matrix shapes relating to the dot product exist mostly in order to have a coherent method that scales up into higher tensor orders. But they aren't very important when dealing with 1st order (vectors) and 2nd order (matrix) tensors. And those orders are what the vast majority of numpy users need.
As a result, # and np.dot are optimized (both mathematically and input parsing) for those orders, always summing over the last axis of the first and the second to last axis (if applicable) of the second. The "if applicable" is sort of an idiot-proofing to assure the output is what is expected in the vast majority of cases, even if the shapes don't technically fit.
Those of us who use higher-order tensors, meanwhile, are relegated to np.tensordot or np.einsum, which come complete with all the niggling little rules about dimension matching.

Related

Does np.dot automatically transpose vectors?

I am trying to calculate the first and second order moments for a portfolio of stocks (i.e. expected return and standard deviation).
expected_returns_annual
Out[54]:
ticker
adj_close CNP 0.091859
F -0.007358
GE 0.095399
TSLA 0.204873
WMT -0.000943
dtype: float64
type(expected_returns_annual)
Out[55]: pandas.core.series.Series
weights = np.random.random(num_assets)
weights /= np.sum(weights)
returns = np.dot(expected_returns_annual, weights)
So normally the expected return is calculated by
(x1,...,xn' * (R1,...,Rn)
with x1,...,xn are weights with a constraint that all the weights have to sum up to 1 and ' means that the vector is transposed.
Now I am wondering a bit about the numpy dot function, because
returns = np.dot(expected_returns_annual, weights)
and
returns = np.dot(expected_returns_annual, weights.T)
give the same results.
I tested also the shape of weights.T and weights.
weights.shape
Out[58]: (5,)
weights.T.shape
Out[59]: (5,)
The shape of weights.T should be (,5) and not (5,), but numpy displays them as equal (I also tried np.transpose, but there is the same result)
Does anybody know why numpy behave this way? In my opinion the np.dot product automatically shape the vector the right why so that the vector product work well. Is that correct?
Best regards
Tom

The semantics of np.dot are not great
As Dominique Paul points out, np.dot has very heterogenous behavior depending on the shapes of the inputs. Adding to the confusion, as the OP points out in his question, given that weights is a 1D array, np.array_equal(weights, weights.T) is True (array_equal tests for equality of both value and shape).
Recommendation: use np.matmul or the equivalent # instead
If you are someone just starting out with Numpy, my advice to you would be to ditch np.dot completely. Don't use it in your code at all. Instead, use np.matmul, or the equivalent operator #. The behavior of # is more predictable than that of np.dot, while still being convenient to use. For example, you would get the same dot product for the two 1D arrays you have in your code like so:
returns = expected_returns_annual # weights
You can prove to yourself that this gives the same answer as np.dot with this assert:
assert expected_returns_annual # weights == expected_returns_annual.dot(weights)
Conceptually, # handles this case by promoting the two 1D arrays to appropriate 2D arrays (though the implementation doesn't necessarily do this). For example, if you have x with shape (N,) and y with shape (M,), if you do x # y the shapes will be promoted such that:
x.shape == (1, N)
y.shape == (M, 1)
Complete behavior of matmul/#
Here's what the docs have to say about matmul/# and the shapes of inputs/outputs:
If both arguments are 2-D they are multiplied like conventional matrices.
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
Notes: the arguments for using # over dot
As hpaulj points out in the comments, np.array_equal(x.dot(y), x # y) for all x and y that are 1D or 2D arrays. So why do I (and why should you) prefer #? I think the best argument for using # is that it helps to improve your code in small but significant ways:
# is explicitly a matrix multiplication operator. x # y will raise an error if y is a scalar, whereas dot will make the assumption that you actually just wanted elementwise multiplication. This can potentially result in a hard-to-localize bug in which dot silently returns a garbage result (I've personally run into that one). Thus, # allows you to be explicit about your own intent for the behavior of a line of code.
Because # is an operator, it has some nice short syntax for coercing various sequence types into arrays, without having to explicitly cast them. For example, [0,1,2] # np.arange(3) is valid syntax.
To be fair, while [0,1,2].dot(arr) is obviously not valid, np.dot([0,1,2], arr) is valid (though more verbose than using #).
When you do need to extend your code to deal with many matrix multiplications instead of just one, the ND cases for # are a conceptually straightforward generalization/vectorization of the lower-D cases.

I had the same question some time ago. It seems that when one of your matrices is one dimensional, then numpy will figure out automatically what you are trying to do.
The documentation for the dot function has a more specific explanation of the logic applied:
If both a and b are 1-D arrays, it is inner product of vectors
(without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using
matmul or a # b is preferred.
If either a or b is 0-D (scalar), it is equivalent to multiply and
using numpy.multiply(a, b) or a * b is preferred.
If a is an N-D array and b is a 1-D array, it is a sum product over
the last axis of a and b.
If a is an N-D array and b is an M-D array (where M>=2), it is a sum
product over the last axis of a and the second-to-last axis of b:

In NumPy, a transpose .T reverses the order of dimensions, which means that it doesn't do anything to your one-dimensional array weights.
This is a common source of confusion for people coming from Matlab, in which one-dimensional arrays do not exist. See Transposing a NumPy Array for some earlier discussion of this.
np.dot(x,y) has complicated behavior on higher-dimensional arrays, but its behavior when it's fed two one-dimensional arrays is very simple: it takes the inner product. If we wanted to get the equivalent result as a matrix product of a row and column instead, we'd have to write something like
np.asscalar(x # y[:, np.newaxis])
adding a trailing dimension to y to turn it into a "column", multiplying, and then converting our one-element array back into a scalar. But np.dot(x,y) is much faster and more efficient, so we just use that.
Edit: actually, this was dumb on my part. You can, of course, just write matrix multiplication x # y to get equivalent behavior to np.dot for one-dimensional arrays, as tel's excellent answer points out.

The shape of weights.T should be (,5) and not (5,),
suggests some confusion over the shape attribute. shape is an ordinary Python tuple, i.e. just a set of numbers, one for each dimension of the array. That's analogous to the size of a MATLAB matrix.
(5,) is just the way of displaying a 1 element tuple. The , is required because of older Python history of using () as a simple grouping.
In [22]: tuple([5])
Out[22]: (5,)
Thus the , in (5,) does not have a special numpy meaning, and
In [23]: (,5)
File "<ipython-input-23-08574acbf5a7>", line 1
(,5)
^
SyntaxError: invalid syntax
A key difference between numpy and MATLAB is that arrays can have any number of dimensions (upto 32). MATLAB has a lower boundary of 2.
The result is that a 5 element numpy array can have shapes (5,), (1,5), (5,1), (1,5,1)`, etc.
The handling of a 1d weight array in your example is best explained the np.dot documentation. Describing it as inner product seems clear enough to me. But I'm also happy with the
sum product over the last axis of a and the second-to-last axis of b
description, adjusted for the case where b has only one axis.
(5,) with (5,n) => (n,) # 5 is the common dimension
(n,5) with (5,) => (n,)
(n,5) with (5,1) => (n,1)
In:
(x1,...,xn' * (R1,...,Rn)
are you missing a )?
(x1,...,xn)' * (R1,...,Rn)
And the * means matrix product? Not elementwise product (.* in MATLAB)? (R1,...,Rn) would have size (n,1). (x1,...,xn)' size (1,n). The product (1,1).
By the way, that raises another difference. MATLAB expands dimensions to the right (n,1,1...). numpy expands them to the left (1,1,n) (if needed by broadcasting). The initial dimensions are the outermost ones. That's not as critical a difference as the lower size 2 boundary, but shouldn't be ignored.

What does (n,) mean in the context of numpy and vectors?

I've tried searching StackOverflow, googling, and even using symbolhound to do character searches, but was unable to find an answer. Specifically, I'm confused about Ch. 1 of Nielsen's Neural Networks and Deep Learning, where he says "It is assumed that the input a is an (n, 1) Numpy ndarray, not a (n,) vector."
At first I thought (n,) referred to the orientation of the array - so it might refer to a one-column vector as opposed to a vector with only one row. But then I don't see why we need (n,) and (n, 1) both - they seem to say the same thing. I know I'm misunderstanding something but am unsure.
For reference a refers to a vector of activations that will be input to a given layer of a neural network, before being transformed by the weights and biases to produce the output vector of activations for the next layer.
EDIT: This question equivocates between a "one-column vector" (there's no such thing) and a "one-column matrix" (does actually exist). Same for "one-row vector" and "one-row matrix".
A vector is only a list of numbers, or (equivalently) a list of scalar transformations on the basis vectors of a vector space. A vector might look like a matrix when we write it out, if it only has one row (or one column). Confusingly, we will sometimes refer to a "vector of activations" but actually mean "a single-row matrix of activation values transposed so that it is a single-column."
Be aware that in neither case are we discussing a one-dimensional vector, which would be a vector defined by only one number (unless, trivially, n==1, in which case the concept of a "column" or "row" distinction would be meaningless).

In numpy an array can have a number of different dimensions, 0, 1, 2 etc.
The typical 2d array has dimension (n,m) (this is a Python tuple). We tend to describe this as having n rows, m columns. So a (n,1) array has just 1 column, and a (1,m) has 1 row.
But because an array may have just 1 dimension, it is possible to have a shape (n,) (Python notation for a 1 element tuple: see here for more).
For many purposes (n,), (1,n), (n,1) arrays are equivalent (also (1,n,1,1) (4d)). They all have n terms, and can be reshaped to each other.
But sometimes that extra 1 dimension matters. A (1,m) array can multiply a (n,1) array to produce a (n,m) array. A (n,1) array can be indexed like a (n,m), with 2 indices, x[:,0] where as a (n,) only accepts x[0].
MATLAB matrices are always 2d (or higher). So people transfering ideas from MATLAB tend to expect 2 dimensions. There is a np.matrix subclass that supposed to imitate that.
For numpy programmers the distinctions between vector, row vector, column vector, matrix are loose and relatively unimportant. Or the use is derived from the application rather than from numpy itself. I think that's what's happening with this network book - the notation and expectations come from outside of numpy.
See as well this answer for how to interpret the shapes with respect to the data stored in ndarrays. It also provides insight on how to use .reshape: https://stackoverflow.com/a/22074424/3277902

(n,) is a tuple of length 1, whose only element is n. (The syntax isn't (n) because that's just n instead of making a tuple.)
If an array has shape (n,), that means it's a 1-dimensional array with a length of n along its only dimension. It's not a row vector or a column vector; it doesn't have rows or columns. It's just a vector.

Axis elimination

I'm having a trouble understanding the concept of Axis elimination in numpy. Suppose I have the following 2D matrix:
A =
1 2 3
3 4 5
6 7 8
Ok I understand that sum(A, axis=0) will sum each column down and will give a 1D array with 3 elements. I also understand that sum(A, axis=1) will sum each row.
But my trouble is when I read that axis=0 eliminates the 0th axis and axis=1 eliminates the 1th axis. Also sometime people mention "reduce" instead of "eliminate". I'm unable to understand what does that eliminate. For example sum(A, axis=0) will sum each column from top to bottom, but I don't see elimination or reduction here. What's the point? The same also for sum(A,axis=1).
AND how is it for higher dimensions?
p.s. I always confused between matrix dimensions and array dimensions. I wished that people who write the numpy documentation makes this distinction very clear.

http://docs.scipy.org/doc/numpy/reference/generated/numpy.ufunc.reduce.html
Reduces a‘s dimension by one, by applying ufunc along one axis.
For example, add.reduce() is equivalent to sum().
In numpy, the base class is ndarray - a multidimensional array (can 0d, 1d, or more)
http://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html
Matrix is a subclass of array
http://docs.scipy.org/doc/numpy/reference/arrays.classes.html
Matrix objects are always two-dimensional
The history of the numpy Matrix is old, but basically it's meant to resemble the MATLAB matrix object. In the original MATLAB nearly everything was a matrix, which was always 2d. Later they generalized it to allow more dimensions. But it can't have fewer dimensions. MATLAB does have 'vectors', but they are just matrices with one dimension being 1 (row vector versus column vector).
'axis elimination' is not a common term when working with numpy. It could, conceivably, refer to any of several ways that reduce the number of dimensions of an array. Reduction, as in sum(), is one. Indexing is another: a[:,0,:]. Reshaping can also change the number of dimensions. np.squeeze is another.

Confusion in array operation in numpy

I generally use MATLAB and Octave, and i recently switching to python numpy.
In numpy when I define an array like this
>>> a = np.array([[2,3],[4,5]])
it works great and size of the array is
>>> a.shape
(2, 2)
which is also same as MATLAB
But when i extract the first entire column and see the size
>>> b = a[:,0]
>>> b.shape
(2,)
I get size (2,), what is this? I expect the size to be (2,1). Perhaps i misunderstood the basic concept. Can anyone make me clear about this??

A 1D numpy array* is literally 1D - it has no size in any second dimension, whereas in MATLAB, a '1D' array is actually 2D, with a size of 1 in its second dimension.
If you want your array to have size 1 in its second dimension you can use its .reshape() method:
a = np.zeros(5,)
print(a.shape)
# (5,)
# explicitly reshape to (5, 1)
print(a.reshape(5, 1).shape)
# (5, 1)
# or use -1 in the first dimension, so that its size in that dimension is
# inferred from its total length
print(a.reshape(-1, 1).shape)
# (5, 1)
Edit
As Akavall pointed out, I should also mention np.newaxis as another method for adding a new axis to an array. Although I personally find it a bit less intuitive, one advantage of np.newaxis over .reshape() is that it allows you to add multiple new axes in an arbitrary order without explicitly specifying the shape of the output array, which is not possible with the .reshape(-1, ...) trick:
a = np.zeros((3, 4, 5))
print(a[np.newaxis, :, np.newaxis, ..., np.newaxis].shape)
# (1, 3, 1, 4, 5, 1)
np.newaxis is just an alias of None, so you could do the same thing a bit more compactly using a[None, :, None, ..., None].
* An np.matrix, on the other hand, is always 2D, and will give you the indexing behavior you are familiar with from MATLAB:
a = np.matrix([[2, 3], [4, 5]])
print(a[:, 0].shape)
# (2, 1)
For more info on the differences between arrays and matrices, see here.

Typing help(np.shape) gives some insight in to what is going on here. For starters, you can get the output you expect by typing:
b = np.array([a[:,0]])
Basically numpy defines things a little differently than MATLAB. In the numpy environment, a vector only has one dimension, and an array is a vector of vectors, so it can have more. In your first example, your array is a vector of two vectors, i.e.:
a = np.array([[vec1], [vec2]])
So a has two dimensions, and in your example the number of elements in both dimensions is the same, 2. Your array is therefore 2 by 2. When you take a slice out of this, you are reducing the number of dimensions that you have by one. In other words, you are taking a vector out of your array, and that vector only has one dimension, which also has 2 elements, but that's it. Your vector is now 2 by _. There is nothing in the second spot because the vector is not defined there.
You could think of it in terms of spaces too. Your first array is in the space R^(2x2) and your second vector is in the space R^(2). This means that the array is defined on a different (and bigger) space than the vector.
That was a lot to basically say that you took a slice out of your array, and unlike MATLAB, numpy does not represent vectors (1 dimensional) in the same way as it does arrays (2 or more dimensions).

difference between numpy dot() and inner()

What is the difference between
import numpy as np
np.dot(a,b)
and
import numpy as np
np.inner(a,b)
all examples I tried returned the same result. Wikipedia has the same article for both?! In the description of inner() it says, that its behavior is different in higher dimensions, but I couldn't produce any different output. Which one should I use?

numpy.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b:
numpy.inner:
Ordinary inner product of vectors for 1-D arrays (without complex conjugation), in higher dimensions a sum product over the last axes.
(Emphasis mine.)
As an example, consider this example with 2D arrays:
>>> a=np.array([[1,2],[3,4]])
>>> b=np.array([[11,12],[13,14]])
>>> np.dot(a,b)
array([[37, 40],
[85, 92]])
>>> np.inner(a,b)
array([[35, 41],
[81, 95]])
Thus, the one you should use is the one that gives the correct behaviour for your application.
Performance testing
(Note that I am testing only the 1D case, since that is the only situation where .dot and .inner give the same result.)
>>> import timeit
>>> setup = 'import numpy as np; a=np.random.random(1000); b = np.random.random(1000)'
>>> [timeit.timeit('np.dot(a,b)',setup,number=1000000) for _ in range(3)]
[2.6920320987701416, 2.676928997039795, 2.633111000061035]
>>> [timeit.timeit('np.inner(a,b)',setup,number=1000000) for _ in range(3)]
[2.588860034942627, 2.5845699310302734, 2.6556360721588135]
So maybe .inner is faster, but my machine is fairly loaded at the moment, so the timings are not consistent nor are they necessarily very accurate.

np.dot and np.inner are identical for 1-dimensions arrays, so that is probably why you aren't noticing any differences. For N-dimension arrays, they correspond to common tensor operations.
np.inner is sometimes called a "vector product" between a higher and lower order tensor, particularly a tensor times a vector, and often leads to "tensor contraction". It includes matrix-vector multiplication.
np.dot corresponds to a "tensor product", and includes the case mentioned at the bottom of the Wikipedia page. It is generally used for multiplication of two similar tensors to produce a new tensor. It includes matrix-matrix multiplication.
If you're not using tensors, then you don't need to worry about these cases and they behave identically.

For 1 and 2 dimensional arrays numpy.inner works as transpose the second matrix then multiply.
So for:
A = [[a1,b1],[c1,d1]]
B = [[a2,b2],[c2,d2]]
numpy.inner(A,B)
array([[a1*a2 + b1*b2, a1*c2 + b1*d2],
[c1*a2 + d1*b2, c1*c2 + d1*d2])
I worked this out using examples like:
A=[[1 ,10], [100,1000]]
B=[[1,2], [3,4]]
numpy.inner(A,B)
array([[ 21, 43],
[2100, 4300]])
This also explains the behaviour in one dimension, numpy.inner([a,b],[c,b]) = ac+bd and numpy.inner([[a],[b]], [[c],[d]]) = [[ac,ad],[bc,bd]].
This is the extent of my knowledge, no idea what it does for higher dimensions.

inner is not working properly with complex 2D arrays, Try to multiply
and its transpose
array([[ 1.+1.j, 4.+4.j, 7.+7.j],
[ 2.+2.j, 5.+5.j, 8.+8.j],
[ 3.+3.j, 6.+6.j, 9.+9.j]])
you will get
array([[ 0. +60.j, 0. +72.j, 0. +84.j],
[ 0.+132.j, 0.+162.j, 0.+192.j],
[ 0.+204.j, 0.+252.j, 0.+300.j]])
effectively multiplying the rows to rows rather than rows to columns

There is a lot difference between inner product and dot product in higher dimensional space. below is an example of a 2x2 matrix and 3x2 matrix
x = [[a1,b1],[c1,d1]]
y= [[a2,b2].[c2,d2],[e2,f2]
np.inner(x,y)
output = [[a1xa2+b1xb2 ,a1xc2+b1xd2, a1xe2+b1f2],[c1xa2+d1xb2, c1xc2+d1xd2, c1xe2+d1xf2]]
But in the case of dot product the output shows the below error as you cannot multiply a 2x2 matrix with a 3x2.
ValueError: shapes (2,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)

I made a quick script to practice inner and dot product math. It really helped me get a feel for the difference:
You can find the code here:
https://github.com/geofflangenderfer/practice_inner_dot

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.