Python command np.sum(x, axis=0) and softmax function

I have the following problem: I want to compute the softmax function in Python and get an unexpected result. The code is the following:

import numpy as np

def softmax(x):
    """Compute softmax values for each set of scores in x."""
    return np.exp(x) / np.sum(np.exp(x), axis=0)

It works perfectly, but I don't know why: it works on matrices as follows. If I insert a 2x2 matrix A, the output is yet another 2x2 matrix. Why is that? Shouldn't it return a differently sized array, since every element of the matrix, e.g. $x=A[0,0]$, yields 2 output values (namely $\exp(x)/(\exp(A[0,0])+\exp(A[1,0]))$ and $\exp(x)/(\exp(A[0,1])+\exp(A[1,1]))$) because of the axis=0 argument? That would lead to an 8-element output array, but the actual result only has 4 elements.

Also, how exactly does axis=0 work? If I type A = np.array([2, 4]), then the logical result of np.sum(A, axis=0) should be array([2, 4]), since the columns are summed up. But the result is 6. And the command np.sum(A, axis=1) strangely yields "'axis' entry is out of bounds", although the result should be array([6]), since the rows are summed up. Maybe my two problems are linked.
Any help will be appreciated!
Thanks,
Leon

I will jump straight to the "final" problem:

matrix_22 / vector_2

Dividing a matrix by a vector does not make strict mathematical sense, so numpy applies a convention called broadcasting. Just as

matrix_22 * 5

multiplies each element of the matrix by 5, if we consider matrix_22 as a vector of row vectors, then matrix_22 / vector_2 applies the division elementwise to each row of the matrix.
You can easily check that behaviour by executing the following:
np.array([[14, 28], [70, 56]]) / np.array([2, 7])
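Running that line divides each row elementwise by the vector:

array([[ 7.,  4.],
       [35.,  8.]])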
Notation: matrix_22 is "some variable which contains a numpy array of shape 2x2, so it is a 2x2 matrix". And vector_2 is a numpy array of two elements.


Does np.dot automatically transpose vectors?

I am trying to calculate the first and second order moments for a portfolio of stocks (i.e. expected return and standard deviation).
expected_returns_annual
Out[54]:
ticker
adj_close  CNP     0.091859
           F      -0.007358
           GE      0.095399
           TSLA    0.204873
           WMT    -0.000943
dtype: float64
type(expected_returns_annual)
Out[55]: pandas.core.series.Series
weights = np.random.random(num_assets)
weights /= np.sum(weights)
returns = np.dot(expected_returns_annual, weights)
So normally the expected return is calculated by
(x1,...,xn' * (R1,...,Rn)
where x1,...,xn are the weights, with the constraint that all the weights have to sum up to 1, and ' means that the vector is transposed.
Now I am wondering a bit about the numpy dot function, because
returns = np.dot(expected_returns_annual, weights)
and
returns = np.dot(expected_returns_annual, weights.T)
give the same results.
I also tested the shape of weights.T and weights.
weights.shape
Out[58]: (5,)
weights.T.shape
Out[59]: (5,)
The shape of weights.T should be (,5) and not (5,), but numpy displays them as equal (I also tried np.transpose, with the same result).
Does anybody know why numpy behaves this way? My impression is that np.dot automatically shapes the vector the right way so that the vector product works out. Is that correct?
Best regards
Tom
The semantics of np.dot are not great
As Dominique Paul points out, np.dot has very heterogeneous behavior depending on the shapes of the inputs. Adding to the confusion, as the OP points out in his question, given that weights is a 1D array, np.array_equal(weights, weights.T) is True (array_equal tests for equality of both value and shape).
Recommendation: use np.matmul or the equivalent @ instead
If you are someone just starting out with Numpy, my advice to you would be to ditch np.dot completely. Don't use it in your code at all. Instead, use np.matmul, or the equivalent operator @. The behavior of @ is more predictable than that of np.dot, while still being convenient to use. For example, you would get the same dot product for the two 1D arrays you have in your code like so:
returns = expected_returns_annual @ weights
You can prove to yourself that this gives the same answer as np.dot with this assert:
assert expected_returns_annual @ weights == expected_returns_annual.dot(weights)
Conceptually, @ handles this case by promoting the two 1D arrays to appropriate 2D arrays (though the implementation doesn't necessarily do this). For example, if you have x with shape (N,) and y with shape (M,), if you do x @ y the shapes will be promoted such that:
x.shape == (1, N)
y.shape == (M, 1)
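A quick sanity check of the 1D case (the arrays here are just illustrative):

>>> import numpy as np
>>> x = np.array([1., 2., 3.])
>>> y = np.array([4., 5., 6.])
>>> x @ y                             # 1D @ 1D: a scalar inner product
32.0
>>> (x[None, :] @ y[:, None])[0, 0]   # the explicit 2D promotion
32.0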
Complete behavior of matmul/@
Here's what the docs have to say about matmul/@ and the shapes of inputs/outputs:
If both arguments are 2-D they are multiplied like conventional matrices.
If either argument is N-D, N > 2, it is treated as a stack of matrices residing in the last two indexes and broadcast accordingly.
If the first argument is 1-D, it is promoted to a matrix by prepending a 1 to its dimensions. After matrix multiplication the prepended 1 is removed.
If the second argument is 1-D, it is promoted to a matrix by appending a 1 to its dimensions. After matrix multiplication the appended 1 is removed.
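A small illustration of the stacking rule (the shapes are chosen arbitrarily):

>>> a = np.random.random((4, 2, 3))
>>> b = np.random.random((4, 3, 5))
>>> (a @ b).shape    # a stack of 4 separate (2,3) @ (3,5) products
(4, 2, 5)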
Notes: the arguments for using @ over dot
As hpaulj points out in the comments, np.array_equal(x.dot(y), x @ y) is True for all x and y that are 1D or 2D arrays. So why do I (and why should you) prefer @? I think the best argument for using @ is that it helps to improve your code in small but significant ways:
@ is explicitly a matrix multiplication operator. x @ y will raise an error if y is a scalar, whereas dot will make the assumption that you actually just wanted elementwise multiplication. This can potentially result in a hard-to-localize bug in which dot silently returns a garbage result (I've personally run into that one). Thus, @ allows you to be explicit about your own intent for the behavior of a line of code.
Because @ is an operator, it has some nice short syntax for coercing various sequence types into arrays, without having to explicitly cast them. For example, [0,1,2] @ np.arange(3) is valid syntax.
To be fair, while [0,1,2].dot(arr) is obviously not valid, np.dot([0,1,2], arr) is valid (though more verbose than using @).
When you do need to extend your code to deal with many matrix multiplications instead of just one, the ND cases for @ are a conceptually straightforward generalization/vectorization of the lower-D cases.
I had the same question some time ago. It seems that when one of your arrays is one-dimensional, numpy figures out automatically what you are trying to do.
The documentation for the dot function has a more specific explanation of the logic applied:
If both a and b are 1-D arrays, it is inner product of vectors (without complex conjugation).
If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred.
If either a or b is 0-D (scalar), it is equivalent to multiply and using numpy.multiply(a, b) or a * b is preferred.
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
If a is an N-D array and b is an M-D array (where M >= 2), it is a sum product over the last axis of a and the second-to-last axis of b:
dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
In NumPy, a transpose .T reverses the order of dimensions, which means that it doesn't do anything to your one-dimensional array weights.
This is a common source of confusion for people coming from Matlab, in which one-dimensional arrays do not exist. See Transposing a NumPy Array for some earlier discussion of this.
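A quick check of that (with an arbitrary 1D array):

>>> import numpy as np
>>> w = np.random.random(5)
>>> w.shape, w.T.shape    # .T is a no-op on a 1D array
((5,), (5,))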
np.dot(x,y) has complicated behavior on higher-dimensional arrays, but its behavior when it's fed two one-dimensional arrays is very simple: it takes the inner product. If we wanted to get the equivalent result as a matrix product of a row and column instead, we'd have to write something like
np.asscalar(x @ y[:, np.newaxis])
adding a trailing dimension to y to turn it into a "column", multiplying, and then converting our one-element array back into a scalar. But np.dot(x,y) is much faster and more efficient, so we just use that.
Edit: actually, this was dumb on my part. You can, of course, just write the matrix multiplication x @ y to get behavior equivalent to np.dot for one-dimensional arrays, as tel's excellent answer points out.
The shape of weights.T should be (,5) and not (5,),
suggests some confusion over the shape attribute. shape is an ordinary Python tuple, i.e. just a set of numbers, one for each dimension of the array. That's analogous to the size of a MATLAB matrix.
(5,) is just the way Python displays a 1-element tuple. The , is required because, historically, plain () acts as simple grouping rather than tuple construction.
In [22]: tuple([5])
Out[22]: (5,)
Thus the , in (5,) does not have a special numpy meaning, and
In [23]: (,5)
File "<ipython-input-23-08574acbf5a7>", line 1
(,5)
^
SyntaxError: invalid syntax
A key difference between numpy and MATLAB is that numpy arrays can have any number of dimensions (up to 32). MATLAB has a lower bound of 2.
The result is that a 5-element numpy array can have shapes (5,), (1,5), (5,1), (1,5,1), etc.
The handling of a 1d weights array in your example is best explained by the np.dot documentation. Describing it as an inner product seems clear enough to me. But I'm also happy with the
sum product over the last axis of a and the second-to-last axis of b
description, adjusted for the case where b has only one axis.
(5,) with (5,n) => (n,) # 5 is the common dimension
(n,5) with (5,) => (n,)
(n,5) with (5,1) => (n,1)
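A quick illustration of these rules (using arrays of ones):

>>> a = np.ones(5)
>>> M = np.ones((3, 5))
>>> np.dot(M, a).shape                  # (n,5) with (5,)  => (n,)
(3,)
>>> np.dot(M, np.ones((5, 1))).shape    # (n,5) with (5,1) => (n,1)
(3, 1)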
In:
(x1,...,xn' * (R1,...,Rn)
are you missing a )?
(x1,...,xn)' * (R1,...,Rn)
And does the * mean matrix product? Not elementwise product (.* in MATLAB)? (R1,...,Rn) would have size (n,1), (x1,...,xn)' size (1,n), and the product size (1,1).
By the way, that raises another difference. MATLAB expands dimensions to the right, (n,1,1...). numpy expands them to the left, (1,1,n), when needed by broadcasting. The initial dimensions are the outermost ones. That's not as critical a difference as the lower bound of 2 on dimensions, but it shouldn't be ignored.
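A small demonstration of numpy's left-expansion during broadcasting (the arrays here are just illustrative):

>>> A = np.ones((2, 3))
>>> v = np.ones(3)      # broadcast as if it had shape (1, 3)
>>> (A + v).shape
(2, 3)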

Numpy array and matrix multiplication

I am trying to get rid of the for loop and instead do an array-matrix multiplication to decrease the processing time when the weights array is very large:
import numpy as np

sequence = [np.random.random(10), np.random.random(10), np.random.random(10)]
weights = np.array([[0.1, 0.3, 0.6], [0.5, 0.2, 0.3], [0.1, 0.8, 0.1]])
Cov_matrix = np.matrix(np.cov(sequence))
results = []
for w in weights:
    result = np.matrix(w) * Cov_matrix * np.matrix(w).T
    results.append(result.A)
Where:
Cov_matrix is a 3x3 matrix
weights is an array of length n containing n 1x3 weight vectors.
Is there a way to multiply/map weights to Cov_matrix and bypass the for loop? I am not very familiar with all the numpy functions.
I'd like to reiterate what's already been said in another answer: the np.matrix class has many more disadvantages than advantages these days, and I suggest moving to the use of the np.array class alone. Matrix multiplication of arrays can be easily written using the @ operator, so the notation is in most cases as elegant as for the matrix class (and arrays don't have several restrictions that matrices do).
With that out of the way, what you need can be done in terms of a single call to np.einsum. We need to contract certain indices of three matrices while keeping one index alone in two of them. That is, we want to perform w_{ij} * Cov_{jk} * w.T_{ki} with a summation over j and k, giving us an array indexed by i. The following call to einsum will do:
res = np.einsum('ij,jk,ik->i', weights, Cov_matrix, weights)
Note that the above will give you a single 1d array, whereas you originally had a list of arrays with shape (1,1); I suspect the 1d result will make even more sense. Also, note that I omitted the transpose in the second weights argument, which is why the corresponding summation indices appear as ik rather than ki; this should also be marginally faster.
To prove that the above gives the same result:
In [8]: results # original
Out[8]: [array([[0.02803215]]), array([[0.02280609]]), array([[0.0318784]])]
In [9]: res # einsum
Out[9]: array([0.02803215, 0.02280609, 0.0318784 ])
The same can be achieved by working with the weights as a matrix and then looking at the diagonal elements of the result. Namely:
np.diag(weights.dot(Cov_matrix).dot(weights.transpose()))
which gives:
array([0.03553664, 0.02394509, 0.03765553])
This does more calculations than necessary (calculates off-diagonals) so maybe someone will suggest a more efficient method.
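One such option, a sketch equivalent to the einsum result above (it first converts the np.matrix to a plain ndarray), computes only the diagonal terms:

cov = np.asarray(Cov_matrix)                   # work with a plain ndarray
res = (weights @ cov * weights).sum(axis=1)    # row-wise w @ Cov @ w.T, one value per w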
Note: I'd suggest slowly moving away from np.matrix and instead work with np.array. It takes a bit of getting used to not being able to do A*b but will pay dividends in the long run. Here is a related discussion.

Numpy dot product problems

A = np.array([
    [1, 2],
    [3, 4]
])
B = np.ones(2)
A is clearly of shape 2x2. How does numpy allow me to compute the dot product np.dot(A, B)?

[1,2]  (dot)  [1,1]
[3,4]

B has to have dimensions of 2x1 for a dot product, or rather be this:

[1,2]  (dot)  [1]
[3,4]         [1]

This is a very silly question, but I am not able to figure out where I am going wrong here.
Earlier I used to think that np.ones(2) would give me this:
[1]
[1]
But it gives me this:
[1,1]
I'm copying part of an answer I wrote earlier today:
You should resist the urge to think of numpy arrays as having rows
and columns, but instead consider them as having dimensions and
shape. This is an important point which differentiates np.array and np.matrix:
x = np.array([1, 2, 3])
print(x.ndim, x.shape) # 1 (3,)
y = np.matrix([1, 2, 3])
print(y.ndim, y.shape) # 2 (1, 3)
An n-D array can only use n integer(s) to represent its shape.
Therefore, a 1-D array only uses 1 integer to specify its shape.
In practice, combining calculations between 1-D and 2-D arrays is not
a problem for numpy, and syntactically clean since the @ matrix
operation was introduced in Python 3.5. Therefore, there is rarely a
need to resort to np.matrix in order to satisfy the urge to see
expected row and column counts.
This behavior is by design. The NumPy docs state:
If a is an N-D array and b is a 1-D array, it is a sum product over the last axis of a and b.
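Applied to the arrays from the question, that rule gives:

>>> A = np.array([[1, 2], [3, 4]])
>>> B = np.ones(2)
>>> np.dot(A, B)    # sums over the last axis of A against B
array([3., 7.])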
Most of the rules for vector and matrix shapes relating to the dot product exist in order to have a coherent method that scales up into higher tensor orders. But they aren't very important when dealing with 1st-order (vector) and 2nd-order (matrix) tensors, and those orders are what the vast majority of numpy users need.
As a result, @ and np.dot are optimized (both mathematically and in input parsing) for those orders, always summing over the last axis of the first argument and the second-to-last axis (if applicable) of the second. The "if applicable" is sort of an idiot-proofing to assure the output is what is expected in the vast majority of cases, even if the shapes don't technically fit.
Those of us who use higher-order tensors, meanwhile, are relegated to np.tensordot or np.einsum, which come complete with all the niggling little rules about dimension matching.

Multiplying an array by a designated row vector of another matrix

Good afternoon all, a relatively simple question here from a mechanical standpoint.
I'm currently performing PCA and have successfully written code that computes the covariance matrix and correlation matrix, and the associated eigenspectrum.
Now, I have created an array that represents the eigenvectors row-wise, and I would like to compute the transformation C*v^T, where C is the observation matrix and v^T is a transposed eigenvector.
Now, since some of these matrices are pretty big, I'd like to be able to tell Python which row of the eigenvector matrix to multiply C by. So far I have tried some of the numpy functions, but to no avail.
(For those of you wondering, I don't want to compute the matrix product of all the eigenvectors; I only need to multiply by a small subset of them, the ones associated with the largest eigenvalues.)
Thanks!
To "slice" a vector of row n out of 2-dimensional array A, you use a syntax like A[n]. If it's slicing columns you wanted instead, the syntax is A[:,n].
For transforming numpy vectors by matrices, use the matrix multiplication operator:
>>> A = np.array([[0, -1], [1, 0]])
>>> vs = np.array([[1, 2], [3, 4]])
>>> A @ vs[0]  # this is a rotation of the first row of vs by A
array([-2, 1])
>>> A @ vs[1]  # this is a rotation of the second row of vs by A
array([-4, 3])
Note: If you're on an older Python version (< 3.5), you won't have @ available yet, and you'll have to use the function np.dot(matrix, vector) instead of the operator.

difference between numpy dot() and inner()

What is the difference between
import numpy as np
np.dot(a,b)
and
import numpy as np
np.inner(a,b)
All examples I tried returned the same result, and Wikipedia has the same article for both! In the description of inner() it says that its behavior is different in higher dimensions, but I couldn't produce any different output. Which one should I use?
numpy.dot:
For 2-D arrays it is equivalent to matrix multiplication, and for 1-D arrays to inner product of vectors (without complex conjugation). For N dimensions it is a sum product over the last axis of a and the second-to-last of b: dot(a, b)[i,j,k,m] = sum(a[i,j,:] * b[k,:,m])
numpy.inner:
Ordinary inner product of vectors for 1-D arrays (without complex conjugation), in higher dimensions a sum product over the last axes.
(Emphasis mine.)
As an example, consider this case with 2D arrays:
>>> a=np.array([[1,2],[3,4]])
>>> b=np.array([[11,12],[13,14]])
>>> np.dot(a,b)
array([[37, 40],
[85, 92]])
>>> np.inner(a,b)
array([[35, 41],
[81, 95]])
Thus, the one you should use is the one that gives the correct behaviour for your application.
Performance testing
(Note that I am testing only the 1D case, since that is the only situation where .dot and .inner give the same result.)
>>> import timeit
>>> setup = 'import numpy as np; a=np.random.random(1000); b = np.random.random(1000)'
>>> [timeit.timeit('np.dot(a,b)',setup,number=1000000) for _ in range(3)]
[2.6920320987701416, 2.676928997039795, 2.633111000061035]
>>> [timeit.timeit('np.inner(a,b)',setup,number=1000000) for _ in range(3)]
[2.588860034942627, 2.5845699310302734, 2.6556360721588135]
So maybe .inner is faster, but my machine is fairly loaded at the moment, so the timings are not consistent nor are they necessarily very accurate.
np.dot and np.inner are identical for 1-dimensional arrays, so that is probably why you aren't noticing any differences. For N-dimensional arrays, they correspond to common tensor operations.
np.inner is sometimes called a "vector product" between a higher and lower order tensor, particularly a tensor times a vector, and often leads to "tensor contraction". It includes matrix-vector multiplication.
np.dot corresponds to a "tensor product", and includes the case mentioned at the bottom of the Wikipedia page. It is generally used for multiplication of two similar tensors to produce a new tensor. It includes matrix-matrix multiplication.
If you're not using tensors, then you don't need to worry about these cases and they behave identically.
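A shape-level illustration of that N-dimensional difference (the arrays are arbitrary):

>>> a = np.ones((2, 3))
>>> b = np.ones((4, 3))
>>> np.inner(a, b).shape    # sums over the last axis of both
(2, 4)
>>> np.dot(a, b.T).shape    # dot wants the second-to-last axis of b to match
(2, 4)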
For 1- and 2-dimensional arrays, numpy.inner works as if the second matrix were transposed and then multiplied.
So for:
A = [[a1,b1],[c1,d1]]
B = [[a2,b2],[c2,d2]]
numpy.inner(A,B)
array([[a1*a2 + b1*b2, a1*c2 + b1*d2],
       [c1*a2 + d1*b2, c1*c2 + d1*d2]])
I worked this out using examples like:
A = [[1, 10], [100, 1000]]
B = [[1, 2], [3, 4]]
numpy.inner(A,B)
array([[ 21, 43],
[2100, 4300]])
This also explains the behaviour in one dimension: numpy.inner([a,b], [c,d]) = ac+bd, and numpy.inner([[a],[b]], [[c],[d]]) = [[ac, ad], [bc, bd]].
This is the extent of my knowledge, no idea what it does for higher dimensions.
inner does not work the way you might expect with complex 2D arrays. For example, take the following array and compute np.inner of its transpose with the array itself:
array([[ 1.+1.j, 4.+4.j, 7.+7.j],
[ 2.+2.j, 5.+5.j, 8.+8.j],
[ 3.+3.j, 6.+6.j, 9.+9.j]])
you will get
array([[ 0. +60.j, 0. +72.j, 0. +84.j],
[ 0.+132.j, 0.+162.j, 0.+192.j],
[ 0.+204.j, 0.+252.j, 0.+300.j]])
effectively multiplying rows with rows rather than rows with columns.
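If the conventional rows-times-columns product is what you want (and, for complex data, usually conjugation as well), a minimal sketch of the alternatives, with M reconstructing the array above:

import numpy as np
M = np.arange(1, 10).reshape(3, 3).T * (1 + 1j)   # the array shown above
M.T @ M           # ordinary matrix product, rows times columns
M.conj().T @ M    # Hermitian product, conjugating one factor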
There is a big difference between the inner product and the dot product in higher-dimensional space. Below is an example with a 2x2 matrix and a 3x2 matrix:

x = [[a1,b1],[c1,d1]]
y = [[a2,b2],[c2,d2],[e2,f2]]
np.inner(x,y)
output = [[a1*a2+b1*b2, a1*c2+b1*d2, a1*e2+b1*f2],
          [c1*a2+d1*b2, c1*c2+d1*d2, c1*e2+d1*f2]]
But in the case of the dot product, the output is the error below, because you cannot multiply a 2x2 matrix by a 3x2 matrix:
ValueError: shapes (2,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)
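A concrete, runnable version of that example (the numbers are chosen arbitrarily):

>>> import numpy as np
>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[5, 6], [7, 8], [9, 10]])
>>> np.inner(x, y)    # pairs rows of x with rows of y
array([[17, 23, 29],
       [39, 53, 67]])
>>> np.dot(x, y)      # raises the ValueError quoted above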
I made a quick script to practice inner and dot product math. It really helped me get a feel for the difference:
You can find the code here:
https://github.com/geofflangenderfer/practice_inner_dot
