Using broadcasting with sparse scipy matrices - python

I have a numpy array Z with shape (k,N) and a second array X with shape (N,n).
Using numpy broadcasting, I can easily obtain a new array H with shape (n,k,N) whose i-th slice H[i] is the array Z with each of its rows multiplied elementwise by column i of X:
H = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))
This works fine and is surprisingly fast.
Now, X is extremely sparse, and I want to further speed up this operation using sparse matrix operations.
However if I perform the following operations:
import scipy.sparse as sprs
spX = sprs.csr_matrix(X)
H = (Z.reshape((1,k,N))*spX.T.reshape((n,1,N))).dot(Z.T)
I get the following error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Python27\lib\site-packages\scipy\sparse\base.py", line 126, in reshape
self.__class__.__name__)
NotImplementedError: Reshaping not implemented for csc_matrix.
Is there a way to use broadcasting with sparse scipy matrices?

Scipy sparse matrices are limited to 2D shapes. But you can use Numpy in a "sparse" way:
H = np.zeros((n,k,N), np.result_type(Z, X))
I, J = np.nonzero(X)
Z_ = np.broadcast_to(Z, H.shape)
H[J,:,I] = Z_[J,:,I] * X[I,J,None]
Unfortunately the result H is still a dense array.
N.b. indexing with None is a handy way to add a unit-length dimension at the desired axis. The order of the result when combining advanced indexing with slicing is explained in the docs.
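As a quick sanity check, the fill-by-nonzero approach above can be compared against the dense broadcast from the question. This is only a sketch: the sizes k, N, n, the random data, and the roughly 80% sparsity are made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k, N, n = 3, 6, 4                      # small, arbitrary sizes (assumption)
Z = rng.standard_normal((k, N))
X = rng.standard_normal((N, n))
X[rng.random((N, n)) < 0.8] = 0.0      # zero out most entries to mimic a sparse X

# Dense reference: the broadcast expression from the question.
H_dense = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))

# "Sparse-style" fill: only touch positions where X is nonzero.
H = np.zeros((n, k, N), np.result_type(Z, X))
I, J = np.nonzero(X)                   # I indexes rows of X (0..N-1), J columns (0..n-1)
Z_ = np.broadcast_to(Z, H.shape)       # read-only view, no copy of Z
H[J, :, I] = Z_[J, :, I] * X[I, J, None]

print(np.allclose(H, H_dense))
```

Because the two advanced indices J and I are separated by a slice, both sides of the assignment have shape (number of nonzeros, k).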

Related

Vector dot product of corresponding rows using numpy

I need to compute the vector dot product of the corresponding rows of two 2 dimensional arrays u and v in numpy. The rows of u are unit vectors. Here is some sample code that illustrates what I'm trying to do:
import numpy as np
u = np.array([[1, 0], [.6, .8], [0, 1]])
v = np.array([[1, 2], [3 , 4], [5, 6]])
I naively tried to use numpy's dot method, which returns an error as follows:
np.dot(u, v)
ValueError Traceback (most recent call last)
<ipython-input-9-146fe9079c1e> in <module>
----> 1 np.dot(u,v)
<__array_function__ internals> in dot(*args, **kwargs)
ValueError: shapes (3,2) and (3,2) not aligned: 2 (dim 1) != 3 (dim 0)
It's straightforward to define a function that produces the desired behavior:
def mydot(a, b):
    return np.sum(a*b,axis=1,keepdims=True)
mydot(u,v)
array([[1.],
[5.],
[6.]])
However, this seems a bit clunky and leaves me with the suspicion that I'm missing something. Is there a more straightforward numpy way to do this?
Your way works fine. At the same time, you can use matmul with an extra dimension to multiply stacks of 1x2 by 2x1 matrices:
u[:, None, :] @ v[..., None]
The biggest difference between matmul and dot is that matmul broadcasts the initial dimensions while dot combines them.
But the fastest way is probably einsum:
np.einsum('ij,ij->i', u, v)
np.dot performs matrix multiplication on 2D arrays, so np.dot(v, u.T) will not raise an error, but it returns the full 3x3 matrix of all pairwise dot products rather than only the row-wise ones. For the row-wise dot product you can use this:
np.sum(v*u,axis=1)
v*u multiplies the two 2D arrays element by element, and the axis argument of np.sum() selects the dimension along which the products are summed.
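The approaches mentioned in these answers can be checked against each other on the arrays from the question. This is a minimal sketch; the variable names by_sum, by_matmul and by_einsum are only illustrative.

```python
import numpy as np

u = np.array([[1, 0], [.6, .8], [0, 1]])
v = np.array([[1, 2], [3, 4], [5, 6]])

# Elementwise product, then sum along each row.
by_sum = np.sum(u * v, axis=1)

# Stacks of (1x2) @ (2x1) matrices give shape (3, 1, 1); drop the unit axes.
by_matmul = (u[:, None, :] @ v[..., None])[:, 0, 0]

# einsum: multiply matching i,j entries and sum over j.
by_einsum = np.einsum('ij,ij->i', u, v)

# np.dot(v, u.T) holds all pairwise products; the row-wise ones are on its diagonal.
by_dot_diag = np.diag(np.dot(v, u.T))

print(by_sum, by_matmul, by_einsum, by_dot_diag)
```

All four give the same length-3 result; the np.dot variant computes nine products to keep three, so it is the least economical of them.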

What is the difference between using matrix multiplication with np.matrix arrays, and dot()/tensor() with np.arrays?

At the moment, my code is written entirely using numpy arrays, np.array.
Define m as a np.array of 100 values, m.shape = (100,). There is also a multi-dimensional array, C.shape = (100,100).
The operation I would like to compute is
m^T * C * m
where m^T should be of shape (1,100), m of shape (100,1), and C of shape (100,100).
I'm unsure how to proceed. If I insist the data types must remain np.arrays, then I should probably use numpy.dot() or numpy.tensordot() and specify the axis. That would be something like
import numpy as np
result = np.dot(C, m)
final = np.dot(m.T, result)
though m.T is an array of the same shape as m. Also, that's doing two individual operations instead of one.
Otherwise, I should convert everything into np.matrix and proceed to use matrix multiplication there. The problem with this is I must convert all my np.arrays into np.matrix, do the operations, and then convert back to np.array.
What is the most efficient and intelligent thing to do?
EDIT:
Based on the answers so far, I think np.dot(m.T, np.dot(C, m)) is probably the best way forward.
The main advantage of working with matrices is that the * symbol performs matrix multiplication, whereas with arrays it performs element-wise multiplication. With arrays you need to use dot. See:
What are the differences between numpy arrays and matrices? Which one should I use?
If m is a one dimensional array, you don't need to transpose anything, because for 1D arrays, transpose doesn't change anything:
In [28]: m.T.shape, m.shape
Out[28]: ((3,), (3,))
In [29]: m.dot(C)
Out[29]: array([15, 18, 21])
In [30]: C.dot(m)
Out[30]: array([ 5, 14, 23])
This is different if you add another dimension to m:
In [31]: mm = m[:, np.newaxis]
In [32]: mm.dot(C)
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-28253c9b8898> in <module>()
----> 1 mm.dot(C)
ValueError: objects are not aligned
In [33]: (mm.T).dot(C)
Out[33]: array([[15, 18, 21]])
In [34]: C.dot(mm)
Out[34]:
array([[ 5],
[14],
[23]])
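The session above does not show how m and C were created. Assuming m = np.arange(3) and C = np.arange(9).reshape(3, 3), which reproduce the printed outputs, the quadratic form m^T C m from the question can be sketched as follows.

```python
import numpy as np

# Assumed inputs: chosen because they reproduce the outputs shown above.
m = np.arange(3)
C = np.arange(9).reshape(3, 3)

# With a 1D m no transpose is needed; the result is a scalar.
quad = np.dot(m, np.dot(C, m))
quad_matmul = m @ C @ m

# With an explicit column vector the result is a (1, 1) array instead.
mm = m[:, np.newaxis]
quad_2d = mm.T @ C @ mm

print(quad, quad_matmul, quad_2d)
```

Both 1D forms stay within plain np.array objects, so no conversion to np.matrix is required.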

Dot product between 1D numpy array and scipy sparse matrix

Say I have Numpy array p and a Scipy sparse matrix q such that
>>> p.shape
(10,)
>>> q.shape
(10,100)
I want to do a dot product of p and q. When I try with numpy I get the following:
>>> np.dot(p,q)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-96-8260c6752ee5>", line 1, in <module>
np.dot(p,q)
ValueError: Cannot find a common data type.
I see in the Scipy documentation that
As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore
using it will result on unexpected results or errors. The
corresponding dense matrix should be obtained first instead
But that defeats my purpose of using a sparse matrix. Soooo, how am I to do dot products between a sparse matrix and a 1D numpy array (numpy matrix, I am open to either) without losing the sparsity of my matrix?
I am using Numpy 1.8.2 and Scipy 0.15.1.
Use *:
p * q
Note that * uses matrix-like semantics rather than array-like semantics for sparse matrices, so it computes a matrix product rather than a broadcasted product.
A sparse matrix is not a numpy array or matrix, though most formats use several arrays to store their data. As a general rule, regular numpy functions aren't aware of sparse matrices, so you should count on using the sparse versions of functions and operators.
By popular demand, the latest np.dot is sparse aware, though I don't know the details of how it acts on that. In 1.18 we have several options.
user2357112 suggests p*q. With the dense array first, I was a little doubtful, wondering if it would try to use array element by element multiplication (and fail due to broadcasting errors). But it works. Sometimes operators like * pass control to the 2nd argument. But just to be sure I tried several alternatives:
q.T * p
np.dot(p, q.A)
q.T.dot(p)
all give the same dense (100,) array. Note - this is an array, not a sparse matrix result.
To get a sparse matrix I need to use
sparse.csr_matrix(p)*q # (1,100) shape
q could be other sparse formats, but for calculations like this it is converted to csr or csc. And the .T operation is cheap because it just requires switching the format from csr to csc.
It would be a good idea to check whether these alternatives work if p is a 2d array, e.g. (2,10).
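The alternatives above can be tried on made-up data with the shapes from the question. This sketch uses @ instead of *, on the assumption that a reasonably recent SciPy is installed, because @ always means a matrix product for sparse objects. The random data and the 2D case p2 are illustrative only.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
p = rng.random(10)                       # shape (10,)
dense_q = rng.random((10, 100))
dense_q[dense_q < 0.9] = 0.0             # keep roughly 10% of the entries
q = sparse.csr_matrix(dense_q)           # shape (10, 100)

expected = p @ dense_q                   # dense reference, shape (100,)

via_dot = q.T.dot(p)                     # dense (100,) array
via_matmul = q.T @ p                     # same product via the @ operator
sparse_result = sparse.csr_matrix(p) @ q # sparse result, shape (1, 100)

# 2D case: p2 is (2, 10), so transpose on both sides of the sparse product.
p2 = rng.random((2, 10))
via_2d = (q.T @ p2.T).T                  # dense (2, 100) array

print(via_dot.shape, sparse_result.shape, via_2d.shape)
```

Note that q.T.dot(p2) would fail for the (2,10) case because the inner dimensions no longer match, which is why the sketch transposes p2 first.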
Scipy has built-in methods for sparse matrix multiplication.
Example from documentation:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> Q = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
>>> p = np.array([1, 0, -1])
>>> Q.dot(p)
array([ 1, -3, -1], dtype=int64)
Check these resources:
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.dot.html
http://docs.scipy.org/doc/scipy/reference/sparse.html

numpy: apply operation to multidimensional array

Assume I have a matrix of matrices, which is an order-4 tensor. What's the best way to apply the same operation to all the submatrices, similar to Map in Mathematica?
#!/usr/bin/python3
from pylab import *
t=random( (8,8,4,4) )
#t2=my_map(det,t)
#then shape(t2) becomes (8,8)
EDIT
Sorry for the bad English, since it's not my native one.
I tried numpy.linalg.det, but it doesn't seem to cope well with 3D or 4D tensors:
>>> import numpy as np
>>> a=np.random.rand(8,8,4,4)
>>> np.linalg.det(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 1703, in det
sign, logdet = slogdet(a)
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 1645, in slogdet
_assertRank2(a)
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 155, in _assertRank2
'two-dimensional' % len(a.shape))
numpy.linalg.linalg.LinAlgError: 4-dimensional array given. Array must be two-dimensional
EDIT2 (Solved)
The problem was that older numpy versions (<1.8) don't support an inner loop over stacked matrices in numpy.linalg.det; updating to numpy 1.8 solves the problem.
numpy 1.8 has some generalized ufuncs (gufuncs) that can run this loop in C.
For example, numpy.linalg.det() is a gufunc:
import numpy as np
a = np.random.rand(8,8,4,4)
np.linalg.det(a)
First check the documentation for the operation that you intend to use. Many have a way of specifying which axis to operate on (np.sum). Others specify which axes they use (e.g. np.dot).
For np.linalg.det the documentation includes:
a : (..., M, M) array_like
Input array to compute determinants for.
So np.linalg.det(t) returns an (8,8) array, having calculated each det using the last 2 dimensions.
While it is possible to iterate on dimensions (the first is the default), it is better to write a function that makes use of numpy operations that use the whole array.
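To illustrate the point about whole-array operations, the stacked call can be compared with an explicit Python loop over the first two dimensions. This is a sketch assuming numpy 1.8 or later; the random tensor stands in for the t from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.random((8, 8, 4, 4))       # an 8x8 grid of 4x4 matrices

# One call: det is applied over the last two axes of every submatrix.
dets = np.linalg.det(t)

# Equivalent explicit loop, for comparison only.
looped = np.array([[np.linalg.det(t[i, j]) for j in range(8)]
                   for i in range(8)])

print(dets.shape, np.allclose(dets, looped))
```

The single call gives the same (8,8) result without any Python-level iteration.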

Calculating Correlation Coefficient with Numpy

I have a list of values and a 1-d numpy array, and I would like to calculate the correlation coefficient using numpy.corrcoef(x,y,rowvar=0). I get the following error:
Traceback (most recent call last):
File "testLearner.py", line 25, in <module>
corr = np.corrcoef(valuesToCompare,queryOutput,rowvar=0)
File "/usr/local/lib/python2.6/site-packages/numpy/lib/function_base.py", line 2003, in corrcoef
c = cov(x, y, rowvar, bias, ddof)
File "/usr/local/lib/python2.6/site-packages/numpy/lib/function_base.py", line 1935, in cov
X = concatenate((X,y), axis)
ValueError: array dimensions must agree except for d_0
I printed out the shape for my numpy array and got (400,1). When I convert my list to an array with numpy.asarray(y) I get (400,)!
I believe this is the problem. I did an array.reshape to (400,1) and printed out the shape, and I still get (400,). What am I missing?
Thanks in advance.
I think you might have assumed that reshape modifies the value of the original array. It doesn't:
>>> a = np.random.randn(5)
>>> a.shape
(5,)
>>> b = a.reshape(5,1)
>>> b.shape
(5, 1)
>>> a.shape
(5,)
np.asarray treats a regular list as a 1d array, but your original numpy array that you said was 1d is actually 2d (because its shape is (400,1)). If you want to use your list like a 2d array, there are two easy approaches:
np.asarray(lst).reshape((-1, 1)) – -1 means "however many it needs" for that dimension.
np.asarray([lst]).T – .T means array transpose, which switches from (1,5) to (5,1).
You could also reshape your original array to 1d via ary.reshape((-1,)).
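Putting these options together on made-up data (the values in lst and ary are purely illustrative), a sketch of making the shapes agree before calling np.corrcoef might look like this:

```python
import numpy as np

lst = [1.0, 2.0, 3.0, 4.0, 5.0]                       # plain Python list
ary = np.array([[2.0], [4.1], [5.9], [8.2], [9.8]])   # 2D column, shape (5, 1)

# Two ways to turn the list into a (5, 1) column.
col_a = np.asarray(lst).reshape((-1, 1))
col_b = np.asarray([lst]).T

# Or flatten the 2D array to 1D; reshape returns a new array and leaves ary as-is.
flat = ary.reshape((-1,))

# With both inputs 1D, corrcoef returns a 2x2 matrix; the off-diagonal is r.
r = np.corrcoef(lst, flat)[0, 1]

print(col_a.shape, col_b.shape, flat.shape, ary.shape, r)
```

Flattening both inputs to 1D is usually the simplest route here, since it avoids the rowvar argument altogether.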
