Say I have Numpy array p and a Scipy sparse matrix q such that
>>> p.shape
(10,)
>>> q.shape
(10,100)
I want to do a dot product of p and q. When I try with numpy I get the following:
>>> np.dot(p,q)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-96-8260c6752ee5>", line 1, in <module>
np.dot(p,q)
ValueError: Cannot find a common data type.
I see in the Scipy documentation that
As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore
using it will result in unexpected results or errors. The
corresponding dense matrix should be obtained first instead
But that defeats my purpose of using a sparse matrix. Soooo, how am I to do dot products between a sparse matrix and a 1D numpy array (numpy matrix, I am open to either) without losing the sparsity of my matrix?
I am using Numpy 1.8.2 and Scipy 0.15.1.
Use *:
p * q
Note that * uses matrix-like semantics rather than array-like semantics for sparse matrices, so it computes a matrix product rather than a broadcasted product.
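A minimal sketch of this behaviour (the shapes and seed here are illustrative):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
p = rng.random(10)                                   # dense 1D array, shape (10,)
q = sparse.random(10, 100, density=0.05, format="csr", random_state=0)

# For sparse matrices, * is a matrix product, not broadcasting
result = p * q
print(result.shape)                                  # (100,) -- a dense 1D array
print(np.allclose(result, p @ q.toarray()))          # matches the dense product
```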
A sparse matrix is not a numpy array or matrix, though most formats use several arrays to store their data. As a general rule, regular numpy functions aren't aware of sparse matrices, so you should count on using the sparse versions of functions and operators.
By popular demand, the latest np.dot is sparse aware, though I don't know the details of how it handles that. As of numpy 1.18 we have several options.
user2357112 suggests p * q. With the dense array as the first operand, I was a little doubtful, wondering whether it would attempt element-by-element array multiplication (and fail with a broadcasting error). But it works: operators like * sometimes pass control to the second argument. Just to be sure, I tried several alternatives:
q.T * p
np.dot(p, q.A)
q.T.dot(p)
all give the same dense (100,) array. Note - this is an array, not a sparse matrix result.
To get a sparse matrix I need to use
sparse.csr_matrix(p)*q # (1,100) shape
q could be in other sparse formats, but for calculations like this it is converted to csr or csc. And the .T operation is cheap because it just requires switching the format from csr to csc.
It would be a good idea to check whether these alternatives also work when p is a 2D array, e.g. of shape (2,10).
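The alternatives above (including the 2D case) can be checked with random data; a sketch, with illustrative shapes and seed:

```python
import numpy as np
from scipy import sparse

p = np.arange(10.0)                                  # shape (10,)
q = sparse.random(10, 100, density=0.1, format="csr", random_state=1)
qd = q.toarray()

dense_ref = p.dot(qd)                                # dense reference, shape (100,)
print(np.allclose(p * q, dense_ref))                 # True
print(np.allclose(q.T * p, dense_ref))               # True
print(np.allclose(q.T.dot(p), dense_ref))            # True

# Keeping the result sparse requires a sparse left operand
sp_result = sparse.csr_matrix(p) * q
print(sp_result.shape)                               # (1, 100)
print(np.allclose(sp_result.toarray()[0], dense_ref))

# A 2D left operand also works with *
p2 = np.arange(20.0).reshape(2, 10)
print((p2 * q).shape)                                # (2, 100)
```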
Scipy has inbuilt methods for sparse matrix multiplication.
Example from documentation:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> Q = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
>>> p = np.array([1, 0, -1])
>>> Q.dot(p)
array([ 1, -3, -1], dtype=int64)
Check these resources:
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.dot.html
http://docs.scipy.org/doc/scipy/reference/sparse.html
Related
I have a numpy array Z with shape (k,N) and a second array X with shape (N,n).
Using numpy broadcasting, I can easily obtain a new array H with shape (n,k,N) whose slices are the array Z whose rows have been multiplied by the columns of X:
H = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))
This works fine and is surprisingly fast.
Now, X is extremely sparse, and I want to further speed up this operation using sparse matrix operations.
However if I perform the following operations:
import scipy.sparse as sprs
spX = sprs.csr_matrix(X)
H = (Z.reshape((1,k,N))*spX.T.reshape((n,1,N))).dot(Z.T)
I get the following error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Python27\lib\site-packages\scipy\sparse\base.py", line 126, in reshape
self.__class__.__name__)
NotImplementedError: Reshaping not implemented for csc_matrix.
Is there a way to use broadcasting with sparse scipy matrices?
Scipy sparse matrices are limited to 2D shapes. But you can use Numpy in a "sparse" way:
H = np.zeros((n,k,N), np.result_type(Z, X))
I, J = np.nonzero(X)
Z_ = np.broadcast_to(Z, H.shape)
H[J,:,I] = Z_[J,:,I] * X[I,J,None]
Unfortunately the result H is still a dense array.
N.b. indexing with None is a handy way to add a unit-length dimension at the desired axis. The order of the result when combining advanced indexing with slicing is explained in the docs.
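A small runnable check of the index-based fill against the dense broadcast (the shapes here are illustrative, not the original k, N, n):

```python
import numpy as np
from scipy import sparse

k, N, n = 3, 5, 4
rng = np.random.default_rng(0)
Z = rng.random((k, N))
X = sparse.random(N, n, density=0.3, random_state=0).toarray()

# Dense broadcasting reference: H_ref[j, :, i] = Z[:, i] * X[i, j]
H_ref = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))

# "Sparse" fill: only touch entries where X is nonzero
H = np.zeros((n, k, N), np.result_type(Z, X))
I, J = np.nonzero(X)
Z_ = np.broadcast_to(Z, H.shape)
H[J, :, I] = Z_[J, :, I] * X[I, J, None]

print(np.allclose(H, H_ref))                         # True
```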
I have the following line of code in MATLAB which I am trying to convert to Python numpy:
pred = traindata(:,2:257)*beta;
In Python, I have:
pred = traindata[ : , 1:257]*beta
beta is a 256 x 1 array.
In MATLAB,
size(pred) = 1389 x 1
But in Python,
pred.shape = (1389L, 256L)
So, I found out that multiplying by the beta array is producing the difference between the two arrays.
How do I write the original Python line, so that the size of pred is 1389 x 1, like it is in MATLAB when I multiply by my beta array?
I suspect that beta is in fact a 1D numpy array. In numpy, 1D arrays are neither row nor column vectors, unlike MATLAB, which clearly makes this distinction; they are simply 1D arrays, agnostic of any orientation. If you must, you need to manually introduce a new singleton dimension into the beta vector to facilitate the multiplication. On top of this, the * operator actually performs element-wise multiplication. To perform matrix-vector or matrix-matrix multiplication, you must use numpy's dot function.
Therefore, you must do something like this:
import numpy as np # Just in case
pred = np.dot(traindata[:, 1:257], beta[:,None])
beta[:,None] will create a 2D numpy array where the elements from the 1D array are populated along the rows, effectively making a column vector (i.e. 256 x 1). However, if you have already done this on beta, then you don't need to introduce the new singleton dimension. Just use dot normally:
pred = np.dot(traindata[:, 1:257], beta)
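A quick shape check of both forms (traindata here is a placeholder array of the stated size):

```python
import numpy as np

traindata = np.ones((1389, 257))                     # placeholder data
beta = np.arange(256.0)                              # 1D array, shape (256,)

# With the singleton dimension: matches MATLAB's 1389 x 1
pred = np.dot(traindata[:, 1:257], beta[:, None])
print(pred.shape)                                    # (1389, 1)

# Without it, the result is a 1D array
print(np.dot(traindata[:, 1:257], beta).shape)       # (1389,)
```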
At the moment, my code is written entirely using numpy arrays, np.array.
Define m as a np.array of 100 values, m.shape = (100,). There is also a multi-dimensional array, C.shape = (100,100).
The operation I would like to compute is
m^T * C * m
where m^T should be of shape (1,100), m of shape (100,1), and C of shape (100,100).
I'm conflicted about how to proceed. If I insist that the data types must remain np.arrays, then I should probably use numpy.dot() or numpy.tensordot() and specify the axes. That would be something like
import numpy as np
result = np.dot(C, m)
final = np.dot(m.T, result)
though m.T is an array of the same shape as m. Also, that's doing two individual operations instead of one.
Otherwise, I should convert everything into np.matrix and proceed to use matrix multiplication there. The problem with this is I must convert all my np.arrays into np.matrix, do the operations, and then convert back to np.array.
What is the most efficient and intelligent thing to do?
EDIT:
Based on the answers so far, I think np.dot(m^T, np.dot(C, m)) is probably the best way forward.
The main advantage of working with matrices is that the * symbol performs a matrix multiplication, whereas it performs element-wise multiplication with arrays. With arrays you need to use dot. See:
What are the differences between numpy arrays and matrices? Which one should I use?
If m is a one dimensional array, you don't need to transpose anything, because for 1D arrays, transpose doesn't change anything:
In [28]: m.T.shape, m.shape
Out[28]: ((3,), (3,))
In [29]: m.dot(C)
Out[29]: array([15, 18, 21])
In [30]: C.dot(m)
Out[30]: array([ 5, 14, 23])
This is different if you add another dimension to m:
In [31]: mm = m[:, np.newaxis]
In [32]: mm.dot(C)
--------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-32-28253c9b8898> in <module>()
----> 1 mm.dot(C)
ValueError: objects are not aligned
In [33]: (mm.T).dot(C)
Out[33]: array([[15, 18, 21]])
In [34]: C.dot(mm)
Out[34]:
array([[ 5],
[14],
[23]])
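As an aside (not from the answers above), np.einsum can express the quadratic form m^T * C * m in a single call on the 1D array, avoiding the two-step dot chain; a minimal sketch:

```python
import numpy as np

m = np.arange(3.0)
C = np.arange(9.0).reshape(3, 3)

two_step = m.dot(C).dot(m)                           # m^T * C * m via two dot calls
one_call = np.einsum('i,ij,j->', m, C, m)            # same quadratic form, one call
print(np.allclose(two_step, one_call))               # True
```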
Assume I have a matrix of matrices, which is an order-4 tensor. What's the best way to apply the same operation to all the submatrices, similar to Map in Mathematica?
#!/usr/bin/python3
from pylab import *
t=random( (8,8,4,4) )
#t2=my_map(det,t)
#then shape(t2) becomes (8,8)
EDIT
Sorry for the bad English, since it's not my native one.
I tried numpy.linalg.det, but it doesn't seem to cope well with 3D or 4D tensors:
>>> import numpy as np
>>> a=np.random.rand(8,8,4,4)
>>> np.linalg.det(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 1703, in det
sign, logdet = slogdet(a)
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 1645, in slogdet
_assertRank2(a)
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 155, in _assertRank2
'two-dimensional' % len(a.shape))
numpy.linalg.linalg.LinAlgError: 4-dimensional array given. Array must be two-dimensional
EDIT2 (Solved)
The problem is that older numpy versions (<1.8) don't support the inner loop in numpy.linalg.det; updating to numpy 1.8 solves the problem.
numpy 1.8 has gufuncs that can do this in a C loop:
for example, numpy.linalg.det() is a gufunc:
import numpy as np
a = np.random.rand(8,8,4,4)
np.linalg.det(a)
First check the documentation for the operation that you intend to use. Many have a way of specifying which axis to operate on (np.sum). Others specify which axes they use (e.g. np.dot).
For np.linalg.det the documentation includes:
a : (..., M, M) array_like
Input array to compute determinants for.
So np.linalg.det(t) returns an (8,8) array, having calculated each det using the last 2 dimensions.
While it is possible to iterate on dimensions (the first is the default), it is better to write a function that makes use of numpy operations that use the whole array.
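A quick sanity check of the gufunc behaviour against an explicit Python loop over the submatrices:

```python
import numpy as np

a = np.random.rand(8, 8, 4, 4)

# gufunc: det over the last two axes, looping over the leading (8, 8) in C
dets = np.linalg.det(a)
print(dets.shape)                                    # (8, 8)

# Same result as an explicit Python loop over each 4x4 submatrix
looped = np.array([[np.linalg.det(a[i, j]) for j in range(8)]
                   for i in range(8)])
print(np.allclose(dets, looped))                     # True
```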
I wish to speed up my machine learning algorithm (written in Python) using Numba (http://numba.pydata.org/). Note that this algorithm takes as its input data a sparse matrix. In my pure Python implementation, I used csr_matrix and related classes from Scipy, but apparently it is not compatible with Numba's JIT compiler.
I have also created my own custom class to implement the sparse matrix (basically a list of lists of (index, value) pairs), but again it is incompatible with Numba (i.e., I got a weird error message saying it doesn't recognize the extension type).
Is there an alternative, simple way to implement sparse matrix using only numpy (without resorting to SciPy) that is compatible with Numba? Any example code would be appreciated. Thanks!
If all you have to do is iterate over the values of a CSR matrix, you can pass the attributes data, indptr, and indices to a function instead of the CSR matrix object.
from scipy import sparse
from numba import njit
@njit
def print_csr(A, iA, jA):
    for row in range(len(iA) - 1):
        for i in range(iA[row], iA[row + 1]):
            print(row, jA[i], A[i])
A = sparse.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
print_csr(A.data, A.indptr, A.indices)
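The same attribute-passing pattern works for computation, not just printing. Here is a sketch of a row-sum function, shown as plain Python; decorating it with @njit should compile it, since it only touches plain numpy arrays:

```python
import numpy as np
from scipy import sparse

def row_sums(data, indptr, n_rows):
    # Sum the stored values of each CSR row using only the raw attribute arrays.
    # Add @numba.njit above for a compiled version.
    out = np.zeros(n_rows)
    for row in range(n_rows):
        for i in range(indptr[row], indptr[row + 1]):
            out[row] += data[i]
    return out

A = sparse.csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
print(row_sums(A.data, A.indptr, A.shape[0]))        # [3. 3. 9.]
```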
You can access the data of your sparse matrix as pure numpy or python. For example
M = sparse.csr_matrix([[1, 0, 0], [1, 0, 1], [1, 1, 1]])
ML = M.tolil()
for d, r in zip(ML.data, ML.rows):
    # d, r are lists
    dr = np.array([d, r])
    print(dr)
produces:
[[1]
[0]]
[[1 1]
[0 2]]
[[1 1 1]
[0 1 2]]
Surely numba can handle code that uses these arrays, provided, of course, that it does not expect each row's arrays to have the same size.
The lil format stores its values in two object-dtype arrays, with the data and the column indices kept as lists, one per row.