numpy: apply operation to multidimensional array - python

Assume I have a matrix of matrices, which is an order-4 tensor. What's the best way to apply the same operation to all the submatrices, similar to Map in Mathematica?
#!/usr/bin/python3
from pylab import *
t=random( (8,8,4,4) )
#t2=my_map(det,t)
#then shape(t2) becomes (8,8)
EDIT
Sorry for my English; it is not my native language.
I tried numpy.linalg.det, but it doesn't seem to cope well with 3D or 4D tensors:
>>> import numpy as np
>>> a=np.random.rand(8,8,4,4)
>>> np.linalg.det(a)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 1703, in det
sign, logdet = slogdet(a)
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 1645, in slogdet
_assertRank2(a)
File "/usr/lib/python3/dist-packages/numpy/linalg/linalg.py", line 155, in _assertRank2
'two-dimensional' % len(a.shape))
numpy.linalg.linalg.LinAlgError: 4-dimensional array given. Array must be two-dimensional
EDIT2 (Solved)
The problem was that older NumPy versions (< 1.8) do not support stacked inputs (an inner loop over the leading axes) in numpy.linalg.det. Updating to NumPy 1.8 solves the problem.

NumPy 1.8 added generalized ufuncs (gufuncs) to numpy.linalg that do this loop in C.
For example, numpy.linalg.det() is a gufunc:
import numpy as np
a = np.random.rand(8,8,4,4)
np.linalg.det(a)
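A quick check of what the gufunc returns, assuming NumPy 1.8 or later: the determinant is taken over the last two axes, so the result keeps the leading (8, 8) shape, and each entry matches a plain 2-D det call on the corresponding 4x4 submatrix. np.linalg.inv is included only as another stacked linalg function; it returns one matrix per slice, so the matrix axes remain.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((8, 8, 4, 4))      # 8x8 grid of 4x4 matrices

d = np.linalg.det(a)              # one determinant per 4x4 submatrix
print(d.shape)                    # (8, 8)

# Spot-check one entry against a plain 2-D call
print(np.isclose(d[2, 5], np.linalg.det(a[2, 5])))   # True

# Other numpy.linalg functions also loop over the leading axes
inv = np.linalg.inv(a)
print(inv.shape)                  # (8, 8, 4, 4)
```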

First check the documentation for the operation that you intend to use. Many functions let you specify which axis to operate on (e.g. np.sum). Others document which axes they use (e.g. np.dot).
For np.linalg.det the documentation includes:
a : (..., M, M) array_like
Input array to compute determinants for.
So np.linalg.det(t) returns an (8,8) array, having calculated each det using the last 2 dimensions.
While it is possible to iterate on dimensions (the first is the default), it is better to write a function that makes use of numpy operations that use the whole array.
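For functions that are not gufuncs, here is a minimal sketch of the my_map from the question (my_map is the asker's hypothetical name, not a NumPy function). It flattens the leading axes, applies the function to each submatrix, and restores the shape. np.vectorize with a signature argument (NumPy 1.12+) does the same bookkeeping. Both still loop in Python, so they are a convenience rather than a speed-up over a true whole-array operation.

```python
import numpy as np

def my_map(func, t):
    """Apply func to each matrix in the last two axes of t.

    Hypothetical helper: assumes func takes one (M, N) array and returns a scalar.
    """
    lead_shape = t.shape[:-2]
    stacked = t.reshape((-1,) + t.shape[-2:])        # (K, M, N), K = prod(lead_shape)
    results = np.array([func(m) for m in stacked])  # one scalar per submatrix
    return results.reshape(lead_shape)

# Same idea with np.vectorize: the signature reads "one (m, m) matrix -> one scalar"
vec_det = np.vectorize(np.linalg.det, signature='(m,m)->()')

t = np.random.random((8, 8, 4, 4))
t2 = my_map(np.linalg.det, t)
print(t2.shape)                                # (8, 8)
print(np.allclose(t2, np.linalg.det(t)))       # True
print(np.allclose(vec_det(t), t2))             # True
```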

Related

Nested loop - TypeError: only size-1 arrays can be converted to Python scalars

I'm trying to find the max value with two different variables:
import numpy as np
import matplotlib.pyplot as plt
from math import pi, sqrt
i =.5
l = .01
u = 4*pi*10**-7
angle = np.linspace(0,pi/2,20)
d = np.linspace(0,.5, 50)
B = []
for ang in angle:
    for dis in d:
        x = (u*i*np.cos(ang))/(pi*sqrt((l/2)**2 + d**2))
        B.append(max(x))
However, it keeps giving me "TypeError: only size-1 arrays can be converted to Python scalars"
I'm not even sure what this means.
Your problem is caused by math.sqrt(), which receives a vector as input: the term involving l is a scalar, while the term involving d is a vector. NumPy adds the scalar to each entry of the vector (broadcasting), so the result is a vector. If this is what you want, just replace math's sqrt with np.sqrt().
Let's see why this is the case:
math.sqrt() only takes scalars as input. It also works if an array has size one, because that array can be converted to a scalar:
>>> math.sqrt(np.array([4]))
2.0
This conversion is attempted in your case, but it fails because the term in parentheses is a vector with more than one element. You can try it out with a simpler example:
>>> math.sqrt(np.array([9, 25]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: only size-1 arrays can be converted to Python scalars
In these cases, use NumPy's sqrt function instead of the one from math:
>>> np.sqrt(np.array([9, 25]))
array([3., 5.])
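A corrected sketch of the question's code, assuming the intent is the maximum over all distances d for each angle (the inner loop variable dis was never used, so this sketch drops that loop). With np.sqrt the whole d vector is handled at once; the second version uses broadcasting to build the full angle-by-distance grid without any Python loop.

```python
import numpy as np
from math import pi

i = 0.5
l = 0.01
u = 4 * pi * 10**-7
angle = np.linspace(0, pi / 2, 20)
d = np.linspace(0, 0.5, 50)

# Loop over angles only: np.sqrt handles the whole d vector at once
B = []
for ang in angle:
    x = (u * i * np.cos(ang)) / (pi * np.sqrt((l / 2) ** 2 + d ** 2))
    B.append(x.max())               # max over all 50 distances

# Fully broadcast: rows are angles, columns are distances -> shape (20, 50)
X = (u * i * np.cos(angle)[:, None]) / (pi * np.sqrt((l / 2) ** 2 + d[None, :] ** 2))
B2 = X.max(axis=1)

print(len(B), X.shape, np.allclose(B, B2))   # 20 (20, 50) True
```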

Using broadcasting with sparse scipy matrices

I have a numpy array Z with shape (k,N) and a second array X with shape (N,n).
Using numpy broadcasting, I can easily obtain a new array H with shape (n,k,N) whose j-th slice H[j] is Z with each row multiplied elementwise by column j of X:
H = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))
This works fine and is surprisingly fast.
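A small check of what that expression computes, with sizes k=3, N=5, n=4 chosen only for illustration. The second form writes the same product with None indexing instead of reshape.

```python
import numpy as np

k, N, n = 3, 5, 4
rng = np.random.default_rng(0)
Z = rng.random((k, N))
X = rng.random((N, n))

H = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))   # shape (n, k, N)

# Slice j is Z with every row multiplied elementwise by column j of X
for j in range(n):
    assert np.allclose(H[j], Z * X[:, j])

# The same product written with None indexing
H2 = Z[None, :, :] * X.T[:, None, :]
print(H.shape, np.array_equal(H, H2))   # (4, 3, 5) True
```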
Now, X is extremely sparse, and I want to further speed up this operation using sparse matrix operations.
However if I perform the following operations:
import scipy.sparse as sprs
spX = sprs.csr_matrix(X)
H = (Z.reshape((1,k,N))*spX.T.reshape((n,1,N))).dot(Z.T)
I get the following error:
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "C:\Python27\lib\site-packages\scipy\sparse\base.py", line 126, in reshape
self.__class__.__name__)
NotImplementedError: Reshaping not implemented for csc_matrix.
Is there a way to use broadcasting with sparse scipy matrices?
Scipy sparse matrices are limited to 2D shapes. But you can use Numpy in a "sparse" way:
H = np.zeros((n,k,N), np.result_type(Z, X))
I, J = np.nonzero(X)
Z_ = np.broadcast_to(Z, H.shape)
H[J,:,I] = Z_[J,:,I] * X[I,J,None]
Unfortunately the result H is still a dense array.
N.b. indexing with None is a handy way to add a unit-length dimension at the desired axis. The order of the result when combining advanced indexing with slicing is explained in the docs.
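A sketch that checks the answer's fill against the dense broadcasting result on small random data (sizes and sparsity are made up for illustration). The variant at the end is not part of the original answer: it reads row/column indices and values from spX.tocoo(), so X never has to exist as a dense array, although H is still dense.

```python
import numpy as np
import scipy.sparse as sprs

rng = np.random.default_rng(0)
k, N, n = 3, 6, 4
Z = rng.random((k, N))
X = rng.random((N, n))
X[X < 0.7] = 0.0                      # illustrative: make X mostly zeros
spX = sprs.csr_matrix(X)

# Dense reference computed by broadcasting
H_ref = Z.reshape((1, k, N)) * X.T.reshape((n, 1, N))

# Answer's approach: fill only the positions where X is nonzero
H = np.zeros((n, k, N), np.result_type(Z, X))
I, J = np.nonzero(X)
Z_ = np.broadcast_to(Z, H.shape)      # read-only view, no copy
H[J, :, I] = Z_[J, :, I] * X[I, J, None]

# Variant reading indices and values straight from the sparse matrix
coo = spX.tocoo()
H_sp = np.zeros((n, k, N), np.result_type(Z, spX.dtype))
H_sp[coo.col, :, coo.row] = Z[:, coo.row].T * coo.data[:, None]

print(np.allclose(H, H_ref), np.allclose(H_sp, H_ref))   # True True
```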

Dot product between 1D numpy array and scipy sparse matrix

Say I have Numpy array p and a Scipy sparse matrix q such that
>>> p.shape
(10,)
>>> q.shape
(10,100)
I want to do a dot product of p and q. When I try with numpy I get the following:
>>> np.dot(p,q)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/IPython/core/interactiveshell.py", line 2883, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-96-8260c6752ee5>", line 1, in <module>
np.dot(p,q)
ValueError: Cannot find a common data type.
I see in the Scipy documentation that
As of NumPy 1.7, np.dot is not aware of sparse matrices, therefore
using it will result on unexpected results or errors. The
corresponding dense matrix should be obtained first instead
But that defeats my purpose of using a sparse matrix. Soooo, how am I to do dot products between a sparse matrix and a 1D numpy array (numpy matrix, I am open to either) without losing the sparsity of my matrix?
I am using Numpy 1.8.2 and Scipy 0.15.1.
Use *:
p * q
Note that * uses matrix-like semantics rather than array-like semantics for sparse matrices, so it computes a matrix product rather than a broadcasted product.
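A minimal sketch comparing * and @ against a dense reference, with made-up data in the question's shapes. For sparse *matrix* types such as csr_matrix both operators compute the matrix product. SciPy's newer sparse *array* types (csr_array and friends) use * for elementwise multiplication, so @ is the safer spelling if you might switch types.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
p = rng.random(10)                  # dense 1-D array, shape (10,)
Q_dense = rng.random((10, 100))
Q_dense[Q_dense < 0.9] = 0.0        # illustrative: mostly zeros
q = sparse.csr_matrix(Q_dense)      # sparse, shape (10, 100)

expected = p @ Q_dense              # dense reference, shape (100,)

r_star = p * q                      # sparse matrix: * is a matrix product
r_matmul = p @ q                    # @ is a matrix product for matrices and arrays

print(r_star.shape, r_matmul.shape)   # (100,) (100,)
print(np.allclose(r_star, expected), np.allclose(r_matmul, expected))  # True True
```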
A sparse matrix is not a numpy array or matrix, though most formats use several arrays to store their data. As a general rule, regular numpy functions aren't aware of sparse matrices, so you should count on using the sparse versions of functions and operators.
By popular demand, the latest np.dot is sparse aware, though I don't know the details of how it handles sparse inputs. In 1.18 we have several options.
user2357112 suggests p*q. With the dense array first, I was a little doubtful, wondering if it would try to use array element-by-element multiplication (and fail due to broadcasting errors). But it works. Sometimes operators like * pass control to the second argument. But just to be sure I tried several alternatives:
q.T * p
np.dot(p, q.A)
q.T.dot(p)
all give the same dense (100,) array. Note - this is an array, not a sparse matrix result.
To get a sparse matrix I need to use
sparse.csr_matrix(p)*q # (1,100) shape
q could be in other sparse formats, but for calculations like this it is converted to csr or csc. And the .T operation is cheap because it just requires switching the format from csr to csc.
It would be a good idea to check whether these alternatives work if p is a 2d array, e.g. (2,10).
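A sketch of that check with a made-up (2, 10) array p2. Dense @ sparse keeps working unchanged; the forms that start from q.T need p2.T and a final .T once p is 2-D. q.toarray() is used for the dense reference because the .A shorthand is deprecated in newer SciPy releases.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
Q_dense = rng.random((10, 100))
Q_dense[Q_dense < 0.9] = 0.0
q = sparse.csr_matrix(Q_dense)      # sparse, shape (10, 100)
p2 = rng.random((2, 10))            # dense 2-D, shape (2, 10)

expected = p2 @ q.toarray()         # dense reference, shape (2, 100)

r1 = p2 @ q                         # dense @ sparse -> dense (2, 100)
r2 = (q.T @ p2.T).T                 # q.T forms need explicit transposes for 2-D p
r3 = q.T.dot(p2.T).T
r4 = sparse.csr_matrix(p2) @ q      # stays sparse, shape (2, 100)

print(r1.shape, r2.shape, r4.shape)   # (2, 100) (2, 100) (2, 100)
print(np.allclose(r1, expected), np.allclose(r4.toarray(), expected))  # True True
```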
Scipy has inbuilt methods for sparse matrix multiplication.
Example from documentation:
>>> import numpy as np
>>> from scipy.sparse import csr_matrix
>>> Q = csr_matrix([[1, 2, 0], [0, 0, 3], [4, 0, 5]])
>>> p = np.array([1, 0, -1])
>>> Q.dot(p)
array([ 1, -3, -1], dtype=int64)
Check these resources:
http://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.sparse.csc_matrix.dot.html
http://docs.scipy.org/doc/scipy/reference/sparse.html

Using masked numpy arrays with rpy2

import numpy
import rpy2
from rpy2 import robjects
import rpy2.robjects.numpy2ri
r = robjects.r
rpy2.robjects.numpy2ri.activate()
x = numpy.array( [1, 5, -99, 4, 5, 3, 7, -99, 6] )
mx = numpy.ma.masked_values( x, -99 )
print x # works, displays all values
print r.sd(x) # works, but uses -99 values in calculation
print mx # works, now -99 values are masked (--)
print r.sd(mx) # does not work - error
I am a new user of rpy2 and numpy. I am using R 2.14.1, python 2.7.1, rpy2 2.2.5, numpy 1.5.1 on RHEL5.
I need to read data into a numpy array and use rpy2 functions on it. However, I need to mask missing values prior to using the array with rpy2.
I have no problem masking values, but I can't get rpy2 to work with the resulting masked array. Looks like maybe the numpy2ri conversion doesn't work on masked numpy arrays? (see error below)
How can I make this work? Is it possible to tell rpy2 to ignore masked values? I'd like to stick with R rather than use scipy/numpy directly, since I'll be doing more advanced stats later.
Thanks.
Traceback (most recent call last):
File "d.py", line 16, in <module>
print r.sd(mx) # does not work - error
File "/dev/py/lib/python2.7/site-packages/rpy2-2.2.5dev_20120227-py2.7-linux-x86_64.egg/rpy2/robjects/functions.py", line 82, in __call__
return super(SignatureTranslatedFunction, self).__call__(*args, **kwargs)
File "/dev/py/lib/python2.7/site-packages/rpy2-2.2.5dev_20120227-py2.7-linux-x86_64.egg/rpy2/robjects/functions.py", line 30, in __call__
new_args = [conversion.py2ri(a) for a in args]
File "/dev/py/lib/python2.7/site-packages/rpy2-2.2.5dev_20120227-py2.7-linux-x86_64.egg/rpy2/robjects/numpy2ri.py", line 36, in numpy2ri
vec = SexpVector(o.ravel("F"), _kinds[o.dtype.kind])
TypeError: ravel() takes exactly 1 argument (2 given)
Update: Since rpy2 can't handle masked numpy arrays, I tried converting my -99 values to numpy NaN values. Apparently rpy2 recognizes numpy NaN values as R-style NA values.
The code below works because in the r.sd() call I can tell rpy2 to not use NA values. But the initial NaN substitution is definitely slower than applying the numpy mask.
Can any of you python wizards give me a faster way to do the -99 to NaN substitution across a large numpy ndarray? Or maybe suggest another approach?
Thanks.
# 'x' is a large numpy ndarray I am working with
# ('x' in the original code above was a small test array)
for i in range(900, 950): # random slice of numpy ndarray
    for j in range(6225): # full extent across slice
        if x[i][j] == -99:
            x[i][j] = numpy.NaN
y = x[933] # random piece of converted range
sd = r.sd( y, **{'na.rm': 'TRUE'} ) # r.sd() call that ignores numpy NaN values
print sd
The concept of "masked values" (that is of an array of value coupled to a list of indices to be masked) does not directly exist in R.
In R values are either set to be "missing" (NA), or a subset of the original data structure is taken (so a new object containing only this subset is created).
Now what is happening behind the scenes in rpy2 during the numpy-to-rinterface conversion is that the numpy array is copied into an R array (the other way around, exposing an R array to numpy, does not necessarily require copying). There is no reason why masks could not be handled at that stage (this may make its way into the code base more quickly if someone provides a patch). The alternative is to create a numpy array without the masked values, then feed it to rpy2.
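A minimal sketch of that alternative in plain NumPy, using the small test array from the question (as floats). MaskedArray.compressed() returns an ordinary ndarray holding only the unmasked values. The rpy2 call is left as a comment because it needs an R installation; r.sd is taken from the question.

```python
import numpy as np

x = np.array([1, 5, -99, 4, 5, 3, 7, -99, 6], dtype=float)
mx = np.ma.masked_values(x, -99)

kept = mx.compressed()          # plain ndarray with only the unmasked values
print(kept)                     # [1. 5. 4. 5. 3. 7. 6.]

# With rpy2's numpy2ri conversion active, this array could be passed on directly:
# sd = r.sd(kept)

# Sanity check: NumPy's sample standard deviation (ddof=1, like R's sd)
# on the compressed array equals the masked array's own result
print(np.isclose(np.std(kept, ddof=1), mx.std(ddof=1)))   # True
```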
You can speed up the process of replacing -99 values with NaN
by using masked arrays, objects that are natively defined in numpy.ma,
as in the following code:
x_masked = numpy.ma.masked_array(x, mask=(x == -99))
x_filled = x_masked.filled(numpy.nan)
x_masked is a numpy.ma.MaskedArray (masked array).
x_filled is a numpy.ndarray (regular numpy array).
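Two more vectorized options, sketched on a made-up float array (smaller than the asker's, for illustration): np.where builds a new array, and boolean-index assignment replaces values in place on a slice. Both avoid the Python-level double loop, which is kept here only as a reference. The array must have a floating-point dtype, because NaN cannot be stored in an integer array, and numpy.nan is used because the numpy.NaN alias was removed in NumPy 2.0.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 10, size=(200, 1000)).astype(float)
x[rng.random(x.shape) < 0.05] = -99          # sprinkle in missing-value markers

# New array: NaN where x == -99, original value elsewhere
x_nan = np.where(x == -99, np.nan, x)

# In place, restricted to a block of rows like the loop in the question
x_inplace = x.copy()
block = x_inplace[100:150]                  # basic slice -> view into x_inplace
block[block == -99] = np.nan

# Reference: element-by-element loop over the same rows
x_loop = x.copy()
for i in range(100, 150):
    for j in range(x.shape[1]):
        if x_loop[i, j] == -99:
            x_loop[i, j] = np.nan

print(np.array_equal(x_inplace, x_loop, equal_nan=True))   # True
```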

Calculating Correlation Coefficient with Numpy

I have a list of values and a 1-d numpy array, and I would like to calculate the correlation coefficient using numpy.corrcoef(x,y,rowvar=0). I get the following error:
Traceback (most recent call last):
File "testLearner.py", line 25, in <module>
corr = np.corrcoef(valuesToCompare,queryOutput,rowvar=0)
File "/usr/local/lib/python2.6/site-packages/numpy/lib/function_base.py", line 2003, in corrcoef
c = cov(x, y, rowvar, bias, ddof)
File "/usr/local/lib/python2.6/site-packages/numpy/lib/function_base.py", line 1935, in cov
X = concatenate((X,y), axis)
ValueError: array dimensions must agree except for d_0
I printed out the shape for my numpy array and got (400,1). When I convert my list to an array with numpy.asarray(y) I get (400,)!
I believe this is the problem. I did an array.reshape to (400,1) and printed out the shape, and I still get (400,). What am I missing?
Thanks in advance.
I think you might have assumed that reshape modifies the value of the original array. It doesn't:
>>> a = np.random.randn(5)
>>> a.shape
(5,)
>>> b = a.reshape(5,1)
>>> b.shape
(5, 1)
>>> a.shape
(5,)
np.asarray treats a regular list as a 1d array, but your original numpy array that you said was 1d is actually 2d (because its shape is (400,1)). If you want to use your list like a 2d array, there are two easy approaches:
np.asarray(lst).reshape((-1, 1)) – -1 means "however many it needs" for that dimension.
np.asarray([lst]).T – .T means array transpose, which switches from (1,5) to (5,1).
You could also reshape your original array to 1d via ary.reshape((-1,)).
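A sketch of both fixes with made-up data shaped like the question's: a (400, 1) array and a plain list of 400 values (here an exact linear function of the array, so the correlation is 1). Either give both inputs the same (400, 1) column shape with rowvar=False, or flatten the column to 1-D and use the default rowvar; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
query_output = rng.random((400, 1))                    # 2-D column, like the asker's array
values_to_compare = list(2 * query_output[:, 0] + 1)  # plain list -> (400,) when converted

# Option 1: make the list a (400, 1) column too
y_col = np.asarray(values_to_compare).reshape((-1, 1))
c1 = np.corrcoef(query_output, y_col, rowvar=False)   # 2x2 correlation matrix

# Option 2: flatten the column to 1-D and keep the list 1-D
c2 = np.corrcoef(query_output.reshape((-1,)), values_to_compare)

print(c1.shape, np.allclose(c1, c2))   # (2, 2) True
```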
