How to convert zero to NaN in an array? - python

I used temp[temp==0] = np.nan, but I got this error:
IndexError: 2-dimensional boolean indexing is not supported.

I'd use where, to avoid having to drop down to numpy:
In [35]: d
Out[35]:
<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[0, 1, 2],
       [3, 4, 5]])
Dimensions without coordinates: dim_0, dim_1
In [36]: d.where(d != 0)
Out[36]:
<xarray.DataArray (dim_0: 2, dim_1: 3)>
array([[nan,  1.,  2.],
       [ 3.,  4.,  5.]])
Dimensions without coordinates: dim_0, dim_1
Note that .where automatically casts to float when necessary, since an integer array can't hold NaN.
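For the original error, here's a minimal self-contained sketch (assuming temp is an xarray DataArray; the name temp follows the question):

import numpy as np
import xarray as xr

temp = xr.DataArray(np.array([[0, 1, 2], [3, 4, 5]]))
# boolean-mask assignment raises the 2-D IndexError on a DataArray;
# .where masks with NaN instead, casting to float as needed
temp = temp.where(temp != 0)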


np.where equivalent for 1-D arrays

I am trying to fill nan values in an array with values from another array. Since the arrays I am working with are 1-D, np.where is not working. However, following the tip in the documentation, I tried the following:
import numpy as np
sample = [1, 2, np.nan, 4, 5, 6, np.nan]
replace = [3, 7]
new_sample = [new_value if condition else old_value for (new_value, condition, old_value) in zip(replace, np.isnan(sample), sample)]
However, instead of the output I expected, [1, 2, 3, 4, 5, 6, 7], I get:
[Out]: [1, 2]
What am I doing wrong?
np.where works. (The comprehension failed because zip stops at its shortest argument; replace has only two elements, so only two items were produced.)
In [561]: sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
Use isnan to identify the nan values (don't use ==; nan != nan):
In [562]: np.isnan(sample)
Out[562]: array([False, False, True, False, False, False, True])
In [564]: np.where(np.isnan(sample))
Out[564]: (array([2, 6], dtype=int32),)
Either one, the boolean mask or the where tuple, can index the nan values:
In [565]: sample[Out[564]]
Out[565]: array([nan, nan])
In [566]: sample[Out[562]]
Out[566]: array([nan, nan])
and be used to replace:
In [567]: sample[Out[562]]=[1,2]
In [568]: sample
Out[568]: array([1., 2., 1., 4., 5., 6., 2.])
The three-parameter form also works - but it returns a copy.
In [571]: np.where(np.isnan(sample),999,sample)
Out[571]: array([ 1., 2., 999., 4., 5., 6., 999.])
You can use numpy.argwhere. But @hpaulj shows that numpy.where works just as well.
import numpy as np
sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
replace = np.array([3, 7])
sample[np.argwhere(np.isnan(sample)).ravel()] = replace
# array([ 1., 2., 3., 4., 5., 6., 7.])
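For comparison, a minimal boolean-mask sketch that does the whole replacement in one step (the replacement values must line up with the nan positions in order):

import numpy as np

sample = np.array([1, 2, np.nan, 4, 5, 6, np.nan])
replace = np.array([3, 7])
# assigns replace[0] to the first nan, replace[1] to the second
sample[np.isnan(sample)] = replace
# sample is now array([1., 2., 3., 4., 5., 6., 7.])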

Create a matrix from a vector where each row is a shifted version of the vector

I have a numpy array like this
import numpy as np
ar = np.array([1, 2, 3, 4])
and I want to create an array that looks like this:
array([[4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1],
       [1, 2, 3, 4]])
That is, each row corresponds to ar shifted by the row index + 1.
A straightforward implementation could look like this:
ar_roll = np.tile(ar, ar.shape[0]).reshape(ar.shape[0], ar.shape[0])
for indi, ri in enumerate(ar_roll):
    ar_roll[indi, :] = np.roll(ri, indi + 1)
which gives me the desired output.
My question is whether there is a smarter way of doing this which avoids the loop.
Here's one approach using NumPy strides: basically, pad with the leftover elements and then let the strides create that shifted version pretty efficiently -
def strided_method(ar):
    # pad with all but the last element so every shifted row is contiguous
    a = np.concatenate((ar, ar[:-1]))
    L = len(ar)
    n = a.strides[0]
    # a negative row stride walks backwards through the padded array
    return np.lib.stride_tricks.as_strided(a[L-1:], (L, L), (-n, n))
Sample runs -
In [42]: ar = np.array([1, 2, 3, 4])
In [43]: strided_method(ar)
Out[43]:
array([[4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1],
       [1, 2, 3, 4]])
In [44]: ar = np.array([4,9,3,6,1,2])
In [45]: strided_method(ar)
Out[45]:
array([[2, 4, 9, 3, 6, 1],
       [1, 2, 4, 9, 3, 6],
       [6, 1, 2, 4, 9, 3],
       [3, 6, 1, 2, 4, 9],
       [9, 3, 6, 1, 2, 4],
       [4, 9, 3, 6, 1, 2]])
Runtime test -
In [5]: a = np.random.randint(0,9,(1000))
# @Eric's soln
In [6]: %timeit roll_matrix(a)
100 loops, best of 3: 3.39 ms per loop
# @Warren Weckesser's soln
In [8]: %timeit circulant(a[::-1])
100 loops, best of 3: 2.03 ms per loop
# Strides method
In [18]: %timeit strided_method(a)
100000 loops, best of 3: 6.7 µs per loop
Making a copy (if you want to make changes, and not just use the result as a read-only array) won't hurt the strides method too badly -
In [19]: %timeit strided_method(a).copy()
1000 loops, best of 3: 381 µs per loop
Both of the existing answers are fine; this answer is probably only of interest if you are already using scipy.
The matrix that you describe is known as a circulant matrix. If you don't mind the dependency on scipy, you can use scipy.linalg.circulant to create one:
In [136]: from scipy.linalg import circulant
In [137]: ar = np.array([1, 2, 3, 4])
In [138]: circulant(ar[::-1])
Out[138]:
array([[4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1],
       [1, 2, 3, 4]])
Here's one approach
def roll_matrix(vec):
    N = len(vec)
    buffer = np.empty((N, N*2 - 1))
    # generate a wider array that we want a slice into
    buffer[:, :N] = vec
    buffer[:, N:] = vec[:-1]
    rolled = buffer.reshape(-1)[N-1:-1].reshape(N, -1)
    return rolled[:, :N]
In your case, we build buffer to be
array([[ 1.,  2.,  3.,  4.,  1.,  2.,  3.],
       [ 1.,  2.,  3.,  4.,  1.,  2.,  3.],
       [ 1.,  2.,  3.,  4.,  1.,  2.,  3.],
       [ 1.,  2.,  3.,  4.,  1.,  2.,  3.]])
Then flatten it, trim it, reshape it to get rolled:
array([[ 4.,  1.,  2.,  3.,  1.,  2.],
       [ 3.,  4.,  1.,  2.,  3.,  1.],
       [ 2.,  3.,  4.,  1.,  2.,  3.],
       [ 1.,  2.,  3.,  4.,  1.,  2.]])
And finally, slice off the garbage last columns.
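A quick usage sketch; note the result is float because buffer is allocated with np.empty:

import numpy as np

ar = np.array([1, 2, 3, 4])
print(roll_matrix(ar))
# [[4. 1. 2. 3.]
#  [3. 4. 1. 2.]
#  [2. 3. 4. 1.]
#  [1. 2. 3. 4.]]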

Scipy sparse matrix assignment using only stored elements

I have a large sparse matrix globalGrid (lil_matrix) and a smaller matrix localGrid (coo_matrix). The localGrid represents a subset of the globalGrid and I want to update the globalGrid with the localGrid. For this I use the following code (in Python Scipy):
globalGrid[xLocalgrid:xLocalgrid + localGrid.shape[0], yLocalgrid: yLocalgrid + localGrid.shape[1]] = localGrid
where xLocalgrid and yLocalgrid are the offsets of the localGrid origin with respect to the globalGrid.
The problem is that the localGrid is sparse, but also the zero elements are assigned to the globalGrid. Is there a way I can only assign the stored elements and not the 0-elements?
I have found about masked arrays in numpy, however that does not seem to apply to sparse scipy matrices.
edit: In response to the comments below, here is an example to illustrate what I mean:
First setup the matrices:
M=sparse.lil_matrix(2*np.ones([5,5]))
m = sparse.eye(3)
M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  2.,  2.,  2.,  2.]])
m.todense()
matrix([[ 1.,  0.,  0.],
        [ 0.,  1.,  0.],
        [ 0.,  0.,  1.]])
Then assign:
M[1:4, 1:4] = m
Now the result is:
M.todense()
matrix([[ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  1.,  0.,  0.,  2.],
        [ 2.,  0.,  1.,  0.,  2.],
        [ 2.,  0.,  0.,  1.,  2.],
        [ 2.,  2.,  2.,  2.,  2.]])
Whereas I need the result to be:
matrix([[ 2.,  2.,  2.,  2.,  2.],
        [ 2.,  1.,  2.,  2.,  2.],
        [ 2.,  2.,  1.,  2.,  2.],
        [ 2.,  2.,  2.,  1.,  2.],
        [ 2.,  2.,  2.,  2.,  2.]])
Your question isn't quite clear, but I'm guessing that because the globalGrid[a:b, c:d] indexing spans values that should be 0 in both arrays, you are worried that 0's are being copied.
Let's try this with real matrices.
In [13]: M=sparse.lil_matrix((10,10))
In [14]: m=sparse.eye(3)
In [15]: M[4:7,5:8]=m
In [16]: m
Out[16]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements (1 diagonals) in DIAgonal format>
In [17]: M
Out[17]:
<10x10 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in LInked List format>
In [18]: M.data
Out[18]: array([[], [], [], [], [1.0], [1.0], [1.0], [], [], []], dtype=object)
In [19]: M.rows
Out[19]: array([[], [], [], [], [5], [6], [7], [], [], []], dtype=object)
M does not have any unnecessary 0's.
If there are unnecessary 0's in a sparse matrix, a round trip through csr format should take care of them:
M.tocsr().tolil()
csr format also has an inplace .eliminate_zeros() method.
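For example, a sketch of that cleanup (eliminate_zeros works in place on the csr matrix):

Mcsr = M.tocsr()
Mcsr.eliminate_zeros()  # drops explicitly stored zeros
M = Mcsr.tolil()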
So your concern is with overwriting the nonzeros of the target array.
With dense arrays, the use of nonzero (or where) takes care of this:
In [87]: X=np.ones((10,10),int)*2
In [88]: y=np.eye(3)
In [89]: I,J=np.nonzero(y)
In [90]: X[I+3,J+2]=y[I,J]
In [91]: X
Out[91]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])
Trying the sparse equivalent:
In [92]: M=sparse.lil_matrix(X)
In [93]: M
Out[93]:
<10x10 sparse matrix of type '<class 'numpy.int32'>'
        with 100 stored elements in LInked List format>
In [94]: m=sparse.coo_matrix(y)
In [95]: m
Out[95]:
<3x3 sparse matrix of type '<class 'numpy.float64'>'
        with 3 stored elements in COOrdinate format>
In [96]: I,J=np.nonzero(m)
In [97]: I
Out[97]: array([0, 1, 2], dtype=int32)
In [98]: J
Out[98]: array([0, 1, 2], dtype=int32)
In [99]: M[I+3,J+2]=m[I,J]
...
TypeError: 'coo_matrix' object is not subscriptable
I could have used the sparse matrix own nonzero.
In [106]: I,J=m.nonzero()
For coo format, this is the same as
In [109]: I,J=m.row, m.col
In which case I can also use the data attribute:
In [100]: M[I+3,J+2]=m.data
In [101]: M.A
Out[101]:
array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 1, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 1, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 1, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],
       [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], dtype=int32)
The code for m.nonzero may be instructive:
A = self.tocoo()
nz_mask = A.data != 0
return (A.row[nz_mask],A.col[nz_mask])
So you need to be careful to make sure that index and data attributes of the sparse matrix match.
Also pay attention to which sparse formats allow indexing. lil is good for changing values. csr allows element-by-element indexing, but raises an efficiency warning if you try to change zero values to nonzero (or vice versa). coo has this nice pairing of indices and data, but doesn't allow indexing.
Another subtle point: in constructing a coo you may repeat coordinates. When converted to csr format those values are summed. But the assignment that I'm suggesting will only use the last value, not the sum. So make sure you understand how your local matrix was constructed, and know whether it is a 'clean' representation of the data.
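Putting the recipe together as a self-contained sketch (xOff and yOff are hypothetical offsets standing in for the question's xLocalgrid/yLocalgrid):

import numpy as np
from scipy import sparse

M = sparse.lil_matrix(2 * np.ones((5, 5)))  # the globalGrid
m = sparse.coo_matrix(np.eye(3))            # the localGrid
xOff, yOff = 1, 1

# write only the stored entries of m into M
M[m.row + xOff, m.col + yOff] = m.data
# M.todense() now shows 1's on the shifted diagonal, 2's everywhere else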
I have found a working solution. Instead of using an assignment, I loop over the data in the sparse matrix (using m.rows and m.data) and replace the elements one by one.
for idx, row in enumerate(m.rows):
    for idy, col in enumerate(row):
        M[yOffset + col, xOffset + idx] = m.data[idx][idy]
I am still curious, though, whether there is a simpler/faster method to achieve this result.
My implementation of the above answer:
I, J = m.nonzero()
M[I + yOffset, J + xOffset] = m[I, J]
return M
This was, however, marginally slower.

Specifying dtype=object for numpy.gradient

Is there a way to specify the dtype for numpy.gradient?
I'm using an array of subarrays and it's throwing the following error:
ValueError: setting an array element with a sequence.
Here is an example:
import numpy as np
a = np.empty([3, 3], dtype=object)
it = np.nditer(a, flags=['multi_index', 'refs_ok'])
while not it.finished:
    i = it.multi_index[0]
    j = it.multi_index[1]
    a[it.multi_index] = np.array([i, j])
    it.iternext()
print(a)
which outputs
[[array([0, 0]) array([0, 1]) array([0, 2])]
 [array([1, 0]) array([1, 1]) array([1, 2])]
 [array([2, 0]) array([2, 1]) array([2, 2])]]
I would like print(np.gradient(a)) to return
array([[array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]])],
       [array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]])],
       [array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]]), array([[1, 0], [0, 1]])]],
      dtype=object)
Notice that, in this case, the gradient of the vector field is an identity tensor field.
Why are you working with an array of dtype object? That's more work than using a 2d array.
e.g.
In [53]: a1=np.array([[1,2],[3,4],[5,6]])
In [54]: a1
Out[54]:
array([[1, 2],
       [3, 4],
       [5, 6]])
In [55]: np.gradient(a1)
Out[55]:
[array([[ 2.,  2.],
        [ 2.,  2.],
        [ 2.,  2.]]),
 array([[ 1.,  1.],
        [ 1.,  1.],
        [ 1.,  1.]])]
or working column by column, or row by row
In [61]: [np.gradient(i) for i in a1.T]
Out[61]: [array([ 2., 2., 2.]), array([ 2., 2., 2.])]
In [62]: [np.gradient(i) for i in a1]
Out[62]: [array([ 1., 1.]), array([ 1., 1.]), array([ 1., 1.])]
dtype=object only makes sense if the subarrays/lists differ in type and/or shape. And even then it doesn't add much over a regular Python list.
==============================
I can take your 2d a, and make a 3d array with:
In [126]: a1=np.zeros((3,3,2),int)
In [127]: a1.flat[:]=[i for i in a.flatten()]
In [128]: a1
Out[128]:
array([[[0, 0],
        [0, 1],
        [0, 2]],

       [[1, 0],
        [1, 1],
        [1, 2]],

       [[2, 0],
        [2, 1],
        [2, 2]]])
Or I could produce the same thing with meshgrid:
In [129]: X,Y=np.meshgrid(np.arange(3),np.arange(3),indexing='ij')
In [130]: a2=np.array([Y,X]).T
When I apply np.gradient to that I get 3 arrays, each (3,3,2) in shape.
In [136]: ga1=np.gradient(a1)
In [137]: len(ga1)
Out[137]: 3
In [138]: ga1[0].shape
Out[138]: (3, 3, 2)
It looks like the first two arrays have the values you want, so it's just a matter of rearranging them.
In [141]: np.array(ga1[:2]).shape
Out[141]: (2, 3, 3, 2)
In [143]: gga1=np.array(ga1[:2]).transpose([1,2,0,3])
In [144]: gga1.shape
Out[144]: (3, 3, 2, 2)
In [145]: gga1[0,0]
Out[145]:
array([[ 1., -0.],
       [-0.,  1.]])
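As a quick check (a sketch), every per-point gradient in gga1 should be the 2x2 identity:

# broadcasts the (3,3,2,2) stack against the (2,2) identity
assert np.allclose(gga1, np.eye(2))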
If they must go back into a (3,3) object array, I could do:
In [146]: goa1=np.empty([3,3],dtype=object)
In [147]: for i in range(3):
   .....:     for j in range(3):
   .....:         goa1[i,j] = gga1[i,j]
   .....:
In [148]: goa1
Out[148]:
array([[array([[ 1., -0.],
               [-0.,  1.]]),
        array([[ 1., -0.],
               [ 0.,  1.]]),
        array([[ 1., -0.],
        ...
               [ 0.,  1.]]),
        array([[ 1.,  0.],
               [ 0.,  1.]])]], dtype=object)
I still wonder what the point is of working with an object array.

puzzled on how to slice a numpy array

m is an ndarray with shape (12, 21, 21); now I want to take only a sparse slice of it to form a new 2D array, with
sliceid = 0
indx = np.array([0, 2, 4, 6, 8, 10])
so that sparse_slice is, intuitively,
sparse_slice = m[sliceid, indx, indx]
but apparently the above operation does not work. Currently what I am using is
sparse_slice = m[sliceid, indx, :][:, indx]
Why is the first "intuitive" way not working? And is there a more compact way than my current solution? All my previous ndarray slicing trials were based on nothing but intuition; maybe I should switch to reading a serious manual now...
The more compact way is to do new = m[0, :12:2, :12:2]. This is what the numpy docs call "basic indexing", meaning that you slice with an integer or a slice object (i.e. 0:12:2). When you use basic indexing, numpy returns a view of the original array. For example:
In [3]: a = np.zeros((2, 3, 4))
In [4]: b = a[0, 1, ::2]
In [5]: b
Out[5]: array([ 0., 0.])
In [6]: b[:] = 7
In [7]: a
Out[7]:
array([[[ 0.,  0.,  0.,  0.],
        [ 7.,  0.,  7.,  0.],
        [ 0.,  0.,  0.,  0.]],

       [[ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]]])
In you "intuitive" approach what you're doing is indexing an array with another array. When you index an numpy array with another array the arrays need to be the same size (or they need to broadcast against each other, more on this in a sec). In the docs this is called fancy indexing or advanced indexing. For example:
In [10]: a = np.arange(9).reshape(3,3)
In [11]: a
Out[11]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
In [12]: index = np.array([0,1,2])
In [13]: b = a[index, index]
In [14]: b
Out[14]: array([0, 4, 8])
You see that I get a[0,0], a[1,1], and a[2,2], not a[0,0], a[0,1] ... If you want the "outer product" of index with index, you can do the following.
In [22]: index1 = np.array([[0,0],[1,1]])
In [23]: index2 = np.array([[0,1],[0,1]])
In [24]: b = a[index1, index2]
In [25]: b
Out[25]:
array([[0, 1],
       [3, 4]])
There is a shorthand for doing the above, like this:
In [28]: index = np.array([0,1])
In [29]: index1, index2 = np.ix_(index, index)
In [31]: index1
Out[31]:
array([[0],
       [1]])
In [32]: index2
Out[32]: array([[0, 1]])
In [33]: a[index1, index2]
Out[33]:
array([[0, 1],
       [3, 4]])
In [34]: a[np.ix_(index, index)]
Out[34]:
array([[0, 1],
       [3, 4]])
You'll notice that index1 is (2, 1) and index2 is (1, 2), not (2, 2). That's because the two arrays get broadcast against one another, you can read more about broadcasting here. Keep in mind that when you're using fancy indexing you get a copy of the original data not a view. Sometimes this is better (if you want to leave the original data unchanged) and sometimes it just takes more memory. More about indexing here.
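Applied to the question's arrays, a sketch:

m = np.arange(12 * 21 * 21).reshape(12, 21, 21)
indx = np.array([0, 2, 4, 6, 8, 10])
sparse_slice = m[0][np.ix_(indx, indx)]  # same result as m[0, indx, :][:, indx]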
If I'm not mistaken, for input m = np.array(range(5292)).reshape(12, 21, 21), you are expecting sparse_slice = m[sliceid, indx, :][:, indx] to be
array([[  0,   2,   4,   6,   8,  10],
       [ 42,  44,  46,  48,  50,  52],
       [ 84,  86,  88,  90,  92,  94],
       [126, 128, 130, 132, 134, 136],
       [168, 170, 172, 174, 176, 178],
       [210, 212, 214, 216, 218, 220]])
In that case, you can get it using the step part of a slice:
m[0, :12:2, :12:2]
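A quick consistency check (sketch) that the slice and the fancy-indexing version agree:

m = np.array(range(5292)).reshape(12, 21, 21)
indx = np.array([0, 2, 4, 6, 8, 10])
assert (m[0, :12:2, :12:2] == m[0, indx, :][:, indx]).all()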
