Is there a way to enumerate over the non-masked locations of a masked numpy ndarray (e.g. in the way that ndenumerate does it for regular ndarrays, but omitting all the masked entries)?
EDIT: to be more precise: the enumeration should not only skip over the masked entries, but also show the indices of the non-masked ones in the original array. E.g. if the first five elements of a 1-d array are masked, and the next one has an unmasked value of 3, then the enumeration should start with something like ((5,), 3), ....
Thanks!
PS: note that, although it is possible to apply ndenumerate to a masked ndarray, the resulting enumeration does not discriminate between its masked and normal entries. In fact, ndenumerate not only does not filter out the masked entries from the enumeration, but it doesn't even replace the enumerated values with the masked constant. Therefore, one can't adapt ndenumerate for this task by just wrapping ndenumerate with a suitable filter.
You can access only the valid entries by indexing with the inverse of the mask:
>>> import numpy as np
>>> import numpy.ma as ma
>>> x = np.array([11, 22, -1, 44])
>>> m_arr = ma.masked_array(x, mask=[0, 0, 1, 0])
>>> for index, i in np.ndenumerate(m_arr[~m_arr.mask]):
...     print(index, i)
(0,) 11
(1,) 22
(2,) 44
See this for details.
The enumeration over only valid entries with indices from the original array:
>>> for (index, val), m in zip(np.ndenumerate(m_arr), m_arr.mask):
...     if not m:
...         print(index, val)
(0,) 11
(1,) 22
(3,) 44
How about:
import numpy as np
def maenumerate(marr):
    # Flatten the inverted mask so it lines up with ndenumerate's traversal order
    mask = ~marr.mask.ravel()
    # Python 3: the built-in zip is lazy, so itertools.izip is no longer needed
    for i, m in zip(np.ndenumerate(marr), mask):
        if m:
            yield i
N = 12
a = np.arange(N).reshape(2, 2, 3) + 10
b = np.ma.array(a, mask=(a % 5 == 0))
for i, val in maenumerate(b):
    print(i, val)
which yields
(0, 0, 1) 11
(0, 0, 2) 12
(0, 1, 0) 13
(0, 1, 1) 14
(1, 0, 0) 16
(1, 0, 1) 17
(1, 0, 2) 18
(1, 1, 0) 19
(1, 1, 2) 21
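The same idea can also be written without zipping against the raveled mask. This is only a minimal sketch: it assumes np.ma.getmaskarray to expand the mask to the full shape of the array (so an array with no masked entries at all still works) and np.nonzero to list the unmasked positions. The name maenumerate is reused from above; the test arrays are illustrative.

```python
import numpy as np

def maenumerate(marr):
    """Yield (index, value) for every unmasked entry of a masked array."""
    # getmaskarray always returns a full boolean array, even if nothing is masked
    mask = np.ma.getmaskarray(marr)
    data = np.ma.getdata(marr)
    # np.nonzero returns one index array per dimension; zip turns them into tuples
    for idx in zip(*np.nonzero(~mask)):
        idx = tuple(int(i) for i in idx)
        yield idx, data[idx]

a = np.arange(12).reshape(2, 2, 3) + 10
b = np.ma.array(a, mask=(a % 5 == 0))
pairs = list(maenumerate(b))

# The 1-d case from the question: five masked entries, then an unmasked 3
c = np.ma.array([0, 0, 0, 0, 0, 3], mask=[1, 1, 1, 1, 1, 0])
first = next(maenumerate(c))
print(first)
```

As far as I know, NumPy 1.23 and later also ship np.ma.ndenumerate, which skips masked entries by default, so on a recent install that may be all you need.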
I have an array A1 with shape 2x3 and a list A2. For each value in the list, I want to get its index in the array.
Example
A1 = [[0, 1, 2],
      [3, 4, 5]]  # Shape: 2 rows & 3 columns
A2 = [0,1,2,3,4,5]
Now, I want to write code to get an element's index in array A1
Expected Output
A2[3] = (1,0) #(1 = row & 0 = column) Index of No.3 in A1
Please help me. Thank you
There is some ambiguity in the question. Are we looking for the indices of elements by value, or by order?
Unravel an ordinal index
Assuming that the values in A1 are not important (i.e. this is not a search of certain values, but really finding the index corresponding to a location), you can use unravel_index for that.
Example:
>>> np.unravel_index(3, A1.shape)
(1, 0)
Or, on the whole A2 in one shot:
>>> np.unravel_index(A2, np.array(A1).shape)
(array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 1, 2]))
which you may prefer as a list of tuples ("transpose" of the above):
>>> list(zip(*np.unravel_index(A2, np.array(A1).shape)))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
Search for a value
If, instead, you are searching for values, e.g., where in A1 there are values equal to A2[i], then, as in @dc_Bita98's answer:
>>> tuple(np.argwhere(A1 == A2[3]).squeeze())
(1, 0)
If you want all the locations in one shot, you need to do something to handle the fact that the shapes are different. Say also, for sake of illustration, that:
A3 = np.array([9, 1, 0, 1])
Then, either:
>>> i, j, k = np.where(A1 == A3[:, None, None])
>>> out = np.full(A3.shape, None, dtype=object)
>>> for a, r, c in zip(i, j, k):
...     out[a] = (int(r), int(c))
...
>>> out.tolist()
[None, (0, 1), (0, 0), (0, 1)]
which clearly indicates that the first value (9) was not found, and where to find the others.
Or:
>>> [tuple(np.argwhere(A1 == v).squeeze().tolist()) for v in A3]
[(), (0, 1), (0, 0), (0, 1)]
If you can use numpy, check out argwhere:
import numpy as np
a1 = np.array([[0,1,2],[3,4,5]])
a2 = [0,1,2,3,4,5]
a3 = np.argwhere(a1 == a2[3]).squeeze()  # -> array([1, 0]), i.e. row 1, column 0
I'm working on a problem with numpy arrays and hit a roadblock. I have two arrays: a 2D numpy array, and a 1D numpy array holding some indices into the 2D array. I need to use pairs of these indices to extract a new 2D array from the original one. I have something that works, but I'm sure it can be better, so I'm asking for advice. Here is my code:
import numpy as np
import itertools
x = np.arange(25).reshape(5, 5) #Original Array
#x = [[ 0 1 2 3 4]
# [ 5 6 7 8 9]
# [10 11 12 13 14]
# [15 16 17 18 19]
# [20 21 22 23 24]]
y = np.array([0, 2, 4]) #Indexes
idx = list(itertools.product(y, repeat = 2)) #This creates all pairs of the indexes to act as my coordinates into the array
#idx = [(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4), (4, 0), (4, 2), (4, 4)]
newarray = np.array([x[i] for i in idx]).reshape(3, 3) #This uses the tuples from before to extract the values of the original array
#newarray = [[ 0 2 4]
# [10 12 14] #The extracted array
# [20 22 24]]
So it works, but I think there is a lot to improve. For example, in the final step I use a list comprehension, then build a numpy array, then reshape it. I'm also not sure whether it's okay to create all the combinations of the index array; maybe there is an easier way. Any advice will be appreciated, thank you!
x[::2, ::2]
will select every other row and column
For a less regular pattern try
x[y[:,None], y]
which uses advanced indexing
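To spell out why that works, here is a small sketch: y[:, None] has shape (3, 1) and y has shape (3,), so the two index arrays broadcast to (3, 3) and pair every row index with every column index. np.ix_ builds the same pair of index arrays for you; the variable names by_broadcast and by_ix are just for illustration.

```python
import numpy as np

x = np.arange(25).reshape(5, 5)
y = np.array([0, 2, 4])

# Column vector of row indices broadcast against a row of column indices
by_broadcast = x[y[:, None], y]

# np.ix_ constructs the equivalent open mesh of index arrays
by_ix = x[np.ix_(y, y)]

print(by_broadcast)
```

Both forms avoid building the list of coordinate tuples and the final reshape.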
Numpy has some sophisticated indexing options. Also remember that reshape is cheap: it returns a view whenever the memory layout allows, so never be afraid to reshape.
import numpy as np
import itertools
x = np.arange(25).reshape(5, 5) #Original Array
y = [0, 2, 4]
idx = list(itertools.product(y, repeat = 2)) #This creates all pairs of the indexes to act as my coordinates into the array
idx0 = [k[0] for k in idx]
idx1 = [k[1] for k in idx]
print(idx)
newarray = x[idx0,idx1].reshape((3,3))
print(newarray)
Output:
[(0, 0), (0, 2), (0, 4), (2, 0), (2, 2), (2, 4), (4, 0), (4, 2), (4, 4)]
[[ 0 2 4]
[10 12 14]
[20 22 24]]
I think what you hate is the for keyword (like me). And in fact you don't need itertools.
So my answer would be:
import numpy as np
x = np.arange(25).reshape(5, 5)
y = np.array([0, 2, 4])
ny = y.size
i = y.reshape(ny, 1)
j = y.repeat(ny).reshape(ny, ny).T
print(x[i, j])
Output:
[[ 0 2 4]
[10 12 14]
[20 22 24]]
How do I convert dtype=object to int32? I need this in order to insert an array as a row at the right position of a 2D array, as you can see in the output. np.insert() does not work properly because of the type mismatch, so I need to convert.
import numpy as np
arr = np.zeros(2,object)
arr[0] = np.array([0,0,0])
arr[1] = np.array([0,0,0])
p=np.insert(arr, 2, np.array([1,1,1]), 0)
print(p)
print(arr.dtype)
Output:
[array([0, 0, 0]) array([0, 0, 0]) 1 1 1]
object
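One possible way to get there, as a sketch only: it assumes every inner array has the same length, so the object array can be stacked into a regular 2D array before inserting. The names arr2d and p are illustrative.

```python
import numpy as np

arr = np.zeros(2, object)
arr[0] = np.array([0, 0, 0])
arr[1] = np.array([0, 0, 0])

# Stack the inner arrays into one regular (2, 3) array, then cast to int32
arr2d = np.stack(list(arr)).astype(np.int32)

# On a real 2D array, axis=0 inserts the new values as a whole row
p = np.insert(arr2d, 2, np.array([1, 1, 1]), axis=0)
print(p)
print(p.dtype)
```

If the inner arrays had different lengths, no rectangular int32 array would exist and the object array (or a list) would have to stay.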
Given a numpy array (let it be a bit array for simplicity), how can I construct a new array of the same shape where 1 stands exactly at the positions where in the original array there was a zero, preceded by at least N-1 consecutive zeros?
For example, what is the best way to implement function nzeros having two arguments, a numpy array and the minimal required number of consecutive zeros:
import numpy as np
a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1])
b = nzeros(a, 3)
Function nzeros(a, 3) should return
array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0])
Approach #1
We can use 1D convolution -
def nzeros(a, n):
    # Define kernel for 1D convolution
    k = np.ones(n, dtype=int)
    # Get sliding summations for zero matches with that kernel
    s = np.convolve(a == 0, k)
    # Look for summations that are equal to n, which occur exactly where
    # n consecutive 0s end. Because this is the "full" version of
    # convolution, s has n-1 extra trailing elements, so we keep only the
    # first len(a) of them. Slicing with [:len(a)] instead of [:-n+1]
    # also covers n == 1, where [:-n+1] would be [:0] and return nothing.
    return (s == n).astype(int)[:len(a)]
Sample run -
In [46]: a
Out[46]: array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1])
In [47]: nzeros(a,3)
Out[47]: array([0, 0, 1, 1, 0, 0, 0, 1, 0, 0])
In [48]: nzeros(a,2)
Out[48]: array([0, 1, 1, 1, 0, 0, 1, 1, 0, 0])
Approach #2
Another way to solve this, which can be seen as a variant of the 1D convolution approach, is erosion: looking at the outputs, we simply erode each run of 0s by n-1 places from its start. binary_erosion from scipy.ndimage (older code imports it from scipy.ndimage.morphology, a namespace that newer SciPy releases deprecate) lets us shift the kernel center with its origin argument, so no slicing is needed. The implementation would look something like this -
from scipy.ndimage import binary_erosion
out = binary_erosion(a==0,np.ones(n),origin=(n-1)//2).astype(int)
Using a for loop:
def nzeros(a, n):
    # Create a numpy array of zeros of length equal to n
    b = np.zeros(n)
    # Create a numpy array of zeros of same length as array a
    c = np.zeros(len(a), dtype=int)
    # len(a) - n + 1 windows of length n fit in a, so the last one is checked too
    for i in range(len(a) - n + 1):
        if (b == a[i : i+n]).all():  # Check if array b is equal to slice in a
            c[i+n-1] = 1
    return c
Sample Output:
print(nzeros(a, 3))
[0 0 1 1 0 0 0 1 0 0]
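The window check in the loop above can also be done without a Python-level loop. A minimal sketch, assuming NumPy 1.20 or later for sliding_window_view; the function name nzeros_windows is made up for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def nzeros_windows(a, n):
    a = np.asarray(a)
    out = np.zeros(a.shape[0], dtype=int)
    if 0 < n <= a.shape[0]:
        # One row per length-n window of the "is zero" mask
        windows = sliding_window_view(a == 0, n)
        # A window of all zeros marks the position where that window ends
        out[n - 1:] = windows.all(axis=1)
    return out

a = np.array([0, 0, 0, 0, 1, 0, 0, 0, 1, 1])
print(nzeros_windows(a, 3))
```

The windows are views into the mask, so no copy of size len(a) x n is made before the all() reduction.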
I have a large (79 000 x 480 000) sparse csr matrix. I am trying to remove all columns (within a certain range) for which each value < k.
In regular numpy matrices this is simply done by a mask:
m = np.array([[0,2,1,1],
              [0,4,2,0],
              [0,3,4,0]])
mask = (m < 2)
idx = mask.all(axis=0)
result = m[:, ~idx]
print(result)
>>> [[2 1]
     [4 2]
     [3 4]]
The unary bitwise negation operator ~ and boolean mask functionality are not available for sparse matrices however. What is the best method to:
Obtain the indices of columns where all values fulfill condition e < k.
Remove these columns based on the list of indices.
Some things to note:
The columns represent ngram text features: there are no columns in the matrix for which each element is zero.
Is using the csr matrix format even a plausible choice for this?
Do I transpose and make use of .nonzero()? I have a fair amount of working memory (192GB) so time efficiency is preferable to memory efficiency.
If I do
M = sparse.csr_matrix(m)
M < 2
I get an efficiency warning; all the 0 values of M satisfy the condition,
In [1754]: print(M)
(0, 1) 2
(0, 2) 1
(0, 3) 1
(1, 1) 4
(1, 2) 2
(2, 1) 3
(2, 2) 4
In [1755]: print(M<2)
/usr/lib/python3/dist-packages/scipy/sparse/compressed.py:275: SparseEfficiencyWarning: Comparing a sparse matrix with a scalar greater than zero using < is inefficient, try using >= instead.
warn(bad_scalar_msg, SparseEfficiencyWarning)
(0, 0) True # not in M
(0, 2) True
(0, 3) True
(1, 0) True # not in M
(1, 3) True
(2, 0) True # not in M
(2, 3) True
In [1756]: print(M>=2) # all a subset of M
(0, 1) True
(1, 1) True
(1, 2) True
(2, 1) True
(2, 2) True
With I = M>=2, there isn't an all method, but there is a sum.
In [1760]: I.sum(axis=0)
Out[1760]: matrix([[0, 3, 2, 0]], dtype=int32)
sum is actually performed using a matrix multiplication
In [1769]: np.ones((1,3),int)*I
Out[1769]: array([[0, 3, 2, 0]], dtype=int32)
Using nonzero to find the nonzero columns:
In [1778]: np.nonzero(I.sum(axis=0))
Out[1778]: (array([0, 0], dtype=int32), array([1, 2], dtype=int32))
In [1779]: M[:,np.nonzero(I.sum(axis=0))[1]]
Out[1779]:
<3x2 sparse matrix of type '<class 'numpy.int32'>'
with 6 stored elements in Compressed Sparse Row format>
In [1780]: M[:,np.nonzero(I.sum(axis=0))[1]].A
Out[1780]:
array([[2, 1],
[4, 2],
[3, 4]], dtype=int32)
General points:
watch out for those 0 values when doing comparisons
watch out for False values when doing logic on sparse matrices
sparse matrices are optimized for math, especially matrix multiplication
sparse indexing isn't quite as powerful as array indexing; and not as fast either.
note when operations produce a dense matrix
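Putting the steps of this answer together, as a sketch only: it reuses the small m from the question, assumes k > 0 (so the implicit zeros never satisfy >= k and the comparison stays sparse), and uses illustrative names counts, keep and result.

```python
import numpy as np
from scipy import sparse

m = np.array([[0, 2, 1, 1],
              [0, 4, 2, 0],
              [0, 3, 4, 0]])
M = sparse.csr_matrix(m)
k = 2

# Sparse boolean matrix marking the stored entries that are >= k
I = M >= k
# Per-column count of such entries, flattened to a plain 1-D array
counts = np.asarray(I.sum(axis=0)).ravel()
# A column with count 0 has every value < k, so keep only the others
keep = np.flatnonzero(counts)
result = M[:, keep]
print(result.toarray())
```

For a matrix as large as the one in the question, column selection may be faster after converting to CSC with M.tocsc(), but that is worth timing rather than assuming.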