I have 2-D data containing bad values (0 indicates bad). My goal is to replace each bad value with its nearest neighbor that isn't bad.
SciPy's NearestNDInterpolator seems like a nice way to do this. In the 2-D case, it accepts a (number of points) x 2 array of indices and a (number of points) x 1 array of corresponding values to interpolate from.
So, I need to get a subset of the indices and values: those that are "good." The code below achieves this, but coordinates = array(list(ndindex(n_y, n_x))) and where(values != 0)[0] are both messy. Is there a cleaner way to do this?
# n_y and n_x are the number of points along each dimension; data is the 2-D array.
from numpy import array, ndindex, where
coordinates = array(list(ndindex(n_y, n_x)))
values = data.flatten()
nonzero_ind = where(values != 0)[0]
nonzero_coordinates = coordinates[nonzero_ind, :]
nonzero_values = values[nonzero_ind]
Thanks.
nonzero_coordinates = np.argwhere(data != 0)
nonzero_values = np.extract(data, data)
or simply:
nonzero_values = data[data != 0]
I initially missed the obvious nonzero_values method; thanks to @askewchan in the comments for pointing it out.
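For completeness, here is how the extracted coordinates and values plug into the interpolation goal from the question (a minimal sketch; the small data array with a single 0 is a made-up stand-in for your real data):

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

# Stand-in data: 0 marks a bad value
data = np.array([[1., 2., 3.],
                 [4., 0., 6.],
                 [7., 8., 9.]])

# Good points and their values
nonzero_coordinates = np.argwhere(data != 0)
nonzero_values = data[data != 0]

# Replace each bad value with its nearest good neighbour
interp = NearestNDInterpolator(nonzero_coordinates, nonzero_values)
filled = data.copy()
filled[data == 0] = interp(np.argwhere(data == 0))
```

Note that several good points are equidistant from the hole here, so which one wins the tie is up to the underlying KD-tree.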
So, I need to get a subset of the indices and values: those that are "good."
If you've created a "mask" of the bad indices, you can take the negation of that mask with ~ and then find the remaining indices using np.where. For example:
import numpy as np
# Sample array
Z = np.random.random(size=(5,5))
# Use whatever criteria you have to mark the bad indices
bad_mask = Z < 0.2
good_mask = ~bad_mask
good_idx = np.where(good_mask)
print(good_mask)
print(good_idx)
Gives, as an example:
[[ True True True True False]
[ True False False True True]
[ True False True True True]
[ True True True True True]
[ True True True True True]]
(array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4]), array([0, 1, 2, 3, 0, 3, 4, 0, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))
Another way to approach this problem altogether is to just run your array through an image filter that would automatically 'close' those holes. There is such a filter in scipy.ndimage called grey_closing:
>>> from scipy import ndimage
>>> a = np.arange(1,26).reshape(5,5)
>>> a[2,2] = 0
>>> a
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 0, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
>>> ndimage.grey_closing(a, size=2)
array([[ 7, 7, 8, 9, 10],
[ 7, 7, 8, 9, 10],
[12, 12, 12, 14, 15],
[17, 17, 18, 19, 20],
[22, 22, 23, 24, 25]])
But this has unfortunate edge effects (which you can change a bit with the mode parameter). To avoid them, you could just take the new values from where the original array was 0 and place them into the original array:
>>> np.where(a, a, ndimage.grey_closing(a, size=2))
array([[ 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[11, 12, 12, 14, 15],
[16, 17, 18, 19, 20],
[21, 22, 23, 24, 25]])
Alternatively, you could use scikit-image:
>>> from skimage.morphology import closing, square
>>> a = np.arange(1,10, dtype=np.uint8).reshape(3,3)
>>> a[1,1] = 0
>>> a
array([[1, 2, 3],
[4, 0, 6],
[7, 8, 9]], dtype=uint8)
>>> closing(a, square(2))
array([[1, 2, 3],
[4, 4, 6],
[7, 8, 9]], dtype=uint8)
>>> a
array([[1, 2, 3],
[4, 0, 6],
[7, 8, 9]], dtype=uint8)
Pass a as the output array and the closing is done in place:
>>> closing(a, square(2), a)
>>> a
array([[1, 2, 3],
[4, 4, 6],
[7, 8, 9]], dtype=uint8)
Use a larger square (or any shape from skimage.morphology) if you have big gaps of zeros. The disadvantage of this (aside from the dependency) is that it seems to work only for uint8.
I have a numpy array with integers as entries. I need to find the indices corresponding to runs of consecutive integers in the array. For example, if my array is a = [1, 2, 3, 6, 7, 8], then the indices I want are [[0, 1, 2], [3, 4, 5]]. I have tried the following piece of code, which uses a loop.
import numpy as np
idx = np.array([0, 1, 2, 3, 8, 9, 10, 11, 12, 2, 3, 4, 5, 6, 55, 56, 89, 90, 91, 92])
idx_cpy = idx.copy()  # keep a copy; idx is consumed below
indices = []
while len(idx) > 0:
    sorted_idx = np.arange(idx[0], idx.max() + 1)[:len(idx)]
    bool_equal = np.equal(sorted_idx, idx)
    true_idx = np.argwhere(bool_equal == True)[:, 0]
    try:
        indices.append(np.array(indices[-1]).max() + 1 + true_idx)
    except IndexError:
        indices.append(true_idx)
    idx = idx[true_idx.max() + 1:]
"""
indices =
[array([0, 1, 2, 3], dtype=int64),
array([4, 5, 6, 7, 8], dtype=int64),
array([ 9, 10, 11, 12, 13], dtype=int64),
array([14, 15], dtype=int64),
array([16, 17, 18, 19], dtype=int64)]
"""
Although this works as expected, the variable idx in my actual code is very long, and this takes a long time to complete. Is there a vectorized way of doing this? Thanks.
You can find the "turning points" with np.diff and check where it differs from 1. To include end points, we pass prepend and append to it such that the difference isn't 1 in those places and they get counted too. Then a list comprehension with np.arange gives the final result:
>>> turnings, = np.where(np.diff(idx, prepend=idx[0], append=idx[-1]) != 1)
>>> turnings
array([ 0, 4, 9, 14, 16, 20], dtype=int64)
>>> result = [np.arange(pre, nex) for pre, nex in zip(turnings, turnings[1:])]
>>> result
[array([0, 1, 2, 3], dtype=int64),
array([4, 5, 6, 7, 8], dtype=int64),
array([ 9, 10, 11, 12, 13], dtype=int64),
array([14, 15], dtype=int64),
array([16, 17, 18, 19], dtype=int64)]
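An equivalent formulation, for comparison, splits the position range at the places where the step is not 1 (a sketch run on the same idx array):

```python
import numpy as np

idx = np.array([0, 1, 2, 3, 8, 9, 10, 11, 12, 2, 3, 4, 5, 6,
                55, 56, 89, 90, 91, 92])

# Break positions 0..len(idx)-1 wherever consecutive values don't differ by 1
runs = np.split(np.arange(len(idx)), np.where(np.diff(idx) != 1)[0] + 1)
```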
I want to merge multiple 2-D NumPy arrays of shapes, let's say (r, a), (r, b), (r, c), ..., (r, z), into a single 2-D array of shape (r, a+b+c+...+z).
I tried np.hstack, but it seemed to require the same shapes, and np.concatenate, which takes a tuple of arrays as its argument.
You can use np.concatenate or np.hstack. Here is an example:
>>> a = np.arange(15).reshape(5,3)
>>> a
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11],
[12, 13, 14]])
>>> b = np.arange(10).reshape(5,2)
>>> b
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
>>> np.concatenate((a, b), axis=1)
array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])
>>> np.hstack((a,b))
array([[ 0, 1, 2, 0, 1],
[ 3, 4, 5, 2, 3],
[ 6, 7, 8, 4, 5],
[ 9, 10, 11, 6, 7],
[12, 13, 14, 8, 9]])
Hope it helps
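Since the question mentions many arrays (r, a) through (r, z): both functions accept a whole list of arrays, so there is no need to merge pairwise (a small sketch with made-up widths):

```python
import numpy as np

r = 4
# Widths 2, 3 and 1 are arbitrary examples
parts = [np.ones((r, w)) for w in (2, 3, 1)]

merged = np.concatenate(parts, axis=1)
print(merged.shape)  # (4, 6)
```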
I am new to numpy, but I think it's not possible. The precondition is:
"The arrays must have the same shape along all but the second axis, except 1-D arrays which can be any length."
Actually, one of my functions was returning a scipy.sparse.csr.csr_matrix, and I was converting it into an np.array along with the lists returned by another function so that I could merge them all, but the sparse matrix was converted into
array(<73194x17 sparse matrix of type '' with 203371 stored elements in Compressed Sparse Row format>, dtype=object)
which was not compatible with np.hstack.
So sorry for the inconvenience.
I figured out my solution: instead of numpy.hstack, I used SciPy's hstack function.
Thank you, everyone, for responding.
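For reference, a minimal sketch of that sparse route (the shapes here are made up; scipy.sparse.hstack accepts a mix of sparse and dense blocks and returns a sparse result):

```python
import numpy as np
from scipy import sparse

S = sparse.csr_matrix(np.eye(3))   # stand-in for the CSR matrix one function returned
D = np.arange(6).reshape(3, 2)     # stand-in for the dense part

merged = sparse.hstack([S, D])     # sparse result, shape (3, 5)
dense = merged.toarray()
```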
I've got K feature vectors that all share dimension n but have a variable dimension m (n x m). They all live in a list together.
to_be_padded = []
to_be_padded.append(np.reshape(np.arange(9),(3,3)))
array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
to_be_padded.append(np.reshape(np.arange(18),(3,6)))
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
to_be_padded.append(np.reshape(np.arange(15),(3,5)))
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
What I am looking for is a smart way to zero pad the rows of these np.arrays such that they all share the same dimension m. I've tried solving it with np.pad but I have not been able to come up with a pretty solution. Any help or nudges in the right direction would be greatly appreciated!
The result should leave the arrays looking like this:
array([[0, 1, 2, 0, 0, 0],
[3, 4, 5, 0, 0, 0],
[6, 7, 8, 0, 0, 0]])
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17]])
array([[ 0, 1, 2, 3, 4, 0],
[ 5, 6, 7, 8, 9, 0],
[10, 11, 12, 13, 14, 0]])
You could use np.pad for that; it can pad 2-D arrays using a tuple of tuples specifying the padding widths, ((top, bottom), (left, right)). For that you could define:
def pad_to_length(x, m):
    return np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
Usage
You could start by finding the ndarray with the highest amount of columns. Say you have two of them, a and b:
a = np.array([[0, 1, 2],
[3, 4, 5],
[6, 7, 8]])
b = np.array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
m = max(i.shape[1] for i in [a,b])
# 5
And then use this parameter to pad the ndarrays:
pad_to_length(a, m)
array([[0, 1, 2, 0, 0],
[3, 4, 5, 0, 0],
[6, 7, 8, 0, 0]])
I believe there is no very efficient solution for this. I think you will need to loop over the list with a for loop and treat every array individually:
maxM = max(arr.shape[1] for arr in to_be_padded)  # the largest m in the list
for i in range(len(to_be_padded)):
    padded = np.zeros((n, maxM))
    padded[:, :to_be_padded[i].shape[1]] = to_be_padded[i]
    to_be_padded[i] = padded
where n is the shared number of rows.
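For what it's worth, this loop and the pad_to_length helper from the other answer combine into a single comprehension (a sketch using the three arrays from the question):

```python
import numpy as np

to_be_padded = [np.arange(9).reshape(3, 3),
                np.arange(18).reshape(3, 6),
                np.arange(15).reshape(3, 5)]

# Pad every array on the right up to the widest one
m = max(x.shape[1] for x in to_be_padded)
padded = [np.pad(x, ((0, 0), (0, m - x.shape[1])), mode='constant')
          for x in to_be_padded]
```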
Given a matrix A with dimensions a x a and a matrix B with dimensions b x b, where a is a multiple of b, B is a submatrix of A starting at (0, 0) and tiled until the dimensions a x a are filled.
A = array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
An example of a submatrix might be:
B = array([[10, 11],
[14, 15]])
Where the number 15 is in position (1, 1) with respect to B's coordinates.
How could I return a view on the array A, for a particular position in B? For example for position (1,1) in B, I want to get all such values from A:
C = array([[5, 7],
[13, 15]])
The reason I want a view is that I wish to update multiple positions in A at once. Setting every element of C to 20, i.e. C[:] = 20,
results in
A = array([[ 0, 1, 2, 3],
[ 4, 20, 6, 20],
[ 8, 9, 10, 11],
[12, 20, 14, 20]])
You can obtain this as follows:
>>> A = np.array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> A[np.ix_([1,3],[1,3])] = 20
>>> A
array([[ 0, 1, 2, 3],
[ 4, 20, 6, 20],
[ 8, 9, 10, 11],
[12, 20, 14, 20]])
For more info about np.ix_, see the NumPy documentation.
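One caveat: fancy indexing with np.ix_ makes a copy on read, so A[np.ix_([1,3],[1,3])] by itself is not a view (the assignment above works because it writes directly into A). Since B tiles A starting from (0, 0), a genuine view does exist here: position (i, j) of B is the strided slice A[i::b, j::b]. A sketch with the 4x4 A and 2x2 B from the question:

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
b = 2                 # side length of the tiled submatrix B

C = A[1::b, 1::b]     # basic slicing: a real view onto position (1, 1) of every tile
C[...] = 20           # writing through the view updates A
print(A)
```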
I have two arrays.
"a", a 2d numpy array.
import numpy as np
a = np.array([[5, 6, 7, 8, 9], [10, 11, 12, 14, 15]])
array([[ 5, 6, 7, 8, 9],
[10, 11, 12, 14, 15]])
"idx", a 3d numpy array constituting three index variants I want to use to index "a".
nsamp = 3
idx = np.random.randint(5, size=(nsamp, a.shape[0], a.shape[1]))
array([[[1, 2, 1, 3, 4],
[2, 0, 2, 0, 1]],
[[0, 0, 3, 2, 0],
[1, 3, 2, 0, 3]],
[[2, 1, 0, 1, 4],
[1, 1, 0, 1, 0]]])
Now I want to index "a" three times with the indices in "idx" to obtain an object as follows:
array([[[6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
The naive "a[idx]" does not work. Any ideas as to how to do this? (I use Python 3.4 and numpy 1.9)
You can use choose to make the selection from a:
>>> np.choose(idx, a.T[:,:,np.newaxis])
array([[[ 6, 7, 6, 8, 9],
[12, 10, 12, 10, 11]],
[[ 5, 5, 8, 7, 5],
[11, 14, 12, 10, 14]],
[[ 7, 6, 5, 6, 9],
[11, 11, 10, 11, 10]]])
As you can see, a has to be reshaped from an array with shape (2, 5) to an array with shape (5, 2, 1) first. This is essentially so that it is broadcastable with idx, which has shape (3, 2, 5).
(I learned this method from @immerrr's answer here: https://stackoverflow.com/a/26225395/3923281)
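An alternative to choose, for the record, is plain integer-array indexing with a broadcast row index (a sketch on the arrays from the question):

```python
import numpy as np

a = np.array([[5, 6, 7, 8, 9], [10, 11, 12, 14, 15]])
idx = np.array([[[1, 2, 1, 3, 4], [2, 0, 2, 0, 1]],
                [[0, 0, 3, 2, 0], [1, 3, 2, 0, 3]],
                [[2, 1, 0, 1, 4], [1, 1, 0, 1, 0]]])

rows = np.arange(a.shape[0])[None, :, None]  # shape (1, 2, 1), broadcasts against idx
out = a[rows, idx]                           # out[k, r, c] == a[r, idx[k, r, c]]
```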
You can use the take array method, but note that take indexes into the flattened array, so the per-row column indices in idx first need an offset of row * ncols:
import numpy as np
a = np.array([[5, 6, 7, 8, 9], [10, 11, 12, 14, 15]])
idx = np.random.randint(5, size=(3, a.shape[0], a.shape[1]))
# Element (r, c) of a sits at flat position r * a.shape[1] + c
flat_idx = idx + np.arange(a.shape[0])[:, None] * a.shape[1]
print(a.take(flat_idx))