Use numpy to mask a row containing only zeros - python

I have a large array of point cloud data generated with the Azure Kinect. All erroneous measurements are assigned the coordinate [0,0,0]. I want to remove all coordinates with the value [0,0,0]. Since my array is rather large (1 million points) and I need to do this in real time, speed is of the essence.
In my current approach I try to use numpy to mask out all rows that contain three zeroes ([0,0,0]). However, the np.ma.masked_equal function does not evaluate an entire row, but only evaluates single elements. As a result, rows that contain at least one 0 are already filtered by this approach. I only want rows to be filtered when all values in the row are 0. Find an example of my code below:
my_data = np.array([[1,2,3],[0,0,0],[3,4,5],[2,5,7],[0,0,1]])
my_data = np.ma.masked_equal(my_data, [0,0,0])
my_data = np.ma.compress_rows(my_data)
output
array([[1, 2, 3],
[3, 4, 5],
[2, 5, 7]])
desired output
array([[1, 2, 3],
[3, 4, 5],
[2, 5, 7],
[0, 0, 1]])

Find all data points that are 0 (doesn't require np.ma module) and then select all rows that do not contain all zeros:
import numpy as np
my_data = np.array([[1, 2, 3], [0, 0, 0], [3, 4, 5], [2, 5, 7], [0, 0, 1]])
my_data[~(my_data == 0).all(axis= 1)]
Output:
array([[1, 2, 3],
[3, 4, 5],
[2, 5, 7],
[0, 0, 1]])

Instead of using the np.ma.masked_equal and np.ma.compress_rows functions, you can use the np.all function with axis=1 to check whether every value in a row equals 0. This should be faster than your method, since it builds a single boolean mask with one entry per row and skips the masked-array machinery.
mask = np.all(my_data == [0, 0, 0], axis=1)
my_data = my_data[~mask]
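An equivalent formulation, shown here only as a minimal sketch, keeps the rows that contain at least one nonzero value instead of negating an all-zero mask. The array name points and its sample values are made up for illustration; they are not from the question.

```python
import numpy as np

# Hypothetical point cloud: one row per point, columns are x, y, z.
points = np.array([[1.0, 2.0, 3.0],
                   [0.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0]])

# any(axis=1) is True for every row with at least one nonzero entry,
# which is the logical negation of "all entries in the row are zero".
keep = points.any(axis=1)
filtered = points[keep]
print(filtered)
```

Both forms produce the same rows; which one is faster on a given machine would need to be measured.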

Related

Remove all zero rows and columns in one go in Python

I want to remove all zero rows and columns in one line from the array A1. I present the current and expected outputs.
import numpy as np
A1=np.array([[0, 0, 0],
[0, 1, 2],
[0, 3, 4]])
A1 = A1[~np.all(A1 == 0, axis=0)]
print([A1])
The current output is
[array([[0, 1, 2],
[0, 3, 4]])]
The expected output is
[array([[1, 2],
[3, 4]])]
Not really sure your example works, but given the description in the title, for an array named matrix you can use
mask = matrix != 0
new_matrix = matrix[np.ix_(mask.any(1), mask.any(0))]
you can check out this post about np.ix_
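Applied to the array A1 from the question, the np.ix_ approach could look like the sketch below. The variable names nonzero and trimmed are chosen here for illustration.

```python
import numpy as np

A1 = np.array([[0, 0, 0],
               [0, 1, 2],
               [0, 3, 4]])

nonzero = A1 != 0
# nonzero.any(1) flags rows with at least one nonzero entry,
# nonzero.any(0) flags columns with at least one nonzero entry.
# np.ix_ combines the two boolean selectors into an open mesh,
# so rows and columns are filtered in a single indexing step.
trimmed = A1[np.ix_(nonzero.any(1), nonzero.any(0))]
print(trimmed)
# [[1 2]
#  [3 4]]
```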

python replace value in array based on previous and following value in column

Given the following array, I want to replace each zero with the previous value in its column, as long as the zero is surrounded by two values greater than zero.
I am aware of np.where but it would consider the whole array instead of its columns.
I am not sure how to do it and help would be appreciated.
This is the array:
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
and since the only zero that meets this condition is the second row/second column one,
the new array should be the following
new_a=np.array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
How do I accomplish this?
And what if I would like to extend the gap surrounded by nonzero ? For instance, the first column contains two 0 and the second column contains one 0, so the new array would be
new_a=np.array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
In short, how do I solve this if the columnwise condition would be the one of having N consecutive zeros or less?
As a generic method, I would approach this using a convolution:
from scipy.signal import convolve2d
# kernel for top/down neighbors
kernel = np.array([[1],
[0],
[1]])
# is the value a zero?
m1 = a==0
# count non-zeros neighbors
m2 = convolve2d(~m1, kernel, mode='same') > 1
mask = m1&m2
# replace matching values with previous row value
a[mask] = np.roll(a, 1, axis=0)[mask]
output:
array([[4, 3, 3, 2],
[0, 3, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
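For the single-zero case, the same neighbor test can also be written with shifted slices and no SciPy dependency. This is a sketch of an alternative technique, not the convolution from the answer; the name out is introduced here for illustration.

```python
import numpy as np

a = np.array([[4, 3, 3, 2],
              [0, 0, 1, 2],
              [0, 4, 2, 4],
              [2, 4, 3, 0]])

out = a.copy()
# Only interior rows can have both an upper and a lower neighbor.
# a[:-2] holds the row above and a[2:] the row below each interior row.
mask = (a[1:-1] == 0) & (a[:-2] > 0) & (a[2:] > 0)
# out[1:-1] is a view, so assigning through it updates out in place.
out[1:-1][mask] = a[:-2][mask]
print(out)
# [[4 3 3 2]
#  [0 3 1 2]
#  [0 4 2 4]
#  [2 4 3 0]]
```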
filling from surrounding values
Using pandas to benefit from ffill/bfill (you can forward-fill in pure numpy, but it's more complex):
import pandas as pd
df = pd.DataFrame(a)
# limit for neighbors
N = 2
# identify non-zeros
m = df.ne(0)
# mask zeros
m2 = m.where(m)
# mask for values with 2 neighbors within limits
mask = m2.ffill(limit=N) & m2.bfill(limit=N)
df.mask(mask&~m).ffill()
array([[4, 3, 3, 2],
[4, 3, 1, 2],
[4, 4, 2, 4],
[2, 4, 3, 0]])
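To illustrate what a forward fill in pure NumPy might look like, here is a minimal sketch based on a running maximum of row indices. The helper ffill_zeros_columnwise is hypothetical, and it fills every zero that has a nonzero value above it; it does not apply the "surrounded by nonzero values" condition or the limit N from the pandas version.

```python
import numpy as np

def ffill_zeros_columnwise(a):
    """Replace each zero with the nearest nonzero value above it in its column."""
    a = np.asarray(a)
    rows = np.arange(a.shape[0])[:, None]
    # Row index where the entry is nonzero, 0 elsewhere.
    idx = np.where(a != 0, rows, 0)
    # The running maximum down each column is the index of the most
    # recent nonzero row (or 0 if none has been seen yet).
    idx = np.maximum.accumulate(idx, axis=0)
    return np.take_along_axis(a, idx, axis=0)

a = np.array([[4, 3, 3, 2],
              [0, 0, 1, 2],
              [0, 4, 2, 4],
              [2, 4, 3, 0]])
filled = ffill_zeros_columnwise(a)
print(filled)
# [[4 3 3 2]
#  [4 3 1 2]
#  [4 4 2 4]
#  [2 4 3 4]]
```

Note that the trailing zero in the last column is filled here, unlike in the pandas output above, because no lower neighbor is required.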
That's one solution I found. I know it's basic but I think it works.
a=np.array([[4, 3, 3, 2],
[0, 0, 1, 2],
[0, 4, 2, 4],
[2, 4, 3, 0]])
a_t = a.T
for i in range(len(a_t)):
    ar = a_t[i]
    for j in range(len(ar)-1):
        if (j>0) and (ar[j] == 0) and (ar[j+1] > 0):
            a_t[i][j] = a_t[i][j-1]
a = a_t.T

Replace numpy subarray when element matches a condition

I have an n x m x 3 numpy array. This represents a middle-step towards an RGB representation of a complex-function plotter. When the function being plotted takes infinite values or has singularities, parts of the RGB data become NaNs.
I'm looking for an efficient way to replace a row containing a NaN with a row of my choice, perhaps [0, 0, 0] or [1, 1, 1]. In terms of the RGB values, this has the effect of replacing poorly-behaving pixels with white or black pixels. By efficient, I mean some way that takes advantage of numpy's vectorization and speed.
Please note that I am not looking to merely replace the NaN values with 0 (which I know how to do with numpy.where); if a row contains a NaN, I want to replace the whole row. I suspect this can be done nicely in numpy, but I'm not sure how.
Concrete Question
Suppose we are given a 2 x 2 x 3 array arr. If a row contains a 5, I want to replace the row with [0, 0, 0]. Trivial code that does this slowly is as follows.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
# so arr is
# array([[[1, 2, 3],
# [4, 5, 6]],
#
# [[1, 3, 5],
# [2, 4, 6]]])
# Trivial and slow version to replace rows containing 5 with [0,0,0]
for i in range(len(arr)):
    for j in range(len(arr[i])):
        if 5 in arr[i][j]:
            arr[i][j] = np.array([0, 0, 0])
# Now arr is
#
# array([[[1, 2, 3],
# [0, 0, 0]],
#
# [[0, 0, 0],
# [2, 4, 6]]])
How can we accomplish this taking advantage of numpy?
A simpler way would be -
arr[np.isin(arr,5).any(-1)] = 0
If it's just a single value that you are looking for, then we could simplify to -
arr[(arr==5).any(-1)] = 0
If you are looking to match against NaN, we need to do the comparison differently and use np.isnan instead -
arr[np.isnan(arr).any(-1)] = 0
If you are looking to assign array values, instead of just 0, the solutions stay the same. Hence it would be -
arr[(arr==5).any(-1)] = new_array
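Putting the NaN case together for an n x m x 3 float array, a minimal sketch could look like this. The array rgb, its values, and the choice of white as the replacement pixel are assumptions made for illustration.

```python
import numpy as np

# Hypothetical 2 x 2 x 3 RGB data with NaNs in two of the pixels.
rgb = np.array([[[0.1, 0.2, 0.3], [np.nan, 0.5, 0.6]],
                [[0.7, np.nan, 0.9], [1.0, 1.0, 1.0]]])

# Reduce over the last axis: one boolean per pixel, shape (2, 2).
bad = np.isnan(rgb).any(axis=-1)

# Indexing with the 2-D mask selects whole pixels (rows of length 3),
# and the replacement pixel is broadcast to each of them.
rgb[bad] = [1.0, 1.0, 1.0]
print(rgb)
```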
Using np.broadcast_to
arr[np.broadcast_to((arr == 5).any(-1)[..., None], arr.shape)] = 0
array([[[1, 2, 3],
[0, 0, 0]],
[[0, 0, 0],
[2, 4, 6]]])
Just as FYI, based on your description, if you want to find np.nans instead of integers like 5, you shouldn't use ==, but rather np.isnan
arr[np.broadcast_to((np.isnan(arr)).any(-1)[..., None], arr.shape)] = 0
You can also do it with the in1d function, as shown below (note that np.in1d is deprecated as of NumPy 2.0 in favor of np.isin):
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
arr[np.in1d(arr,5).reshape(arr.shape).any(axis=2)] = [0,0,0]
arr

Fill several parts of NumPy array, given a list of indexes

I want to fill a numpy.ndarray with data (32x32 pixel integer pictures==arrays)
From the name of the file of the picture I know where in my ndarray I want my values to be stored.
I would like to give my ndarray a list but also some slice(0) in it, because the picture is stored in the last two dimensions. How do I do that?
I would like to do something like
Pseudocode:
data=numpy.ndarray(dim1,dim2,dim3,32,32)
list=function(filename)
data[list,slice(0),slice(0)]=read_image(filename)
Is that possible?
My list has entries specifying the positions of the ndarray [int,int,int] and my read image is a 32 times 32 integer array (filling the last two dimension of my ndarray).
To perform this assignment, pass a suitable array in each of the first three dimensions, and : (meaning entire index range) in the last two dimensions.
If your list is, for example,
list = [[1, 2, 3], [4, 2, 0], [5, 3, 4], [2, 2, 2]]
then the array to pass as the first index is [1, 4, 5, 2], and similarly for two others: [2, 2, 3, 2] and [3, 0, 4, 2]. Complete example with fake (random) image:
data = np.zeros((6, 7, 8, 32, 32))
list = [[1, 2, 3], [4, 2, 0], [5, 3, 4], [2, 2, 2]]
image = np.random.uniform(size=(32, 32))
ix = np.array(list)
data[ix[:, 0], ix[:, 1], ix[:, 2], :, :] = image
Here ix[:, 0] is [1, 4, 5, 2], ix[:, 1] is [2, 2, 3, 2], and so on.
Reference: NumPy indexing and broadcasting.
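If each position should receive its own image rather than one shared image, the same indexing still works as long as the images are stacked along a leading axis. The sketch below assumes two made-up positions and constant-valued images so the result is easy to check; the names positions and images are hypothetical.

```python
import numpy as np

data = np.zeros((6, 7, 8, 32, 32))

# One [i, j, k] position per image (hypothetical values).
positions = np.array([[1, 2, 3],
                      [4, 2, 0]])

# Two fake 32 x 32 images stacked into shape (2, 32, 32).
images = np.stack([np.full((32, 32), 1.0),
                   np.full((32, 32), 2.0)])

# Unpack the columns into three index arrays of length 2.
i, j, k = positions.T

# data[i, j, k] has shape (2, 32, 32), matching images exactly.
data[i, j, k] = images
```

The trailing `:, :` is optional here because NumPy treats omitted trailing dimensions as full slices.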

How to get a value from every column in a Numpy matrix

I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
    idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This gives you each of the indexes, in the format [row, column], where the condition was met.
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of where(M == 0) would look something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are, and the second matrix tells you the columns where the 0s are.
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new on what's been already suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
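Another option, sketched here under the assumption that M is a plain ndarray and that every column contains at least one zero, is to take argmax of the boolean comparison along axis 0. The name first_zero_row is introduced for illustration.

```python
import numpy as np

M = np.array([[0, 1, 0],
              [4, 2, 4],
              [3, 4, 1],
              [1, 3, 2],
              [2, 0, 3]])

# argmax on a boolean array returns the index of the first True,
# so along axis 0 this is the first row holding a zero in each column.
first_zero_row = (M == 0).argmax(axis=0)
print(first_zero_row)  # [0 4 0]

# Caveat: a column without any zero also yields 0, so check
# (M == 0).any(axis=0) first if that case can occur.
```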
