I have a large array of point cloud data which is generated using the Azure Kinect. All erroneous measurements are assigned the coordinate [0,0,0]. I want to remove all coordinates with the value [0,0,0]. Since my array is rather large (1 million points) and since I need to do this process in real time, speed is of the essence.
In my current approach I try to use numpy to mask out all rows that contain three zeroes ([0,0,0]). However, the np.ma.masked_equal function does not evaluate an entire row, but only evaluates single elements. As a result, rows that contain at least one 0 are already filtered by this approach. I only want rows to be filtered when all values in the row are 0. Find an example of my code below:
my_data = np.array([[1,2,3],[0,0,0],[3,4,5],[2,5,7],[0,0,1]])
my_data = np.ma.masked_equal(my_data, [0,0,0])
my_data = np.ma.compress_rows(my_data)
Output:
array([[1, 2, 3],
[3, 4, 5],
[2, 5, 7]])
Desired output:
array([[1, 2, 3],
[3, 4, 5],
[2, 5, 7],
[0, 0, 1]])
Find all data points that are 0 (this doesn't require the np.ma module) and then keep only the rows that are not entirely zero:
import numpy as np
my_data = np.array([[1, 2, 3], [0, 0, 0], [3, 4, 5], [2, 5, 7], [0, 0, 1]])
my_data[~(my_data == 0).all(axis=1)]
Output:
array([[1, 2, 3],
[3, 4, 5],
[2, 5, 7],
[0, 0, 1]])
Instead of using the np.ma.masked_equal and np.ma.compress_rows functions, you can use the np.all function to check if all values in a row are equal to [0, 0, 0]. This should be faster than your method as it evaluates all values in a row at once.
mask = np.all(my_data == [0, 0, 0], axis=1)
my_data = my_data[~mask]
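If you want to sanity-check the speed on data of the size you mention, here is a rough timing sketch (the point cloud below is synthetic, with roughly 10% of the rows zeroed out; numbers will vary by machine):
import numpy as np
import timeit

rng = np.random.default_rng(0)
cloud = rng.random((1_000_000, 3))        # stand-in for the Kinect point cloud
cloud[rng.random(1_000_000) < 0.1] = 0    # mark ~10% of rows as erroneous [0, 0, 0]

t = timeit.timeit(lambda: cloud[~(cloud == 0).all(axis=1)], number=10)
print(f"boolean row mask: {t / 10 * 1e3:.2f} ms per call")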
I have a 2D numpy array that I want to extract a submatrix from.
I get the submatrix by slicing the array as below.
Here I want a 3 x 3 submatrix around the item at index (2, 3).
>>> import numpy as np
>>> a = np.array([[0, 1, 2, 3],
... [4, 5, 6, 7],
... [8, 9, 0, 1],
... [2, 3, 4, 5]])
>>> a[1:4, 2:5]
array([[6, 7],
[0, 1],
[4, 5]])
But what I want is that for indexes that are out of range, it goes back to the beginning of array and continues from there. This is the result I want:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
I know that I can do things like taking the index modulo the width of the array, but I'm looking for a numpy function that does that.
Also, for a one-dimensional array this will cause an index-out-of-range error, which is not really useful...
This is one way using np.pad with wraparound mode.
>>> a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
>>> pad_width = 1
>>> i, j = 2, 3
>>> startrow, endrow = i-1+pad_width, i+2+pad_width # for 3 x 3 submatrix
>>> startcol, endcol = j-1+pad_width, j+2+pad_width
>>> np.pad(a, (pad_width, pad_width), 'wrap')[startrow:endrow, startcol:endcol]
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
Depending on the shape of your patch (e.g. 5 x 5 instead of 3 x 3), you can increase pad_width and the start and end row and column indices accordingly.
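If you do this often, the recipe can be wrapped in a small helper; this is just a sketch (wrapped_patch is a made-up name, and it assumes an odd patch size):
import numpy as np

def wrapped_patch(a, idx, size=3):
    # size x size submatrix around idx, wrapping around the edges (size assumed odd)
    pad = size // 2
    i, j = idx
    padded = np.pad(a, pad, mode='wrap')
    # index (i, j) shifts to (i + pad, j + pad) in the padded array
    return padded[i:i + size, j:j + size]

wrapped_patch(a, (2, 3))   # same 3 x 3 result as above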
np.take does have a mode parameter which can wrap around out-of-bounds indices. But it's a bit hacky to use np.take for multidimensional arrays since the axis must be a scalar.
However, in your particular case you could do this:
a = np.array([[0, 1, 2, 3],
[4, 5, 6, 7],
[8, 9, 0, 1],
[2, 3, 4, 5]])
np.take(a, np.r_[2:5], axis=1, mode='wrap')[1:4]
Output:
array([[6, 7, 4],
[0, 1, 8],
[4, 5, 2]])
EDIT
This function might be what you are looking for (?)
def select3x3(a, idx):
    x, y = idx
    return np.take(np.take(a, np.r_[x-1:x+2], axis=0, mode='wrap'), np.r_[y-1:y+2], axis=1, mode='wrap')
But in retrospect, I recommend using modulo and fancy indexing for this kind of operation (it's basically what mode='wrap' does internally anyway):
def select3x3(a, idx):
    x, y = idx
    return a[np.r_[x-1:x+2][:, None] % a.shape[0], np.r_[y-1:y+2][None, :] % a.shape[1]]
The above solution is also generalized for any 2d shape on a.
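As a quick usage check, a sketch using the 4 x 4 array from the question (expected results shown in comments):
select3x3(a, (2, 3))
# array([[6, 7, 4],
#        [0, 1, 8],
#        [4, 5, 2]])
select3x3(a, (0, 0))   # wraps around the top-left corner
# array([[5, 2, 3],
#        [3, 0, 1],
#        [7, 4, 5]])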
I have an n x m x 3 numpy array. This represents a middle-step towards an RGB representation of a complex-function plotter. When the function being plotted takes infinite values or has singularities, parts of the RGB data become NaNs.
I'm looking for an efficient way to replace a row containing a NaN with a row of my choice, perhaps [0, 0, 0] or [1, 1, 1]. In terms of the RGB values, this has the effect of replacing poorly-behaving pixels with black or white pixels. By efficient, I mean some way that takes advantage of numpy's vectorization and speed.
Please note that I am not looking to merely replace the NaN values with 0 (which I know how to do with numpy.where); if a row contains a NaN, I want to replace the whole row. I suspect this can be done nicely in numpy, but I'm not sure how.
Concrete Question
Suppose we are given a 2 x 2 x 3 array arr. If a row contains a 5, I want to replace the row with [0, 0, 0]. Trivial code that does this slowly is as follows.
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
# so arr is
# array([[[1, 2, 3],
# [4, 5, 6]],
#
# [[1, 3, 5],
# [2, 4, 6]]])
# Trivial and slow version to replace rows containing 5 with [0,0,0]
for i in range(len(arr)):
    for j in range(len(arr[i])):
        if 5 in arr[i][j]:
            arr[i][j] = np.array([0, 0, 0])
# Now arr is
#
# array([[[1, 2, 3],
# [0, 0, 0]],
#
# [[0, 0, 0],
# [2, 4, 6]]])
How can we accomplish this taking advantage of numpy?
A simpler way would be -
arr[np.isin(arr,5).any(-1)] = 0
If it's just a single value that you are looking for, then we could simplify to -
arr[(arr==5).any(-1)] = 0
If you are looking to match against NaN, we need to do the comparison differently and use np.isnan instead -
arr[np.isnan(arr).any(-1)] = 0
If you are looking to assign array values, instead of just 0, the solutions stay the same. Hence it would be -
arr[(arr==5).any(-1)] = new_array
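Applied to the RGB question itself, a minimal sketch (the rgb array here is made up for illustration) that replaces every NaN-containing pixel with white:
import numpy as np

rgb = np.array([[[0.2, 0.4, 0.6], [np.nan, 0.1, 0.3]],
                [[0.5, np.nan, 0.9], [0.7, 0.8, 0.1]]])
rgb[np.isnan(rgb).any(-1)] = [1, 1, 1]   # any pixel row containing a NaN becomes white
# array([[[0.2, 0.4, 0.6],
#         [1. , 1. , 1. ]],
#        [[1. , 1. , 1. ],
#         [0.7, 0.8, 0.1]]])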
Using np.broadcast_to
arr[np.broadcast_to((arr == 5).any(-1)[..., None], arr.shape)] = 0
array([[[1, 2, 3],
[0, 0, 0]],
[[0, 0, 0],
[2, 4, 6]]])
Just as an FYI: based on your description, if you want to find np.nan values instead of integers like 5, you shouldn't use ==, but rather np.isnan:
arr[np.broadcast_to((np.isnan(arr)).any(-1)[..., None], arr.shape)] = 0
You can do it using the in1d function as below:
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 3, 5], [2, 4, 6]]])
arr[np.in1d(arr,5).reshape(arr.shape).any(axis=2)] = [0,0,0]
arr
I'd like to get the index of a value for every column in a matrix M. For example:
M = matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In pseudocode, I'd like to do something like this:
for col in M:
idx = numpy.where(M[col]==0) # Only for columns!
and have idx be 0, 4, 0 for each column.
I have tried to use where, but I don't understand the return value, which is a tuple of matrices.
The tuple of matrices is a collection of items suited for indexing. The output will have the shape of the indexing matrices (or arrays), and each item in the output will be selected from the original array using the first array as the index of the first dimension, the second as the index of the second dimension, and so on. In other words, this:
>>> numpy.where(M == 0)
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
>>> row, col = numpy.where(M == 0)
>>> M[row, col]
matrix([[0, 0, 0]])
>>> M[numpy.where(M == 0)] = 1000
>>> M
matrix([[1000, 1, 1000],
[ 4, 2, 4],
[ 3, 4, 1],
[ 1, 3, 2],
[ 2, 1000, 3]])
The sequence may be what's confusing you. It proceeds in flattened order -- so M[0,2] appears second, not third. If you need to reorder them, you could do this:
>>> row[0,col.argsort()]
matrix([[0, 4, 0]])
You also might be better off using arrays instead of matrices. That way you can manipulate the shape of the arrays, which is often useful! Also note ajcr's transpose-based trick, which is probably preferable to using argsort.
Finally, there is also a nonzero method that does the same thing as where in this case. Using the transpose trick now:
>>> (M == 0).T.nonzero()
(matrix([[0, 1, 2]]), matrix([[0, 4, 0]]))
As an alternative to np.where, you could perhaps use np.argwhere to return an array of indexes where the array meets the condition:
>>> np.argwhere(M == 0)
array([[[0, 0]],
[[0, 2]],
[[4, 1]]])
This tells you the indexes, each in the format [row, column], where the condition was met.
If you'd prefer the format of this output array to be grouped by column rather than row, (that is, [column, row]), just use the method on the transpose of the array:
>>> np.argwhere(M.T == 0).squeeze()
array([[0, 0],
[1, 4],
[2, 0]])
I also used np.squeeze here to get rid of axis 1, so that we are left with a 2D array. The sequence you want is the second column, i.e. np.argwhere(M.T == 0).squeeze()[:, 1].
The result of where(M == 0) looks something like this:
(matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
The first matrix tells you the rows where the 0s are, and the second matrix tells you the columns.
In [4]: M
Out[4]:
matrix([[0, 1, 0],
[4, 2, 4],
[3, 4, 1],
[1, 3, 2],
[2, 0, 3]])
In [5]: np.where(M == 0)
Out[5]: (matrix([[0, 0, 4]]), matrix([[0, 2, 1]]))
In [6]: M[0,0]
Out[6]: 0
In [7]: M[0,2] #0th row 2nd column
Out[7]: 0
In [8]: M[4,1] #4th row 1st column
Out[8]: 0
This isn't anything new beyond what's already been suggested, but a one-line solution is:
>>> np.where(np.array(M.T)==0)[-1]
array([0, 4, 0])
(I agree that NumPy matrix objects are more trouble than they're worth).
>>> M = np.array([[0, 1, 0],
... [4, 2, 4],
... [3, 4, 1],
... [1, 3, 2],
... [2, 0, 3]])
>>> [np.where(M[:,i]==0)[0][0] for i in range(M.shape[1])]
[0, 4, 0]
How do I sort a NumPy array by its nth column?
For example, given:
a = array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
I want to sort the rows of a by the second column to obtain:
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
To sort by the second column of a:
a[a[:, 1].argsort()]
#steve's answer is actually the most elegant way of doing it.
For the "correct" way see the order keyword argument of numpy.ndarray.sort
However, you'll need to view your array as an array with fields (a structured array).
The "correct" way is quite ugly if you didn't initially define your array with fields...
As a quick example, to sort it and return a copy:
In [1]: import numpy as np
In [2]: a = np.array([[1,2,3],[4,5,6],[0,0,1]])
In [3]: np.sort(a.view('i8,i8,i8'), order=['f1'], axis=0).view(np.int64)
Out[3]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
To sort it in-place:
In [6]: a.view('i8,i8,i8').sort(order=['f1'], axis=0) #<-- returns None
In [7]: a
Out[7]:
array([[0, 0, 1],
[1, 2, 3],
[4, 5, 6]])
#Steve's really is the most elegant way to do it, as far as I know...
The only advantage to this method is that the "order" argument is a list of the fields to order the search by. For example, you can sort by the second column, then the third column, then the first column by supplying order=['f1','f2','f0'].
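Concretely, a small sketch (assuming 64-bit integer data, as in the example above):
a = np.array([[1, 2, 3], [4, 5, 6], [0, 0, 1], [7, 2, 1]], dtype='i8')
np.sort(a.view('i8,i8,i8'), order=['f1', 'f2', 'f0'], axis=0).view('i8')
# array([[0, 0, 1],
#        [7, 2, 1],
#        [1, 2, 3],
#        [4, 5, 6]])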
You can sort on multiple columns as per Steve Tjoa's method by using a stable sort like mergesort and sorting the indices from the least significant to the most significant columns:
a = a[a[:,2].argsort()] # First sort doesn't need to be stable.
a = a[a[:,1].argsort(kind='mergesort')]
a = a[a[:,0].argsort(kind='mergesort')]
This sorts by column 0, then 1, then 2.
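For what it's worth, np.lexsort can do the same chained sort in one call; it takes its keys from least to most significant, so the last key (column 0 here) is the primary one. A sketch:
a = a[np.lexsort((a[:, 2], a[:, 1], a[:, 0]))]   # sort by column 0, then 1, then 2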
In case someone wants to make use of sorting at a critical part of their programs, here's a performance comparison of the different proposals:
import numpy as np
table = np.random.rand(5000, 10)
%timeit table.view('f8,f8,f8,f8,f8,f8,f8,f8,f8,f8').sort(order=['f9'], axis=0)
1000 loops, best of 3: 1.88 ms per loop
%timeit table[table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
import pandas as pd
df = pd.DataFrame(table)
%timeit df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
So, it looks like indexing with argsort is the quickest method so far...
From the NumPy mailing list, here's another solution:
>>> a
array([[1, 2],
[0, 0],
[1, 0],
[0, 2],
[2, 1],
[1, 0],
[1, 0],
[0, 0],
[1, 0],
[2, 2]])
>>> a[np.lexsort(np.fliplr(a).T)]
array([[0, 0],
[0, 0],
[0, 2],
[1, 0],
[1, 0],
[1, 0],
[1, 0],
[1, 2],
[2, 1],
[2, 2]])
As the Python documentation wiki suggests:
a = [[1, 2, 3], [4, 5, 6], [0, 0, 1]]
a = sorted(a, key=lambda a_entry: a_entry[1])
print(a)
Output:
[[0, 0, 1], [1, 2, 3], [4, 5, 6]]
I had a similar problem.
My Problem:
I want to calculate an SVD and need to sort my eigenvalues in descending order. But I want to keep the mapping between eigenvalues and eigenvectors.
My eigenvalues were in the first row and the corresponding eigenvector below it in the same column.
So I want to sort a two-dimensional array column-wise by the first row in descending order.
My Solution
a = a[::, a[0,].argsort()[::-1]]
So how does this work?
a[0,] is just the first row I want to sort by.
Now I use argsort to get the order of indices.
I use [::-1] because I need descending order.
Lastly I use a[::, ...] to get a view with the columns in the right order.
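A tiny made-up example to illustrate (the numbers are a stand-in, not a real eigendecomposition):
import numpy as np

a = np.array([[3.0, 9.0, 1.0],      # eigenvalues in the first row
              [0.1, 0.2, 0.3],      # corresponding eigenvector components below
              [0.4, 0.5, 0.6]])
a = a[:, a[0, :].argsort()[::-1]]   # reorder whole columns, descending by row 0
# array([[9. , 3. , 1. ],
#        [0.2, 0.1, 0.3],
#        [0.5, 0.4, 0.6]])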
import numpy as np
a=np.array([[21,20,19,18,17],[16,15,14,13,12],[11,10,9,8,7],[6,5,4,3,2]])
y=np.argsort(a[:,2],kind='mergesort')# a[:,2]=[19,14,9,4]
a=a[y]
print(a)
Desired output is [[6,5,4,3,2],[11,10,9,8,7],[16,15,14,13,12],[21,20,19,18,17]]
Note that argsort(numArray) returns the indices that would arrange numArray in sorted order.
Example:
x = np.array([8, 1, 5])
z = np.argsort(x)  # [1, 2, 0] are the indices of the elements in ascending order
print(x[z])        # integer (fancy) indexing, which reorders x using the indices in z
The output is [1, 5, 8].
A little more complicated lexsort example: descending on the 1st column, secondarily ascending on the 2nd. The tricks with lexsort are that it sorts on rows (hence the .T) and gives priority to the last key.
In [120]: b=np.array([[1,2,1],[3,1,2],[1,1,3],[2,3,4],[3,2,5],[2,1,6]])
In [121]: b
Out[121]:
array([[1, 2, 1],
[3, 1, 2],
[1, 1, 3],
[2, 3, 4],
[3, 2, 5],
[2, 1, 6]])
In [122]: b[np.lexsort(([1,-1]*b[:,[1,0]]).T)]
Out[122]:
array([[3, 1, 2],
[3, 2, 5],
[2, 1, 6],
[2, 3, 4],
[1, 1, 3],
[1, 2, 1]])
Here is another solution considering all columns (a more compact version of J.J's answer):
ar=np.array([[0, 0, 0, 1],
[1, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[0, 0, 1, 0],
[1, 1, 0, 0]])
Sort with lexsort,
ar[np.lexsort(([ar[:, i] for i in range(ar.shape[1]-1, -1, -1)]))]
Output:
array([[0, 0, 0, 1],
[0, 0, 1, 0],
[0, 1, 0, 0],
[1, 0, 0, 1],
[1, 0, 1, 0],
[1, 1, 0, 0]])
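Equivalently (just a sketch of the same idea), the list comprehension can be replaced by reversing the transposed array, since lexsort treats its last key as the most significant:
ar[np.lexsort(ar.T[::-1])]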
Pandas approach, just for completeness:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
import pandas as pd
a = pd.DataFrame(a)
a.sort_values(1, ascending=True).to_numpy()  # the key 1 means "sort by the second column"
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
prl900 did the benchmark, comparing with the accepted answer:
%timeit pandas_df.sort_values(9, ascending=True)
1000 loops, best of 3: 400 µs per loop
%timeit numpy_table[numpy_table[:,9].argsort()]
10000 loops, best of 3: 180 µs per loop
It is an old question, but if you need to generalize this to arrays with more than two dimensions, here is a solution that can be easily generalized:
np.einsum('ij->ij', a[a[:,1].argsort(),:])
This is overkill for two dimensions, and a[a[:,1].argsort()] would be enough per #steve's answer; however, that answer cannot be generalized to higher dimensions. You can find an example of a 3D array in this question.
Output:
[[7 0 5]
[9 2 3]
[4 5 6]]
# sort the rows by the values in column 0 (use dataset[:, n] to sort by column n)
indexofsort = np.argsort(dataset[:, 0], axis=-1, kind='stable')
dataset = dataset[indexofsort, :]
def sort_np_array(x, column=None, flip=False):
    x = x[np.argsort(x[:, column])]
    if flip:
        x = np.flip(x, axis=0)
    return x
Array in the original question:
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
The result of the sort_np_array function as expected by the author of the question:
sort_np_array(a, column=1, flip=False)
array([[7, 0, 5],
[9, 2, 3],
[4, 5, 6]])
Thanks to this post: https://stackoverflow.com/a/5204280/13890678
I found a more "generic" answer using a structured array.
I think one advantage of this method is that the code is easier to read.
import numpy as np
a = np.array([[9, 2, 3],
[4, 5, 6],
[7, 0, 5]])
struct_a = np.core.records.fromarrays(
a.transpose(), names="col1, col2, col3", formats="i8, i8, i8"
)
struct_a.sort(order="col2")
print(struct_a)
[(7, 0, 5) (9, 2, 3) (4, 5, 6)]
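If you then need a plain 2-D array again, numpy.lib.recfunctions.structured_to_unstructured should undo the conversion (a sketch, assuming NumPy >= 1.16 where that helper is available):
from numpy.lib.recfunctions import structured_to_unstructured
structured_to_unstructured(struct_a)
# array([[7, 0, 5],
#        [9, 2, 3],
#        [4, 5, 6]])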
Simply use Python's built-in sorted, picking the column index you want to sort by in the key:
a = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
print (a)
a = a.tolist()
a = np.array(sorted(a, key=lambda a_entry: a_entry[0]))
print (a)