Remove one value from a NumPy array - python

I am trying to remove all rows that only contain zeros from a NumPy array. For example, I want to remove [0,0] from
n = np.array([[1,2], [0,0], [5,6]])
and be left with:
np.array([[1,2], [5,6]])

To remove the second row from a NumPy array:
import numpy
n = numpy.array([[1,2],[0,0],[5,6]])
new_n = numpy.delete(n, 1, axis=0)
To remove rows containing only 0:
import numpy
n = numpy.array([[1,2],[0,0],[5,6]])
idxs = numpy.any(n != 0, axis=1) # index of rows with at least one non zero value
n_non_zero = n[idxs, :] # selection of the wanted rows

If you want to delete any row that only contains zeros, the fastest way I can think of is:
n = numpy.array([[1,2], [0,0], [5,6]])
keep_row = n.any(axis=1) # Index of rows with at least one non-zero value
n_non_zero = n[keep_row] # Rows to keep, only
This should run faster than Simon's answer, because n.any() can stop checking the values of each row as soon as it encounters a non-zero value (in Simon's answer, all the elements of each row are first compared to zero, which results in unnecessary computations and an intermediate boolean array).
Here is a generalization of the answer, if you ever need to remove rows that have a specific value (instead of removing only rows that only contain zeros):
n = numpy.array([[1,2], [0,0], [5,6]])
to_be_removed = [0, 0] # Can be any row values: [5, 6], etc.
other_rows = (n != to_be_removed).any(axis=1) # Rows that have at least one element that differs
n_other_rows = n[other_rows] # New array with rows equal to to_be_removed removed.
Note that this solution is not fully optimized: even if the first element of to_be_removed does not match, the remaining row elements from n are compared to those of to_be_removed (as in Simon's answer).
I'd be curious to know if there is a simple efficient NumPy solution to the more general problem of deleting rows with a specific value.
Using Cython loops might be a fast solution: for each row, the element comparison could stop as soon as one element of the row differs from the corresponding element in to_be_removed.
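A minimal sketch of the generalization above, wrapped as a helper. The name remove_rows_equal_to is hypothetical (it is not a NumPy function), and the helper still compares every element, so it does not provide the early exit discussed above.

```python
import numpy as np

def remove_rows_equal_to(arr, row):
    """Return a copy of arr without the rows equal to `row` (hypothetical helper)."""
    arr = np.asarray(arr)
    # True for rows that differ from `row` in at least one position
    keep = (arr != np.asarray(row)).any(axis=1)
    return arr[keep]

n = np.array([[1, 2], [0, 0], [5, 6], [0, 3]])
print(remove_rows_equal_to(n, [0, 0]))  # rows [1, 2], [5, 6] and [0, 3] remain
```

Note that [0, 3] is kept: only rows matching [0, 0] in every position are removed.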

You can use numpy.delete to remove specific rows or columns.
For example:
import numpy as np
n = np.array([[1,2], [0,0], [5,6]])
np.delete(n, 1, axis=0)
The output will be:
array([[1, 2],
[5, 6]])

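To delete rows according to their value, compare the array with the target row, reduce the result to one boolean per row, convert that mask to row indices, and pass the indices to np.delete: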
>>> n
array([[1, 2],
[0, 0],
[5, 6]])
>>> bl=n==[0,0]
>>> bl
array([[False, False],
[ True, True],
[False, False]], dtype=bool)
>>> bl=np.all(bl,axis=1)
>>> bl
array([False, True, False], dtype=bool)
>>> ind=np.nonzero(bl)[0]
>>> ind
array([1])
>>> np.delete(n,ind,axis=0)
array([[1, 2],
[5, 6]])
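When matching a whole row, the choice of reduction matters. The sketch below uses a made-up array with a partially matching row, [0, 6], to compare np.all and np.any on the element-wise comparison. The variable names are illustrative only.

```python
import numpy as np

n = np.array([[1, 2], [0, 0], [0, 6]])
target = [0, 0]

eq = n == target                  # element-wise comparison, shape (3, 2)
rows_all = np.all(eq, axis=1)     # row equals target in every position
rows_any = np.any(eq, axis=1)     # row equals target in at least one position

only_exact = np.delete(n, np.nonzero(rows_all)[0], axis=0)
also_partial = np.delete(n, np.nonzero(rows_any)[0], axis=0)

print(only_exact)    # keeps [1, 2] and [0, 6]
print(also_partial)  # keeps only [1, 2]
```

For the question as asked (remove rows that contain only zeros), the np.all reduction is the one that matches.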

Related

How to do element-wise comparison between two NumPy arrays

I have two arrays. I would like to do an element-wise comparison between the two of them to find out which values are the same.
a= np.array([[1,2],[3,4]])
b= np.array([[3,2],[1,4]])
Is there a way for me to compare these two arrays to 1) find out which values are the same and 2) get the index of the same values?
Adding on to the previous question, is there a way for me to return 1 if the values are the same and 0 otherwise?
Thanks in advance!
a= np.array([[1,2],[3,4]])
b= np.array([[3,2],[1,4]])
#1) find out which values are the same
a==b
# array([[False, True],
# [False, True]])
#2) get the index of the same values?
np.where((a==b) == True) # or np.where(a==b)
#(array([0, 1]), array([1, 1]))
# Adding on to the previous question, is there a way for me to return 1 if the values are the same and 0 otherwise
(a==b).astype(int)
# array([[0, 1],
# [0, 1]])
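If one [row, col] pair per match is more convenient than the tuple of index arrays returned by np.where, np.argwhere can be used instead. A short sketch with the same arrays:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[3, 2], [1, 4]])

pairs = np.argwhere(a == b)   # one [row, col] pair per matching element
print(pairs)
# [[0 1]
#  [1 1]]
```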

Extract numpy rows by given condition

I have numpy array as follows.
import numpy as np
data = np.array([[0,0,0,4],
[3,0,5,0],
[8,9,5,3]])
print (data)
I have to extract only those rows whose first three elements are not all zeros.
The expected result is as follows:
result = np.array([[3,0,5,0],
[8,9,5,3]])
I tried as:
res = [l for l in data if l[:3].sum() !=0]
print (res)
It gives the expected result, but I am looking for a better, NumPy way of doing it.
sum is a bit unreliable if your array can contain negative numbers, but any will always work:
result = data[data[:, :3].any(1)]
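To illustrate the point about negative numbers, here is a small sketch with a made-up array (not the one from the question) in which one row's first three elements cancel out:

```python
import numpy as np

data = np.array([[0, 0, 0, 4],
                 [2, -2, 0, 0],   # first three elements are not all zero, but sum to 0
                 [8, 9, 5, 3]])

by_sum = data[data[:, :3].sum(axis=1) != 0]   # wrongly drops [2, -2, 0, 0]
by_any = data[data[:, :3].any(axis=1)]        # keeps it

print(by_sum)  # only [8, 9, 5, 3]
print(by_any)  # [2, -2, 0, 0] and [8, 9, 5, 3]
```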
You say
first three elements are not all zeros
so a solution is
import numpy as np
data = np.array([[0,0,0,4],
[3,0,5,0],
[8,9,5,3]])
data[~np.all(data[:, :3] == 0, axis=1), :]
I'll try to explain how I think about these kinds of problems through my answer.
First step: define a function that returns a boolean indicating whether this is a good row.
For that, I use np.any, which checks if any of the entries is "True" (for integers, true is non-zero).
import numpy as np
v1 = np.array([1, 1, 1, 0])
v2 = np.array([0, 0, 0, 1])
good_row = lambda v: np.any(v[:3])
good_row(v1)
Out[28]: True
good_row(v2)
Out[29]: False
Second step: I apply this on all rows, and obtain a masking vector. To do so, one can use the 'axis' keyword in 'np.any', which will apply this on columns or rows depending on the axis value.
np.any(data[:, :3], axis=1)
Out[32]: array([False, True, True])
Final step: I combine this with indexing, to wrap it all.
rows_inds = np.any(data[:, :3], axis=1)
data[rows_inds]
Out[37]:
array([[3, 0, 5, 0],
[8, 9, 5, 3]])

How to get a value in a nested list that is smaller than the ones around it

So basically I have to find the smallest number in a nested list as compared to the numbers around it. This would be called a 'sink' and the function returns True if it is a sink and False if it isn't. For example, if the nested list is
[[1, 2, 1],
[4, 6, 5],
[7, 8, 9]]
then the number at [0,2] (1) should return True, as all the values adjacent to it are greater than 1, but the number at [2,0] (7) should not, as it is greater than some of the values around it.
I tried to use slicing to get the numbers beside it but I don't know how to slice it to get the number diagonal from the sink or above or below.
This is some of the code that I tried to do:
for x in elevation_map:
    for xs in x:
        if elevation_map[cell[0]][cell[1]] < xs[cell[0]]:
            return True
return False
You could convert your list of lists to a numpy array, iterate the indices, and use two-dimensional slicing to get the sub-matrix and check whether the value at the current position is the minimal value.
>>> import numpy as np
>>> from itertools import product
>>> m = np.array([[1, 2, 1],
... [4, 6, 5],
... [7, 8, 9]])
...
>>> [(r, c, m[r,c]) for r,c in product(*map(range, m.shape))
... if m[r,c] == m[max(0,r-1):r+2,max(0,c-1):c+2].min()]
...
[(0, 0, 1), (0, 2, 1)]
(The max(0, ...) is so that the lower bound 0-1 does not refer to the last element in the array; if the upper bound is higher than the size of the array, that's not a problem.)
Note: This will also identify a point as a "sink" if one of its neighbors has the same value; not sure if this is a problem.
don't know how to slice it to get the number diagonal from the sink
or above or below.
Your question shows an indexing scheme of [row,col].
Diagonals:
[row-1,col-1], [row-1,col+1], [row+1,col-1], [row+1,col+1]
Left, right, up, down:
[row,col-1],[row,col+1],[row-1,col],[row+1,col]
[row-1,col-1] [row-1, col ] [row-1,col+1]
[ row ,col-1] [ row , col ] [ row ,col+1]
[row+1,col-1] [row+1, col ] [row+1,col+1]
When calculating the indices you will need to include checks to see if you are at the edge of the 2-d structure. If you are at the left edge then col-1 will wrap around to the last item in that row, and if you are at the top edge then row-1 will wrap around to the last item in that column. You will also need to check for the right and bottom edges: there won't be any cells at col+1 or row+1.
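The neighbor indices and edge checks described above can be sketched in plain Python as follows. The function name is_sink and its (elevation_map, cell) signature are assumptions based on the question's code, and the sketch assumes a sink must be strictly smaller than every neighbor.

```python
def is_sink(elevation_map, cell):
    """Return True if the value at cell = (row, col) is strictly smaller
    than all of its existing neighbors (hypothetical helper)."""
    row, col = cell
    n_rows, n_cols = len(elevation_map), len(elevation_map[0])
    value = elevation_map[row][col]
    for r in range(row - 1, row + 2):
        for c in range(col - 1, col + 2):
            if (r, c) == (row, col):
                continue  # skip the cell itself
            # only look at neighbors that lie inside the grid
            if 0 <= r < n_rows and 0 <= c < n_cols:
                if elevation_map[r][c] <= value:
                    return False
    return True

elevation_map = [[1, 2, 1],
                 [4, 6, 5],
                 [7, 8, 9]]
print(is_sink(elevation_map, (0, 2)))  # True
print(is_sink(elevation_map, (2, 0)))  # False
```

Replacing <= with < in the comparison would also accept cells that tie with a neighbor.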
If you consider a value a sink if it is the minimum of the surrounding 3x3 grid, i.e. you don't require that it is strictly smaller than any other value around it, then you can use numpy.lib.stride_tricks.as_strided in order to create a window that checks the 3x3 neighbors. In order to work at the edges of the matrix, the original array can be padded with np.inf (since min is later used):
import numpy as np
matrix = np.array(
[[1, 2, 1],
[4, 6, 5],
[7, 8, 9]])
padded = np.pad(matrix.astype(float), 1, constant_values=np.inf)
window = np.lib.stride_tricks.as_strided(
padded,
padded.shape + (3, 3),
padded.strides * 2)[:-2, :-2]
result = matrix == window.min(axis=(-2, -1))
Which gives the following result:
[[ True False True]
[False False False]
[False False False]]
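On NumPy 1.20 or later, the same 3x3 windows can be built with numpy.lib.stride_tricks.sliding_window_view, which computes the shape and strides itself. This is a sketch of that alternative, not part of the answer above; it keeps the same convention that a sink is the minimum of its 3x3 neighborhood.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

matrix = np.array([[1, 2, 1],
                   [4, 6, 5],
                   [7, 8, 9]])

# Pad with inf so that edge cells also get a full 3x3 neighborhood
padded = np.pad(matrix.astype(float), 1, constant_values=np.inf)
windows = sliding_window_view(padded, (3, 3))    # shape (3, 3, 3, 3)
result = matrix == windows.min(axis=(-2, -1))
print(result)
```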

Get all the rows with same values in python?

So, suppose I have this 2D array in python
a = [[1,2],
[2,3],
[3,2],
[1,3]]
How do I get all the rows with the same first value and store them in a new matrix?
For example, I will have
b = [[1,2],
[1,3]]
after the query.
My approach is b = [a[i] for i in a if a[i][0] == 1]
but it didn't seem to work.
I am new to Python and the whole index slicing thing is kind of confusing. Thanks!
Since you tagged numpy, you can perform this task with NumPy arrays. First define your array:
a = np.array([[1, 2],
[2, 3],
[3, 2],
[1, 3]])
For all unique values in the first column, you can use a dictionary comprehension. This is useful to avoid duplicating operations.
d = {i: a[a[:, 0] == i] for i in np.unique(a[:, 0])}
{1: array([[1, 2],
[1, 3]]),
2: array([[2, 3]]),
3: array([[3, 2]])}
Then access your array where first column is equal to 1 via d[1].
For a single query, you can simply use a[a[:, 0] == 1].
The for i in a syntax gives you the actual items in the list, so for example:
list_of_strs = ['first', 'second', 'third']
first_letters = [s[0] for s in list_of_strs]
# first_letters == ['f', 's', 't']
What you are actually doing with b = [a[i] for i in a if a[i][0]==1] is trying to index a with each of its own elements. But since each element of a is itself a list, this won't work (you can't index lists with other lists).
Something like this should work:
b = [row for row in a if row[0] == 1]
Bonus points if you write it as a function so that you can pick which thing you want to filter on.
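One possible way to write it as a function, as a sketch. The name rows_where and its parameters are made up for illustration.

```python
def rows_where(rows, index, value):
    """Return the rows whose element at position `index` equals `value`."""
    return [row for row in rows if row[index] == value]

a = [[1, 2], [2, 3], [3, 2], [1, 3]]
print(rows_where(a, 0, 1))  # [[1, 2], [1, 3]]
print(rows_where(a, 1, 2))  # [[1, 2], [3, 2]]
```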
If you're working with arrays a lot, you might also check out the numpy library. With numpy, you can do stuff like this.
import numpy as np
a = np.array([[1,2], [2,3], [3,2], [1,3]])
b = a[a[:,0] == 1]
The last line is basically indexing the original array a with a boolean array defined inside the first set of square brackets. It's very flexible, so you could also modify this to filter on the second element, filter on other conditions (like > some_number), etc. etc.
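A few variations on the same boolean indexing, as a sketch (the variable names are illustrative). Note that combined conditions need & with parentheses around each comparison, not the Python keyword and.

```python
import numpy as np

a = np.array([[1, 2], [2, 3], [3, 2], [1, 3]])

second_is_2 = a[a[:, 1] == 2]                  # filter on the second element
first_gt_1 = a[a[:, 0] > 1]                    # filter on another condition
both = a[(a[:, 0] == 1) & (a[:, 1] == 3)]      # combine two conditions

print(second_is_2)  # rows [1, 2] and [3, 2]
print(first_gt_1)   # rows [2, 3] and [3, 2]
print(both)         # row [1, 3]
```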

Deleting elements from numpy array with iteration

What is the fastest method to delete elements from a NumPy array while retrieving their initial positions? The following code does not return all the elements that it should:
list = []
for pos,i in enumerate(ARRAY):
    if i < some_condition:
        list.append(pos) #This is where the loop fails
for _ in list:
    ARRAY = np.delete(ARRAY, _)
It really feels like you're going about this inefficiently. You should probably be using more builtin numpy capabilities -- e.g. np.where, or boolean indexing. Using np.delete in a loop like that is going to kill any performance gains you get from using numpy...
For example (with boolean indexing):
keep = np.ones(ARRAY.shape, dtype=bool)
for pos, val in enumerate(ARRAY):
    if val < some_condition:
        keep[pos] = False
ARRAY = ARRAY[keep]
Of course, this could possibly be simplified (and generalized) even further:
ARRAY = ARRAY[ARRAY >= some_condition]
EDIT
You've stated in the comments that you need the same mask to operate on other arrays as well -- That's not a problem. You can keep a handle on the mask and use it for other arrays:
mask = ARRAY >= some_condition
ARRAY = ARRAY[mask]
OTHER_ARRAY = OTHER_ARRAY[mask]
...
Additionally (and perhaps this is the reason your original code isn't working), as soon as you delete the first index from the array in your loop, all of the other items shift one index to the left, so you're not actually deleting the same items that you "tagged" on the initial pass.
As an example, let's say that your original array was [a, b, c, d, e] and on the original pass, you tagged elements at indexes [0, 2] for deletion (a, c)... On the first pass through your delete loop, you'd remove the item at index 0 -- which would make your array:
[b, c, d, e]
now on the second iteration of your delete loop, you're going to delete the item at index 2 in the new array:
[b, c, e]
But look, instead of removing c like we wanted, we actually removed d! Oh snap!
To fix that, you could probably write your loop over reversed(list), but that still won't result in a fast operation.
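The index shift described above can be reproduced with a small string array. The sketch below compares the forward loop, the reversed loop, and a single np.delete call with all indices at once; the variable names are illustrative.

```python
import numpy as np

arr = np.array(['a', 'b', 'c', 'd', 'e'])
tagged = [0, 2]                       # positions of 'a' and 'c'

wrong = arr.copy()
for pos in tagged:                    # forward loop: indices shift after each delete
    wrong = np.delete(wrong, pos)

right = arr.copy()
for pos in reversed(tagged):          # highest index first: earlier positions stay valid
    right = np.delete(right, pos)

one_call = np.delete(arr, tagged)     # all indices refer to the original array

print(wrong)     # ['b' 'c' 'e']
print(right)     # ['b' 'd' 'e']
print(one_call)  # ['b' 'd' 'e']
```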
You don't need to iterate, especially with a simple condition like this. And you don't really need to use delete:
A sample array:
In [693]: x=np.arange(10)
A mask, a boolean array where a condition is true (or false):
In [694]: msk = x%2==0
In [695]: msk
Out[695]: array([ True, False, True, False, True, False, True, False, True, False], dtype=bool)
where (or nonzero) converts it to indexes
In [696]: ind=np.where(msk)
In [697]: ind
Out[697]: (array([0, 2, 4, 6, 8], dtype=int32),)
You use the whole ind in one call to delete (no need to iterate):
In [698]: np.delete(x,ind)
Out[698]: array([1, 3, 5, 7, 9])
You can use ind to retain those values instead:
In [699]: x[ind]
Out[699]: array([0, 2, 4, 6, 8])
Or you can use the boolean msk directly:
In [700]: x[msk]
Out[700]: array([0, 2, 4, 6, 8])
or use its inverse:
In [701]: x[~msk]
Out[701]: array([1, 3, 5, 7, 9])
delete doesn't do much more than this kind of boolean masking. It's all Python code, so you can easily study it.
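As a rough illustration of that idea (a sketch of the masking approach, not a copy of NumPy's actual implementation), deleting by index from a 1-D array can be emulated like this:

```python
import numpy as np

x = np.arange(10)
ind = np.where(x % 2 == 0)[0]         # indices to remove

keep = np.ones(x.shape, dtype=bool)   # start by keeping everything
keep[ind] = False                     # switch off the positions to delete
emulated = x[keep]

print(emulated)                       # [1 3 5 7 9]
print(np.array_equal(emulated, np.delete(x, ind)))  # True
```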
