Replacing NumPy elements if a condition is met - Python

I have a large numpy array that I need to manipulate so that each element is changed to either a 1 or 0 if a condition is met (will be used as a pixel mask later). There are about 8 million elements in the array and my current method takes too long for the reduction pipeline:
for (y,x), value in numpy.ndenumerate(mask_data):
    if mask_data[y,x] < 3:    # Good pixel
        mask_data[y,x] = 1
    elif mask_data[y,x] > 3:  # Bad pixel
        mask_data[y,x] = 0
Is there a numpy function that would speed this up?
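For reference, the whole loop collapses into a single vectorized expression. A sketch on a tiny made-up array; note it also preserves values equal to 3, which the if/elif above leaves untouched:

```python
import numpy as np

# small stand-in for the 8-million-element mask (hypothetical data)
mask_data = np.array([[1, 4, 3],
                      [5, 2, 3]])

# values < 3 become 1, values > 3 become 0; values equal to 3 are
# left unchanged, mirroring the if/elif in the loop above
result = np.where(mask_data < 3, 1,
                  np.where(mask_data > 3, 0, mask_data))
print(result)
```

On large arrays this runs in C rather than Python, which is where the speedup comes from.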

>>> import numpy as np
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[4, 2, 1, 1],
       [3, 0, 1, 2],
       [2, 0, 1, 1],
       [4, 0, 2, 3],
       [0, 0, 0, 2]])
>>> b = a < 3
>>> b
array([[False,  True,  True,  True],
       [False,  True,  True,  True],
       [ True,  True,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True,  True]], dtype=bool)
>>>
>>> c = b.astype(int)
>>> c
array([[0, 1, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 1],
       [0, 1, 1, 0],
       [1, 1, 1, 1]])
You can shorten this with:
>>> c = (a < 3).astype(int)

>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[0, 3, 3, 2],
       [4, 1, 1, 2],
       [3, 4, 2, 4],
       [2, 4, 3, 0],
       [1, 2, 3, 4]])
>>>
>>> a[a > 3] = -101
>>> a
array([[   0,    3,    3,    2],
       [-101,    1,    1,    2],
       [   3, -101,    2, -101],
       [   2, -101,    3,    0],
       [   1,    2,    3, -101]])
>>>
See, e.g., Indexing with boolean arrays in the NumPy docs.
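Building on boolean indexing, the original loop can also be done in place with two masks. A sketch on a small made-up array; note that both masks are computed before either assignment, since writing the 0s or 1s first would change what the second comparison sees:

```python
import numpy as np

mask_data = np.array([[1, 4, 3],
                      [5, 2, 0]])

# compute both masks *before* assigning: if the bad pixels were set
# to 0 first, they would then pass the < 3 test and wrongly become 1
good = mask_data < 3
bad = mask_data > 3
mask_data[good] = 1
mask_data[bad] = 0
print(mask_data)
```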

The quickest (and most flexible) way is to use np.where, which chooses between two arrays according to a mask (an array of True and False values):
import numpy as np
a = np.random.randint(0, 5, size=(5, 4))
b = np.where(a < 3, 0, 1)
print('a:', a)
print()
print('b:', b)
which will produce:
a: [[1 4 0 1]
 [1 3 2 4]
 [1 0 2 1]
 [3 1 0 0]
 [1 4 0 1]]

b: [[0 1 0 0]
 [0 1 0 1]
 [0 0 0 0]
 [1 0 0 0]
 [0 1 0 0]]

You can create your mask array in one step like this:
mask_data = input_mask_data < 3
This creates a boolean array which can then be used as a pixel mask. Note that we haven't changed the input array (as your code does) but have created a new array to hold the mask data - I would recommend doing it this way.
>>> input_mask_data = np.random.randint(0, 5, (3, 4))
>>> input_mask_data
array([[1, 3, 4, 0],
       [4, 1, 2, 2],
       [1, 2, 3, 0]])
>>> mask_data = input_mask_data < 3
>>> mask_data
array([[ True, False, False,  True],
       [False,  True,  True,  True],
       [ True,  True, False,  True]], dtype=bool)
>>>
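Once you have the boolean mask, applying it later in the pipeline is a one-liner. A minimal sketch, using a hypothetical image array of the same shape:

```python
import numpy as np

image = np.arange(12).reshape(3, 4)          # hypothetical pixel data
input_mask_data = np.array([[1, 3, 4, 0],
                            [4, 1, 2, 2],
                            [1, 2, 3, 0]])

mask_data = input_mask_data < 3   # True marks the good pixels
image[~mask_data] = 0             # ~ inverts the mask: zero out bad pixels
print(image)
```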

I was a noob with NumPy, and the answers above were not straight to the point about modifying my array in place, so I'm posting what I came up with:
import numpy as np
arr = np.array([[[10, 20, 30, 255], [40, 50, 60, 255]],
                [[70, 80, 90, 255], [100, 110, 120, 255]],
                [[170, 180, 190, 255], [230, 240, 250, 255]]])
# Change 1:
# Set every pixel to 0 if its first channel is smaller than 80
arr[arr[:, :, 0] < 80] = 0
print('Change 1:', arr, '\n')
# Change 2:
# Set every value to 1 if it is bigger than 180 and smaller than 240,
# OR if it is equal to 170
arr[(arr > 180) & (arr < 240) | (arr == 170)] = 1
print('Change 2:', arr)
This produces:
Change 1: [[[  0   0   0   0]
  [  0   0   0   0]]

 [[  0   0   0   0]
  [100 110 120 255]]

 [[170 180 190 255]
  [230 240 250 255]]]

Change 2: [[[  0   0   0   0]
  [  0   0   0   0]]

 [[  0   0   0   0]
  [100 110 120 255]]

 [[  1 180   1 255]
  [  1 240 250 255]]]
This way you can chain as many conditions as you like, as in 'Change 2', and set values accordingly.
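When the list of conditions keeps growing, np.select is worth knowing as an alternative to long chained boolean expressions. A small sketch with made-up values; np.select picks, per element, the value for the first matching condition:

```python
import numpy as np

arr = np.array([10, 170, 200, 250])

# first condition that matches wins; unmatched elements keep
# their original value via `default`
out = np.select(condlist=[(arr > 180) & (arr < 240), arr == 170],
                choicelist=[1, 1],
                default=arr)
print(out)
```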

I am not sure I understood your question, but if you write:
mask_data[:3, :3] = 1
mask_data[3:, 3:] = 0
this will set every value of mask_data whose x and y indexes are both less than 3 to 1, and every value whose x and y indexes are both 3 or more to 0 (elements where only one index is below 3 are left unchanged).

Related

Accessing numpy array with list of logical indices prints the list instead

When using logical indexing on numpy arrays, different behaviours occur based on whether the indices are boolean or integer (1/0). This answer states that, as of Python 3.x,
True and False are keywords and will always be equal to 1 and 0.
Can someone explain what causes this behaviour?
MWE to replicate (Python 3.7.3, Numpy 1.16.3):
import numpy as np
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [True, True, True, True, True, False, False, False, False, False]
c = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
npa = np.asarray(a)
npb = np.asarray(b)
npc = np.asarray(c)
print(npa[b]) # [0 1 2 3 4]
print(npa[npb]) # [0 1 2 3 4]
print(npa[c]) # [1 1 1 1 1 0 0 0 0 0]
print(npa[npc]) # [1 1 1 1 1 0 0 0 0 0]
If we look at it like this:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [True, True, True, True, True, False, False, False, False, True]
c = [1, 1, 1, 1, 1, 0, 0, 0, 0, -1]
npa = np.asarray(a)
npb = np.asarray(b)
npc = np.asarray(c)
print(npa[b]) # [0 1 2 3 4 9]
print(npa[npb]) # [0 1 2 3 4 9]
print(npa[c]) # [1 1 1 1 1 0 0 0 0 0 9]
print(npa[npc]) # [1 1 1 1 1 0 0 0 0 0 9]
We can see that b and npb are boolean masks that instruct which indices to choose and c and npc are actual indices to be taken from the given array.
This is mentioned in the docs:
Boolean arrays used as indices are treated in a different manner entirely than index arrays ... Unlike in the case of integer index arrays, in the boolean case, the result is a 1-D array containing all the elements in the indexed array corresponding to all the true elements in the boolean array.
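If you do have a 0/1 integer list but want mask semantics, casting it to bool restores the boolean-indexing behaviour. A quick sketch with the same arrays as the MWE above:

```python
import numpy as np

npa = np.arange(10)
c = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# cast the 0/1 integers to bool to get mask semantics
# instead of integer fancy indexing
print(npa[c.astype(bool)])
```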

Problem with replacing values in a numpy.array with a for-loop [duplicate]


Is there any way to run np.where on multiple values rather than just one?

I'm wondering: if I have an image in a numpy array, say 250x250x3 (3 channels), is it possible to use np.where to quickly find out whether any of the 250x250 subarrays of size 3 are equal to [143, 255, 0] (or another colour represented by RGB), and get a 250x250 bool array?
When I try it in code with a 4x4x3 array, I get a 3x3 array as a result, and I'm not totally sure where that shape is coming from.
import numpy as np
test = np.arange(4,52).reshape(4,4,3)
print(np.where(test == [4,5,6]))
-------------------------------------------
Result:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 1, 2]])
What I'm trying to get:
array([[1, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
Solution
You don't need np.where (or anything particularly complicated) at all. You can just make use of the power of boolean arrays:
print(np.all(test == [4,5,6], axis=-1).astype(int))
# output:
# [[1 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]]
An equivalent alternative would be to use logical_and:
print(np.logical_and.reduce(test == [4,5,6], axis=-1).astype(int))
# output:
# [[1 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]
#  [0 0 0 0]]
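To see where the unexpected shape came from: the comparison broadcasts to the full 3-D shape, and np.where on that returns one index array per axis rather than a 2-D mask. A short sketch:

```python
import numpy as np

test = np.arange(4, 52).reshape(4, 4, 3)

eq = test == [4, 5, 6]       # broadcasts: comparison has shape (4, 4, 3)
# np.where(eq) returns a tuple of index arrays, one per axis,
# which is why the original attempt didn't yield a (4, 4) result
mask = eq.all(axis=-1)       # collapse the channel axis -> (4, 4) bools
print(eq.shape, mask.shape)
```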
Heavy duty test
import numpy as np
np.random.seed(0)
# the subarray we'll search for
pattern = [143, 255, 0]
# generate a random test array
arr = np.random.randint(0, 255, size=(255,255,3))
# insert the pattern array at ~10000 random indices
ix = np.unique(np.random.randint(np.prod(arr.shape[:-1]), size=10000))
arr.reshape(-1, arr.shape[-1])[ix] = pattern
# find all instances of the pattern array (ignore partial matches)
loc = np.all(arr==pattern, axis=-1).astype(int)
# test that the found locs are equivalent to the test ixs
locix = np.ravel_multi_index(loc.nonzero(), arr.shape[:-1])
np.testing.assert_array_equal(np.sort(ix), np.sort(locix))
# test has been run, the above assert passes
For simplicity, let's say that we are looking for all locations where all 3 channels equal 1. This will do the job:
np.random.seed(0)
a = np.random.randint(0, 2, (3, 5, 5))
print(a)
np.where((a[0] == 1) * (a[1] == 1) * (a[2] == 1))
This outputs
[[[0 1 1 0 1]
  [1 1 1 1 1]
  [1 0 0 1 0]
  [0 0 0 0 1]
  [0 1 1 0 0]]

 [[1 1 1 1 0]
  [1 0 1 0 1]
  [1 0 1 1 0]
  [0 1 0 1 1]
  [1 1 1 0 1]]

 [[0 1 1 1 1]
  [0 1 0 0 1]
  [1 0 1 0 1]
  [0 0 0 0 0]
  [1 1 0 0 0]]]
(array([0, 0, 1, 2, 4], dtype=int64), array([1, 2, 4, 0, 1], dtype=int64))
And indeed there are 5 coordinates in which all 3 channels equal 1.
If you want a more readable representation, replace the last line with
tuple(zip(*np.where((a[0]==1)*(a[1]==1)*(a[2]==1))))
This will output
((0, 1), (0, 2), (1, 4), (2, 0), (4, 1))
which are all the 5 locations where all 3 channels equal 1.
Note that (a[0]==1)*(a[1]==1)*(a[2]==1) is just
array([[False,  True,  True, False, False],
       [False, False, False, False,  True],
       [ True, False, False, False, False],
       [False, False, False, False, False],
       [False,  True, False, False, False]])
the boolean representation that you were looking for.
If you want to get any other triplet, say [143, 255, 0], just use (a[0]==143)*(a[1]==255)*(a[2]==0).
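As a side note, the chained (a[0]==1)*(a[1]==1)*(a[2]==1) pattern can be written with np.all along axis 0, which gives the same mask and works for any number of channels. A sketch reproducing the same seeded example:

```python
import numpy as np

np.random.seed(0)
a = np.random.randint(0, 2, (3, 5, 5))

# same result as (a[0]==1)*(a[1]==1)*(a[2]==1), but independent
# of the number of channels
mask = np.all(a == 1, axis=0)
print(tuple(zip(*np.where(mask))))
```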

Numpy: Find column index for element on each row

Suppose I have a vector with elements to find:
a = np.array([1, 5, 9, 7])
Now I have a matrix where those elements should be searched:
M = np.array([
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7]
])
Now I'd like to get an index array telling in which column of row j of M the element j of a occurs.
The result would be:
[1, 0, 1, 2]
Does Numpy offer such a function?
(Thanks for the answers with list comprehensions, but that's not an option performance-wise. I also apologize for mentioning Numpy just in the final question.)
Note the result of:
M == a[:, None]
>>> array([[False,  True, False],
           [ True, False, False],
           [False,  True, False],
           [False, False,  True]], dtype=bool)
The indices can be retrieved with:
>>> yind, xind = numpy.where(M == a[:, None])
>>> yind, xind
(array([0, 1, 2, 3], dtype=int64), array([1, 0, 1, 2], dtype=int64))
For the first match in each row, an efficient way is to use argmax after extending a to 2D, as done in #Benjamin's post -
(M == a[:,None]).argmax(1)
Sample run -
In [16]: M
Out[16]:
array([[0, 1, 9],
       [5, 3, 8],
       [3, 9, 0],
       [0, 1, 7]])
In [17]: a
Out[17]: array([1, 5, 9, 7])
In [18]: a[:,None]
Out[18]:
array([[1],
[5],
[9],
[7]])
In [19]: (M == a[:,None]).argmax(1)
Out[19]: array([1, 0, 1, 2])
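One caveat with the argmax approach: argmax also returns 0 for rows with no match at all. A sketch (with a hypothetical a whose last value is absent from its row) that flags such rows with -1:

```python
import numpy as np

M = np.array([[0, 1, 9],
              [5, 3, 8],
              [3, 9, 0],
              [0, 1, 7]])
a = np.array([1, 5, 9, 2])    # hypothetical: the last value, 2, is absent

hits = M == a[:, None]
idx = hits.argmax(1)
idx[~hits.any(1)] = -1        # argmax gives 0 for all-False rows; flag them
print(idx)
```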
Lazy solution without any import:
a = [1, 5, 9, 7]
M = [
    [0, 1, 9],
    [5, 3, 8],
    [3, 9, 0],
    [0, 1, 7],
]
for n, i in enumerate(M):
    for j in a:
        if j in i:
            print("{} found at row {} column: {}".format(j, n, i.index(j)))
Returns:
1 found at row 0 column: 1
9 found at row 0 column: 2
5 found at row 1 column: 0
9 found at row 2 column: 1
1 found at row 3 column: 1
7 found at row 3 column: 2
Maybe something like this?
>>> [list(M[i,:]).index(a[i]) for i in range(len(a))]
[1, 0, 1, 2]
[sub.index(val) if val in sub else -1 for sub, val in zip(M, a)]
# [1, 0, 1, 2]

print rows that have x value in the last column

I'm quite new to using numpy, and I have this problem:
Having this array:
x = np.array([[ 1, 2, 0],[ 4, 5, 0],[ 7, 8, 1],[10, 11, 1]])
>[[ 1  2  0]
  [ 4  5  0]
  [ 7  8  1]
  [10 11  1]]
How could I print the rows with 1 in the last column?
I would like to get something like this:
>[[ 7  8  1]
  [10 11  1]]
Take a slice of the last column of the array and find which of those values equal 1. Then use the resulting boolean array to filter your main array:
>>> x[:,-1]
array([0, 0, 1, 1])
>>> x[:,-1]==1
array([False, False, True, True], dtype=bool)
>>> x[x[:,-1]==1]
array([[ 7,  8,  1],
       [10, 11,  1]])
Please try this:
y = [a for a in x if a[-1] == 1]
print(y)
Cheers,
Alex
