I have a large numpy array that I need to manipulate so that each element is changed to either a 1 or 0 if a condition is met (will be used as a pixel mask later). There are about 8 million elements in the array and my current method takes too long for the reduction pipeline:
for (y,x), value in numpy.ndenumerate(mask_data):
    if mask_data[y,x]<3: #Good Pixel
        mask_data[y,x]=1
    elif mask_data[y,x]>3: #Bad Pixel
        mask_data[y,x]=0
Is there a numpy function that would speed this up?
>>> import numpy as np
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[4, 2, 1, 1],
[3, 0, 1, 2],
[2, 0, 1, 1],
[4, 0, 2, 3],
[0, 0, 0, 2]])
>>> b = a < 3
>>> b
array([[False, True, True, True],
[False, True, True, True],
[ True, True, True, True],
[False, True, True, False],
[ True, True, True, True]], dtype=bool)
>>>
>>> c = b.astype(int)
>>> c
array([[0, 1, 1, 1],
[0, 1, 1, 1],
[1, 1, 1, 1],
[0, 1, 1, 0],
[1, 1, 1, 1]])
You can shorten this with:
>>> c = (a < 3).astype(int)
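To check that the one-line comparison really matches the loop from the question, here is a small sketch. The helper names `loop_mask` and `vectorized_mask` are made up for this illustration. One thing to watch: the loop has no branch for values equal to 3 and leaves them unchanged, while `(a < 3).astype(int)` turns them into 0, so the comparison below skips those positions.

```python
import numpy as np

def loop_mask(data):
    """Element-by-element version from the question (works on a copy)."""
    out = data.copy()
    for (y, x), value in np.ndenumerate(data):
        if value < 3:      # good pixel
            out[y, x] = 1
        elif value > 3:    # bad pixel
            out[y, x] = 0
    return out

def vectorized_mask(data):
    """Single comparison over the whole array; values equal to 3 become 0."""
    return (data < 3).astype(int)

rng = np.random.default_rng(0)
data = rng.integers(0, 6, size=(50, 40))

slow = loop_mask(data)
fast = vectorized_mask(data)

# The two agree everywhere except where the input is exactly 3,
# because the loop has no branch for that value.
not_three = data != 3
agree = np.array_equal(slow[not_three], fast[not_three])
print(agree)
```

If the value 3 can occur in your data, decide explicitly whether it counts as good or bad before switching to the vectorised form.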
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[0, 3, 3, 2],
[4, 1, 1, 2],
[3, 4, 2, 4],
[2, 4, 3, 0],
[1, 2, 3, 4]])
>>>
>>> a[a > 3] = -101
>>> a
array([[ 0, 3, 3, 2],
[-101, 1, 1, 2],
[ 3, -101, 2, -101],
[ 2, -101, 3, 0],
[ 1, 2, 3, -101]])
>>>
See, e.g., "Indexing with boolean arrays" in the NumPy documentation.
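Applied to the question, the boolean-index assignment above could look roughly like this. It is only a sketch on a tiny made-up array; the point is that both conditions are evaluated before anything is written, so the second test never sees values produced by the first assignment.

```python
import numpy as np

mask_data = np.array([[0, 3, 5],
                      [4, 2, 3]])

# Compute both conditions before writing anything, so the second
# test is not affected by values written by the first assignment.
good = mask_data < 3
bad = mask_data > 3

mask_data[good] = 1   # good pixels
mask_data[bad] = 0    # bad pixels; entries equal to 3 are left as they were
print(mask_data)
```

This modifies `mask_data` in place, like the original loop, and keeps any 3s untouched.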
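A quick and flexible way is to use np.where, which chooses between two values (or arrays) according to a mask (an array of True and False values). Note that the call below returns 0 where a < 3 and 1 elsewhere; swap the last two arguments to get 1 for good pixels, as in the question: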
import numpy as np
a = np.random.randint(0, 5, size=(5, 4))
b = np.where(a<3,0,1)
print('a:',a)
print()
print('b:',b)
which will produce:
a: [[1 4 0 1]
[1 3 2 4]
[1 0 2 1]
[3 1 0 0]
[1 4 0 1]]
b: [[0 1 0 0]
[0 1 0 1]
[0 0 0 0]
[1 0 0 0]
[0 1 0 0]]
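Since `np.where` accepts arrays as well as scalars for its two choices, it can also leave some values as they are. A minimal sketch, assuming the convention from the question (1 for values below 3, 0 for values above 3, and 3 itself kept unchanged):

```python
import numpy as np

a = np.array([[1, 4, 0],
              [3, 2, 5]])

# Inner call: replace values above 3 with 0, keep everything else.
# Outer call: replace values below 3 with 1, otherwise use the inner result.
b = np.where(a < 3, 1, np.where(a > 3, 0, a))
print(b)
```

Unlike boolean-index assignment, this builds a new array and does not touch `a`.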
You can create your mask array in one step like this:
mask_data = input_mask_data < 3
This creates a boolean array which can then be used as a pixel mask. Note that we haven't changed the input array (as in your code) but have created a new array to hold the mask data - I would recommend doing it this way.
>>> input_mask_data = np.random.randint(0, 5, (3, 4))
>>> input_mask_data
array([[1, 3, 4, 0],
[4, 1, 2, 2],
[1, 2, 3, 0]])
>>> mask_data = input_mask_data < 3
>>> mask_data
array([[ True, False, False, True],
[False, True, True, True],
[ True, True, False, True]], dtype=bool)
>>>
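Once you have the boolean array, a few things you might do with it as a pixel mask are sketched here. The `image` array is hypothetical and only needs the same shape as the mask; replacing bad pixels with NaN is one possible choice, not the only one.

```python
import numpy as np

input_mask_data = np.array([[1, 3, 4, 0],
                            [4, 1, 2, 2],
                            [1, 2, 3, 0]])
mask_data = input_mask_data < 3          # True marks a good pixel

# Hypothetical image with the same shape as the mask.
image = np.arange(12, dtype=float).reshape(3, 4)

good_values = image[mask_data]           # 1-D array of the good pixels
n_good = np.count_nonzero(mask_data)     # how many pixels passed
cleaned = np.where(mask_data, image, np.nan)  # bad pixels become NaN

print(n_good)
print(cleaned)
```

If a later step needs 1/0 integers rather than True/False, `mask_data.astype(int)` converts the mask without recomputing it.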
I was a noob with Numpy, and the answers above were not straight to the point about modifying my array in place, so I'm posting what I came up with:
import numpy as np
arr = np.array([[[10,20,30,255],[40,50,60,255]],
[[70,80,90,255],[100,110,120,255]],
[[170,180,190,255],[230,240,250,255]]])
# Change 1:
# Set every value to 0 if first element is smaller than 80
arr[arr[:,:,0] < 80] = 0
print('Change 1:',arr,'\n')
# Change 2:
# Set every value to 1 if bigger than 180 and smaller than 240
# OR if equal to 170
arr[((arr > 180) & (arr < 240)) | (arr == 170)] = 1
print('Change 2:',arr)
This produces:
Change 1: [[[ 0 0 0 0]
[ 0 0 0 0]]
[[ 0 0 0 0]
[100 110 120 255]]
[[170 180 190 255]
[230 240 250 255]]]
Change 2: [[[ 0 0 0 0]
[ 0 0 0 0]]
[[ 0 0 0 0]
[100 110 120 255]]
[[ 1 180 1 255]
[ 1 240 250 255]]]
This way you can add tons of conditions like 'Change 2' and set values accordingly.
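It may help to see that the two changes use masks of different shapes. The sketch below rebuilds the same array and only inspects the masks; the names `pixel_mask` and `value_mask` are mine, not part of NumPy.

```python
import numpy as np

arr = np.array([[[10, 20, 30, 255], [40, 50, 60, 255]],
                [[70, 80, 90, 255], [100, 110, 120, 255]],
                [[170, 180, 190, 255], [230, 240, 250, 255]]])

# One flag per pixel: shape (3, 2).
pixel_mask = arr[:, :, 0] < 80
# One flag per value: shape (3, 2, 4).
value_mask = ((arr > 180) & (arr < 240)) | (arr == 170)

print(pixel_mask.shape, value_mask.shape)

# A 2-D mask on a 3-D array selects whole pixels (all 4 channels at once).
print(arr[pixel_mask].shape)   # (number of matching pixels, 4)

# Each comparison needs its own parentheses: & and | bind more tightly
# than < and ==, so `arr > 180 & arr < 240` would not mean what it says.
```

That is why 'Change 1' zeroes all four channels of a pixel, while 'Change 2' only touches the individual values that satisfy the condition.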
I am not sure I understood your question, but if you write:
mask_data[:3, :3] = 1
mask_data[3:, 3:] = 0
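This sets every element whose row and column indices are both less than 3 to 1, and every element whose row and column indices are both 3 or greater to 0. All other elements are left unchanged. Note that slicing selects by position, not by value, so it does not test the condition from the question.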
When using logical indexing on numpy arrays, different behaviours occur based on whether the indices were boolean or integer (1/0). This answer states that, as of Python 3.x,
True and False are keywords and will always be equal to 1 and 0.
Can someone explain what causes this behaviour?
MWE to replicate (Python 3.7.3, Numpy 1.16.3):
import numpy as np
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [True, True, True, True, True, False, False, False, False, False]
c = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
npa = np.asarray(a)
npb = np.asarray(b)
npc = np.asarray(c)
print(npa[b]) # [0 1 2 3 4]
print(npa[npb]) # [0 1 2 3 4]
print(npa[c]) # [1 1 1 1 1 0 0 0 0 0]
print(npa[npc]) # [1 1 1 1 1 0 0 0 0 0]
If we look at it like this:
a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
b = [True, True, True, True, True, False, False, False, False, True]
c = [1, 1, 1, 1, 1, 0, 0, 0, 0, -1]
npa = np.asarray(a)
npb = np.asarray(b)
npc = np.asarray(c)
print(npa[b]) # [0 1 2 3 4 9]
print(npa[npb]) # [0 1 2 3 4 9]
print(npa[c]) # [1 1 1 1 1 0 0 0 0 9]
print(npa[npc]) # [1 1 1 1 1 0 0 0 0 9]
We can see that b and npb are boolean masks that say which positions to keep, while c and npc are the actual indices to take from the given array (so -1 picks the last element).
This is mentioned in the docs:
Boolean arrays used as indices are treated in a different manner
entirely than index arrays ... Unlike in the case of integer index
arrays, in the boolean case, the result is a 1-D array containing all
the elements in the indexed array corresponding to all the true
elements in the boolean array.
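If you have a 1/0 integer array and want mask behaviour, one option is to convert it to bool first. A short sketch of both directions, using the arrays from the question:

```python
import numpy as np

npa = np.arange(10)
npc = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # integers, not booleans

as_indices = npa[npc]               # picks element 1 five times, then element 0 five times
as_mask = npa[npc.astype(bool)]     # keeps the positions where npc is non-zero

# Going the other way: the integer positions that a boolean mask selects.
positions = np.flatnonzero(npc.astype(bool))

print(as_indices)
print(as_mask)
print(positions)
```

So even though `True == 1` holds in Python, NumPy decides how to index from the dtype of the index array, not from its values.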
I'm wondering if I have an image in a numpy array, say 250x250x3 (3 channels), is it possible to use np.where to quickly find out if any of the 250x250 arrays of size 3 are equal to [143, 255, 0] or another color represented by rgb and get a 250x250 bool array?
When I try it in code with a 4x4x3, I get a 3x3 array as a result and I'm not totally sure where that shape is coming from.
import numpy as np
test = np.arange(4,52).reshape(4,4,3)
print(np.where(test == [4,5,6]))
-------------------------------------------
Result:
array([[0, 0, 0],
[0, 0, 0],
[0, 1, 2]])
What I'm trying to get:
array([[1, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0],
[0, 0, 0, 0]])
Solution
You don't need np.where (or anything particularly complicated) at all. You can just make use of the power of boolean arrays:
print(np.all(test == [4,5,6], axis=-1).astype(int))
# output:
# [[1 0 0 0]
# [0 0 0 0]
# [0 0 0 0]
# [0 0 0 0]]
An equivalent alternative would be to use logical_and:
print(np.logical_and.reduce(test == [4,5,6], axis=-1).astype(int))
# output:
# [[1 0 0 0]
# [0 0 0 0]
# [0 0 0 0]
# [0 0 0 0]]
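As for where the three length-3 arrays in the question came from: `np.where` with a single argument returns one index array per dimension for every True entry, and the element-wise comparison has three True entries (the three channels of pixel (0, 0)). The sketch below shows that, then applies the same idea to an image-sized array; `image` and `colour` are made-up names for illustration.

```python
import numpy as np

test = np.arange(4, 52).reshape(4, 4, 3)

elementwise = test == [4, 5, 6]           # broadcasts to shape (4, 4, 3)
per_pixel = elementwise.all(axis=-1)      # shape (4, 4): True only if all 3 channels match

# np.where with one argument lists the coordinates of every True entry,
# one index array per dimension.
rows, cols, channels = np.where(elementwise)
print(rows, cols, channels)

# Same idea on an image-sized array; `image` and `colour` are made-up names.
image = np.zeros((250, 250, 3), dtype=np.uint8)
colour = np.array([143, 255, 0], dtype=np.uint8)
image[10, 20] = colour
match = (image == colour).all(axis=-1)
print(match.shape, np.argwhere(match))
```

Reducing over the last axis is what collapses the per-channel result to one flag per pixel.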
Heavy duty test
import numpy as np
np.random.seed(0)
# the subarray we'll search for
pattern = [143, 255, 0]
# generate a random test array
arr = np.random.randint(0, 255, size=(255,255,3))
# insert the pattern array at ~10000 random indices
ix = np.unique(np.random.randint(np.prod(arr.shape[:-1]), size=10000))
arr.reshape(-1, arr.shape[-1])[ix] = pattern
# find all instances of the pattern array (ignore partial matches)
loc = np.all(arr==pattern, axis=-1).astype(int)
# test that the found locs are equivalent to the test ixs
locix = np.ravel_multi_index(loc.nonzero(), arr.shape[:-1])
np.testing.assert_array_equal(np.sort(ix), np.sort(locix))
# test has been run, the above assert passes
For simplicity, let's say that we are looking for all locations where all 3 channels equal 1. This will do the job:
np.random.seed(0)
a=np.random.randint(0,2,(3,5,5))
print(a)
np.where((a[0]==1)*(a[1]==1)*(a[2]==1))
This outputs
[[[0 1 1 0 1]
[1 1 1 1 1]
[1 0 0 1 0]
[0 0 0 0 1]
[0 1 1 0 0]]
[[1 1 1 1 0]
[1 0 1 0 1]
[1 0 1 1 0]
[0 1 0 1 1]
[1 1 1 0 1]]
[[0 1 1 1 1]
[0 1 0 0 1]
[1 0 1 0 1]
[0 0 0 0 0]
[1 1 0 0 0]]]
(array([0, 0, 1, 2, 4], dtype=int64), array([1, 2, 4, 0, 1], dtype=int64))
And indeed there are 5 coordinates in which all 3 channels equal 1.
If you want a more readable representation, replace the last line with
tuple(zip(*np.where((a[0]==1)*(a[1]==1)*(a[2]==1))))
This will output
((0, 1), (0, 2), (1, 4), (2, 0), (4, 1))
which are all the 5 locations where all 3 channels equal 1.
Note that (a[0]==1)*(a[1]==1)*(a[2]==1) is just
array([[False, True, True, False, False],
[False, False, False, False, True],
[ True, False, False, False, False],
[False, False, False, False, False],
[False, True, False, False, False]])
the boolean representation that you were looking for.
If you want to get any other triplet, say [143, 255, 0], just use (a[0]==143)*(a[1]==255)*(a[2]==0).
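For a channels-first array like this one, the per-channel products can probably be replaced by a single broadcast comparison. A sketch, assuming the same (3, 5, 5) layout; `target` is a hypothetical name for the triplet you are looking for:

```python
import numpy as np

np.random.seed(0)
a = np.random.randint(0, 2, (3, 5, 5))      # channels-first layout, as above

target = np.array([1, 1, 1])                # hypothetical triplet to look for

# Reshape the triplet to (3, 1, 1) so it lines up with the channel axis,
# then require a match in every channel.
mask = (a == target[:, None, None]).all(axis=0)

# Same result as multiplying the three per-channel comparisons.
product = (a[0] == 1) * (a[1] == 1) * (a[2] == 1)
print(np.array_equal(mask, product))
```

This avoids writing out one comparison per channel and works unchanged for any number of channels.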
Suppose I have a vector with elements to find:
a = np.array([1, 5, 9, 7])
Now I have a matrix where those elements should be searched:
M = np.array([
[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7]
])
Now I'd like to get an index array telling in which column of row j of M the element j of a occurs.
The result would be:
[1, 0, 1, 2]
Does Numpy offer such a function?
(Thanks for the answers with list comprehensions, but that's not an option performance-wise. I also apologize for mentioning Numpy just in the final question.)
Note the result of:
>>> M == a[:, None]
array([[False, True, False],
[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
The indices can be retrieved with:
>>> yind, xind = np.where(M == a[:, None])
>>> yind, xind
(array([0, 1, 2, 3], dtype=int64), array([1, 0, 1, 2], dtype=int64))
For the first match in each row, an efficient way might be to use argmax after extending a to 2D as done in @Benjamin's post -
(M == a[:,None]).argmax(1)
Sample run -
In [16]: M
Out[16]:
array([[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7]])
In [17]: a
Out[17]: array([1, 5, 9, 7])
In [18]: a[:,None]
Out[18]:
array([[1],
[5],
[9],
[7]])
In [19]: (M == a[:,None]).argmax(1)
Out[19]: array([1, 0, 1, 2])
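One caveat with `argmax`: for a row with no match at all it still returns 0, which looks like a hit in column 0. A possible guard is sketched below. The -1 sentinel is an arbitrary choice for this example, and `a` is changed from the question so that one row has no match.

```python
import numpy as np

M = np.array([[0, 1, 9],
              [5, 3, 8],
              [3, 9, 0],
              [0, 1, 7]])
a = np.array([1, 5, 4, 7])      # 4 does not occur in row 2 (changed from the question's 9)

hits = M == a[:, None]          # shape (4, 3) boolean
first = hits.argmax(axis=1)     # first True per row, but 0 when the row has no True

# Mark rows without a match with -1 (an arbitrary sentinel chosen for this sketch).
cols = np.where(hits.any(axis=1), first, -1)
print(cols)
```

If every row is guaranteed to contain its element, the plain `argmax` call is enough.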
Lazy solution without any import (note that it reports every element of a found in each row, not only element j in row j):
a = [1, 5, 9, 7]
M = [
[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7],
]
for n, i in enumerate(M):
for j in a:
if j in i:
print("{} found at row {} column: {}".format(j, n, i.index(j)))
Returns:
1 found at row 0 column: 1
9 found at row 0 column: 2
5 found at row 1 column: 0
9 found at row 2 column: 1
1 found at row 3 column: 1
7 found at row 3 column: 2
Maybe something like this?
>>> [list(M[i,:]).index(a[i]) for i in range(len(a))]
[1, 0, 1, 2]
[sub.index(val) if val in sub else -1 for sub, val in zip(M, a)]
# [1, 0, 1, 2]
I'm quite new to numpy, and I have this problem:
Having this array:
x = np.array([[ 1, 2, 0],[ 4, 5, 0],[ 7, 8, 1],[10, 11, 1]])
>[[ 1 2 0]
[ 4 5 0]
[ 7 8 1]
[10 11 1]]
How could I print the rows with 1 in the last column?
I would like to get something like this:
>[[ 7 8 1]
[10 11 1]]
Take a slice of the last column and find which of its entries equal 1. Then use the resulting boolean array to filter your main array:
>>> x[:,-1]
array([0, 0, 1, 1])
>>> x[:,-1]==1
array([False, False, True, True], dtype=bool)
>>> x[x[:,-1]==1]
array([[ 7, 8, 1],
[10, 11, 1]])
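The same boolean array can also tell you which rows matched, and it can be combined with conditions on other columns. A small sketch on the array from the question; the extra condition on the first column is just an example I made up.

```python
import numpy as np

x = np.array([[1, 2, 0], [4, 5, 0], [7, 8, 1], [10, 11, 1]])

last_is_one = x[:, -1] == 1                 # one boolean per row
row_numbers = np.flatnonzero(last_is_one)   # which rows matched

# Conditions on several columns combine with & (and) or | (or).
both = x[last_is_one & (x[:, 0] > 7)]

print(row_numbers)
print(both)
```

Remember to put each comparison in parentheses when combining them with `&` or `|`.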
Please try this:
y = [ a for a in x if a[-1] == 1 ]
print(y)
Cheers,
Alex