I have a large numpy array that I need to manipulate so that each element is changed to either a 1 or 0 if a condition is met (will be used as a pixel mask later). There are about 8 million elements in the array and my current method takes too long for the reduction pipeline:
for (y,x), value in numpy.ndenumerate(mask_data):
    if mask_data[y,x]<3: #Good Pixel
        mask_data[y,x]=1
    elif mask_data[y,x]>3: #Bad Pixel
        mask_data[y,x]=0
Is there a numpy function that would speed this up?
>>> import numpy as np
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[4, 2, 1, 1],
[3, 0, 1, 2],
[2, 0, 1, 1],
[4, 0, 2, 3],
[0, 0, 0, 2]])
>>> b = a < 3
>>> b
array([[False, True, True, True],
[False, True, True, True],
[ True, True, True, True],
[False, True, True, False],
[ True, True, True, True]], dtype=bool)
>>>
>>> c = b.astype(int)
>>> c
array([[0, 1, 1, 1],
[0, 1, 1, 1],
[1, 1, 1, 1],
[0, 1, 1, 0],
[1, 1, 1, 1]])
You can shorten this with:
>>> c = (a < 3).astype(int)
>>> a = np.random.randint(0, 5, size=(5, 4))
>>> a
array([[0, 3, 3, 2],
[4, 1, 1, 2],
[3, 4, 2, 4],
[2, 4, 3, 0],
[1, 2, 3, 4]])
>>>
>>> a[a > 3] = -101
>>> a
array([[ 0, 3, 3, 2],
[-101, 1, 1, 2],
[ 3, -101, 2, -101],
[ 2, -101, 3, 0],
[ 1, 2, 3, -101]])
>>>
See, e.g., Indexing with boolean arrays.
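To mirror the loop in the question exactly (values below 3 become 1, values above 3 become 0, values equal to 3 are left alone), one option is to build both boolean masks first and then assign in place. This is only a minimal sketch; the sample values in mask_data are made up for illustration.

```python
import numpy as np

# Hypothetical sample data standing in for the real pixel array.
mask_data = np.array([[0, 3, 5],
                      [2, 4, 3]])

# Compute both masks before modifying anything, so the second
# condition is evaluated on the original values.
good = mask_data < 3
bad = mask_data > 3

mask_data[good] = 1   # good pixels
mask_data[bad] = 0    # bad pixels; values equal to 3 stay as they are

print(mask_data)
```

Both comparisons and both assignments run over the whole array at once, so no Python-level loop is needed.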
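A fast and flexible way is to use np.where, which chooses between two values (or arrays) according to a mask (an array of True and False values). Note that the example below writes 0 where a < 3 and 1 elsewhere; swap the last two arguments to get 1 for values below 3, as in the question: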
import numpy as np
a = np.random.randint(0, 5, size=(5, 4))
b = np.where(a<3,0,1)
print('a:',a)
print()
print('b:',b)
which will produce:
a: [[1 4 0 1]
[1 3 2 4]
[1 0 2 1]
[3 1 0 0]
[1 4 0 1]]
b: [[0 1 0 0]
[0 1 0 1]
[0 0 0 0]
[1 0 0 0]
[0 1 0 0]]
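As a rough sketch of how np.where could be adapted to the question's convention (1 for values below 3), and how a nested call might leave values equal to 3 untouched, consider the following. The small array a is an invented example.

```python
import numpy as np

a = np.array([[1, 4, 0],
              [3, 2, 5]])

# 1 where a < 3, 0 everywhere else (including a == 3).
simple = np.where(a < 3, 1, 0)

# Nested form: 1 where a < 3, 0 where a > 3, original value otherwise.
keep_three = np.where(a < 3, 1, np.where(a > 3, 0, a))

print(simple)
print(keep_three)
```

np.where returns a new array here, so a itself is not modified.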
You can create your mask array in one step like this:
mask_data = input_mask_data < 3
This creates a boolean array which can then be used as a pixel mask. Note that we haven't changed the input array (as in your code) but have created a new array to hold the mask data - I would recommend doing it this way.
>>> input_mask_data = np.random.randint(0, 5, (3, 4))
>>> input_mask_data
array([[1, 3, 4, 0],
[4, 1, 2, 2],
[1, 2, 3, 0]])
>>> mask_data = input_mask_data < 3
>>> mask_data
array([[ True, False, False, True],
[False, True, True, True],
[ True, True, False, True]], dtype=bool)
>>>
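A boolean mask like this can be used directly for counting and selecting pixels. The sketch below assumes a hypothetical image array of the same shape as the mask; the names image, n_good, good_values and mask_u8 are illustrative only.

```python
import numpy as np

input_mask_data = np.array([[1, 3, 4, 0],
                            [4, 1, 2, 2]])
# Hypothetical image with the same shape as the mask.
image = np.arange(8).reshape(2, 4) * 10

mask = input_mask_data < 3

n_good = int(mask.sum())          # True counts as 1, so this counts good pixels
good_values = image[mask]         # 1-D array of the pixels where mask is True
mask_u8 = mask.astype(np.uint8)   # 0/1 integers, if a numeric mask is required

print(n_good, good_values)
```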
I was a noob with NumPy, and the answers above were not straight to the point about modifying my array in place, so I'm posting what I came up with:
import numpy as np
arr = np.array([[[10,20,30,255],[40,50,60,255]],
[[70,80,90,255],[100,110,120,255]],
[[170,180,190,255],[230,240,250,255]]])
# Change 1:
# Set every value to 0 if first element is smaller than 80
arr[arr[:,:,0] < 80] = 0
print('Change 1:',arr,'\n')
# Change 2:
# Set every value to 1 if bigger than 180 and smaller than 240
# OR if equal to 170
arr[(arr > 180) & (arr < 240) | (arr == 170)] = 1
print('Change 2:',arr)
This produces:
Change 1: [[[ 0 0 0 0]
[ 0 0 0 0]]
[[ 0 0 0 0]
[100 110 120 255]]
[[170 180 190 255]
[230 240 250 255]]]
Change 2: [[[ 0 0 0 0]
[ 0 0 0 0]]
[[ 0 0 0 0]
[100 110 120 255]]
[[ 1 180 1 255]
[ 1 240 250 255]]]
This way you can add tons of conditions like 'Change 2' and set values accordingly.
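When conditions pile up, naming each mask can make the precedence easier to follow. The parentheses around each comparison matter because & and | bind more tightly than < and == in Python. A small sketch with made-up values:

```python
import numpy as np

arr = np.array([10, 170, 185, 240, 250])

# Each comparison is parenthesised before combining with & or |.
in_range = (arr > 180) & (arr < 240)
is_170 = arr == 170

# Elements matching either condition are overwritten in place.
arr[in_range | is_170] = 1

print(arr)
```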
I am not sure I understood your question, but if you write:
mask_data[:3, :3] = 1
mask_data[3:, 3:] = 0
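This sets every element whose row and column indexes are both less than 3 to 1, and every element whose indexes are both 3 or greater to 0. Elements in the two remaining blocks (one index below 3, the other 3 or above) are left unchanged. Note that slicing selects by position, not by value.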
Suppose I have a vector with elements to find:
a = np.array([1, 5, 9, 7])
Now I have a matrix where those elements should be searched:
M = np.array([
[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7]
])
Now I'd like to get an index array telling in which column of row j of M the element j of a occurs.
The result would be:
[1, 0, 1, 2]
Does Numpy offer such a function?
(Thanks for the answers with list comprehensions, but that's not an option performance-wise. I also apologize for mentioning Numpy just in the final question.)
Note the result of:
M == a[:, None]
>>> array([[False, True, False],
[ True, False, False],
[False, True, False],
[False, False, True]], dtype=bool)
The indices can be retrieved with:
yind, xind = numpy.where(M == a[:, None])
>>> (array([0, 1, 2, 3], dtype=int64), array([1, 0, 1, 2], dtype=int64))
For the first match in each row, an efficient approach might be to use argmax after extending a to 2D, as done in @Benjamin's post -
(M == a[:,None]).argmax(1)
Sample run -
In [16]: M
Out[16]:
array([[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7]])
In [17]: a
Out[17]: array([1, 5, 9, 7])
In [18]: a[:,None]
Out[18]:
array([[1],
[5],
[9],
[7]])
In [19]: (M == a[:,None]).argmax(1)
Out[19]: array([1, 0, 1, 2])
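One caveat with argmax: for a row that contains no match, the boolean row is all False and argmax returns 0, which is indistinguishable from a match in column 0. A possible workaround, sketched here with a modified M in which the third row has no 9, is to combine it with any(axis=1) and use -1 as a "not found" marker (the choice of -1 is an assumption, not part of the original answer).

```python
import numpy as np

a = np.array([1, 5, 9, 7])
M = np.array([[0, 1, 9],
              [5, 3, 8],
              [3, 2, 0],   # no 9 in this row
              [0, 1, 7]])

matches = M == a[:, None]            # broadcast each a[j] across row j
first = matches.argmax(axis=1)       # index of first True, or 0 if none
result = np.where(matches.any(axis=1), first, -1)

print(result)
```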
Lazy solution without any import:
a = [1, 5, 9, 7]
M = [
[0, 1, 9],
[5, 3, 8],
[3, 9, 0],
[0, 1, 7],
]
for n, i in enumerate(M):
    for j in a:
        if j in i:
            print("{} found at row {} column: {}".format(j, n, i.index(j)))
Returns:
1 found at row 0 column: 1
9 found at row 0 column: 2
5 found at row 1 column: 0
9 found at row 2 column: 1
1 found at row 3 column: 1
7 found at row 3 column: 2
Maybe something like this?
>>> [list(M[i,:]).index(a[i]) for i in range(len(a))]
[1, 0, 1, 2]
[sub.index(val) if val in sub else -1 for sub, val in zip(M, a)]
# [1, 0, 1, 2]
If I have a 2D array such as
A = np.arange(16).reshape(4,4)
How can I select row = [0, 2] and column = [0, 2] using parameters?
In MATLAB, I can simply do A(row, column), but in Python A[row, column] selects the 2 elements at (0,0) and (2,2).
Is there any way I can do this using some parameters as in MATLAB?
The output should be like
[0 2
8 10]
To select a block of elements, as MATLAB does, the first index has to be a column vector so that it broadcasts against the second index. There are several ways of doing this:
In [19]: A = np.arange(16).reshape(4,4)
In [20]: row=[0,2];column=[0,2]
In [21]: A[np.ix_(row,column)]
Out[21]:
array([[ 0, 2],
[ 8, 10]])
In [22]: np.ix_(row,column)
Out[22]:
(array([[0],
[2]]), array([[0, 2]]))
In [23]: A[[[0],[2]],[0,2]]
Out[23]:
array([[ 0, 2],
[ 8, 10]])
The other answer uses meshgrid. We could probably list a half dozen variations.
Good documentation in this section:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#purely-integer-array-indexing
You can use the following:
row = [0, 2]; column = [0, 2]
A = np.arange(16).reshape(4,4)
print(np.ravel(A[row,:][:,column]))
to get:
[ 0  2  8 10]
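One difference between chained indexing and np.ix_ shows up on assignment. A[row, :] uses a list index and therefore returns a copy, so assigning through A[row, :][:, column] does not modify A, whereas assigning through np.ix_ does. A short sketch (the arrays A and B are just two copies of the same example data):

```python
import numpy as np

row, column = [0, 2], [0, 2]

A = np.arange(16).reshape(4, 4)
A[row, :][:, column] = -1        # writes into a temporary copy; A is untouched

B = np.arange(16).reshape(4, 4)
B[np.ix_(row, column)] = -1      # writes into B itself

print(A)
print(B)
```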
MATLAB creates a 2D mesh when indexed with vectors across dimensions. So, in MATLAB, you would have -
A =
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
>> row = [1, 3]; column = [1, 3];
>> A(row,column)
ans =
0 2
8 10
Now, in NumPy/Python, indexing with vectors across dimensions pairs up the corresponding elements of those vectors into index tuples and selects only those elements. To replicate the MATLAB behaviour, you need to create a mesh of indices from the vectors. For this, you can use np.meshgrid -
In [18]: A
Out[18]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
In [19]: row = [0, 2]; column = [0, 2];
In [20]: C,R = np.meshgrid(column,row)  # column first: meshgrid returns the column grid, then the row grid
In [21]: A[R,C]
Out[21]:
array([[ 0, 2],
[ 8, 10]])
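When row and column differ, the order of the index grids matters. A sketch using the indexing='ij' option of np.meshgrid, which returns the row grid first and the column grid second, with row and column values chosen here purely for illustration:

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
row, column = [0, 1], [2, 3]

# With indexing='ij', R[i, j] == row[i] and C[i, j] == column[j].
R, C = np.meshgrid(row, column, indexing='ij')
block = A[R, C]

print(block)
```

For these inputs the result should agree with A[np.ix_(row, column)].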
I have a NumPy array with 4 columns and want to select columns 1, 3 and 4 for the rows where the value of the second column meets a certain condition (i.e. equals a fixed value). I tried to first select only the rows, but with all 4 columns via:
I = A[A[:,1] == i]
which works. Then I further tried (similarly to MATLAB, which I know very well):
I = A[A[:,1] == i, [0,2,3]]
which doesn't work. How to do it?
EXAMPLE DATA:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
[6 1 3 4]
[3 2 5 6]]
>>> i = 2
# I want to get the columns 1, 3 and 4
# for every row which has the value i in the second column.
# In this case, this would be row 1 and 3 with columns 1, 3 and 4:
[[1 3 4]
[3 5 6]]
I am now currently using this:
I = A[A[:,1] == i]
I = I[:, [0,2,3]]
But I thought that there had to be a nicer way of doing it... (I am used to MATLAB)
>>> a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
>>> a
array([[ 1, 2, 3, 4],
[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[a[:,0] > 3] # select rows where first column is greater than 3
array([[ 5, 6, 7, 8],
[ 9, 10, 11, 12]])
>>> a[a[:,0] > 3][:,np.array([True, True, False, True])] # select columns
array([[ 5, 6, 8],
[ 9, 10, 12]])
# fancier equivalent of the previous
>>> a[np.ix_(a[:,0] > 3, np.array([True, True, False, True]))]
array([[ 5, 6, 8],
[ 9, 10, 12]])
For an explanation of the obscure np.ix_(), see https://stackoverflow.com/a/13599843/4323
Finally, we can simplify by giving the list of column numbers instead of the tedious boolean mask:
>>> a[np.ix_(a[:,0] > 3, (0,1,3))]
array([[ 5, 6, 8],
[ 9, 10, 12]])
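Applied to the example data from the question, the same idea might look like the sketch below. It shows two equivalent spellings: np.ix_ with a boolean row mask, and an explicit column vector of row numbers built with np.flatnonzero. The variable names rows, I and same are illustrative.

```python
import numpy as np

A = np.array([[1, 2, 3, 4],
              [6, 1, 3, 4],
              [3, 2, 5, 6]])
i = 2

# Row numbers where the second column equals i.
rows = np.flatnonzero(A[:, 1] == i)

# A column vector of rows broadcasts against the list of columns,
# giving every (row, column) combination.
I = A[rows[:, None], [0, 2, 3]]

# Same selection via np.ix_, which builds the broadcastable indices itself.
same = A[np.ix_(A[:, 1] == i, [0, 2, 3])]

print(I)
```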
If you do not want to use boolean positions but the indexes, you can write it this way:
A[:, [0, 2, 3]][A[:, 1] == i]
Going back to your example:
>>> A = np.array([[1,2,3,4],[6,1,3,4],[3,2,5,6]])
>>> print(A)
[[1 2 3 4]
[6 1 3 4]
[3 2 5 6]]
>>> i = 2
>>> print(A[:, [0, 2, 3]][A[:, 1] == i])
[[1 3 4]
[3 5 6]]
Seriously,
>>> a=np.array([[1,2,3], [1,3,4], [2,2,5]])
>>> a[a[:,0]==1][:,[0,1]]
array([[1, 2],
[1, 3]])
>>>
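This also works. Here col is the zero-based index of the column to test:
col = 1
I = np.array([row[[x for x in range(A.shape[1]) if x != col]] for row in A if row[col] == i])
print(I)
Edit: Since indexing starts from 0, the second column has index 1. Using i-1 as the column index only works in this example because the value i = 2 happens to equal the 1-based number of the column being tested.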
I hope this answers your question. Here is a piece of script I have implemented using pandas:
df_targetrows = df.loc[df[col2filter]*somecondition*, [col1,col2,...,coln]]
For example,
targets = stockdf.loc[stockdf['rtns'] > .04, ['symbol','date','rtns']]
This returns a DataFrame containing only the columns ['symbol','date','rtns'] from stockdf, restricted to the rows where stockdf['rtns'] > .04.
hope this helps