numpy first occurrence in array for an array of reference values - python

Given a threshold alpha and a numpy array a, there are multiple possibilities for finding the first index i such that a[i] > alpha; see Numpy first occurrence of value greater than existing value:
numpy.searchsorted(a, alpha, side='right')  # requires a to be sorted
numpy.argmax(a > alpha)
In my case, alpha can be either a scalar or an array of arbitrary shape. I'd like to have a function get_lowest that works in both cases:
alpha = 1.12
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # 2
alpha = numpy.array([1.12, -0.5, 2.7])
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # [2, 0, 3]
Any hints?

You can use broadcasting:
In [9]: arr = np.array([ 0. , 1.1, 1.2, 3. ])
In [10]: alpha = np.array([ 1.12, -0.5 , 2.7 ])
In [11]: np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
Out[11]: array([2, 0, 3])
To collapse multidimensional arrays, you can use np.squeeze, but you have to do something special if you want a plain Python int in your first (scalar) case:
def get_lowest(arr, alpha):
    # Rows correspond to thresholds, columns to the elements of arr.
    b = np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
    b = np.squeeze(b)
    if np.size(b) == 1:
        return int(b)  # the result is an index, so return an int
    return b
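The same broadcasting idea extends to alpha of any shape by appending a trailing axis instead of transposing. The following is a minimal sketch of that variant, not the answerer's code; the name get_lowest_nd is hypothetical. Note that np.argmax returns 0 when no element exceeds the threshold, so that case has to be detected separately if it matters.

```python
import numpy as np

def get_lowest_nd(arr, alpha):
    """Index of the first arr[i] > alpha, for scalar or N-d alpha (hypothetical helper)."""
    arr = np.asarray(arr)
    alpha = np.asarray(alpha)
    # alpha[..., np.newaxis] has shape alpha.shape + (1,), so the comparison
    # broadcasts to alpha.shape + (len(arr),).
    mask = arr > alpha[..., np.newaxis]
    # argmax over the last axis finds the first True per threshold.
    return np.argmax(mask, axis=-1)

arr = np.array([0.0, 1.1, 1.2, 3.0])
scalar_result = get_lowest_nd(arr, 1.12)
vector_result = get_lowest_nd(arr, [1.12, -0.5, 2.7])
grid_result = get_lowest_nd(arr, [[1.12, -0.5], [2.7, 0.0]])
```

For a scalar alpha the result is a NumPy integer scalar; wrap it in int() if a plain Python int is needed.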

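searchsorted actually does the trick, provided a is sorted:
np.searchsorted(a, alpha)
With the default side='left' it returns the first index i with a[i] >= alpha; pass side='right' for a strict a[i] > alpha.
The axis argument to argmax also helps out; this
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1)
does the trick for unsorted a as well. Indeed: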
import numpy as np
a = np.array([0.0, 1.1, 1.2, 3.0])
alpha = 1.12
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # 2
np.searchsorted(a, alpha) # 2
alpha = np.array([1.12, -0.5, 2.7])
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # [2 0 3]
np.searchsorted(a, alpha) # [2 0 3]
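The two approaches only differ when a threshold equals an element of a. As a small illustrative check (the extra threshold 1.1 is added here for demonstration and is not part of the original question), the side argument of np.searchsorted decides whether the comparison is strict:

```python
import numpy as np

a = np.array([0.0, 1.1, 1.2, 3.0])          # must be sorted for searchsorted
alpha = np.array([1.1, 1.12, -0.5, 2.7])    # 1.1 coincides with an element of a

left = np.searchsorted(a, alpha, side="left")    # first i with a[i] >= alpha
right = np.searchsorted(a, alpha, side="right")  # first i with a[i] >  alpha
by_argmax = np.argmax(a > alpha[:, None], axis=-1)  # strict comparison, works unsorted
```

Here right and by_argmax agree for every threshold, while left differs only for the threshold 1.1.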

Related

How to set threshold values in a numpy array?

I have an array of values and I want to set specific values to integers. Anything below 0.95 set to 0, anything above 1.6 set to 2. How can I set everything between 0.95 and 1.6 to 1?
n1_binary = np.where(n1_img_resize < 0.95, 0, n1_img_resize)
n1_binary = np.where(n1_binary > 1.6, 2, n1_binary)
Like this, in one line using np.where (with >= so that a value of exactly 0.95 is also mapped to 1):
n1_binary = np.where((n1_binary >= 0.95) & (n1_binary <= 1.6), 1, n1_binary)
See the example below:
In [652]: a = np.array([0.99, 1.23, 1.7, 9])
In [653]: a = np.where((a >= 0.95) & (a <= 1.6), 1, a)
In [654]: a
Out[654]: array([1. , 1. , 1.7, 9. ])
Try this:
a = np.array([0.3, 5, 7, 2])
a[a < 0.95] = 0
a[a > 1.6] = 2
a[(a >= 0.95) & (a <= 1.6)] = 1
This is clear and concise, and states exactly what you are doing. a is now:
[0.0, 2.0, 2.0, 2.0]
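All three ranges can also be assigned in a single call with np.select, which avoids depending on the order of the in-place assignments. This is a minimal sketch with made-up sample values, using the thresholds from the question:

```python
import numpy as np

values = np.array([0.3, 0.99, 1.23, 1.7, 9.0])  # illustrative data

# Conditions are checked in order; anything matching neither gets the default.
labels = np.select(
    [values < 0.95, values > 1.6],  # below the lower / above the upper threshold
    [0, 2],
    default=1,                      # everything in [0.95, 1.6]
)
```

The result is a new integer array; the input is left untouched.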

Numpy apply function to array

For example, I have function:
f1 = lambda x: x % 2
If I want to modify array = np.linspace(0, 5, 6) I can do f1(array). Everything works as expected:
[0. 1. 0. 1. 0. 1.]
If I change function to:
f2 = lambda x: 0
print(f2(array))
gives me 0 while I expected [0. 0. 0. 0. 0. 0.]. How to achieve consistency?
You can use the code below to achieve the desired output:
import numpy as np
array = np.linspace(0, 5, 6)
f2 = lambda x: x-x
print(f2(array))
Slightly more explicit than the previous answer:
import numpy as np
array = np.linspace(0, 5, 6)
f2 = lambda x: np.zeros_like(x)
print(f2(array))
Documentation for numpy.zeros_like: Return an array of zeros with the same shape and type as a given array.
To iterate over an array, evaluate the function for every element, and store the results in a new sequence, a list comprehension works consistently:
import numpy as np
array = np.linspace(0, 5, 6)
f1 = lambda x: x % 2
f2 = lambda x: 0
print ([f1(x) for x in array])
[0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
print ([f2(x) for x in array])
[0, 0, 0, 0, 0, 0]
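If the result should be an array rather than a list, np.vectorize wraps a scalar function so that it is applied element by element. The sketch below assumes a float result is wanted; apply_elementwise is a hypothetical helper name. np.vectorize is a convenience loop, not a performance optimization.

```python
import numpy as np

def apply_elementwise(func, values):
    """Apply a scalar function to each element and return a float array (hypothetical helper)."""
    # otypes fixes the output dtype, so constant functions also yield float arrays.
    return np.vectorize(func, otypes=[float])(values)

array = np.linspace(0, 5, 6)
f1 = lambda x: x % 2
f2 = lambda x: 0

r1 = apply_elementwise(f1, array)
r2 = apply_elementwise(f2, array)
```

Both results have the same shape as the input, which gives the consistency asked for in the question.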

Using Python numpy where condition to change entries below a certain value

Here's my array:
import numpy as np
a = np.array([0, 5.0, 0, 5.0, 5.0])
Is it possible to use numpy.where in some way to add a value x to all those entries in a that are less than l?
So something like:
a = a[np.where(a < 5).add(2.5)]
Should return:
array([2.5, 5.0, 2.5, 5.0, 5.0])
a = np.array([0., 5., 0., 5., 5.])
a[np.where(a < 5)] += 2.5
if you really want to use where, or simply
a[a < 5] += 2.5
which I usually use for this kind of operation.
You could use np.where to create the array of additions and then simply add to a -
a + np.where(a < l, 2.5,0)
Sample run -
In [16]: a = np.array([1, 5, 4, 5, 5])
In [17]: l = 5
In [18]: a + np.where(a < l, 2.5,0)
Out[18]: array([ 3.5, 5. , 6.5, 5. , 5. ])
Given that you probably need to change the dtype (from int to float) you need to create a new array. A simple way without explicit .astype or np.where calls is multiplication with a mask:
>>> b = a + (a < 5) * 2.5
>>> b
array([ 2.5, 5. , 2.5, 5. , 5. ])
with np.where this can be changed to a simple expression (using the else-condition, third argument, in where):
>>> a = np.where(a < 5, a + 2.5, a)
>>> a
array([ 2.5, 5. , 2.5, 5. , 5. ])
a += np.where(a < 1, 2.5, 0)
where will return the second argument wherever the condition (first argument) is satisfied and the third argument otherwise.
You can use a boolean mask array as an index. Comparisons such as a < 1 return such an array.
>>> a<1
array([False, False, False, False, False], dtype=bool)
you can use it as
>>> a[a<1] += 1
The a<1 part selects only the items in a that match the condition. You can then operate on just that part.
If you want to keep a trace of your selection, you can proceed in two steps.
>>> mask = a>1
>>> a[mask] += 1
Also, you can count the items matching the conditions:
>>> print(np.sum(mask))
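The answers above differ in whether they modify a in place or build a new array, which matters when a has an integer dtype and the increment is a float. A minimal sketch of the non-mutating variant follows; the function name add_below is hypothetical.

```python
import numpy as np

def add_below(a, threshold, increment):
    """Return a new array with increment added where a < threshold (hypothetical helper)."""
    a = np.asarray(a)
    # np.where builds a new array, so an integer input is promoted to float
    # as needed instead of being modified in place.
    return np.where(a < threshold, a + increment, a)

ints = np.array([0, 5, 0, 5, 5])
result = add_below(ints, 5, 2.5)
```

The original integer array is unchanged, and result is a float array.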

Python - Binning x,y,z values on a 2D grid

I have a list of z points associated to pairs x,y, meaning that for example
x y z
3.1 5.2 1.3
4.2 2.3 9.3
5.6 9.8 3.5
and so on. The total number of z values is relatively high, around 10000.
I would like to bin my data, in the following sense:
1) I would like to split the x and y values into cells, so as to make a 2-dimensional grid in x,y. If I have Nx cells for the x axis and Ny for the y axis, I would then have Nx*Ny cells on the grid. For example, the first bin for x could range from 1. to 2., the second from 2. to 3., and so on.
2) For each cell in the 2-dimensional grid, I would then need to calculate how many points fall into that cell and sum all their z values. This gives me a numerical value associated with each cell.
I thought about using binned_statistic from scipy.stats, but I have no idea how to set the options to accomplish my task. Any suggestions? Tools other than binned_statistic are also welcome.
Assuming I understand, you can get what you need by exploiting the expand_binnumbers parameter of binned_statistic_2d:
from scipy.stats import binned_statistic_2d
import numpy as np
x = [0.1, 0.1, 0.1, 0.6]
y = [2.1, 2.6, 2.1, 2.1]
z = [2., 3., 5., 7.]
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]
# Count points per cell; expand_binnumbers=True gives one row of bin indices per dimension.
ret = binned_statistic_2d(x, y, None, 'count', bins=[binx, biny],
                          expand_binnumbers=True)
print(ret.statistic)
print(ret.binnumber)
# Add each z value to the cell its point falls in (bin numbers are 1-based).
sums = np.zeros([len(binx) - 1, len(biny) - 1])
for i in range(len(x)):
    m = ret.binnumber[0][i] - 1
    n = ret.binnumber[1][i] - 1
    sums[m][n] += z[i]
print(sums)
This is just an expansion of one of the examples. Here's the output.
[[ 2. 1.]
 [ 1. 0.]]
[[1 1 1 2]
 [1 2 1 1]]
[[ 7. 3.]
 [ 7. 0.]]
Establish the edges of the cells, iterate over cell edges and use boolean indexing to extract the z values in each cell, keep the sums in a list, convert the list and reshape it.
import itertools
import numpy as np
x = np.array([0.1, 0.1, 0.1, 0.6, 1.2, 2.1])
y = np.array([2.1, 2.6, 2.1, 2.1, 3.4, 4.7])
z = np.array([2., 3., 5., 7., 10, 20])
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = itertools.tee(iterable)
    next(b, None)
    return zip(a, b)
# Integer cell edges that cover the data range.
minx, maxx = int(min(x)), int(max(x)) + 1
miny, maxy = int(min(y)), int(max(y)) + 1
result = []
x_edges = pairwise(range(minx, maxx + 1))
for xleft, xright in x_edges:
    xmask = np.logical_and(x >= xleft, x < xright)
    y_edges = pairwise(range(miny, maxy + 1))
    for yleft, yright in y_edges:
        ymask = np.logical_and(y >= yleft, y < yright)
        # z values of the points inside the current cell
        cell = z[np.logical_and(xmask, ymask)]
        result.append(cell.sum())
result = np.array(result).reshape((maxx - minx, maxy - miny))
>>> result
array([[ 17., 0., 0.],
       [ 0., 10., 0.],
       [ 0., 0., 20.]])
Unfortunately, no numpy vectorization magic
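A vectorized alternative that neither answer mentions is np.histogram2d: called without weights it returns the per-cell counts, and with weights=z it returns the per-cell sums of z. The following sketch reuses the small data set and bin edges from the first answer:

```python
import numpy as np

x = np.array([0.1, 0.1, 0.1, 0.6])
y = np.array([2.1, 2.6, 2.1, 2.1])
z = np.array([2.0, 3.0, 5.0, 7.0])
binx = [0.0, 0.5, 1.0]
biny = [2.0, 2.5, 3.0]

# Number of points per (x, y) cell.
counts, xedges, yedges = np.histogram2d(x, y, bins=[binx, biny])
# Sum of z per cell: each point contributes its z value as a weight.
sums, _, _ = np.histogram2d(x, y, bins=[binx, biny], weights=z)
```

Both arrays have shape (Nx, Ny), with the first axis indexing the x bins.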

numpy split doesn't work on float array

I was trying to split a float array into sub arrays using numpy split, however the results are not correct:
import numpy as np
x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
np.split(x, [1, 2, 3])
Out[127]: [array([ 1.2]), array([ 1.3]), array([ 1.5]), array([ 2. , 2.1, 2.5])]
1.2, 1.3 and 1.5 should be put into one sub array but they are separated, whereas it seems it splits the 2, 2.1 and 2.5 correctly.
I guess you want to split the array into the elements that are smaller than 1, between 1 and 2, between 2 and 3 and greater than 3 (4 bins). If we assume the array is sorted then the following will work:
>>> x = np.array([0.4, 1.2, 1.3, 1.5, 2, 2.1, 2.5, 3.4])
>>> np.split(x, np.bincount(np.digitize(x, [1, 2, 3])).cumsum())[:-1]
[array([ 0.4]),
array([ 1.2, 1.3, 1.5]),
array([ 2. , 2.1, 2.5]),
array([ 3.4])]
With np.digitize we get the index of the bin for each array element. With np.bincount we get the number of elements in each bin. With np.cumsum we can take the splitting indexes of each bin in the sorted array. Finally, we have what np.split needs.
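For a sorted array, the same split points can be obtained more directly with np.searchsorted, which returns the position at which each bin edge would be inserted. This is a sketch of an alternative, not the answerer's method, and it relies on x being sorted:

```python
import numpy as np

x = np.array([0.4, 1.2, 1.3, 1.5, 2, 2.1, 2.5, 3.4])  # must be sorted
edges = [1, 2, 3]

# Index of the first element >= each edge; these are the cut positions.
cut_points = np.searchsorted(x, edges)
parts = np.split(x, cut_points)
```

With the default side='left', a value equal to an edge (here 2) starts the next sub-array, matching the np.digitize result above.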
Quoted from the docs:
numpy.split(ary, indices_or_sections, axis=0)
indices_or_sections : int or 1-D array If indices_or_sections is an
integer, N, the array will be divided into N equal arrays along axis.
If such a split is not possible, an error is raised. If
indices_or_sections is a 1-D array of sorted integers, the entries
indicate where along axis the array is split. For example, [2, 3]
would, for axis=0, result in ary[:2] ary[2:3] ary[3:] If an index
exceeds the dimension of the array along axis, an empty sub-array is
returned correspondingly.
So, if you want to split at the third element along the axis, you need to do something like this:
In [1]: import numpy as np
In [2]: x = np.array([1.2, 1.3, 1.5, 2, 2.1, 2.5])
In [3]: np.split(x, [3])
Out[3]: [array([ 1.2, 1.3, 1.5]), array([ 2. , 2.1, 2.5])]
If you would rather split the array x into two equal sub-arrays:
In [4]: np.split(x, 2)
Out[4]: [array([ 1.2, 1.3, 1.5]), array([ 2. , 2.1, 2.5])]
np.split(x, [1, 2, 3]) gives you x[:1], x[1:2], x[2:3], x[3:], which is not what you want. It seems what you want is np.split(x, [3]).
