How to set threshold values in a numpy array? - python

I have an array of values and I want to set specific values to integers. Anything below 0.95 set to 0, anything above 1.6 set to 2. How can I set everything between 0.95 and 1.6 to 1?
n1_binary = np.where(n1_img_resize < 0.95, 0, n1_img_resize)
n1_binary = np.where(n1_binary > 1.6, 2, n1_binary)

Like this in one line using np.where:
n1_binary = np.where((n1_binary > 0.95) & (n1_binary <= 1.6), 1, n1_binary)
Check the example below:
In [652]: a = np.array([0.99, 1.23, 1.7, 9])
In [653]: a = np.where((a > 0.95) & (a <= 1.6), 1, a)
In [654]: a
Out[654]: array([1. , 1. , 1.7, 9. ])
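The full three-way mapping can also be done in one call with np.select (a sketch; the sample array here is made up):

```python
import numpy as np

a = np.array([0.5, 1.0, 1.2, 1.7, 2.0])
# conditions are checked in order; default=1 covers the middle range
result = np.select([a < 0.95, a > 1.6], [0, 2], default=1)
print(result.tolist())  # [0, 1, 1, 2, 2]
```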

Try this:
a = np.array([0.3, 5, 1.2, 2])
a[a < 0.95] = 0
a[a > 1.6] = 2
a[(a >= 0.95) & (a <= 1.6)] = 1
This is very clear and concise, and says exactly what you are doing. a is now:
[0.0, 2.0, 1.0, 2.0]

Related

how to use pandas list to solve linear equation using numpy

I was hoping to find out how to take two pandas lists and solve for x.
list_a = [-1, 1, 1, -1, 1, 1, 1, -1, 0, 1]
list_b = [0.0, 0.0, 1.75, -1.125, 1.0, 0.5, 0.0, -1.25, 1.375, -0.125]
for each entry in the list I would like to compute the following:
x + list_b = list_a
which would then return a list of 10 elements with the result for x.
Any help is appreciated!
If you convert list_a and list_b to numpy.array, you can solve for x using subtraction, which numpy performs elementwise:
>>> import numpy as np
>>> list_a = np.array([-1, 1, 1, -1, 1, 1, 1, -1, 0, 1])
>>> list_b = np.array([0.0, 0.0, 1.75, -1.125, 1.0, 0.5, 0.0, -1.25, 1.375, -0.125])
>>> x = list_a - list_b
>>> x
array([-1. , 1. , -0.75 , 0.125, 0. , 0.5 , 1. , 0.25 , -1.375, 1.125])
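Since the question mentions pandas, the same elementwise subtraction also works directly on pandas Series (a sketch using the question's data):

```python
import pandas as pd

list_a = pd.Series([-1, 1, 1, -1, 1, 1, 1, -1, 0, 1])
list_b = pd.Series([0.0, 0.0, 1.75, -1.125, 1.0, 0.5, 0.0, -1.25, 1.375, -0.125])
# subtraction is elementwise and aligned by index
x = list_a - list_b
print(x.tolist())  # [-1.0, 1.0, -0.75, 0.125, 0.0, 0.5, 1.0, 0.25, -1.375, 1.125]
```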

numpy first occurrence in array for an array of reference values

Given a threshold alpha and a numpy array a, there are multiple possibilities for finding the first index i such that a[i] > alpha; see Numpy first occurrence of value greater than existing value:
numpy.searchsorted(a, alpha)+1
numpy.argmax(a > alpha)
In my case, alpha can be either a scalar or an array of arbitrary shape. I'd like to have a function get_lowest that works in both cases:
alpha = 1.12
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # 2
alpha = numpy.array([1.12, -0.5, 2.7])
arr = numpy.array([0.0, 1.1, 1.2, 3.0])
get_lowest(arr, alpha) # [2, 0, 3]
Any hints?
You can use broadcasting:
In [9]: arr = array([ 0. , 1.1, 1.2, 3. ])
In [10]: alpha = array([ 1.12, -0.5 , 2.7 ])
In [11]: np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
Out[11]: array([2, 0, 3])
To collapse multidimensional arrays, you can use np.squeeze, but you might have to do something special if you want a Python float in your first case:
def get_lowest(arr, alpha):
    b = np.argmax(arr > np.atleast_2d(alpha).T, axis=1)
    b = np.squeeze(b)
    if np.size(b) == 1:
        return float(b)
    return b
searchsorted actually does the trick:
np.searchsorted(a, alpha)
The axis argument to argmax helps out; this
np.argmax(numpy.add.outer(alpha, -a) < 0, axis=-1)
does the trick. Indeed
import numpy as np
a = np.array([0.0, 1.1, 1.2, 3.0])
alpha = 1.12
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # 2
np.searchsorted(a, alpha) # 2
alpha = np.array([1.12, -0.5, 2.7])
np.argmax(np.add.outer(alpha, -a) < 0, axis=-1) # [2 0 3]
np.searchsorted(a, alpha) # [2 0 3]
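Putting that together, a sketch of get_lowest that covers both the scalar and the array case, assuming arr is sorted ascending (side='right' makes the comparison strict):

```python
import numpy as np

def get_lowest(arr, alpha):
    # arr must be sorted ascending; side='right' skips entries equal to
    # alpha, so this is the first index where arr[i] > alpha, for scalar
    # and array alpha alike
    return np.searchsorted(arr, alpha, side='right')

arr = np.array([0.0, 1.1, 1.2, 3.0])
print(get_lowest(arr, 1.12))                         # 2
print(get_lowest(arr, np.array([1.12, -0.5, 2.7])))  # [2 0 3]
```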

Using Python numpy where condition to change entries below a certain value

Here's my array:
import numpy as np
a = np.array([0, 5.0, 0, 5.0, 5.0])
Is it possible to use numpy.where in some way to add a value x to all those entries in a that are less than l?
So something like:
a = a[np.where(a < 5).add(2.5)]
Should return:
array([2.5, 5.0, 2.5, 5.0, 5.0])
a = np.array([0., 5., 0., 5., 5.])
a[np.where(a < 5)] += 2.5
in case you really want to use where or just
a[a < 5] += 2.5
which I usually use for this kind of operation.
You could use np.where to create the array of additions and then simply add to a -
a + np.where(a < l, 2.5, 0)
Sample run -
In [16]: a = np.array([1, 5, 4, 5, 5])
In [17]: l = 5
In [18]: a + np.where(a < l, 2.5, 0)
Out[18]: array([ 3.5, 5. , 6.5, 5. , 5. ])
Given that you probably need to change the dtype (from int to float) you need to create a new array. A simple way without explicit .astype or np.where calls is multiplication with a mask:
>>> b = a + (a < 5) * 2.5
>>> b
array([ 2.5, 5. , 2.5, 5. , 5. ])
with np.where this can be changed to a simple expression (using the else-condition, third argument, in where):
>>> a = np.where(a < 5, a + 2.5, a)
>>> a
array([ 2.5, 5. , 2.5, 5. , 5. ])
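A note on the dtype point above: on an integer array, the in-place variant refuses the float addition in recent NumPy, so you have to promote to float first (a sketch):

```python
import numpy as np

a = np.array([1, 5, 4, 5, 5])      # integer dtype
try:
    a[a < 5] += 2.5                # in-place float add on an int array
except TypeError:
    pass                           # recent NumPy refuses this unsafe cast

b = a.astype(float)                # promote first, then modify in place
b[b < 5] += 2.5
print(b.tolist())  # [3.5, 5.0, 6.5, 5.0, 5.0]
```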
a += np.where(a < 1, 2.5, 0)
where will return the second argument wherever the condition (first argument) is satisfied and the third argument otherwise.
You can use a boolean mask as an index. Boolean operations, such as a < 1, return such an array.
>>> a<1
array([False, False, False, False, False], dtype=bool)
you can use it as
>>> a[a<1] += 1
The a<1 part selects only the items in a that match the condition. You can operate on this part only then.
If you want to keep a trace of your selection, you can proceed in two steps.
>>> mask = a>1
>>> a[mask] += 1
Also, you can count the items matching the conditions:
>>> print(np.sum(mask))

Digitize numpy array

I have two vectors:
time_vec = np.array([0.2,0.23,0.3,0.4,0.5,...., 28....])
values_vec = np.array([500,200,220,250,200,...., 218....])
time_vec.shape == values_vec.shape
Now, I want to bin the values into 0.5 second intervals and take the mean of each bin. So for example
value_vec = np.array(mean_of(500,200,220,250,200), mean_of(next values in next 0.5 second interval))
Is there any numpy method I am missing which bin and take mean of the bins?
You can use np.ufunc.reduceat. You just need to find the breaking points, i.e. the indices where floor(t / .5) changes:
say for:
>>> t
array([ 0. , 0.025 , 0.2125, 0.2375, 0.2625, 0.3375, 0.475 , 0.6875, 0.7 , 0.7375, 0.8 , 0.9 ,
0.925 , 1.05 , 1.1375, 1.15 , 1.1625, 1.1875, 1.1875, 1.225 ])
>>> b
array([ 0.8144, 0.3734, 1.4734, 0.6307, -0.611 , -0.8762, 1.6064, 0.3863, -0.0103, -1.6889, -0.4328, -0.7373,
1.7856, 0.8938, -1.1574, -0.4029, -0.4352, -0.4412, -1.7819, -0.3298])
the break points are:
>>> i = np.r_[0, 1 + np.nonzero(np.diff(np.floor(t / .5)))[0]]
>>> i
array([ 0, 7, 13])
and the sum over each interval is:
>>> np.add.reduceat(b, i)
array([ 3.411 , -0.6975, -3.6545])
and the mean is the sum divided by the length of each interval:
>>> np.add.reduceat(b, i) / np.diff(np.r_[i, len(b)])
array([ 0.4873, -0.1162, -0.5221])
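The steps above can be wrapped into one small helper (a sketch; bin_mean and width are made-up names, and t is assumed sorted ascending):

```python
import numpy as np

def bin_mean(t, b, width=0.5):
    # group boundaries occur where floor(t / width) changes value
    i = np.r_[0, 1 + np.nonzero(np.diff(np.floor(t / width)))[0]]
    sums = np.add.reduceat(b, i)          # sum of each group
    counts = np.diff(np.r_[i, len(b)])    # size of each group
    return sums / counts

t = np.array([0.1, 0.2, 0.6, 0.7, 1.2])
b = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
print(bin_mean(t, b).tolist())  # [2.0, 6.0, 9.0]
```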
You can pass a weights= parameter to np.histogram to compute the summed values within each time bin, then normalize by the bin count:
# 0.5 second time bins to average within
tmin = time_vec.min()
tmax = time_vec.max()
bins = np.arange(tmin - (tmin % 0.5), tmax - (tmax % 0.5) + 0.5, 0.5)
# summed values within each bin
bin_sums, edges = np.histogram(time_vec, bins=bins, weights=values_vec)
# number of values within each bin
bin_counts, edges = np.histogram(time_vec, bins=bins)
# average value within each bin
bin_means = bin_sums / bin_counts
You can use np.bincount that is supposedly pretty efficient for such binning operations. Here's an implementation based on it to solve our case -
# Find indices where 0.5 intervals shifts onto next ones
A = time_vec*2
idx = np.searchsorted(A,np.arange(1,int(np.ceil(A.max()))),'right')
# Setup ID array such that all 0.5 intervals are ID-ed same
out = np.zeros((A.size),dtype=int)
out[idx[idx < A.size]] = 1
ID = out.cumsum()
# Finally use bincount to sum and count elements of same IDs
# and thus get mean values per ID
mean_vec = np.bincount(ID,values_vec)/np.bincount(ID)
Sample run -
In [189]: time_vec
Out[189]:
array([ 0.2 , 0.23, 0.3 , 0.4 , 0.5 , 0.7 , 0.8 , 0.92, 0.95,
1. , 1.11, 1.5 , 2. , 2.3 , 2.5 , 4.5 ])
In [190]: values_vec
Out[190]: array([36, 11, 93, 32, 72, 75, 26, 28, 77, 31, 60, 77, 76, 32, 6, 85])
In [191]: ID
Out[191]: array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 3, 4, 4, 5], dtype=int32)
In [192]: mean_vec
Out[192]: array([ 48.8, 47.4, 68.5, 76. , 19. , 85. ])
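Since the title mentions digitize, np.digitize can also build the per-sample bin IDs directly and feed them to np.bincount (a sketch with made-up sample data):

```python
import numpy as np

time_vec = np.array([0.2, 0.23, 0.3, 0.4, 0.45, 0.7, 0.8])
values_vec = np.array([500., 200., 220., 250., 200., 100., 300.])

# bin edges every 0.5 seconds, then one bin ID per sample
edges = np.arange(0, time_vec.max() + 0.5, 0.5)
ids = np.digitize(time_vec, edges) - 1
# per-bin sums divided by per-bin counts
means = np.bincount(ids, weights=values_vec) / np.bincount(ids)
print(means.tolist())  # [274.0, 200.0]
```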

matlab ismember function in python

Although similar questions have been raised a couple of times, I still cannot replicate MATLAB's ismember function in Python. In particular, I want to use this function in a loop, comparing a whole matrix to an element of another matrix in each iteration. Wherever the same value occurs, I want to print 1, and 0 in any other case.
Let's say that I have the following matrices
d = np.reshape(np.array([ 2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ]),(1,10))
d_unique = np.unique(d)
then I have
d_unique
array([ 0. , 1. , 1.25, 1.5 , 1.75, 2.25])
Now I want to iterate like
J = np.zeros(np.size(d_unique))
for i in range(len(d_unique)):
    J[i] = np.sum(ismember(d, d_unique[i]))
so as to take as an output:
J = [3,1,2,2,1,1]
Does anybody have any idea? Many thanks in advance.
In contrast to other answers, numpy has the built-in numpy.in1d for doing that.
Usage in your case:
bool_array = numpy.in1d(array1, array2)
Note: It also accepts lists as inputs.
EDIT (2021):
numpy now recommends using np.isin instead of np.in1d: np.isin preserves the shape of its input array, while np.in1d returns a flattened output.
To answer your question, I guess you could define an ismember function similar to:
def ismember(d, k):
    return [1 if i == k else 0 for i in d]
But I am not familiar with numpy, so a little adjustment may be in order.
I guess you could also use Counter from collections:
>>> from collections import Counter
>>> a = [2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0. ]
>>> Counter(a)
Counter({0.0: 3, 1.25: 2, 1.5: 2, 2.25: 1, 1.0: 1, 1.75: 1})
>>> Counter(a).keys()
[2.25, 1.25, 0.0, 1.0, 1.5, 1.75]
>>> c = Counter(a)
>>> [c[i] for i in sorted(c.keys())]
[3, 1, 2, 2, 1, 1]
Once again, not numpy, you will probably have to do some list(d) somewhere.
Try the following function:
def ismember(A, B):
    return [np.sum(a == B) for a in A]
This should very much behave like the corresponding MATLAB function.
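For the specific counting loop in the question, np.unique with return_counts=True produces J in a single call (a sketch using the question's data):

```python
import numpy as np

d = np.array([2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0.])
# counts are returned in the order of the sorted unique values,
# which matches d_unique from the question
d_unique, J = np.unique(d, return_counts=True)
print(J.tolist())  # [3, 1, 2, 2, 1, 1]
```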
Try the ismember library from pypi.
pip install ismember
Example:
# Import library
from ismember import ismember
import numpy as np
# data (as numpy arrays, so the index lookups below work)
d = np.array([2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0.])
d_unique = np.array([0., 1., 1.25, 1.5, 1.75, 2.25])
# Lookup
Iloc,idx = ismember(d, d_unique)
# Iloc is boolean defining existence of d in d_unique
print(Iloc)
# [[True True True True True True True True True True]]
# indexes of d_unique that exists in d
print(idx)
# array([5, 2, 3, 1, 0, 2, 4, 0, 3, 0], dtype=int64)
print(d_unique[idx])
array([2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ])
print(d[Iloc])
array([2.25, 1.25, 1.5 , 1. , 0. , 1.25, 1.75, 0. , 1.5 , 0. ])
# These vectors will match
d[Iloc]==d_unique[idx]
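If you'd rather avoid the extra dependency, here is a pure-numpy sketch of the two-output ismember (exact matches only; the ismember below is our own helper, not the pypi one):

```python
import numpy as np

def ismember(a, b):
    # loc marks which elements of a appear in b; idx maps each of
    # those elements to its position in b (assumes exact matches)
    a, b = np.asarray(a), np.asarray(b)
    order = np.argsort(b)
    loc = np.isin(a, b)
    idx = order[np.searchsorted(b[order], a[loc])]
    return loc, idx

d = [2.25, 1.25, 1.5, 1., 0., 1.25, 1.75, 0., 1.5, 0.]
d_unique = [0., 1., 1.25, 1.5, 1.75, 2.25]
loc, idx = ismember(d, d_unique)
print(idx.tolist())  # [5, 2, 3, 1, 0, 2, 4, 0, 3, 0]
```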
