Numpy bincount() with floats - python

I am trying to get a bincount of a numpy array which is of the float type:
w = np.array([0.1, 0.2, 0.1, 0.3, 0.5])
print(np.bincount(w))
How can I use bincount() with float values rather than ints?

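np.bincount only accepts non-negative integers, so you need to use numpy.unique first: with return_inverse=True it maps each distinct float to an integer index, and bincount can then count those indices. Otherwise it's ambiguous what you're counting. unique should also be much faster than Counter for NumPy arrays.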
>>> w = np.array([0.1, 0.2, 0.1, 0.3, 0.5])
>>> uniqw, inverse = np.unique(w, return_inverse=True)
>>> uniqw
array([ 0.1, 0.2, 0.3, 0.5])
>>> np.bincount(inverse)
array([2, 1, 1, 1])

Since version 1.9.0, you can use np.unique directly:
w = np.array([0.1, 0.2, 0.1, 0.3, 0.5])
values, counts = np.unique(w, return_counts=True)
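A minimal sketch, assuming NumPy 1.9 or later (needed for return_counts), showing that the counts agree with the bincount-of-inverse approach from the previous answer. Note that np.unique only groups floats that are exactly equal, so values differing by rounding error (e.g. 0.1 + 0.2 vs 0.3) are counted separately unless you round them first.

```python
import numpy as np

w = np.array([0.1, 0.2, 0.1, 0.3, 0.5])

# return_counts gives the number of occurrences of each unique value
values, counts = np.unique(w, return_counts=True)

# Equivalent two-step route: map each float to an integer index, then bincount
_, inverse = np.unique(w, return_inverse=True)
assert (np.bincount(inverse) == counts).all()

print(values)  # [0.1 0.2 0.3 0.5]
print(counts)  # [2 1 1 1]
```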

You want something like this?
>>> from collections import Counter
>>> w = np.array([0.1, 0.2, 0.1, 0.3, 0.5])
>>> c = Counter(w)
>>> c
Counter({0.10000000000000001: 2, 0.5: 1, 0.29999999999999999: 1, 0.20000000000000001: 1})
or, with the floats shown rounded for readability:
Counter({0.1: 2, 0.5: 1, 0.3: 1, 0.2: 1})
You can then sort it by key and get the counts:
>>> np.array([v for k, v in sorted(c.items())])
array([2, 1, 1, 1])
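The output of bincount wouldn't make sense with floats, because it returns one count for every integer from 0 up to the maximum value:
>>> np.bincount([10,11])
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
There is no analogous sequence of floats to enumerate like this.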
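A small sketch illustrating both points: bincount allocates one slot per integer from 0 to the maximum, and a float array is rejected outright (NumPy raises a TypeError because it will not safely cast float64 to an integer type). The helper bincount_accepts is hypothetical, written only for this demonstration.

```python
import numpy as np

counts = np.bincount([10, 11])  # one slot per integer 0..11
print(counts)  # [0 0 0 0 0 0 0 0 0 0 1 1]

def bincount_accepts(x) -> bool:
    """Hypothetical helper: True if np.bincount accepts x, False on TypeError."""
    try:
        np.bincount(x)
        return True
    except TypeError:
        return False

print(bincount_accepts(np.array([1, 2, 1])))        # True
print(bincount_accepts(np.array([0.1, 0.2, 0.1])))  # False
```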

Related

Find local maxima index of element not surrounded by zeros

I am trying to identify the indexes of local maxima not surrounded by zeros of a 1D numpy array.
The original code is:
max_idx = [
i for i in range(1, len(elem_array) - 1)
if ((elem_array[i - 1] < elem_array[i]) and (elem_array[i + 1] <= elem_array[i]))
and ((elem_array[i - 1] != 0) or (elem_array[i + 1] != 0))
]
With this code using the array:
elem_array = np.array([23, 0, 45, 0, 12, 13, 14, 0, 0, 0, 1, 67, 1])
the result is: max_idx = [6, 11].
Important: element i must be strictly greater than element i-1 and greater than or equal to element i+1, and a 0 may appear on at most one side of element i. This is why 45 is not recognised as a local maximum.
I was trying to modify it with scipy.signal.argrelextrema, but this gives me the result: max_idx = [2, 6, 11], which contains an extra element.
And with the array:
elem_array = np.array([0.0, 0.0, 0.0, 0.0, 0.07, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0])
argrelextrema returns an empty array, whereas the expected result is max_idx = [10].
Do you have any suggestions for how the original code could be modified? Thanks
A loop like this is pretty straightforward to vectorize:
mask = (
(elem_array[:-2] < elem_array[1:-1])
& (elem_array[2:] <= elem_array[1:-1])
& ((elem_array[:-2] != 0) | (elem_array[2:] != 0))
)
max_idx = np.nonzero(mask)[0] + 1
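A runnable sketch that wraps the original loop and the vectorized mask in two functions and checks that they agree on both arrays from the question. The function names local_maxima_loop and local_maxima_vectorized are hypothetical; the logic is taken from the question and the answer above.

```python
import numpy as np

def local_maxima_loop(elem_array):
    # The original list comprehension from the question, kept as a reference
    return [
        i for i in range(1, len(elem_array) - 1)
        if (elem_array[i - 1] < elem_array[i]) and (elem_array[i + 1] <= elem_array[i])
        and ((elem_array[i - 1] != 0) or (elem_array[i + 1] != 0))
    ]

def local_maxima_vectorized(elem_array):
    # Aligned slices: position k of left/mid/right refers to elements
    # i-1, i, i+1 of elem_array with i = k + 1
    left, mid, right = elem_array[:-2], elem_array[1:-1], elem_array[2:]
    mask = (left < mid) & (right <= mid) & ((left != 0) | (right != 0))
    return np.nonzero(mask)[0] + 1  # shift back to indices of elem_array

a = np.array([23, 0, 45, 0, 12, 13, 14, 0, 0, 0, 1, 67, 1])
b = np.array([0.0, 0.0, 0.0, 0.0, 0.07, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0])

for arr in (a, b):
    print(local_maxima_vectorized(arr), local_maxima_loop(arr))
# [ 6 11] [6, 11]
# [10] [10]
```

The slicing approach avoids padding entirely: the first and last elements are simply never candidates, which matches range(1, len(elem_array) - 1) in the loop.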
You can use numpy.lib.stride_tricks.sliding_window_view (available since NumPy 1.20) to create a sliding window of size 3 and then apply the conditions in a vectorized way:
import numpy as np
def get_local_maxima(a: np.ndarray, window_shape: int = 3) -> np.ndarray:
    mid_index = window_shape // 2
    # Pad with zeros at both ends and create windows of the given size
    window = np.lib.stride_tricks.sliding_window_view(np.array([0]*mid_index + [*a] + [0]*mid_index), window_shape)
    # First condition: the max must be at the centre of the window
    # (argmax returns the first occurrence, so the centre must be strictly greater than what precedes it)
    c1 = np.argmax(window, axis=1) == mid_index
    # Second condition: at least one of the other elements (0-th or 2nd for size 3) must be non-zero
    c2 = (window[:, [i for i in range(window_shape) if i != mid_index]] != 0).any(axis=1)
    return np.argwhere(c1 & c2)
a = np.array([23, 0, 45, 0, 12, 13, 14, 0, 0, 0, 1, 67, 1])
b = np.array([0.0, 0.0, 0.0, 0.0, 0.07, 0.2, 0.4, 0.6, 0.8, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0])
get_local_maxima(a)
array([[ 6],
[11]])
get_local_maxima(b)
array([[10]])

Python convert a vector into inverse frequencies

So given this numpy array:
import numpy as np
vector = np.array([1, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 1])
# len(vector) == 12
# 2 x ones, 4 x two, 6 x three
How can I convert this into a vector of inverse frequencies?
Such that each value is replaced by its relative frequency (the number of times it occurs divided by the length of the vector):
array([0.16, 0.33, 0.33, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.33, 0.33, 0.16])
[Update to a general one]
How about this one using np.histogram:
import numpy as np
l = np.array([1,2,2,3,3,3,3,3,3,2,2,1])
_u, _l = np.unique(l, return_inverse=True)
np.histogram(_l, bins=np.arange(_u.size+1))[0][_l] / _l.size
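A sketch of the same idea that stays in NumPy without np.histogram: asking np.unique for return_counts alongside return_inverse gives the per-value counts directly, and indexing them with the inverse spreads each count back to every position.

```python
import numpy as np

vector = np.array([1, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 1])

# inverse[k] is the position of vector[k] among the sorted unique values;
# counts[j] is how often the j-th unique value occurs
_, inverse, counts = np.unique(vector, return_inverse=True, return_counts=True)

# Look up each element's count via its inverse index, then normalise
rel_freq = counts[inverse] / vector.size
print(rel_freq)
```

np.bincount(inverse) would give the same counts array, which ties this back to the bincount question above.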
This essentially requires a grouping operation, which numpy isn't great at... but pandas is. You can do this with groupby + transform + count, and divide the result by the length of vector.
import pandas as pd
s = pd.Series(vector)
vector = (s.groupby(s).transform('count') / len(s)).values
vector
array([ 0.16666667, 0.33333333, 0.33333333, 0.5 , 0.5 ,
0.5 , 0.5 , 0.5 , 0.5 , 0.33333333,
0.33333333, 0.16666667])
You can use collections.Counter to first determine the frequency of each element. Then create an intermediate mapping dict whose keys are the elements and whose values are their frequencies. Finally, use numpy.vectorize to transform the array into the desired format.
>>> import numpy as np
>>> from collections import Counter
>>> v = np.array([1, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 1])
>>> freq_dict = Counter(v)
At this point, freq_dict contains the frequency of each element:
>>> freq_dict
Counter({3: 6, 2: 4, 1: 2})
Next, build a probability dict of the form {element: probability} using a dict comprehension:
>>> prob_dict = {k: round(val / len(v), 3) for k, val in freq_dict.items()}
>>> prob_dict
{1: 0.167, 2: 0.333, 3: 0.5}
Finally, use numpy.vectorize to get the desired output:
>>> out = np.vectorize(prob_dict.get)(v)
This will produce:
>>> out
array([ 0.167, 0.333, 0.333, 0.5, 0.5, 0.5, 0.5, 0.5,
0.5, 0.333, 0.333, 0.167])

python multiply list elements in a changing manner

I want to decay the elements of a list so that every 5 elements the value is halved. For example, a list of ones of length 15 should become:
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
[1,1,1,1,1,0.5,0.5,0.5,0.5,0.5,0.25,0.25,0.25,0.25,0.25]
I tried list comprehensions and a basic for loop, but I couldn't construct the logic behind it.
Is this what you're looking for?
>>> x = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
>>> r = [v*2**(-(i//5)) for i, v in enumerate(x)]
>>> r
[1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.25]
>>>
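If the data is already a NumPy array, the same exponent trick vectorizes directly. A minimal sketch; block and factor are illustrative parameter names, set here to the values from the question.

```python
import numpy as np

x = np.ones(15)
block = 5      # elements per step, as in the question
factor = 0.5   # multiplier applied at each new block

# i // block is 0 for the first 5 positions, 1 for the next 5, and so on,
# so factor ** (i // block) is 1, 0.5, 0.25, ...
r = x * factor ** (np.arange(len(x)) // block)
print(r.tolist())
# [1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.25]
```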
Think simple.
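value = 1
result = []
for i in range(3):          # three blocks
    for j in range(5):      # five elements per block
        result.append(value)
    else:                   # runs once the inner loop finishes without a break: halve for the next block
        value /= 2
print(result)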
# [1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5, 0.25, 0.25, 0.25, 0.25, 0.25]
All the other answers are great; I'd like to add a more spelled-out solution.
start_range = 0
end_range = 5
num = 1
x = [1 for _ in range(10)]
res = []
while start_range <= len(x):
    for item in x[start_range:end_range]:
        res.append(item * num)
    start_range = end_range
    end_range = start_range + 5
    num /= 2
print(res)
# output: [1, 1, 1, 1, 1, 0.5, 0.5, 0.5, 0.5, 0.5]

Numpy: get the index of the top value sorted that are superior to 0

I have a NumPy array and I want the indices of the values that are greater than 0, sorted by value in descending order. For instance:
[-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7]
And I want to have:
[6, 1, 4, 5]
I can do it using a function I implemented but I guess for this kind of task there is something already implemented in Numpy.
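You can also do this in plain Python (note that list.index returns the first match, so repeated positive values would all map to the same index):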
L = [-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7]
[L.index(i) for i in sorted(filter(lambda x: x>0, L), reverse=True)]
Out[72]: [6, 1, 4, 5]
You can implement this with np.where:
a = np.array([-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7])
np.where(a > 0)[0].tolist()
Result
[1, 4, 5, 6]
np.where(a > 0) returns a tuple of NumPy arrays (one per dimension), so take element [0] and convert it to a list with tolist(). Note that this gives the indices in positional order, not sorted by value.
Here's a vectorized approach -
idx = np.where(A>0)[0]
out = idx[A[idx].argsort()[::-1]]
Sample run -
In [37]: A = np.array([-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7])
In [38]: idx = np.where(A>0)[0]
In [39]: idx[A[idx].argsort()[::-1]]
Out[39]: array([6, 1, 4, 5])
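A variant of the vectorized approach, sketched with a hypothetical helper name. Reversing an ascending stable sort with [::-1] also reverses the relative order of tied values; sorting the negated values with kind="stable" gives descending order while keeping ties left-to-right.

```python
import numpy as np

def positive_indices_desc(a):
    # Indices of strictly positive entries, in positional order
    idx = np.flatnonzero(a > 0)
    # Sorting the negated values ascending gives descending order; a stable
    # sort keeps equal values in their original left-to-right order
    return idx[np.argsort(-a[idx], kind="stable")]

A = np.array([-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7])
print(positive_indices_desc(A))  # [6 1 4 5]

# The two 0.5 entries (indices 0 and 2) stay in left-to-right order
B = np.array([0.5, -1.0, 0.5, 0.9])
print(positive_indices_desc(B))  # [3 0 2]
```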
Use np.where().
d > 0.0 generates a boolean mask, and np.where returns the indices where the mask is True; sorted then orders those indices by their values in d.
>>> d=np.array([-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7])
>>> r = np.where(d > 0)
>>> s=sorted(r[0].tolist(), key=lambda x:d[x], reverse=True)
>>> s
[6, 1, 4, 5]
EDIT
Here's what I mean by mask.
>>> mask = d > 0
>>> mask
array([False, True, False, False, True, True, True], dtype=bool)
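To make the distinction explicit, a short sketch: indexing with the mask returns the values, while np.where with a single argument returns the indices.

```python
import numpy as np

d = np.array([-0.4, 0.6, 0, 0, 0.4, 0.2, 0.7])
mask = d > 0

# Boolean indexing returns the values where the mask is True...
print(d[mask])            # [0.6 0.4 0.2 0.7]
# ...while np.where with a single argument returns their indices
print(np.where(mask)[0])  # [1 4 5 6]
```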

Python numpy running sum of repeated trues [duplicate]

I have an array like so:
a = np.array([0.1, 0.2, 1.0, 1.0, 1.0, 0.9, 0.6, 1.0, 0.0, 1.0])
I'd like to have a running counter of instances of 1.0 that resets when it encounters a 0.0, so the result would be:
[0, 0, 1, 2, 3, 3, 3, 4, 0, 1]
My initial thought was to use something like b = np.cumsum(a[a==1.0]), but I don't know how to (1) modify this to reset at zeros or (2) quite how to structure it so the output array is the same shape as the input array. Any ideas how to do this without iteration?
I think you could do something like
def rcount(a):
    without_reset = (a == 1).cumsum()
    reset_at = (a == 0)
    overcount = np.maximum.accumulate(without_reset * reset_at)
    result = without_reset - overcount
    return result
which gives me
>>> a = np.array([0.1, 0.2, 1.0, 1.0, 1.0, 0.9, 0.6, 1.0, 0.0, 1.0])
>>> rcount(a)
array([0, 0, 1, 2, 3, 3, 3, 4, 0, 1])
This works because we can use the cumulative maximum to figure out the "overcount":
>>> without_reset * reset_at
array([0, 0, 0, 0, 0, 0, 0, 0, 4, 0])
>>> np.maximum.accumulate(without_reset * reset_at)
array([0, 0, 0, 0, 0, 0, 0, 0, 4, 4])
Sanity testing:
def manual(arr):
    out = []
    count = 0
    for x in arr:
        if x == 1:
            count += 1
        if x == 0:
            count = 0
        out.append(count)
    return out
def test():
    for w in [1, 2, 10, 10**4]:
        for trial in range(100):
            for vals in ([0, 1], [0, 1, 2]):
                b = np.random.choice(vals, size=w)
                assert (rcount(b) == manual(b)).all()
    print("hooray!")
and then
>>> test()
hooray!
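The same trick generalizes to other "count" and "reset" values. A sketch with a hypothetical function name and parameters (count_value, reset_value are illustrative, not from the original answer); np.where replaces the multiply-by-boolean step but computes the same overcount.

```python
import numpy as np

def running_count(a, count_value=1.0, reset_value=0.0):
    """Hypothetical generalisation of rcount: count occurrences of
    count_value, restarting from 0 after each reset_value."""
    a = np.asarray(a)
    without_reset = np.cumsum(a == count_value)
    # Total seen so far at each reset position, 0 elsewhere
    at_reset = np.where(a == reset_value, without_reset, 0)
    # cumsum is non-decreasing, so the running max is the total at the most recent reset
    overcount = np.maximum.accumulate(at_reset)
    return without_reset - overcount

a = np.array([0.1, 0.2, 1.0, 1.0, 1.0, 0.9, 0.6, 1.0, 0.0, 1.0])
print(running_count(a))  # [0 0 1 2 3 3 3 4 0 1]
```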
