Computing a "moving sum of counts" on a NumPy array - python

I have the following arrays:
# input
In [77]: arr = np.array([23, 45, 23, 0, 12, 45, 45])
# result
In [78]: res = np.zeros_like(arr)
Now, I want to compute a moving sum of unique elements and store it in the res array.
Concretely, res array should be:
In [79]: res
Out[79]: array([1, 1, 2, 1, 1, 2, 3])
[23, 45, 23, 0, 12, 45, 45]
[1, 1, 2, 1, 1, 2, 3]
We start counting each element and increment the count if an element re-appears, until we reach the end of the array. These element-specific counts should be returned as the result.
How can we achieve this using NumPy built-in functions? I tried using numpy.bincount, but it gives undesired results.

Not sure you'll find a builtin, so here is a homebrew using argsort.
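def running_count(arr):
    # Stable sort, so equal elements keep their original relative order
    idx = arr.argsort(kind='mergesort')
    sarr = arr[idx]
    # Positions in the sorted array where a new value starts
    neq = np.where(sarr[1:] != sarr[:-1])[0] + 1
    run = np.ones(arr.shape, int)
    if neq.size:  # no boundaries when all elements are equal
        run[neq[0]] -= neq[0]
        run[neq[1:]] -= np.diff(neq)
    res = np.empty_like(run)
    # Cumulative sum gives the saw tooth; scatter back to original order
    res[idx] = run.cumsum()
    return res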
For example:
>>> running_count(arr)
array([1, 1, 2, 1, 1, 2, 3])
>>> running_count(np.array(list("xabaaybeeetz")))
array([1, 1, 1, 2, 3, 1, 2, 1, 2, 3, 1, 1])
Explainer:
We first sort using argsort because we need indices to go back to original order in the end. Here it is important to have a stable sort, hence the use of the slow mergesort.
Once the elements are sorted, the running counts form a "saw tooth" pattern. The vectorized way to create this is to observe that the diff of a saw tooth has "jump" values where a new tooth starts and ones everywhere else. So that is straightforward to construct.
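To make the saw tooth idea concrete, here is a minimal sketch that builds it step by step for the question's array after sorting. The variable names (starts, steps, saw) are mine, not from the answer, and np.diff with prepend assumes NumPy 1.16 or later.

```python
import numpy as np

# The question's array, already sorted
sarr = np.array([0, 12, 23, 23, 45, 45, 45])

# Index of the first element of every tooth except the first one
starts = np.flatnonzero(sarr[1:] != sarr[:-1]) + 1

# Diff of the saw tooth: ones everywhere, and at each tooth start
# a jump of 1 minus the length of the previous tooth
steps = np.ones(sarr.size, dtype=int)
steps[starts] -= np.diff(starts, prepend=0)

# Integrating the diff gives the running counts in sorted order
saw = steps.cumsum()
print(steps)  # [ 1  0  0  1 -1  1  1]
print(saw)    # [1 1 1 2 1 2 3]
```

Scattering saw back through the argsort indices then gives the counts in the original order.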

Related

Numpy: How to check if a number is the minimum/maximum among the previous K numbers?

I'm trying to automate a trading strategy which should enter/exit a long position when the current price is the minimum/maximum among the previous k prices.
The result should contain 1 if the current number is maximum among previous k numbers, -1 if it is the minimum and 0 if none of the conditions are true.
For example, if k = 3 and the NumPy array = [1, 2, 3, 2, 1, 6], the result should be an array like:
[0, 0, 1, 0, -1, 1].
I tried NumPy's max function, but I don't know how to take the previous k numbers into account instead of a fixed index, or how to fall back to the default for the first k - 1 numbers, which should be 0 since there are not k numbers available to compare them with.
I will use Pandas
import pandas as pd
array = [1, 2, 3, 2, 1, 6]
df = pd.DataFrame(array)
df['rolling_max'] = df[0].rolling(3).max()
df['rolling_min'] = df[0].rolling(3).min()
df['result'] = df.apply(lambda row: 1 if row[0] == row['rolling_max'] else (-1 if row[0] == row['rolling_min'] else 0), axis=1)
Here is a solution with NumPy using numpy.lib.stride_tricks.sliding_window_view, which was introduced in version 1.20.0.
Note that this solution (like the one proposed by @Hanwei Tang) does not yield exactly the result you were looking for: in the second window ([2, 3, 2]) 2 is the minimum value, so -1 is returned instead of the zero you requested. You may want to reconsider whether you really want a zero or a -1 for the second window.
EDIT: If a window contains only identical numbers, i.e. the minimum and maximum are the same, this method returns a zero.
import numpy as np
def rolling_max(a, wsize):
    windows = np.lib.stride_tricks.sliding_window_view(a, wsize)
    return np.max(windows, axis=-1)
def rolling_min(a, wsize):
    windows = np.lib.stride_tricks.sliding_window_view(a, wsize)
    return np.min(windows, axis=-1)
def check_prize(a, wsize):
    rmax = rolling_max(a, wsize)
    rmin = rolling_min(a, wsize)
    ismax = np.where(a[wsize-1:] == rmax, 1, 0)
    ismin = np.where(a[wsize-1:] == rmin, -1, 0)
    result = np.zeros_like(a)
    result[wsize-1:] = ismax + ismin
    return result
a = np.array([1, 2, 3, 2, 1, 6])
check_prize(a, wsize=3)
# Output:
# array([ 0, 0, 1, -1, -1, 1])
b = np.array([1, 2, 4, 3, 1, 6])
check_prize(b, wsize=3)
# Output:
# array([ 0, 0, 1, 0, -1, 1])
c = np.array([1, 2, 2, 2, 1, 6])
check_prize(c, wsize=3)
# Output:
# array([ 0, 0, 1, 0, -1, 1])
Another approach using sliding_window_view with pad:
from numpy.lib.stride_tricks import sliding_window_view as swv
k = 3
a = np.array([1, 2, 3, 2, 1, 6])
# create sliding window
v = swv(np.pad(a.astype(float), (k-1, 0), constant_values=np.nan), k)
# compare each element to min/max of sliding window
out = np.select([np.max(v, 1)==a, np.min(v, 1)==a], [1, -1], 0)
Output: array([ 0, 0, 1, -1, -1, 1])
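If the zero requested for the window [2, 3, 2] means that a tied minimum or maximum should not count, one possible reading is to require the current value to be the unique extremum of its window. The sketch below works under that assumption; strict_extrema is a hypothetical name, and sliding_window_view needs NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def strict_extrema(a, k):
    """1 / -1 where a[i] is the unique max / min of the last k values, else 0."""
    a = np.asarray(a)
    windows = sliding_window_view(a, k)   # shape (len(a) - k + 1, k)
    current = a[k - 1:]                   # last element of each window
    is_max = current == windows.max(axis=-1)
    is_min = current == windows.min(axis=-1)
    # The current value is unique if it occurs exactly once in its window
    unique = (windows == current[:, None]).sum(axis=-1) == 1
    out = np.zeros(a.shape, dtype=int)    # first k - 1 entries stay 0
    out[k - 1:] = (is_max & unique).astype(int) - (is_min & unique).astype(int)
    return out

print(strict_extrema([1, 2, 3, 2, 1, 6], 3))  # [ 0  0  1  0 -1  1]
```

With this reading the output matches the array given in the question, at the cost of also returning 0 whenever the latest value merely ties an earlier extremum.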

How to select r% samples from a list based on their values?

Let's say I have a list a = [2, 1, 4, 3, 5]. I want to do the following:
I define a percentage value r%. I would like to select r% of samples having low values and get their indices.
For example, if r = 80, the output would be the indices of 1, 2, 3, 4, i.e. 1, 0, 3, 2.
Use np.percentile and np.where
a = np.array([2, 1, 4, 3, 5])
r = 80
np.where(a < np.percentile(a, r))
> (array([0, 1, 2, 3]),)
Note: in your example you return the order of the indices as if the elements were sorted. It's not clear if this is important for you but if it is it's easy in NumPy! Just replace the last line with
np.argsort(a)[np.sort(a) < np.percentile(a, r)]
> array([1, 0, 3, 2])
def perc(r, number_list):
    # Find number of samples based on the percentage (rounding to closest integer)
    number_of_samples = round(len(number_list) * r / 100)
    # Sort a copy so the caller's list is not modified
    ordered = sorted(number_list)
    # Return the lowest values (not their indices)
    return ordered[:number_of_samples]
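Since the question asks for indices rather than values, a count-based variant can be sketched with argsort alone. This is an illustration under my own assumptions: lowest_indices is a hypothetical name, and "r%" is taken to mean round(len * r / 100) samples rather than a percentile threshold.

```python
import numpy as np

def lowest_indices(values, r):
    """Indices of the lowest r percent of values, ordered from smallest up."""
    values = np.asarray(values)
    n = round(values.size * r / 100)  # number of samples to keep
    # A stable sort keeps ties in their original order
    return np.argsort(values, kind="stable")[:n]

print(lowest_indices([2, 1, 4, 3, 5], 80))  # [1 0 3 2]
```

Unlike the percentile comparison, this always returns exactly n indices, even when values are repeated.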

Numpy - Count Number of Values Until Condition Is Satisfied

If I have two numpy arrays of the same size.
ArrayOne = np.array([ 2, 5, 5, 6, 7, 10, 13])
ArrayTwo = np.array([ 8, 10, 12, 14, 16, 18, 24])
How can I count, for each element, how many elements there are back to the beginning of the array, unless the condition ArrayOne >= ArrayTwo is satisfied, in which case I want the number of elements back to the point where that condition holds? I then want to make an array out of the result.
So, as an example, for element [0] there are 0 elements in front. For element [1] there is 1 element in front, and ArrayOne >= ArrayTwo wasn't satisfied. Element [5] in ArrayOne is bigger than element [0] in ArrayTwo, so there are four elements until element [1] in ArrayTwo, etc.
Giving the result
result = np.array([ 0, 1, 2, 3, 4, 4, 3])
Thanks in advance.
Basically, at index i you have the value
value = i -count(how often element i in array one was bigger than array two until index i)
Because I'm on mobile with damn autocorrect, I rename the two arrays to a and b.
def get_value(a, b, i):
    max_value = a[i]
    nb_smaller_elements = sum(1 for el in range(i) if b[el] < max_value)
    return i - nb_smaller_elements
I think I got it. Using @Paul Panzer's answer, I made a for loop that goes through the list.
def toggle(ArrayOne,ArrayTwo):
    a = 0
    sum = -1
    linels = []
    for i in range(len(ArrayOne)):
        sum += 1
        a = sum - np.searchsorted(ArrayTwo, ArrayOne[i])
        linels.append(a)
    return np.array(linels)
I get the result
linels = np.array([ 0, 1, 2, 3, 4, 4, 3])
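The loop above can probably be collapsed into a single vectorized expression, because np.searchsorted accepts a whole array of query values. The sketch below assumes ArrayTwo is sorted in ascending order, as it is in the question; count_back is a hypothetical name.

```python
import numpy as np

def count_back(array_one, array_two):
    """Index minus the number of array_two values smaller than array_one[i]."""
    array_one = np.asarray(array_one)
    array_two = np.asarray(array_two)  # assumed sorted ascending
    # searchsorted (side='left') counts the entries strictly smaller than each query
    smaller = np.searchsorted(array_two, array_one)
    return np.arange(array_one.size) - smaller

ArrayOne = np.array([2, 5, 5, 6, 7, 10, 13])
ArrayTwo = np.array([8, 10, 12, 14, 16, 18, 24])
print(count_back(ArrayOne, ArrayTwo))  # [0 1 2 3 4 4 3]
```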

Python - Convert the array in a tuple to just a normal array

I have a signal where I want to find the average height of the values. This is done by finding the zero crossings and calculating the max and min between each zero crossing, then averaging these values.
My problem occurs when I want to use np.where() to find where the signal is crossing zero. When I use np.where() I get the result in a tuple, but I want it in an array where I can count the amount of times zero is crossed.
I am new to Python and coming from Matlab it is a bit confusing with all the different classes. As you can see, I get an error because nu = len(zero_u) gives 1 as a result, because the whole array is written in a tuple as one element.
Any ideas how to go around this?
The code looks like this:
import numpy as np
def averageheight(f):
    rms = np.std(f)
    f = f + (rms * 10**-6)
    # Find zero crossing
    fsign = np.sign(f)
    fdiff = np.diff(fsign)
    zero_u = np.asarray(np.where(fdiff > 0)) + 1
    zero_d = np.asarray(np.where(fdiff < 0)) + 1
    nu = len(zero_u)
    nd = len(zero_d)
    value_max = np.zeros((nu, 1))
    value_min = np.zeros((nu, 1))
    imaxvec = np.zeros((nu, 1))
    iminvec = np.zeros((nu, 1))
    if (nu > 2) and (nd > 2):
        if zero_u[0] > zero_d[0]:
            zero_d[0] = []
        nu = len(zero_u)
        nd = len(zero_d)
        ncross = np.fmin(nu, nd)
        # Find Maxima:
        for ic in range(0, ncross - 1):
            up = int(zero_u[ic])
            down = int(zero_d[ic])
            fvec = f[up:down]
            value_max[ic] = np.amax(fvec)
            index_max = value_max.argmax()
            imaxvec[ic] = up + index_max - 1
        # Find Minima:
        for ic in range(0, ncross - 2):
            down = int(zero_d[ic])
            up = int(zero_u[ic+1])
            fvec = f[down:up]
            value_min[ic] = np.amin(fvec)
            index_min = value_min.argmin()
            iminvec[ic] = down + index_min - 1
        # Remove spurious values, bumps and zero_d
        thr = rms/3
        maxfind = np.where(value_max < thr)
        for i in range(0, len(maxfind)):
            imaxfind = np.where(value_max == maxfind[i])
            imaxvec[imaxfind] = 0
            value_max[imaxfind] = 0
        minfind = np.where(value_min > -thr)
        for j in range(0, len(minfind)):
            iminfind = np.where(value_min == minfind[j])
            value_min[iminfind] = 0
            iminvec[iminfind] = 0
        # Find Average Height
        avh = np.mean(value_max) - np.mean(value_min)
    else:
        avh = 0
    return avh
The documentation for np.where, and for np.nonzero even more so, clearly explains that it returns a tuple, with one array for each dimension of the condition array:
In [71]: arr = np.random.randint(-5,5,10)
In [72]: arr
Out[72]: array([ 3, 4, 2, -3, -1, 0, -5, 4, 2, -3])
In [73]: arr.shape
Out[73]: (10,)
In [74]: np.where(arr>=0)
Out[74]: (array([0, 1, 2, 5, 7, 8]),)
In [75]: arr[_]
Out[75]: array([3, 4, 2, 0, 4, 2])
That Out[74] tuple can be used directly as an index.
You can also extract the array from the tuple:
In [76]: np.where(arr>=0)[0]
Out[76]: array([0, 1, 2, 5, 7, 8])
That, I think is a better choice than the np.asarray(np.where(...))
This convention for where becomes clearer when we use it on a 2d array
In [77]: arr2 = arr.reshape(2,5)
In [78]: np.where(arr2>=0)
Out[78]: (array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 2, 3]))
In [79]: arr2[_]
Out[79]: array([3, 4, 2, 0, 4, 2])
Again we are indexing with a tuple. arr2[1,3] is really arr2[(1,3)]. The values in [] indexing brackets are actually passed to the indexing function as a tuple of values.
np.argwhere applies transpose to the result of where, producing an array:
In [80]: np.transpose(np.where(arr2>=0))
Out[80]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 2],
[1, 3]])
That's the same indexing arrays, but arranged in a 2d column matrix.
If you need the count of where without the actual values, a slightly faster function is
In [81]: np.count_nonzero(arr>=0)
Out[81]: 6
In fact np.nonzero uses the count to first determine the size of the arrays that it will return.
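Applied to the zero-crossing step from the question, taking element [0] of the tuple gives plain 1-D index arrays whose len is the number of crossings. This is a minimal sketch; the helper name and the test signal are made up for illustration.

```python
import numpy as np

def zero_crossings(f):
    """Return 1-D index arrays of upward and downward zero crossings."""
    fdiff = np.diff(np.sign(f))
    # np.where returns a 1-tuple for 1-D input; [0] extracts the index array
    zero_u = np.where(fdiff > 0)[0] + 1
    zero_d = np.where(fdiff < 0)[0] + 1
    return zero_u, zero_d

f = np.array([1, -1, -2, 3, 4, -1, 2])  # made-up signal
zero_u, zero_d = zero_crossings(f)
print(zero_u, len(zero_u))  # [3 6] 2
print(zero_d, len(zero_d))  # [1 5] 2
```

With np.asarray(np.where(...)) the result would instead have shape (1, n), which is why len returned 1 in the original code.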

I need the N minimum (index) values in a NumPy array

Hi, I have an array with X values in it, and I would like to locate the indices of the ten smallest values. In this link they calculated the maximum efficiently: How to get indices of N maximum values in a numpy array?
However, I can't comment on links yet, so I'm having to repost the question.
I'm not sure which indices I need to change to get the minimum and not the maximum values.
This is their code
In [1]: import numpy as np
In [2]: arr = np.array([1, 3, 2, 4, 5])
In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])
If you call
arr.argsort()[:3]
It will give you the indices of the 3 smallest elements.
array([0, 2, 1], dtype=int64)
So, for n, you should call
arr.argsort()[:n]
Since this question was posted, numpy has updated to include a faster way of selecting the smallest elements from an array using argpartition. It was first included in Numpy 1.8.
Using snarly's answer as inspiration, we can quickly find the k=3 smallest elements:
In [1]: import numpy as np
In [2]: arr = np.array([1, 3, 2, 4, 5])
In [3]: k = 3
In [4]: ind = np.argpartition(arr, k)[:k]
In [5]: ind
Out[5]: array([0, 2, 1])
In [6]: arr[ind]
Out[6]: array([1, 2, 3])
This will run in O(n) time because it does not need to do a full sort. If you need your answers sorted (Note: in this case the output array was in sorted order but that is not guaranteed) you can sort the output:
In [7]: np.sort(arr[ind])
Out[7]: array([1, 2, 3])
This runs in O(n + k log k) time because the sorting takes place on the smaller output array.
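If the indices themselves should come back ordered by value, the two steps can be combined as in the sketch below. The function name is hypothetical; partitioning at k - 1 rather than k is my choice so that k equal to the array length is also accepted.

```python
import numpy as np

def n_smallest_indices(arr, k):
    """Indices of the k smallest values, ordered from smallest to largest."""
    arr = np.asarray(arr)
    # O(n): the first k positions hold the k smallest values, in arbitrary order
    part = np.argpartition(arr, k - 1)[:k]
    # O(k log k): order just those k candidates by value
    return part[np.argsort(arr[part])]

print(n_smallest_indices([9, 1, 8, 2, 7, 3], 3))  # [1 3 5]
```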
I don't guarantee that this will be faster, but a better algorithm would rely on heapq.
import heapq
indices = heapq.nsmallest(10, range(len(arr)), key=arr.__getitem__)
This should take roughly O(N log k) operations for the k smallest elements (close to O(N) when k is small), whereas using argsort would take O(N log N) operations. However, argsort is pushed into highly optimized C, so it might still perform better. To know for sure, you'd need to run some tests on your actual data.
Just don't reverse the sort results.
In [164]: a = numpy.random.random(20)
In [165]: a
Out[165]:
array([ 0.63261763, 0.01718228, 0.42679479, 0.04449562, 0.19160089,
0.29653725, 0.93946388, 0.39915215, 0.56751034, 0.33210873,
0.17521395, 0.49573607, 0.84587652, 0.73638224, 0.36303797,
0.2150837 , 0.51665416, 0.47111993, 0.79984964, 0.89231776])
Sorted:
In [166]: a.argsort()
Out[166]:
array([ 1, 3, 10, 4, 15, 5, 9, 14, 7, 2, 17, 11, 16, 8, 0, 13, 18,
12, 19, 6])
First ten:
In [168]: a.argsort()[:10]
Out[168]: array([ 1, 3, 10, 4, 15, 5, 9, 14, 7, 2])
This code saves the indices of the 20 largest elements of split_list in Twenty_Maximum:
Twenty_Maximum = split_list.argsort()[-20:]
By contrast, this code saves the indices of the 20 smallest elements of split_list in Twenty_Minimum:
Twenty_Minimum = split_list.argsort()[:20]
