Numpy: Optimal way to count indexs occurrence in an array

I have an integer array indexs. It is very long (>10k elements), and each value is rather small (<100). For example:
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0]) # int index array
indexs_max = 4 # already known
Now I want to count the occurrences of each index value (e.g. 0 occurs 3 times, 1 occurs 2 times, ...) and get the counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods, as follows.
UPDATE: _test4 is @Ch3steR's solution:
indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9
def _test1():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    for ind in indexs:
        counts[ind] += 1
    return counts
def _test2():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    # this is because some value in range may be missing
    return counts
def _test3():
    therange = np.arange(0, indexs_max + 1)
    counts = np.sum(indexs[None] == therange[:, None], axis=1)
    return counts
def _test4():
    return np.bincount(indexs, minlength=indexs_max+1)
Run 500 times each, they take 32.499 s, 0.314 s, 0.141 s and 0.018 s respectively. Although _test3 is the fastest of my first three methods, it builds a large temporary array of shape (indexs_max + 1, len(indexs)).
So I'm asking for any better methods. Thank you :) (@Ch3steR)
UPDATE: np.bincount seems optimal so far.

You can use np.bincount to count the occurrences in an array.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs)
# array([3, 2, 1, 1, 1])
# 0's 1's 2's 3's 4's count
There's a caveat: np.bincount(x).size == np.amax(x) + 1, so the result has one slot for every integer from 0 up to the maximum.
Example:
indexs = np.array([5, 10])
np.bincount(indexs)
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
# 5's 10's count
Here it counts the occurrences of every value from 0 to the maximum in the array. A workaround can be:
c = np.bincount(indexs) # indexs is [5, 10]
c = c[c>0]
# array([1, 1])
# 5's 10's count
If no values are missing between 0 and your_max, you can use np.bincount directly.
Another caveat, from the docs (the input must be non-negative):
Count the number of occurrences of each value in an array of non-negative ints.
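To tie the two caveats together, here is a minimal sketch. It assumes the upper bound (indexs_max, taken here as 12 purely for illustration) is known in advance, as in the question. It uses minlength to fix the output size, np.flatnonzero to keep track of which values actually occur (the c[c>0] filter above drops that information), and shows that negative input is rejected.

```python
import numpy as np

indexs = np.array([5, 10])
indexs_max = 12  # assumed known upper bound, chosen only for this example

# minlength guarantees one slot per value in 0..indexs_max
counts = np.bincount(indexs, minlength=indexs_max + 1)

# Values that occur at least once, paired with their counts
present = np.flatnonzero(counts)
pairs = {int(v): int(counts[v]) for v in present}

# np.bincount only accepts non-negative integers
try:
    np.bincount(np.array([-1, 2]))
    negative_rejected = False
except ValueError:
    negative_rejected = True
```

If the array may contain negative values, shifting it by its minimum before calling np.bincount is one possible workaround.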

Related

Numpy: How to check if a number is the minimum/maximum among the previous K numbers?

I'm trying to automate a trading strategy which should enter/exit a long position when the current price is the minimum/maximum among the previous k prices.
The result should contain 1 if the current number is maximum among previous k numbers, -1 if it is the minimum and 0 if none of the conditions are true.
For example, if k = 3 and the NumPy array is [1, 2, 3, 2, 1, 6], the result should be an array like:
[0, 0, 1, 0, -1, 1].
I tried NumPy's max function, but I don't know how to take the previous k numbers into account instead of a fixed index, or how to fall back to the default for the first k - 1 numbers, which should be 0 since there are not yet k numbers to compare them with.

I would use pandas:
import pandas as pd
array = [1, 2, 3, 2, 1, 6]
df = pd.DataFrame(array)
df['rolling_max'] = df[0].rolling(3).max()
df['rolling_min'] = df[0].rolling(3).min()
df['result'] = df.apply(lambda row: 1 if row[0] == row['rolling_max'] else (-1 if row[0] == row['rolling_min'] else 0), axis=1)

Here is a solution with numpy using numpy.lib.stride_tricks.sliding_window_view, which was introduced in version 1.20.0.
Note that this solution (like the one proposed by @Hanwei Tang) does not yield exactly the result you were looking for: in the second window ([2, 3, 2]) the current value 2 is the minimum, so a -1 is returned instead of the zero you requested. You may want to reconsider whether you really want a zero or a -1 for that window.
EDIT: If a window contains only identical numbers, i.e. the minimum and maximum are the same, this method returns a zero.
import numpy as np
def rolling_max(a, wsize):
    windows = np.lib.stride_tricks.sliding_window_view(a, wsize)
    return np.max(windows, axis=-1)
def rolling_min(a, wsize):
    windows = np.lib.stride_tricks.sliding_window_view(a, wsize)
    return np.min(windows, axis=-1)
def check_prize(a, wsize):
    rmax = rolling_max(a, wsize)
    rmin = rolling_min(a, wsize)
    ismax = np.where(a[wsize-1:] == rmax, 1, 0)
    ismin = np.where(a[wsize-1:] == rmin, -1, 0)
    result = np.zeros_like(a)
    result[wsize-1:] = ismax + ismin
    return result
a = np.array([1, 2, 3, 2, 1, 6])
check_prize(a, wsize=3)
# Output:
# array([ 0, 0, 1, -1, -1, 1])
b = np.array([1, 2, 4, 3, 1, 6])
check_prize(b, wsize=3)
# Output:
# array([ 0, 0, 1, 0, -1, 1])
c = np.array([1, 2, 2, 2, 1, 6])
check_prize(c, wsize=3)
# Output:
# array([ 0, 0, 1, 0, -1, 1])

Another approach using sliding_window_view with pad:
from numpy.lib.stride_tricks import sliding_window_view as swv
k = 3
a = np.array([1, 2, 3, 2, 1, 6])
# create sliding window
v = swv(np.pad(a.astype(float), (k-1, 0), constant_values=np.nan), k)
# compare each element to min/max of sliding window
out = np.select([np.max(v, 1)==a, np.min(v, 1)==a], [1, -1], 0)
Output: array([ 0, 0, 1, -1, -1, 1])
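One difference between the two NumPy answers is how a window of identical values is treated. np.select picks the first matching condition, so the padded version reports 1 for such a window, whereas the ismax + ismin version reports 0. The sketch below wraps the padded approach in a hypothetical helper (minmax_signal and its ties_to_zero flag are names made up for this example) to make that choice explicit. It assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def minmax_signal(a, k, ties_to_zero=False):
    """Hypothetical helper: 1 if a[i] is the max of the last k values,
    -1 if it is the min, 0 otherwise (and for the first k-1 entries)."""
    a = np.asarray(a)
    # Left-pad with NaN so every position has a window of length k
    padded = np.pad(a.astype(float), (k - 1, 0), constant_values=np.nan)
    windows = sliding_window_view(padded, k)
    wmax = windows.max(axis=1)  # NaN for the first k-1 incomplete windows
    wmin = windows.min(axis=1)
    conds = [wmax == a, wmin == a]
    choices = [1, -1]
    if ties_to_zero:
        # Checked first: a constant window is neither a max nor a min signal
        conds.insert(0, wmax == wmin)
        choices.insert(0, 0)
    return np.select(conds, choices, default=0)

c = np.array([1, 2, 2, 2, 1, 6])
first_match = minmax_signal(c, 3)                   # constant window -> 1
tie_aware = minmax_signal(c, 3, ties_to_zero=True)  # constant window -> 0
```

Which behaviour is right depends on how the strategy should treat flat prices; the question does not say.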

How to set numpy array value based on previous value?

Example:
a = np.array([1, 0, 2, 0, 3, 4])
Goal:
Set value to 0 if the previous value is 0
Desired output:
[1, 0, 0, 0, 0, 4]

Make a mask of the locations that are zero:
m = (a == 0)
Apply the mask to a shifted slice of the array:
a[1:][m[:-1]] = 0
In some cases, you may want to shift by incrementing the indices:
i = np.flatnonzero(m[:-1]) + 1
a[i] = 0
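As a small self-contained sketch of the mask approach, the hypothetical function below (zero_after_zero is a name chosen for this example) works on a copy so the input is left untouched. Note that the mask is computed from the original values, so the zeroing does not cascade: an element is cleared only if its predecessor was zero in the input.

```python
import numpy as np

def zero_after_zero(a):
    """Return a copy of a where every element following a zero is set to 0."""
    a = np.asarray(a)
    out = a.copy()
    m = a == 0            # zeros in the ORIGINAL array
    out[1:][m[:-1]] = 0   # out[1:] is a view, so this writes into out
    return out

result = zero_after_zero([1, 0, 2, 0, 3, 4])
no_cascade = zero_after_zero([0, 5, 5])
```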

First group of non-zero values (by neglecting single occurrence of zero)

This is what I intend to do in Python:
I have an array (freq_arr). I want to find the indices of the first group of non-zero elements. I search for non-zero elements from the start; when I find the first one (5, in the example below), I record its index (4). I then search for the next one and record its index (5). If I encounter a single zero, I want to ignore it and continue searching for non-zero values. This way, I consider the values 5, 6, 0, 8, 9, 0, 1 with indices 4, 5, 6, 7, 8, 9 and 10. After these values there are four zeros, so I stop my search. Up to two consecutive zeros may appear in the output while the search continues; however, if I encounter 3 or more zeros, I want to stop searching.
Input:
freq_arr = np.array([0, 0, 0, 0, 5, 6, 0, 8, 9, 0, 1, 0, 0, 0, 0, 3, 6, 0])
Output:
out_arr_indices = [4, 5, 6, 7, 8, 9, 10]
I know how to code this with for loops, but I want to avoid them since they are not efficient. Kindly let me know how this can be done.
The array will be single dimension. Each element will be in the range of 5000 to 20000.

Here's one approach with slicing and argmax (to detect non-zeros and zeros) -
def start_stop_indices(freq_arr, W=3):
    nnz_mask = freq_arr != 0
    start_idx = nnz_mask.argmax()
    m0 = nnz_mask[start_idx:]
    kernel = np.ones(W, dtype=int)
    last_idx = np.convolve(m0, kernel).argmin() + start_idx - W
    return start_idx, last_idx
Sample runs -
In [203]: freq_arr
Out[203]: array([0, 0, 0, 0, 5, 6, 0, 8, 9, 0, 1, 0, 0, 0, 0, 3, 6, 0])
In [204]: start_stop_indices(freq_arr, W=3)
Out[204]: (4, 10)
In [205]: start_stop_indices(freq_arr, W=2)
Out[205]: (4, 10)
In [206]: start_stop_indices(freq_arr, W=1)
Out[206]: (4, 5)
Here's another for the fixed window search of length = 3, avoiding the use of convolution and making more use of slicing -
def start_stop_indices_v2(freq_arr):
    nnz_mask = freq_arr != 0
    start_idx = nnz_mask.argmax()
    m0 = nnz_mask[start_idx:]
    idx0 = (m0[:-2] | m0[1:-1] | m0[2:]).argmin()
    last_idx = idx0 + start_idx - 1
    return start_idx, last_idx
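The question asks for the list of indices rather than a (start, stop) pair. The following is an alternative sketch, not the method above: it looks at the gaps between consecutive non-zero positions instead of sliding a window. The function name first_group_indices and the max_zeros parameter are assumptions made for this example. It also covers the cases where the array has no non-zero element or no long run of zeros.

```python
import numpy as np

def first_group_indices(freq_arr, max_zeros=2):
    """Indices of the first non-zero group, tolerating runs of at most
    max_zeros zeros inside the group."""
    arr = np.asarray(freq_arr)
    nz = np.flatnonzero(arr)            # positions of non-zero elements
    if nz.size == 0:
        return np.array([], dtype=int)
    gaps = np.diff(nz) - 1              # zeros between consecutive non-zeros
    too_wide = np.flatnonzero(gaps > max_zeros)
    # The group ends at the non-zero element just before the first wide gap
    last = nz[too_wide[0]] if too_wide.size else nz[-1]
    return np.arange(nz[0], last + 1)

freq_arr = np.array([0, 0, 0, 0, 5, 6, 0, 8, 9, 0, 1, 0, 0, 0, 0, 3, 6, 0])
out_arr_indices = first_group_indices(freq_arr)
```

With max_zeros=0 the group stops at the first zero, which matches the W=1 sample run above.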

If I understand your problem right, you want to iterate through the list skipping two zeros or less in a row, and add the indices of non-zero values to an output array. Maybe something like below
freq_arr = [0, 0, 5, 6, 0, 8, 9, 0, 1, 0, 0, 0, 0, 3, 6, 0]
outputarr = []
count = 0
zerocount = 0
while count < len(freq_arr) and zerocount < 3:
    if freq_arr[count] == 0:
        zerocount += 1
    else:
        zerocount = 0
        outputarr.append(count)
    count += 1
If you provide more details we might be able to assist better.

Count number of clusters of non-zero values in Python?

My data looks something like this:
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
Essentially, there's a bunch of zeroes before non-zero numbers and I am looking to count the number of groups of non-zero numbers separated by zeros. In the example data above, there are 3 groups of non-zero data so the code should return 3.
Number of zeros between groups of non-zeros is variable
Any good ways to do this in python? (Also using Pandas and Numpy to help parse the data)

With a as the input array, we could have a vectorized solution -
m = a!=0
out = (m[1:] > m[:-1]).sum() + m[0]
Alternatively for performance, we might use np.count_nonzero which is very efficient to count bools as is the case here, like so -
out = np.count_nonzero(m[1:] > m[:-1]) + m[0]
Basically, we get a mask of non-zeros and count rising edges. To account for the first element that could be non-zero too and would not have any rising edge, we need to check it and add to the total sum.
Also, please note that if input a is a list, we need to use m = np.asarray(a)!=0 instead.
Sample runs for three cases -
In [92]: a # Case1 :Given sample
Out[92]:
array([ 0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0,
0, 6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7,
1])
In [93]: m = a!=0
In [94]: (m[1:] > m[:-1]).sum() + m[0]
Out[94]: 3
In [95]: a[0] = 7 # Case2 :Add a non-zero elem/group at the start
In [96]: m = a!=0
In [97]: (m[1:] > m[:-1]).sum() + m[0]
Out[97]: 4
In [99]: a[-2:] = [0,4] # Case3 :Add a non-zero group at the end
In [100]: m = a!=0
In [101]: (m[1:] > m[:-1]).sum() + m[0]
Out[101]: 5
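For convenience, the rising-edge count can be wrapped in a small function. This is only a sketch: the name count_nonzero_groups is made up here, and the handling of list input and of an empty input is an addition not covered by the sample runs.

```python
import numpy as np

def count_nonzero_groups(a):
    """Count runs of consecutive non-zero values in a 1-D sequence."""
    m = np.asarray(a) != 0          # works for lists as well as arrays
    if m.size == 0:
        return 0
    # A rising edge is False followed by True; add 1 if the data starts non-zero
    return int(np.count_nonzero(m[1:] > m[:-1]) + m[0])

a = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
n_groups = count_nonzero_groups(a)
```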

You can achieve this using itertools.groupby() with a list comprehension:
>>> from itertools import groupby
>>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
3

simple python solution, just count changes from 0 to non-zero, by keeping track of the previous value (rising edge detection):
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
previous = 0
count = 0
for c in a:
    if previous==0 and c!=0:
        count+=1
    previous = c
print(count) # 3

pad array with a zero on both sides with np.concatenate
find where zero with a == 0
find boundaries with np.diff
sum up boundaries found with sum
divide by two because we will have found twice as many as we want
def nonzero_clusters(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
demonstration
nonzero_clusters(
[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
)
3
nonzero_clusters([0, 1, 2, 0, 1, 2])
2
nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
2
nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
3
timing
a = np.random.choice((0, 1), 100000)
code
from itertools import groupby
def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]
def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    return count
def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
def user(a):
    return sum([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])

sum ([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
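One thing to watch with this one-liner: it only counts transitions from zero to non-zero, so a group that starts at index 0 is missed. That does not matter for the sample data, which begins with zeros. A possible adjustment is sketched below; both function names are made up for this example.

```python
def rising_edges(a):
    # The one-liner as given: counts zero -> non-zero transitions only
    return sum(1 for n in range(len(a) - 1) if not a[n] and a[n + 1])

def rising_edges_with_start(a):
    # Assumption: a leading non-zero element also starts a group
    return rising_edges(a) + (1 if len(a) and a[0] else 0)

leading_group = [1, 2, 0, 1, 2, 0, 1, 2]
missed = rising_edges(leading_group)
fixed = rising_edges_with_start(leading_group)
```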

Counting same elements in an array and create dictionary

This question might be too noob, but I was still not able to figure out how to do it properly.
I have a given array [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3] (arbitrary elements from 0-5) and I want to count how often each run length of consecutive zeros occurs.
1 times 6 zeros in a row
1 times 4 zeros in a row
2 times 1 zero in a row
=> (2,0,0,1,0,1)
So the dictionary uses the run length n (the number of consecutive zeros) as the key and the number of such runs as the value.
The final array consists of 500+ million values that are unsorted like the one above.

This should get you what you want:
import numpy as np
a = [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3]
# Find indexes of all zeroes
index_zeroes = np.where(np.array(a) == 0)[0]
# Find discontinuities in indexes, denoting separated groups of zeroes
# Note: Adding True at the end because otherwise the last zero is ignored
index_zeroes_disc = np.where(np.hstack((np.diff(index_zeroes) != 1, True)))[0]
# Count the number of zeroes in each group
# Note: Adding 0 at the start so first group of zeroes is counted
count_zeroes = np.diff(np.hstack((0, index_zeroes_disc + 1)))
# Count the number of groups with the same number of zeroes
groups_of_n_zeroes = {}
for count in count_zeroes:
    # dict.has_key() was removed in Python 3; use the "in" operator instead
    if count in groups_of_n_zeroes:
        groups_of_n_zeroes[count] += 1
    else:
        groups_of_n_zeroes[count] = 1
groups_of_n_zeroes holds:
{1: 2, 4: 1, 6: 1}
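Since the real array has 500+ million values, it may be worth avoiding the Python-level loop as well. The sketch below is one possible variant rather than the code above: it finds the run boundaries on a padded boolean mask and lets np.unique do the counting. The function names are made up for this example, and memory use on an array that large has not been measured here.

```python
import numpy as np

def zero_run_lengths(a):
    """Lengths of all runs of consecutive zeros, in order of appearance."""
    is_zero = np.asarray(a) == 0
    # Pad with False so every run has both a start edge and an end edge
    padded = np.concatenate(([False], is_zero, [False]))
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = edges[::2], edges[1::2]
    return ends - starts

def zero_run_histogram(a):
    """Map run length -> number of zero runs of that length."""
    lengths, counts = np.unique(zero_run_lengths(a), return_counts=True)
    return {int(k): int(v) for k, v in zip(lengths, counts)}

a = [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3]
hist = zero_run_histogram(a)
```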

Similar to @fgb's, but with a more numpythonic handling of the counting of the occurrences:
items = np.array([0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3])
group_end_idx = np.concatenate(([-1],
                                np.nonzero(np.diff(items == 0))[0],
                                [len(items)-1]))
group_len = np.diff(group_end_idx)
zero_lens = group_len[::2] if items[0] == 0 else group_len[1::2]
counts = np.bincount(zero_lens)
>>> counts[1:]
array([2, 0, 0, 1, 0, 1], dtype=int64)

This seems awfully complicated, but I can't seem to find anything better:
>>> l = [0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 2, 1, 0, 2, 3]
>>> import itertools
>>> seq = [len(list(j)) for i, j in itertools.groupby(l) if i == 0]
>>> seq
[6, 4, 1, 1]
>>> import collections
>>> counter = collections.Counter(seq)
>>> [counter.get(i, 0) for i in range(1, max(counter) + 1)]
[2, 0, 0, 1, 0, 1]
