Count number of clusters of non-zero values in Python?


My data looks something like this:
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
Essentially, there's a bunch of zeroes before non-zero numbers and I am looking to count the number of groups of non-zero numbers separated by zeros. In the example data above, there are 3 groups of non-zero data so the code should return 3.
The number of zeros between groups of non-zero values is variable.
Any good ways to do this in python? (Also using Pandas and Numpy to help parse the data)

With a as the input array, we could have a vectorized solution -
m = a!=0
out = (m[1:] > m[:-1]).sum() + m[0]
Alternatively for performance, we might use np.count_nonzero which is very efficient to count bools as is the case here, like so -
out = np.count_nonzero(m[1:] > m[:-1]) + m[0]
Basically, we get a mask of non-zeros and count rising edges. To account for the first element that could be non-zero too and would not have any rising edge, we need to check it and add to the total sum.
Also, please note that if input a is a list, we need to use m = np.asarray(a)!=0 instead.
Sample runs for three cases -
In [92]: a # Case1 :Given sample
Out[92]:
array([ 0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0,
0, 6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7,
1])
In [93]: m = a!=0
In [94]: (m[1:] > m[:-1]).sum() + m[0]
Out[94]: 3
In [95]: a[0] = 7 # Case2 :Add a non-zero elem/group at the start
In [96]: m = a!=0
In [97]: (m[1:] > m[:-1]).sum() + m[0]
Out[97]: 4
In [99]: a[-2:] = [0,4] # Case3 :Add a non-zero group at the end
In [100]: m = a!=0
In [101]: (m[1:] > m[:-1]).sum() + m[0]
Out[101]: 5
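The rising-edge count used in the sample runs can be wrapped into a small function. This is only a sketch for illustration: the name count_nonzero_clusters and the empty-input guard are assumptions added here, not part of the original answer.

```python
import numpy as np

def count_nonzero_clusters(values):
    """Count runs of consecutive non-zero values (hypothetical helper name)."""
    m = np.asarray(values) != 0                  # True where the value is non-zero
    if m.size == 0:                              # assumption: empty input has no clusters
        return 0
    rising = np.count_nonzero(m[1:] > m[:-1])    # False -> True transitions
    return int(rising + m[0])                    # a leading non-zero run has no rising edge

a = [0, 0, 10, 15, 0, 0, 6, 9, 0, 4]
print(count_nonzero_clusters(a))                 # 3
```

Converting with np.asarray inside the function means it accepts both lists and arrays.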

You can do this with itertools.groupby() and a list comprehension:
>>> from itertools import groupby
>>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
3
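To see why this works, it may help to look at what groupby produces for a short input. The list below is made up for illustration; the key function is the same x != 0 test as above.

```python
from itertools import groupby

a = [0, 0, 5, 6, 0, 7]   # made-up sample

# Each run of equal keys becomes one group; the key is True for non-zero runs.
runs = [(key, len(list(group))) for key, group in groupby(a, key=lambda x: x != 0)]
print(runs)    # [(False, 2), (True, 2), (False, 1), (True, 1)]

# Counting the True keys gives the number of non-zero clusters.
count = sum(1 for key, _ in groupby(a, key=lambda x: x != 0) if key)
print(count)   # 2
```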

A simple pure-Python solution: just count the changes from 0 to non-zero by keeping track of the previous value (rising edge detection):
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
previous = 0
count = 0
for c in a:
    if previous==0 and c!=0:
        count+=1
    previous = c
print(count) # 3

pad array with a zero on both sides with np.concatenate
find where zero with a == 0
find boundaries with np.diff
sum up boundaries found with sum
divide by two because we will have found twice as many as we want
def nonzero_clusters(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
demonstration
nonzero_clusters(
[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
)
3
nonzero_clusters([0, 1, 2, 0, 1, 2])
2
nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
2
nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
3
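The pad / compare / diff / halve steps can be followed one at a time on a short input. This is a sketch with intermediate names (padded, is_zero, boundaries) chosen here for illustration; it relies on np.diff comparing neighbours with "not equal" when given a boolean array.

```python
import numpy as np

a = [1, 2, 0, 1, 2, 0, 1, 2]

padded = np.concatenate([[0], a, [0]])   # zero on both sides
is_zero = padded == 0                    # True where the padded value is zero
boundaries = np.diff(is_zero)            # True wherever zero/non-zero status flips

print(is_zero.astype(int).tolist())      # [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]
print(boundaries.astype(int).tolist())   # [1, 0, 1, 1, 0, 1, 1, 0, 1]

# Every cluster contributes one entry edge and one exit edge.
clusters = int(boundaries.sum()) // 2
print(clusters)                          # 3
```

Because of the padding, every cluster is guaranteed to have both edges, which is why the total is always even.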
timing
a = np.random.choice((0, 1), 100000)
code
from itertools import groupby

def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]

def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous==0 and c!=0:
            count+=1
        previous = c
    return count

def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])

def user(a):
    return sum([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])

sum ([1 for n in range (len (a) - 1) if not a[n] and a[n + 1]])
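This expression counts positions where a zero is followed by a non-zero value, so a cluster that starts at index 0 is not counted (the sample data starts with zeros, so it still gives 3 there). The sketch below shows one way to cover that case for plain lists; the function name and the leading-element check are additions, not part of the original answer.

```python
def count_clusters(a):
    """Count zero -> non-zero transitions, plus a cluster starting at index 0."""
    if len(a) == 0:                      # assumption: empty input has no clusters
        return 0
    rising = sum(1 for n in range(len(a) - 1) if not a[n] and a[n + 1])
    return rising + (1 if a[0] else 0)   # leading cluster has no preceding zero

print(count_clusters([0, 0, 4, 3, 0, 9]))  # 2
print(count_clusters([7, 0, 3]))           # 2 (the original expression gives 1)
```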

Related

Neater strategy to drop down the numbers in the 2D list

I have a problem. It is a 2D list of non-negative integers will be given like
0, 0, 2, 0, 1
0, 2, 1, 1, 0
3, 0, 2, 1, 0
0, 0, 0, 0, 0
I have to drop each number down by its own value: the 1's move down 1 row, the 2's down 2 rows, the 3's down 3 rows, and so on. If a number can't be moved down far enough, it wraps around to the top. (E.g. in the matrix above, the 3 in the second-to-last row moves down 3 rows and wraps around to the second row.) If two numbers map to the same slot, the biggest number takes that slot.
After this transformation the given matrix above will end up like:
0, 0, 2, 0, 0
3, 0, 0, 0, 1
0, 0, 2, 1, 0
0, 2, 0, 1, 0
Here's my trivial solution to the problem (Assumes a list l is pre-set):
new = [[0] * len(l[0]) for _ in range(len(l))]
idx = sorted([((n + x) % len(l), m, x) for n, y in enumerate(l) for m, x in enumerate(y)], key=lambda e: e[2])
for x, y, z in idx:
    new[x][y] = z
print(new)
The strategy is:
Build a list new with 0s of the shape of l
Save the new indices of each number in l and each number as tuple pairs in idx
Sort idx by each number
Assign indices from idx to the respective numbers to new list
Print new
I am not satisfied with this strategy. Is there a neater/better way to do this? I can use numpy.

Let's say you have
a = np.array([
[0,0,2,0,1],
[0,2,1,1,0],
[3,0,2,1,0],
[0,0,0,0,0]])
You can get the locations of the elements with np.where or np.nonzero:
r, c = np.nonzero(a)
And the elements themselves with the index:
v = a[r, c]
Incrementing the row is simple now:
new_r = (r + v) % a.shape[0]
To settle collisions, sort the arrays so that large values come last:
i = v.argsort()
Now you can assign to a fresh matrix of zeros directly:
result = np.zeros_like(a)
result[new_r[i], c[i]] = v[i]
The result is
[[0 0 2 0 0]
[3 0 0 0 1]
[0 0 2 1 0]
[0 2 0 1 0]]
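The sort in the steps above exists so that, on a collision, the larger value is written last. As an alternative sketch (a swapped-in technique, not the answer's method), np.maximum.at resolves collisions by taking the maximum explicitly, so no sorting is needed. The function name drop_down is hypothetical.

```python
import numpy as np

def drop_down(grid):
    """Move each non-zero value down by its own value, wrapping at the bottom."""
    a = np.asarray(grid)
    r, c = np.nonzero(a)                   # positions of the non-zero entries
    v = a[r, c]                            # the values at those positions
    new_r = (r + v) % a.shape[0]           # target rows, with wrap-around
    result = np.zeros_like(a)
    np.maximum.at(result, (new_r, c), v)   # on collisions, keep the larger value
    return result

a = [[0, 0, 2, 0, 1],
     [0, 2, 1, 1, 0],
     [3, 0, 2, 1, 0],
     [0, 0, 0, 0, 0]]
print(drop_down(a).tolist())
# [[0, 0, 2, 0, 0], [3, 0, 0, 0, 1], [0, 0, 2, 1, 0], [0, 2, 0, 1, 0]]
```

This works here because the inputs are non-negative and the result starts as zeros, so the maximum is always one of the moved values.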

I suggest doing it like this if only because it's more readable :-
L = [[0, 0, 2, 0, 1],
     [0, 2, 1, 1, 0],
     [3, 0, 2, 1, 0],
     [0, 0, 0, 0, 0]]
R = len(L)
NL = [[0]*len(L[0]) for _ in range(R)]
for i, r in enumerate(L):
    for j, c in enumerate(r):
        _r = (c + i) % R
        if c > NL[_r][j]:
            NL[_r][j] = c
print(NL)

Numpy: Optimal way to count indexs occurrence in an array

I have an array indexs. It's very long (>10k), and each int value is rather small (<100). e.g.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0]) # int index array
indexs_max = 4 # already known
Now I want to count occurrence of each index value (e.g. 0 for 3 times, 1 for 2 times...), and get counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods as follows:
UPDATE: _test4 is @Ch3steR's solution:
indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9
def _test1():
    counts = np.zeros((indexs_max + 1, ), dtype=np.int32)
    for ind in indexs:
        counts[ind] += 1
    return counts

def _test2():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    # this is because some value in range may be missing
    return counts

def _test3():
    therange = np.arange(0, indexs_max + 1)
    counts = np.sum(indexs[None] == therange[:, None], axis=1)
    return counts

def _test4():
    return np.bincount(indexs, minlength=indexs_max+1)
Over 500 runs, they take 32.499472856521606s, 0.31386804580688477s, 0.14069509506225586s and 0.017721891403198242s respectively. Although _test3 is the fastest of my three original methods, it uses a lot of additional memory.
So I'm asking for any better methods. Thank u :) (@Ch3steR)
UPDATE: np.bincount seems optimal so far.

You can use np.bincount to count the occurrences in an array.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs)
# array([3, 2, 1, 1, 1])
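# counts of 0's, 1's, 2's, 3's and 4's, in that order
There's a caveat: np.bincount(x).size == np.amax(x)+1, so the result has one slot for every integer from 0 up to the maximum, whether or not it occurs.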
Example:
indexs = np.array([5, 10])
np.bincount(indexs)
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
# positions 5 and 10 hold the counts of 5's and 10's; every other slot is 0
Here it counts occurrences of every value from 0 to the max in the array. A workaround can be
c = np.bincount(indexs) # indexs is [5, 10]
c = c[c>0]
# array([1, 1])
# counts of 5's and 10's (c alone no longer says which values they belong to)
If no value between 0 and your_max is missing, you can use np.bincount directly.
Another caveat:
From docs:
Count the number of occurrences of each value in an array of non-negative ints.
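Both caveats can be seen in a few lines. In this sketch the minlength value of 12 and the names present and pairs are arbitrary choices for illustration; np.nonzero is used to keep track of which values the non-zero counts belong to.

```python
import numpy as np

indexs = np.array([5, 10])

counts = np.bincount(indexs, minlength=12)   # pad the result to at least 12 slots
present = np.nonzero(counts)[0]              # the values that actually occur
pairs = dict(zip(present.tolist(), counts[present].tolist()))
print(pairs)                                 # {5: 1, 10: 1}

try:
    np.bincount(np.array([-1, 2]))           # negative values are rejected
except ValueError as exc:
    print("ValueError:", exc)
```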

Dynamic way to compute linear constraints with multiple operators

Imagine a matrix A whose first column holds inequality/equality operators (≥, =, ≤) and a vector b, where the number of rows in A equals the number of elements in b. In my setting, one row would then be evaluated as, e.g.
dot(A[0, 1:], x) ≥ b[0]
where x is some vector, column A[,0] holds all the operators, and we'd know that row 0 is supposed to be evaluated with the ≥ operator (i.e. A[0,0] == "≥" is true). Now, is there a way to evaluate all rows dynamically, in the following so-far imaginary way
dot(A[, 1:], x) A[, 0] b
My hope was for a dynamic evaluation of each row where we evaluate which operator is used for each row.
Example, let
A = [
[">=", -2, 1, 1],
[">=", 0, 1, 0],
["==", 0, 1, 1]
]
b = [0, 1, 1]
and x be some given vector, e.g. x = [1,1,0] we wish to compute as following
A[,1:] x A[,0] b
dot([-2, 1, 1], [1, 1, 0]) >= 0
dot([0, 1, 0], [1, 1, 0]) >= 1
dot([0, 1, 1], [1, 1, 0]) == 1
The output would be [False, True, True]

If I understand correctly, this is a way to do that operation:
import numpy as np
# Input data
a = [
[">=", -2, 1, 1],
[">=", 0, 1, 0],
["==", 0, 1, 1]
]
b = np.array([0, 1, 1])
x = np.array([1, 1, 0])
# Split in comparison and data
a0 = np.array([lst[0] for lst in a])
a1 = np.array([lst[1:] for lst in a])
# Compute dot product
c = a1 @ x
# Compute comparisons
leq = c <= b
eq = c == b
geq = c >= b
# Find comparison index for each row
cmps = np.array(["<=", "==", ">="]) # This array is lex sorted
cmp_idx = np.searchsorted(cmps, a0)
# Select the right result for each row
result = np.choose(cmp_idx, [leq, eq, geq])
# Convert to numeric type if preferred
result = result.astype(np.int32)
print(result)
# [0 1 1]
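A simpler, less vectorized variant is to map each operator string to a function from the standard operator module and apply it row by row. This is a sketch under the assumption that only ">=", "<=" and "==" occur; the names OPS and check_constraints are made up for illustration.

```python
import operator
import numpy as np

# Assumed set of operator strings; extend as needed.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq}

def check_constraints(A, b, x):
    """Evaluate dot(row[1:], x) <op> b[i] for every row, with <op> taken from row[0]."""
    lhs = np.array([row[1:] for row in A]) @ np.asarray(x)
    return [bool(OPS[row[0]](value, bound)) for row, value, bound in zip(A, lhs, b)]

A = [
    [">=", -2, 1, 1],
    [">=", 0, 1, 0],
    ["==", 0, 1, 1],
]
print(check_constraints(A, [0, 1, 1], [1, 1, 0]))   # [False, True, True]
```

The per-row Python loop is slower than the np.choose approach for large systems, but an unknown operator string raises a KeyError instead of being silently mapped to a neighbouring comparison.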

Python - Convert the array in a tuple to just a normal array

I have a signal where I want to find the average height of the values. This is done by finding the zero crossings and calculating the max and min between each zero crossing, then averaging these values.
My problem occurs when I want to use np.where() to find where the signal is crossing zero. When I use np.where() I get the result in a tuple, but I want it in an array where I can count the amount of times zero is crossed.
I am new to Python and coming from Matlab it is a bit confusing with all the different classes. As you can see, I get an error because nu = len(zero_u) gives 1 as a result, because the whole array is written in a tuple as one element.
Any ideas how to go around this?
The code looks like this:
import numpy as np

def averageheight(f):
    rms = np.std(f)
    f = f + (rms * 10**-6)
    # Find zero crossing
    fsign = np.sign(f)
    fdiff = np.diff(fsign)
    zero_u = np.asarray(np.where(fdiff > 0)) + 1
    zero_d = np.asarray(np.where(fdiff < 0)) + 1
    nu = len(zero_u)
    nd = len(zero_d)
    value_max = np.zeros((nu, 1))
    value_min = np.zeros((nu, 1))
    imaxvec = np.zeros((nu, 1))
    iminvec = np.zeros((nu, 1))
    if (nu > 2) and (nd > 2):
        if zero_u[0] > zero_d[0]:
            zero_d[0] = []
        nu = len(zero_u)
        nd = len(zero_d)
        ncross = np.fmin(nu, nd)
        # Find Maxima:
        for ic in range(0, ncross - 1):
            up = int(zero_u[ic])
            down = int(zero_d[ic])
            fvec = f[up:down]
            value_max[ic] = np.amax(fvec)
            index_max = value_max.argmax()
            imaxvec[ic] = up + index_max - 1
        # Find Minima:
        for ic in range(0, ncross - 2):
            down = int(zero_d[ic])
            up = int(zero_u[ic+1])
            fvec = f[down:up]
            value_min[ic] = np.amin(fvec)
            index_min = value_min.argmin()
            iminvec[ic] = down + index_min - 1
        # Remove spurious values, bumps and zero_d
        thr = rms/3
        maxfind = np.where(value_max < thr)
        for i in range(0, len(maxfind)):
            imaxfind = np.where(value_max == maxfind[i])
            imaxvec[imaxfind] = 0
            value_max[imaxfind] = 0
        minfind = np.where(value_min > -thr)
        for j in range(0, len(minfind)):
            iminfind = np.where(value_min == minfind[j])
            value_min[iminfind] = 0
            iminvec[iminfind] = 0
        # Find Average Height
        avh = np.mean(value_max) - np.mean(value_min)
    else:
        avh = 0
    return avh

The documentation for np.where, and for np.nonzero even more so, clearly explains that it returns a tuple, with one array for each dimension of the condition array:
In [71]: arr = np.random.randint(-5,5,10)
In [72]: arr
Out[72]: array([ 3, 4, 2, -3, -1, 0, -5, 4, 2, -3])
In [73]: arr.shape
Out[73]: (10,)
In [74]: np.where(arr>=0)
Out[74]: (array([0, 1, 2, 5, 7, 8]),)
In [75]: arr[_]
Out[75]: array([3, 4, 2, 0, 4, 2])
That Out[74] tuple can be used directly as an index.
You can also extract the array from the tuple:
In [76]: np.where(arr>=0)[0]
Out[76]: array([0, 1, 2, 5, 7, 8])
That, I think is a better choice than the np.asarray(np.where(...))
This convention for where becomes clearer when we use it on a 2d array
In [77]: arr2 = arr.reshape(2,5)
In [78]: np.where(arr2>=0)
Out[78]: (array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 2, 3]))
In [79]: arr2[_]
Out[79]: array([3, 4, 2, 0, 4, 2])
Again we are indexing with a tuple. arr2[1,3] is really arr2[(1,3)]. The values in [] indexing brackets are actually passed to the indexing function as a tuple of values.
np.argwhere applies transpose to the result of where, producing an array:
In [80]: np.transpose(np.where(arr2>=0))
Out[80]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 2],
[1, 3]])
That's the same indexing arrays, but arranged in a 2d column matrix.
If you need the count of where without the actual values, a slightly faster function is
In [81]: np.count_nonzero(arr>=0)
Out[81]: 6
In fact np.nonzero uses the count to first determine the size of the arrays that it will return.
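Applied to the zero-crossing code in the question, the difference between wrapping the tuple and indexing into it looks like this. The signal f below is made up for illustration, and np.flatnonzero is shown as an equivalent shortcut for the 1-D case.

```python
import numpy as np

f = np.array([-1.0, 2.0, 3.0, -2.0, -1.0, 4.0, -3.0])   # made-up signal
fdiff = np.diff(np.sign(f))

wrapped = np.asarray(np.where(fdiff > 0)) + 1   # shape (1, n), so len() is 1
zero_u = np.where(fdiff > 0)[0] + 1             # 1-D array of upward crossings
zero_d = np.flatnonzero(fdiff < 0) + 1          # same idea for downward crossings

print(wrapped.shape, len(wrapped))   # (1, 2) 1
print(zero_u.tolist())               # [1, 5]
print(zero_d.tolist())               # [3, 6]
print(len(zero_u), len(zero_d))      # 2 2
```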

Count number of tails since the last head

Consider a sequence of coin tosses: 1, 0, 0, 1, 0, 1 where tail = 0 and head = 1.
The desired output is the sequence: 0, 1, 2, 0, 1, 0
Each element of the output sequence counts the number of tails since the last head.
I have tried a naive method:
def timer(seq):
    if seq[0] == 1: time = [0]
    if seq[0] == 0: time = [1]
    for x in seq[1:]:
        if x == 0: time.append(time[-1] + 1)
        if x == 1: time.append(0)
    return time
Question: Is there a better method?

Using NumPy:
import numpy as np
seq = np.array([1,0,0,1,0,1,0,0,0,0,1,0])
arr = np.arange(len(seq))
result = arr - np.maximum.accumulate(arr * seq)
print(result)
yields
[0 1 2 0 1 0 1 2 3 4 0 1]
Why arr - np.maximum.accumulate(arr * seq)? The desired output seemed related to a simple progression of integers:
arr = np.arange(len(seq))
So the natural question is, if seq = np.array([1, 0, 0, 1, 0, 1]) and the expected result is expected = np.array([0, 1, 2, 0, 1, 0]), then what value of x makes
arr + x = expected
Since
In [220]: expected - arr
Out[220]: array([ 0, 0, 0, -3, -3, -5])
it looks like x should be the negative of the cumulative max of arr * seq:
In [234]: arr * seq
Out[234]: array([0, 0, 0, 3, 0, 5])
In [235]: np.maximum.accumulate(arr * seq)
Out[235]: array([0, 0, 0, 3, 3, 5])
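One edge case is worth checking: when the sequence starts with a tail, arr - np.maximum.accumulate(arr * seq) reports 0 for that first element, whereas the naive loop in the question starts at 1. The sketch below is a variant, not the answer's code: it marks "no head seen yet" with -1 so that leading tails are counted the way the question's loop counts them. The function name tails_since_head is made up.

```python
import numpy as np

def tails_since_head(seq):
    """Number of tails (0) since the last head (1), counting leading tails from 1."""
    seq = np.asarray(seq)
    idx = np.arange(len(seq))
    # Index of the most recent head; -1 acts as a virtual head before the start.
    last_head = np.maximum.accumulate(np.where(seq == 1, idx, -1))
    return idx - last_head

print(tails_since_head([1, 0, 0, 1, 0, 1]).tolist())   # [0, 1, 2, 0, 1, 0]
print(tails_since_head([0, 0, 1, 0]).tolist())         # [1, 2, 0, 1]
```

For sequences that start with a head, both formulas give the same result.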

Step 1: Invert l:
In [311]: l = [1, 0, 0, 1, 0, 1]
In [312]: out = [int(not i) for i in l]; out
Out[312]: [0, 1, 1, 0, 1, 0]
Step 2: List comp; add previous value to current value if current value is 1.
In [319]: [out[0]] + [x + y if y else y for x, y in zip(out[:-1], out[1:])]
Out[319]: [0, 1, 2, 0, 1, 0]
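This gets rid of windy ifs by zipping adjacent elements. Note that it only looks one element back, so it gives the right counts only when no run of tails is longer than two (as in the example above).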

Using itertools.accumulate:
>>> from itertools import accumulate
>>> a = [1, 0, 0, 1, 0, 1]
>>> b = [1 - x for x in a]
>>> list(accumulate(b, lambda total,e: total+1 if e==1 else 0))
[0, 1, 2, 0, 1, 0]
accumulate is only defined in Python 3. The itertools documentation gives equivalent pure-Python code, though, if you want to use it in Python 2.
It's required to invert a because the first element returned by accumulate is the first list element, independently from the accumulator function:
>>> list(accumulate(a, lambda total,e: 0))
[1, 0, 0, 0, 0, 0]

The required output is an array with the same length as the input and none of the values are equal to the input. Therefore, the algorithm must be at least O(n) to form the new output array. Furthermore for this specific problem, you would also need to scan all the values for the input array. All these operations are O(n) and it will not get any more efficient. Constants may differ but your method is already in O(n) and will not go any lower.

Using reduce:
from functools import reduce  # reduce is not a builtin in Python 3
time = reduce(lambda l, r: l + [(l[-1]+1)*(not r)], seq, [0])[1:]

I try to be clear in the following code and differ from the original in using an explicit accumulator.
>>> s = [1,0,0,1,0,1,0,0,0,0,1,0]
>>> def zero_run_length_or_zero(seq):
        "Return the run length of zeroes so far in the sequence, or zero"
        accumulator, answer = 0, []
        for item in seq:
            accumulator = 0 if item == 1 else accumulator + 1
            answer.append(accumulator)
        return answer
>>> zero_run_length_or_zero(s)
[0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 0, 1]
>>>
