Vectorizing calculation in matrix with interdependent values - python

I am tracking multiple discrete time-series at multiple temporal resolutions, resulting in an SxRxB matrix where S is the number of time-series, R is the number of different resolutions and B is the buffer, i.e. how many values each series remembers. Each series is discrete and uses a limited range of natural numbers to represent its values. I will call these "symbols" here.
For each series I want to calculate how often any of the previous measurement's symbols directly precedes any of the current measurement's symbols, over all measurements. I have solved this with a for-loop as seen below, but would like to vectorize it for obvious reasons.
I'm not sure whether my way of structuring the data is efficient, so I'm open to suggestions there. In particular, I think the ratios matrix could be laid out differently.
Thanks in advance!
import itertools
import numpy as np

def supports_loop(data, num_series, resolutions, buffer_size, vocab_size):
    # For small test matrices we can calculate the complete matrix without problems
    indices = []
    indices.append(xrange(num_series))
    indices.append(xrange(vocab_size))
    indices.append(xrange(num_series))
    indices.append(xrange(vocab_size))
    indices.append(xrange(resolutions))

    # This is huge! :/
    # dimensions:
    #   series and value for which we calculate,
    #   series and value which precedes that measurement,
    #   resolution
    ratios = np.full((num_series, vocab_size, num_series, vocab_size, resolutions), 0.0)

    for idx in itertools.product(*indices):
        s0, v0 = idx[0], idx[1]  # the series and symbol for which we calculate
        s1, v1 = idx[2], idx[3]  # the series and symbol which should precede the one we're calculating for
        res = idx[4]

        # Find the positions where s0==v0
        found0 = np.where(data[s0, res, :] == v0)[0]
        if found0.size == 0:
            continue
        #print('found {}={} at {}'.format(s0, v0, found0))

        # Check how often s1==v1 right before s0==v0
        candidates = (s1, res, (found0 - 1 + buffer_size) % buffer_size)
        found01 = np.count_nonzero(data[candidates] == v1)
        if found01 == 0:
            continue
        print('found {}={} following {}={} at {}'.format(s0, v0, s1, v1, found01))

        # total01 = number of positions where either s0 or s1 is defined (i.e. >=0)
        total01 = len(np.argwhere((data[s0, res, :] >= 0) & (data[s1, res, :] >= 0)))
        ratio = (float(found01) / total01) if total01 > 0 else 0.0
        ratios[idx] = ratio

    return ratios
def stackoverflow_example(fnc):
    data = np.array([
        [[0, 0, 1],   # series 0, resolution 0
         [1, 3, 2]],  # series 0, resolution 1
        [[2, 1, 2],   # series 1, resolution 0
         [3, 3, 3]],  # series 1, resolution 1
        ])

    num_series = data.shape[0]
    resolutions = data.shape[1]
    buffer_size = data.shape[2]
    vocab_size = np.max(data)+1

    ratios = fnc(data, num_series, resolutions, buffer_size, vocab_size)

    coordinates = np.argwhere(ratios > 0.0)
    nz_values = ratios[ratios > 0.0]
    print(np.hstack((coordinates, nz_values[:,None])))
    print('0/0 precedes 0/0 in 1 out of 3 cases: {}'.format(np.isclose(ratios[0,0,0,0,0], 1.0/3.0)))
    print('1/2 precedes 0/0 in 2 out of 3 cases: {}'.format(np.isclose(ratios[0,0,1,2,0], 2.0/3.0)))
Expected output (21 pairs, 5 columns for coordinates, followed by found count):
[[0 0 0 0 0 1]
[0 0 0 1 0 1]
[0 0 1 2 0 2]
[0 1 0 0 0 1]
[0 1 0 2 1 1]
[0 1 1 1 0 1]
[0 1 1 3 1 1]
[0 2 0 3 1 1]
[0 2 1 3 1 1]
[0 3 0 1 1 1]
[0 3 1 3 1 1]
[1 1 0 0 0 1]
[1 1 1 2 0 1]
[1 2 0 0 0 1]
[1 2 0 1 0 1]
[1 2 1 1 0 1]
[1 2 1 2 0 1]
[1 3 0 1 1 1]
[1 3 0 2 1 1]
[1 3 0 3 1 1]
[1 3 1 3 1 3]]
In the example above the 0 in series 0 follows a 2 in series 1 in two out of three cases (since the buffers are circular), so the ratio at [0, 0, 1, 2, 0] will be ~0.6666. Also series 0, value 0 follows itself in one out of three cases, so the ratio at [0, 0, 0, 0, 0] will be ~0.3333. There are some others which are >0.0 as well.
I am testing each answer on two datasets: a tiny one (as shown above) and a more realistic one (100 series, 5 resolutions, 10 values per series, 50 symbols).
Results
Answer      Time (tiny)   Time (huge)   All pairs found (tiny=21)
-----------------------------------------------------------------
Baseline    ~1ms          ~675s (!)     Yes
Saedeas     ~0.13ms       ~1.4ms        No (!)
Saedeas2    ~0.20ms       ~4.0ms        Yes, +cross resolutions
Elliot_1    ~0.70ms       ~100s (!)     Yes
Elliot_2    ~1ms          ~21s (!)      Yes
Kuppern_1   ~0.39ms       ~2.4s (!)     Yes
Kuppern_2   ~0.18ms       ~28ms         Yes
Kuppern_3   ~0.19ms       ~24ms         Yes
David       ~0.21ms       ~27ms         Yes
Saedeas 2nd approach is the clear winner! Thank you so much, all of you :)
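For reference, a rough sketch of the harness used for such a comparison (my own scaffolding, so treat it as illustrative; the sizes follow the "huge" dataset described above and supports_loop is the baseline defined earlier):

import timeit
import numpy as np

S, R, B, V = 100, 5, 10, 50   # 100 series, 5 resolutions, buffer of 10, 50 symbols
huge = np.random.randint(V, size=(S, R, B))

for fnc in (supports_loop,):  # add the other implementations to this tuple to compare
    t = timeit.timeit(lambda: fnc(huge, S, R, B, V), number=1)
    print('{}: {:.3f}s'.format(fnc.__name__, t))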

To start, you're doing yourself a bit of a disservice by not explicitly nesting the for loops. You wind up repeating a lot of effort and not saving anything in terms of memory. When the loop is nested, you can move some of the computations from one level to another and figure out which inner loops can be vectorized over.
def supports_5_loop(data, num_series, resolutions, buffer_size, vocab_size):
    ratios = np.full((num_series, vocab_size, num_series, vocab_size, resolutions), 0.0)
    for res in xrange(resolutions):
        for s0 in xrange(num_series):
            # Find the positions where s0==v0
            for v0 in np.unique(data[s0, res]):
                # only need to find indices once for each series and value
                found0 = np.where(data[s0, res, :] == v0)[0]
                for s1 in xrange(num_series):
                    # Check how often s1==v1 right before s0==v0
                    candidates = (s1, res, (found0 - 1 + buffer_size) % buffer_size)
                    total01 = np.logical_or(data[s0, res, :] >= 0, data[s1, res, :] >= 0).sum()
                    # can skip inner loops if there are no candidates
                    if total01 == 0:
                        continue
                    for v1 in xrange(vocab_size):
                        found01 = np.count_nonzero(data[candidates] == v1)
                        if found01 == 0:
                            continue
                        ratio = (float(found01) / total01)
                        ratios[(s0, v0, s1, v1, res)] = ratio
    return ratios
You'll see in the timings that the majority of the speed pickup comes from not duplicating effort.
Once you've made the nested structure, you can start looking at vectorizations and other optimizations.
def supports_4_loop(data, num_series, resolutions, buffer_size, vocab_size):
    # For small test matrices we can calculate the complete matrix without problems
    # This is huge! :/
    # dimensions:
    #   series and value for which we calculate,
    #   series and value which precedes that measurement,
    #   resolution
    ratios = np.full((num_series, vocab_size, num_series, vocab_size, resolutions), 0.0)
    for res in xrange(resolutions):
        for s0 in xrange(num_series):
            # find the counts where either s0 or s1 are present
            total01 = np.logical_or(data[s0, res] >= 0,
                                    data[:, res] >= 0).sum(axis=1)
            s1s = np.where(total01)[0]
            # Find the positions where s0==v0
            v0s, counts = np.unique(data[s0, res], return_counts=True)
            # sorting before searching will show gains as the datasets
            # get larger
            indarr = np.argsort(data[s0, res])
            i0 = 0
            for v0, count in itertools.izip(v0s, counts):
                found0 = indarr[i0:i0+count]
                i0 += count
                for s1 in s1s:
                    candidates = data[(s1, res, (found0 - 1) % buffer_size)]
                    # can replace the innermost loop with numpy functions
                    v1s, counts = np.unique(candidates, return_counts=True)
                    ratios[s0, v0, s1, v1s, res] = counts / total01[s1]
    return ratios
Unfortunately I could only really vectorize over the innermost loop, and that only bought an additional 10% speedup. Outside of the innermost loop you can't guarantee that all the vectors are the same size, so you can't build an array.
In [121]: (np.all(supports_loop(data, num_series, resolutions, buffer_size, vocab_size) == supports_5_loop(data, num_series, resolutions, buffer_size, vocab_size)))
Out[121]: True
In [122]: (np.all(supports_loop(data, num_series, resolutions, buffer_size, vocab_size) == supports_4_loop(data, num_series, resolutions, buffer_size, vocab_size)))
Out[122]: True
In [123]: %timeit(supports_loop(data, num_series, resolutions, buffer_size, vocab_size))
2.29 ms ± 73.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [124]: %timeit(supports_5_loop(data, num_series, resolutions, buffer_size, vocab_size))
949 µs ± 5.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [125]: %timeit(supports_4_loop(data, num_series, resolutions, buffer_size, vocab_size))
843 µs ± 3.21 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

If I'm understanding your problem correctly, I think this bit of code will get you the symbol pairs you're looking for in a relatively quick, vectorized fashion.
import numpy as np
import time
from collections import Counter

series = 2
resolutions = 2
buffer_len = 3
symbols = range(3)

#mat = np.random.choice(symbols, size=(series, resolutions, buffer_len)).astype('uint8')
mat = np.array([
    [[0, 0, 1],   # series 0, resolution 0
     [1, 3, 2]],  # series 0, resolution 1
    [[2, 1, 2],   # series 1, resolution 0
     [3, 3, 3]],  # series 1, resolution 1
    ])

start = time.time()

index_mat = np.indices(mat.shape)
right_shift_indices = np.roll(index_mat, -1, axis=3)
mat_shifted = mat[right_shift_indices[0], right_shift_indices[1], right_shift_indices[2]]

# These construct all the pairs directly
first_series = np.repeat(range(series), series*resolutions*buffer_len)
second_series = np.tile(np.repeat(range(series), resolutions*buffer_len), series)
res_loop = np.tile(np.repeat(range(resolutions), buffer_len), series*series)
mat_unroll = np.repeat(mat, series, axis=0)
shift_unroll = np.tile(mat_shifted, series)

# Constructs the pairs
pairs = zip(np.ravel(first_series),
            np.ravel(second_series),
            np.ravel(res_loop),
            np.ravel(mat_unroll),
            np.ravel(shift_unroll))

pair_time = time.time() - start
results = Counter(pairs)
end = time.time() - start

print("Mat: {}".format(mat))
print("Pairs: {}".format(results))
print("Number of Pairs: {}".format(len(pairs)))
print("Pair time is: {}".format(pair_time))
print("Count time is: {}".format(end - pair_time))
print("Total time is: {}".format(end))
The basic idea was to circularly shift each buffer by the appropriate amount depending on which time series it was (I think this is what your current code was doing). I can then generate all the symbol pairs by simply zipping lists offset by 1 together along the series axis.
Example output:
Mat: [[[0 0 1]
[1 3 2]]
[[2 1 2]
[3 3 3]]]
Pairs: Counter({(1, 1, 1, 3, 3): 3, (1, 0, 0, 2, 0): 2, (0, 0, 0, 0, 0): 1, (1, 1, 0, 2, 2): 1, (1, 1, 0, 2, 1): 1, (0, 1, 0, 0, 2): 1, (1, 0, 1, 3, 3): 1, (0, 0, 1, 1, 3): 1, (0, 0, 1, 3, 2): 1, (1, 0, 0, 1, 1): 1, (0, 1, 0, 0, 1): 1, (0, 1, 1, 2, 3): 1, (0, 1, 0, 1, 2): 1, (1, 1, 0, 1, 2): 1, (0, 1, 1, 3, 3): 1, (1, 0, 1, 3, 2): 1, (0, 0, 0, 0, 1): 1, (0, 1, 1, 1, 3): 1, (0, 0, 1, 2, 1): 1, (0, 0, 0, 1, 0): 1, (1, 0, 1, 3, 1): 1})
Number of Pairs: 24
Pair time is: 0.000135183334351
Count time is: 5.10215759277e-05
Total time is: 0.000186204910278
Edit: True final attempt. Fully vectorized.

A trick that makes this vectorizable is to build, for each pair of series, an array comb[i] = buffer1[i] + buffer2[i-1]*voc_size. Each (current, previous) symbol combination then gets a unique value in the array, and the combination can be recovered with v1[i] = comb[i] % voc_size and v2[i] = comb[i] // voc_size. As long as the number of series is not very high (<10000, I think) there is no point in vectorising any further.
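A tiny illustration of the encoding (my own toy numbers, not part of the answer's code): each pair of a current symbol and its circular predecessor maps to one integer, and modulo/integer division recover the two symbols.

import numpy as np

voc_size = 4
cur  = np.array([0, 0, 1])    # one buffer
prev = np.roll(cur, 1)        # circular predecessor of each position
comb = cur + prev * voc_size  # unique code per (current, previous) pair
print(comb)                   # [4 0 1]
print(comb % voc_size)        # current symbols:  [0 0 1]
print(comb // voc_size)       # previous symbols: [1 0 0]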
def support_vectorized(data, num_series, resolutions, buffer_size, vocab_size):
    ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))
    prev = np.roll(data, 1, axis=2)  # Get previous values
    prev *= vocab_size               # To separate prev from data
    for i, series in enumerate(data):
        for j, prev_series in enumerate(prev):
            comb = series + prev_series
            for k, buffer in enumerate(comb):
                idx, counts = np.unique(buffer, return_counts=True)
                v = idx % vocab_size
                v2 = idx // vocab_size
                ratios[i, v, j, v2, k] = counts/buffer_size
    return ratios
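For example, on the tiny array from the question this should reproduce the ratio checked there (a quick sanity check of mine, assuming Python 3 division so counts/buffer_size gives floats):

import numpy as np

data = np.array([[[0, 0, 1], [1, 3, 2]],
                 [[2, 1, 2], [3, 3, 3]]])
ratios = support_vectorized(data, num_series=2, resolutions=2, buffer_size=3, vocab_size=4)
print(np.isclose(ratios[0, 0, 1, 2, 0], 2.0/3.0))  # True: a 2 in series 1 precedes a 0 in series 0 in 2 of 3 cases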
If however S or R is large, a full vectorization is possible but this uses a lot of memory:
def row_unique(comb):
    comb.sort(axis=-1)
    changes = np.concatenate((
        np.ones((comb.shape[0], comb.shape[1], comb.shape[2], 1), dtype="bool"),
        comb[:, :, :, 1:] != comb[:, :, :, :-1]), axis=-1)
    vals = comb[changes]
    idxs = np.nonzero(changes)
    tmp = np.hstack((idxs[-1], 0))
    counts = np.where(tmp[1:], np.diff(tmp), comb.shape[-1]-tmp[:-1])
    return idxs, vals, counts

def supports_full_vectorized(data, num_series, resolutions, buffer_size, vocab_size):
    ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))
    prev = np.roll(data, 1, axis=2)*vocab_size
    comb = data + prev[:, None]            # Create every combination
    idxs, vals, counts = row_unique(comb)  # Get unique values and counts for each row
    ratios[idxs[1], vals % vocab_size, idxs[0], vals // vocab_size, idxs[2]] = counts/buffer_size
    return ratios
However, for S=100 this is slower than the previous solution. A middle ground is to keep a for loop over the series to reduce the memory usage:
def row_unique2(comb):
    comb.sort(axis=-1)
    changes = np.concatenate((
        np.ones((comb.shape[0], comb.shape[1], 1), dtype="bool"),
        comb[:, :, 1:] != comb[:, :, :-1]), axis=-1)
    vals = comb[changes]
    idxs = np.nonzero(changes)
    tmp = np.hstack((idxs[-1], 0))
    counts = np.where(tmp[1:], np.diff(tmp), comb.shape[-1]-tmp[:-1])
    return idxs, vals, counts

def supports_half_vectorized(data, num_series, resolutions, buffer_size, vocab_size):
    prev = np.roll(data, 1, axis=2)*vocab_size
    ratios = np.zeros((num_series, vocab_size, num_series, vocab_size, resolutions))
    for i, series in enumerate(data):
        comb = series + prev
        idxs, vals, counts = row_unique2(comb)
        ratios[i, vals % vocab_size, idxs[0], vals // vocab_size, idxs[1]] = counts/buffer_size
    return ratios
The running times for the different solutions show that support_half_vectorized is the fastest
In [41]: S, R, B, voc_size = (100, 5, 1000, 29)
In [42]: data = np.random.randint(voc_size, size=S*R*B).reshape((S, R, B))
In [43]: %timeit support_vectorized(data, S, R, B, voc_size)
1 loop, best of 3: 4.84 s per loop
In [44]: %timeit supports_full_vectorized(data, S, R, B, voc_size)
1 loop, best of 3: 5.3 s per loop
In [45]: %timeit supports_half_vectorized(data, S, R, B, voc_size)
1 loop, best of 3: 4.36 s per loop
In [46]: %timeit supports_4_loop(data, S, R, B, voc_size)
1 loop, best of 3: 36.7 s per loop

So this is kind of a cop-out answer, but I've been working with @Saedeas's answer and, based on timings on my machine, have been able to optimize it slightly. I do believe there is a way to do this without the loop, but the size of the intermediate array may be prohibitive.
The change I have made is to remove the concatenation that happened at the end of the run() function. It was creating a new array and is unnecessary. Instead we create the full-size array at the beginning and just don't use the last row until the end.
Another change is that the tiling of single was slightly inefficient. I have replaced this with very slightly faster code.
I do believe this can be made faster, but it would take some work. I was testing with larger sizes, so please let me know what timings you get on your machine.
Code is below:
import numpy as np
import logging
import sys
import time
import itertools
import timeit

logging.basicConfig(stream=sys.stdout,
                    level=logging.DEBUG,
                    format='%(message)s')

def run():
    series = 2
    resolutions = 2
    buffer_len = 3
    symbols = range(50)

    #mat = np.random.choice(symbols, size=(series, resolutions, buffer_len))
    mat = np.array([
        [[0, 0, 1],   # series 0, resolution 0
         [1, 3, 2]],  # series 0, resolution 1
        [[2, 1, 2],   # series 1, resolution 0
         [3, 3, 3]],  # series 1, resolution 1
        # [[4, 5, 6, 10],
        #  [7, 8, 9, 11]],
        ])

    # logging.debug("Original:")
    # logging.debug(mat)

    start = time.time()
    index_mat = np.indices((series, resolutions, buffer_len))

    # This loop shifts all series but the one being looked at, and zips the
    # element being looked at with every other member of that row
    cross_pairs = np.empty((series, resolutions, buffer_len, series, 2), int)
    #cross_pairs = []
    right_shift_indices = [index_mat[0], index_mat[1], (index_mat[2] - 1) % buffer_len]

    for i in range(series):
        right_shift_indices[2][i] = (right_shift_indices[2][i] + 1) % buffer_len

        # create a new matrix from the modified indices
        mat_shifted = mat[right_shift_indices]
        mat_shifted_t = mat_shifted.T.reshape(-1, series)
        single = mat_shifted_t[:, i]

        #print np.tile(single,(series-1,1)).T
        #print single.reshape(-1,1).repeat(series-1,1)
        #print single.repeat(series-1).reshape(-1,series-1)
        mat_shifted_t = np.delete(mat_shifted_t, i, axis=1)

        #cross_pairs[i,:,:,:-1] = (np.dstack((np.tile(single, (mat_shifted_t.shape[1], 1)).T, mat_shifted_t))).reshape(resolutions, buffer_len, (series-1), 2, order='F')
        #cross_pairs[i,:,:,:-1] = (np.dstack((single.reshape(-1,1).repeat(series-1,1), mat_shifted_t))).reshape(resolutions, buffer_len, (series-1), 2, order='F')
        cross_pairs[i,:,:,:-1] = np.dstack((single.repeat(series-1).reshape(-1,series-1), mat_shifted_t)).reshape(resolutions, buffer_len, (series-1), 2, order='F')

        right_shift_indices[2][i] = (right_shift_indices[2][i] - 1) % buffer_len
        #cross_pairs.extend([zip(itertools.repeat(x[i]), np.append(x[:i], x[i+1:])) for x in mat_shifted_t])

    #consecutive_pairs = np.empty((series, resolutions, buffer_len, 2, 2), int)
    #print "1", consecutive_pairs.shape

    # tedious code to put this stuff in the right shape
    in_series_zips = np.stack([mat[:, :, :-1], mat[:, :, 1:]], axis=3)
    circular_in_series_zips = np.stack([mat[:, :, -1], mat[:, :, 0]], axis=2)

    # This creates the final array.
    # Index 0 is the preceding series
    # Index 1 is the resolution
    # Index 2 is the location in the buffer
    # Index 3 is for the first n-1 elements, the following series, and for the last element
    # it's the next element of the Index 0 series
    # Index 4 is the index into the two element pair
    cross_pairs[:,:,:-1,-1] = in_series_zips
    cross_pairs[:,:,-1,-1] = circular_in_series_zips

    end = time.time()

    #logging.debug("Pairs encountered:")
    #logging.debug(pairs)
    logging.info("Elapsed: {}".format(end - start))

if __name__ == '__main__':
    run()

Related

Numpy: Optimal way to count indexs occurrence in an array

I have an array indexs. It's very long (>10k), and each int value is rather small (<100). e.g.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0]) # int index array
indexs_max = 4 # already known
Now I want to count the occurrence of each index value (e.g. 0 occurs 3 times, 1 occurs 2 times, ...) and get the counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods as follows:
UPDATE: _test4 is @Ch3steR's solution:
indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9

def _test1():
    counts = np.zeros((indexs_max + 1, ), dtype=np.int32)
    for ind in indexs:
        counts[ind] += 1
    return counts

def _test2():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    # this is because some value in range may be missing
    return counts

def _test3():
    therange = np.arange(0, indexs_max + 1)
    counts = np.sum(indexs[None] == therange[:, None], axis=1)
    return counts

def _test4():
    return np.bincount(indexs, minlength=indexs_max+1)
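For reference, a minimal sketch of how the 500-run comparison below could be scripted (my own scaffolding; exact numbers will of course differ per machine):

from timeit import timeit

for f in (_test1, _test2, _test3, _test4):
    print(f.__name__, timeit(f, number=500))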
Run for 500 times each, their time usages are respectively 32.50s, 0.314s, 0.141s and 0.0177s. Although _test3 is the fastest of my original three methods, it uses a lot of additional memory.
So I'm asking for any better methods. Thank you :) (@Ch3steR)
UPDATE: np.bincount seems optimal so far.
You can use np.bincount to count the occurrences in an array.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs)
# array([3, 2, 1, 1, 1])
# 0's 1's 2's 3's 4's count
There's a caveat: np.bincount(x).size == np.amax(x)+1
Example:
indexs = np.array([5, 10])
np.bincount(indexs)
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
# 5's 10's count
Here it would count occurrences of every value from 0 up to the max in the array; a workaround can be:
c = np.bincount(indexs) # indexs is [5, 10]
c = c[c>0]
# array([1, 1])
# 5's 10's count
If there are no missing values, i.e. every value from 0 to your max is present, you can use np.bincount directly.
Another caveat:
From docs:
Count the number of occurrences of each value in an array of non-negative ints.
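Note that np.bincount also accepts a minlength argument (used in _test4 above), which pads the result so that trailing values with zero counts are still represented; a quick illustration of mine:

import numpy as np

indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs, minlength=8)
# array([3, 2, 1, 1, 1, 0, 0, 0])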

Slice multiple frame of numpy array with multiple y1:y2, x1:x2

I have a numpy array of multiple frames (multiple_frames) and I want to slice the height and width of each frame with different y1, y2, x1, x2 to draw a square of 1s in each frame.
slice_yyxx is a numpy array containing one array of [y1, y2, x1, x2] per frame.
slice_yyxx = np.array(slice_yyxx).astype(int)
nbr_frame = slice_yyxx.shape[0]
multiple_frames = np.zeros(shape=(nbr_frame, target_shape[0], target_shape[1], target_shape[2]))
print(multiple_frames.shape)
# (5, 384, 640, 1)
print(slice_yyxx)
# Value ok
print(slice_yyxx.shape)
# (5, 4)
# i.e. 5 arrays of coords like [y1, y2, x1, x2], one to slice each frame
print(slice_yyxx.dtype)
# np.int64
multiple_frames[:, slice_yyxx[:,0]:slice_yyxx[:,1], slice_yyxx[:,2]:slice_yyxx[:,3]] = 1
# ERROR: TypeError: only integer scalar arrays can be converted to a scalar index
The real question here is how to convert arbitrary slices into something you can use across multiple dimensions without looping. I would posit that the trick is to use a clever combination of fancy indexing, arange, and repeat.
The goal is to create an array of row and column indices that corresponds to each dimension. Let's take a simple case that is easy to visualize: a 3-frame set of 3x3 matrices, where we want to assign 1 to the upper-left 2x2 sub-array of the first frame, the lower-right 2x2 sub-array of the second frame, and the entire last frame:
multi_array = np.zeros((3, 3, 3))
slice_rrcc = np.array([[0, 2, 0, 2], [1, 3, 1, 3], [0, 3, 0, 3]])
Let's come up with the indices that match each one, as well as the sizes and shapes:
nframes = slice_rrcc.shape[0] # 3
nrows = np.diff(slice_rrcc[:, :2], axis=1).ravel() # [2, 2, 3]
ncols = np.diff(slice_rrcc[:, 2:], axis=1).ravel() # [2, 2, 3]
sizes = nrows * ncols # [4, 4, 9]
We need the following fancy indices to be able to do the assignment:
frame_index = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2])
row_index = np.array([0, 0, 1, 1, 1, 1, 2, 2, 0, 0, 0, 1, 1, 1, 2, 2, 2])
col_index = np.array([0, 1, 0, 1, 1, 2, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2])
If we can obtain the arrays frame_index, row_index, and col_index, we can set the data for each segment as follows:
multi_array[frame_index, row_index, col_index] = 1
frame_index is easy to obtain:
frame_index = np.repeat(np.arange(nframes), sizes)
row_index takes a bit more work. You need to generate a set of nrows indices for each individual frame, and repeat them ncols times. You can do this by generating a continuous range and restarting the count at each frame using subtraction:
row_range = np.arange(nrows.sum())
row_offsets = np.zeros_like(row_range)
row_offsets[np.cumsum(nrows[:-1])] = nrows[:-1]
row_index = row_range - np.cumsum(row_offsets) + np.repeat(slice_rrcc[:, 0], nrows)
segments = np.repeat(ncols, nrows)
row_index = np.repeat(row_index, segments)
col_index will be less trivial still. You need to generate a sequence for each row with the right offset, and repeat it in chunks for each row, and then for each frame. The approach is similar to that for row_index, with an additional fancy index to get the order right:
col_index_index = np.arange(sizes.sum())
col_index_resets = np.cumsum(segments[:-1])
col_index_offsets = np.zeros_like(col_index_index)
col_index_offsets[col_index_resets] = segments[:-1]
col_index_offsets[np.cumsum(sizes[:-1])] -= ncols[:-1]
col_index_index -= np.cumsum(col_index_offsets)
col_range = np.arange(ncols.sum())
col_offsets = np.zeros_like(col_range)
col_offsets[np.cumsum(ncols[:-1])] = ncols[:-1]
col_index = col_range - np.cumsum(col_offsets) + np.repeat(slice_rrcc[:, 2], ncols)
col_index = col_index[col_index_index]
Using this formulation, you can even step it up and specify a different value for each frame. If you wanted to assign values = [1, 2, 3] to the frames in my example, just do
multi_array[frame_index, row_index, col_index] = np.repeat(values, sizes)
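As a quick sanity check of mine (reusing multi_array and slice_rrcc from above, and assuming values = np.array([1, 2, 3]) was defined before the assignment), the fancy-indexed result should match a plain per-frame loop:

expected = np.zeros((3, 3, 3))
for f, (r1, r2, c1, c2) in enumerate(slice_rrcc):
    expected[f, r1:r2, c1:c2] = values[f]
print(np.array_equal(multi_array, expected))  # True if the index construction is correct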
We'll see if there is a more efficient way to do this. One part I asked about is here.
Benchmark
A comparison of your loop vs my vectorized solution for nframes in {10, 100, 1000} and width and height of multi_array in {100, 1000, 10000}:
def set_slices_loop(arr, slice_rrcc):
    for a, s in zip(arr, slice_rrcc):
        a[s[0]:s[1], s[2]:s[3]] = 1

np.random.seed(0xABCDEF)
for nframes in [10, 100, 1000]:
    for dim in [10, 32, 100]:
        print(f'Size = {nframes}x{dim}x{dim}')
        arr = np.zeros((nframes, dim, dim), dtype=int)
        slice = np.zeros((nframes, 4), dtype=int)
        slice[:, ::2] = np.random.randint(0, dim - 1, size=(nframes, 2))
        slice[:, 1::2] = np.random.randint(slice[:, ::2] + 1, dim, size=(nframes, 2))
        %timeit set_slices_loop(arr, slice)
        arr[:] = 0
        %timeit set_slices(arr, slice)
The results are overwhelmingly in favor of the loop, with the only exception of very large numbers of frames and small frame sizes. Most "normal" cases are an order of magnitude faster with looping:
Looping

        |           Dimension           |
        |   100   |  1000   |  10000  |
--------+---------+---------+---------+
 F   10 | 33.8 µs | 35.8 µs | 43.4 µs |
 r -----+---------+---------+---------+
 a  100 | 310 µs  | 331 µs  | 401 µs  |
 m -----+---------+---------+---------+
 e 1000 | 3.09 ms | 3.31 ms | 4.27 ms |
--------+---------+---------+---------+

Vectorized

        |           Dimension           |
        |   100   |  1000   |  10000  |
--------+---------+---------+---------+
 F   10 | 225 µs  | 266 µs  | 545 µs  |
 r -----+---------+---------+---------+
 a  100 | 312 µs  | 627 µs  | 4.11 ms |
 m -----+---------+---------+---------+
 e 1000 | 1.07 ms | 4.63 ms | 48.5 ms |
--------+---------+---------+---------+
TL;DR
Can be done, but not recommended:
def set_slices(arr, slice_rrcc, values):
    nframes = slice_rrcc.shape[0]
    nrows = np.diff(slice_rrcc[:, :2], axis=1).ravel()
    ncols = np.diff(slice_rrcc[:, 2:], axis=1).ravel()
    sizes = nrows * ncols
    segments = np.repeat(ncols, nrows)
    frame_index = np.repeat(np.arange(nframes), sizes)
    row_range = np.arange(nrows.sum())
    row_offsets = np.zeros_like(row_range)
    row_offsets[np.cumsum(nrows[:-1])] = nrows[:-1]
    row_index = row_range - np.cumsum(row_offsets) + np.repeat(slice_rrcc[:, 0], nrows)
    row_index = np.repeat(row_index, segments)
    col_index_index = np.arange(sizes.sum())
    col_index_resets = np.cumsum(segments[:-1])
    col_index_offsets = np.zeros_like(col_index_index)
    col_index_offsets[col_index_resets] = segments[:-1]
    col_index_offsets[np.cumsum(sizes[:-1])] -= ncols[:-1]
    col_index_index -= np.cumsum(col_index_offsets)
    col_range = np.arange(ncols.sum())
    col_offsets = np.zeros_like(col_range)
    col_offsets[np.cumsum(ncols[:-1])] = ncols[:-1]
    col_index = col_range - np.cumsum(col_offsets) + np.repeat(slice_rrcc[:, 2], ncols)
    col_index = col_index[col_index_index]
    if values.size == 1:
        arr[frame_index, row_index, col_index] = values
    else:
        arr[frame_index, row_index, col_index] = np.repeat(values, sizes)
This is a benchmarking post using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions.
We are benchmarking set_slices from @Mad Physicist's solution with arr[frame_index, row_index, col_index] = 1, and set_slices_loop without any changes, to get runtimes in seconds.
np.random.seed(0xABCDEF)
in_ = {}
for nframes in [10, 100, 1000]:
    for dim in [10, 32, 100]:
        arr = np.zeros((nframes, dim, dim), dtype=int)
        slice = np.zeros((nframes, 4), dtype=int)
        slice[:, ::2] = np.random.randint(0, dim - 1, size=(nframes, 2))
        slice[:, 1::2] = np.random.randint(slice[:, ::2] + 1, dim, size=(nframes, 2))
        in_[(nframes, dim)] = [arr, slice]

import benchit
funcs = [set_slices, set_slices_loop]
t = benchit.timings(funcs, in_, input_name=['NumFrames', 'Dim'], multivar=True)
t.plot(sp_argID=1, logx=True, save='timings.png')

Find position of maximum per unique bin (binargmax)

Setup
Suppose I have
bins = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
vals = np.array([8, 7, 3, 4, 1, 2, 6, 5, 0, 9])
k = 3
I need the position of maximal values by unique bin in bins.
# Bin == 0
#  ↓ ↓           ↓
# [0 0 1 1 2 2 2 0 1 2]
# [8 7 3 4 1 2 6 5 0 9]
#  ↑ ↑           ↑
#  ⇧
# [0 1 2 3 4 5 6 7 8 9]
# Maximum is 8 and happens at position 0
(vals * (bins == 0)).argmax()
0

# Bin == 1
#      ↓ ↓         ↓
# [0 0 1 1 2 2 2 0 1 2]
# [8 7 3 4 1 2 6 5 0 9]
#      ↑ ↑         ↑
#        ⇧
# [0 1 2 3 4 5 6 7 8 9]
# Maximum is 4 and happens at position 3
(vals * (bins == 1)).argmax()
3

# Bin == 2
#          ↓ ↓ ↓     ↓
# [0 0 1 1 2 2 2 0 1 2]
# [8 7 3 4 1 2 6 5 0 9]
#          ↑ ↑ ↑     ↑
#                    ⇧
# [0 1 2 3 4 5 6 7 8 9]
# Maximum is 9 and happens at position 9
(vals * (bins == 2)).argmax()
9
Those functions are hacky and aren't even generalizable for negative values.
Question
How do I get all such values in the most efficient manner using Numpy?
What I've tried.
def binargmax(bins, vals, k):
    out = -np.ones(k, np.int64)
    trk = np.empty(k, vals.dtype)
    trk.fill(np.nanmin(vals) - 1)
    for i in range(len(bins)):
        v = vals[i]
        b = bins[i]
        if v > trk[b]:
            trk[b] = v
            out[b] = i
    return out

binargmax(bins, vals, k)
array([0, 3, 9])
LINK TO TESTING AND VALIDATION
The numpy_indexed library:
I know this isn't technically numpy, but the numpy_indexed library has a vectorized group_by function which is perfect for this, just wanted to share as an alternative I use frequently:
>>> import numpy_indexed as npi
>>> npi.group_by(bins).argmax(vals)
(array([0, 1, 2]), array([0, 3, 9], dtype=int64))
Using a simple pandas groupby and idxmax:
df = pd.DataFrame({'bins': bins, 'vals': vals})
df.groupby('bins').vals.idxmax()
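For the sample bins and vals above, this should give the per-bin positions (my own check, shown as the underlying array):

df.groupby('bins').vals.idxmax().values
# array([0, 3, 9])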
Using a sparse.csr_matrix
This option is very fast on very large inputs.
sparse.csr_matrix(
    (vals, bins, np.arange(vals.shape[0]+1)), (vals.shape[0], k)
).argmax(0)
# matrix([[0, 3, 9]])
Performance
Functions
def chris(bins, vals, k):
    return npi.group_by(bins).argmax(vals)

def chris2(df):
    return df.groupby('bins').vals.idxmax()

def chris3(bins, vals, k):
    sparse.csr_matrix((vals, bins, np.arange(vals.shape[0] + 1)), (vals.shape[0], k)).argmax(0)

def divakar(bins, vals, k):
    mx = vals.max()+1
    sidx = bins.argsort()
    sb = bins[sidx]
    sm = np.r_[sb[:-1] != sb[1:], True]
    argmax_out = np.argsort(bins*mx + vals)[sm]
    max_out = vals[argmax_out]
    return max_out, argmax_out

def divakar2(bins, vals, k):
    last_idx = np.bincount(bins).cumsum()-1
    scaled_vals = bins*(vals.max()+1) + vals
    argmax_out = np.argsort(scaled_vals)[last_idx]
    max_out = vals[argmax_out]
    return max_out, argmax_out

def user545424(bins, vals, k):
    return np.argmax(vals*(bins == np.arange(bins.max()+1)[:,np.newaxis]),axis=-1)

def user2699(bins, vals, k):
    res = []
    for v in np.unique(bins):
        idx = (bins==v)
        r = np.where(idx)[0][np.argmax(vals[idx])]
        res.append(r)
    return np.array(res)

def sacul(bins, vals, k):
    return np.lexsort((vals, bins))[np.append(np.diff(np.sort(bins)), 1).astype(bool)]

@njit
def piRSquared(bins, vals, k):
    out = -np.ones(k, np.int64)
    trk = np.empty(k, vals.dtype)
    trk.fill(np.nanmin(vals))
    for i in range(len(bins)):
        v = vals[i]
        b = bins[i]
        if v > trk[b]:
            trk[b] = v
            out[b] = i
    return out
Setup
import numpy_indexed as npi
import numpy as np
import pandas as pd
from timeit import timeit
import matplotlib.pyplot as plt
from numba import njit
from scipy import sparse

res = pd.DataFrame(
    index=['chris', 'chris2', 'chris3', 'divakar', 'divakar2', 'user545424', 'user2699', 'sacul', 'piRSquared'],
    columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000, 500000],
    dtype=float
)

k = 5

for f in res.index:
    for c in res.columns:
        bins = np.random.randint(0, k, c)
        k = 5
        vals = np.random.rand(c)
        df = pd.DataFrame({'bins': bins, 'vals': vals})
        stmt = '{}(df)'.format(f) if f in {'chris2'} else '{}(bins, vals, k)'.format(f)
        setp = 'from __main__ import bins, vals, k, df, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
Results
Results with a much larger k (This is where broadcasting gets hit hard):
res = pd.DataFrame(
    index=['chris', 'chris2', 'chris3', 'divakar', 'divakar2', 'user545424', 'user2699', 'sacul', 'piRSquared'],
    columns=[10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000, 500000],
    dtype=float
)

k = 500

for f in res.index:
    for c in res.columns:
        bins = np.random.randint(0, k, c)
        vals = np.random.rand(c)
        df = pd.DataFrame({'bins': bins, 'vals': vals})
        stmt = '{}(df)'.format(f) if f in {'chris2'} else '{}(bins, vals, k)'.format(f)
        setp = 'from __main__ import bins, vals, df, k, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=50)

ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
As is apparent from the graphs, broadcasting is a nifty trick when the number of groups is small; however, the time complexity and memory use of broadcasting increase too fast at higher k values to make it highly performant.
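To get a rough feel for the memory cost (my own illustrative numbers, not from the benchmark above): the broadcast comparison materialises a k-by-N boolean mask before the multiply, so memory grows with both k and N.

import numpy as np

k, n = 500, 100000
bins = np.random.randint(0, k, n)
mask = bins == np.arange(k)[:, np.newaxis]  # shape (k, n) boolean array
print(mask.shape, mask.nbytes)              # (500, 100000), 50000000 bytes (~50 MB)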
Here's one way, by offsetting each group's data so that we can use argsort on the entire data in one go:
def binargmax_scale_sort(bins, vals):
    w = np.bincount(bins)
    valid_mask = w != 0
    last_idx = w[valid_mask].cumsum() - 1
    scaled_vals = bins*(vals.max()+1) + vals
    #unique_bins = np.flatnonzero(valid_mask) # if needed
    return len(bins) - 1 - np.argsort(scaled_vals[::-1], kind='mergesort')[last_idx]
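On the sample bins and vals from the question this should reproduce the reference result (a quick check of mine):

binargmax_scale_sort(bins, vals)
# array([0, 3, 9])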
Okay, here's my linear-time entry, using only indexing and np.maximum.at / np.minimum.at. It assumes bins go up from 0 to max(bins).
def via_at(bins, vals):
    max_vals = np.full(bins.max()+1, -np.inf)
    np.maximum.at(max_vals, bins, vals)
    expanded = max_vals[bins]
    max_idx = np.full_like(max_vals, np.inf)
    np.minimum.at(max_idx, bins, np.where(vals == expanded, np.arange(len(bins)), np.inf))
    return max_vals, max_idx
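On the question's sample data this should return the per-bin maxima and their positions, as floats because of the np.inf sentinels (a quick check of mine):

max_vals, max_idx = via_at(bins, vals)
# max_vals -> array([8., 4., 9.]),  max_idx -> array([0., 3., 9.])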
How about this:
>>> import numpy as np
>>> bins = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2])
>>> vals = np.array([8, 7, 3, 4, 1, 2, 6, 5, 0, 9])
>>> k = 3
>>> np.argmax(vals*(bins == np.arange(k)[:,np.newaxis]),axis=-1)
array([0, 3, 9])
If you're going for readability, this might not be the best solution, but I think it works
def binargsort(bins, vals):
    s = np.lexsort((vals, bins))
    s2 = np.sort(bins)
    msk = np.roll(s2, -1) != s2
    # or use this for msk, but not noticeably better for performance:
    # msk = np.append(np.diff(np.sort(bins)), 1).astype(bool)
    return s[msk]

binargsort(bins, vals)
array([0, 3, 9])
Explanation:
lexsort sorts the indices of vals according to the sorted order of bins, then by the order of vals:
>>> np.lexsort((vals,bins))
array([7, 1, 0, 8, 2, 3, 4, 5, 6, 9])
So then you can mask that by where sorted bins differ from one index to the next:
>>> np.sort(bins)
array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
# Find where sorted bins end, use that as your mask on the `lexsort`
>>> np.append(np.diff(np.sort(bins)),1)
array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1])
>>> np.lexsort((vals,bins))[np.append(np.diff(np.sort(bins)),1).astype(bool)]
array([0, 3, 9])
This is a fun little problem to solve. My approach is to get an index into vals based on the values in bins. Using where to get the points where the index is True, in combination with argmax on those points in vals, gives the resulting value.
def binargmaxA(bins, vals):
    res = []
    for v in unique(bins):
        idx = (bins==v)
        r = where(idx)[0][argmax(vals[idx])]
        res.append(r)
    return array(res)
It's possible to remove the call to unique by using range(k) to get possible bin values. This speeds things up, but still leaves it with poor performance as the size of k increases.
def binargmaxA2(bins, vals, k):
    res = []
    for v in range(k):
        idx = (bins==v)
        r = where(idx)[0][argmax(vals[idx])]
        res.append(r)
    return array(res)
Last try, comparing each value slows things down substantially. This version computes the sorted array of values, rather than making a comparison for each unique value. Well, it actually computes the sorted indices and only gets the sorted values when needed, as that avoids one time loading vals into memory. Performance still scales with the number of bins, but much slower than before.
def binargmaxB(bins, vals):
    idx = argsort(bins)  # Find sorted indices
    split = r_[0, where(diff(bins[idx]))[0]+1, len(bins)]  # Compute where values start in sorted array
    newmax = [argmax(vals[idx[i1:i2]]) for i1, i2 in zip(split, split[1:])]  # Find max for each value in sorted array
    return idx[newmax + split[:-1]]  # Convert to indices in unsorted array
Benchmarks
Here's some benchmarks with the other answers.
3000 elements
With a somewhat larger dataset (bins = randint(0, 30, 3000); vals = randn(3000); k=30;)
171us binargmax_scale_sort2 by Divakar
209us this answer, version B
281us binargmax_scale_sort by Divakar
329us broadcast version by user545424
399us this answer, version A
416us answer by sacul, using lexsort
899us reference code by piRsquared
30000 elements
And an even larger dataset (bins = randint(0, 30, 30000); vals = randn(30000); k=30). Surprisingly this doesn't change the relative performance between solutions.
1.27ms this answer, version B
2.01ms binargmax_scale_sort2 by Divakar
2.38ms broadcast version by user545424
2.68ms this answer, version A
5.71ms answer by sacul, using lexsort
9.12ms reference code by piRSquared
Edit: I didn't change k with the increasing number of possible bin values; now that I've fixed that, the benchmarks are more even.
1000 bin values
Increasing the number of unique bin values may also have an impact on performance. The solutions by Divakar and sacul are mostly unaffected, while the others are affected quite substantially.
bins = randint(0, 1000, 30000); vals = randn(30000); k = 1000
1.99ms binargmax_scale_sort2 by Divakar
3.48ms this answer, version B
6.15ms answer by sacul, using lexsort
10.6ms reference code by piRsquared
27.2ms this answer, version A
129ms broadcast version by user545424
Edit: I included benchmarks for the reference code in the question; it's surprisingly competitive, especially with more bins.
I know you said to use Numpy, but if Pandas is acceptable:
import numpy as np; import pandas as pd;
(pd.DataFrame(
    {'bins': np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 2]),
     'values': np.array([8, 7, 3, 4, 1, 2, 6, 5, 0, 9])})
 .groupby('bins')
 .idxmax())

      values
bins
0          0
1          3
2          9

Compute the cumulative sum of a list until a zero appears

I have a (long) list in which zeros and ones appear at random:
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
I want to get list_b, which should contain:
- the cumulative sum of the list up to where a 0 appears
- a 0 wherever a 0 appears in list_a
list_b = [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
I can implement this as follows:
list_b = []
for i, x in enumerate(list_a):
    if x == 0:
        list_b.append(x)
    else:
        sum_value = 0
        for j in list_a[i::-1]:
            if j != 0:
                sum_value += j
            else:
                break
        list_b.append(sum_value)
print(list_b)
but the actual list is very long.
So I want to improve the code for speed (even if it becomes less readable).
I changed the code like this:
from itertools import takewhile
list_c = [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]
print(list_c)
But it is not fast enough. How can I do it in a more efficient way?
You're overthinking this.
Option 1
You can just iterate over the indices and update accordingly (computing the cumulative sum), based on whether the current value is 0 or not.
data = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

for i in range(1, len(data)):
    if data[i]:
        data[i] += data[i - 1]
That is, if the current element is non-zero, then update the element at the current index as the sum of the current value, plus the value at the previous index.
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Note that this updates your list in place. You can create a copy in advance if you don't want that - new_data = data.copy() and iterate over new_data in the same manner.
Option 2
You can use the pandas API if you need performance. Find groups based on the placement of 0s, and use groupby + cumsum to compute group-wise cumulative sums, similar to above:
import pandas as pd
s = pd.Series(data)
data = s.groupby(s.eq(0).cumsum()).cumsum().tolist()
print(data)
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Performance
First, the setup -
data = data * 100000
s = pd.Series(data)
Next,
%%timeit
new_data = data.copy()
for i in range(1, len(data)):
    if new_data[i]:
        new_data[i] += new_data[i - 1]
328 ms ± 4.09 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
And, timing the copy separately,
%timeit data.copy()
8.49 ms ± 17.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So, the copy doesn't really take much time. Finally,
%timeit s.groupby(s.eq(0).cumsum()).cumsum().tolist()
122 ms ± 1.69 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
The pandas approach is conceptually linear (just like the other approaches) but faster by a constant degree because of the implementation of the library.
If you want a compact native Python solution that is probably the most memory efficient, although not the fastest (see the comments), you could draw extensively from itertools:
>>> from itertools import groupby, accumulate, chain
>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
The steps here are: group the list into sublists based on presence of 0 (which is falsy), take the cumulative sum of the values within each sublist, flatten the sublists.
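To see the intermediate groups before flattening (a small illustration of mine):

>>> [list(accumulate(g)) for _, g in groupby(list_a, bool)]
[[1, 2, 3], [0], [1, 2], [0], [1], [0], [1, 2, 3]]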
As Stefan Pochmann comments, if your list is binary in contents (consisting of only 1s and 0s) then you don't need to pass a key to groupby() at all, and it will fall back on the identity function. This is ~30% faster than using bool for this case:
>>> list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a)))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
Personally I would prefer a simple generator like this:
def gen(lst):
    cumulative = 0
    for item in lst:
        if item:
            cumulative += item
        else:
            cumulative = 0
        yield cumulative
Nothing magic (when you know how yield works), easy to read and should be rather fast.
If you need more performance you could even wrap this as a Cython extension type (I'm using IPython here). Thereby you lose the "easy to understand" part and it requires a "heavy dependency":
%load_ext cython

%%cython
cdef class Cumulative(object):
    cdef object it
    cdef object cumulative

    def __init__(self, it):
        self.it = iter(it)
        self.cumulative = 0

    def __iter__(self):
        return self

    def __next__(self):
        cdef object nxt = next(self.it)
        if nxt:
            self.cumulative += nxt
        else:
            self.cumulative = 0
        return self.cumulative
Both need to be consumed, for example using list to give the desired output:
>>> list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
>>> list(gen(list_a))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
>>> list(Cumulative(list_a))
[1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
However since you asked about speed I wanted to share the results from my timings:
import pandas as pd
import numpy as np
import random
from itertools import takewhile
from itertools import groupby, accumulate, chain
def MSeifert(lst):
    return list(MSeifert_inner(lst))

def MSeifert_inner(lst):
    cumulative = 0
    for item in lst:
        if item:
            cumulative += item
        else:
            cumulative = 0
        yield cumulative

def MSeifert2(lst):
    return list(Cumulative(lst))

def original1(list_a):
    list_b = []
    for i, x in enumerate(list_a):
        if x == 0:
            list_b.append(x)
        else:
            sum_value = 0
            for j in list_a[i::-1]:
                if j != 0:
                    sum_value += j
                else:
                    break
            list_b.append(sum_value)

def original2(list_a):
    return [sum(takewhile(lambda x: x != 0, list_a[i::-1])) for i, d in enumerate(list_a)]

def Coldspeed1(data):
    data = data.copy()
    for i in range(1, len(data)):
        if data[i]:
            data[i] += data[i - 1]
    return data

def Coldspeed2(data):
    s = pd.Series(data)
    return s.groupby(s.eq(0).cumsum()).cumsum().tolist()

def Chris_Rands(list_a):
    return list(chain.from_iterable(accumulate(g) for _, g in groupby(list_a, bool)))

def EvKounis(list_a):
    cum_sum = 0
    list_b = []
    for item in list_a:
        if not item:          # if our item is 0
            cum_sum = 0       # the cumulative sum is reset (set back to 0)
        else:
            cum_sum += item   # otherwise it sums further
        list_b.append(cum_sum)  # and no matter what it gets appended to the result

def schumich(list_a):
    list_b = []
    s = 0
    for a in list_a:
        s = a+s if a != 0 else 0
        list_b.append(s)
    return list_b

def jbch(seq):
    return list(jbch_inner(seq))

def jbch_inner(seq):
    s = 0
    for n in seq:
        s = 0 if n == 0 else s + n
        yield s

# Timing setup
timings = {MSeifert: [],
           MSeifert2: [],
           original1: [],
           original2: [],
           Coldspeed1: [],
           Coldspeed2: [],
           Chris_Rands: [],
           EvKounis: [],
           schumich: [],
           jbch: []}
sizes = [2**i for i in range(1, 20, 2)]

# Timing
for size in sizes:
    print(size)
    func_input = [int(random.random() < 0.75) for _ in range(size)]
    for func in timings:
        if size > 10000 and (func is original1 or func is original2):
            continue
        res = %timeit -o func(func_input)  # if you use IPython, otherwise use the "timeit" module
        timings[func].append(res)
%matplotlib notebook

import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(1)
ax = plt.subplot(111)

baseline = MSeifert2  # choose one function as baseline

for func in timings:
    ax.plot(sizes[:len(timings[func])],
            [time.best / ref.best for time, ref in zip(timings[func], timings[baseline])],
            label=func.__name__)  # you could also use "func.__name__" here instead
ax.set_ylim(0.8, 1e4)
ax.set_yscale('log')
ax.set_xscale('log')
ax.set_xlabel('size')
ax.set_ylabel('time relative to {}'.format(baseline))  # you could also use "func.__name__" here instead
ax.grid(which='both')
ax.legend()
plt.tight_layout()
In case you're interested in the exact results I put them in this gist.
It's a log-log plot, relative to the Cython answer. In short: the lower the faster, and the range between two major ticks represents one order of magnitude.
So all solutions tend to be within one order of magnitude (at least when the list is big) except for the solutions you had. Strangely the pandas solution is quite slow compared to the pure Python approaches. However the Cython solution beats all of the other approaches by a factor of 2.
You are playing with the indices too much in the code you posted when you do not really have to. You can just keep track of a cumulative sum and reset it to 0 every time you meet a 0.
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

cum_sum = 0
list_b = []
for item in list_a:
    if not item:          # if our item is 0
        cum_sum = 0       # the cumulative sum is reset (set back to 0)
    else:
        cum_sum += item   # otherwise it sums further
    list_b.append(cum_sum)  # and no matter what it gets appended to the result

print(list_b)  # -> [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
It doesn't have to be as complicated as in the question; a very simple approach could be this:
list_a = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

list_b = []
s = 0
for a in list_a:
    s = a+s if a != 0 else 0
    list_b.append(s)

print list_b
Starting Python 3.8, and the introduction of assignment expressions (PEP 572) (:= operator), we can use and increment a variable within a list comprehension:
# items = [1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]
total = 0
[total := (total + x if x else x) for x in items]
# [1, 2, 3, 0, 1, 2, 0, 1, 0, 1, 2, 3]
This:
Initializes a variable total to 0 which symbolizes the running sum
For each item, this both:
either increments total with the current looped item (total := total + x) via an assignment expression, or sets it back to 0 if the item is 0
and at the same time, maps x to the new value of total
I would use a generator if you want performance (and it's simple too).
def weird_cumulative_sum(seq):
    s = 0
    for n in seq:
        s = 0 if n == 0 else s + n
        yield s

list_b = list(weird_cumulative_sum(list_a))
I don't think you'll get better than that, in any case you'll have to iterate over list_a at least once.
Note that I called list() on the result to get a list like in your code but if the code using list_b is iterating over it only once with a for loop or something there is no use converting the result to a list, just pass it the generator.

Count number of clusters of non-zero values in Python?

My data looks something like this:
a=[0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
Essentially, there's a bunch of zeroes before non-zero numbers and I am looking to count the number of groups of non-zero numbers separated by zeros. In the example data above, there are 3 groups of non-zero data so the code should return 3.
Number of zeros between groups of non-zeros is variable
Any good ways to do this in python? (Also using Pandas and Numpy to help parse the data)
With a as the input array, we could have a vectorized solution -
m = a!=0
out = (m[1:] > m[:-1]).sum() + m[0]
Alternatively for performance, we might use np.count_nonzero which is very efficient to count bools as is the case here, like so -
out = np.count_nonzero(m[1:] > m[:-1]) + m[0]
Basically, we get a mask of non-zeros and count rising edges. To account for the first element that could be non-zero too and would not have any rising edge, we need to check it and add to the total sum.
Also, please note that if input a is a list, we need to use m = np.asarray(a)!=0 instead.
Sample runs for three cases -
In [92]: a # Case1 :Given sample
Out[92]:
array([ 0, 0, 0, 0, 0, 0, 10, 15, 16, 12, 11, 9, 10, 0, 0, 0, 0,
0, 6, 9, 3, 7, 5, 4, 0, 0, 0, 0, 0, 0, 4, 3, 9, 7,
1])
In [93]: m = a!=0
In [94]: (m[1:] > m[:-1]).sum() + m[0]
Out[94]: 3
In [95]: a[0] = 7 # Case2 :Add a non-zero elem/group at the start
In [96]: m = a!=0
In [97]: (m[1:] > m[:-1]).sum() + m[0]
Out[97]: 4
In [99]: a[-2:] = [0,4] # Case3 :Add a non-zero group at the end
In [100]: m = a!=0
In [101]: (m[1:] > m[:-1]).sum() + m[0]
Out[101]: 5
You may achieve it via using itertools.groupby() with list comprehension expression as:
>>> from itertools import groupby
>>> len([is_true for is_true, _ in groupby(a, lambda x: x!=0) if is_true])
3
Simple Python solution: just count changes from 0 to non-zero, keeping track of the previous value (rising-edge detection):
a = [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]

previous = 0
count = 0
for c in a:
    if previous == 0 and c != 0:
        count += 1
    previous = c

print(count)  # 3
pad array with a zero on both sides with np.concatenate
find where zero with a == 0
find boundaries with np.diff
sum up boundaries found with sum
divide by two because we will have found twice as many as we want
def nonzero_clusters(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)
demonstration
nonzero_clusters(
    [0,0,0,0,0,0,10,15,16,12,11,9,10,0,0,0,0,0,6,9,3,7,5,4,0,0,0,0,0,0,4,3,9,7,1]
)
3
3
nonzero_clusters([0, 1, 2, 0, 1, 2])
2
nonzero_clusters([0, 1, 2, 0, 1, 2, 0])
2
nonzero_clusters([1, 2, 0, 1, 2, 0, 1, 2])
3
timing
a = np.random.choice((0, 1), 100000)
code
from itertools import groupby

def div(a):
    m = a != 0
    return (m[1:] > m[:-1]).sum() + m[0]

def pir(a):
    return int(np.diff(np.concatenate([[0], a, [0]]) == 0).sum() / 2)

def jean(a):
    previous = 0
    count = 0
    for c in a:
        if previous == 0 and c != 0:
            count += 1
        previous = c
    return count

def moin(a):
    return len([is_true for is_true, _ in groupby(a, lambda x: x != 0) if is_true])

def user(a):
    return sum([1 for n in range(len(a) - 1) if not a[n] and a[n + 1]])
sum([1 for n in range(len(a) - 1) if not a[n] and a[n + 1]])
