Fast conditional overlapping windowing (framing) of numpy array - python

I have a huge list of numpy arrays (1-dimensional), which are time series for different events. Each point has a label (0, 1, or 2), and I want to window the numpy arrays based on their labels. Each window has a fixed size M.
The label of each window is the largest label present in the window. So if a window contains both 0- and 1-labeled data points, the label of the whole window is 1.
The problem is that the windowing is not label agnostic: because of class imbalance, I only want overlapping windows where labels 1 and 2 occur.
So far I have written this code:
# conditional framing
data = []
start_cursor = 0
while start_cursor < arr.size:
    end_cursor = start_cursor + window_size
    data.append(
        {
            "frame": arr[start_cursor:end_cursor],
            "label": y[start_cursor:end_cursor].max(),
        }
    )
    start_cursor = end_cursor
    if np.any(y[start_cursor, end_cursor] != 0):
        start_cursor = start_cursor - overlap_size
But this is clearly too verbose and just plain inefficient, especially because I will call this while loop on my huge list of separate arrays.
EDIT: to explain the problem further. Imagine you are windowing a signal with fixed window length M. If only 0-labeled points exist in a window, there is no overlap between adjacent windows. But if labels 1 or 2 are present, adjacent windows overlap by p%.

I think this does what you are asking to do. The visualization for checking isn't great, but it helps you see how the windowing works. Anytime there is a 1 or 2 in the time series (rather than a 0), the window steps forward only a fraction of the full window length (here 50%).
To examine how to do this, start with a sample time series:
import timeit
import matplotlib.pylab as plt
import numpy as np
N = 5000  # time series length
# create some sort of data set to work with
x = np.zeros(N)
# add a few 1s and 2s to the list (though really they are the same for the windowing)
y = np.random.random(N)
x[y < 0.01] = 1
x[y < 0.005] = 2
# assign a window length
M = 50  # window length
overlap = 0.5  # assume 50% overlap
M_overlap = int(M * (1 - overlap))
My approach is to sum the window of interest over your time series. If the sum ==0, there is no overlap between windows and if it is >0 then there is overlap. The question, then, becomes how should we calculate these sums efficiently? I compare two approaches. The first is simply to walk through the time series and the second is to use convolution (which is much faster). For the first one, I also explore different ways of assessing window size after summation.
Summation (slow version)
def window_sum1():
    # start of windows in list windows
    windows = [0,]
    while windows[-1] + M < N:
        check = sum(x[windows[-1]:windows[-1]+M]) == 0
        windows.append(windows[-1] + M_overlap + (M - M_overlap) * check)
        if windows[-1] + M > N:
            windows.pop()
            break
    # plotting stuff for checking
    return windows
Niter = 10**4
print(timeit.timeit(window_sum1, number=Niter))
# 29.201083058
So this approach went through 10,000 time series of length 5000 in about 30 seconds. But the line windows.append(windows[-1] + M_overlap + (M - M_overlap) * check) can be streamlined into an if statement.
Summation (fast version, 33% faster than slow version)
def window_sum2():
    # start of windows in list windows
    windows = [0,]
    while windows[-1] + M < N:
        check = sum(x[windows[-1]:windows[-1]+M]) == 0
        if check:
            windows.append(windows[-1] + M)
        else:
            windows.append(windows[-1] + M_overlap)
        if windows[-1] + M > N:
            windows.pop()
            break
    # plotting stuff for checking
    return windows
print(timeit.timeit(window_sum2, number=Niter))
# 20.456240447000003
We see a 1/3 reduction in time with the if statement.
Convolution (85% faster than fast summation)
We can use signal processing to get a lot faster, by convolving the time series with the window of interest using numpy.convolve. (Disclaimer: I got the idea from the accepted answer to this question.) Of course, it also makes sense to adopt the faster window size assessment from above.
def window_conv():
    a = np.convolve(x, np.ones(M, dtype=int), 'valid')
    windows = [0,]
    while windows[-1] + M < N:
        if a[windows[-1]]:
            windows.append(windows[-1] + M_overlap)
        else:
            windows.append(windows[-1] + M)
        if windows[-1] + M > N:
            windows.pop()
            break
    return windows
print(timeit.timeit(window_conv, number=Niter))
# 3.3695770570000008
Sliding window
The last thing I will add is that, as shown in one of the comments of this question, as of numpy 1.20 there is a function called sliding_window_view. I still have numpy 1.19 running and was not able to test it to see if it's faster than convolution.
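Here is an untested sketch of how sliding_window_view could slot in (requires numpy >= 1.20; x, M, N and M_overlap are the variables defined above). It precomputes, for every possible window start, whether the window contains a nonzero label, so the loop only has to pick starting positions:
from numpy.lib.stride_tricks import sliding_window_view
# (N - M + 1, M) view of all windows; True where a window contains a 1 or 2
win_nonzero = sliding_window_view(x, M).any(axis=1)
windows = [0,]
while windows[-1] + M < N:
    step = M_overlap if win_nonzero[windows[-1]] else M
    windows.append(windows[-1] + step)
if windows[-1] + M > N:
    windows.pop()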

First, I think you should revise the line if np.any(y[start_cursor, end_cursor] != 0): to if np.any(y[start_cursor:end_cursor] != 0):
Anyway, I think we can improve your code at a few points.
Firstly, you can revise this part:
if np.any(y[start_cursor: end_cursor] != 0):
    start_cursor = start_cursor - overlap_size
Before these lines you have already calculated y[start_cursor:end_cursor].max(), so you already know whether there is any label bigger than 0. So this is better:
if data[-1]['label'] != 0:
    start_cursor -= overlap_size
Better still, store y[start_cursor:end_cursor].max() in a variable and use it both for setting 'label' and for the if check.
Secondly, you used append for data, which is inefficient. It is better to preallocate: you have a fixed frame size and you know the maximum number of frames, maxNumFrame = np.ceil((arr.size - overlap_size) / (window_size - overlap_size)). So initialize frames = np.zeros((maxNumFrame, window_size)) as a first step and then fill frames inside the while loop; or, if you want to keep your customized structure, initialize your list with zero values and overwrite them in the loop.
Thirdly, it is best to compute only "start_cursor" and the label inside the while loop and store them in an array of tuples or in two arrays ("end_cursor" is redundant).
After that, build the frames using "map" in one of the ways described above (into one array or into your customized structure).
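A hedged sketch putting these suggestions together, reusing the question's variable names (arr, y, window_size, overlap_size). Note that it keeps only full windows, so a trailing partial window is dropped:
max_num_frames = int(np.ceil((arr.size - overlap_size) / (window_size - overlap_size)))
frames = np.zeros((max_num_frames, window_size), dtype=arr.dtype)
labels = np.zeros(max_num_frames, dtype=y.dtype)
n_frames = 0
start_cursor = 0
while start_cursor + window_size <= arr.size:
    end_cursor = start_cursor + window_size
    label = y[start_cursor:end_cursor].max()   # computed once, used for both label and overlap
    frames[n_frames] = arr[start_cursor:end_cursor]
    labels[n_frames] = label
    n_frames += 1
    start_cursor = end_cursor - (overlap_size if label != 0 else 0)
frames, labels = frames[:n_frames], labels[:n_frames]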

Related

Is there an efficient way to create a binomial experiment of N bernoulli trials in a numpy array?

Suppose I have a coin that lands on heads with probability P. The experiment to be performed is to keep flipping the coin x number of times. This experiment is to be repeated 1000 times.
Question: Is there an efficient/vectorized approach to generate an array of random 1s (with probability P) and 0s (with probability 1-P)?
If I try something like this:
np.full(10,rng().choice((0,1),p= [.3,.7]))
The entire array is filled with the same selection. I have seen solutions that involve a fixed ratio of zeros to ones.
a = np.ones(n+m)
a[:m] = 0
np.random.shuffle(a)
However I'm not sure how to preserve the stochastic nature of the experiments with this set up.
Presently I am just looping through each iteration as follows, but it is very slow once the number of experiments gets large.
(The actual experiment involves terminating each trial when two consecutive heads are flipped, which is why there is a while loop in the code. For purposes of making the question specific I didn't want to address that here. )
Set = [0, 1]
T = np.ones(Episodes)
for i in range(Episodes):
    a = rng().choice(Set, p=[(1 - p), p])
    counter = 1
    while True:
        b = rng().choice(Set, p=[(1 - p), p])
        counter += 1
        if (a == 1) & (b == 1):
            break
        a = b
    T[i] = counter
Any insights would be appreciated, thanks!
Answers provided by #Quang Hong and #Kevin as listed in the comments above. Just reposting with default_rng() so it is easier to reference later. But they are the true heroes here.
from numpy.random import default_rng as rng
rng().binomial(1, p=.7, size=(10, 10))
rng().choice((0, 1), p=[.3, .7], size=(10, 10))
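For the actual experiment (stop at two consecutive heads), one hedged way to stay vectorized is to draw a whole (Episodes, max_flips) matrix up front and locate the first pair of consecutive heads in each row; max_flips here is an assumed cap that has to be chosen large enough:
import numpy as np
from numpy.random import default_rng
p = 0.7
episodes = 1000
max_flips = 200                      # assumed upper bound on trial length
gen = default_rng()
flips = gen.binomial(1, p, size=(episodes, max_flips))
# True where flips[:, i] and flips[:, i + 1] are both heads
pairs = (flips[:, :-1] == 1) & (flips[:, 1:] == 1)
T = pairs.argmax(axis=1) + 2         # first pair at column i means i + 2 flips
unfinished = ~pairs.any(axis=1)      # rows that never hit two heads; rerun or raise max_flips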

Why does np.add.at() return the wrong answer for large arrays?

I have a large data set, statistic, with statistic.shape = (1E10,) that I want to effectively bin (sum) into an array of zeros, out = np.zeros(1E10). Each entry in statistic has a corresponding index, idx, which tells me in which out bin it belongs. The indices are not unique so I cannot use out += statistic[idx] since this will only count the first time a particular index is encountered. Therefore I'm using np.add.at(out, idx, statistic). My problem is that for very large arrays, np.add.at() returns the wrong answer.
Below is an example script that shows this behaviour. The function check_add() should return 1.
import numpy as np
def check_add(N):
    N = int(N)
    out = np.zeros(N)
    np.add.at(out, np.arange(N), np.ones(N))
    return np.sum(out) / N
n_arr = [1E3, 1E5, 1E8, 1E10]
for n in n_arr:
    print('N = {} (log(N) = {}); output ratio is {}'.format(n, np.log10(n), check_add(n)))
This example returns for me:
N = 1000.0 (log(N) = 3.0); output ratio is 1.0
N = 100000.0 (log(N) = 5.0); output ratio is 1.0
N = 100000000.0 (log(N) = 8.0); output ratio is 1.0
N = 10000000000.0 (log(N) = 10.0); output ratio is 0.1410065408
Can someone explain to me why the function fails for N=1E10?
This is an old bug, NumPy issue 13286. ufunc.at was using a too-small variable for the loop counter. It got fixed a while ago, so update your NumPy. (The fix is present in 1.16.3 and up.)
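As an aside (my suggestion, not part of this answer): for this "sum statistic into out at idx" pattern, np.bincount with weights gives the same result as np.add.at for non-negative integer indices, avoids the buggy code path on old NumPy, and is usually much faster. A minimal sketch at a small, illustrative size:
import numpy as np
N = 10**6                                  # illustrative; the question uses 1E10
idx = np.random.randint(0, N, size=N)      # non-unique indices
statistic = np.ones(N)
out = np.bincount(idx, weights=statistic, minlength=N)
assert out.sum() == N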
You're overflowing int32:
1E10 % (np.iinfo(np.int32).max - np.iinfo(np.int32).min + 1) # + 1 for 0
Out[]: 1410065408
There's your weird number (googling that number actually got me here, which is how I figured this out).
Now, what's happening in your function is a bit more weird. By the documentation of ufunc.at, you should just be accumulate-adding the 1 values at the indices below np.iinfo(np.int32).max and at the negative indices above np.iinfo(np.int32).min, but it seems to be (1) working backwards and (2) stopping when it gets to the last overflow. Without digging into the C code I couldn't tell you why, but it's probably a good thing it does: had it done things that way, your function would have failed silently and with the "correct" mean, while corrupting your results (having 2 or 3 at those indices and 0 in the middle).
It is most likely due to integer precision indeed. If you play around with the numpy data type (e.g. constrain it to an (unsigned) value between 0 and 255 by setting uint8), you will see that the ratios start declining already for the second array. I do not have enough memory to test it, but setting all dtypes to uint64 as below should help:
def check_add(N):
    N = int(N)
    out = np.zeros(N, dtype='uint64')
    np.add.at(out, np.arange(N, dtype='uint64'), 1)
    return np.sum(out) / N
To understand the behavior, I recommend setting dtype='uint8' and checking the behavior for smaller N. What happens is that the np.arange function creates ascending integers for the vector elements until it reaches the integer limit; it then wraps around to 0 and counts up again. So at the beginning (smaller N) you get the correct sum, although your out vector contains a lot of elements > 1 in the positions 0:limit and a lot of elements = 0 beyond the limit. If, however, you choose N large enough, the elements of your out vector themselves start exceeding the integer limit and wrap around to 0. As soon as that happens your sum is vastly off. To double-check, note that the uint8 limit is 255 (256 integers) and 256^2 = 65536: set N = 65536 with dtype='uint8' and check_add(65536) will return 0.
import numpy as np
def check_add(N):
    N = int(N)
    out = np.zeros(N, dtype='uint8')
    np.add.at(out, np.arange(N, dtype='uint8'), 1)
    return np.sum(out) / N
n_arr = [1E1, 1E3, 1E5, 65536, 1E7]
for n in n_arr:
    print('N = {} (log(N) = {}); output ratio is {}'.format(n, np.log10(n), check_add(n)))
Also note that you don't need the np.ones vector; you can simply replace it by 1 if all you care about is uniformly incrementing everything by 1.
Guessing, as I couldn't run it, but could it be a problem that you are exceeding the max integer value in Python for the last option? I.e. it exceeds 2147483647.
Use the long integer type instead, as described here: https://docs.python.org/2.0/ref/integers.html
Hope this helps. Please let me know if it does work.

Fast calculation of sum for function defined over range of integers - (0,2^52)

I was looking at the code for a particular cryptocurrency casino game (EthCrash - if you're interested). The game generates crash points using a function (I call this crash(x)) where x is an integer that is randomly drawn from the space of integers (0,2^52).
I'd like to calculate the expected value of the crash points. The code below should explain everything, but a clean picture of the function is here: https://i.imgur.com/8dPBALa.png, and what I'm trying to calculate is here: https://i.imgur.com/nllykDQ.png (apologies - can't paste pictures yet).
I wrote the following code:
import math
two52 = 2**52
def crash(x):
    crash_point = math.floor((100 * two52 - x) / (two52 - x))
    return crash_point / 100
crashes_sum = 0
for i in range(two52 + 1):
    crashes_sum += crash(i)
expected_crash = crashes_sum / two52
Unfortunately, the loop is taking too long to run - any ideas for how I can do this faster?
OK, if you cannot do it the straightforward way, time to get smart, right?
So the idea is to find ranges over which the whole sum can be computed fast. I will put down some pseudocode which doesn't even compile and could have bugs; use it as an illustration.
First, let's rewrite the term in the sum as
floor( 100 + 99*x/(2**52 - x) )
First idea: find the ranges where the floor is not changing, i.e. where
n <= 99*x/(2**52 - x) < n+1. Over such a whole range we can add range_length*(100 + n) to the sum; no need to do it term by term.
sum = 0
r_lo = 0
for n in range(1, 2**52):  # LOOP OVER RANGES
    r_hi = floor(2**52 / (1 + 99/n))
    sum += (100 + n - 1) * (r_hi - r_lo)
    if r_hi - r_lo == 1:
        break
    r_lo = r_hi + 1
Obviously, the range size will shrink until it is equal to 1, and then this method becomes useless, so we break out. By that time each term differs from the previous one by 1 or more.
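Here is a runnable version of that first idea, written as my own sketch rather than the answer's code. It sums the term floor(100 + 99*x/(2**52 - x)) over whole constant-floor ranges and stops once the ranges shrink to a single point; note it still needs on the order of sqrt(2**52), roughly 7*10**8, iterations before it breaks, and the remaining tail has to be handled by the second idea:
two52 = 2**52
def partial_sum_by_ranges():
    total = 0
    lo = 0   # smallest x whose floor value is n
    n = 0
    while True:
        # smallest x with 99*x/(two52 - x) >= n + 1, i.e. x >= (n + 1)*two52/(n + 100)
        hi = -((-(n + 1) * two52) // (n + 100))   # integer ceiling division
        total += (100 + n) * (hi - lo)            # every x in [lo, hi) contributes 100 + n
        if hi - lo <= 1:
            break                                 # ranges have degenerated; switch methods here
        lo = hi
        n += 1
    return total, hi                              # partial sum and where to resume term by term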
OK, second idea: again ranges, but now ranges where the sum is an arithmetic series. First we have to find the range where the increment between consecutive terms is equal to 1, then the range where the increment is equal to 2, etc. It looks like you have to find the roots of a quadratic equation for this, but the code would be about the same:
r_lo = pos_for_increment(1)
t_lo = ...  # term at r_lo
for n in range(2, 2**52):  # LOOP OVER RANGES
    r_hi = pos_for_increment(n) - 1
    t_hi = ...  # term at r_hi
    sum += (t_lo + t_hi) * (r_hi - r_lo) / 2  # arithmetic series sum
    if r_hi > 2**52:
        break
    r_lo = r_hi + 1
    t_lo = t_hi + n
I might think of something else, but those tricks are worth trying.
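The pos_for_increment helper is left undefined above; here is one hedged way to approximate it (my own sketch, not the answer's). The term is roughly 100 + 99*x/(2**52 - x), whose slope is 99*2**52/(2**52 - x)**2, so the increment between consecutive terms first reaches n near x = 2**52 - sqrt(99*2**52/n); the exact boundary would come from solving the quadratic the answer mentions.
import math
two52 = 2**52
def pos_for_increment(n):
    # approximate first x where consecutive terms start differing by n (n >= 1)
    return two52 - math.isqrt(99 * two52 // n)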
Using the map function might help increase the speed a little, since it moves the loop bookkeeping out of Python bytecode (note that map evaluates sequentially, not in parallel):
import math
two52 = 2**52
def crash(x):
    crash_point = math.floor((100 * two52 - x) / (two52 - x))
    return crash_point / 100
crashes_sum = sum(map(crash, range(two52)))
expected_crash = crashes_sum / two52
I have been able to speed up your code by taking advantage of numpy vectorization:
import numpy as np
import time
two52 = 2**52
crash = lambda x: np.floor((100 * two52 - x) / (two52 - x)) / 100
starttime = time.time()
icur = 0
ispan = 100000
crashes_sum = 0
while icur < two52 - ispan:
    i = np.arange(icur, icur + ispan, 1)
    crashes_sum += np.sum(crash(i))
    icur += ispan
crashes_sum += np.sum(crash(np.arange(icur, two52, 1)))
expected_crash = crashes_sum / two52
print(time.time() - starttime)
The trick is to compute the sum over a moving window to take advantage of numpy's vectorization (written in C). I tried up to 2**30 and it takes 9 seconds on my laptop (your original code takes too long to be able to benchmark).
Python is probably not the most suitable language for what you want to do, you may want to try C or Fortran for that (and take advantage of threading).
You will have to use a powerful GPU if you want the result within some hours.
A possible CPU implementation
import numpy as np
import numba as nb
import time
two52 = 2**52
loop_to = 2**30
@nb.njit(fastmath=True, parallel=True)
def sum_over_crash(two52, loop_to):  # loop_to is only for testing performance
    crashes_sum = nb.float64(0)
    for i in nb.prange(loop_to):  # nb.prange(two52 + 1) for the full calculation
        crashes_sum += np.floor((100 * two52 - i) / (two52 - i)) / 100
    return crashes_sum / two52
sum_over_crash(two52, 2)  # don't measure static compilation overhead
t1 = time.time()
sum_over_crash(two52, 2**30)
print(time.time() - t1)
This takes 0.57 s on my quad-core i7, i.e. about 28 days for the whole calculation.
As the calculation cannot be simplified mathematically, the only option is to calculate it step by step.
This takes a long time (as stated in other answers). Your best bet for calculating it fast is to use a lower-level language than Python. Since Python is an interpreted language, it is rather slow for this kind of computation.
Additionally you can use multithreading (if available in the chosen language) to make it even faster.
Cloud Computing is also an option that could be suitable for this, as you are only going to calculate the number once. Amazon and Google (and many more) provide this kind of service for a relatively small fee.
But before performing any of the calculations you need to adjust your formula, as with the way it stands right now, you're going to get a ZeroDivisionError at the very last iteration of your loop.
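A small sketch of that adjustment, assuming the endpoint x = 2**52 is not meant to be included in the draw: stop the loop one step earlier so the denominator never reaches zero (this only fixes the boundary, it does not make the loop faster).
import math
two52 = 2**52
def crash(x):
    return math.floor((100 * two52 - x) / (two52 - x)) / 100
# range(two52) excludes x == two52, where (two52 - x) == 0 would raise ZeroDivisionError
crashes_sum = sum(crash(i) for i in range(two52))
expected_crash = crashes_sum / two52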

How to efficiently update np array depending on index and value?

I have an image of the sun. I found the center and radius and now I want to process pixels differently depending on whether they are inside or outside the disk. The ideal solution would be to interpolate the parameters of the processing function, in order to smoothly transition from disk to background.
Here is what I'm doing now:
for index, value in np.ndenumerate(sun_img):
    if distance.euclidean(index, center) > radius:
        sun_img[index] = processing_function(index, value)
This works, but it takes forever to compute the image. I'm sure there is a more efficient way to do it. How would you solve this?
Image shape is around (1000, 1000)
Processing_function is basically not doing anything right now: value += 1
The function should be something like a non-linear "step function": 0.0 up to the radius and 1.0 from 5 px beyond it, something like _______/''''''''''''''''''''', multiplied by the value of the pixel. The ramp should be located at the radius. I want to do this in order to enhance the protuberances.
Here's a vectorized way leveraging NumPy broadcasting -
m,n = sun_img.shape
I,J = np.ogrid[:m,:n]
sq_dist = (I - center[0])**2 + (J - center[1])**2
valid_mask = sq_dist > radius**2
Now, for a processing_function that just adds 1 to the valid places, defined by the IF-conditional, do -
sun_img[valid_mask] += 1
If you need to implement a custom operation with processing_function that needs those row, column indices, use np.where to get those indices and then iterate through the valid elements, like so -
r, c = np.where(valid_mask)
for index in zip(r, c):
    sun_img[index] = processing_function(index, sun_img[index])
If you have a lot of such valid places, then computing r,c might make things slow. In that case, directly use the mask, like so -
for index, value in np.ndenumerate(sun_img):
    if valid_mask[index]:
        sun_img[index] = processing_function(index, value)
Compared to the original code, the benefit is that we have the conditional values pre-computed before going into the loop. The best way again would be to vectorize processing_function itself so that it works on a bigger chunk of data, but that would depend on its implementation.
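A hedged sketch of the smooth transition the question asks for, along the same lines as the distance computation above (sun_img, center and radius are the question's variables; the 5 px ramp width is taken from the question): the weight is 0 inside the disk, ramps linearly over the 5 px past the radius, and is 1 outside.
import numpy as np
m, n = sun_img.shape
I, J = np.ogrid[:m, :n]
dist = np.sqrt((I - center[0])**2 + (J - center[1])**2)
ramp_width = 5                                   # px, from the question
weight = np.clip((dist - radius) / ramp_width, 0.0, 1.0)
enhanced = sun_img * weight                      # or blend, e.g. sun_img * (1 + weight)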

Speeding up Numpy Masking

I'm still an amateur when it comes to thinking about how to optimize. I have this section of code that takes in a list of found peaks and finds where these peaks, +/- some value, are located in a multidimensional array. It then adds +1 to their indices in a zeros array. The code works well, but it takes a long time to execute. For instance, it takes close to 45 min to run when ind has 270 values and refVals has a shape of (3050, 3130, 80). I understand that it's a lot of data to churn through, but is there a more efficient way of going about this?
maskData = np.zeros_like(refVals).astype(np.int16)
for peak in ind:
    tmpArr = np.ma.masked_outside(refVals, x[peak]-2, x[peak]+2).astype(np.int16)
    maskData[tmpArr.mask == False] += 1
    tmpArr = None
maskData = np.sum(maskData, axis=2)
Approach #1 : Memory permitting, here's a vectorized approach using broadcasting -
# Create +/-2 limits using ind
r = x[ind[:,None]] + [-2,2]
# Use limits to get inside matches and sum over the iterative and last dims
mask = (refVals >= r[:,None,None,None,0]) & (refVals <= r[:,None,None,None,1])
out = mask.sum(axis=(0,3))
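As a rough check on "memory permitting" (my own back-of-envelope, using the sizes from the question): the broadcast boolean mask has shape (len(ind), 3050, 3130, 80) at one byte per element, which is far beyond typical RAM and is what motivates Approach #2.
n_peaks = 270
mask_bytes = n_peaks * 3050 * 3130 * 80   # bool mask, one byte per element
print(mask_bytes / 1e9)                   # roughly 206 GB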
Approach #2 : If running out of memory with the previous one, we could use a loop and use NumPy boolean arrays and that could be more efficient than masked arrays. Also, we would perform one more level of sum-reduction, so that we would be dragging less data with us when moving across iterations. Thus, the alternative implementation would look something like this -
out = np.zeros(refVals.shape[:2]).astype(np.int16)
x_ind = x[ind]
for i in x_ind:
    out += ((refVals >= i-2) & (refVals <= i+2)).sum(-1)
Approach #3 : Alternatively, we could replace that limit based comparison with np.isclose in approach #2. Thus, the only step inside the loop would become -
out += np.isclose(refVals,i,atol=2).sum(-1)
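One detail worth noting (my note, not part of the answer): np.isclose tests abs(a - b) <= atol + rtol*abs(b) with rtol defaulting to 1e-5, so atol=2 is only approximately the +/-2 window; passing rtol=0 makes it exact:
out += np.isclose(refVals, i, atol=2, rtol=0).sum(-1)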
