I am measuring battery parameters (current, voltage, etc.) using an analogue-to-digital converter. The while loop also contains the measurement functions, which are not shown here because they are not part of my question.
With the code below, I am attempting to calculate ampere-hours on each iteration of the loop (Ahinst) by simply multiplying the measured current by the elapsed time between two measurements. I am also summing up the Ah values to get a cumulative total (TotAh) drained from the battery. This last value is shown only while the current (P2) is negative (battery not in charging mode). When the current (P2) reverses into charging mode I clear TotAh and just show 0.
timeMeas = []
currInst = []
Ah = []
TotAh = 0

while True:
    try:
        # measurement routines are also running here
        # ......................
        # Ah() in development
        if P2 < 0:                               # current is negative (discharging)
            Tnow = datetime.now()                # time_start reference, used to calculate the elapsed time until the next current measurement
            timeMeas.append(Tnow)                # save time_start
            currInst.append(P2)                  # save current at time_start
            if len(currInst) > 1:                # we now have two current measurements
                elapsed = (timeMeas[1] - timeMeas[0]).total_seconds()  # time elapsed between the two measurements
                Ahinst = currInst[1] / 3600 * elapsed                  # Ah for this time interval
                Ah.append(Ahinst)                # save Ah for this interval
                TotAh = round(sum(Ah), 3) * -1   # update cumulative Ah
                timeMeas = []                    # clean up the arrays
                currInst = []
        elif P2 > 0:
            TotAh = 0
            Ah = []
            time.sleep(1)
    except KeyboardInterrupt:
        break
The code is working, but obviously it is not giving me the correct result, because in the second "if" condition I always clear the two arrays (timeMeas and currInst). Since the calculation requires at least two actual measurements (if len(currInst) > 1) to work, clearing the two arrays makes me lose one measurement on every iteration of the loop. I have considered shifting the values between positions 0 and 1 in the arrays at every iteration, but this would cause calculation mistakes when the cycle is restarted after P2 has reversed to charging and then to discharging mode again.
I am very rusty with coding and doing this as a hobby. I am battling to find a way to calculate Ahinst on every cycle with the actual values.
Any help is appreciated. Thanks
If you only want to keep two measurements (current and previous) you can keep arrays of size two, and have idx = 1 - idx at the end of the loop to have it flip-flop between 0 and 1.
timeMeas = [None, None]
currInst = [None, None]
TotAh = 0.0
idx = 0

while True:  # no need for parentheses
    try:
        if P2 < 0:
            Tnow = datetime.now()
            timeMeas[idx] = Tnow
            currInst[idx] = P2
            if currInst[1] is not None:  # meaning we have at least two measurements
                elapsed = (timeMeas[idx] - timeMeas[1 - idx]).total_seconds()
                TotAh += currInst[idx] / 3600 * elapsed
        elif P2 > 0:  # is "elif" really correct here?
            TotAh = 0.0
            # Do we want to reset these, too?
            timeMeas = [None, None]
            currInst = [None, None]
            # should this really be inside the elif?
            time.sleep(1)
        idx = 1 - idx
    except KeyboardInterrupt:
        break
In some sense, it would be simpler to have two dict variables curr and prev, and set prev = None when you start or reset them. Then simply set prev = curr at the end of the loop, and populate curr with new values in each iteration, like curr['when'] = datetime.now() and curr['measurement'] = P2.
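For illustration, a minimal sketch of that curr/prev idea might look like the following (it reuses P2, datetime and time.sleep from the question, the measurement routines are omitted, and prev doubles as the "have we seen a previous sample?" flag):

prev = None
TotAh = 0.0
while True:
    try:
        if P2 < 0:
            curr = {'when': datetime.now(), 'measurement': P2}
            if prev is not None:
                elapsed = (curr['when'] - prev['when']).total_seconds()
                TotAh += curr['measurement'] / 3600 * elapsed
            prev = curr  # the current sample becomes the previous one for the next iteration
        elif P2 > 0:
            TotAh = 0.0
            prev = None  # reset so the next discharge period starts fresh
        time.sleep(1)
    except KeyboardInterrupt:
        break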
I am trying to implement a rolling minimum that has an amortized O(1) get_min(). The amortized O(1) algorithm comes from the accepted answer in this post
Original function:
import pandas as pd
import numpy as np
from numba import njit, prange

def rolling_min_original(data, n):
    return pd.Series(data).rolling(n).min().to_numpy()
My attempt to implement the amortized O(1) get_min() algorithm (this function has decent performance for non-small n):
@njit
def rollin_min(data, n):
    """
    brief explanation:
    stk2: the stack2 in the algorithm, except here it only stores the running minima
    stk2_top: starts at n-1 and drops gradually until it hits -1, then comes back up to n-1
    if stk2_top == 0 in the current iteration (it will become -1 at the end):
        stk2_top is pointing at the bottom element of stk2;
        after it drops from 0 to -1, stk2 is reassigned to a new array data[i-n+1:i+1]
        in the next iteration, because we need to include the current index.
    at each iteration:
        if stk2_top < 0 (i.e. stk2 is empty):
            - copy the past n items (including the current one) into stk2, so stk2 now has n items
            - pick the top min from stk2 (stk2_top is n-1 momentarily)
            - move the pointer down by 1 after the operation (n-1 becomes n-2)
        else (i.e. stk2 holds j items, 1 <= j <= n-1):
            - pick the top min from stk2 (stk2_top is j-1 momentarily)
            - move the pointer down by 1 after the operation (j-1 becomes j-2)
    """
    if n > 1:
        def min_getter_rev(arr1):
            arr = arr1[::-1]
            result = np.empty(len(arr), dtype=arr1.dtype)
            result[0] = local_min = arr[0]
            for i in range(1, len(arr)):
                if arr[i] < local_min:
                    local_min = arr[i]
                result[i] = local_min
            return result

        result_min = np.empty(len(data), dtype=data.dtype)
        for i in prange(n - 1):
            result_min[i] = np.nan

        stk2 = min_getter_rev(data[:n])
        stk2_top = n - 2          # n-2 because the loop starts at n (not n-1), which is the second non-nan term
        stk1_min = data[n - 1]    # stk1_min starts as the first item of stk1
        result_min[n - 1] = min(stk1_min, stk2[-1])

        for i in range(n, len(data)):
            if stk2_top >= 0:
                if data[i] < stk1_min:
                    stk1_min = min(data[i], stk1_min)              # update the stk1 min
                result_min[i] = min(stk1_min, stk2[stk2_top])      # min of the top element in stk2 and the current running min
            else:
                stk2 = min_getter_rev(data[i - n + 1:i + 1])
                stk2_top = n - 1
                stk1_min = data[i]
                result_min[i] = min(stk1_min, stk2[n - 1])
            stk2_top -= 1
        return result_min
    else:
        return data
A naive implementation when n is small:
@njit(parallel=True)
def rolling_min_smalln(data, n):
    result = np.empty(len(data), dtype=data.dtype)

    for i in prange(n - 1):
        result[i] = np.nan

    for i in prange(n - 1, len(data)):
        result[i] = data[i - n + 1:i + 1].min()

    return result
Some little code for testing
def remove_nan(arr):
    return arr[~np.isnan(arr)]

if __name__ == '__main__':
    np.random.seed(0)

    data_size = 200000
    data = np.random.uniform(0, 1000, size=data_size) + 29000
    w_size = 37

    r_min_original = rolling_min_original(data, w_size)
    rmin1 = rollin_min(data, w_size)
    r_min_original = remove_nan(r_min_original)
    rmin1 = remove_nan(rmin1)
    print(np.array_equal(r_min_original, rmin1))
The function rollin_min() has nearly constant runtime, and a lower runtime than rolling_min_original() when n is large, which is nice. But it performs poorly when n is small (around n < 37 on my PC; in that range rollin_min() is easily beaten by the naive implementation rolling_min_smalln()).
I am struggling to find ways to improve rollin_min(), but so far I am stuck, which is why I am seeking help here.
My questions are the following:
Is the algorithm I am implementing the best out there for rolling/sliding window min/max?
If not, what is the best/better algorithm? If so, how can I further improve the function from the algorithm's point of view?
Besides the algorithm itself, what other ways can further improve the performance of the function rollin_min()?
EDIT: Moved my latest answer to the answer section upon multiple requests
The primary cause of slowness in your code is probably the allocation of a new array in min_getter_rev. You should reuse the same storage throughout.
Then, because you don't really have to implement a queue, you can make more optimizations. For example, the size of the two stacks is at most (and usually) n, so you can keep them in the same array of size n, growing one from the start and one from the end.
You would notice that there is a very regular pattern - fill the array from start to end in order, recalculate the minimums from the end, generate output as you refill the array, repeat...
This leads to an actually simpler algorithm with a simpler explanation that doesn't refer to stacks at all. Here is an implementation, with comments about how it works. Note that I didn't bother stuffing the start with NaNs:
def rollin_min(data, n):
    # allocate the result. Note the number of valid windows is len(data)-(n-1)
    result = np.empty(len(data) - (n - 1), data.dtype)

    # every nth position is a "mark"
    # every window therefore contains exactly 1 mark
    # the minimum in the window is the minimum of:
    #   the minimum from the window start to the following mark; and
    #   the minimum from the window end to the preceding (same) mark

    # calculate the minimum from every window start index to the next mark
    for mark in range(n - 1, len(data), n):
        v = data[mark]
        if mark < len(result):
            result[mark] = v
        for i in range(mark - 1, mark - n, -1):
            v = min(data[i], v)
            if i < len(result):
                result[i] = v

    # for each window, calculate the running minimum from the preceding mark
    # to its end (the first window ends at the first mark),
    # then combine it with the first minimum to get the window minimum
    nextMarkPos = 0
    for i in range(0, len(result)):
        if i == nextMarkPos:
            v = data[i + n - 1]
            nextMarkPos += n
        else:
            v = min(data[i + n - 1], v)
        result[i] = min(result[i], v)

    return result
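For what it's worth, a quick sanity check of this version against the pandas baseline could reuse the data, w_size, remove_nan and r_min_original names from the test snippet in the question, padding the front with NaNs so the lengths line up (this is just a suggested check, not part of the answer above):

padded = np.concatenate((np.full(w_size - 1, np.nan), rollin_min(data, w_size)))
print(np.array_equal(remove_nan(r_min_original), remove_nan(padded)))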
Moved this from the Question EDIT section to here upon multiple requests.
Inspired by the simpler implementation given by Matt Timmermans in the answer, I have made a cpu-multicore version of the rolling min. The code is as follows:
@njit(parallel=True)
def rollin_min2(data, n):
    """
    1) loop over K sections of n elements; each section is independent, so it can benefit from a multicore CPU
    2) for each section, generate the backward local minimum (sec_min2) and the forward minimum (sec_min1)

    say n=8, len(data)=23; then we only need idx = (reversed to 7,6,5,...,1,0 (0 means minimum up until idx=0)),
    1st iter
        result[7]  = min until 0,
        result[8]  = min(min(data[7:9])  and min until 1),
        result[9]  = min(min(data[7:10]) and min until 2)
        ...
        result[14] = min(min(data[7:15]) and min until 7)
    2nd iter
        result[15] = min until 8,
        result[16] = min(min(data[15:17]) and min until 9),
        result[17] = min(min(data[15:18]) and min until 10)
        ...
        result[22] = min(min(data[15:23]) and min until 15)
    """
    ar_len = len(data)

    sec_min1 = np.empty(ar_len, dtype=data.dtype)
    sec_min2 = np.empty(ar_len, dtype=data.dtype)

    for i in prange(n - 1):
        sec_min1[i] = np.nan

    for sec in prange(ar_len // n):
        s2_min = data[n * sec + n - 1]
        s1_min = data[n * sec + n]

        for i in range(n - 1, -1, -1):
            if data[n * sec + i] < s2_min:
                s2_min = data[n * sec + i]
            sec_min2[n * sec + i] = s2_min

        sec_min1[n * sec + n - 1] = sec_min2[n * sec]

        for i in range(n - 1):
            if n * sec + n + i < ar_len:
                if data[n * sec + n + i] < s1_min:
                    s1_min = data[n * sec + n + i]
                sec_min1[n * sec + n + i] = min(s1_min, sec_min2[n * sec + i + 1])
            else:
                break

    return sec_min1
I actually spent an hour testing various implementations of rolling min. On my 6C/12T laptop, this multi-core version works best when n is of "medium" size. When n is at least about 30% of the length of the source data, though, other implementations start to outshine it. There must be even better ways to improve this function, but at the time of this edit I am not aware of them yet.
While playing around with timing Python execution, I found odd behaviour when calling time.time() twice within a single statement. There is a very small processing delay in obtaining time.time() during statement execution.
E.g. time.time() - time.time()
Executed instantaneously in a perfect world, this would compute to a result of 0.
In the real world, however, it yields a very small number, because there is a delay between when the processor executes the first time.time() call and the second. However, when running this same expression and comparing it to a variable computed in the same way, the results are skewed in one direction.
See the small code snippet below.
This also holds true for very large data sets
import time

counts = 300000

def at_once():
    first = 0
    second = 0
    x = 0
    while x < counts:
        x += 1
        exec_first = time.time() - time.time()
        exec_second = time.time() - time.time()
        if exec_first > exec_second:
            first += 1
        else:
            second += 1
    print('1sts: %s' % first)
    print('2nds: %s' % second)
prints:
1sts: 39630
2nds: 260370
Unless my logic is incorrect, I would expect the results to be very close to 50:50, but that does not seem to be the case. Can anyone explain what causes this behaviour, or point out a potential flaw in the code logic that is skewing the results in one direction?
Could it be that exec_first == exec_second? Your if-else would add 1 to second in that case.
Try changing your if-else to something like:
if exec_first > exec_second:
    first += 1
elif exec_second > exec_first:
    second += 1
else:
    pass
You assign all of the ties to one category. Try it with a middle ground:
import time

counts = 300000

first = 0
second = 0
same = 0

for _ in range(counts):
    exec_first = time.time() - time.time()
    exec_second = time.time() - time.time()
    if exec_first == exec_second:
        same += 1
    elif exec_first > exec_second:
        first += 1
    else:
        second += 1

print('1sts: %s' % first)
print('same: %s' % same)
print('2nds: %s' % second)
Output:
$ python3 so.py
1sts: 53099
same: 194616
2nds: 52285
$ python3 so.py
1sts: 57529
same: 186726
2nds: 55745
Also, I'm confused as to why you think that a function call might take 0 time. Every invocation requires at least access to the system clock and copying that value to a temporary location of some sort. This isn't free of overhead on any current computer.
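As a rough illustration of that overhead, you could time the call itself with timeit; the snippet below is just a sketch and the numbers are entirely machine-dependent:

import timeit

# Average cost of a single time.time() call, in seconds (varies by machine).
per_call = timeit.timeit('time.time()', setup='import time', number=1000000) / 1000000
print('%.1f ns per call' % (per_call * 1e9))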
I am trying to construct a binomial lattice model in Python. The idea is that there are multiple binomial lattices and, based on the value in a particular lattice, a series of operations are performed in other lattices.
These operations are similar to an option-pricing model (cf. Black-Scholes) in that the calculations start at the last column of the lattice and are iterated back to the previous column one step at a time.
For example,
If I have a binomial lattice with n columns,
1. I calculate the values in the nth column for a single lattice or multiple lattices.
2. Based on these values, I update the values in the (n-1)th column of the same or other binomial lattices.
3. This process continues until I reach the first column.
So in short, I cannot process the calculations for all of the lattices simultaneously, as the value in each column depends on the values in the next column, and so on.
From a coding perspective, I have written a function that does the calculations for a particular column in a lattice and outputs the numbers that are used as input for the next column in the process.
def column_calc(StockPrices_col, ConvertProb_col, y_col, ContinuationValue_col, ConversionValue_col,
                coupon_dates_index, convert_dates_index, call_dates_index, put_dates_index,
                ConvertProb_col_new, ContinuationValue_col_new, y_col_new,
                tau, r, cs, dt, call_trigger, putPrice, callPrice):

    for k in range(1, n+1-tau):

        ConvertProb_col_new[n-k] = 0.5*(ConvertProb_col[n-1-k] + ConvertProb_col[n-k])

        y_col_new[n-k] = ConvertProb_col_new[n-k]*r + (1 - ConvertProb_col_new[n-k])*(r + cs)

        # Calculate the holding value
        ContinuationValue_col_new[n-k] = 0.5*(ContinuationValue_col[n-1-k]/(1+y_col[n-1-k]*dt) +
                                              ContinuationValue_col[n-k]/(1+y_col[n-k]*dt))

        # Coupon payment date
        if np.isin(n-1-tau, coupon_dates_index) == True:
            ContinuationValue_col_new[n-k] = ContinuationValue_col_new[n-k] + Principal*(1/2*c)

        # check put/call schedule
        callflag = (np.isin(n-1-tau, call_dates_index)) & (StockPrices_col[n-k] >= call_trigger)
        putflag = np.isin(n-1-tau, put_dates_index)
        convertflag = np.isin(n-1-tau, convert_dates_index)

        # if t is a call date
        if (np.isin(n-1-tau, call_dates_index) == True) & (StockPrices_col[n-k] >= call_trigger):
            node_val = max([putPrice * putflag, ConversionValue_col[n-k] * convertflag,
                            min(callPrice, ContinuationValue_col_new[n-k])])
        # if t is not a call date
        else:
            node_val = max([putPrice * putflag, ConversionValue_col[n-k] * convertflag,
                            ContinuationValue_col_new[n-k]])

        # 1. if conversion happens
        if node_val == ConversionValue_col[n-k]*convertflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 1
        # 2. if put happens
        elif node_val == putPrice*putflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 0
        # 3. if call happens
        elif node_val == callPrice*callflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 0
        else:
            ContinuationValue_col_new[n-k] = node_val

    return ConvertProb_col_new, ContinuationValue_col_new, y_col_new
I am calling this function for every column in the lattice through a for loop.
So essentially I am running a nested for loop for all the calculations.
My issue is that this is very slow.
The function itself doesn't take much time, but the second level, where I call the function through the for loop, is very time consuming (on average the function is called about 1000 to 1500 times in the loop below). It takes almost 2.5 minutes to run the complete model, which is very slow from a standard modelling standpoint.
As mentioned above, most of the time is taken by the nested for loop shown below:
temp_mat = np.empty((n,3))*(np.nan)
temp_mat[:,0] = ConvertProb[:, n-1]
temp_mat[:,1] = ContinuationValue[:, n-1]
temp_mat[:,2] = y[:, n-1]

ConvertProb_col_new = np.empty((n,1))*(np.nan)
ContinuationValue_col_new = np.empty((n,1))*(np.nan)
y_col_new = np.empty((n,1))*(np.nan)

for tau in range(1, n):
    ConvertProb_col = temp_mat[:,0]
    ContinuationValue_col = temp_mat[:,1]
    y_col = temp_mat[:,2]

    ConversionValue_col = ConversionValue[:, n-tau-1]
    StockPrices_col = StockPrices[:, n-tau-1]

    out = column_calc(StockPrices_col, ConvertProb_col, y_col, ContinuationValue_col, ConversionValue_col,
                      coupon_dates_index, convert_dates_index, call_dates_index, put_dates_index,
                      ConvertProb_col_new, ContinuationValue_col_new, y_col_new,
                      tau, r, cs, dt, call_trigger, putPrice, callPrice)

    temp_mat[:,0] = out[0].reshape(np.shape(out[0])[0],)
    temp_mat[:,1] = out[1].reshape(np.shape(out[1])[0],)
    temp_mat[:,2] = out[2].reshape(np.shape(out[2])[0],)

# Final value
print(temp_mat[-1][1])
Is there any way I can reduce the time consumed by the nested for loop? Or is there an alternative I can use instead of the nested for loop?
Please let me know. Thanks a lot!
My problem is as follows:
having file with list of intervals:
1 5
2 8
9 12
20 30
And a range of
0 200
I would like to compute an intersection of sorts that reports the positions [start end] lying between my intervals, inside the given range.
For example:
8 9
12 20
30 200
Besides ideas on how to approach this, it would also be nice to read some thoughts on optimization, since, as always, the input files are going to be huge.
This solution works as long as the intervals are ordered by their start point, and it does not require creating a list as big as the total range.
code
with open("0.txt") as f:
t=[x.rstrip("\n").split("\t") for x in f.readlines()]
intervals=[(int(x[0]),int(x[1])) for x in t]
def find_ints(intervals, mn, mx):
next_start = mn
for x in intervals:
if next_start < x[0]:
yield next_start,x[0]
next_start = x[1]
elif next_start < x[1]:
next_start = x[1]
if next_start < mx:
yield next_start, mx
print list(find_ints(intervals, 0, 200))
output:
(in the case of the example you gave)
[(0, 1), (8, 9), (12, 20), (30, 200)]
Rough algorithm:
create an array of booleans, all set to false seen = [False]*200
Iterate over the input file, for each line start end set seen[start] .. seen[end] to be True
Once done, then you can trivially walk the array to find the unused intervals.
In terms of optimisations, if the list of input ranges is sorted on start number, then you can track the highest seen number and use that to filter ranges as they are processed -
e.g. something like
for (start, end) in input:
    if end <= lowest_unseen:
        continue
    if start < lowest_unseen:
        start = lowest_unseen
    ...
which (ignoring the cost of the original sort) should make the whole thing O(n) - you go through the array once to tag seen/unseen and once to output unseens.
Seems I'm feeling nice. Here is the (unoptimised) code, assuming your input file is called input
seen = [False] * 200

file = open('input', 'r')
rows = file.readlines()
for row in rows:
    (start, end) = row.split(' ')
    print("%s %s" % (start, end))
    for x in range(int(start) - 1, int(end) - 1):
        seen[x] = True

print(seen[0:10])
in_unseen_block = False
start = 1
for x in range(1, 200):
    val = seen[x - 1]
    if val and not in_unseen_block:
        continue
    if not val and in_unseen_block:
        continue
    # Must be at a change point.
    if val:
        # we have reached the end of the block
        print("%s %s" % (start, x))
        in_unseen_block = False
    else:
        # start of a new block
        start = x
        in_unseen_block = True

# Handle the final block
if in_unseen_block:
    print("%s %s" % (start, 200))
I'm leaving the optimizations as an exercise for the reader.
If you make a note every time one of your input intervals opens or closes, you can do what you want by putting the keys of opens and closes together, sorting them into an ordered set, and then essentially thinking: "okay, each adjacent pair of numbers forms an interval; now I can focus all of my logic on these intervals as discrete chunks."
myRange = range(201)
intervals = [(1,5), (2,8), (9,12), (20,30)]

opens = {}
closes = {}

def open(index):
    if index not in opens:
        opens[index] = 0
    opens[index] += 1

def close(index):
    if index not in closes:
        closes[index] = 0
    closes[index] += 1

for start, end in intervals:
    if end > start:  # Making sure to exclude empty intervals, which can be problematic later
        open(start)
        close(end)

# Sort all the interval-endpoints that we really need to look at
oset = {0: None, 200: None}
for k in opens.keys():
    oset[k] = None
for k in closes.keys():
    oset[k] = None
relevant_indices = sorted(oset.keys())

# Find the clear ranges
state = 0
results = []
for i in range(len(relevant_indices) - 1):
    start = relevant_indices[i]
    end = relevant_indices[i+1]
    start_state = state
    if start in opens:
        start_state += opens[start]
    if start in closes:
        start_state -= closes[start]
    end_state = start_state
    if end in opens:
        end_state += opens[end]
    if end in closes:
        end_state -= closes[end]
    state = end_state
    if start_state == 0:
        result_start = start
        result_end = end
        results.append((result_start, result_end))

for start, end in results:
    print(str(start) + " " + str(end))
This outputs:
0 1
8 9
12 20
30 200
The intervals don't need to be sorted.
This question seems to be a duplicate of Merging intervals in Python.
If I understood the problem well, you have a list of intervals (1 5; 2 8; 9 12; 20 30) and a range (0 200), and you want to get the positions that are outside your intervals but inside the given range. Right?
There's a Python library that can help you on that: python-intervals (also available from PyPI using pip). Disclaimer: I'm the maintainer of that library.
Assuming you import this library as follows:
import intervals as I
It's quite easy to get your answer. Basically, you first want to create a disjunction of intervals based on the ones you provide:
inters = I.closed(1, 5) | I.closed(2, 8) | I.closed(9, 12) | I.closed(20, 30)
Then you compute the complement of these intervals, to get everything that is "outside":
compl = ~inters
Then you intersect the result with [0, 200], since you want to restrict the points to that interval:
print(compl & I.closed(0, 200))
This results in:
[0,1) | (8,9) | (12,20) | (30,200]
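If you then need the result as plain start/end pairs (as in the expected output), the disjoint pieces of the resulting interval can be iterated over; note that the attribute names below (.lower, .upper) are written from memory of the python-intervals API, so please double-check them against the library's documentation:

free = compl & I.closed(0, 200)
for atomic in free:  # iterate over the disjoint sub-intervals
    print(atomic.lower, atomic.upper)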
data is a matrix containing 2500 time series of a measurement. I need to average each time series over time, discarding data points that were recorded around a spike (in the interval tspike-dt*10 ... tspike+dt*10). The number of spike times is variable for each neuron and is stored in a dictionary with 2500 entries. My current code iterates over neurons and spike times and sets the masked values to NaN; then bottleneck.nanmean() is called. However, this code is too slow in its current version, and I am wondering whether there is a faster solution. Thanks!
import bottleneck
import numpy as np
from numpy.random import rand, randint

t = 1
dt = 1e-4
N = 2500
dtbin = 10 * dt

data = np.float32(np.ones((N, int(t / dt))))
times = np.arange(0, t, dt)
spiketimes = dict.fromkeys(np.arange(N))
for key in spiketimes:
    spiketimes[key] = rand(randint(100))

means = np.empty(N)
for i in range(N):
    spike_times = spiketimes[i]
    datarow = data[i]
    if len(spike_times) > 0:
        for spike_time in spike_times:
            start = max(spike_time - dtbin, 0)
            end = min(spike_time + dtbin, t)
            idx = np.all([times >= start, times <= end], 0)
            datarow[idx] = np.NaN
    means[i] = bottleneck.nanmean(datarow)
The vast majority of the processing time in your code comes from this line:
idx = np.all([times>=start,times<=end],0)
This is because for each spike, you are comparing every value in times against start and end. Since you have uniform time steps in this example (and I presume this is true in your data as well), it is much faster to simply compute the start and end indexes:
# This replaces the last loop in your example:
for i in range(N):
    spike_times = spiketimes[i]
    datarow = data[i]
    if len(spike_times) > 0:
        for spike_time in spike_times:
            start = max(spike_time - dtbin, 0)
            end = min(spike_time + dtbin, t)
            #idx = np.all([times >= start, times <= end], 0)
            #datarow[idx] = np.NaN
            datarow[int(start / dt):int(end / dt)] = np.NaN
    ## replaced this with the equivalent for testing
    means[i] = datarow[~np.isnan(datarow)].mean()
This reduces the run time for me from ~100s to ~1.5s.
You can also shave off a bit more time by vectorizing the loop over spike_times. The effect of this will depend on the characteristics of your data (should be most effective for high spike rates):
kernel = np.ones(20, dtype=bool)
for i in range(N):
    spike_times = spiketimes[i]
    datarow = data[i]
    mask = np.zeros(len(datarow), dtype=bool)
    indexes = (spike_times / dt).astype(int)
    mask[indexes] = True
    mask = np.convolve(mask, kernel)[10:-9]
    means[i] = datarow[~mask].mean()
Instead of using nanmean you could just index the values you need and use mean.
means[i] = data[ (times<start) | (times>end) ].mean()
If I misunderstood and you do need your indexing, you might try
means[i] = data[numpy.logical_not( np.all([times>=start,times<=end],0) )].mean()
Also, in the code you probably don't want to use if len(spike_times) > 0 (I assume you remove the spike time at each iteration, or else that statement will always be true and you'll have an infinite loop); just use for spike_time in spike_times.