How to fill array by keypoints - python

I have an array like this:
arr = [[180, 210, 240, 270, 300], [38.7, 38.4, 38.2, 37.9, 37.7]]
It contains frame numbers from a video and the value of a sensor recorded during that frame.
I have a program to evaluate the video material and need to know the value of the sensor for each frame. However, the data is only recorded in these large steps so as not to overwhelm the sensor.
How would I go about creating a computationally cheap function that returns the sensor value for frames which are not listed? The video is not evaluated from the first frame but from some unknown offset.
The data should be filled halfway up and down to the next listed frame, e.g.
Frames 195 - 225 would all use the value recorded for frame 210.
The step is always constant for one video but could vary between videos.
The recording of the sensor starts some time into the video, here 3 s in. I want to also use that value for all frames before, and similarly for the end.
So f(0) == f(180) and f(350) == f(300) for this example.
I don't want to do a binary search through the array every time I need a value.
I also thought about filling a second array at single-frame resolution in a for loop at the beginning of the program, but the array is much larger than the example above. I am worried about memory consumption and, again, lookup performance.
This is my try at filling the array at the beginning:
sparse_data = [[180, 210, 240, 270, 300], [38.7, 38.4, 38.2, 37.9, 37.7]]
delta = sparse_data[0][1] - sparse_data[0][0]
fr_max = sparse_data[0][-1] + delta
fr_min = sparse_data[0][0]
cur_sparse_idx = 1
self.magn_data = np.zeros(fr_max, dtype=np.float32)
for i in range(fr_max):
    if i <= (fr_min + delta // 2):
        self.magn_data[i] = sparse_data[1][0]
    elif i > fr_max:
        self.magn_data[i] = sparse_data[1][-1]
    else:
        if (i + delta // 2) % delta == 0: cur_sparse_idx += 1
        self.magn_data[i] = sparse_data[1][cur_sparse_idx]
Is there a better way?

You could define a query function to get the value for a specific frame as follows:
def query(frames: list[int], vals: list[float], frame: int):
    # edge cases, based on the array lengths
    assert len(frames) == len(vals)
    if len(frames) == 1:
        return vals[0]
    if len(frames) == 0:
        raise ValueError("no recorded frames")
    # arrays now have at least length 2
    delta = frames[1] - frames[0]
    start = frames[0]
    end = frames[-1]
    # edge cases for the frame argument
    if frame <= start:
        return vals[0]
    if frame >= end:
        return vals[-1]
    # frame is somewhere "inside" the recorded frames
    frame -= start
    i = frame // delta
    if frame % delta > delta // 2:
        return vals[i + 1]
    else:
        return vals[i]
This function uses constant time and memory.
Here are some test cases based on your example, which show how you can use query:
frames, vals = [[180, 210, 240, 270, 300], [38.7, 38.4, 38.2, 37.9, 37.7]]
assert query(frames, vals, 170) == 38.7
assert query(frames, vals, 245) == 38.2
assert query(frames, vals, 196) == 38.4
assert query(frames, vals, 195) == 38.7

Here is an O(1) solution for your problem. It lets you find the recorded frame that is closest to the frame you pass in as a parameter, and thus the corresponding sensor value.
def get_sensor_value(frame, sparse_data):
    # Assuming the step size between frames is constant
    # Assuming len(sparse_data[0]) == len(sparse_data[1])
    mx, mn = sparse_data[0][-1], sparse_data[0][0]
    if frame <= mn:
        return sparse_data[1][0]
    elif frame >= mx:
        return sparse_data[1][-1]
    return sparse_data[1][int(round((frame - mn) / (mx - mn) * (len(sparse_data[0]) - 1)))]
You can make it a one-liner if you want to.
Another solution you might consider is linear interpolation, e.g. with np.interp, as sketched below.
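A minimal sketch of that, assuming numpy is available and gradual values are acceptable instead of the step-wise fill (outside the recorded range, np.interp clamps to the first/last value, which matches f(0) == f(180) and f(350) == f(300)):
import numpy as np

frames = np.array([180, 210, 240, 270, 300])
vals = np.array([38.7, 38.4, 38.2, 37.9, 37.7])

print(np.interp(0, frames, vals))    # 38.7 (clamped)
print(np.interp(195, frames, vals))  # 38.55 (interpolated, not stepped)
print(np.interp(350, frames, vals))  # 37.7 (clamped)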

Related

While loop cycle not satisfying requirements

I am measuring parameters of a battery (current, voltage, etc.) using an analogue-to-digital converter. The while-loop cycle also contains the measurement functions, which are not shown here because they are not part of my question.
With the code below, I am attempting to calculate ampere-hours on each iteration of the cycle (Ahinst), simply multiplying the measured current by the elapsed time between two measurements. I am also summing up the Ah to get a cumulative value (TotAh) drained from the battery. This last value is shown only while the current (P2) is negative (battery not in charging mode). When the current (P2) reverses into charging mode, I clear TotAh and just show 0.
timeMeas = []
currInst = []
Ah = []
TotAh = 0
while(True):
    try:
        # measurement routines are also running here
        # ......................
        # Ah() in development
        if (P2 < 0):  # if current is negative (discharging)
            Tnow = datetime.now()  # get time_start reference to calculate elapsed time until next current measure
            timeMeas.append(Tnow)  # save time_start
            currInst.append(P2)  # save current at time_start
            if (len(currInst) > 1):  # if we have two current measurements
                elapsed = (timeMeas[1] - timeMeas[0]).total_seconds()  # time elapsed between two measurements
                Ahinst = currInst[1] / 3600 * elapsed  # calculate Ah per time interval
                Ah.append(Ahinst)  # save Ah per time interval
                TotAh = round(sum(Ah), 3) * -1  # update cumulative Ah
                timeMeas = []  # clean up data in array
                currInst = []  # clean up data in array
        elif (P2 > 0):
            TotAh = 0
            Ah = []
            time.sleep(1)
    except KeyboardInterrupt:
        break
The code is working but obviously not giving me the correct result, because in the second "if" condition I always clear the two arrays (timeMeas and currInst). Since the calculation requires at least two actual measurements (if len(currInst) > 1) to work, clearing the two arrays causes one measurement to be lost on every iteration of the cycle. I have considered shifting the values from position 0 to 1 in the arrays at every iteration, but this would cause calculation mistakes when the cycle is restarted after the value P2 has reversed to charging and then discharging mode again.
I am very rusty with coding and doing this as a hobby. I am battling to find a solution that calculates Ahinst on every cycle with the actual values.
Any help is appreciated. Thanks
If you only want to keep two measurements (current and previous) you can keep arrays of size two, and have idx = 1 - idx at the end of the loop to have it flip-flop between 0 and 1.
timeMeas = [None, None]
currInst = [None, None]
TotAh = 0.0
idx = 0
while True:  # no need for parentheses
    try:
        if (P2 < 0):
            Tnow = datetime.now()
            timeMeas[idx] = Tnow
            currInst[idx] = P2
            if currInst[1] is not None:  # meaning we have at least two measurements
                elapsed = (timeMeas[idx] - timeMeas[1 - idx]).total_seconds()
                TotAh += currInst[idx] / 3600 * elapsed
        elif (P2 > 0):  # is "elif" really correct here?
            TotAh = 0.0
            # Do we want to reset these, too?
            timeMeas = [None, None]
            currInst = [None, None]
            # should this really be inside the elif?
            time.sleep(1)
        idx = 1 - idx
    except KeyboardInterrupt:
        break
In some sense, it would be simpler to have two dict variables curr and prev, and set prev = None when you start or reset them. Then simply set curr = prev at the end of the loop, and populate curr with new values in each iteration, like curr['when'] = datetime.now() and curr['measurement'] = P2.
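A rough sketch of that dict-based variant (hedged: read_current() is a hypothetical stand-in for the actual measurement routine that produces P2):
from datetime import datetime
import time

TotAh = 0.0
prev = None  # no previous measurement yet

while True:
    try:
        P2 = read_current()  # hypothetical placeholder for the real measurement
        if P2 < 0:
            curr = {'when': datetime.now(), 'measurement': P2}
            if prev is not None:
                elapsed = (curr['when'] - prev['when']).total_seconds()
                TotAh += curr['measurement'] / 3600 * elapsed
            prev = curr
        elif P2 > 0:
            TotAh = 0.0
            prev = None  # restart cleanly after a charge phase
        time.sleep(1)
    except KeyboardInterrupt:
        break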

Calculate column in Pandas Dataframe using adjacent rows without iterating through each row

I would like to see if there is a way to calculate a column in a dataframe that uses something similar to a moving average without iterating through each row.
Current working code:
def create_candles(ticks, instrument, time_slice):
    candlesticks = ticks.price.resample(time_slice, base=00).ohlc().bfill()
    volume = ticks.amount.resample(time_slice, base=00).sum()
    candlesticks['volume'] = volume
    candlesticks['instrument'] = instrument
    candlesticks['ttr'] = 0
    # candlesticks['vr_7'] = 0
    candlesticks['vr_10'] = 0
    candlesticks = calculate_indicators(candlesticks, instrument)
    return candlesticks
def calculate_indicators(candlesticks, instrument):
    candlesticks.sort_index(inplace=True)
    # candlesticks['rsi_14'] = talib.RSI(candlesticks.close, timeperiod=14)
    candlesticks['lr_50'] = talib.LINEARREG(candlesticks.close, timeperiod=50)
    # candlesticks['lr_150'] = talib.LINEARREG(candlesticks.close, timeperiod=150)
    # candlesticks['ema_55'] = talib.EMA(candlesticks.close, timeperiod=55)
    # candlesticks['ema_28'] = talib.EMA(candlesticks.close, timeperiod=28)
    # candlesticks['ema_18'] = talib.EMA(candlesticks.close, timeperiod=18)
    # candlesticks['ema_9'] = talib.EMA(candlesticks.close, timeperiod=9)
    # candlesticks['wma_21'] = talib.WMA(candlesticks.close, timeperiod=21)
    # candlesticks['wma_12'] = talib.WMA(candlesticks.close, timeperiod=12)
    # candlesticks['wma_11'] = talib.WMA(candlesticks.close, timeperiod=11)
    # candlesticks['wma_5'] = talib.WMA(candlesticks.close, timeperiod=5)
    candlesticks['cmo_9'] = talib.CMO(candlesticks.close, timeperiod=9)
    for row in candlesticks.itertuples():
        current_index = candlesticks.index.get_loc(row.Index)
        if current_index >= 1:
            previous_close = candlesticks.iloc[current_index - 1, candlesticks.columns.get_loc('close')]
            candlesticks.iloc[current_index, candlesticks.columns.get_loc('ttr')] = max(
                row.high - row.low,
                abs(row.high - previous_close),
                abs(row.low - previous_close))
        if current_index > 10:
            candlesticks.iloc[current_index, candlesticks.columns.get_loc('vr_10')] = candlesticks.iloc[current_index, candlesticks.columns.get_loc('ttr')] / (
                max(candlesticks.high[current_index - 9: current_index].max(), candlesticks.close[current_index - 11]) -
                min(candlesticks.low[current_index - 9: current_index].min(), candlesticks.close[current_index - 11]))
    candlesticks['timestamp'] = pd.to_datetime(candlesticks.index)
    candlesticks['instrument'] = instrument
    candlesticks.fillna(0, inplace=True)
    return candlesticks
In the iteration, I am calculating the True Range ('ttr') and then the Volatility Ratio ('vr_10').
TTR is calculated on every row in the DF except for the first one. It uses the previous row's close column and the current row's high and low columns.
VR_10 is calculated on every row except for the first 10. It uses the high and low columns of the previous 9 rows and the close of the 10th row back.
EDIT 2
I have tried many ways to add a text-based data frame to this question; there just doesn't seem to be a solution given the width of my frame. There is no difference between the input and output dataframes other than that the columns ttr and vr_10 are all 0s in the input and have non-zero values in the output.
Is there a way I can do this without iteration?
With the nudge from Andreas to use rolling, I came to an answer.
First, I had to find out how to use rolling with multiple columns; I found that here.
I made a modification because I need to roll up, not down:
from numpy.lib.stride_tricks import as_strided as stride
import pandas as pd

def roll(df, w, **kwargs):
    df.sort_values(by='timestamp', ascending=0, inplace=True)
    v = df.values
    d0, d1 = v.shape
    s0, s1 = v.strides
    a = stride(v, (d0 - (w - 1), w, d1), (s0, s0, s1))
    rolled_df = pd.concat({
        row: pd.DataFrame(values, columns=df.columns)
        for row, values in zip(df.index, a)
    })
    return rolled_df.groupby(level=0, **kwargs)
After that, I created two functions:
def calculate_vr(window):
    return window.iloc[0].ttr / (max(window.high[1:9].max(), window.iloc[10].close) -
                                 min(window.low[1:9].min(), window.iloc[10].close))

def calculate_ttr(window):
    return max(window.iloc[0].high - window.iloc[0].low,
               abs(window.iloc[0].high - window.iloc[1].close),
               abs(window.iloc[0].low - window.iloc[1].close))
and called those functions like this:
candlesticks['ttr'] = roll(candlesticks, 3).apply(calculate_ttr)
candlesticks['vr_10'] = roll(candlesticks, 11).apply(calculate_vr)
I added timers to both ways, and this way is roughly 3x slower than iteration.
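For what it's worth, .apply here still calls a Python function once per window, which is why it loses to the plain loop. A fully vectorized shift/rolling formulation may be faster; a sketch under the assumption that the window offsets match the loop above (dataframe sorted ascending by time; the bounds are worth double-checking against your data):
import numpy as np
import pandas as pd

def add_ttr_vr10(df):
    prev_close = df['close'].shift(1)
    # True range: max of (high - low, |high - prev_close|, |low - prev_close|)
    df['ttr'] = np.maximum(df['high'] - df['low'],
                           np.maximum((df['high'] - prev_close).abs(),
                                      (df['low'] - prev_close).abs()))
    # Window pieces mirroring the loop: high/low over rows i-9..i-1,
    # and the close 11 rows back; early rows come out NaN automatically.
    hi = df['high'].rolling(9).max().shift(1)
    lo = df['low'].rolling(9).min().shift(1)
    c11 = df['close'].shift(11)
    df['vr_10'] = df['ttr'] / (np.maximum(hi, c11) - np.minimum(lo, c11))
    return df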

Python - faster alternative to 'for' loops

I am trying to construct a binomial lattice model in Python. The idea is that there are multiple binomial lattices and based on the value in particular lattice, a series of operations are performed in other lattices.
These operations are similar to an 'option pricing model' (reference: Black-Scholes models) in that calculations start at the last column of the lattice and are iterated back, one column at a time.
For example, if I have a binomial lattice with n columns:
1. I calculate the values in the nth column for a single lattice or multiple lattices.
2. Based on these values, I update the values in the (n-1)th column in the same or other binomial lattices.
3. This process continues until I reach the first column.
So in short, I cannot process the calculations for all of the lattices simultaneously, as the value in each column depends on the values in the next column, and so on.
From a coding perspective, I have written a function that does the calculations for a particular column in a lattice and outputs the numbers that are used as input for the next column in the process.
def column_calc(StockPrices_col, ConvertProb_col, y_col, ContinuationValue_col,
                ConversionValue_col, coupon_dates_index, convert_dates_index,
                call_dates_index, put_dates_index, ConvertProb_col_new,
                ContinuationValue_col_new, y_col_new, tau, r, cs, dt,
                call_trigger, putPrice, callPrice):
    for k in range(1, n + 1 - tau):
        ConvertProb_col_new[n-k] = 0.5 * (ConvertProb_col[n-1-k] + ConvertProb_col[n-k])
        y_col_new[n-k] = ConvertProb_col_new[n-k]*r + (1 - ConvertProb_col_new[n-k])*(r + cs)
        # Calculate the holding value
        ContinuationValue_col_new[n-k] = 0.5 * (ContinuationValue_col[n-1-k] / (1 + y_col[n-1-k]*dt)
                                                + ContinuationValue_col[n-k] / (1 + y_col[n-k]*dt))
        # Coupon payment date
        if np.isin(n-1-tau, coupon_dates_index):
            ContinuationValue_col_new[n-k] = ContinuationValue_col_new[n-k] + Principal*(1/2*c)
        # Check put/call schedule
        callflag = np.isin(n-1-tau, call_dates_index) & (StockPrices_col[n-k] >= call_trigger)
        putflag = np.isin(n-1-tau, put_dates_index)
        convertflag = np.isin(n-1-tau, convert_dates_index)
        # if t is a call date
        if np.isin(n-1-tau, call_dates_index) & (StockPrices_col[n-k] >= call_trigger):
            node_val = max([putPrice*putflag, ConversionValue_col[n-k]*convertflag,
                            min(callPrice, ContinuationValue_col_new[n-k])])
        # if t is not a call date
        else:
            node_val = max([putPrice*putflag, ConversionValue_col[n-k]*convertflag,
                            ContinuationValue_col_new[n-k]])
        # 1. if conversion happens
        if node_val == ConversionValue_col[n-k]*convertflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 1
        # 2. if put happens
        elif node_val == putPrice*putflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 0
        # 3. if call happens
        elif node_val == callPrice*callflag:
            ContinuationValue_col_new[n-k] = node_val
            ConvertProb_col_new[n-k] = 0
        else:
            ContinuationValue_col_new[n-k] = node_val
    return ConvertProb_col_new, ContinuationValue_col_new, y_col_new
I am calling this function for every column in the lattice through a for loop.
So essentially I am running a nested for loop for all the calculations.
My issue is: this is very slow.
The function itself doesn't take much time, but the outer for loop that calls it is very time consuming (on average the function is called about 1000 to 1500 times). It takes almost 2.5 minutes to run the complete model, which is very slow from a standard modeling standpoint.
As mentioned above, most of the time is taken by the nested for loop shown below:
temp_mat = np.empty((n, 3)) * np.nan
temp_mat[:, 0] = ConvertProb[:, n-1]
temp_mat[:, 1] = ContinuationValue[:, n-1]
temp_mat[:, 2] = y[:, n-1]

ConvertProb_col_new = np.empty((n, 1)) * np.nan
ContinuationValue_col_new = np.empty((n, 1)) * np.nan
y_col_new = np.empty((n, 1)) * np.nan

for tau in range(1, n):
    ConvertProb_col = temp_mat[:, 0]
    ContinuationValue_col = temp_mat[:, 1]
    y_col = temp_mat[:, 2]
    ConversionValue_col = ConversionValue[:, n-tau-1]
    StockPrices_col = StockPrices[:, n-tau-1]
    out = column_calc(StockPrices_col, ConvertProb_col, y_col, ContinuationValue_col,
                      ConversionValue_col, coupon_dates_index, convert_dates_index,
                      call_dates_index, put_dates_index, ConvertProb_col_new,
                      ContinuationValue_col_new, y_col_new, tau, r, cs, dt,
                      call_trigger, putPrice, callPrice)
    temp_mat[:, 0] = out[0].reshape(np.shape(out[0])[0],)
    temp_mat[:, 1] = out[1].reshape(np.shape(out[1])[0],)
    temp_mat[:, 2] = out[2].reshape(np.shape(out[2])[0],)

# Final value
print(temp_mat[-1][1])
Is there any way I can reduce the time consumed by the nested for loop, or is there an alternative I can use instead?
Please let me know. Thanks a lot!
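One commonly suggested alternative for loops with this kind of strict column-to-column dependence is JIT compilation with numba, since the recursion cannot be vectorized across columns. A minimal, hypothetical sketch on a toy backward lattice sweep (not the full model above; assumes numba is installed):
import numpy as np
from numba import njit

@njit(cache=True)
def backward_sweep(terminal, r, dt):
    # Toy backward induction: column j is built from column j+1,
    # so columns must be processed sequentially, right to left.
    n = terminal.shape[0]
    lattice = np.zeros((n, n))
    lattice[:, -1] = terminal
    for j in range(n - 2, -1, -1):
        for i in range(j + 1):
            lattice[i, j] = 0.5 * (lattice[i, j+1] + lattice[i+1, j+1]) / (1.0 + r*dt)
    return lattice

# Usage: the first call compiles; subsequent calls run at native speed.
values = backward_sweep(np.linspace(80.0, 120.0, 500), 0.05, 1/252)
print(values[0, 0])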

Speeding a numpy correlation program using the fact that lists are sorted

I am currently using Python and numpy to calculate correlations between two lists, data_0 and data_1. Each list contains respectively sorted times t0 and t1.
I want to find all the events where 0 < t1 - t0 < t_max.
for time_0 in np.nditer(data_0):
    delta_time = np.subtract(data_1, np.full(data_1.size, time_0))
    delta_time = delta_time[delta_time >= 0]
    delta_time = delta_time[delta_time < time_max]
Doing so, as the lists are sorted, I am selecting a subarray of data_1 of the form data_1[index_min: index_max].
So I need in fact to find two indexes to get what I want.
And what's interesting is that when I go to the next time_0, as data_0 is also sorted, I just need to find the new index_min / index_max such that new_index_min >= index_min and new_index_max >= index_max.
Meaning that I don't need to scan all of data_1 again from scratch.
I have implemented such a solution without the numpy methods (just with while loops) and it gives me the same results as before, but it is not as fast as before (15 times longer!).
I think, as it normally requires fewer calculations, there should be a way to make it faster using numpy methods, but I don't know how to do it.
Does anyone have an idea?
I am not sure if I am super clear, so if you have any questions, do not hesitate.
Thank you in advance,
Paul
Here is a vectorized approach using argsort. It uses a strategy similar to your avoid-full-scan idea:
import numpy as np

def find_gt(ref, data, incl=True):
    out = np.empty(len(ref) + len(data) + 1, int)
    total = (data, ref) if incl else (ref, data)
    out[1:] = np.argsort(np.concatenate(total), kind='mergesort')
    out[0] = -1
    split = (out < len(data)) if incl else (out >= len(ref))
    if incl:
        out[~split] -= len(data)
    split[0] = False
    return np.maximum.accumulate(np.where(split, -1, out))[split] + 1

def find_intervals(ref, data, span, incl=(True, True)):
    index_min = find_gt(ref, data, incl[0])
    index_max = len(ref) - find_gt(-ref[::-1], -span - data[::-1], incl[1])[::-1]
    return index_min, index_max

ref = np.sort(np.random.randint(0, 20000, (10000,)))
data = np.sort(np.random.randint(0, 20000, (10000,)))
span = 2

idmn, idmx = find_intervals(ref, data, span, (True, True))

print('checking')
for d, mn, mx in zip(data, idmn, idmx):
    assert mn == len(ref) or ref[mn] >= d
    assert mn == 0 or ref[mn-1] < d
    assert mx == len(ref) or ref[mx] > d + span
    assert mx == 0 or ref[mx-1] <= d + span
print('ok')
It works by:
- indirectly sorting both sets together
- finding, for each time in one set, the preceding time in the other (this is done using np.maximum.accumulate)
- applying the preceding steps twice, the second time with the times in one set shifted by span
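As a side note, since both arrays are sorted, np.searchsorted can compute the same index_min / index_max bounds directly; a hedged sketch using the question's variable names:
import numpy as np

data_0 = np.sort(np.random.randint(0, 20000, 10000))
data_1 = np.sort(np.random.randint(0, 20000, 10000))
time_max = 5

# For each t0: first index with data_1 >= t0, and first with data_1 >= t0 + time_max.
index_min = np.searchsorted(data_1, data_0, side='left')
index_max = np.searchsorted(data_1, data_0 + time_max, side='left')
counts = index_max - index_min  # events with t0 <= t1 < t0 + time_max, per t0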

Spawning objects in groups when the first object of the group was spawned randomly Python

I'm currently doing a project, and in the code I have, I'm trying to get trees (.*.) and mountains (.^.) to spawn in groups around the first tree or mountain, which is spawned randomly. However, I can't figure out how to get the trees and mountains to spawn in groups around a single randomly generated point. Any help?
grid = []

def draw_board():
    row = 0
    for i in range(0, 625):
        if grid[i] == 1:
            print("..."),
        elif grid[i] == 2:
            print("..."),
        elif grid[i] == 3:
            print(".*."),
        elif grid[i] == 4:
            print(".^."),
        elif grid[i] == 5:
            print("[T]"),
        else:
            print("ERR"),
        row = row + 1
        if row == 25:
            print("\n")
            row = 0
    return
There are a number of ways you can do it.
Firstly, you can just simulate the groups directly, i.e. pick a range on the grid and fill it with a specific figure.
import random

figures = [3, 4]  # the figure codes to group, e.g. tree and mountain from the question

def generate_grid(size):
    grid = [0] * size
    right = 0
    while right < size:
        left = right
        repeat = min(random.randint(1, 5), size - right)  # *
        right = left + repeat
        grid[left:right] = [random.choice(figures)] * repeat
    return grid
Note that the group size need not be uniformly distributed; you can use any convenient distribution, e.g. Poisson, as sketched below.
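For instance, a hedged variant of the line marked with * above, assuming numpy is available:
import numpy as np

# Poisson-distributed group size with mean ~3, clipped to at least 1
# and to the space remaining on the grid:
repeat = min(max(1, np.random.poisson(3)), size - right)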
Secondly, you can use a Markov chain. In this case, group lengths will implicitly follow a geometric distribution. Here's the code:
def transition_matrix(A):
    """Ensures that each row of the transition matrix sums to 1."""
    copy = []
    for i, row in enumerate(A):
        total = sum(row)
        copy.append([item / total for item in row])
    return copy

def generate_grid(size):
    # Transition matrix ``A`` defines the probability of
    # changing from figure i to figure j for each pair
    # of figures i and j. The grouping effect can be
    # obtained by setting diagonal entries A[i][i] to
    # larger values.
    #
    # You need to specify this manually.
    A = transition_matrix([[5, 1],
                           [1, 5]])  # Assuming 2 figures.
    grid = [random.choice(figures)]
    for i in range(1, size):
        current = grid[-1]
        next = choice(figures, A[current])
        grid.append(next)
    return grid
Where the choice function is explained in this StackOverflow answer.
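That helper isn't reproduced here; a minimal stand-in with the same (options, probabilities) signature could look like this (on Python 3.6+, random.choices(figures, weights=A[current])[0] does the same job):
import random

def choice(options, probs):
    # Pick options[i] with probability probs[i] (probs assumed to sum to 1).
    x = random.random()
    cum = 0.0
    for opt, p in zip(options, probs):
        cum += p
        if x < cum:
            return opt
    return options[-1]  # guard against floating-point rounding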
