My current code functions and produces a graph if there is only one sensor, i.e. if col2 and col3 are deleted from the example data provided below, leaving one column.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}
df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)
index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(baseline)
baseline_num = pd.DataFrame(baseline_num)
v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)
sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000
for i in range(0, rows): #This iterates over each index value, i.e. each row, sums the values, and returns them in list format.
    v[i] = dfn.loc[i].to_numpy().sum() - sensors
for colname, colitems in df.iteritems():
    for rownum, rowitem in colitems.iteritems():
        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0
        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, off_require):
                    baseline_num[colname][rownum] += df[colname][rownum - x]
            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - df[colname][rownum - off_require] #optimization: one calculation per row once the first window is established
            baseline[colname][rownum] = baseline_num[colname][rownum] // off_require #mean of the last "off_require" points
dfx = pd.DataFrame(v, columns=['Sensor Correlation']) #converts the summed correlation values back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)
dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)
Basically, instead of the single graph this produces, I would like to iterate over each column, but only for this loop:
for colname, colitems in df.iteritems():
    for rownum, rowitem in colitems.iteritems():
        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0
        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, off_require):
                    baseline_num[colname][rownum] += df[colname][rownum - x]
            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - df[colname][rownum - off_require] #optimization: one calculation per row once the first window is established
I've tried some other solutions from other questions but none of them seem to solve this case.
As of now, I've tried multiple conversions to things like lists and tuples, and then indexing them like this:
baseline_num[i,column] += d[i - x,column]
as well as
baseline_num[i][column] += d[i - x][column]
while iterating over the loop using
for column in columns:
However, no matter how I arrange the solution, there is always some KeyError, or an error about expecting integer or slice indices, among others.
See the pictures for expected/possible outputs of one column on actual data, with varying input parameters (the sensitivity value and off_require are varied in different cases).
One such solution which didn't work was the looping method from this link:
https://www.geeksforgeeks.org/iterating-over-rows-and-columns-in-pandas-dataframe/
I've also tried creating a loop using iteritems as the outer loop. That did not work either.
Below are links to possible graph outputs for various sensitivity values and windows in my actual dataset, with only one column (i.e. I manually deleted the other columns and plotted just the one using the current program).
sensitivity 1000, window 8
sensitivity 800, window 5
sensitivity 1500, window 5
If there's anything I've left out that would be helpful in solving this, please let me know so I can rectify it immediately.
See this picture for my original df.head:
df.head
Did you try:
for colname, colitems in df.iteritems():
    for rownum, rowitem in colitems.iteritems():
        print(df[colname][rownum])
The first loop iterates over all the columns, and the second loop iterates over all the rows for that column.
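Note that on newer pandas (2.0 and later) iteritems() has been removed; the equivalent spelling uses items():
# same column-then-row traversal with the newer method name
for colname, colitems in df.items():
    for rownum, rowitem in colitems.items():
        print(df[colname][rownum])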
Edit:
From our conversation below, I think that your baseline and df dataframes don't have the same column names because of how you created them and how you are accessing the elements.
My suggestion is that you create the baseline dataframe to be a copy of your df dataframe and edit the information within it from there.
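For example, something along these lines (a sketch; starting the copies at zero is my assumption about the intended initial state):
# derive both frames from df so the column names (col1, col2, ...) and the
# row index always match when indexing with [colname][rownum]
baseline = df.copy()
baseline[:] = 0
baseline_num = df.copy()
baseline_num[:] = 0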
Edit:
I have managed to make your code work for one loop, but I run into an index error. I am not sure what your optimisation step does, but I think that is what is causing it; take a look.
The problem is this part: baseline_num[colname][rownum - 1]. In the second loop, I guess, because you compute rownum (0) - 1, you get index -1. You need to change it so that the look-back only starts once rownum is at least 1, or something like that; I am not sure what you are trying to do there.
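One way to guard against that, sketching my guess at the intent rather than a confirmed fix, is to apply the optimisation only once there are enough earlier rows in the column to look back on:
# only look back once off_require earlier rows exist in this column; this keeps
# the rownum - 1 and rownum - off_require lookups from going negative
elif off_count > off_require and rownum >= off_require:
    baseline_num[colname][rownum] += (baseline_num[colname][rownum - 1]
                                      + df[colname][rownum]
                                      - df[colname][rownum - off_require])
Here is the full code I ran: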
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
d = {'col1': [-2587.944231, -1897.324231,-2510.304231,-2203.814231,-2105.734231,-2446.964231,-2963.904231,-2177.254231, 2796.354231,-2085.304231], 'col2': [-3764.468462,-3723.608462,-3750.168462,-3694.998462,-3991.268462,-3972.878462,3676.608462,-3827.808462,-3629.618462,-1841.758462,], 'col3': [-166.1357692,-35.36576923, 321.4157692,108.9257692,-123.2257692, -10.84576923, -100.7457692, 89.27423077, -211.0857692, 101.5342308]}
df = pd.DataFrame(data=d)
sensors = 3
window_size = 5
dfn = df.rolling(window_size).corr(pairwise = True)
index = df.index #index of values in the data frame.
rows = len(index) #len(index) returns number of rows in the data.
baseline_num = [0]*(rows) #baseline numerator, by default zero
baseline = [0]*(rows) #initialize baseline value
baseline = pd.DataFrame(df)
baseline_num = pd.DataFrame(df)
#print(baseline_num)
v = [None]*(rows) # Initialize an empty array v[] equal to amount of rows in .csv file
s = [None]*(rows) #Initialize another empty array for the slope values for detecting when there is an exposure
d = [0]*(rows)
sensors_on = True #Is the sensor detecting something (True) or not (False).
off_count = 0
off_require = 8 # how many offs until baseline is updated
sensitivity = 1000
for i in range(0, rows): #This iterates over each index value, i.e. each row, sums the values, and returns them in list format.
    v[i] = dfn.loc[i].to_numpy().sum() - sensors
for colname, colitems in df.iteritems():
    for rownum, rowitem in colitems.iteritems():
        if v[rownum] >= sensitivity:
            sensors_on = True
            off_count = 0
            baseline_num[rownum] = 0
        else:
            sensors_on = False
            off_count += 1
            if off_count == off_require:
                for x in range(0, off_require):
                    baseline_num[colname][rownum] += df[colname][rownum - x]
            elif off_count > off_require:
                baseline_num[colname][rownum] += baseline_num[colname][rownum - 1] + df[colname][rownum] - df[colname][rownum - off_require]
            baseline[colname][rownum] = baseline_num[colname][rownum] // off_require #mean of the last "off_require" points
            print(baseline[colname][rownum])
dfx = pd.DataFrame(v, columns =['Sensor Correlation']) #converts the summed correlation tables back from list format to a DataFrame, with the sole column name 'Sensor Correlation'
dft = pd.DataFrame(baseline, columns =['baseline'])
dft = dft.astype(float)
dfx.plot(figsize=(50,25), linewidth=5, fontsize=40) # plots dfx dataframe which contains correlated and summed data
dft.plot(figsize=(50,25), linewidth=5, fontsize=40)
My output looks like this:
-324.0
-238.0
-314.0
-276.0
-264.0
-306.0
-371.0
-806.0
638.0
-412.0
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
354 try:
--> 355 return self._range.index(new_key)
356 except ValueError as err:
ValueError: -1 is not in range
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
3 frames
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/range.py in get_loc(self, key, method, tolerance)
355 return self._range.index(new_key)
356 except ValueError as err:
--> 357 raise KeyError(key) from err
358 raise KeyError(key)
359 return super().get_loc(key, method=method, tolerance=tolerance)
KeyError: -1
I don't have enough rep to comment, but below is what I was able to work out. Hope it helps!
I tried to use the to_list() function while working out an answer, and it threw me an error:
AttributeError: 'DataFrame' object has no attribute 'to_list'
So, I decided to circumvent that method and came up with this:
indexes = [x for x in df.index]
row_vals = []
for index in indexes:
    for val in df.iloc[index].values:
        row_vals.append(val)
The object row_vals will contain all values in row order.
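For what it's worth, the same flattening can be done without the explicit loops; a one-line equivalent (assuming you want every value, row by row):
row_vals = df.values.ravel().tolist()  # ravel() flattens in row-major order by default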
If you only want to get the row values for a particular row or set of rows, you would need to do this:
indx_subset = [`list of row indices`] # (Ex. [1, 2, 5, 6, etc...])
row_vals = []
for indx in indx_subset:
    for val in df.loc[indx].values:
        row_vals.append(val)
row_vals will then have all the row values from the specified indices.
Related
I have a dataset for which I want to create a new column based on a division of two other columns, using a for-loop with if-conditions.
This is the dataset, with the empty 'solo_fare' column created beforehand.
The task is to loop through each row and divide 'Fare' by 'relatives' to get the per-passenger fare. However, there are certain if-conditions to follow (passengers in this category should see per-passenger prices of between 3 and 8).
The code I have tried here doesn't seem to fill in the 'solo_fare' rows at all. It returns an empty column (the same as the df above).
for i in range(0, len(fare_result)):
    p = fare_result.iloc[i]['Fare'] / fare_result.iloc[i]['relatives']
    q = fare_result.iloc[i]['Fare']
    r = fare_result.iloc[i]['relatives']
    # if relatives == 0, return original Fare amount
    if r == 0:
        fare_result.iloc[i]['solo_fare'] = q
    # if the divided fare is below 3 or more than 8, return original Fare amount again
    elif (p < 3) or (p > 8):
        fare_result.iloc[i]['solo_fare'] = q
    # else, return the divided fare to get solo_fare
    else:
        fare_result.iloc[i]['solo_fare'] = p
How can I get this to work?
You should probably not use a loop for this, but instead just use loc.
If you first create the 'solo_fare' column and give every row the default value from Fare, you can then change the value for the conditions you have set out:
fare_result['solo_fare'] = fare_result['Fare']
fare_result.loc[
    ((fare_result.Fare / fare_result.relatives) >= 3) &
    ((fare_result.Fare / fare_result.relatives) <= 8), 'solo_fare'] = (
    fare_result.Fare / fare_result.relatives)
Did you try to initialize those new columns first?
By that I mean that the statement fare_result.iloc[i]['solo_fare'] = q
only assigns the value q to the field solo_fare of row i.
The issue is that, at that moment, row i does not have any solo_fare key. Hence, you are only filling in the last value of your table here.
To solve this issue, try declaring the solo_fare column before the for loop, like:
fare_result['solo_fare'] = np.nan
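For completeness, a sketch of the question's loop with the column initialized first and the assignment done through loc in one step (my assumption about the intended logic, mirroring the conditions above):
import numpy as np

fare_result['solo_fare'] = np.nan
for i in range(len(fare_result)):
    q = fare_result.iloc[i]['Fare']
    r = fare_result.iloc[i]['relatives']
    p = q / r if r != 0 else q  # fall back to the original fare when there are no relatives
    # keep the original fare unless the per-passenger fare lands in the 3..8 band
    fare_result.loc[fare_result.index[i], 'solo_fare'] = p if 3 <= p <= 8 else q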
One way to do it is to define a row-wise function and apply it to the dataframe:
# row-wise function (mock-up)
def foo(fare, relative):
    # your logic here; mine just serves as an example
    if relative > 100:
        res = fare / relative
    elif relative < 10:
        res = fare
    else:
        res = 10
    return res
Then apply it to the dataframe (row-wise):
fare_result['solo_fare'] = fare_result.apply(lambda row: foo(row['Fare'], row['relatives']) , axis=1)
I am calculating correlations and the data frame I have needs to be filtered.
I am looking to remove the rows following the current row whose values are within X above or below it, starting with the first row and looping through the dataframe all the way to the last row.
example:
df['y'] has the values 50,51,52,53,54,55,70,71,72,73,74,75
if X = 10, it would start at 50 and see 51, 52, 53, 54, 55 as within that ±10 range and delete those rows. 70 would stay, as it is not within that range, and the same test would start again at 70, where 71, 72, 73, 74, 75 and their respective rows would be deleted.
The filter with X = 10 would thus leave df with the rows containing 50 and 70.
It would leave me with a clean dataframe that drops the instances linked to the first instance of what is essentially the same observed period. I tried coding a loop to do that, but I am left with the wrong result and am desperate at this point. Hopefully someone can correct the mistake or point me in the right direction.
df6['index'] = df6.index
df6.sort_values('index')
boom = len(dataframe1.index)/3
#Taking initial comparison values from first row
c = df6.iloc[0]['index']
#Including first row in result
filters = [True]
#Skipping first row in comparisons
for index, row in df6.iloc[1:].iterrows():
    if c - boom <= row['index'] <= c + boom:
        filters.append(False)
    else:
        filters.append(True)
        # Updating values to compare based on latest accepted row
        c = row['index']
df2 = df6.loc[filters].sort_values('correlation').drop('index', 1)
df2
OUTPUT BEFORE
OUTPUT AFTER
IIUC, your main issue is filtering consecutive values within a threshold.
You can use a custom function for that which acts on a Series (= a column) and returns the list of valid indices:
def consecutive(s, threshold=10):
    prev = float('-inf')
    idx = []
    for i, val in s.iteritems():
        if val - prev > threshold:
            idx.append(i)
            prev = val
    return idx
Example of use:
import pandas as pd
df = pd.DataFrame({'y': [50,51,52,53,54,55,70,71,72,73,74,75]})
df2 = df.loc[consecutive(df['y'])]
Output:
    y
0  50
6  70
Variant
If you prefer the function to return a boolean indexer, here is a variant:
def consecutive(s, threshold=10):
    prev = float('-inf')
    idx = [False] * len(s)
    for i, val in s.iteritems():
        if val - prev > threshold:
            idx[i] = True
            prev = val
    return idx
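Usage is the same apart from the indexing step. Note this variant assumes a default RangeIndex, since idx[i] treats the label i as a list position:
# boolean-mask selection with the variant above
df2 = df[consecutive(df['y'])]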
I am trying to figure out a way to write an if statement based on a couple of criteria. I have a large CSV file that I have cleaned and already organized. There are a couple of things I need to do:
I first need a way to check whether the machine is "on" for more than 3 rows. If that is true, then I need to get its corresponding pressure for that cycle and find the average of it. For example, in the df above, in rows 14-19 the machine is on for more than 3 rows, so I need to get the average pressure across all instances in this period.
This data contains 40,000 rows, so I will then need this to cycle through the entire sheet to keep track of the total number of times the machine is on, and each corresponding average pressure.
import pandas as pd
import numpy as np
header_list = ['Time']
df = pd.read_csv('S8-1.csv', skiprows=6, names=header_list)
#splits the data into proper columns
df[['Date/Time','Pressure']] = df.Time.str.split(",,", expand=True)
#deletes original messy column
df.pop('Time')
#convert Pressure from object to numeric
df['Pressure'] = pd.to_numeric(df['Pressure'], errors='coerce')
#converts to a time
df['Date/Time'] = pd.to_datetime(df['Date/Time'], format='%m/%d/%y %H:%M:%S.%f', errors='coerce')
df['Moving Average'] = df['Pressure'].rolling(window=5).mean()
df['Rolling Average Center'] = df['Pressure'].rolling(window=5, center=True).mean()
df['Machine On/Off'] = ['1' if x >= 115 else '0' for x in df['Rolling Average Center']]
arr = df['Machine On/Off']

def find_runs(x):
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('Only 1D array supported')
    n = x.shape[0]
    if n == 0:
        return np.array([]), np.array([]), np.array([])
    else:
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]
        # find run values
        run_values = x[loc_run_start]
        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))
        return run_values, run_starts, run_lengths

run = find_runs(arr)
df.iloc[_start:run_length]['whatever column']
Suggested first step: make a new column with ones and zeros... 1 for on, 0 for off.
df['newcolumnname'] = 0
df.loc[df['machine on/off'] == 'on', 'newcolumnname'] = 1
Grab that column as a numpy array:
arr = df['newcolumnname'].to_numpy()
Then, using the following code (credit: https://gist.github.com/alimanfoo/c5977e87111abe8127453b21204c1065):
import numpy as np

def find_runs(x):
    """Find runs of consecutive items in an array."""
    # ensure array
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('only 1D array supported')
    n = x.shape[0]
    # handle empty array
    if n == 0:
        return np.array([]), np.array([]), np.array([])
    else:
        # find run starts
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]
        # find run values
        run_values = x[loc_run_start]
        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))
        return run_values, run_starts, run_lengths
get the "runs"...should only have zeros and ones.
run_values, run_starts, run_lengths = find_runs(arr)
print(run_values) #just to see what order they are in
print(run_starts.shape)#get an idea of the shapes
print(run_lengths.shape)
To get the data for each "on" run out of your pandas DataFrame, select the runs whose value is 1 with a boolean mask:
run_is_on = run_values == 1
for _start, _run_length in zip(run_starts[run_is_on], run_lengths[run_is_on]):
    tmp_df = df.iloc[_start:_start + _run_length]
    #do what you want
    print(tmp_df)
    break #remove this...I just don't know how big your data is
You're correct when you are thinking "this answer does not run". But it should give you enough to get going and solve your problem.
An incomplete question gets an incomplete response.
TODO: make your machine on/off column an integer... I'm not sure it is absolutely needed, but if it's a numerical numpy array then more options are available to you.
df['Machine On/Off'] = [1 if x >= 115 else 0 for x in df['Rolling Average Center'] ]
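With the integer column in place, here is a sketch of the remaining step from the question as I read it (runs of 1 longer than 3 rows, and the mean Pressure over each such run); it reuses find_runs() from above:
run_values, run_starts, run_lengths = find_runs(df['Machine On/Off'].to_numpy())
on_averages = []
for value, start, length in zip(run_values, run_starts, run_lengths):
    if value == 1 and length > 3:  # machine on for more than 3 rows
        on_averages.append(df['Pressure'].iloc[start:start + length].mean())
print(len(on_averages))  # how many qualifying on-cycles there were
print(on_averages)       # the average pressure for each one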
Assuming a pandas dataframe like the one in the picture, I would like to fill the NaN values with the value of the other variables similar to it. To be more clear, my variables are
mean_1, mean_2, ..., std_1, std_2, ..., min_1, min_2, ...
So I would like to fill the NaN values with the values of the other columns, but not all the columns, only those that represent the same metric. In the picture I highlighted two NaN values. The first one I would like to fill with the mean obtained from the 'MEAN' variables at row 2, while the second one I would like to fill with the mean obtained from the 'MIN' variables at row 9. Is there a way to do it?
you can find the unique prefixes, iterate through each, and do fillna for the subsets separately
uniq_prefixes = set([x.split('_')[0] for x in df.columns])
for prfx in uniq_prefixes:
    mask = [col for col in df if col.startswith(prfx)]
    # Transpose is needed because row-wise fillna is not implemented yet
    df.loc[:, mask] = df[mask].T.fillna(df[mask].mean(axis=1)).T
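As a quick illustration of what the loop does (a made-up frame, not the asker's data):
import numpy as np
import pandas as pd

df = pd.DataFrame({'mean_1': [1.0, np.nan], 'mean_2': [3.0, 4.0],
                   'min_1': [0.0, 1.0], 'min_2': [np.nan, 3.0]})
# after the loop above, the NaN in mean_1 becomes 4.0 (the row mean of the
# 'mean' columns) and the NaN in min_2 becomes 0.0 (the row mean of the 'min' columns)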
Yes, it is possible to do it with a loop. Below is the naive approach; even fancier ones would not be much of an optimisation (at least I don't see any).
for i, row in df.iterrows():
    sum_means = 0
    n_means = 0
    sum_stds = 0
    n_stds = 0
    fill_mean_idxs = []
    fill_std_idxs = []
    for idx, item in row.iteritems():
        if idx.startswith('mean') and pd.isna(item):
            fill_mean_idxs.append(idx)
        elif idx.startswith('mean'):
            sum_means += float(item)
            n_means += 1
        elif idx.startswith('std') and pd.isna(item):
            fill_std_idxs.append(idx)
        elif idx.startswith('std'):
            sum_stds += float(item)
            n_stds += 1
    ave_mean = sum_means / n_means
    ave_std = sum_stds / n_stds
    for idx in fill_mean_idxs:
        df.loc[i, idx] = ave_mean
    for idx in fill_std_idxs:
        df.loc[i, idx] = ave_std
I would like to see if there is a way to calculate a column in a dataframe that uses something similar to a moving average, without iterating through each row.
Current working code:
def create_candles(ticks, instrument, time_slice):
    candlesticks = ticks.price.resample(time_slice, base=00).ohlc().bfill()
    volume = ticks.amount.resample(time_slice, base=00).sum()
    candlesticks['volume'] = volume
    candlesticks['instrument'] = instrument
    candlesticks['ttr'] = 0
    # candlesticks['vr_7'] = 0
    candlesticks['vr_10'] = 0
    candlesticks = calculate_indicators(candlesticks, instrument, time_slice)
    return candlesticks

def calculate_indicators(candlesticks, instrument, time_slice):
    candlesticks.sort_index(inplace=True)
    # candlesticks['rsi_14'] = talib.RSI(candlesticks.close, timeperiod=14)
    candlesticks['lr_50'] = talib.LINEARREG(candlesticks.close, timeperiod=50)
    # candlesticks['lr_150'] = talib.LINEARREG(candlesticks.close, timeperiod=150)
    # candlesticks['ema_55'] = talib.EMA(candlesticks.close, timeperiod=55)
    # candlesticks['ema_28'] = talib.EMA(candlesticks.close, timeperiod=28)
    # candlesticks['ema_18'] = talib.EMA(candlesticks.close, timeperiod=18)
    # candlesticks['ema_9'] = talib.EMA(candlesticks.close, timeperiod=9)
    # candlesticks['wma_21'] = talib.WMA(candlesticks.close, timeperiod=21)
    # candlesticks['wma_12'] = talib.WMA(candlesticks.close, timeperiod=12)
    # candlesticks['wma_11'] = talib.WMA(candlesticks.close, timeperiod=11)
    # candlesticks['wma_5'] = talib.WMA(candlesticks.close, timeperiod=5)
    candlesticks['cmo_9'] = talib.CMO(candlesticks.close, timeperiod=9)
    for row in candlesticks.itertuples():
        current_index = candlesticks.index.get_loc(row.Index)
        if current_index >= 1:
            previous_close = candlesticks.iloc[current_index - 1, candlesticks.columns.get_loc('close')]
            candlesticks.iloc[current_index, candlesticks.columns.get_loc('ttr')] = max(
                row.high - row.low,
                abs(row.high - previous_close),
                abs(row.low - previous_close))
        if current_index > 10:
            candlesticks.iloc[current_index, candlesticks.columns.get_loc('vr_10')] = candlesticks.iloc[current_index, candlesticks.columns.get_loc('ttr')] / (
                max(candlesticks.high[current_index - 9: current_index].max(), candlesticks.close[current_index - 11]) -
                min(candlesticks.low[current_index - 9: current_index].min(), candlesticks.close[current_index - 11]))
    candlesticks['timestamp'] = pd.to_datetime(candlesticks.index)
    candlesticks['instrument'] = instrument
    candlesticks.fillna(0, inplace=True)
    return candlesticks
In the iteration, I am calculating the True Range ('ttr') and then the Volatility Ratio ('vr_10').
TTR is calculated on every row in the DF except the first one. It uses the previous row's close column and the current row's high and low columns.
VR_10 is calculated on every row except the first 10. It uses the high and low columns of the previous 9 rows and the close of the 10th row back.
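For what it's worth, the two quantities described above can usually be expressed with shift() and rolling() instead of row iteration. A sketch under my reading of the description, applied to the candlesticks frame from the code above (note the posted loop reaches 11 rows back for the close even though the text says 10; the shifts below follow the loop, and NaNs appear on the early rows the loop skips):
import numpy as np
import pandas as pd

prev_close = candlesticks['close'].shift(1)
# true range: the widest of the three spreads on each row; on the first row the
# previous close is NaN, so only the high-low spread contributes there
candlesticks['ttr'] = pd.concat([
    candlesticks['high'] - candlesticks['low'],
    (candlesticks['high'] - prev_close).abs(),
    (candlesticks['low'] - prev_close).abs(),
], axis=1).max(axis=1)

hi9 = candlesticks['high'].shift(1).rolling(9).max()  # high of the previous 9 rows
lo9 = candlesticks['low'].shift(1).rolling(9).min()   # low of the previous 9 rows
c11 = candlesticks['close'].shift(11)                 # close 11 rows back, as in the loop
candlesticks['vr_10'] = candlesticks['ttr'] / (np.maximum(hi9, c11) - np.minimum(lo9, c11))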
EDIT 2
I have tried many ways to add a text-based data frame to this question; there just doesn't seem to be a solution for the width of my frame. There is no difference between the input and output dataframes other than that the ttr and vr_10 columns are all 0s in the input and have non-zero values in the output.
An example would be this dataframe:
Is there a way I can do this without iteration?
With the nudge from Andreas to use rolling, I came to an answer.
First, I had to find out how to use rolling with multiple columns; I found that here.
I made a modification because I need to roll up, not down:
from numpy.lib.stride_tricks import as_strided as stride

def roll(df, w, **kwargs):
    df.sort_values(by='timestamp', ascending=0, inplace=True)
    v = df.values
    d0, d1 = v.shape
    s0, s1 = v.strides
    a = stride(v, (d0 - (w - 1), w, d1), (s0, s0, s1))
    rolled_df = pd.concat({
        row: pd.DataFrame(values, columns=df.columns)
        for row, values in zip(df.index, a)
    })
    return rolled_df.groupby(level=0, **kwargs)
After that, I created two functions:
def calculate_vr(window):
    return window.iloc[0].ttr / (max(window.high[1:9].max(), window.iloc[10].close) -
                                 min(window.low[1:9].min(), window.iloc[10].close))

def calculate_ttr(window):
    return max(window.iloc[0].high - window.iloc[0].low,
               abs(window.iloc[0].high - window.iloc[1].close),
               abs(window.iloc[0].low - window.iloc[1].close))
and called those functions like this:
candlesticks['ttr'] = roll(candlesticks, 3).apply(calculate_ttr)
candlesticks['vr_10'] = roll(candlesticks, 11).apply(calculate_vr)
I added timers to both ways, and this way is roughly 3x slower than iteration.