I need help with the following code for finding the max value in a window of the previous 5 rows. I don't know why it's not working. Could anyone please help?
I am trying to apply these conditions:
if day(1), then max value = value[day(1)]
elif day(1<n<=5), then max value = value[day(n)] if value[day(n)] > value[day(n-1)]
else, the max value is the last max value from the previous iteration. Thanks for your time. Also, len(src) is 29969, if that matters.
def high_change(src, lkbk) :
    highest_high = []
    last_val = np.nan
    for i in range(len(src)) :
        for a in range(i, i+lkbk) :
            if a == i :
                highest_high = high_df1[a] # first day value is max value
                last_val = high_df1[a]
            elif high_df1[a] > high_df1[(a-1)] :
                highest_high = high_df1[a] # then max high value in ref to previous value.
            else :
                highest_high = last_val
    return highest_high

df1['h_h'] = pd.Series(perc_change(df1, 5))
Answering my own question: I got the result with the following code. I hope it helps, and any better code is welcome.
def high_change(i, j) :
    last_hval = np.nan
    for a in range(i, j) :
        if a == i :
            last_hval = high_df1[a]
        elif high_df1[a] > high_df1[(a-1)] :
            last_hval = high_df1[a]
        else :
            last_hval
    return last_hval

def perc_change(src, lkbk) :
    highest_high = []
    for i in range(len(src)) :
        if i < lkbk :
            highest_high.append(np.nan)
        else :
            highest_high.append(high_change(i-lkbk, i))
    return highest_high

df['h_h'] = pd.Series(perc_change(df, 5), dtype=float).round(2)
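First, flag whether each day's value is greater than the previous day's value. Apply cumsum to those flags to get a non-decreasing counter, then roll over a window of 6 rows (the 5 previous days plus the current one). Finally, within each window, exclude the current day and take the 'high' at the first position where the counter peaks, i.e. the last of the 5 previous days that rose above its predecessor: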
WINDOW = 5
df['highest5'] = df['high'].gt(df['high'].shift()).cumsum() \
.rolling(WINDOW+1) \
.apply(lambda x: df.loc[x[:WINDOW].idxmax(), 'high'])
>>> df
high highest5
0 13996 NaN
1 14021 NaN
2 14019 NaN
3 14013 NaN
4 14019 NaN
5 14018 14019.0
6 14022 14019.0
7 14023 14022.0
8 14021 14023.0
9 14020 14023.0
10 14014 14023.0
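If what you actually need is the plain maximum of the previous 5 rows (as the question's first sentence puts it), rather than the last value that rose above its predecessor, a shifted rolling max may be enough. This is only a minimal sketch: it assumes a 'high' column as in the answer above, and the column name 'max_prev5' is made up for illustration.

```python
import pandas as pd

WINDOW = 5
df = pd.DataFrame({'high': [13996, 14021, 14019, 14013, 14019, 14018,
                            14022, 14023, 14021, 14020, 14014]})

# shift() moves every value down one row, so the window ending at row i
# covers rows i-5 .. i-1 and never includes the current row.
df['max_prev5'] = df['high'].shift().rolling(WINDOW).max()
print(df)
```

Row 5 comes out as 14021 here, while the answer above gives 14019, because the two computations answer different questions.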
I am trying to compute a rolling average of all the highs of ['RSIndex'] greater than 52. The series should have a NaN value if the first reference value is less than 52, but if any later reference value is less than 52 I want the previous value of ['high_r'] that the function generated. If anyone has a solution to this, I'll be really grateful. Thanks.
def AvgHigh(src, val) :
    for a in range(len(src)) :
        if src[a] > val :
            yield src[a]
        elif src[a] <= val and a == 0 :
            yield np.nan
        elif src[a] <= val and a != 0 :
            yield src[a-1]

df1['high_r'] = pd.Series(AvgHigh(df1['RSIndex'], 52))
df1['RSI_high'] = df1['high_r'].rolling(window = 25, min_periods = 25).mean()
Desired Output:
You can use np.where() and .shift() to simplify your code, as follows:
df1['high_r'] = np.where(df1['RSIndex'] > 52, df1['RSIndex'], df1['RSIndex'].shift())
df1['RSI_high'] = df1['high_r'].rolling(window = 25, min_periods = 25).mean()
np.where() checks the condition given as its first parameter. Where the condition is true, it uses values from the second parameter [similar to your if statement in the loop]. Where the condition is false [similar to your 2 elif statements in the loop], it uses values from the third parameter.
.shift() takes the previous row's value [similar to your second elif statement]. For the first row, where there is no previous value, it uses the value given by its fill_value parameter, which defaults to np.nan for a column of numeric values. As such, it has the same effect as your first elif, which sets np.nan for the first entry.
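To see the behaviour on a few rows, here is a small sketch with made-up RSIndex values (the numbers and the name rsi are illustrative only, not from the question):

```python
import numpy as np
import pandas as pd

# Made-up values; 52 is the threshold from the question.
rsi = pd.Series([50, 55, 51, 50, 60], dtype=float)

# Where rsi > 52 keep the value, otherwise take the previous row's value.
high_r = np.where(rsi > 52, rsi, rsi.shift())
print(high_r)
```

Row 0 becomes NaN because there is no previous row. Row 3 becomes 51, the previous source value, even though 51 is itself below the threshold: np.where() looks at the original column, not at values already produced.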
Edit
If you want the values to come from the previous loop iteration rather than from the original column, you can accumulate the loop's values in a list, as follows:
def AvgHigh(src, val) :
    dat_list = []
    last_src = np.nan # init variable that keeps the prev iteration value
    for a in range(len(src)) :
        if src[a] > val :
            # yield src[a]
            dat_list.append(src[a])
            last_src = src[a] # update prev iteration value (for subsequent iteration(s))
        elif (src[a] <= val) and (a == 0) :
            # yield np.nan
            dat_list.append(np.nan)
        elif (src[a] <= val) and (a != 0) :
            # yield src[a-1]
            dat_list.append(last_src)
    return dat_list

df1['high_r'] = AvgHigh(df1['RSIndex'], 52)
df1['RSI_high'] = df1['high_r'].rolling(window = 25, min_periods = 25).mean()
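The same carry-forward behaviour can probably be had without a loop by masking and forward-filling. This is a sketch under the assumption that src is a pandas Series; avg_high_vectorized is a hypothetical name, and the sample values are made up.

```python
import pandas as pd

def avg_high_vectorized(src, val):
    # Hypothetical helper: keep values above val, blank out the rest (NaN),
    # then carry the last kept value forward over the blanks.
    return src.where(src > val).ffill()

rsi = pd.Series([50, 55, 51, 50, 60], dtype=float)  # made-up values
high_r = avg_high_vectorized(rsi, 52)
print(high_r.tolist())
```

Leading values at or below the threshold stay NaN, since there is nothing to carry forward yet, which matches what the loop does while last_src is still np.nan.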
I have created an array d:
d = [[[1.60269836 1.97347391 1.76414466 1.53102548 1.35352821]
      [1.0153325  1.53331695 1.36105004 1.76111151 1.62595392]
      [1.5156144  1.77076004 1.24249056 1.94406171 1.98917422]]
     [[1.44790465 1.46990159 1.48156613 1.92963951 1.11459211]
      [1.10674091 1.57711027 1.85275685 1.84640848 1.34216641]
      [1.63670185 1.69894884 1.45114395 1.09750849 1.09564564]]]
It has the following max, min and mean:
Max value of d is: 1.98917422158341
Min value of d is: 1.0153325043494292
The mean of d is: 1.5377490722289062
Also I created an empty array f:
f = np.empty((2,3,5))
f = [[[1.60269836 1.97347391 1.76414466 1.53102548 1.35352821]
      [1.0153325  1.53331695 1.36105004 1.76111151 1.62595392]
      [1.5156144  1.77076004 1.24249056 1.94406171 1.98917422]]
     [[1.44790465 1.46990159 1.48156613 1.92963951 1.11459211]
      [1.10674091 1.57711027 1.85275685 1.84640848 1.34216641]
      [1.63670185 1.69894884 1.45114395 1.09750849 1.09564564]]]
With all of this, I need to check each value in d and:
If a value in d is larger than d_min but smaller than d_mean, assign 25 to the corresponding value in f.
If a value in d is larger than d_mean but smaller than d_max, assign 75 to the corresponding value in f.
If a value equals d_mean, assign 50 to the corresponding value in f.
Assign 0 to the corresponding value(s) in f for d_min in d.
Assign 100 to the corresponding value(s) in f for d_max in d.
In the end, f should have only the following values: 0, 25, 50, 75, and 100.
Up to now, my best solution has been:
for value in d:
    if value.any() > d_min and value.any() < d_mean:
        value = 25
        value.append(f)
    if value.any() > d_mean and value.any() < d_max:
        value = 75
        value.append(f)
    if value.any() == d_mean:
        value = 50
        value.append(f)
    if value.any() == d_min:
        value = 0
        value.append(f)
    if value.any() == d_max:
        value = 100
        value.append(f)
I don't receive any error, but I don't get the result I need either. I'm just starting to learn Python and NumPy, so I'm sure there is a better way to do it, but after reading the documentation and looking at many examples I still can't find the solution.
Is that what you are looking for?
for i, value_i in enumerate(d):
    for j, value_j in enumerate(value_i):
        for index, value in enumerate(value_j):
            # elif: once a bucket is chosen, the remaining tests are skipped
            if value > d_min and value < d_mean:
                f[i][j][index] = 25
            elif value > d_mean and value < d_max:
                f[i][j][index] = 75
            elif value == d_mean:
                f[i][j][index] = 50
            elif value == d_min:
                f[i][j][index] = 0
            elif value == d_max:
                f[i][j][index] = 100
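The nested loops can likely be replaced by one vectorized call. Below is a minimal sketch using np.select, which picks the first matching condition for every element. The random array only stands in for the d of the question, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)             # arbitrary seed, stand-in data
d = rng.uniform(1.0, 2.0, size=(2, 3, 5))  # same shape as d in the question
d_min, d_max, d_mean = d.min(), d.max(), d.mean()

# Conditions are tested in order; the first one that is True wins,
# so the equality cases are listed before the range cases.
conditions = [d == d_min, d == d_max, d == d_mean, d < d_mean, d > d_mean]
choices = [0, 100, 50, 25, 75]
f = np.select(conditions, choices)
print(f)
```

np.select returns a new array with the same shape as d, so there is no need to create f with np.empty first.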
In a pandas Series, it should go through the series and stop at the first value that is followed by 5 consecutive increases. With a simple example it works so far:
list2 = pd.Series([2,3,3,4,5,1,4,6,7,8,9,10,2,3,2,3,2,3,4])
def cut(x):
    y = iter(x)
    for i in y:
        if x[i] < x[i+1] < x[i+2] < x[i+3] < x[i+4] < x[i+5]:
            return x[i]
            break
out = cut(list2)
index = list2[list2 == out].index[0]
So I get the correct output of 1 and index of 5.
But if I use a second Series with shape (23999,) instead of (19,), I get the error:
pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 3489660928
You can do something like this:
# compare list2 with the previous values
s = list2.gt(list2.shift())
# looking at last 5 values
s = s.rolling(5).sum()
# select those equal 5
list2[s.eq(5)]
Output:
10 9
11 10
dtype: int64
The first index where it happens is
s.eq(5).idxmax()
# output 10
Also, you can chain them together:
(list2.gt(list2.shift())
      .rolling(5).sum()
      .eq(5).idxmax()
)
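As for the KeyError in the question: `for i in y` iterates over the values of the Series, and `x[i]` then uses each value as an index label, which only happens to work for the small example. The sketch below searches by position instead and returns where the run starts. first_run_start is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def first_run_start(s, n=5):
    """Return the position where n consecutive increases start, or None."""
    values = s.to_numpy()
    for start in range(len(values) - n):
        window = values[start:start + n + 1]  # n increases need n + 1 values
        if all(window[k] < window[k + 1] for k in range(n)):
            return start
    return None

list2 = pd.Series([2, 3, 3, 4, 5, 1, 4, 6, 7, 8, 9, 10, 2, 3, 2, 3, 2, 3, 4])
start = first_run_start(list2)
print(start, list2.iloc[start])
```

This reports the start of the run (position 5, value 1), whereas the rolling version above reports index 10, the row at which the fifth increase completes.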
I started learning Python less than 2 weeks ago.
I'm trying to make a function to compute a 7 day moving average for data. Something wasn't going right so I tried it without the function.
moving_average = np.array([])
i = 0
for i in range(len(temp)-6):
    sum_7 = np.array([])
    avg_7 = 0
    missing = 0
    total = 7
    j = 0
    for j in range(i,i+7):
        if pd.isnull(temp[j]):
            total -= 1
            missing += 1
        if missing == 7:
            moving_average = np.append(moving_average, np.nan)
            break
    if not pd.isnull(temp[j]):
        sum_7 = np.append(sum_7, temp[j])
    if j == (i+6):
        avg_7 = sum(sum_7)/total
        moving_average = np.append(moving_average, avg_7)
If I run this and look at the value of sum_7, it's just a single value in the numpy array which made all the moving_average values wrong. But if I remove the first for loop with the variable i and manually set i = 0 or any number in the range of the data set and run the exact same code from the inner for loop, sum_7 comes out as a length 7 numpy array. Originally, I just did sum += temp[j] but the same problem occurred, the total sum ended up as just the single value.
I've been staring at this trying to fix it for 3 hours and I'm clueless what's wrong. Originally I wrote the function in R so all I had to do was convert to python language and I don't know why sum_7 is coming up as a single value when there are two for loops. I tried to manually add an index variable to act as i to use it in the range(i, i+7) but got some weird error instead. I also don't know why that is.
https://gyazo.com/d900d1d7917074f336567b971c8a5cee
https://gyazo.com/132733df8bbdaf2847944d1be02e57d2
Hey, you can use the rolling() and mean() functions from pandas.
Link to the documentation:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.rolling.html
df['moving_avg'] = df['your_column'].rolling(7).mean()
This will also give you some NaN values, but that is expected with a rolling mean: the first 6 rows do not have a full 7 data points behind them.
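If missing values inside a window should be skipped rather than turning the whole window into NaN (which is what the loop in the question tries to do), min_periods may help. A minimal sketch with made-up data, where temp stands in for the question's data as a pandas Series:

```python
import numpy as np
import pandas as pd

temp = pd.Series([1.0, 2.0, np.nan, 4.0, 5.0, 6.0, 7.0, 8.0])  # made-up data

# min_periods=1: a window produces a mean as long as it holds at least one
# non-missing value; a window that is entirely NaN still gives NaN.
rolled = temp.rolling(7, min_periods=1).mean()

# Drop the first 6 rows, whose windows span fewer than 7 rows.
moving_average = rolled.iloc[6:].to_numpy()
print(moving_average)
```

The result has len(temp) - 6 entries, the same length the loop in the question aims for.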
Seems like you misindented the important line:
moving_average = np.array([])
i = 0
for i in range(len(temp)-6):
    sum_7 = np.array([])
    avg_7 = 0
    missing = 0
    total = 7
    j = 0
    for j in range(i,i+7):
        if pd.isnull(temp[j]):
            total -= 1
            missing += 1
        if missing == 7:
            moving_average = np.append(moving_average, np.nan)
            break
    # The following condition should be indented one more level
    if not pd.isnull(temp[j]):
        sum_7 = np.append(sum_7, temp[j])
    #^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    if j == (i+6):
        # this ^ condition does not do what you meant
        # you should use a flag instead
        avg_7 = sum(sum_7)/total
        moving_average = np.append(moving_average, avg_7)
Instead of a flag you can use a for-else construct, but this is not readable. Here's the relevant documentation.
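For illustration, a for-else version could look roughly like the sketch below. The else branch of a for loop runs only when the loop finishes without hitting break. window_means is a hypothetical name, and plain Python floats are assumed instead of a pandas Series.

```python
import math

def window_means(values, size=7):
    means = []
    for i in range(len(values) - size + 1):
        window = values[i:i + size]
        for v in window:
            if not math.isnan(v):
                break                   # found at least one valid value
        else:
            means.append(math.nan)      # no break: every value was missing
            continue
        valid = [v for v in window if not math.isnan(v)]
        means.append(sum(valid) / len(valid))
    return means

result = window_means([math.nan, math.nan, 4.0, 6.0], size=2)
print(result)
```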
Shorter way to do this:
moving_average = np.array([])
for i in range(len(temp)-6):
    ngram_7 = [t for t in temp[i:i+7] if not pd.isnull(t)]
    average = (sum(ngram_7) / len(ngram_7)) if ngram_7 else np.nan
    moving_average = np.append(moving_average, average)
This could be refactored further:
def average(ngram):
    # use the window that was passed in, not the globals temp and i
    valid = [t for t in ngram if not pd.isnull(t)]
    if not valid:
        return np.nan
    return sum(valid) / len(valid)

def ngrams(seq, n):
    # + 1 so that the last full window is included
    for i in range(len(seq) - n + 1):
        yield seq[i:i+n]

moving_average = [average(k) for k in ngrams(temp, 7)]
I am trying to figure out what is wrong with my code. Currently, I am trying to get the average conductivity for each temp value (e.g. temp 18 = 225 conductivity average, temp 19 = 15 conductivity average, etc.).
Could someone tell me whether this is a simple coding mistake or an algorithm mistake, and offer some help to fix it?
temp = [18,18,19,19,20]
conductivity = [200,250,20,10,15]
tempcheck = temp[0];
conductivitysum = 0;
datapoint = 0;
assert len(temp) == len(conductivity)
for i in range(len(temp)):
    if tempcheck == temp[i]:
        datapoint+=1
        conductivitysum+=conductivity[i]
    else:
        print conductivitysum/datapoint
        datapoint=0
        conductivitysum=0
        tempcheck=temp[i]
For some reason, it is printing out
225
10
When it should be printing out
225
15
15
In the else clause, after the print, put:
conductivitysum=0
datapoint=0
tempcheck = temp[i]
conductivitysum+=conductivity[i]
datapoint+=1
because when you reach the else clause, the conductivity at index i is skipped and never saved. So save that conductivity before moving on to the next i.
Change the else to:
for i in range(len(temp)):
    if tempcheck == temp[i]:
        datapoint+=1
        conductivitysum+=conductivity[i]
    else:
        print conductivitysum/datapoint
        datapoint=1
        conductivitysum=conductivity[i]
        tempcheck=temp[i]
When you get to the pair (19, 20) you need to keep them and count one datapoint, not 0 datapoints. At the moment you are skipping them and only keeping the next one - (19, 10).
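Note that a loop which only prints when the temperature changes never reports the final group (temp 20 here), so one more step is needed after the loop. A minimal sketch in Python 3 syntax follows. group_averages is a hypothetical name, and the sketch assumes equal temperatures are adjacent, as in the question's data.

```python
def group_averages(temp, conductivity):
    averages = []
    current, total, count = temp[0], 0, 0
    for t, c in zip(temp, conductivity):
        if t != current:
            # temperature changed: close the previous group
            averages.append(total / count)
            current, total, count = t, 0, 0
        # always count the current reading, including the first of a new group
        total += c
        count += 1
    averages.append(total / count)  # flush the last group after the loop
    return averages

result = group_averages([18, 18, 19, 19, 20], [200, 250, 20, 10, 15])
print(result)
```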
Alternatively, rewrite it as
>>> temp = [18,18,19,19,20]
>>> conductivity = [200,250,20,10,15]
# build a dictionary to group the conductivities by temperature
>>> groups = {}
>>> for (t, c) in zip(temp, conductivity):
...     groups[t] = groups.get(t, []) + [c]
...
# view it
>>> groups
{18: [200, 250], 19: [20, 10], 20: [15]}
# average the conductivities for each temperature
>>> for t, cs in groups.items():
...     print t, float(sum(cs))/len(cs)
...
18 225.0
19 15.0
20 15.0
>>>
When I saw this code, the first thing that popped into my head was the zip function. I hope the following code is what you want.
temp = [18,18,19,19,20]
conductivity = [200,250,20,10,15]
assert len(temp) == len(conductivity)
# Matches each temp value to its corresponding conductivity value with zip
relations = list(zip(temp, conductivity))
for possible_temp in sorted(set(temp)): # Takes each possible temperature (18,19,20)
    total = 0
    divide_by = 0
    # The next four lines of code will check each match and figure out the
    # summed total conductivity value for each temp value and how much it
    # should be divided by to create an average.
    for relation in relations:
        if relation[0] == possible_temp:
            total += relation[1]
            divide_by += 1
    print(int(total / divide_by))