The code below (a moving-average calculation over N days) works well, but I want to repeat it with other numbers (e.g., 5, 10, 20, etc.) in place of 50. I'm not sure how to turn it into a for loop. Could anybody please help me?
df['ma50pfret'] = df['ret']
df.loc[df.adjp >= df.ma50, 'adjp > ma50'] = 1
df.loc[df.adjp < df.ma50, 'adjp > ma50'] = 0
df.iloc[0, -1] = 1
df['adjp > ma50'] = df['adjp > ma50'].astype(int)
df.loc[df['adjp > ma50'].shift(1) == 0, 'ma50pfret'] = 1.000079  # 1.02**(1/250)
df['cum_ma50pfret'] = df['ma50pfret'].cumprod()
df.head(10)
Do you just mean to replace the 50s with 5, 10, 20, etc.? If so, that can be done by always using brackets to access columns and using f-strings (or some other string-formatting method) to substitute the other numbers for 50, like this:
for num in [5, 10, 20, 50]:
    df[f'ma{num}pfret'] = df['ret']
    df.loc[df.adjp >= df[f'ma{num}'], f'adjp > ma{num}'] = 1
    df.loc[df.adjp < df[f'ma{num}'], f'adjp > ma{num}'] = 0
    df.iloc[0, -1] = 1
    df[f'adjp > ma{num}'] = df[f'adjp > ma{num}'].astype(int)
    df.loc[df[f'adjp > ma{num}'].shift(1) == 0, f'ma{num}pfret'] = 1.000079  # 1.02**(1/250)
    df[f'cum_ma{num}pfret'] = df[f'ma{num}pfret'].cumprod()

df.head(10)
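If you prefer, the pair of .loc assignments and the astype call can also be collapsed with np.where. This is just a sketch under the same assumptions about the columns (ret, adjp, and precomputed ma5/ma10/ma20/ma50 already in df):
import numpy as np

for num in [5, 10, 20, 50]:
    # 1 where the adjusted price is at or above the moving average, else 0
    df[f'adjp > ma{num}'] = np.where(df['adjp'] >= df[f'ma{num}'], 1, 0)
    df.iloc[0, df.columns.get_loc(f'adjp > ma{num}')] = 1  # force the first signal to 1, as in the original
    # when yesterday's signal was 0, replace that day's return with 1.000079 (1.02**(1/250))
    df[f'ma{num}pfret'] = np.where(df[f'adjp > ma{num}'].shift(1) == 0, 1.000079, df['ret'])
    df[f'cum_ma{num}pfret'] = df[f'ma{num}pfret'].cumprod()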
I have a problem turning my data into binary values. It's not complicated, just basic math; for example, if a is 60, the result would be "good", and if a >= 60, it would be "very good". Here it is applied to the data below:
This is my data. I want to turn the new_cases column into a binary value: when a value is >= 1, the result should be 1. But when I use
Dt[Dt['new_cases'] >= 1 ] = 1
It doesn't work.
Please, is anyone able to help? Any ideas on what could be causing this issue?
Thanks!
You have to specify the column where you want to change the values; Dt[Dt['new_cases'] >= 1] = 1 assigns 1 to every column of the matching rows, not just new_cases:
Dt.loc[Dt['new_cases'] >= 1, 'new_cases'] = 1
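For example, on a small made-up frame (the Dt name and new_cases column follow the question; the values are purely illustrative):
import pandas as pd

Dt = pd.DataFrame({'new_cases': [0, 3, 1, 0, 7]})
Dt.loc[Dt['new_cases'] >= 1, 'new_cases'] = 1
print(Dt['new_cases'].tolist())  # [0, 1, 1, 0, 1]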
Use
Dt["new_cases"] = Dt["new_cases"].apply(lambda x: 1 if x >= 1 else 0)
OR
Dt["new_cases"] = 1
Dt.loc[Dt["new_cases"] < 1, "new_cases"] = 0
So I am trying to change some values in a df using pandas, and having already tried df.replace, df.mask, and df.where, I have come to the conclusion that it must be a logical mistake, since it keeps throwing the same error:
ValueError: The truth value of a Series is ambiguous.
I am trying to normalize a column in a dataset, hence the function rather than a single line. I need to understand why my logic is wrong; it seems to be such a dumb mistake.
This is my function:
def overweight_normalizer():
    if df[df["overweight"] > 25]:
        df.where(df["overweight"] > 25, 1)
    elif df[df["overweight"] < 25]:
        df.where(df["overweight"] < 25, 0)
df[df["overweight"] > 25] is not a valid condition.
Try this:
import pandas as pd

def overweight_normalizer():
    df = pd.DataFrame({'overweight': [2, 39, 15, 45, 9]})
    df["overweight"] = [1 if i > 25 else 0 for i in df["overweight"]]
    return df

overweight_normalizer()
Output:
   overweight
0           0
1           1
2           0
3           1
4           0
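If the overweight column already lives in an existing df rather than being built inside the function, the same binarisation can be written without the list comprehension; a minimal sketch assuming such a frame:
df["overweight"] = (df["overweight"] > 25).astype(int)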
Consider the following code.
import numpy as np
import pandas as pd

np.random.seed(0)
df_H = pd.DataFrame({'L0': np.random.randn(100),
                     'OneAndZero': np.random.randn(100),
                     'OneAndTwo': np.random.randn(100),
                     'GTwo': np.random.randn(100),
                     'Decide': np.random.randn(100)})
I would like to create a new column named Result, which depends on the value of the column Decide. If the value in Decide is less than 0, I would like Result to take the corresponding row's value from L0. If the value in Decide is between 0 and 1, it should grab the value in OneAndZero; between 1 and 2, it should grab OneAndTwo; and if the value of Decide is > 2, it should grab GTwo.
How would one do this with df.apply since I have only seen examples with fixed values and not values from other columns?
Just because it is Good Friday, we can try the following (otherwise this is a commonly asked question).
c1 = df_H['Decide'].le(0)
c2 = df_H['Decide'].between(0, 1)
c3 = df_H['Decide'].between(1, 2)
c4 = df_H['Decide'].gt(2)
cond = [c1, c2, c3, c4]
choices = [df_H['L0'], df_H['OneAndZero'], df_H['OneAndTwo'], df_H['GTwo']]
df_H['Result'] = np.select(cond, choices)
df_H
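np.select evaluates all four boolean masks up front and picks, per row, the choice belonging to the first condition that is true, so it stays fully vectorized; that "first match wins" rule is also what settles the boundary values 0, 1 and 2 here.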
If you really want to use apply
def choose_res(x):
    if x['Decide'] <= 0:
        return x['L0']
    if 0 < x['Decide'] <= 1:
        return x['OneAndZero']
    if 1 < x['Decide'] <= 2:
        return x['OneAndTwo']
    if x['Decide'] > 2:
        return x['GTwo']

df_H['Result'] = df_H.apply(axis=1, func=choose_res, result_type='expand')
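As a quick sanity check (purely illustrative, and assuming cond and choices from the np.select answer are still in scope), the two approaches can be compared:
result_select = np.select(cond, choices)           # the np.select result from the first answer
print((df_H['Result'] == result_select).all())     # expected True, since both cover the same ranges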
df.iloc can also be used in a plain row-by-row loop; maybe you can try it this way:
df_H.reset_index(drop=True, inplace=True)
for i in range(len(df_H)):
    a = df_H['Decide'].iloc[i]
    if a <= 0:
        b = df_H['L0'].iloc[i]
    elif 0 < a <= 1:
        b = df_H['OneAndZero'].iloc[i]
    elif 1 < a <= 2:
        b = df_H['OneAndTwo'].iloc[i]
    else:
        b = df_H['GTwo'].iloc[i]
    df_H.loc[i, 'Result'] = b
If you want to use df.apply: create a function that contains the conditions and returns the desired value, then apply it row-wise:
df_H['Result'] = df_H.apply(your_function, axis=1)
I want to check if two consecutive values in a column are bigger than 0. If yes, then data['Exit'] = 1, else 0
My code:
data['Exit'] = 0
for row in range(len(data)):
    if (data['Mkt_Return'].iloc[row] > 0) and (data['Mkt_Return'].iloc[row - 1] > 0):
        data['Exit'] = 1
Right now all my values are equal to 1, but I know some values are smaller than 0 and therefore shouldn't be equal to 1.
Is .iloc[row-1] wrong?
Your condition logic is a bit faulty: when row is 0, iloc[row - 1] compares the first row with the last row, and data['Exit'] = 1 assigns 1 to the entire column, which is why every value ends up as 1. You might correct it to something like this:
for row in range(1, len(data)):
    if (data['Mkt_Return'].iloc[row - 1] > 0) and (data['Mkt_Return'].iloc[row] > 0):
        data.iloc[row, data.columns.get_loc('Exit')] = 1
How about this?
data["Mkt_Return_2"] = data["Mkt_Return"].shift(-1)
import numpy as np
data["foo"] = np.where(((data["Mkt_Return_2"] > 0) & (data["Mkt_Return"] > 0)), 1, 0)
In my dataframe I want to substitute every value below 1 and higher than 5 with NaN.
This code works
persDf = persDf.mask(persDf < 1000)
and I get every value as NaN, but this one does not:
persDf = persDf.mask((persDf < 1) and (persDf > 5))
and I have no idea why this is so. I have checked the documentation and solutions to apparently similar problems, but could not find one. Does anyone have an idea that could help me with this?
Use the | operator, because a value can't be < 1 AND > 5:
persDf = persDf.mask((persDf < 1) | (persDf > 5))
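For example, on a small made-up frame (the name sample and its values are purely illustrative):
import pandas as pd

sample = pd.DataFrame({'a': [0.5, 2.0, 7.0], 'b': [3.0, 0.2, 4.0]})
print(sample.mask((sample < 1) | (sample > 5)))
#      a    b
# 0  NaN  3.0
# 1  2.0  NaN
# 2  NaN  4.0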
Another method would be to use np.where and call it inside pd.DataFrame (passing the original index and columns so they are preserved):
pd.DataFrame(data=np.where((persDf < 1) | (persDf > 5), np.nan, persDf),
             columns=persDf.columns, index=persDf.index)