I want to create a directional pandas pct_change function, so a negative number in a prior row, followed by a larger negative number in a subsequent row will result in a negative pct_change (instead of positive).
I have created the following function:
```
def pct_change_directional(x):
    if x.shift() > 0.0:
        return x.pct_change()  # compute normally if prior number > 0
    elif x.shift() < 0.0 and x > x.shift():
        return abs(x.pct_change())  # make positive
    elif x.shift() < 0.0 and x < x.shift():
        return -x.pct_change()  # make negative
    else:
        return 0
```
However when I apply it to my pandas dataframe column like so:
df['col_pct_change'] = pct_change_directional(df[col1])
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
any ideas how I can make this work?
Thanks!
As @Wen said, you can chain multiple where calls, or, more simply, use np.select:
mask1 = df[col].shift() > 0.0
mask2 = (df[col].shift() < 0.0) & (df[col] > df[col].shift())
mask3 = (df[col].shift() < 0.0) & (df[col] < df[col].shift())
np.select([mask1, mask2, mask3],
          [df[col].pct_change(),
           abs(df[col].pct_change()),
           -df[col].pct_change()],
          0)
More detail about select and where can be found here.
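To make this concrete, here is a self-contained sketch of the np.select approach wrapped as a function (the column name col1 and the sample values are invented for illustration):

import numpy as np
import pandas as pd

def pct_change_directional(s):
    # sign-aware pct_change: stays negative when moving further below zero
    prev = s.shift()
    change = s.pct_change()
    mask1 = prev > 0.0                  # prior value positive: normal pct_change
    mask2 = (prev < 0.0) & (s > prev)   # improving from negative: force positive
    mask3 = (prev < 0.0) & (s < prev)   # worsening negative: force negative
    return pd.Series(np.select([mask1, mask2, mask3],
                               [change, change.abs(), -change],
                               0),
                     index=s.index)

df = pd.DataFrame({'col1': [10.0, 12.0, -5.0, -8.0, -3.0]})
df['col_pct_change'] = pct_change_directional(df['col1'])
print(df)  # -5 -> -8 now comes out negative, -8 -> -3 positive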
Related
I have a data frame that contains some daily, monthly and weekly statistics, including lost weight.
I would like to create a boolean column that indicates whether the lost weight was above or below a threshold. I tried using an if statement and np.where:
if df_prod_stats.loc[df_prod_stats['frequency'] == "daily"]:
    df_prod_stats['target_met'] = np.where(df_prod_stats['loss_weight'] < 0.5, 1, 0)
elif df_prod_stats.loc[df_prod_stats['frequency'] == "monthly"]:
    df_prod_stats['target_met'] = np.where(df_prod_stats['loss_weight'] < 15, 1, 0)
else:
    df_prod_stats['target_met'] = np.where(df_prod_stats['loss_weight'] < 3.5, 1, 0)
But I get an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think you will need to do this a different way. You seem to be going through each row to see if it's daily/weekly/monthly and checking the loss weight accordingly; however, that is not what your code actually does. In the if df_prod_stats.loc[...], the loc returns a subset of the data frame, which Python cannot reduce to a single True/False (hence the error), and even if it could, the next line would fill in the new column for the entire original data frame, not just the rows that matched the loc statement. You can achieve what (I think) you want using several loc statements, as below:
First, create the target_met column and set it to 0:
df_prod_stats['target_met'] = 0
Then use .loc to apply your first if condition (frequency is daily and loss weight is less than 0.5), setting target_met to 1:
df_prod_stats.loc[(df_prod_stats['frequency'] == 'daily')
& (df_prod_stats['loss_weight'] < 0.5), 'target_met'] = 1
Next, the elif condition (frequency is monthly and loss weight is less than 15):
df_prod_stats.loc[(df_prod_stats['frequency'] == 'monthly')
& (df_prod_stats['loss_weight'] < 15), 'target_met'] = 1
Finally, the else condition (frequency is neither daily nor monthly, and loss weight is less than 3.5):
df_prod_stats.loc[~(df_prod_stats['frequency'].isin(['daily', 'monthly']))
& (df_prod_stats['loss_weight'] < 3.5), 'target_met'] = 1
Put together you get:
df_prod_stats['target_met'] = 0
df_prod_stats.loc[(df_prod_stats['frequency'] == 'daily')
& (df_prod_stats['loss_weight'] < 0.5), 'target_met'] = 1
df_prod_stats.loc[(df_prod_stats['frequency'] == 'monthly')
& (df_prod_stats['loss_weight'] < 15), 'target_met'] = 1
df_prod_stats.loc[~(df_prod_stats['frequency'].isin(['daily', 'monthly']))
& (df_prod_stats['loss_weight'] < 3.5), 'target_met'] = 1
Output:
frequency loss_weight target_met
0 daily -0.42 1
1 daily -0.35 1
2 daily -0.67 1
3 daily -0.11 1
4 daily -0.31 1
I hope that is what you're trying to achieve.
I found out it's also possible to use a simple set of conditions in np.where, as follows:
df_prod_stats['target_met'] = np.where(
    ((df_prod_stats['loss_weight'] < 0.5) & (df_prod_stats['frequency'] == "daily"))
    | ((df_prod_stats['loss_weight'] < 15.0) & (df_prod_stats['frequency'] == "monthly"))
    | ((df_prod_stats['loss_weight'] < 3.5) & (df_prod_stats['frequency'] == "weekly")),
    1, 0)
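For more than a couple of frequencies, np.select (as in the first answer above) may scale better than chained | conditions; a minimal sketch, assuming the same column names:

import numpy as np

conditions = [
    (df_prod_stats['frequency'] == 'daily')   & (df_prod_stats['loss_weight'] < 0.5),
    (df_prod_stats['frequency'] == 'monthly') & (df_prod_stats['loss_weight'] < 15.0),
    (df_prod_stats['frequency'] == 'weekly')  & (df_prod_stats['loss_weight'] < 3.5),
]
df_prod_stats['target_met'] = np.select(conditions, [1, 1, 1], default=0)

Each threshold lives next to its frequency, so adding a new frequency is a one-line change.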
I have two columns in a pandas dataframe and want to compare their values against each other, filling a third column from a simple formula.
if post_df['pos'] == 1:
if post_df['lastPrice'] < post_df['exp']:
post_df['profP'] = post_df['lastPrice'] - post_df['ltP']
post_df['pos'] = 0
else:
post_df['profP'] = post_df['lastPrice'] - post_df['ltP']
However, when I run the above code I get the following error:
if post_df['pos'] == 1:
File "/Users/srikanthiyer/Environments/emacs/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have tried using np.where, which works, but since I intend to build a complex conditional structure I want to keep it simple using if statements.
I would try something like this:
def calculate_profp(row):
profP = None
if row['pos'] == 1:
if row['lastPrice'] < row['exp']:
profP = row['lastPrice'] - row['ltP']
else:
profP = row['lastPrice'] - row['ltP']
return profP
post_df['profP'] = post_df.apply(calculate_profp, axis=1)
What do you want to do with rows where row['pos'] is not 1?
Afterwards, you can run:
post_df['pos'] = post_df.apply(
lambda row: 0 if row['pos'] == 1 and row['lastPrice'] < row['exp'] else row['pos'],
axis=1)
to set pos from 1 to 0
or:
post_df['pos'] = post_df['pos'].map(lambda pos: 0 if pos == 1 else pos)
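If the conditional structure stays this simple, a vectorized sketch (assuming the same column names as the question) avoids the row-by-row apply entirely:

import numpy as np

in_position = post_df['pos'] == 1
below_exp = post_df['lastPrice'] < post_df['exp']

# profP uses the same formula in both branches, so it only depends on pos == 1
post_df['profP'] = np.where(in_position, post_df['lastPrice'] - post_df['ltP'], np.nan)

# close out positions where the price dropped below exp
post_df.loc[in_position & below_exp, 'pos'] = 0

Note that in_position is captured before pos is modified, so the profP calculation is unaffected by the position reset.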
I know the following error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
has been asked about many times before.
However, I am trying to create a basic function that returns a new column df['busy'] with 1 or 0. My function looks like this:
def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
          & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0
I can define the function, but when I call it with the DataFrame I get the error mentioned above. I followed the following thread and another thread to create that function, and I used & instead of and in my if clause.
Anyhow, when I do the following, I get my desired output.
df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
(df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')
Any ideas on what mistake I am making in my hour_bus function?
The
(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
gives a boolean array, and when you index your df with that you'll get a (probably) smaller part of your df.
Just to illustrate what I mean:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0 False
# 1 False
# 2 True
# 3 True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
# a
# 2 3
# 3 4
However, it's still a DataFrame, so it's ambiguous to use it as an expression that requires a truth value (in your case, an if):
bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You could use the np.where you used - or equivalently:
def hour_bus(df):
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
            & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    res = df['busy'] == 0
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res
However, the np.where approach will be the better solution (it's more readable and probably faster).
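For completeness, a runnable sketch of the np.where version on made-up sample data (the string comparison of hours assumes zero-padded 'HH:MM:SS' values, as in the question):

import numpy as np
import pandas as pd

df = pd.DataFrame({'hour': ['09:00:00', '15:30:00', '22:00:00', '16:00:00'],
                   'week_day': ['Monday', 'Tuesday', 'Saturday', 'Friday']})

def hour_bus(df):
    # boolean mask: weekday afternoons/evenings only
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')
            & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    return np.where(mask, 1, 0)

df['busy'] = hour_bus(df)
print(df['busy'].tolist())  # [0, 1, 0, 1]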
I'm trying to filter out certain rows in my dataframe, allowing only two combinations of values for two columns: 'A' and 'B' can either both be > 0, OR both be < 0. Any other combination I want to filter out.
I tried the following
df = df.loc[(df['A'] > 0 & df['B'] > 0) or (df['A'] < 0 & df['B'] < 0)]
which gives me an error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I know this is probably a very trivial question, but I couldn't find any solution to be honest, and I can't figure out what the problem with my approach is.
You need some parentheses, and to translate the boolean operators for pandas (and/or become &/|):
df = df[((df['A'] > 0) & (df['B'] > 0)) | ((df['A'] < 0) & (df['B'] < 0))]
Keep in mind what this is doing: you're just building a boolean Series like [True, False, True, True] and passing it into the df index, telling it to keep or drop each row depending on whether the corresponding value is True or False.
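To see the mask itself, a tiny example with invented values:

import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3, -4], 'B': [2, -1, -3, 4]})
mask = ((df['A'] > 0) & (df['B'] > 0)) | ((df['A'] < 0) & (df['B'] < 0))
print(mask.tolist())  # [True, True, False, False]
print(df[mask])       # keeps only the rows where both columns share a sign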
For some reason this is not working. Sample data:
dt = pd.DataFrame({'sid': ['a']*9 + ['b']*9 + ['c']*9,
                   'src': [1]*18 + [2]*9,
                   'val': np.random.randn(27),
                   'dval': [0]*18 + np.random.rand(9).tolist()})
I want to group by src and sid and change the dval value, for those rows whose sid is 'c', based on some val criterion.
I keep getting a StopIteration error.
# -- set bycp threshold for probability val to alert
def quantg(g):
    try:
        g['dval'] = g['dval'].apply(lambda x: x > x['val'].quantile(.90) and 1 or 0)
        print '***** bycp ', g.head(2)
        #print 'discretize bycp ', g.head()
        return g
    except (Exception, StopIteration) as e:
        print '**bycp error\n', e
        print g.info()
        pass
Then I try to filter by row before the groupby:
d = d[d['alert_t']=='bycp'].groupby(['source','subject_id','alert_t','variable']).apply(quantg )
I also tried a multilevel select:
# -- xs for multilevel select
g['dval'] = g.xs(('c','sid')).map(lambda x: len(g['value']) and\
#(x>g['value'].quantile(.90) and 1 or 0 ))
But no luck! I get frame-index or StopIteration type errors.
What gives? How can I get this done?
The following doesn't do what you think it does:
x > x['val'].quantile(.90) and 1 or 0
In fact, if you try it with a Series, it ought to raise a ValueError:
In [11]: dt and True
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
When writing something like that you want to use np.where:
np.where(x > x['val'].quantile(.90), 1, 0)
Note: astype('int64') would also work, or just leaving it as bool...
However, I think I might use a transform here (to extract each group's quantile and then mask with it), with something like:
q90 = g.transform(lambda x: x.quantile(.90))
df[df.val > q90]
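A runnable version of that idea on the question's sample frame (grouping by src and sid; the exact keys are an assumption, and dval is left out of the sample since only val matters here):

import numpy as np
import pandas as pd

dt = pd.DataFrame({'sid': ['a']*9 + ['b']*9 + ['c']*9,
                   'src': [1]*18 + [2]*9,
                   'val': np.random.randn(27)})

# per-group 90th percentile of val, broadcast back to the original index
q90 = dt.groupby(['src', 'sid'])['val'].transform(lambda x: x.quantile(.90))

# flag the rows above their group's threshold, or filter to them directly
dt['dval'] = np.where(dt['val'] > q90, 1, 0)
print(dt[dt['val'] > q90])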