Iterate with iterator-range in Python

I want to iterate over this dataframe with a span of 3 rows:
df = pd.DataFrame(index=range(0, 43), columns=['slow', 'fast', 'p'])
df.slow = 5
df.fast = [
2,2,2,3,3,3,3,3,4,4,
5,6,6,4,5,6,
6,5,4,5,6,6,7,
7,7,6,5,5,4,5,6,6,7,
8,8,9,8,7,7,7,7,7,7
]
df.p = [
1,1,1,1,2,3,3,4,5,6,
7,6,5,4,4,5,
6,7,6,6,7,7,8,
7,6,8,9,10,4,5,3,2,2,
4,4,5,6,7,8,8,8,8,8
]
The logic: if fast > slow and p >= fast and the previous three values p[-1], p[-2], p[-3] are all greater than slow, append True to the array.
My attempt:
iterarray = [-1, -2, -3]
array = []
for i in range(len(df.index[2:])):
    if df.fast[i] > df.slow[i] and df.p[i] >= df.fast[i] and df.p[i:i+len(iterarray)] > df.slow[i:i+len(iterarray)]:
        array.append(True)
    else:
        array.append(False)
But I get an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How can I achieve the proper iteration?

In your last condition, df.p[i:i+len(iterarray)] > df.slow[i:i+len(iterarray)], you are comparing three pairs of numbers. That comparison produces three separate True/False results, and Python cannot reduce them to a single truth value on its own.
You must use .all(), which returns True only if every pairwise comparison is True.
...
if df.fast[i] > df.slow[i] and df.p[i] >= df.fast[i] and (df.p[i:i+len(iterarray)] > df.slow[i:i+len(iterarray)]).all():
...
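For completeness, here is a minimal sketch of the whole loop with that fix applied, assuming the df defined in the question:
```
# Sketch only: same loop as the question, with .all() applied to the Series comparison.
iterarray = [-1, -2, -3]
array = []
for i in range(len(df.index[2:])):
    window_ok = (df.p[i:i + len(iterarray)] > df.slow[i:i + len(iterarray)]).all()
    if df.fast[i] > df.slow[i] and df.p[i] >= df.fast[i] and window_ok:
        array.append(True)
    else:
        array.append(False)
```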

If you want to check whether the condition (fast greater than slow) holds now and also for a few rows back, you can do this:
for i in [1, 2, 3]:
    df[f"col_-{i}"] = (df['slow'] < df['fast']) & (df['fast'] <= df['p']) & (df['slow'].shift(i) < df['p'].shift(i))
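This gives one boolean column per look-back. As a follow-up sketch (the column name signal is made up, and the df from the question is assumed), the three shifted checks can also be combined into a single column:
```
import pandas as pd

# Combine the current-row condition with all three look-backs ('signal' is a made-up name).
cond_now = (df['slow'] < df['fast']) & (df['fast'] <= df['p'])
cond_back = pd.concat(
    [df['slow'].shift(i) < df['p'].shift(i) for i in [1, 2, 3]], axis=1
).all(axis=1)
df['signal'] = cond_now & cond_back
```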

Related

Apply 'if' condition to compare two pandas columns and form a third column from the value of the second column

I have two columns in a pandas dataframe and want to compare their values against each other and produce a third column by applying a simple formula.
if post_df['pos'] == 1:
    if post_df['lastPrice'] < post_df['exp']:
        post_df['profP'] = post_df['lastPrice'] - post_df['ltP']
        post_df['pos'] = 0
    else:
        post_df['profP'] = post_df['lastPrice'] - post_df['ltP']
However, when I run the above code I get the following error:
if post_df['pos'] == 1:
File "/Users/srikanthiyer/Environments/emacs/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have tried using np.where, which works, but since I intend to build a complex conditional structure I want to keep it simple with if statements.
I would try something like this:
def calculate_profp(row):
    profP = None
    if row['pos'] == 1:
        if row['lastPrice'] < row['exp']:
            profP = row['lastPrice'] - row['ltP']
        else:
            profP = row['lastPrice'] - row['ltP']
    return profP

post_df['profP'] = post_df.apply(calculate_profp, axis=1)
What do you want to do with rows where row['pos'] is not 1?
Afterwards, you can run:
post_df['pos'] = post_df.apply(
    lambda row: 0 if row['pos'] == 1 and row['lastPrice'] < row['exp'] else row['pos'],
    axis=1)
to set pos from 1 to 0
or:
post_df['pos'] = post_df['pos'].map(lambda pos: 0 if pos == 1 else pos)
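If you later want to drop the row-wise apply, a vectorized sketch with np.where could look like this (same columns assumed; rows where pos is not 1 are left as NaN):
```
import numpy as np

pos1 = post_df['pos'] == 1

# Both branches of calculate_profp compute the same difference, so one np.where suffices.
post_df['profP'] = np.where(pos1, post_df['lastPrice'] - post_df['ltP'], np.nan)

# Reset pos to 0 only where pos == 1 and lastPrice < exp.
post_df['pos'] = np.where(pos1 & (post_df['lastPrice'] < post_df['exp']), 0, post_df['pos'])
```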

Create a "directional" pandas pct_change function

I want to create a directional pandas pct_change function, so a negative number in a prior row, followed by a larger negative number in a subsequent row will result in a negative pct_change (instead of positive).
I have created the following function:
```
def pct_change_directional(x):
    if x.shift() > 0.0:
        return x.pct_change()  # compute normally if prior number > 0
    elif x.shift() < 0.0 and x > x.shift():
        return abs(x.pct_change())  # make positive
    elif x.shift() < 0.0 and x < x.shift():
        return -x.pct_change()  # make negative
    else:
        return 0
```
However when I apply it to my pandas dataframe column like so:
df['col_pct_change'] = pct_change_directional(df[col1])
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any ideas how I can make this work?
Thanks!
As @Wen said, you need multiple where calls, or, more compactly, np.select:
mask1 = df[col].shift() > 0.0
mask2 = (df[col].shift() < 0.0) & (df[col] > df[col].shift())
mask3 = (df[col].shift() < 0.0) & (df[col] < df[col].shift())
np.select([mask1, mask2, mask3],
          [df[col].pct_change(), abs(df[col].pct_change()),
           -df[col].pct_change()],
          0)
You can find much more detail about select and where here.
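To make the np.select version concrete, here is a small self-contained sketch with made-up data (the column name col1 and its values are just for illustration):
```
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [10.0, -5.0, -8.0, -2.0, 4.0]})
col = 'col1'

mask1 = df[col].shift() > 0.0
mask2 = (df[col].shift() < 0.0) & (df[col] > df[col].shift())
mask3 = (df[col].shift() < 0.0) & (df[col] < df[col].shift())

df['col_pct_change'] = np.select(
    [mask1, mask2, mask3],
    [df[col].pct_change(),       # prior value positive: normal pct_change
     abs(df[col].pct_change()),  # prior negative, value rose: force positive
     -df[col].pct_change()],     # prior negative, value fell: force negative
    0)
print(df)
```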

The truth value of a Series is ambiguous - Error when calling a function

I know the following error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
has been asked about many times before.
However, I am trying to create a basic function that returns a new column df['busy'] with 1 or 0. My function looks like this:
def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0
I can execute the function, but when I call it with the DataFrame, I get the error mentioned above. I followed the following thread and another thread to create that function. I used & instead of and in my if clause.
Anyhow, when I do the following, I get my desired output.
df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
(df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')
Any ideas on what mistake I am making in my hour_bus function?
The expression
(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
gives a boolean array, and when you index your df with it you'll get a (probably) smaller part of your df.
Just to illustrate what I mean:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0 False
# 1 False
# 2 True
# 3 True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
# a
# 2 3
# 3 4
However, it's still a DataFrame, so it's ambiguous to use it as an expression that requires a truth value (in your case, in an if).
bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You could use the np.where approach you already have, or equivalently:
def hour_bus(df):
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
            (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    res = df['busy'] == 0
    res[mask] = (df['busy'] == 1)[mask]  # replace the values where the mask is True
    return res
However the np.where will be the better solution (it's more readable and probably faster).
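If you do want to keep a function, a minimal sketch is to wrap that np.where inside one (assuming hour and week_day are string columns, as in your np.where call; the sample data below is made up):
```
import numpy as np
import pandas as pd

def hour_bus(df):
    # Busy: between 14:00 and 23:00 on a weekday.
    mask = ((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') &
            (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'))
    return np.where(mask, 1, 0)

# Tiny made-up example
df = pd.DataFrame({'hour': ['10:00:00', '15:30:00', '22:00:00'],
                   'week_day': ['Monday', 'Saturday', 'Tuesday']})
df['busy'] = hour_bus(df)   # -> [0, 0, 1]
```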

Pandas: Index rows by an OR condition

I'm trying to filter out certain rows in my dataframe, allowing only two combinations of values for two columns. For example, columns 'A' and 'B' may only have 'A' > 0 and 'B' > 0, OR 'A' < 0 and 'B' < 0. I want to filter out any other combination.
I tried the following
df = df.loc[(df['A'] > 0 & df['B'] > 0) or (df['A'] < 0 & df['B'] < 0)]
which gives me an error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I know this is probably a very trivial question, but honestly I couldn't find a solution, and I can't figure out what the problem with my approach is.
You need some parentheses, and you need to translate the operators for pandas (and/or become &/|):
df = df[((df['A'] > 0) & (df['B'] > 0)) | ((df['A'] < 0) & (df['B'] < 0))]
Keep in mind what this is doing: you're just building a giant list like [True, False, True, True] and passing it into the df index, telling it to keep or drop each row depending on whether the corresponding entry is True or False.
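A tiny made-up example of that mask in action:
```
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3, -4], 'B': [5, -6, -7, 8]})
kept = df[((df['A'] > 0) & (df['B'] > 0)) | ((df['A'] < 0) & (df['B'] < 0))]
print(kept)
#    A  B
# 0  1  5
# 1 -2 -6
```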

pandas multi-group apply() change view value

For some reason this is not working:
Sample data:
dt = pd.DataFrame({'sid': ['a']*9 + ['b']*9 + ['c']*9,
                   'src': [1]*18 + [2]*9,
                   'val': np.random.randn(27),
                   'dval': [0]*18 + list(np.random.rand(9))})
I want to multi-group by src and sid and change dval row values, for those rows whose sid is 'c', based on some val criteria.
I keep getting a StopIteration error.
# -- set bycp threshold for probability val to alert
def quantg(g):
    try:
        g['dval'] = g['dval'].apply(lambda x: x > x['val'].quantile(.90) and 1 or 0)
        print '***** bycp ', g.head(2)
        # print 'discretize bycp ', g.head()
        return g
    except (Exception, StopIteration) as e:
        print '**bycp error\n', e
        print g.info()
        pass
Then I try to filter by row before the groupby:
d = d[d['alert_t']=='bycp'].groupby(['source','subject_id','alert_t','variable']).apply(quantg )
I also tried a multilevel select:
# -- xs for multilevel select
g['dval'] = g.xs(('c','sid')).map(lambda x: len(g['value']) and\
#(x>g['value'].quantile(.90) and 1 or 0 ))
But no luck!
I get frame-index or StopIteration type errors.
What gives, how can I get this done?
The following doesn't do what you think it does:
x > x['val'].quantile(.90) and 1 or 0
In fact, if you try it with a Series it ought to raise a ValueError:
In [11]: dt and True
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
When writing something like that you want to use np.where:
np.where(x > x['val'].quantile(.90), 1, 0)
Note: astype('int64') would also work, or just leaving it as bool...
However, I think I might use a transform here (to extract each group's quantile and then mask with it), with something like:
q90 = g.transform(lambda x: x.quantile(.90))
df[df.val > q90]
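Put together, a minimal sketch of that transform idea on the sample frame (assuming the fixed dval column above, and flagging only the 'c' rows with 1/0 as the question describes):
```
import numpy as np
import pandas as pd

dt = pd.DataFrame({'sid': ['a']*9 + ['b']*9 + ['c']*9,
                   'src': [1]*18 + [2]*9,
                   'val': np.random.randn(27),
                   'dval': [0]*18 + list(np.random.rand(9))})

# Per-(src, sid) 90th percentile of 'val', broadcast back onto every row.
q90 = dt.groupby(['src', 'sid'])['val'].transform(lambda x: x.quantile(.90))

# Only rows with sid == 'c' get the 1/0 flag.
is_c = dt['sid'] == 'c'
dt.loc[is_c, 'dval'] = np.where(dt.loc[is_c, 'val'] > q90[is_c], 1, 0)
```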
