Data
Time,PM2.5,
1/1/2014,9
2/1/2014,10
import pandas as pd
df = pd.read_csv('xx.csv')
data = pd.DataFrame(df)
def calculation(y):
    if 0 < y and y < 12:
        bello=data.assign(API=(50/12)*y)
    elif 12.1 <= y and y <= 50.4:
        bello=data.assign(API=(((100-51)/(50.4-12.1))*(y-12.1))+51)
    elif 50.5 <= y and y <= 55.4:
        bello=data.assign(API=(((150-101)/(55.4-50.5))*(y-50.5))+101)
    elif 55.5 <= y and y <= 150.4:
        bello=data.assign(API=(((200-151)/(150.4-55.5))*(y-55.5))+151)
    elif 150.5 <= y and y <= 250.4:
        bello=data.assign(API=(((300-201)/(250.4-150.5))*(y-150.5))+201)
    elif 250.5 <= y and y <= 350.4:
        bello=data.assign(API=(((400-301)/(350.4-250.5))*(y-250.5))+301)
    else:
        bello=data.assign(API=(((500-401)/(500.4-350.5))*(y-350.5))+401)
    return bello
y=data['PM2.5']
print(calculation(y))
Hi everyone,
I want to convert PM2.5 air-quality readings to an API value using the conditions and equations in the code above.
I received the error "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().".
I hope someone can tell me what the problem is.
Thanks in advance.
For example, you can rewrite the function so that it handles a single value, and then call .apply(...) on the column:
import pandas as pd
data = pd.DataFrame({'PM2.5': [15, 50, 1000]})
def calculation(y):
    if 0<y<12:
        return (50/12)*y
    elif 12.1 <= y <= 50.4:
        return (((100-51)/(50.4-12.1))*(y-12.1))+51
    ## ....
    else:
        return (((500-401)/(500.4-350.5))*(y-350.5))+401
y=data['PM2.5']
print(y.apply(calculation))
This stays close to your code; a vectorized solution would likely be faster.
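As a rough sketch of what such a vectorized version could look like, the if/elif chain can be expressed with np.select, which evaluates all conditions on the whole column and keeps the first match for each row. The breakpoints and formulas are copied from the question; the sample values are made up for illustration, and the final always-true condition stands in for the else branch.

```python
import numpy as np
import pandas as pd

# Made-up sample readings; the column name follows the question.
data = pd.DataFrame({'PM2.5': [6.0, 15.0, 50.0, 1000.0]})
y = data['PM2.5'].to_numpy(dtype=float)

# Conditions are checked in order; the first True one wins for each row,
# mirroring the if/elif chain. The last entry is the catch-all "else".
conditions = [
    (y > 0) & (y < 12),
    (y >= 12.1) & (y <= 50.4),
    (y >= 50.5) & (y <= 55.4),
    (y >= 55.5) & (y <= 150.4),
    (y >= 150.5) & (y <= 250.4),
    (y >= 250.5) & (y <= 350.4),
    np.ones_like(y, dtype=bool),
]
choices = [
    (50 / 12) * y,
    ((100 - 51) / (50.4 - 12.1)) * (y - 12.1) + 51,
    ((150 - 101) / (55.4 - 50.5)) * (y - 50.5) + 101,
    ((200 - 151) / (150.4 - 55.5)) * (y - 55.5) + 151,
    ((300 - 201) / (250.4 - 150.5)) * (y - 150.5) + 201,
    ((400 - 301) / (350.4 - 250.5)) * (y - 250.5) + 301,
    ((500 - 401) / (500.4 - 350.5)) * (y - 350.5) + 401,
]
data['API'] = np.select(conditions, choices)
print(data)
```

Every formula is computed for every row and only the selected one is kept, so this trades a little extra arithmetic for avoiding a Python-level loop.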
Related
I have a data frame that contains some daily, monthly and weekly statistics and lost weight.
I would like to create a boolean column that records whether the lost weight was above or below the threshold. I tried using if statements and np.where:
if df_prod_stats.loc[df_prod_stats['frequency'] == "daily"]:
    df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 0.5)),1,0)
elif df_prod_stats.loc[df_prod_stats['frequency'] == "monthly"]:
    df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 15)),1,0)
else:
    df_prod_stats['target_met'] =np.where(((df_prod_stats['loss_weight'] < 3.5)),1,0)
But I get an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I think you will need to do this a different way. I think you're trying to go through each row, check whether it's daily, monthly or weekly, and test the loss weight accordingly, but that is not what your code actually does. In if df_prod_stats.loc[...], the loc returns a subset of the data frame, and pandas refuses to treat a whole DataFrame as a single True or False, which is where the ValueError comes from. Even if the condition could be evaluated, the next line, which fills in the new column, would apply to the entire original data frame, not only to the rows that matched the loc statement. You can achieve what (I think) you want using several loc statements as below:
Create the target_met column and set it to 0:
df_prod_stats['target_met'] = 0
Then use .loc to filter on your first if condition (frequency is daily, loss weight is less than 0.5), and set target_met to 1:
df_prod_stats.loc[(df_prod_stats['frequency'] == 'daily')
                  & (df_prod_stats['loss_weight'] < 0.5), 'target_met'] = 1
Then the elif condition (frequency is monthly, loss weight is less than 15):
df_prod_stats.loc[(df_prod_stats['frequency'] == 'monthly')
                  & (df_prod_stats['loss_weight'] < 15), 'target_met'] = 1
And the else condition (frequency is neither daily nor monthly, and loss weight is less than 3.5):
df_prod_stats.loc[~(df_prod_stats['frequency'].isin(['daily', 'monthly']))
                  & (df_prod_stats['loss_weight'] < 3.5), 'target_met'] = 1
Put together you get:
df_prod_stats['target_met'] = 0
df_prod_stats.loc[(df_prod_stats['frequency'] == 'daily')
                  & (df_prod_stats['loss_weight'] < 0.5), 'target_met'] = 1
df_prod_stats.loc[(df_prod_stats['frequency'] == 'monthly')
                  & (df_prod_stats['loss_weight'] < 15), 'target_met'] = 1
df_prod_stats.loc[~(df_prod_stats['frequency'].isin(['daily', 'monthly']))
                  & (df_prod_stats['loss_weight'] < 3.5), 'target_met'] = 1
Output:
frequency loss_weight target_met
0 daily -0.42 1
1 daily -0.35 1
2 daily -0.67 1
3 daily -0.11 1
4 daily -0.31 1
I hope that is what you're trying to achieve.
I found out it's also possible to use a single set of conditions in np.where, as follows:
df_prod_stats['target_met'] = np.where(
    ((df_prod_stats['loss_weight'] < 0.5) & (df_prod_stats['frequency'] == "daily"))
    | ((df_prod_stats['loss_weight'] < 15.0) & (df_prod_stats['frequency'] == "monthly"))
    | ((df_prod_stats['loss_weight'] < 3.5) & (df_prod_stats['frequency'] == "weekly")),
    1, 0)
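If the list of frequencies grows, one possible variation (a sketch, not something from the answers above) is to keep the thresholds in a dictionary, map them onto the frequency column, and compare once. The sample rows and the thresholds dictionary name are made up for illustration; the threshold values are the ones used in the question.

```python
import pandas as pd

# Made-up sample rows; column names follow the question.
df_prod_stats = pd.DataFrame({
    'frequency': ['daily', 'monthly', 'weekly', 'daily'],
    'loss_weight': [0.3, 20.0, 1.0, 0.9],
})

# Hypothetical lookup table: one threshold per frequency.
thresholds = {'daily': 0.5, 'monthly': 15.0, 'weekly': 3.5}

# Each row gets the threshold that belongs to its own frequency.
limit = df_prod_stats['frequency'].map(thresholds)

# Element-wise comparison, then cast True/False to 1/0.
df_prod_stats['target_met'] = (df_prod_stats['loss_weight'] < limit).astype(int)
print(df_prod_stats)
```

A frequency that is missing from the dictionary maps to NaN, and the comparison is then False, so such rows get 0. That matches the np.where version above, which only accepts "weekly" in its last clause, rather than the .loc version with its catch-all else.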
I have two columns in pandas dataframe and want to compare their values against each other and return a third column processing a simple formula.
if post_df['pos'] == 1:
    if post_df['lastPrice'] < post_df['exp']:
        post_df['profP'] = post_df['lastPrice'] - post_df['ltP']
        post_df['pos'] = 0
    else:
        post_df['profP'] = post_df['lastPrice'] - post_df['ltP']
However, when I run the above code I get the following error:
if post_df['pos'] == 1:
File "/Users/srikanthiyer/Environments/emacs/lib/python3.7/site-packages/pandas/core/generic.py", line 1479, in __nonzero__
.format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I have tried using np.where, which works, but since I intend to build a more complex conditional structure I want to keep it simple by using if statements.
I would try something like this:
def calculate_profp(row):
    profP = None
    if row['pos'] == 1:
        if row['lastPrice'] < row['exp']:
            profP = row['lastPrice'] - row['ltP']
        else:
            profP = row['lastPrice'] - row['ltP']
    return profP
post_df['profP'] = post_df.apply(calculate_profp, axis=1)
What do you want to do with rows where row['pos'] is not 1?
Afterwards, you can run the following to set pos from 1 to 0 where lastPrice is below exp:
post_df['pos'] = post_df.apply(
    lambda row: 0 if row['pos'] == 1 and row['lastPrice'] < row['exp'] else row['pos'],
    axis=1)
or, to reset every pos of 1 to 0 regardless of the price:
post_df['pos'] = post_df['pos'].map(lambda pos: 0 if pos == 1 else pos)
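For comparison, here is a minimal sketch of the same logic written with boolean masks instead of row-wise apply. The sample values are made up; the column names come from the question. It assumes, like the apply version, that rows where pos is not 1 should get no profP value.

```python
import pandas as pd

# Made-up sample data using the question's column names.
post_df = pd.DataFrame({
    'pos': [1, 1, 0],
    'lastPrice': [10.0, 12.0, 11.0],
    'exp': [11.0, 11.0, 12.0],
    'ltP': [9.0, 9.5, 10.0],
})

is_open = post_df['pos'] == 1
below_exp = post_df['lastPrice'] < post_df['exp']

# profP only where pos == 1; other rows become NaN
# (the counterpart of None in the apply version).
post_df['profP'] = (post_df['lastPrice'] - post_df['ltP']).where(is_open)

# Reset pos to 0 only where pos was 1 and lastPrice < exp.
post_df.loc[is_open & below_exp, 'pos'] = 0
print(post_df)
```

Both masks are computed before pos is modified, so the order of the last two statements does not matter here.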
I want to create a directional pandas pct_change function, so a negative number in a prior row, followed by a larger negative number in a subsequent row will result in a negative pct_change (instead of positive).
I have created the following function:
```
def pct_change_directional(x):
    if x.shift() > 0.0:
        return x.pct_change() #compute normally if prior number > 0
    elif x.shift() < 0.0 and x > x.shift():
        return abs(x.pct_change()) # make positive
    elif x.shift() <0.0 and x < x.shift():
        return -x.pct_change() #make negative
    else:
        return 0
```
However when I apply it to my pandas dataframe column like so:
df['col_pct_change'] = pct_change_directional(df[col1])
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any ideas how I can make this work?
Thanks!
As @Wen said, you could chain multiple where calls; np.select does much the same thing in one step:
mask1 = df[col].shift() > 0.0
mask2 = (df[col].shift() < 0.0) & (df[col] > df[col].shift())
mask3 = (df[col].shift() < 0.0) & (df[col] < df[col].shift())
np.select([mask1, mask2, mask3],
          [df[col].pct_change(), abs(df[col].pct_change()),
           -df[col].pct_change()],
          0)
You can find more detail about select and where here
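To make the idea concrete, here is a small self-contained run of the np.select approach. The sample numbers and the column name col1 are made up for illustration; the three masks and choices follow the snippet above.

```python
import numpy as np
import pandas as pd

# Made-up sample column with a sign change in the middle.
df = pd.DataFrame({'col1': [10.0, 12.0, -4.0, -6.0, -3.0]})

s = df['col1']
prev = s.shift()        # value in the prior row (NaN for the first row)
pct = s.pct_change()    # ordinary percentage change

conditions = [
    prev > 0.0,                   # prior value positive: compute normally
    (prev < 0.0) & (s > prev),    # prior negative, value rose: make positive
    (prev < 0.0) & (s < prev),    # prior negative, value fell: make negative
]
choices = [pct, pct.abs(), -pct]

# Rows matching no condition (including the first row) get 0.
df['col_pct_change'] = np.select(conditions, choices, default=0)
print(df)
```

Comparisons against NaN are False, which is why the first row falls through to the default of 0.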
I know the following error
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
was asked about a long time ago.
However, I am trying to create a basic function and return a new column with df['busy'] with 1 or 0. My function looks like this,
def hour_bus(df):
    if df[(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')&\
          (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')]:
        return df['busy'] == 1
    else:
        return df['busy'] == 0
I can execute the function, but when I call it with the DataFrame, I get the error mentioned above. I followed the following thread and another thread to create that function. I used & instead of and in my if clause.
Anyhow, when I do the following, I get my desired output.
df['busy'] = np.where((df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00') & \
                      (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday'),'1','0')
Any ideas on what mistake I am making in my hour_bus function?
The expression
(df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
gives a boolean array, and when you index your df with it you get a (probably) smaller part of your df.
Just to illustrate what I mean:
import pandas as pd
df = pd.DataFrame({'a': [1,2,3,4]})
mask = df['a'] > 2
print(mask)
# 0 False
# 1 False
# 2 True
# 3 True
# Name: a, dtype: bool
indexed_df = df[mask]
print(indexed_df)
# a
# 2 3
# 3 4
However, it's still a DataFrame, so it's ambiguous to use it in an expression that requires a truth value (in your case an if).
bool(indexed_df)
# ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You could use the np.where you used - or equivalently:
def hour_bus(df):
    mask = (df['hour'] >= '14:00:00') & (df['hour'] <= '23:00:00')& (df['week_day'] != 'Saturday') & (df['week_day'] != 'Sunday')
    res = df['busy'] == 0
    res[mask] = (df['busy'] == 1)[mask] # replace the values where the mask is True
    return res
However the np.where will be the better solution (it's more readable and probably faster).
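Since the mask is already a boolean Series, another option is to cast it straight to integers. This is only a sketch: the sample rows are made up, and it assumes the hour column holds zero-padded 'HH:MM:SS' strings, which is what makes the string comparison behave like a time comparison.

```python
import pandas as pd

# Made-up sample rows; column names follow the question.
df = pd.DataFrame({
    'hour': ['09:30:00', '15:00:00', '22:59:59', '16:00:00'],
    'week_day': ['Monday', 'Tuesday', 'Friday', 'Saturday'],
})

# between() is inclusive on both ends by default, like >= and <= above.
mask = (
    df['hour'].between('14:00:00', '23:00:00')
    & ~df['week_day'].isin(['Saturday', 'Sunday'])
)

# True/False becomes 1/0 (integers rather than the strings '1'/'0').
df['busy'] = mask.astype(int)
print(df)
```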
I have made a function which returns a value for a force depending on the z position (z_pos). I would like to plot these results (shear diagram for the engineers here), however I get the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I have tried it both with arange and linspace, see the code here:
import matplotlib.pyplot as plt
import numpy as np
#values in kN and m
FyFL = 520
FyRL = 1246
L = 40.
Lf1 = 2.
Lf2 = 25.5
g = 9.81
W = 60000
q = (3*g*W/L)/1000 #kN/m
print(q)
def int_force_y(FyFL, FyRL, L, Lf1, Lf2, q, z_pos):
    if z_pos <= Lf1:
        int_fc_y = -q*z_pos
    elif z_pos > Lf1 and z_pos < Lf1+Lf2:
        int_fc_y = -q*Lf1 + FyFL-q*z_pos
    elif z_pos >= Lf2 and z_pos <= 40.:
        int_fc_y = -q*Lf1 + FyFL-q*(Lf1+Lf2)-q*z_pos
    else:
        return "No valid z_pos"
    return int_fc_y
z_pos = np.arange(0,41,1)
y = int_force_y(FyFL, FyRL, L, Lf1, Lf2, q, z_pos)
plt.plot(z_pos,y)
plt.show()
Help is very much appreciated!
The error you are getting has nothing to do with the plotting; it arises when you call int_force_y. The argument z_pos is a np.ndarray. If you compare it to e.g. Lf1 in your function, you get a boolean array in which each element indicates whether the corresponding element of z_pos is smaller than or equal to Lf1 (in the case of your first if statement). As some elements are smaller or equal and some are not, NumPy cannot decide whether to treat the whole array as True or False, and asks you to use .any() to indicate that it should be True if any element is True, or .all() to indicate that it should be True if all elements are True.
But neither of these does what you want. You want a decision for each element individually, and then to set the corresponding value in int_fc_y accordingly. You can do this with a for loop or, more elegantly, with boolean indexing and np.logical_and. Just use this function to produce your result array instead of your version:
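def int_force_y(FyFL, FyRL, L, Lf1, Lf2, q, z_pos):
    if (z_pos>40.).any():
        return "No valid z_pos"
    # float result array with one entry per position
    int_fc_y = np.zeros_like(z_pos, dtype=float)
    m1 = z_pos <= Lf1
    m2 = np.logical_and(z_pos > Lf1, z_pos < Lf1+Lf2)
    m3 = np.logical_and(z_pos >= Lf2, z_pos <= 40.)
    # index the right-hand side with the same mask so the shapes match
    int_fc_y[m1] = -q*z_pos[m1]
    int_fc_y[m2] = -q*Lf1 + FyFL-q*z_pos[m2]
    # where m2 and m3 overlap, this later assignment wins
    int_fc_y[m3] = -q*Lf1 + FyFL-q*(Lf1+Lf2)-q*z_pos[m3]
    return int_fc_y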
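An alternative sketch of the same element-wise decision uses np.select, which picks the first matching condition for each element, in the same way an if/elif chain does. The constants and formulas are taken from the question; using L as the upper limit instead of the literal 40. and returning NaN for positions that match no condition are assumptions of this sketch.

```python
import numpy as np

# Values from the question (kN and m).
FyFL = 520
FyRL = 1246
L = 40.
Lf1 = 2.
Lf2 = 25.5
q = (3 * 9.81 * 60000 / L) / 1000  # kN/m

def int_force_y(FyFL, FyRL, L, Lf1, Lf2, q, z_pos):
    z = np.asarray(z_pos, dtype=float)
    # Checked in order; the first True condition wins for each element.
    conditions = [
        z <= Lf1,
        (z > Lf1) & (z < Lf1 + Lf2),
        (z >= Lf2) & (z <= L),
    ]
    choices = [
        -q * z,
        -q * Lf1 + FyFL - q * z,
        -q * Lf1 + FyFL - q * (Lf1 + Lf2) - q * z,
    ]
    # Positions matching no condition become NaN instead of a string.
    return np.select(conditions, choices, default=np.nan)

z_pos = np.arange(0, 41, 1)
y = int_force_y(FyFL, FyRL, L, Lf1, Lf2, q, z_pos)
print(y[:5])
```

The result has the same length as z_pos, so it can be passed to plt.plot(z_pos, y) directly.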
The problem happens because you are asking Python whether an array is larger or smaller than a certain value:
if z_pos <= Lf1:
This might be true for some values and false for others, which leaves the question of whether the statement is true or false ambiguous.
You can try:
if np.array(z_pos <= Lf1).any():
or
if np.array(z_pos <= Lf1).all():
depending on what you want.
Same for the following if statements.
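A tiny illustration of the difference, using a made-up five-element array rather than the question's data:

```python
import numpy as np

z_pos = np.arange(0, 5)   # [0 1 2 3 4]
below = z_pos <= 2        # element-wise result, one boolean per element

try:
    if below:             # asks for a single truth value of a 5-element array
        pass
    raised = False
except ValueError:
    raised = True

print(raised)        # True: the bare "if" raises the ambiguity error
print(below.any())   # True: at least one element is <= 2
print(below.all())   # False: not every element is <= 2
```

Either call collapses the whole array into one boolean, so the branch is then taken (or not) for all elements at once.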