In my dataframe I want to substitute every value below 1 and higher than 5 with nan.
This code works
persDf = persDf.mask(persDf < 1000)
and I get every value as an nan but this one does not:
persDf = persDf.mask((persDf < 1) and (persDf > 5))
and I have no idea why this is so. I have checked the man page and different solutions on apparentely similar problems but could not find a solution. Does anyone have have an idea that could help me on this?
Use the | operator, because a value cant be < 1 AND > 5:
persDf = persDf.mask((persDf < 1) | (persDf > 5))
Another method would be to use np.where and call that inside pd.DataFrame:
pd.DataFrame(data=np.where((df < 1) | (df > 5), np.NaN, df),
columns=df.columns)
Related
Consider the following code.
import pandas as pd
np.random.seed(0)
df_H = pd.DataFrame( {'L0': np.random.randn(100),
'OneAndZero': np.random.randn(100),
'OneAndTwo': np.random.randn(100),
'GTwo': np.random.randn(100),
'Decide': np.random.randn(100)})
I would like to create a new column named Result, which depends on the value of the column Decide. So if the value in Decide is less than 0, I would like Result to have the corresponding value of the row in L0. If the value on the row in Decide is between 1 and 0, it should grab the value in OneAndZero, between 1 and 2, it should grab OneAndTwo and if the value of decide is > 2, then it should grab GTwo.
How would one do this with df.apply since I have only seen examples with fixed values and not values from other columns?
Just because it is Good Friday, we can try the following. Else it is a commonly asked question.
c1=df_H['Decide'].le(0)
c2=df_H['Decide'].between(0,1)
c3=df_H['Decide'].between(1,2)
c4=df_H['Decide'].gt(2)
cond=[c1,c2,c3,c4]
choices=[df_H['L0'],df_H['OneAndZero'],df_H['OneAndTwo'],df_H['GTwo']]
df_H['Result']=np.select(cond,choices)
df_H
If you really want to use apply
def choose_res(x):
if x['Decide'] <= 0:
return x['L0']
if 0 < x['Decide'] <= 1:
return x['OneAndZero']
if 1 < x['Decide'] <= 2:
return x['OneAndTwo']
if x['Decide'] > 2:
return x['GTwo']
df_H['Result'] = df_H.apply(axis=1, func=choose_res, result_type='expand')
df.iloc
df_H.reset_index(drop=True, inplace=True)
for i in range(len(df_H)):
a = df_H['Decide'].iloc[i]
if 0 <= a <=1 :
b = df_H['OneAndZero'].iloc[i]
df_H.loc[i,'Result']= b
if 1.1 <= a <= 2:
b = df_H['OneAndTwo'].iloc[i]
df_H.loc[i,'Result']= b
maybe you can try this way.
df_apply
if you want to use apply..
create the function that have the condition, and the output,
then used this code:
df_H['Result'] = df_H.apply(your function name)
Given a pandas dataset with columns a, b and c, I have the following requirement:
calculate m = mean of c in the entire dataset
For each record in the dataset, if (a>10 and b<5) c = m
Is it possible to do this with a single pandas command, or I need to loop each record and ask the condition?
I think it is very much possible using boolean masks
This should work-
m = df.c.mean()
df.c[(df.a > 10) & (df.b < 5)] = m
You would better use loc when trying to update a slice of a dataframe, to avoid the common error:
m = d['c'].mean()
df.loc[(df['a'] > 10) & (df['b'] < 5), 'c'] = m
One line
df.loc[df['a'].gt(10) & df['b'].lt(5), 'c'] = df['c'].mean()
I am trying to put this logic on pandas dataframe
IF base_total_price > 0
IF base_total_discount = 0
actual_price = base_total_price
IF base_total_discount > 0
actual_price = base_total_price +base_total_discount
IF base_total_price = 0
IF base_total_discount > 0
actual_price = base_total_discount
IF base_total_discount = 0
actual_price = 0
so I wrote these 2 apply functions
#for all entries where base_total_price > 0
df_slice_1['actual_price'] = df_slice_1['base_total_discount'].apply(lambda x: df_slice_1['base_total_price'] if x == 0 else df_slice_1['base_total_price']+df_slice_1['base_total_discount'])
#for all entries where base_total_price = 0
df_slice_1['actual_price'] = df_slice_1['base_total_discount'].apply(lambda x: x if x == 0 else df_slice_1['base_total_discount'])
When i run the code I get this error
ValueError: Wrong number of items passed 20, placement implies 1
I know that it is trying to put more values in one column but I do not understand why is this happening or how can I solve this problem. All I need to do is to update the dataframe with the new column `actual_price` and I need to calculate the values for this column according to the above mentioned logic. Please suggest me a better way of implementing the logic or correct me
Sample data would have been useful. Please try use np.select(condtions, choices)
Conditions=[(df.base_total_price > 0)&(df.base_total_discount == 0),(df.base_total_price > 0)&(df.base_total_discount > 0),\
(df.base_total_price == 0)&(df.base_total_discount > 0),\
(df.base_total_price == 0)&(df.base_total_discount == 0)]
choices=[df.base_total_price,df.base_total_price.add(df.base_total_discount),df.base_total_discount,0]
df.actual_price =np.select(Conditions,choices)
I solved this question simply by using iterrows. Thanks everyone who responded
import pandas as pd
import os
os.chdir(r"C:\Users\Administrator\Desktop")
data = pd.read_csv("nifty.csv")
for i in range(9,len(data['SMA(10)'])):
if (data['SMA(10)'][i] < data['Open(-1)'][i] and
data['SMA(10)'][i] > data['Open'][i]):
data['Trans'][i] = "Sell"
elif(data['SMA(10)'][i] > data['Open(-1)'][i] and
data['SMA(10)'][i] < data['Open'][i]):
data['Trans'][i] = "Buy"
else:
data['Trans'][i] = "Hold"
print("-----------------------------------------------------------------------")
print(data.head(50))
I dont want two buy/sell values together, instead I want hold value
That's not how you use Pandas. Use a vectorized solution. -1 for sell, 1 for buy and 0 for hold. Faster than using strings but you can replace them if you choose so.
data['Trans'] = np.select([((data['SMA(10)'] < data['Open(-1)']) & (data['SMA(10)'] > data['Open(-1)'])),
((data['SMA(10)'] > data['Open(-1)']) & (data['SMA(10)'] < data['Open(-1)']))],
(-1, 1), 0)
How do you replace a value in a dataframe for a cell based on a conditional for the entire data frame not just a column. I have tried to use df.where but this doesn't work as planned
df = df.where(operator.and_(df > (-1 * .2), df < 0),0)
df = df.where(df > 0 , df * 1.2)
Basically what Im trying to do here is replace all values between -.2 and 0 to zero across all columns in my dataframe and all values greater than zero I want to multiply by 1.2
You've misunderstood the way pandas.where works, which keeps the values of the original object if condition is true, and replace otherwise, you can try to reverse your logic:
df = df.where((df <= (-1 * .2)) | (df >= 0), 0)
df = df.where(df <= 0 , df * 1.2)
where allows you to have a one-line solution, which is great. I prefer to use a mask like so.
idx = (df < 0) & (df >= -0.2)
df[idx] = 0
I prefer breaking this into two lines because, using this method, it is easier to read. You could force this onto a single line as well.
df[(df < 0) & (df >= -0.2)] = 0
Just another option.