Dataframe column status - python

In the status column,
I want to set status to 1 if diff is less than 0 or greater than 1.

You can use np.where to choose 1 or '' depending on the condition.
Use this:
import numpy as np
df_small["status"] = np.where((df_small["diff"] < 0) | (df_small["diff"] > 1), 1, '')

You can use np.where or, if you prefer, you can simply apply a lambda function like this:
df['status'] = df['diff'].apply(lambda val: 1 if val < 0 or val > 1 else np.nan)
As the default value you can use np.nan or any other value you like.
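As a minimal sketch of the two answers above, the snippet below builds the condition once and uses np.nan as the fallback so the column stays numeric. The sample values are hypothetical; only the column names diff and status come from the question. It also shows that the same condition can be written with Series.between, which is inclusive at both ends by default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name "diff" comes from the question.
df_small = pd.DataFrame({"diff": [-0.5, 0.0, 0.4, 1.0, 1.7]})

# True where diff lies outside the closed interval [0, 1].
outside = (df_small["diff"] < 0) | (df_small["diff"] > 1)

# The same condition expressed with Series.between (inclusive on both ends).
outside_alt = ~df_small["diff"].between(0, 1)

# np.nan as the fallback keeps the column numeric (float64).
df_small["status"] = np.where(outside, 1, np.nan)
print(df_small)
```

If an integer column is preferred, a fallback of 0 instead of np.nan would avoid the float conversion.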

Related

How to use df.apply to switch between columns?

Consider the following code.
import numpy as np
import pandas as pd

np.random.seed(0)
df_H = pd.DataFrame({'L0': np.random.randn(100),
                     'OneAndZero': np.random.randn(100),
                     'OneAndTwo': np.random.randn(100),
                     'GTwo': np.random.randn(100),
                     'Decide': np.random.randn(100)})
I would like to create a new column named Result, which depends on the value of the column Decide. If the value in Decide is less than 0, Result should take the value from L0 in the same row. If the value in Decide is between 0 and 1, it should take the value from OneAndZero; between 1 and 2, from OneAndTwo; and if the value of Decide is greater than 2, from GTwo.
How would one do this with df.apply? I have only seen examples with fixed values, not values from other columns.
Just because it is Good Friday, we can try the following. Else it is a commonly asked question.
c1=df_H['Decide'].le(0)
c2=df_H['Decide'].between(0,1)
c3=df_H['Decide'].between(1,2)
c4=df_H['Decide'].gt(2)
cond=[c1,c2,c3,c4]
choices=[df_H['L0'],df_H['OneAndZero'],df_H['OneAndTwo'],df_H['GTwo']]
df_H['Result']=np.select(cond,choices)
df_H
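To make the behaviour of np.select easier to check by eye, here is a small sketch on a hand-picked frame instead of random data. The frame df_demo and its values are hypothetical; the column names follow the question. It illustrates that the overlapping boundaries of between do no harm, because np.select takes the first condition that is True.

```python
import numpy as np
import pandas as pd

# Hypothetical values chosen so that each branch is hit exactly once.
df_demo = pd.DataFrame({
    "L0": [10, 11, 12, 13],
    "OneAndZero": [20, 21, 22, 23],
    "OneAndTwo": [30, 31, 32, 33],
    "GTwo": [40, 41, 42, 43],
    "Decide": [-0.5, 1.0, 1.5, 2.5],
})

cond = [
    df_demo["Decide"].le(0),
    df_demo["Decide"].between(0, 1),  # inclusive at both ends by default
    df_demo["Decide"].between(1, 2),  # also matches 1.0, but is checked later
    df_demo["Decide"].gt(2),
]
choices = [df_demo[c] for c in ["L0", "OneAndZero", "OneAndTwo", "GTwo"]]

# Conditions are evaluated in order; rows matching none would get the default.
df_demo["Result"] = np.select(cond, choices, default=np.nan)
print(df_demo[["Decide", "Result"]])
```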
If you really want to use apply
def choose_res(x):
    if x['Decide'] <= 0:
        return x['L0']
    if 0 < x['Decide'] <= 1:
        return x['OneAndZero']
    if 1 < x['Decide'] <= 2:
        return x['OneAndTwo']
    if x['Decide'] > 2:
        return x['GTwo']

df_H['Result'] = df_H.apply(axis=1, func=choose_res, result_type='expand')
df.iloc
df_H.reset_index(drop=True, inplace=True)
for i in range(len(df_H)):
    a = df_H['Decide'].iloc[i]
    if 0 <= a <= 1:
        b = df_H['OneAndZero'].iloc[i]
        df_H.loc[i, 'Result'] = b
    if 1 < a <= 2:
        b = df_H['OneAndTwo'].iloc[i]
        df_H.loc[i, 'Result'] = b
Maybe you can try it this way. The remaining two cases (Decide below 0 and above 2) follow the same pattern.
df_apply
If you want to use apply,
create a function that contains the conditions and returns the output,
then use this code (axis=1 is needed so that the function receives one row at a time):
df_H['Result'] = df_H.apply(your_function_name, axis=1)
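A minimal sketch of what "a function that has the condition and the output" could look like. The function name pick and the two-branch logic are hypothetical simplifications of the question. With axis=1, each call receives one row as a Series indexed by column name.

```python
import pandas as pd

# Hypothetical two-row frame; column names follow the question.
df_demo = pd.DataFrame({"L0": [10, 11], "GTwo": [40, 41], "Decide": [-1.0, 3.0]})

def pick(row):
    # `row` is a Series indexed by column name because axis=1 is used below.
    return row["L0"] if row["Decide"] <= 0 else row["GTwo"]

# axis=1 applies the function to each row rather than to each column.
df_demo["Result"] = df_demo.apply(pick, axis=1)
print(df_demo)
```

Row-wise apply is usually slower than a vectorized approach such as np.select, so it is mainly worth using when the per-row logic is hard to express with boolean masks.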

Pandas apply function does not assign values to the column

I am trying to put this logic on pandas dataframe
IF base_total_price > 0
    IF base_total_discount = 0
        actual_price = base_total_price
    IF base_total_discount > 0
        actual_price = base_total_price + base_total_discount
IF base_total_price = 0
    IF base_total_discount > 0
        actual_price = base_total_discount
    IF base_total_discount = 0
        actual_price = 0
so I wrote these 2 apply functions
#for all entries where base_total_price > 0
df_slice_1['actual_price'] = df_slice_1['base_total_discount'].apply(lambda x: df_slice_1['base_total_price'] if x == 0 else df_slice_1['base_total_price']+df_slice_1['base_total_discount'])
#for all entries where base_total_price = 0
df_slice_1['actual_price'] = df_slice_1['base_total_discount'].apply(lambda x: x if x == 0 else df_slice_1['base_total_discount'])
When I run the code, I get this error:
ValueError: Wrong number of items passed 20, placement implies 1
I understand that it is trying to put more than one value into a single column, but I do not understand why this happens or how to solve it. All I need is to add a new column `actual_price` to the DataFrame and calculate its values according to the logic above. Please suggest a better way of implementing the logic, or correct my approach.
Sample data would have been useful. Please try using np.select(conditions, choices):
conditions = [
    (df.base_total_price > 0) & (df.base_total_discount == 0),
    (df.base_total_price > 0) & (df.base_total_discount > 0),
    (df.base_total_price == 0) & (df.base_total_discount > 0),
    (df.base_total_price == 0) & (df.base_total_discount == 0),
]
choices = [df.base_total_price, df.base_total_price.add(df.base_total_discount), df.base_total_discount, 0]
# Use bracket assignment: attribute assignment does not create a new column.
df['actual_price'] = np.select(conditions, choices)
I solved this question simply by using iterrows. Thanks everyone who responded
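For readers who would rather avoid iterrows, here is a small runnable sketch of the np.select approach on made-up data. The sample values are hypothetical; the column names come from the question. The error in the question most likely arises because each lambda returns a whole column (a Series) for every single element, so pandas ends up with many values per cell. Working on whole columns at once sidesteps that.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows covering all four cases from the question.
df = pd.DataFrame({
    "base_total_price": [100.0, 100.0, 0.0, 0.0],
    "base_total_discount": [0.0, 15.0, 15.0, 0.0],
})
price = df["base_total_price"]
discount = df["base_total_discount"]

conditions = [
    (price > 0) & (discount == 0),
    (price > 0) & (discount > 0),
    (price == 0) & (discount > 0),
]
choices = [price, price + discount, discount]

# Rows matching none of the conditions (price == 0 and discount == 0) get 0.
df["actual_price"] = np.select(conditions, choices, default=0)
print(df)
```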

How do I label data based on the values of the previous row?

I want to label the data "1" if the current value is higher than that of the previous row and "0" otherwise.
Let's say I have this DataFrame:
df = pd.DataFrame({'date': [1,2,3,4,5], 'price': [50.125, 45.25, 65.857, 100.956, 77.4152]})
and I want the output as if the DataFrame is constructed like this:
df = pd.DataFrame({'date': [1,2,3,4,5], 'price': [50.125, 45.25, 65.857, 100.956, 77.4152], 'label':[0, 0, 1, 1, 0]})
*I don't know how to post a DataFrame
This code is my attempt:
df['label'] = 0
i = 0
for price in df['price']:
    i = i+1
    if price in i > price: #---> right now I am stuck here. It says argument of type 'int' is not iterable
        df.append['label', 1]
    elif price in i <= price:
        df.append['label', 0]
I think there are also other logical mistakes in my code. What am I doing wrong?
Create a boolean mask with Series.ge (>=) and Series.shift, then convert True/False to 1/0 with Series.view (note that Series.view is deprecated in recent pandas releases, so the astype variant below is the safer choice):
df['label'] = df['price'].ge(df['price'].shift()).view('i1')
Or by Series.astype:
df['label'] = df['price'].ge(df['price'].shift()).astype(int)
IIUC, you can use np.where with a shifted comparison to check whether each row's price is greater than or equal to the price in the row above.
df['label'] = np.where(df['price'].ge(df['price'].shift()),1,0)
print(df)
   date     price  label
0     1   50.1250      0
1     2   45.2500      0
2     3   65.8570      1
3     4  100.9560      1
4     5   77.4152      0
Explanation:
print(df['price'].ge(df['price'].shift()))
returns a boolean Series that we can use as the condition in np.where:
0    False
1    False
2     True
3     True
4    False
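The intermediate steps can be spelled out as a short sketch. The variable names previous and is_higher are only illustrative. This version uses gt (strictly greater) to match the wording "higher than" in the question, whereas the answers above use ge; the two differ only when consecutive prices are equal.

```python
import pandas as pd

df = pd.DataFrame({
    "date": [1, 2, 3, 4, 5],
    "price": [50.125, 45.25, 65.857, 100.956, 77.4152],
})

# shift() moves every value down one row, so the first entry becomes NaN.
previous = df["price"].shift()

# Comparisons against NaN evaluate to False, so the first row gets label 0.
is_higher = df["price"].gt(previous)

df["label"] = is_higher.astype(int)
print(df)
```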
To explain what is happening in your code:
Assigning 0 to df['label'] fills the whole column with zeros up front. If you want to fill the labels in a loop, collect them in a separate Python list and assign that list to df['label'] once the loop has finished. The first row has no previous row, so the list can start as [0].
i is just a counter (1, 2, 3...) and not the value of the price at a specific index (50.125, 45.25, 65.857...), so it is not what you want to compare.
price in checks whether the value of price exists in the collection that follows. Here the right-hand side is the integer i, which is not a collection, so you get the error. Instead, you want to get the price at a specific index and compare it with the price at the previous index.
The append method is called with () and not [].
If you want to continue along your method of using a loop, the following can work:
labels = [0]  # the first row has no previous row to compare against
for i in range(1, len(df['price'])):
    if df['price'][i] > df['price'][i - 1]:
        labels.append(1)
    else:
        labels.append(0)
df['label'] = labels
This loops over positions 1 to len - 1 of the price column and compares the price at position i with the price at position i - 1.
You can also simplify the if/else statement with a conditional expression:
labels.append(1 if df['price'][i] > df['price'][i - 1] else 0)
Working fiddle: https://repl.it/repls/HoarseImpressiveScientists

Python Groupby with Boolean Mask

I have a pandas dataframe with the following general format:
id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
The desired result would be the following:
id,atr1,atr2,orig_date,fix_date,failed_part_ind
1,bolt,l,2000-01-01,nan,0
1,screw,l,2000-01-01,nan,0
1,stem,l,2000-01-01,nan,0
2,stem,l,2000-01-01,nan,1
2,screw,l,2000-01-01,nan,0
2,stem,l,2001-01-01,2001-01-01,0
3,bolt,r,2000-01-01,nan,1
3,stem,r,2000-01-01,nan,1
3,bolt,r,2001-01-01,2001-01-01,0
3,stem,r,2001-01-01,2001-01-01,0
Any tips or tricks most welcome!
Update2:
A better way to describe what I need to accomplish is that in a .groupby(['id','atr1','atr2']) to create a new indicator column where the following criteria are met for records within the groups:
(df['orig_date'] < df['fix_date'])
I think this should work:
df['failed_part_ind'] = df.apply(lambda row: 1 if ((row['id'] == row['id']) &
                                                   (row['atr1'] == row['atr1']) &
                                                   (row['atr2'] == row['atr2']) &
                                                   (row['orig_date'] < row['fix_date']))
                                 else 0, axis=1)
Update: I think this is what you want:
import pandas as pd

def f(g):
    min_fix_date = g['fix_date'].min()
    # pd.isna handles both NaN and NaT, so it also works for datetime columns
    if pd.isna(min_fix_date):
        g['failed_part_ind'] = 0
    else:
        g['failed_part_ind'] = g['orig_date'].apply(lambda d: 1 if d < min_fix_date else 0)
    return g

df.groupby(['id', 'atr1', 'atr2']).apply(f)
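As an alternative sketch, the same indicator can be computed without a Python-level function by broadcasting each group's earliest fix_date back onto its rows with transform. This assumes the two date columns can be parsed with pd.to_datetime and that "failed" means orig_date is earlier than the earliest fix_date in the group, as in the answer above. The names csv_text, keys and earliest_fix are only illustrative.

```python
import io
import pandas as pd

csv_text = """id,atr1,atr2,orig_date,fix_date
1,bolt,l,2000-01-01,nan
1,screw,l,2000-01-01,nan
1,stem,l,2000-01-01,nan
2,stem,l,2000-01-01,nan
2,screw,l,2000-01-01,nan
2,stem,l,2001-01-01,2001-01-01
3,bolt,r,2000-01-01,nan
3,stem,r,2000-01-01,nan
3,bolt,r,2001-01-01,2001-01-01
3,stem,r,2001-01-01,2001-01-01
"""
df = pd.read_csv(io.StringIO(csv_text))
df["orig_date"] = pd.to_datetime(df["orig_date"])
df["fix_date"] = pd.to_datetime(df["fix_date"])  # missing values become NaT

keys = ["id", "atr1", "atr2"]
# Earliest fix_date per group, aligned to the original rows (NaT if none).
earliest_fix = df.groupby(keys)["fix_date"].transform("min")

# Comparisons against NaT are False, so groups without a fix get 0.
df["failed_part_ind"] = (df["orig_date"] < earliest_fix).astype(int)
print(df)
```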

Replacing value based on conditional pandas

How do you replace values in a DataFrame based on a condition applied to the entire DataFrame, not just one column? I have tried to use df.where, but it doesn't work as planned:
df = df.where(operator.and_(df > (-1 * .2), df < 0),0)
df = df.where(df > 0 , df * 1.2)
Basically, what I'm trying to do here is replace all values between -0.2 and 0 with zero across all columns in my DataFrame, and multiply all values greater than zero by 1.2.
You've misunderstood how pandas where works: it keeps the original values where the condition is True and replaces them where it is False. You can reverse your logic:
df = df.where((df <= (-1 * .2)) | (df >= 0), 0)
df = df.where(df <= 0 , df * 1.2)
where allows you to have a one-line solution, which is great. I prefer to use a mask like so.
idx = (df < 0) & (df >= -0.2)
df[idx] = 0
I prefer breaking this into two lines because, using this method, it is easier to read. You could force this onto a single line as well.
df[(df < 0) & (df >= -0.2)] = 0
Just another option.
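To illustrate the difference, here is a small sketch on made-up numbers that runs the reversed where logic next to DataFrame.mask, which replaces values where the condition is True and therefore lets the conditions be written the way the question states them. The sample frame is hypothetical.

```python
import pandas as pd

# Hypothetical sample values covering each case, including the -0.2 boundary.
df = pd.DataFrame({"a": [-0.5, -0.1, 0.0, 2.0], "b": [-0.2, -0.05, 1.0, -3.0]})

# where keeps values where the condition is True and replaces the rest.
via_where = df.where((df <= -0.2) | (df >= 0), 0)
via_where = via_where.where(via_where <= 0, via_where * 1.2)

# mask is the inverse: it replaces values where the condition is True.
via_mask = df.mask((df > -0.2) & (df < 0), 0)
via_mask = via_mask.mask(via_mask > 0, via_mask * 1.2)

print(via_mask)
```

Both variants should give the same frame; which one reads better depends on whether the condition is easier to state as "keep" or as "replace".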
