I'm trying to create a new column from a calculation on two existing columns. When I only need one column I use .apply(), but with two parameters I don't know how to do it.
With one column I do the following:
from pandas import read_csv, DataFrame
df = read_csv('results.csv')

def myFunc(x):
    x = x + 5
    return x

df['new'] = df['colA'].apply(myFunc)
df.head()
With two columns I thought it would work like the following, but it doesn't:
from pandas import read_csv, DataFrame
df = read_csv('results.csv')

def myFunc(x, y):
    x = x + y
    return x

df['new'] = df[['colA','colB']].apply(myFunc)
df.head()
I've seen some people use lambda, but I don't understand it, and I think there has to be an easier way.
Thank you very much!
Disclaimer: avoid apply if possible. With that in mind, you are looking for axis=1, but you need to rewrite the function like:
df['new'] = df.apply(lambda x: myFunc(x['colA'], x['colB']), axis=1)
which is essentially equivalent to:
df['new'] = [myFunc(x,y) for x,y in zip(df['colA'], df['colB'])]
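As a sanity check, a tiny self-contained example (toy data standing in for results.csv, with made-up column names) confirms both approaches agree:

```python
import pandas as pd

def myFunc(x, y):
    return x + y

# toy frame standing in for results.csv
df = pd.DataFrame({'colA': [1, 2, 3], 'colB': [10, 20, 30]})

via_apply = df.apply(lambda r: myFunc(r['colA'], r['colB']), axis=1)
via_zip = [myFunc(x, y) for x, y in zip(df['colA'], df['colB'])]

print(via_apply.tolist())  # [11, 22, 33]
print(via_zip)             # [11, 22, 33]
```

The zip version is usually faster because it avoids per-row Series construction.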
You can use axis=1 and access the columns inside the function, like below:
def myFunc(x):
    return x['colA'] + x['colB']
and you apply it as
df['new'] = df.apply(myFunc, axis=1)
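A minimal runnable sketch of this pattern (toy data, assumed column names):

```python
import pandas as pd

def myFunc(row):
    # row is a Series holding one row; index into it by column name
    return row['colA'] + row['colB']

df = pd.DataFrame({'colA': [1, 2], 'colB': [3, 4]})
df['new'] = df.apply(myFunc, axis=1)
print(df['new'].tolist())  # [4, 6]
```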
You can learn about lambda functions here:
A lambda function is an expression.
https://realpython.com/python-lambda/
The special syntax *args in function definitions in Python is used to pass a variable number of arguments to a function:
https://www.geeksforgeeks.org/args-kwargs-python/
from pandas import read_csv, DataFrame
df = read_csv('results.csv')

def myFunc(x, y):
    return x + y

df['new'] = df[['colA','colB']].apply(lambda col: myFunc(*col), axis=1)
df.head()
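To see the unpacking in action on made-up data (column names assumed):

```python
import pandas as pd

def myFunc(x, y):
    return x + y

df = pd.DataFrame({'colA': [1, 2], 'colB': [10, 20]})
# each row of df[['colA','colB']] arrives as a length-2 Series;
# *row unpacks it into the two positional arguments of myFunc
df['new'] = df[['colA', 'colB']].apply(lambda row: myFunc(*row), axis=1)
print(df['new'].tolist())  # [11, 22]
```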
Related
I'm working on a project where I would like to use two lambda functions to find a match in another column. I created a dummy df with the following code:
df = pd.DataFrame(np.random.randint(0,10,size=(100, 4)), columns=list('ABCD'))
Now I would like to find column A matches in column B.
df['match'] = df.apply(lambda x: x['B'].find(x['A']), axis=1).ge(0)
Now I would like to add an extra check where I'm also checking if column C values appear in column D:
df['match'] = df.apply(lambda x: x['D'].find(x['C']), axis=1).ge(0)
I'm searching for a solution that combines these two lines of code into a one-liner, for example with the '&' operator. I hope this is clear.
You can use the and operator inside the lambda instead:
df['match'] = df.apply(lambda x: (x['B'] == x['A']) and (x['D'] == x['C']), axis=1)
Note that .ge(0) is not needed here: the lambda already returns a boolean, and since False >= 0 is True, tacking .ge(0) on would make every value True.
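If both checks are plain equality, as in the answer above, you can skip apply entirely and combine the vectorized comparisons with &, which is what the question asked for:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 10, size=(100, 4)), columns=list('ABCD'))
# element-wise boolean Series, combined with the & operator
df['match'] = (df['B'] == df['A']) & (df['D'] == df['C'])
print(df['match'].dtype)  # bool
```

This runs in vectorized NumPy code rather than a Python-level loop over rows.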
I am trying to drop a column from a pandas dataframe as follows:
df = pd.read_csv('Caravan_Dataset.csv')
X = df.drop('Purchase',axis=1)
y = df['Purchase']
but it does not work.
I also tried the following one:
df = pd.read_csv('Caravan_Dataset.csv')
X = df.drop('Purchase',axis=1,inplace=True)
y = df['Purchase']
but it does not work either. It keeps giving an error as if Purchase were still in the columns. Any idea how I can do it?
When inplace=True, the data is modified in place: the call returns nothing (None) and the dataframe itself is updated. When inplace=False, you need to assign the result to something new.
Change your code from:
X = df.drop('Purchase',axis=1,inplace=True)
To this:
df.drop('Purchase',axis=1,inplace=True)
Or, alternatively, use inplace=False (the default), which returns a copy of the object:
X = df.drop('Purchase',axis=1)
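A quick sketch of both variants on a toy frame (the 'Purchase' column is assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({'Purchase': [0, 1], 'Age': [30, 40]})

# variant 1: df stays intact, X is a copy without the column
X = df.drop('Purchase', axis=1)
y = df['Purchase']
print(X.columns.tolist())    # ['Age']

# variant 2: modify the frame itself; note drop(...) returns None here
df2 = df.copy()
df2.drop('Purchase', axis=1, inplace=True)
print(df2.columns.tolist())  # ['Age']
```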
You have to assign it back to df, like df = df.drop('column_name', axis=1)
I was wondering whether the map method is the best option when a simple mapping is needed on a column, since using map or apply is usually a bad idea.
I compared the following functions for the simple case below. Please share if you have better alternatives.
# Case - Map the random number to its string
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(1,7,size=(5000,1)), columns=['A'])
dikt = {1:'1',2:'2',3:'3',4:'4',5:'5',6:'6'}
First function - using map method:
def f1():
    df1 = df.copy()
    df1['B'] = df['A'].map(dikt)
    return df1
Results:
Second function - using to_list method in column:
def f2():
    df2 = df.copy()
    column_list = df2['A'].tolist()
    df2['B'] = [dikt[i] for i in column_list]
    return df2
Results:
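The original benchmark numbers aren't shown here, but a rough timing sketch with timeit (exact results vary by machine) would look like this:

```python
import timeit
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(1, 7, size=(5000, 1)), columns=['A'])
dikt = {1: '1', 2: '2', 3: '3', 4: '4', 5: '5', 6: '6'}

def f1():
    df1 = df.copy()
    df1['B'] = df['A'].map(dikt)
    return df1

def f2():
    df2 = df.copy()
    df2['B'] = [dikt[i] for i in df2['A'].tolist()]
    return df2

# both produce the same column; compare wall-clock time over 100 runs
print('map:      ', timeit.timeit(f1, number=100))
print('list comp:', timeit.timeit(f2, number=100))
```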
I have this code which works for one pandas series. How to apply it to all columns of my large dataset? I have tried many solutions, but none works for me.
c = data["High_banks"]
c2 = pd.to_numeric(c.str.replace(',',''))
data = data.assign(High_banks = c2)
What is the best way to do this?
I think you can do it like this:
df = df.replace(",", "", regex=True)
After that you can convert the datatype.
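A sketch of that two-step approach on made-up data (column names assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({'High_banks': ['1,200', '3,400'],
                   'Low_banks': ['5,600', '7,800']})

# strip the thousands separators from every column at once
df = df.replace(',', '', regex=True)
# then convert each column's dtype
df = df.apply(pd.to_numeric)
print(df.dtypes.tolist())  # [dtype('int64'), dtype('int64')]
```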
You can use a combination of the methods apply and applymap.
Take this for an example:
df = pd.DataFrame([['1,', '2,12'], ['3,356', '4,567']], columns = ['a','b'])
new_df = (df.applymap(lambda x: x.replace(',',''))
.apply(pd.to_numeric, axis = 1))
new_df.dtypes
>> #successfully converted to numeric types
a int64
b int64
dtype: object
The first method, applymap, runs element-wise on the dataframe to remove the commas; then apply applies the pd.to_numeric function across the column axis of the dataframe.
Could anyone suggest a way answer the same question (see link) but by using lambda function:
Update a dataframe in pandas while iterating row by row
You'll want to use apply with the parameter axis=1 to ensure the function passed to apply is applied to each row.
The referenced question has an answer that uses this loop.
for i, row in df.iterrows():
    if <something>:
        row['ifor'] = x
    else:
        row['ifor'] = y
    df.ix[i]['ifor'] = x
To use a lambda with the same logic
df['ifor'] = df.apply(lambda row: x if <something> else y, axis=1)
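Concretely, with a made-up condition (label rows where column 'a' exceeds 1):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3]})
x, y = 'big', 'small'
# the lambda plays the role of the if/else from the loop above
df['ifor'] = df.apply(lambda row: x if row['a'] > 1 else y, axis=1)
print(df['ifor'].tolist())  # ['small', 'small', 'big', 'big']
```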