Using apply() when the function expects the dataframe row (Python)

My function is returning a value based on the dataframe row
def function(df, row):
    if df['A'][row]!='value':
        return np.nan
    else:
        return df['B'][row]
I need to call it using apply. However, I don't know how to pass the row into the function. I know there are more efficient ways of doing this, but the instructions are pretty clear (yes, this is a class assignment).
I have tried multiple variations of apply() and lambda. Nothing is working. The latest attempt is this, but it's expecting a row argument.
df.apply(hourly_wage, axis=1)
Any suggestions?

Did you try df.apply(lambda row: function(row), axis=1)?
You will also have to change your function so that it takes the row itself instead of the dataframe and a row label:
def function(row):
    if row['A']!='value':
        return np.nan
    else:
        return row['B']
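
For reference, here is a minimal runnable sketch of the same idea. The sample data and the name pick_b are made up for illustration; only the column names 'A' and 'B' come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names 'A' and 'B' follow the question.
df = pd.DataFrame({"A": ["value", "other", "value"], "B": [10.0, 20.0, 30.0]})

def pick_b(row):
    """Return row['B'] when row['A'] equals 'value', otherwise NaN."""
    if row["A"] != "value":
        return np.nan
    return row["B"]

# With axis=1, apply passes each row to pick_b as a Series indexed by column name.
result = df.apply(pick_b, axis=1)
```

Since the function takes exactly one argument, it can be passed to apply directly and the lambda wrapper is optional.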

Related

How to get the full dataframe using lambda function in python?

I have some loop logic using iterrows, but the performance is bad:
result = []
for index, row in df_test.iterrows():
    result.append(product_recommendation_model.predict(df_test.iloc[[index]]))
submission = pd.DataFrame({'ID': df_test['ID'],
                           'Result': result
                           })
display(submission)
I would like to rewrite it using apply with a lambda, but I have no idea how to get the full dataframe.
a = df_test.apply(lambda x: product_recommendation_model.predict(df_test.iloc[[x]]) ,axis=1)
Can anyone help me please? Thanks.
I think this works for you:
df_new = df_test.apply(lambda row: pd.Series([row['ID'], product_recommendation_model.predict(row)]), axis=1)
df_new.columns = ['ID', 'Result']
Note: row holds all the column values of one row. If you want to pass only a single column's value to predict, you can pass row[column_name] instead.
In the end, I got it to run with the code below.
df_test.apply(lambda i: product_recommendation_model.predict(i.to_frame().T), axis=1)
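
A small sketch of why to_frame().T helps: it turns the row Series back into a one-row DataFrame, which is the shape a predict method usually expects. DummyModel and the 'feature' column are hypothetical stand-ins, since the real product_recommendation_model is not shown in the question.

```python
import pandas as pd

class DummyModel:
    """Hypothetical stand-in for product_recommendation_model."""

    def predict(self, frame):
        # Expects a 2-D DataFrame and returns one prediction per row.
        return (frame["feature"].astype(float) * 2).tolist()

model = DummyModel()
df_test = pd.DataFrame({"ID": [1, 2, 3], "feature": [0.5, 1.0, 1.5]})

# Row by row: convert each row Series back into a one-row DataFrame.
per_row = df_test.apply(lambda row: model.predict(row.to_frame().T)[0], axis=1)

# If the model accepts many rows at once, a single call avoids the per-row loop.
batched = model.predict(df_test)

submission = pd.DataFrame({"ID": df_test["ID"], "Result": batched})
```

Whether the batched call is possible depends on the actual model. If it is, it is likely to be much faster than either iterrows or apply.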

Applying function to dataframe column?

I have the following function (one-hot encoding function that takes a column as an input). I basically want to apply it to a column in my dataframe, but can't seem to understand what's going wrong.
def dummies(dataframe, col):
    dataframe[col] = pd.Categorical(dataframe[col])
    pd.concat([dataframe,pd.get_dummies(dataframe[col],prefix = 'c')],axis=1)
df1 = df['X'].apply(dummies)
Guessing something is wrong with how I'm calling it?
You need to make sure you're returning a value from the function; currently you are not. Also, when you apply a function to a column, each value in the column is passed into the function one at a time, so your function is set up for the wrong kind of input. Typically you'd do it like this:
def function1(value):
    new_value = value*2 #some operation
    return new_value
then:
df['X'].apply(function1)
Currently your function is set up to take the entire dataframe and the name of a column, so it will likely work if you call it like this:
df1 = dummies(df, 'X')
but you still need to add a return statement.
If you want to apply it to that one column, you don't need to make a new dataframe; assign the result back to the column. Note that this only works once dummies has been rewritten to take a single value and return the result:
df['X'] = df['X'].apply(lambda x : dummies(x))
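
As a rough sketch of the "call it directly and add a return" suggestion, the version below returns the concatenated result instead of discarding it. The name add_dummies and the sample data are made up for illustration.

```python
import pandas as pd

def add_dummies(dataframe, col, prefix="c"):
    """Return a new dataframe with one-hot columns for `col` appended."""
    encoded = pd.get_dummies(dataframe[col], prefix=prefix)
    # pd.concat builds a new dataframe; the input is left unchanged.
    return pd.concat([dataframe, encoded], axis=1)

# Hypothetical sample data with a categorical column 'X'.
df = pd.DataFrame({"X": ["a", "b", "a"], "Y": [1, 2, 3]})
df1 = add_dummies(df, "X")
```

No apply is needed here, because pd.get_dummies already works on the whole column at once.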

Change several dataframes in a function in Python

I would like to solve the problem below.
I have the code below. I need to pass in several dataframes and apply the change to all of them at once:
def reverse_df(*df):
    for x in df:
        x=x.loc[::-1].reset_index(level=0, drop=True)
    return
reverse_df(df1,df2,df3,df4,df5)
I am able to change a dataframe inside a function only when I use inplace=True, as below:
def remove_na(*df):
    for x in df:
        x.dropna(axis=0, how='all',inplace=True)
    return
remove_na(df1,df2,df3,df4,df5)
but the version below doesn't work:
def remove_na(*df):
    for x in df:
        x=x.dropna(axis=0, how='all')
    return
remove_na(df1,df2,df3,df4,df5)
What am I doing wrong?
Short answer: x = x.dropna(axis=0, how='all') inside a function only rebinds the local name x to the new dataframe returned by dropna. The dataframe the caller passed in is left untouched, and the new one is discarded when the function returns.
To solve the particular case of reversing the dataframe you can do the following (this assumes the index is unnamed, so that reset_index creates a column called 'index'):
def reverse(df):
    df.reset_index(drop=False, inplace=True)
    df.sort_index(ascending=False, inplace=True)
    df.set_index('index', drop=True, inplace=True)
However, since inplace operations in pandas often still create a new object internally and bring little real benefit, you're probably better off returning a modified dataframe.
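
A minimal sketch of the "return a modified dataframe" approach, assuming the caller is free to rebind its own variables. The name reverse_all and the sample frames are illustrative only.

```python
import pandas as pd

def reverse_all(*frames):
    """Return new, row-reversed copies of the given dataframes."""
    return [f.iloc[::-1].reset_index(drop=True) for f in frames]

# Hypothetical sample dataframes.
df1 = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"a": [4, 5]})

# Rebind the caller's names to the returned dataframes.
df1, df2 = reverse_all(df1, df2)
```

The assignment on the last line is what makes the change visible to the caller; assigning to x inside the function cannot do that.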

How to use Apply() and self defined function to change data in DataFrame?

What is the easiest way to make some changes in the index column of different rows in a DataFrame?
def fn(country):
    if any(char.isdigit() for char in country):
        return country[:-2]
    else:
        return country
df.loc["Country"].apply(fn,axis=1)
I can't test this right now, but can you try df['Country'] = df.apply(lambda row: fn(row), axis=1) and change your function so that it takes the row as its argument (using row['Country'] inside it)? This way you can manipulate anything you want row by row, using the values of other columns as well.
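
If only the Country values themselves are needed, a simpler sketch is to apply fn to that single column, or to map it over the index when Country is the index. The sample data below is made up, and whether Country is a column or the index in the original dataframe is an assumption either way.

```python
import pandas as pd

def fn(country):
    # Same logic as in the question: drop the last two characters
    # when the name contains a digit.
    if any(char.isdigit() for char in country):
        return country[:-2]
    return country

# Case 1 (assumed): Country is a regular column.
df = pd.DataFrame({"Country": ["France12", "Spain", "Italy 9"]})
df["Country"] = df["Country"].apply(fn)

# Case 2 (assumed): the country names are the index.
indexed = pd.DataFrame({"Pop": [1, 2]}, index=["France12", "Spain"])
indexed.index = indexed.index.map(fn)
```

Series.apply takes no axis argument, because each value is passed to fn on its own.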

Using pandas apply() function on a dataframe to create a new dataframe

I have a problem that has been annoying me for some time now. I have written a function that should, based on the row values of a dataframe, create a new dataframe filled with values based on a condition in the function. My function looks like this:
def intI():
    df_ = pd.DataFrame()
    df_ = df_.fillna(0)
    for index, row in Anno.iterrows():
        genes=row['AR_Genes'].split(',')
        df=pd.DataFrame()
        if 'intI1' in genes:
            df['Year']=row['Year']
            df['Integrase']= 1
            df_=df_.append(df)
        elif 'intI2' in genes:
            df['Year']=row['Year']
            df['Integrase']= 1
            df_=df_.append(df)
        else:
            df['Year']=row['Year']
            df['Integrase']= 0
            df_=df_.append(df)
    return df_
When I call it like this: Newdf=Anno['AR_Genes'].apply(intI()), I get the following error:
TypeError: 'DataFrame' object is not callable
I really do not understand why it does not work. I have done similar things before, but there seems to be a difference that I do not get. Can anybody explain what is wrong here?
EDIT:
Anno in the function is the dataframe that the function shall be run on. It contains strings, for example a,b,c,ad,c
apply takes a function and calls it on every row, column, or value of the object it is applied to. The error occurs because intI() is called first, and the DataFrame it returns is what gets passed to apply, which then tries to call that DataFrame as if it were a function.
Why do you use .fillna(0) on a newly created, empty DataFrame?
Wouldn't this work? Newdf = intI()
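
If apply is what you want, one possible sketch is to pass a function that handles a single AR_Genes string and build the new dataframe from the result. The sample Anno data and the name has_integrase are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of the Anno dataframe from the question.
Anno = pd.DataFrame({
    "Year": [2001, 2002, 2003],
    "AR_Genes": ["intI1,sul1", "tetA", "blaTEM,intI2"],
})

def has_integrase(genes):
    """Return 1 if the comma-separated string lists intI1 or intI2, else 0."""
    names = genes.split(",")
    return int("intI1" in names or "intI2" in names)

# Pass the function itself (no parentheses); apply calls it once per value.
Newdf = pd.DataFrame({
    "Year": Anno["Year"],
    "Integrase": Anno["AR_Genes"].apply(has_integrase),
})
```

Note the difference from the question: apply(has_integrase) hands over the function, whereas apply(intI()) hands over whatever intI returns.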
