Could anyone suggest a way to answer the same question (see link), but using a lambda function?
Update a dataframe in pandas while iterating row by row
You'll want to use apply with the parameter axis=1 to ensure the function passed to apply is applied to each row.
The referenced question has an answer that uses this loop:
for i, row in df.iterrows():
    if <something>:
        row['ifor'] = x
    else:
        row['ifor'] = y
    df.ix[i]['ifor'] = x
To use a lambda with the same logic:
df['ifor'] = df.apply(lambda row: x if something else y, axis=1)
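As a minimal sketch of the apply call above: the column `score`, the threshold, and the labels are hypothetical stand-ins for `something`, `x`, and `y`, which the answer leaves unspecified. For a simple two-way condition like this one, `numpy.where` should give the same result without a row-wise lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'score' stands in for whatever the condition inspects.
df = pd.DataFrame({"score": [10, 55, 70, 30]})

# Row-wise lambda: axis=1 passes each row to the lambda as a Series.
df["ifor"] = df.apply(
    lambda row: "high" if row["score"] >= 50 else "low", axis=1
)

# Vectorized alternative: the condition is evaluated on the whole column.
df["ifor_vec"] = np.where(df["score"] >= 50, "high", "low")

print(df)
```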
Related
I have a loop that uses iterrows, but the performance is bad:
result = []
for index, row in df_test.iterrows():
    result.append(product_recommendation_model.predict(df_test.iloc[[index]]))
submission = pd.DataFrame({'ID': df_test['ID'],
                           'Result': result
                           })
display(submission)
I would like to rewrite it using apply with a lambda, but I have no idea how to get the full data frame.
a = df_test.apply(lambda x: product_recommendation_model.predict(df_test.iloc[[x]]) ,axis=1)
Can anyone help me please? Thanks.
I think this will work for you:
df_new = df_test.apply(lambda row: pd.Series([row['ID'], product_recommendation_model.predict(row)]), axis=1)
df_new.columns = ['ID', 'Result']
Note: row contains all the column values of a row. If you want to pass only one column value to predict, you can pass row[column_name] instead.
In the end, I got it running with the code below:
df_test.apply(lambda i: product_recommendation_model.predict(i.to_frame().T), axis=1)
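Here is a small sketch of why `to_frame().T` helps: with axis=1 each row arrives as a Series, and transposing its one-column frame gives back a one-row DataFrame. `DummyModel` and the columns `f1` and `f2` are hypothetical stand-ins for the real model and features. If the real model accepts many rows at once, which is an assumption here, a single batch call avoids the row-wise loop entirely.

```python
import pandas as pd


class DummyModel:
    """Hypothetical stand-in for product_recommendation_model."""

    def predict(self, frame):
        # Expects a 2-D DataFrame and returns one prediction per row.
        return (frame["f1"] + frame["f2"]).to_numpy()


model = DummyModel()
df_test = pd.DataFrame(
    {"ID": [1, 2, 3], "f1": [0.5, 1.0, 1.5], "f2": [1.0, 2.0, 3.0]}
)

# Row-wise: row.to_frame().T turns the row Series into a one-row DataFrame.
row_wise = df_test.apply(
    lambda row: model.predict(row.to_frame().T)[0], axis=1
)

# Batch: one call on the whole frame (assumes the model handles many rows).
batch = model.predict(df_test)

submission = pd.DataFrame({"ID": df_test["ID"], "Result": batch})
print(submission)
```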
I'm trying to create a new column that is calculated from two columns. When I need to do this with only one column I use .apply(), but with two parameters I don't know how to do it.
With one column I use the following code:
from pandas import read_csv, DataFrame
df = read_csv('results.csv')
def myFunc(x):
    x = x + 5
    return x
df['new'] = df['colA'].apply(myFunc)
df.head()
With two columns I thought it would look like the following, but it doesn't work:
from pandas import read_csv, DataFrame
df = read_csv('results.csv')
def myFunc(x,y):
    x = x + y
    return x
df['new'] = df[['colA','colB']].apply(myFunc)
df.head()
I see some people use lambda, but I don't understand it, and I think there has to be an easier way.
Thank you very much!
Disclaimer: avoid apply if possible. With that in mind, you are looking for axis=1, but you need to rewrite the function like:
df['new'] = df.apply(lambda x: myFunc(x['colA'], x['colB']), axis=1)
which is essentially equivalent to:
df['new'] = [myFunc(x,y) for x,y in zip(df['colA'], df['colB'])]
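To make the comparison concrete, here is a minimal sketch with a small hypothetical frame standing in for results.csv. Because this particular `myFunc` only uses `+`, it can also be called directly on the two columns, which is usually the reason to avoid apply when the function allows it.

```python
import pandas as pd


def myFunc(x, y):
    return x + y


# Hypothetical data standing in for results.csv.
df = pd.DataFrame({"colA": [1, 2, 3], "colB": [10, 20, 30]})

# Row-wise apply: each row is passed to the lambda as a Series.
df["new_apply"] = df.apply(lambda row: myFunc(row["colA"], row["colB"]), axis=1)

# List comprehension over the two columns zipped together.
df["new_zip"] = [myFunc(x, y) for x, y in zip(df["colA"], df["colB"])]

# Vectorized: myFunc works on whole columns because + is defined for Series.
df["new_vec"] = myFunc(df["colA"], df["colB"])

print(df)
```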
You can use axis=1 and access the columns inside the function, like below:
def myFunc(x):
    return x['colA'] + x['colB']
and you apply it as
df['new'] = df.apply(myFunc, axis=1)
You can learn about lambda here:
lambda function is an expression
https://realpython.com/python-lambda/
The special syntax *args in function definitions in python is used to pass a variable number of arguments to a function
https://www.geeksforgeeks.org/args-kwargs-python/
from pandas import read_csv, DataFrame
df = read_csv('results.csv')
def myFunc(x,y):
    return x + y
df['new'] = df[['colA','colB']].apply(lambda col: myFunc(*col), axis=1)
df.head()
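One thing to keep in mind with the unpacking approach: `*` passes the row's values by position, so the order of the selected columns decides which value lands in which parameter. The sketch below uses a hypothetical `subtract` function, where order matters, to illustrate this.

```python
import pandas as pd


def subtract(x, y):
    # Hypothetical function where argument order changes the result.
    return x - y


df = pd.DataFrame({"colA": [10, 20], "colB": [1, 2]})

# colA is unpacked into x, colB into y.
a_minus_b = df[["colA", "colB"]].apply(lambda row: subtract(*row), axis=1)

# Reversing the column selection reverses the arguments.
b_minus_a = df[["colB", "colA"]].apply(lambda row: subtract(*row), axis=1)

print(a_minus_b.tolist(), b_minus_a.tolist())
```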
I am using addresses stored in pandas dataframe columns as arguments for a function that calls the Google Maps API, and I store the results in a column called address_components in the same dataframe:
dm.loc[: , 'address_components'] = dm.loc[:, ['streetNumber', 'streetName', 'city']].apply(
    lambda row: get_address(row[0], row[1], row[2]), axis=1)
The entire dataframe is very large and I would like to run the same function on part of the dataframe that fits a specific condition. I have tried this:
dm[dm['g_FSA'] == 'None'].loc[: , 'address_components'] = dm[dm['g_FSA'] == 'None'].loc[:, ['streetNumber', 'streetName', 'city']].apply(
    lambda row: get_address(row[0], row[1], row[2]), axis=1)
But that's not working properly. Could someone help me spot my mistake?
Create a boolean mask using Series.eq, then use this mask along with DataFrame.loc to select specific rows and columns, then use DataFrame.apply to apply the custom function:
m = dm['g_FSA'].eq('None')
dm.loc[m, 'address_components'] = (
    dm.loc[m, ['streetNumber', 'streetName', 'city']]
    .apply(lambda s: get_address(*s), axis=1)
)
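The attempt in the question assigns through `dm[...]` followed by `.loc[...]`, which is chained indexing: the first step produces a temporary object, so the assignment does not reach `dm`. A single `.loc` call with the mask writes to the original frame. Below is a runnable sketch of the masked version, with a hypothetical `get_address` that only formats a string instead of calling the Google Maps API, and made-up sample rows.

```python
import pandas as pd


def get_address(number, street, city):
    # Hypothetical stand-in for the real API call.
    return f"{number} {street}, {city}"


dm = pd.DataFrame(
    {
        "streetNumber": ["1", "22", "333"],
        "streetName": ["Main St", "Oak Ave", "Pine Rd"],
        "city": ["Toronto", "Ottawa", "Montreal"],
        "g_FSA": ["M5V", "None", "None"],
        "address_components": [None, None, None],
    }
)

# Boolean mask: True where g_FSA holds the string 'None'.
m = dm["g_FSA"].eq("None")

# One .loc call selects rows and column, so the write lands in dm itself.
dm.loc[m, "address_components"] = (
    dm.loc[m, ["streetNumber", "streetName", "city"]]
    .apply(lambda s: get_address(*s), axis=1)
)

print(dm[["g_FSA", "address_components"]])
```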
for index, row in df.iterrows():
    print(index)
    name = row['name']
    new_name = get_name(name)
    row['new_name'] = new_name
    df.loc[index] = row
In this piece of code, my testing shows that the last line makes it really slow. It basically inserts a new column row by row. Maybe I should store all the 'new_name' values in a list and update the df outside of the loop?
Use Series.apply to run the function on each value of the column; it is faster than iterrows:
df['new_name'] = df['name'].apply(get_name)
If you want to improve performance further, you need to change the function itself if possible, but that depends on the function.
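As a rough sketch of the options discussed here, the following compares the list-then-assign idea from the question with Series.apply. The `get_name` below is a hypothetical stand-in that strips whitespace and title-cases the name. When the logic can be expressed with pandas string methods, as in this made-up case, a vectorized version is a third option.

```python
import pandas as pd


def get_name(name):
    # Hypothetical stand-in for the real get_name.
    return name.strip().title()


df = pd.DataFrame({"name": [" alice ", "BOB", "carol"]})

# Idea from the question: collect results in a list, assign once afterwards.
new_names = []
for _, row in df.iterrows():
    new_names.append(get_name(row["name"]))
df["new_name_loop"] = new_names

# Series.apply: calls get_name once per value, without building row objects.
df["new_name"] = df["name"].apply(get_name)

# Vectorized string methods, possible only because this get_name is simple.
df["new_name_vec"] = df["name"].str.strip().str.title()

print(df)
```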
df['new_name'] = df.apply(lambda row: get_name(row['name']), axis=1)
.apply isn't a best practice; however, I am not sure there is a better option here.
What is a more elegant way of implementing the code below?
I want to apply a function, my_function, to a dataframe where each row contains the function's parameters. Then I want to write the function's output back to the dataframe row.
results = pd.DataFrame()
for row in input_panel.iterrows():
    (index, row_contents) = row
    row_contents['target'] = my_function(*list(row_contents))
    results = pd.concat([results, row_contents])
We'll iterate through the values and build a DataFrame at the end.
results = pd.DataFrame([my_function(*x) for x in input_panel.values.tolist()])
The less recommended method is using DataFrame.apply with axis=1, so that the function receives each row:
results = input_panel.apply(lambda x: my_function(*x), axis=1)
The only advantage of apply is less typing.
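A minimal sketch of both approaches, writing the output back as a `target` column as the question asks. The two-argument `my_function` and the columns `a` and `b` are hypothetical; `itertuples(index=False, name=None)` is used here in place of `values.tolist()` and yields each row as a plain tuple.

```python
import pandas as pd


def my_function(a, b):
    # Hypothetical function whose parameters follow the column order.
    return a * b


input_panel = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# List comprehension: unpack each row tuple into the function's parameters.
results = input_panel.copy()
results["target"] = [
    my_function(*row)
    for row in input_panel.itertuples(index=False, name=None)
]

# DataFrame.apply with axis=1 computes the same values row by row.
via_apply = input_panel.apply(lambda row: my_function(*row), axis=1)

print(results)
```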