How to get the full dataframe using a lambda function in Python?

I have loop logic using iterrows, but the performance is poor:
result = []
for index, row in df_test.iterrows():
    result.append(product_recommendation_model.predict(df_test.iloc[[index]]))
submission = pd.DataFrame({'ID': df_test['ID'],
                           'Result': result})
display(submission)
I would like to rewrite it using apply with a lambda, but I have no idea how to get the full data frame for each row.
a = df_test.apply(lambda x: product_recommendation_model.predict(df_test.iloc[[x]]) ,axis=1)
Can anyone help me please? Thanks.

I think this will work for you:
df_new = df_test.apply(lambda row: pd.Series([row['ID'], product_recommendation_model.predict(row)]), axis=1)
df_new.columns = ['ID', 'Result']
Note: row passes all column values of a row to predict. If you want to pass only a single column's value, use row[column_name] instead.

Finally, I got it to run with the code below.
df_test.apply(lambda i: product_recommendation_model.predict(i.to_frame().T), axis=1)
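To make the `to_frame().T` trick concrete, here is a minimal sketch. `DummyModel` and the `feature` column are hypothetical stand-ins for `product_recommendation_model` and the real test data. It assumes `predict` expects a two-dimensional frame and returns one value per row, as scikit-learn style models typically do. If that holds, calling `predict` once on the whole frame avoids the per-row overhead entirely.

```python
import pandas as pd


class DummyModel:
    """Hypothetical stand-in for product_recommendation_model."""

    def predict(self, frame):
        # Expects a 2-D DataFrame and returns one prediction per row.
        return (frame["feature"].astype(float) * 2).tolist()


df_test = pd.DataFrame({"ID": [10, 11, 12], "feature": [1.0, 2.0, 3.0]})
model = DummyModel()

# Row-wise: each row arrives as a Series, so turn it back into a
# one-row DataFrame before handing it to predict.
row_wise = df_test.apply(
    lambda row: model.predict(row.to_frame().T)[0], axis=1
)

# Batch: a single predict call on the full frame (assumes the model
# accepts many rows at once).
batch = model.predict(df_test)

submission = pd.DataFrame({"ID": df_test["ID"], "Result": batch})
```

Both routes give the same predictions here; the batch call simply skips the Python-level loop that `apply(axis=1)` still performs.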

Related

run df.apply with lambda function in a loop

This code snippet works well:
df['art_kennz'] = df.apply(lambda x:myFunction(x.art_kennz), axis=1)
However, here I have hard-coded the column name art_kennz in both places: df['art_kennz'] and x.art_kennz. Now I want to modify the script so that I have a list of column names and df.apply runs for all of those columns. So I tried this:
cols_with_spaces = ['art_kennz', 'fk_wg_sch']
for col_name in cols_with_spaces:
    df[col_name] = df.apply(lambda x: myFunction(x.col_name), axis=1)
but this raises an error:
AttributeError: 'Series' object has no attribute 'col_name'
because of x.col_name. Here, col_name is supposed to be the element from the for loop. What would be the correct syntax for this?
Try:
for col_name in cols_with_spaces:
    df[col_name] = df.apply(lambda x: myFunction(x[col_name]), axis=1)
Explanation: You can access a Series element using attribute syntax, e.g. x.art_kennz, but since col_name is a variable holding the column name as a string, bracket syntax is the correct way.
With x.art_kennz you write the column name literally, but in the for loop the name is held in a variable, and attribute syntax (x.col_name) looks for a column literally called col_name.
Try this (this approach iterates row by row):
for col_name in cols_with_spaces:
    df[col_name] = df.apply(lambda row: myFunction(row[col_name]), axis=1)
If you want to process one column at a time instead, you can try this:
for col_name in cols_with_spaces:
    df[col_name] = df[col_name].apply(myFunction)
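As a small runnable illustration of the column-by-column variant, the sketch below uses a hypothetical `my_function` that strips whitespace in place of `myFunction`, plus made-up sample data. The extra `other` column is there only to show that columns outside the list are left alone.

```python
import pandas as pd


def my_function(value):
    # Hypothetical stand-in for myFunction: trims surrounding spaces.
    return value.strip()


df = pd.DataFrame({
    "art_kennz": [" a ", "b "],
    "fk_wg_sch": [" x", "y"],
    "other": [" keep ", "z"],
})

cols_with_spaces = ["art_kennz", "fk_wg_sch"]
for col_name in cols_with_spaces:
    # Bracket syntax looks up the column named by the variable's value.
    df[col_name] = df[col_name].apply(my_function)
```

Because `Series.apply` works on one column directly, it avoids building a row object for every record the way `df.apply(..., axis=1)` does.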

Pandas using apply() to run the function only on part of the dataframe

I am using addresses stored in pandas dataframe columns as arguments for a function that calls the Google Maps API, and I store the results in a column called address_components in the same dataframe:
dm.loc[:, 'address_components'] = dm.loc[:, ['streetNumber', 'streetName', 'city']].apply(
    lambda row: get_address(row[0], row[1], row[2]), axis=1)
The entire dataframe is very large and I would like to run the same function on part of the dataframe that fits a specific condition. I have tried this:
dm[dm['g_FSA'] == 'None'].loc[:, 'address_components'] = dm[dm['g_FSA'] == 'None'].loc[:, ['streetNumber', 'streetName', 'city']].apply(
    lambda row: get_address(row[0], row[1], row[2]), axis=1)
But that's not working properly. Could someone help me spot my mistake?
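The assignment has no effect because dm[dm['g_FSA'] == 'None'] returns a copy, so the chained .loc assignment writes to that temporary copy rather than to dm. Instead, create a boolean mask using Series.eq, use it with DataFrame.loc to select the rows and columns, and then use DataFrame.apply to apply the custom function: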
m = dm['g_FSA'].eq('None')
dm.loc[m, 'address_components'] = (
    dm.loc[m, ['streetNumber', 'streetName', 'city']]
    .apply(lambda s: get_address(*s), axis=1)
)
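The mask-based assignment can be tried end to end with the sketch below. The `get_address` here is a hypothetical stand-in that only formats a string rather than calling the Google Maps API, and the sample rows are invented for illustration.

```python
import pandas as pd


def get_address(street_number, street_name, city):
    # Hypothetical stand-in for the real Google Maps API call.
    return f"{street_number} {street_name}, {city}"


dm = pd.DataFrame({
    "streetNumber": ["1", "22", "333"],
    "streetName": ["Main St", "Oak Ave", "Pine Rd"],
    "city": ["Toronto", "Ottawa", "Hamilton"],
    "g_FSA": ["M5V", "None", "None"],
    "address_components": ["cached", None, None],
})

# Select only the rows whose g_FSA is the string 'None'.
m = dm["g_FSA"].eq("None")

# A single .loc call on dm itself, so the assignment lands in dm.
dm.loc[m, "address_components"] = (
    dm.loc[m, ["streetNumber", "streetName", "city"]]
    .apply(lambda s: get_address(*s), axis=1)
)
```

Only the masked rows trigger the function, so rows that already have a result keep their existing value and cost no extra API calls.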

Is there a way to make changing DataFrame faster in a loop?

for index, row in df.iterrows():
    print(index)
    name = row['name']
    new_name = get_name(name)
    row['new_name'] = new_name
    df.loc[index] = row
My testing shows that the last line of this code makes it very slow. It effectively inserts a new column row by row. Maybe I should store all the 'new_name' values in a list and update the df outside of the loop?
Use Series.apply to run the function on each value of the column; it is faster than iterrows:
df['new_name'] = df['name'].apply(get_name)
If you want to improve performance further, you need to change the function itself if possible, but that depends on what the function does.
df['new_name'] = df.apply(lambda row: get_name(row['name']), axis=1)
.apply is not best practice for performance, but I am not sure there is a better option here.
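The options discussed in this thread can be compared side by side in the sketch below. `get_name` is a hypothetical stand-in that title-cases a string. The vectorized line only applies under that assumption, since it depends on the function having an equivalent pandas string method.

```python
import pandas as pd


def get_name(name):
    # Hypothetical stand-in for the real get_name.
    return name.title()


df = pd.DataFrame({"name": ["ada lovelace", "alan turing"]})

# Option 1: Series.apply, one function call per value.
df["new_name"] = df["name"].apply(get_name)

# Option 2: the asker's idea, build a list and assign it once
# outside any row-by-row DataFrame update.
df["new_name_list"] = [get_name(n) for n in df["name"]]

# Option 3: vectorized string method; valid only because this
# get_name happens to match Series.str.title.
df["new_name_vec"] = df["name"].str.title()
```

All three write the column in a single assignment, which is what removes the cost of `df.loc[index] = row` inside the loop.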

Using Lambda Function Pandas to Set Column Values

Could anyone suggest a way to answer the same question (see link), but using a lambda function?
Update a dataframe in pandas while iterating row by row
You'll want to use apply with the parameter axis=1 to ensure the function passed to apply is applied to each row.
The referenced question has an answer that uses this loop.
for i, row in df.iterrows():
    if <something>:
        row['ifor'] = x
    else:
        row['ifor'] = y
    df.ix[i]['ifor'] = x
To use a lambda with the same logic:
df['ifor'] = df.apply(lambda row: x if something else y, axis=1)
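Here is a concrete version of that one-liner, with a made-up condition (`a` greater than `b`) and made-up values (`"high"` and `"low"`) standing in for `something`, `x`, and `y`. When the condition can be written as a column expression, `numpy.where` is a commonly used alternative that gives the same result without a per-row lambda.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 5, 10], "b": [4, 4, 4]})

# Row-wise lambda: the condition is evaluated once per row.
df["ifor"] = df.apply(
    lambda row: "high" if row["a"] > row["b"] else "low", axis=1
)

# Same logic as a single column-level expression.
df["ifor_where"] = np.where(df["a"] > df["b"], "high", "low")
```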

How can I delete a row in a Pandas dataframe if the entire row is null?

I have been checking each value of each row and if all of them are null, I delete the row with something like this:
df = pandas.concat([df[:2], df[3:]])
But, I am thinking there's got to be a better way to do this. I have been trying to use a mask or doing something like this:
rows_to_keep = df.apply(
    lambda row: any([val is not None for val in row]),
    axis=1)
I also tried something like this (suggested in another Stack Overflow question):
pandas.DataFrame.dropna()
but I don't see any difference in my printed dataframe.
dropna returns a new DataFrame, so you probably just want:
df = df.dropna()
or
df.dropna(inplace=True)
If you have a more complicated mask, rows_to_keep, you can do:
df = df[rows_to_keep]
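One detail worth checking against the original question: with default arguments, `dropna` removes rows that contain any missing value. To remove only rows where every value is missing, pass `how='all'`. The sketch below uses invented sample data to show the difference, along with an equivalent boolean mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": ["x", None, "z"],
})

# Drops only row 1, where every value is missing.
only_all_null_removed = df.dropna(how="all")

# Equivalent mask: keep rows with at least one non-missing value.
rows_to_keep = df.notna().any(axis=1)
same_result = df[rows_to_keep]

# Default how='any' also drops row 2, which is only partly missing.
any_null_removed = df.dropna()
```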
