What is a more elegant way of implementing the below?
I want to apply a function, my_function, to a DataFrame where each row of the DataFrame contains the parameters of the function. Then I want to write the output of the function back to that DataFrame row.
results = pd.DataFrame()
for row in input_panel.iterrows():
    (index, row_contents) = row
    row_contents['target'] = my_function(*list(row_contents))
    results = pd.concat([results, row_contents])
We'll iterate through the values and build a DataFrame at the end.
results = pd.DataFrame([my_function(*x) for x in input_panel.values.tolist()])
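For instance, a minimal sketch (my_function and the columns here are hypothetical stand-ins):

import pandas as pd

def my_function(a, b):
    return a + b

input_panel = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

# One call per row; the list of results becomes the new column.
input_panel['target'] = [my_function(*x) for x in input_panel.values.tolist()]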
The less recommended method is using DataFrame.apply:
results = input_panel.apply(lambda x: my_function(*x), axis=1)
The only advantage of apply is less typing.
I create a simple function to replace a certain column in df by row:
def replace(df):
    for index, row in df.iterrows():
        row['ALARM_TEXT'] = row['ALARM_TEXT'].replace('\'', '')
    return df
But the input df has not been changed after I call the function. Is there something wrong with it?
The rows yielded by iterrows() are copies, so assigning to them never touches the original DataFrame. We usually do the replacement vectorized instead:
df['ALARM_TEXT'] = df['ALARM_TEXT'].str.replace('\'','')
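A minimal sketch of why the loop fails and the vectorized version works (the sample data is made up):

import pandas as pd

df = pd.DataFrame({'ALARM_TEXT': ["don't", "won't"]})

# iterrows() yields copies, so this assignment is lost:
for index, row in df.iterrows():
    row['ALARM_TEXT'] = row['ALARM_TEXT'].replace('\'', '')

# Assigning the vectorized result back to the column does change df:
df['ALARM_TEXT'] = df['ALARM_TEXT'].str.replace('\'', '', regex=False)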
I'm accessing a fairly large series of JSON files and storing them in a pandas Series, part of a larger DataFrame. There are several fields I want from said JSON, some of which are nested. I've been extracting them using json_normalize. The goal is to then merge these new fields with my original DataFrame.
My problem is when I do so, instead of getting a dataframe with J rows and K columns, I get a J length series with each element being 1xK dataframe. I'm wondering if there is either an efficient vectorized way to turn this nested series/dataframe into a regular dataframe or get a regular dataframe from the start.
I've used map/lambda to create my nested series. Right now I'm unnesting with iteritems/append, but there has to be a more efficient way.
url_base = 'http:\\foo.bar='
df['http'] = df['id'].map(lambda x: url_base + x)
df['json'] = df['http'].map(lambda x: nf.get_json(x))
nest_ser = df['json'].map(lambda x: json_normalize(x))

df = pd.DataFrame()
for index, item in nest_ser.iteritems():
    df = df.append(item)
json_normalize produces:
pd.Series([pd.DataFrame([col1, col2, ...]), pd.DataFrame([col1, col2, ...]), pd.DataFrame([col1, col2, ...])])
instead of
pd.DataFrame([col1,col2...])
Suppose the output Series from json_normalize is named sr:
pd.concat(sr.tolist())
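A minimal sketch of the pattern, assuming pandas >= 1.0 for the json_normalize import (the sample records are hypothetical):

import pandas as pd
from pandas import json_normalize

records = [{'id': 'a', 'meta': {'score': 1}},
           {'id': 'b', 'meta': {'score': 2}}]

# Each element of sr is a 1xK DataFrame, mirroring the question's setup.
sr = pd.Series(records).map(json_normalize)

# Stack the 1-row DataFrames into one J x K DataFrame.
flat = pd.concat(sr.tolist(), ignore_index=True)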
I am posting this and hoping I will get a convincing answer.
df is my DataFrame. I want to know what is being passed to min_max by the apply function. When I print row inside min_max, I don't get the same DataFrame that I see outside it.
import numpy as np
import pandas as pd

def min_max(row):
    print(row)
    print()
    data = row[['POPESTIMATE2010',
                'POPESTIMATE2011',
                'POPESTIMATE2012',
                'POPESTIMATE2013',
                'POPESTIMATE2014',
                'POPESTIMATE2015']]
    return pd.Series({'min': np.min(data), 'max': np.max(data)})

df.apply(min_max, axis=1)
df.apply simply invokes the provided function, in your case min_max, once for each object along the given axis. Per the documentation of apply, axis=1 means a row-wise operation and axis=0 a column-wise one.
Thus, in your case, it will invoke min_max once for each row of the DataFrame, and each row is passed in as a Series indexed by the column names, not as a DataFrame, which is why the print looks different.
For further elaboration:

import pdb
import pandas as pd

def print_func(row):
    pdb.set_trace()
    print(row)

df = pd.DataFrame({'Temp1': [62, 62, 50, 62, 50, 62, 62],
                   'Temp2': [66, 66, 69, 66, 69, 66, 66],
                   'Temp3': [52, 62, 52, 62, 52, 62, 52],
                   'Target': [0.24, 0.28, 0.25, 0.28, 0.25, 0.28, 0.24]})
print(df)
df.apply(print_func, axis=1)
The output of the apply function at the first iteration shows that row arrives as a Series, along these lines (the mixed int/float row is upcast to float64):
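Temp1     62.00
Temp2     66.00
Temp3     52.00
Target     0.24
Name: 0, dtype: float64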
The function that I'm applying is a little expensive, as such I want it to only calculate the value once for unique values.
The only solution I've been able to come up with has been as follows:
This step is needed because apply doesn't work on arrays, so I have to convert the unique values into a Series:
new_vals = pd.Series(data['column'].unique()).apply(function)
This step because .merge only works on DataFrames:
new_dataframe = pd.DataFrame(index=data['column'].unique(), data=new_vals.values)
Finally, merging the results:
yet_another = pd.merge(data, new_dataframe, right_index=True, left_on='column')
data['calculated_column'] = yet_another[0]
So basically I had to convert my values to a Series, apply the function, convert to a DataFrame, merge the results, and use that column to create my new column.
I'm wondering if there is some one-line solution that isn't as messy. Something pythonic that doesn't involve re-casting object types multiple times. I've tried grouping by but I just can't figure out how to do it.
My best guess would have been to do something along these lines
data[calculated_column] = dataframe.groupby(column).index.apply(function)
but that isn't right either.
This is an operation I do often enough to want a better way of doing it, but not often enough that I can easily find the last time I used it, so I end up re-figuring a bunch of things again and again.
If there is no good solution, I guess I could just add this function to my library of common tools that I hedonistically 'from me_tools import *':
def apply_unique(data, column, function):
    new_vals = pd.Series(data[column].unique()).apply(function)
    new_dataframe = pd.DataFrame(data=new_vals.values,
                                 index=data[column].unique())
    result = pd.merge(data, new_dataframe, right_index=True, left_on=column)
    return result[0]
I would do something like this:
def apply_unique(df, orig_col, new_col, func):
    return df.merge(df[[orig_col]]
                      .drop_duplicates()
                      .assign(**{new_col: lambda x: x[orig_col].apply(func)}),
                    how='inner', on=orig_col)
This will return the same DataFrame as performing:
df[new_col] = df[orig_col].apply(func)
but will be much more performant when there are many duplicates.
How it works:
We join the original (calling) DataFrame to another (passed) DataFrame that contains two columns: the original column and the new column transformed from the original column.
The new column in the passed DataFrame is assigned using .assign and a lambda function, making it possible to apply the function to the DataFrame that has already had .drop_duplicates() performed on it.
A dict is used here for convenience only, as it allows a column name to be passed in as a str.
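For instance, a toy illustration of the dict-unpacked assign (the names here are made up):

import pandas as pd

df = pd.DataFrame({'val': [1, 2, 2]})
new_col = 'val_sq'

# The column name stays dynamic because it is a dict key, not a keyword.
out = df[['val']].drop_duplicates().assign(**{new_col: lambda x: x['val'] ** 2})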
Edit:
As an aside: best to drop new_col if it already exists, otherwise the merge will append suffixes to each new_col
if new_col in df:
    df = df.drop(new_col, axis='columns')
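A minimal usage sketch (the DataFrame and expensive here are hypothetical):

import pandas as pd

def expensive(x):
    return x ** 2  # stand-in for a genuinely costly computation

data = pd.DataFrame({'val': [1, 2, 1, 3, 2, 1]})

# expensive runs once per unique value (3 calls instead of 6).
data = apply_unique(data, orig_col='val', new_col='val_sq', func=expensive)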
I have an apply function that operates on each row in my dataframe. The result of that apply function is a new value. This new value is intended to go in a new column for that row.
So, after applying this function to all of the rows in the dataframe, there will be an entirely new column in that dataframe.
How do I do this in pandas?
Two ways primarily:
df['new_column'] = df.apply(my_fxn, axis=1)
or
df = df.assign(new_column=df.apply(my_fxn, axis=1))
If you need to use other arguments, you can pass them to the apply function, but sometimes it's easier (for me) to just use a lambda:
df['new_column'] = df.apply(lambda row: my_fxn(row, global_dict), axis=1)
Additionally, if your function can operate on arrays in a vectorized fashion, you could just do:
df['new_column'] = my_fxn(df['col1'], df['col2'])
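Putting it together, a minimal sketch (my_fxn and the columns are made up):

import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

def my_fxn(row):
    return row['col1'] + row['col2']

# Row-wise apply: my_fxn receives each row as a Series.
df['new_column'] = df.apply(my_fxn, axis=1)

# Vectorized equivalent, operating on whole columns at once:
df['new_column'] = df['col1'] + df['col2']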