Applying function to dataframe column? - python

I have the following function (one-hot encoding function that takes a column as an input). I basically want to apply it to a column in my dataframe, but can't seem to understand what's going wrong.
def dummies(dataframe, col):
    dataframe[col] = pd.Categorical(dataframe[col])
    pd.concat([dataframe, pd.get_dummies(dataframe[col], prefix='c')], axis=1)
df1 = df['X'].apply(dummies)
Guessing something is wrong with how I'm calling it?

You need to return a value from the function; currently you do not. Also, when you apply a function to a column, each value in the column is passed to the function one at a time, so your function has the wrong signature for that. Typically you'd do it like this:
def function1(value):
    new_value = value * 2  # some operation
    return new_value
then:
df['X'].apply(function1)
Your function currently takes the entire dataframe and the name of a column, so it will probably work if you call it directly:
df1 = dummies(df, 'X')
but you still need to add a return statement, otherwise df1 will be None.
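
Putting both points together, a minimal sketch of the fixed function could look like this. The sample data and the prefix parameter are assumptions for illustration, and the pd.Categorical step is left out so the caller's dataframe is not modified:

```python
import pandas as pd

def dummies(dataframe, col, prefix="c"):
    """Return a new dataframe with one-hot columns for `col` appended."""
    encoded = pd.get_dummies(dataframe[col], prefix=prefix)
    # pd.concat builds a new dataframe, so it has to be returned explicitly
    return pd.concat([dataframe, encoded], axis=1)

# Hypothetical sample data, not from the question
df = pd.DataFrame({"X": ["a", "b", "a"], "Y": [1, 2, 3]})

# Call the function on the whole dataframe instead of going through apply()
df1 = dummies(df, "X")
print(df1)
```

The function is called once on the whole dataframe because get_dummies needs to see every value in the column to decide which indicator columns to create.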

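If you only want to transform that one column you don't need to make a new dataframe; assign the result back to the column. Note that this pattern only works for a function that takes a single cell value and returns a single value. dummies as written takes a dataframe and a column name, so it would have to be rewritten before this call could work:
df['X'] = df['X'].apply(lambda x: dummies(x))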

Related

Pandas Dataframe: Function doesn't preserve my custom column order when returning df

I followed this code from user Lala la (https://stackoverflow.com/a/55803252/19896454) to put 3 columns at the front and leave the rest unchanged. It works well inside the function, but the dataframe loses the column order once the function returns.
My desperate solution was to put the code on the main program...
Other functions in my code are able to return modified versions of the dataframe with no problem.
Any ideas what is happening?
Thanks!
def define_columns_order(df):
    cols_to_order = ['LINE_ID', 'PARENT.CATEGORY', 'CATEGORY']
    new_columns = cols_to_order + (df.columns.drop(cols_to_order).tolist())
    df = df[new_columns]
    return df
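Try using return df.reindex(new_columns, axis=1). Also keep in mind that the reordering is not done in place: both df[new_columns] and reindex return a new dataframe (some pandas methods accept inplace=True, but reindex does not). The assignment inside the function only rebinds the local name, so you need to explicitly assign the result returned by your function to your variable, e.g. df = define_columns_order(df).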
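
A small sketch of the difference between discarding and assigning the returned dataframe. The sample data is made up for illustration; the column names come from the question:

```python
import pandas as pd

def define_columns_order(df):
    cols_to_order = ["LINE_ID", "PARENT.CATEGORY", "CATEGORY"]
    new_columns = cols_to_order + df.columns.drop(cols_to_order).tolist()
    # reindex returns a new dataframe; the caller's object is not touched
    return df.reindex(new_columns, axis=1)

# Hypothetical sample data
df = pd.DataFrame(
    {"VALUE": [1], "CATEGORY": ["x"], "LINE_ID": [10], "PARENT.CATEGORY": ["p"]}
)

define_columns_order(df)            # result discarded: df keeps its old order
order_without_assignment = list(df.columns)

df = define_columns_order(df)       # result assigned: df now has the new order
order_with_assignment = list(df.columns)

print(order_without_assignment)
print(order_with_assignment)
```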

Rename last column in a dataframe passed along in method chain

How can I rename the last column in a dataframe that was passed along in a method chain? Consider the following example (the real use case is more complex). How can the rename function refer to the dataframe that it processes (which is different from the "table" dataframe)? Is there something like the following? Unfortunately "self" does not exist.
result = table.iloc[:,2:-1].rename(columns={self.columns[-1]: "Text"})
Use pipe():
result = table.iloc[:,2:-1].pipe(lambda df: df.rename(columns={df.columns[-1]: "Text"}))
I think that you can just do the following:
result = table.iloc[:,2:-1]
result.columns = result.columns[:-1].tolist() + ["Text"]
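
For reference, a runnable sketch of the pipe() approach. The table contents and column names are assumptions for illustration:

```python
import pandas as pd

# Hypothetical five-column table
table = pd.DataFrame([[1, 2, 3, 4, 5]], columns=["a", "b", "c", "d", "e"])

result = (
    table.iloc[:, 2:-1]
    # pipe passes the intermediate dataframe to the lambda as `df`,
    # so df.columns[-1] refers to the sliced frame, not to `table`
    .pipe(lambda df: df.rename(columns={df.columns[-1]: "Text"}))
)

print(result.columns.tolist())
```

Inside the lambda, df is the result of the iloc step, which is the closest equivalent to the "self" the question asks about.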

using apply() when function is expecting the dataframe row python

My function is returning a value based on the dataframe row
def function(df, row):
    if df['A'][row] != 'value':
        return np.nan
    else:
        return df['B'][row]
I need to call it using apply. However, I don't know how to pass the row into the function. I know there are more efficient ways of doing this, but the instructions are pretty clear (yes, this is a class assignment).
I have tried multiple variations of apply() and lambda. Nothing is working. The latest attempt is this, but it's expecting a row argument.
df.apply(hourly_wage, axis=1)
Any suggestions?
Did you try df.apply(lambda row: function(row), axis=1)? With axis=1, apply passes each row to the function as a Series, so you have to change your function to take the row itself:
def function(row):
    if row['A'] != 'value':
        return np.nan
    else:
        return row['B']
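
A self-contained sketch of that row-wise call. The sample data and the new column name C are assumptions:

```python
import numpy as np
import pandas as pd

def function(row):
    # `row` is a Series holding one row; columns are accessed by label
    if row["A"] != "value":
        return np.nan
    return row["B"]

# Hypothetical sample data
df = pd.DataFrame({"A": ["value", "other", "value"], "B": [10.0, 20.0, 30.0]})

# axis=1 makes apply call the function once per row; no lambda is needed
df["C"] = df.apply(function, axis=1)
print(df)
```

Passing the function directly is equivalent to wrapping it in a lambda that just forwards the row.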

How to use Apply() and self defined function to change data in DataFrame?

What is the easiest way to make some changes in the index column of different rows in a DataFrame?
def fn(country):
    if any(char.isdigit() for char in country):
        return country[:-2]
    else:
        return country
df.loc["Country"].apply(fn,axis=1)
I can't test this right now, but can you try df['Country'] = df.apply(lambda row: fn(row), axis=1) and change your function to take the row as its argument (reading row['Country'] inside it)? This way you can manipulate anything you want row by row, using other column values as well.
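
If only the country name itself is needed, a sketch like the following may be enough. It assumes Country is either a regular column or the index, and the sample data is made up:

```python
import pandas as pd

def fn(country):
    # Drop the last two characters when the name contains any digit
    if any(char.isdigit() for char in country):
        return country[:-2]
    return country

# Hypothetical sample data
df = pd.DataFrame({"Country": ["Iran12", "France", "Spain17"], "Pop": [1, 2, 3]})

# Case 1: Country is a regular column, so apply fn to each value
as_column = df.copy()
as_column["Country"] = as_column["Country"].apply(fn)

# Case 2: Country is the index, so map fn over the index labels
as_index = df.set_index("Country")
as_index.index = as_index.index.map(fn)

print(as_column)
print(as_index)
```

Series.apply takes no axis argument, which is one reason the call in the question fails; df.loc["Country"] also selects a row label rather than a column.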

Using pandas apply() function on a dataframe to create a new dataframe

I have a problem that has been annoying me for some time now. I have written a function that should, based on the row values of a dataframe, create a new dataframe filled with values that depend on a condition in the function. My function looks like this:
def intI():
    df_ = pd.DataFrame()
    df_ = df_.fillna(0)
    for index, row in Anno.iterrows():
        genes = row['AR_Genes'].split(',')
        df = pd.DataFrame()
        if 'intI1' in genes:
            df['Year'] = row['Year']
            df['Integrase'] = 1
            df_ = df_.append(df)
        elif 'intI2' in genes:
            df['Year'] = row['Year']
            df['Integrase'] = 1
            df_ = df_.append(df)
        else:
            df['Year'] = row['Year']
            df['Integrase'] = 0
            df_ = df_.append(df)
    return df_
When I call it like this: Newdf = Anno['AR_Genes'].apply(intI()), I get the following error:
TypeError: 'DataFrame' object is not callable
I really do not understand why it does not work. I have done similar things before, but there seems to be a difference that I do not get. Can anybody explain what is wrong here?
EDIT:
Anno in the function is the dataframe that the function shall be run on. Its AR_Genes column contains a string, for example a,b,c,ad,c
apply expects a function, which it then calls on each element (Series.apply) or on each row/column (DataFrame.apply). Writing intI() with parentheses calls your function immediately, so what you actually pass to apply is the DataFrame it returns. apply then tries to call that DataFrame, which raises the TypeError.
Why do you use .fillna(0) on a newly created, empty DataFrame?
Wouldn't this work? Newdf = intI()
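
As an alternative, here is a sketch that builds the new dataframe without the row loop. It assumes the intent is one output row per input row with Integrase set to 1 when intI1 or intI2 is among the genes; the anno parameter and the sample data are made up. It also avoids DataFrame.append, which was removed in pandas 2.0:

```python
import pandas as pd

def intI(anno):
    """Return Year plus a 0/1 Integrase flag for each row of `anno`."""
    # Split the comma-separated string into a list of gene names per row
    genes = anno["AR_Genes"].str.split(",")
    # True where either integrase gene appears in that row's list
    has_integrase = genes.apply(lambda g: "intI1" in g or "intI2" in g)
    return pd.DataFrame(
        {"Year": anno["Year"], "Integrase": has_integrase.astype(int)}
    )

# Hypothetical sample data
Anno = pd.DataFrame(
    {"Year": [2001, 2002, 2003], "AR_Genes": ["a,intI1,c", "b,c", "intI2"]}
)

# Call the function directly; there is no need for an outer apply()
Newdf = intI(Anno)
print(Newdf)
```

Passing the dataframe as an argument instead of reading the global Anno makes the function easier to test on other data.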
