What does the inplace parameter of the replace() and drop() methods do?
I couldn't understand it from the docs.
Example:
import pandas as pd
df = pd.read_csv('breast-cancer-wisconsin.data.txt')
df.replace('?', -99999, inplace=True)
df.drop(['id'], axis=1, inplace=True)
If you pass inplace=False (the default), the method returns a new DataFrame with the operation applied and leaves the original unchanged.
If you pass inplace=True, the operation is applied directly to the DataFrame you're working on, and the method returns None. Hence, the following two lines do the same thing (conceptually):
df.replace('?',-99999, inplace=True)
df = df.replace('?', -99999, inplace=False)
Using the inplace version lets you work on a single DataFrame. Using the other version lets you create a new DataFrame to work on while keeping the original one, like this:
df_replaced = df.replace('?', -99999, inplace=False)
Without inplace (i.e. df.replace('?', -99999)), the call creates a new DataFrame that is just like df, but with '?' replaced by -99999; df itself is not changed. With inplace=True, df is changed.
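The following sketch checks both behaviours. It uses a small made-up frame (a hypothetical stand-in for the CSV data) rather than the real file. With the default inplace=False, a new object is returned and the original is left alone. With inplace=True, the original is mutated and the call itself returns None.

```python
import pandas as pd

# Hypothetical toy frame standing in for the CSV data
df = pd.DataFrame({"id": [1, 2, 3], "a": ["?", "b", "?"]})

# inplace=False (the default): a new DataFrame comes back, df is untouched
df_replaced = df.replace("?", -99999)
print(df["a"].tolist())           # ['?', 'b', '?']
print(df_replaced["a"].tolist())  # [-99999, 'b', -99999]

# inplace=True: df itself changes and the method returns None
result = df.replace("?", -99999, inplace=True)
print(result)                     # None
print(df["a"].tolist())           # [-99999, 'b', -99999]

# drop() follows the same convention
print(df.drop(columns=["id"], inplace=True))  # None
print(df.columns.tolist())        # ['a']
```

A common mistake follows from this: writing df = df.drop(..., inplace=True) sets df to None.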
Related
I'm attempting to drop the columns within a certain range of a pandas dataframe that are all NaN. I know that the following code:
df.dropna(axis=1, how='all', inplace=True)
will search all the columns in the dataframe and drop the ones that are all NaN.
However, when I restrict this code to a specific range of columns:
df[df.columns[48:179]].dropna(axis=1, how='all', inplace=True)
the result is the original dataframe with no columns removed. I also know for a fact that the selected range has multiple all-NaN columns.
Any idea what I might be doing wrong here?
Don't use inplace=True. Instead, find the all-NaN columns within the range and drop them from df itself:
cols = df.columns[48:179]
all_nan = df[cols].isna().all()
df = df.drop(columns=all_nan[all_nan].index)
inplace=True only affects the object you call the method on. df[cols] creates a new DataFrame, so inplace=True modifies that temporary copy and df is left unchanged. Try running dropna without inplace=True (e.g. in a Jupyter notebook) to see the result it returns.
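The following sketch reproduces both the failure and the fix on a tiny made-up frame. The slice df.columns[1:4] is a hypothetical stand-in for df.columns[48:179]. The first call changes only the temporary frame produced by df[cols], and pandas may emit a SettingWithCopyWarning for it. The drop-based version removes the all-NaN columns from df itself.

```python
import numpy as np
import pandas as pd

# Toy frame: columns b and c are entirely NaN
df = pd.DataFrame({
    "a": [1.0, 2.0],
    "b": [np.nan, np.nan],
    "c": [np.nan, np.nan],
    "d": [3.0, np.nan],
})
cols = df.columns[1:4]  # hypothetical stand-in for df.columns[48:179]

# df[cols] builds a new DataFrame, so inplace=True only changes that temporary object
df[cols].dropna(axis=1, how="all", inplace=True)
print(df.columns.tolist())  # ['a', 'b', 'c', 'd']

# Find the all-NaN columns inside the range, then drop them from df itself
all_nan = df[cols].isna().all()
df = df.drop(columns=all_nan[all_nan].index)
print(df.columns.tolist())  # ['a', 'd']
```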
I have two dataframes (df_train and df_test) containing a column ('Date') that I want to drop.
As far as I understand, I could do it in two ways: either by using inplace, or by assigning the dataframe to itself, like:
if 'Date' in df_train.columns:
    df_train.drop(['Date'], axis=1, inplace=True)
OR
if 'Date' in df_train.columns:
    df_train = df_train.drop(['Date'], axis=1)
Both methods work on a single dataframe, but the former should be more memory-friendly, since the assignment creates a copy of the dataframe.
However, I have to do this for both dataframes, so I tried to do the same within a loop:
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data.drop(['Date'], axis=1, inplace=True)
and
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        data = data.drop(['Date'], axis=1)
The weird thing is that, in this case, only the first way (using inplace) works. If I use the second way, the 'Date' columns aren't dropped.
Why is that?
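It doesn't work because data = data.drop(...) only rebinds the loop variable data to a new DataFrame. The objects that df_train and df_test (and the list) refer to are never modified. With inplace=True, drop modifies the very object that data and df_train share, which is why that version works. Instead, collect the results: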
lst = []
for data in [df_train, df_test]:
    if 'Date' in data.columns:
        lst.append(data.drop(['Date'], axis=1))
print(lst)
Now lst contains a copy of each dataframe that had a 'Date' column, with that column dropped.
It's more concise to use a list comprehension:
res = [data.drop(['Date'], axis=1) for data in [df_train, df_test] if 'Date' in data.columns]
Here, res holds new copies of the dataframes with the column dropped; df_train and df_test themselves are unchanged.
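To get the results back under the original names, you can unpack the list into df_train and df_test. The sketch below uses small made-up frames. It uses drop's errors="ignore" option in place of the `if 'Date' in ...` check. This means every frame is always returned, so the unpacking cannot come up short.

```python
import pandas as pd

# Hypothetical toy frames
df_train = pd.DataFrame({"Date": ["2020-01-01"], "x": [1]})
df_test = pd.DataFrame({"Date": ["2020-01-02"], "x": [2]})

for data in [df_train, df_test]:
    # Rebinds the loop variable only; df_train and df_test still point to the old objects
    data = data.drop(columns=["Date"])
print("Date" in df_train.columns)  # True

# Rebind the original names to the new objects instead
df_train, df_test = [d.drop(columns=["Date"], errors="ignore") for d in (df_train, df_test)]
print("Date" in df_train.columns, "Date" in df_test.columns)  # False False
```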
Using df.drop(), I removed the "ID" column from df, and now I want to get that column back.
df.drop('ID', axis=1, inplace=True)
df
# shows me df without ID column
What method should I use?
I found that all you need to do is re-run the original command that loaded df. Once a column has been dropped in place, it is gone from that object :)
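If you expect to need the column again, you can keep it before removing it so that no reload is needed. The sketch below uses a hypothetical frame. DataFrame.pop removes a column and returns it as a Series, and DataFrame.insert puts it back at a chosen position.

```python
import pandas as pd

# Hypothetical frame with an ID column
df = pd.DataFrame({"ID": [10, 11, 12], "value": [1.5, 2.5, 3.5]})

# pop() removes the column from df in place and hands it back as a Series
id_col = df.pop("ID")
print(df.columns.tolist())  # ['value']

# Later, put it back at its original position (0)
df.insert(0, "ID", id_col)
print(df.columns.tolist())  # ['ID', 'value']
```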
import pandas as pd
df = pd.DataFrame({
    'col1': [99, None, 99],
    'col2': [4, 5, 6],
    'col3': [7, None, None]})
col_list = ['col1', 'col2']
df[col_list].dropna(axis=1, thresh=2, inplace=True)
This emits a warning and leaves the dataframe unchanged:
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
The following generates no warning but still leaves the DataFrame unchanged.
df.loc[:,col_list].dropna(axis=1, thresh=2, inplace=True)
Problem:
From a list of columns specified by the user, remove those columns from the dataframe that have fewer than 'thresh' non-null values. Make no changes to the columns that are not in the list.
I need to use inplace=True to avoid making a copy of the dataframe, since it is huge.
I cannot loop over the columns and apply dropna one column at a time, because pandas.Series.dropna does not have the 'thresh' argument.
Funnily enough, dropna does not support this functionality, but there is a workaround.
v = df[col_list].notna().sum().lt(2)  # True for columns with fewer than thresh=2 non-null values
df.drop(v.index[v], axis=1, inplace=True)
By the way,
I need to use inplace=True to avoid making a copy of the dataframe
Unfortunately, even with inplace=True, pandas generally still creates a copy internally. The only difference is that the result is assigned back to the original object in place, so no new object is returned.
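Here is the workaround run on the question's frame. col_list is changed to ['col1', 'col3'] so that one column actually falls below the threshold. col3 has a single non-null value and is dropped. col1 has exactly two non-null values and is kept, matching dropna's thresh semantics. col2 is outside the list and is not touched.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [99, None, 99],
    "col2": [4, 5, 6],
    "col3": [7, None, None],
})
col_list = ["col1", "col3"]  # changed from the question so one column is dropped
thresh = 2

# Count non-null values in the user-selected columns only
too_sparse = df[col_list].notna().sum().lt(thresh)

# Drop just the flagged columns; columns outside col_list are never considered
df.drop(too_sparse[too_sparse].index, axis=1, inplace=True)
print(df.columns.tolist())  # ['col1', 'col2']
```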
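I think the problem is that df[col_list] (or any slicing) creates a new DataFrame, so inplace=True affects that new DataFrame and not the original one.
You might try the subset parameter of dropna and pass the column list to it. Note, however, that subset refers to labels along the other axis: with axis=1 it selects rows, not columns, so passing column names here raises a KeyError.
df.dropna(axis=1, thresh=2, subset=col_list, inplace=True)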
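The following check shows how subset behaves with axis=1, using the question's toy frame. The pandas documentation describes subset as labels along the other axis. When dropping columns, it therefore restricts which rows are counted. The row labels [0, 2] are just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [99, None, 99],
    "col2": [4, 5, 6],
    "col3": [7, None, None],
})

# With axis=1, subset names *row* labels, so column names are not found
raised = False
try:
    df.dropna(axis=1, thresh=2, subset=["col1", "col2"])
except KeyError as exc:
    raised = True
    print("KeyError:", exc)

# Only rows 0 and 2 are counted: col3 has a single non-null value there, so it is dropped
kept = df.dropna(axis=1, thresh=2, subset=[0, 2]).columns.tolist()
print(kept)  # ['col1', 'col2']
```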
I would like to solve the following problem.
I have the code below. I need to pass several dataframes to a function and apply the change to all of them at once:
def reverse_df(*df):
    for x in df:
        x = x.loc[::-1].reset_index(level=0, drop=True)
    return
reverse_df(df1, df2, df3, df4, df5)
I am able to change a dataframe inside a function only when I use inplace=True, as below:
def remove_na(*df):
    for x in df:
        x.dropna(axis=0, how='all', inplace=True)
    return
remove_na(df1, df2, df3, df4, df5)
but the following doesn't work:
def remove_na(*df):
    for x in df:
        x = x.dropna(axis=0, how='all')
    return
remove_na(df1, df2, df3, df4, df5)
What am I doing wrong?
Short answer: x = x.dropna(axis=0, how='all') inside the function rebinds the local variable x to a new DataFrame. The reference to the original dataframe is lost, so the change never reaches it.
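To solve the particular case of reversing the dataframe in place, you can do:
def reverse(df):
    # Move the current index into a column (named 'index' if the index is unnamed)
    df.reset_index(drop=False, inplace=True)
    # The new index is 0..n-1, so sorting it in descending order reverses the rows
    df.sort_index(ascending=False, inplace=True)
    # Restore the original labels, now reversed (the index ends up named 'index')
    df.set_index('index', drop=True, inplace=True)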
However, since inplace operations are not really in place (pandas usually still makes a copy internally), you're probably better off returning a modified dataframe.
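The following sketch shows the return-based approach. The functions build new frames and return them, and the caller rebinds its own names. The frames are made up for illustration. reverse_df uses .iloc[::-1] for positional reversal where the question used .loc[::-1].

```python
import pandas as pd

def reverse_df(*dfs):
    # Build and return new reversed frames with a fresh 0..n-1 index
    return [d.iloc[::-1].reset_index(drop=True) for d in dfs]

def remove_na(*dfs):
    # Return copies with all-NaN rows removed
    return [d.dropna(axis=0, how="all") for d in dfs]

# Hypothetical input frames
df1 = pd.DataFrame({"a": [1, 2, 3]})
df2 = pd.DataFrame({"a": [None, 5.0, None], "b": [None, 6.0, 7.0]})

# The caller rebinds its own names to the returned objects
df1, df2 = reverse_df(df1, df2)
df1, df2 = remove_na(df1, df2)
print(df1["a"].tolist())   # [3, 2, 1]
print(df2.index.tolist())  # [0, 1]
```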