The following line of my code causes a warning:
import numpy as np
import pandas as pd
s = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
s.loc[-1] = [5,np.nan,np.nan,6]
grouped = s.groupby(['A'])
for key_m, group_m in grouped:
    group_m.loc[-1] = [10,np.nan,np.nan,10]
C:\Anaconda3\lib\site-packages\ipykernel\__main__.py:10: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
According to the documentation, this is the recommended way to do it, so what is happening?
Thanks for your help.
The documentation is slightly confusing.
Your dataframe is a copy of another dataframe. You can verify this by running bool(df.is_copy). You are getting the warning because you are trying to assign to this copy.
The warning/documentation is telling you how you should have constructed df in the first place, not how you should assign to it now that it is a copy.
df = some_other_df[cols]
will make df a copy of some_other_df. The warning suggests doing this instead:
df = some_other_df.loc[:, cols]
Now that the copy already exists, if you choose to ignore this warning, you could do
df = df.copy()
or
df.is_copy = None
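Applied to the groupby loop from the question, the same idea might look like the sketch below. It assumes each group only needs to be modified on its own, independently of s; the seeded generator and the extended_groups dict are illustrative names, not part of the original code.

```python
import numpy as np
import pandas as pd

# Seeded generator so the sketch is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
s = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))

extended_groups = {}  # hypothetical container for the modified groups
for key, group in s.groupby("A"):
    # Make the copy explicit so the assignment clearly targets a new frame.
    group = group.copy()
    group.loc[-1] = [10, np.nan, np.nan, 10]
    extended_groups[key] = group
```

Each entry in extended_groups then carries the extra row, while s itself is left unchanged.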
Related
I am trying first to slice some columns from the original dataframe and then add an additional column 'INDEX' as the last column.
df = df.iloc[:, np.r_[10:17]] #col 0~6
df['INDEX'] = df.index #col 7
The second line produces the message 'A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead'
Why am I seeing this and how should I solve it?
I would do
df.loc[:,'INDEX'] = df.index
By default, slicing a dataframe in pandas may return a view of the original data rather than an independent copy. An assignment to that slice may therefore act on the original dataframe or on a temporary copy, and that ambiguity is exactly what the message is pointing out.
Either of the following will make the Python interpreter happy 😃:
df = df.iloc[:, np.r_[10:17]].copy()
or
df.loc[:, ['INDEX']] = df.index
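A minimal sketch of the .copy() variant follows. The 20-column frame named wide is made up for illustration, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the original dataframe: 3 rows, 20 columns.
wide = pd.DataFrame(np.arange(60).reshape(3, 20))

# Take columns 10..16 and copy them, so subset owns its data.
subset = wide.iloc[:, 10:17].copy()

# Adding a column now unambiguously targets subset, not wide.
subset["INDEX"] = subset.index
```

Here the plain slice 10:17 selects the same columns as np.r_[10:17] in the question.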
I tried filling the NA values of a column in a dataframe with:
df1 = data.copy()
df1.columns = data.columns.str.lower()
df2 = df1[['passangerid', 'trip_cost','class']]
df2['class'] = df2['class'].fillna(0)
df2
However, I am getting this warning:
:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
df2['class'] = df2['class'].fillna(0, axis = 0)
Can someone please help?
First of all I'd advise you to follow the warning message and read up on the caveats in the provided link.
You're getting this warning (not an error) because your df2 is a slice of your df1, not a separate DataFrame.
To avoid getting this warning, you can use the .copy() method:
df2 = df1[['passangerid', 'trip_cost','class']].copy()
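Put together, the fix could look like the sketch below. The contents of data are invented for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real `data` is not shown in the question.
data = pd.DataFrame({
    "PassangerId": [1, 2, 3],
    "Trip_Cost": [10.0, 20.0, 30.0],
    "Class": [1.0, np.nan, 3.0],
})

df1 = data.copy()
df1.columns = data.columns.str.lower()

# Copy the column subset so df2 is independent of df1.
df2 = df1[["passangerid", "trip_cost", "class"]].copy()
df2["class"] = df2["class"].fillna(0)
```

After this, df2 has its missing class values filled while df1 still contains the NaN.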
I have something like this,
df1 = ...
df1['NEW_COLUMN'] = df1['SOME_COLUMN'].apply(lambda x: ...)
Although this works and the column 'NEW_COLUMN' is added to the dataframe, I get the following annoying warning. Why? And what is the solution?
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you simply want to avoid the warning, you can turn it off in the pandas options. If you understand what the warning means and why it is raised, you can ignore it by adding this after importing pandas:
pd.options.mode.chained_assignment = None
Add copy() to avoid getting this warning
df = pd.DataFrame({"Value" : [0.12,0.22,0.32,0.11,0.54,0.55,0.98]})
df['Category'] = df.Value.apply(lambda x: 'Neg' if x < 0.5 else 'Pos').copy()
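The warning in the question usually appears when df1 was itself obtained by slicing another dataframe, so one option is to copy at that point. A minimal sketch, assuming a hypothetical parent frame raw and an arbitrary filter:

```python
import pandas as pd

# Hypothetical parent frame; `raw` and the 0.2 threshold are illustrative only.
raw = pd.DataFrame({"Value": [0.12, 0.22, 0.32, 0.11, 0.54, 0.55, 0.98]})

# Copy when the subset is created, so df1 is an independent dataframe.
df1 = raw[raw["Value"] > 0.2].copy()

# Adding a column to df1 no longer touches (or appears to touch) raw.
df1["Category"] = df1["Value"].apply(lambda x: "Neg" if x < 0.5 else "Pos")
```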
I want to create a function in which part of the code modifies an existing pandas dataframe df, and under some conditions df should be emptied. The challenge is that this function is not allowed to return the dataframe itself; it can only modify df through the alias passed in. An example of this is the following function:
import pandas as pd
import random
def random_df_modifier(df):
    letter_lst = list('abc')
    message_lst = [f'random {i}' for i in range(len(letter_lst) - 1)] + ['BOOM']
    chosen_tup = random.choice(list(zip(letter_lst, message_lst)))
    df[chosen_tup[0]] = chosen_tup[1]
    if chosen_tup[0] == letter_lst[-1]:
        print('Game over')
        df = pd.DataFrame()#<--this line won't work as intended
    return chosen_tup
testing_df = pd.DataFrame({'col1': [True, False]})
print(random_df_modifier(testing_df))
I am aware that df = pd.DataFrame() won't work because the local name df is then bound to the new pd.DataFrame() instead of the input dataframe. So is there any way to change df in place to an empty dataframe?
Thank you in advance
EDIT1: df.drop(df.index, inplace=True) seems to work as intended, but I am not sure about its efficiency, because df.drop() may suffer from performance issues when the dataframe is big enough (by big enough I mean 1mil+ total entries).
df = pd.DataFrame(columns=df.columns)
will empty a dataframe in pandas (and be way faster than using the drop method).
I believe that is what you're asking.
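Note that inside a function, an assignment such as df = ... only rebinds the local name, so the caller's dataframe is unaffected. The sketch below contrasts that with the in-place drop mentioned in the question's edit; the function names are illustrative, and no claim is made here about relative speed.

```python
import pandas as pd

def rebind_only(df):
    # Rebinds the local name only; the caller's frame is untouched.
    df = pd.DataFrame(columns=df.columns)

def empty_in_place(df):
    # Removes all rows, then all columns, on the object the caller passed in.
    df.drop(index=df.index, inplace=True)
    df.drop(columns=df.columns, inplace=True)

first = pd.DataFrame({"col1": [True, False]})
rebind_only(first)
shape_after_rebind = first.shape

second = pd.DataFrame({"col1": [True, False]})
empty_in_place(second)
shape_after_drop = second.shape
```

If the column labels should survive, the second drop call can simply be left out.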
I'm fairly new to pandas, and was getting the infamous SettingWithCopyWarning in a large piece of code. I boiled it down to the following:
import pandas as pd
df = pd.DataFrame([[0,3],[3,3],[3,1],[1,1]], columns=list('AB'))
df
df = df.loc[(df.A>1) & (df.B>1)]
df['B'] = 10
When I run this I get the warning:
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
The strange thing is that if I leave off the "df" line it runs without a warning. Is this intended behavior?
In general, if I want to filter a DataFrame by the values across various columns, do I need to do a copy() to avoid the SettingWithCopyWarning?
Thanks very much.
Assuming your DataFrame is as shown below (from your question), this will avoid the SettingWithCopyWarning.
There is a GitHub discussion about this, with a solution suggested by Jeff, one of the pandas developers :)
df
   A  B
1  3  3
It is best to do it this way:
df['B'] = df['B'].replace(3, 10)
df
   A   B
1  3  10
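For the general question about filtering and then assigning, two common patterns are sketched below. Which one fits depends on whether the original frame should change; the variable names mask and filtered are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[0, 3], [3, 3], [3, 1], [1, 1]], columns=list("AB"))
mask = (df.A > 1) & (df.B > 1)

# Pattern 1: work on an independent copy of the filtered rows.
filtered = df.loc[mask].copy()
filtered["B"] = 10

# Pattern 2: update the matching rows of the original frame in one .loc call.
df.loc[mask, "B"] = 10
```

Pattern 1 leaves df as it was at that point, while pattern 2 changes only the rows selected by mask.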