I am trying to slice a dataframe in pandas and save it to another dataframe. Let's say df is an existing dataframe and I want to slice a certain section of it into df1. However, I am getting this warning:
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
I have checked various SO posts discussing similar issues (two of them follow):
Setting with copy warning
How to deal with SettingWithCopyWarning in Pandas?
Going through these links, I was able to trace the issue to the following lines:
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key][df[key].Consumption_Category=='GG']
which I then changed to the following:
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key].loc[df['2010'].iloc[:,0]=='GG']
and the warning went away.
However, when I added another line to drop a column from this dataframe, I got the warning again, and I have been unable to sort it out.
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key].loc[df['2010'].iloc[:,0]=='GG']
    df1['GG'][key]=df1['GG'][key].drop(['Consumption_Category'],axis=1,inplace=True)
Finally, after a lot of research and reading the pandas documentation, I found the answer to my question. The warning appeared because I had passed inplace=True to drop(). With inplace=True, drop() modifies the frame it is called on and returns None, so the assignment was also storing None. I removed inplace=True and saved the result into a new dataframe. Now I do not get any warning.
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key].loc[df['2010'].iloc[:,0]=='GG']
    df1['GG'][key]=df1['GG'][key].drop(['Consumption_Category'],axis=1)
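For reference, here is a minimal sketch of the same idea on made-up data. It assumes df is a dict of per-year DataFrames, which the post does not actually state, and the Value column and its numbers are invented for illustration. The slice is copied explicitly and drop() is used without inplace=True, so its return value is the new frame.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: one DataFrame per year
df = {
    year: pd.DataFrame({
        "Consumption_Category": ["GG", "HH", "GG"],
        "Value": [1.0, 2.0, 3.0],
    })
    for year in ["2010", "2011"]
}

df1 = {"GG": {}}
for key in df:
    # Build the boolean mask from the same year's frame
    mask = df[key]["Consumption_Category"] == "GG"
    # .copy() makes the slice independent of df[key];
    # drop() without inplace=True returns a new DataFrame
    subset = df[key].loc[mask].copy()
    df1["GG"][key] = subset.drop(columns=["Consumption_Category"])
```

The source frames in df keep their Consumption_Category column; only the stored slices lose it.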
I am using pandas 1.0.1. I am creating a new column by converting the Date column to datetime, and I get the warning below. I also tried data.loc[:, "Datetime"] and still got the same warning. How can this be avoided?
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
data["Datetime"] = pd.to_datetime(data["Date"], infer_datetime_format=True)
Most likely you created your source DataFrame as a view of another
DataFrame (only some columns and / or only some rows).
Find in your code the place where your DataFrame is created and append .copy() there.
Then your DataFrame will be created as a fully independent DataFrame (with its
own data buffer) and this warning should not appear any more.
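A minimal sketch of that advice, assuming the DataFrame was produced by filtering a larger one. The names raw and Price, and the filter itself, are made up for illustration; the question does not show how data was built.

```python
import pandas as pd

# Hypothetical original frame that "data" is derived from
raw = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "Price": [10.0, 0.0, 12.5],
})

# Filtering yields a derived frame; .copy() makes it fully independent
data = raw[raw["Price"] > 0].copy()

# Adding a column now clearly targets "data" and nothing else
data["Datetime"] = pd.to_datetime(data["Date"])
```

The new column exists only on data; raw is left untouched.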
New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem: Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column of type string. But as you can see, all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?
df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
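To see the difference, here is a small sketch on made-up numbers (the three values are invented; only the column name comes from the question):

```python
import pandas as pd

df = pd.DataFrame({"Total Due": [30.0, 10.0, 20.0]})

# The sorted result is returned and then discarded; df itself is untouched
df.sort_values("Total Due")
unchanged = df["Total Due"].tolist()

# Rebinding the name keeps the sorted result
df = df.sort_values("Total Due")
```

Note that the original index labels travel with their rows; pass ignore_index=True or call reset_index(drop=True) if you want a fresh 0..n-1 index after sorting.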
My problem, fyi, was that I wasn't returning the resulting dataframe, so PyCharm wasn't bothering to update said dataframe. Naming the dataframe after the return keyword fixed the issue.
Edit:
I had a bare return at the end of my method instead of
return df
which the debugger must have noticed, because df wasn't being updated in spite of my explicit, in-place sort.
I'm trying to change values in my DataFrame after merging it with another DataFrame and coming across some issues (doesn't appear to be an issue prior to merging).
I am indexing and changing values in my DataFrame with:
df.iloc[0]['column'] = 1
Subsequently I've joined (left outer join) along both indexes using merge (I realize left.join(right) would work too). After this when I perform the same value assignment using iloc, I receive the following warning:
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Reviewing the linked documentation didn't clear this up for me. Am I using an incorrect method of slicing with iloc? (Keep in mind that I need position-based slicing for my code.)
I notice that df.ix[0,'column'] = 1 works, and similarly based on this page I can reference the column location with df.columns.get_loc('column') but on the surface this seems unnecessarily convoluted.
What's the difference between these methods under the hood, and what about merging causes the previous method (df.iloc[0]['column']) to break?
You are using chained indexing above (df.iloc[0]['column'] = 1). This should be avoided, and it is what generates the SettingWithCopyWarning you are getting. The pandas docs are a bit complicated, but see SettingWithCopy Warning with chained indexing for the under-the-hood explanation of why this does not work.
Instead, use a single indexing call such as df.loc[0, 'column'] = 1. Note that this addresses the row by its index label, so it hits the first row only when that row's label is 0.
.loc is for "Access a group of rows and columns by label(s) or a boolean array."
.iloc is for "Purely integer-location based indexing for selection by position."
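The two only differ visibly when the index labels do not match the row positions. A small sketch with a deliberately shuffled index (the data is made up):

```python
import pandas as pd

# Index labels are intentionally out of order
df = pd.DataFrame({"column": [10, 20, 30]}, index=[2, 0, 1])

by_label = df.loc[0, "column"]   # the row whose index label is 0
by_position = df.iloc[0, 0]      # the first row, first column
```

Here by_label is 20 and by_position is 10, because the row labelled 0 is the second row.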
It sucks, but the best solution I've found so far for updating a dataframe's column by position is to find the integer position of the column, then use .iloc for everything:
import numpy as np
column_i_loc = np.where(df.columns == 'column')[0][0]
df.iloc[0, column_i_loc] = 1
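The question itself mentions df.columns.get_loc, which gives the same column position without NumPy. A sketch on a made-up frame, assuming the column names are unique (get_loc returns a plain integer only in that case):

```python
import pandas as pd

df = pd.DataFrame(
    {"other": [0, 0, 0], "column": [5, 6, 7]},
    index=["a", "b", "c"],
)

# Integer position of the column, assuming unique column names
col_pos = df.columns.get_loc("column")

# One .iloc call with both positions: no chained indexing involved
df.iloc[0, col_pos] = 1
```

Because row and column are resolved in a single indexing call, the assignment goes straight to df.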
Note that you could also disable the warning, but you really should not.
Also, if you see this warning and were not trying to update the original DataFrame, then you forgot to make a copy, which can lead to a nasty bug.
I'm trying to modify a single "cell" in a dataframe. Now, modification works, but I get this warning:
In [131]: df.loc[df['Access date'] == '06/01/2016 00:35:34', 'Title'] = 'XXXXXXXX'
ipython:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
Per Pandas: Replacing column values in dataframe I am using .loc method, yet I get this warning (I don't see a copy of dataframe that I'm supposedly modifying anywhere here)
Should this warning happen here? If not, how do I disable it?
UPDATE
It seems that df is a (weakref) copy of another dataframe (checked with .is_copy).
That link in the warning addresses the issue in detail under the section: Why does assignment fail when using chained indexing?
Summary of the section: pandas cannot guarantee whether an indexing operation returns a view or a copy, so the warning is there, even for some uses of .loc, to tell you that the assignment may be applied to a temporary copy and never reach the original data.
To turn off warnings, you can use the warnings library and run the following code in one of your IPython notebook cells. Note that this silences every warning for the rest of the session, not just this one.
import warnings
warnings.simplefilter("ignore")
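If you only want to silence warnings around one statement, warnings.catch_warnings is meant to be used as a context manager. A minimal sketch, where noisy() is a hypothetical stand-in for whatever call emits the warning:

```python
import warnings

def noisy():
    # Hypothetical stand-in for a call that emits a warning
    warnings.warn("example warning", UserWarning)
    return 42

# Filters changed inside the block are restored when it exits
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    result = noisy()  # the warning is suppressed here only
```

Outside the with block the previous warning filters apply again, so other warnings stay visible.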