How to avoid getting the SettingWithCopyWarning with pandas to_datetime method - python

I am using pandas 1.0.1. I am creating a new column that converts the Date column to datetime, and I get the warning below. I also tried data.loc[:, "Datetime"] and still got the same warning. How can this be avoided?
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
data["Datetime"] = pd.to_datetime(data["Date"], infer_datetime_format=True)

Most likely you created your source DataFrame as a slice of another
DataFrame (only some columns and/or only some rows).
Find the place in your code where your DataFrame is created and append .copy() there.
Your DataFrame will then be a fully independent DataFrame (with its
own data buffer) and this warning should no longer appear.
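A minimal sketch of that advice, assuming the DataFrame was produced by filtering a larger one. The names raw and Value are hypothetical and not from the question, and an explicit format is passed instead of infer_datetime_format:

```python
import pandas as pd

# Hypothetical source data; `raw` and the `Value` column are assumptions.
raw = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "Value": [1, -2, 3],
})

# Filtering returns a subset of `raw`; .copy() makes it an independent DataFrame.
data = raw[raw["Value"] > 0].copy()

# Adding a column to the independent copy should not trigger the warning.
data["Datetime"] = pd.to_datetime(data["Date"], format="%Y-%m-%d")
```

The original raw DataFrame is left untouched; only data gains the new column.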

Related

Pandas: .loc not assigning row of one data frame to a row of a slice of another dataframe

I have a dataframe as below from which I take a slice called NDCSPart_df using NDCSPart_df = Register_df.iloc[:, :17]
This NDCSPart_df needs to be updated by the latest dataframe NOTES_df of same column length but some with different values, and the same or larger number of rows.
I compare a row of NDCSPart_df and NOTES_df using the "MainDocID" to identify any changes and if there are any changes, the row in NDCSPart_df will be assigned the value of the row with the same "MainDocID" in NOTES_df.
for i in ChangedDocumentIDDict.keys():
    NDCSPart_df.loc[NDCSPart_df["MainDocID"]==i,:].update(NOTES_df.loc[NOTES_df["MainDocID"]==i,:])
which gives me the following warning,
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py:5516:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[col] = expressions.where(mask, this, that)
Likewise I tried the following code:
for i in ChangedDocumentIDDict.keys():
    NDCSPart_df.loc[NDCSPart_df["MainDocID"]==i,:]= NOTES_df.loc[NOTES_df["MainDocID"]==i,:]
with similar warning:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:190:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:3:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
This is separate from the ipykernel package so we can avoid doing
imports until
But my main concern is that the assignment fails: the row is populated with NaN values when it should hold the values of the row at index 78 of NOTES_df, as indicated in the second snapshot.
I am using Python 3.7.3, pandas 0.24.2 and I have tried Python 3.6.6, pandas 0.23.4 with the same results.
My question is:
How am I using .loc incorrectly?
How can I assign the rows of NOTES_df to NDCSPart_df?
The problem is that, after filtering with the condition, the two DataFrames have different indexes. pandas aligns the right-hand side on index labels, and labels that do not match produce NaN. Adding .values assigns the data by position instead:
for i in ChangedDocumentIDDict.keys():
    NDCSPart_df.loc[NDCSPart_df["MainDocID"]==i,:]= NOTES_df.loc[NOTES_df["MainDocID"]==i,:].values
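The index-alignment effect can be reproduced on a small example. This is only a sketch: target and source are hypothetical stand-ins for NDCSPart_df and NOTES_df, and .to_numpy() is used in place of .values, which should behave the same way here.

```python
import pandas as pd

# Hypothetical stand-ins for NDCSPart_df and NOTES_df with different index labels.
target = pd.DataFrame(
    {"MainDocID": ["A", "B"], "Note": ["old A", "old B"]}, index=[0, 1]
)
source = pd.DataFrame(
    {"MainDocID": ["B", "A"], "Note": ["new B", "new A"]}, index=[10, 11]
)

# DataFrame on the right-hand side: pandas aligns on index labels.
# The matching source row has label 11, the target row has label 0,
# so nothing lines up and the target row is filled with NaN.
aligned = target.copy()
aligned.loc[aligned["MainDocID"] == "A", :] = source.loc[source["MainDocID"] == "A", :]

# NumPy array on the right-hand side: no labels, so values are assigned by position.
fixed = target.copy()
fixed.loc[fixed["MainDocID"] == "A", :] = source.loc[
    source["MainDocID"] == "A", :
].to_numpy()
```

Positional assignment relies on both sides selecting the same number of rows with columns in the same order, which the question states is the case.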

SettingWithCopyWarning after using copy()

I have code as below.
import pandas as pd
import numpy as np
data = [['Alex',10,5,0],['Bob',12,4,1],['Clarke',13,6,0],['brke',15,1,0]]
df = pd.DataFrame(data,columns=['Name','Age','weight','class'],dtype=float)
df_numeric=df.select_dtypes(include='number')#, exclude=None)[source]
df_non_numeric=df.select_dtypes(exclude='number')
df_non_numeric['class']=df_numeric['class'].copy()
it gives me below message
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I want df_non_numeric to be independent of df_numeric.
I used df_numeric['class'].copy() based on suggestions given in other posts.
How can I avoid the message?
I think you need copy, because DataFrame.select_dtypes is a slicing operation that filters columns by type:
df_numeric=df.select_dtypes(include='number').copy()
df_non_numeric=df.select_dtypes(exclude='number').copy()
If you modify values in df_non_numeric later, the modifications do not propagate back to the original data (df). Without .copy(), pandas issues the warning because it cannot tell whether you meant to change df as well.
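A self-contained sketch of the suggested fix, using the data from the question. The dtype=float argument is omitted here as an assumption on my part, since the Name column is not numeric:

```python
import pandas as pd

data = [["Alex", 10, 5, 0], ["Bob", 12, 4, 1], ["Clarke", 13, 6, 0], ["brke", 15, 1, 0]]
df = pd.DataFrame(data, columns=["Name", "Age", "weight", "class"])

# .copy() makes each selection independent of df.
df_numeric = df.select_dtypes(include="number").copy()
df_non_numeric = df.select_dtypes(exclude="number").copy()

# Adding a column to the independent copy should not trigger the warning.
df_non_numeric["class"] = df_numeric["class"]

# Later edits stay local to the copy and do not reach df.
df_non_numeric.loc[0, "Name"] = "Alexander"
```

With the copies in place, the extra .copy() on df_numeric['class'] from the question is no longer needed to silence the warning.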

Pandas Dataframe Sorting by Date Time [duplicate]

New to Pandas, so maybe I'm missing a big idea?
I have a Pandas DataFrame of register transactions with shape like (500,4):
Time datetime64[ns]
Net Total float64
Tax float64
Total Due float64
I'm working through my code in a Python3 Jupyter notebook. I can't get past sorting any column. Working through the different code examples for sort, I'm not seeing the output reorder when I inspect the df. So, I've reduced the problem to trying to order just one column:
df.sort_values(by='Time')
# OR
df.sort_values(['Total Due'])
# OR
df.sort_values(['Time'], ascending=True)
No matter which column title, or which boolean argument I use, the displayed results never change order.
Thinking it could be a Jupyter thing, I've previewed the results using print(df), df.head(), and HTML(df.to_html()) (the last example is for Jupyter notebooks). I've also rerun the whole notebook from import CSV to this code. And, I'm also new to Python3 (from 2.7), so I get stuck with that sometimes, but I don't see how that's relevant in this case.
Another post has a similar problem, Python pandas dataframe sort_values does not work. In that instance, the ordering was on a column type string. But as you can see all of the columns here are unambiguously sortable.
Why does my Pandas DataFrame not display new order using sort_values?
df.sort_values(['Total Due']) returns a sorted DF, but it doesn't update DF in place.
So do it explicitly:
df = df.sort_values(['Total Due'])
or
df.sort_values(['Total Due'], inplace=True)
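A small sketch of the difference, with made-up values for two of the columns named in the question:

```python
import pandas as pd

# Made-up sample rows; only the column names come from the question.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2020-01-03", "2020-01-01", "2020-01-02"]),
    "Total Due": [30.0, 10.0, 20.0],
})

# The sorted result is returned and immediately discarded; df keeps its order.
df.sort_values("Time")
order_before = df["Total Due"].tolist()

# Reassigning keeps the sorted result.
df = df.sort_values("Time").reset_index(drop=True)
order_after = df["Total Due"].tolist()
```

The reset_index(drop=True) call is optional; it only renumbers the rows after sorting.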
My problem, FYI, was that I wasn't returning the resulting dataframe from my method, so the dataframe I inspected in PyCharm was never updated. Naming the dataframe after the return keyword fixed the issue.
Edit:
I had a bare return at the end of my method instead of
return df,
which the debugger must have picked up on, because df wasn't being updated in spite of my explicit, in-place sort.

Pandas Setting with copy warning

I am trying to slice a dataframe in pandas and save it to another dataframe. Let's say df is an existing dataframe and I want to slice a certain section of it into df1. However, I am getting this warning:
/usr/local/lib/python3.5/dist-packages/ipykernel_launcher.py:25: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
I have checked various posts in SO discussing the similar issue (two of them are following):
Setting with copy warning
How to deal with SettingWithCopyWarning in Pandas?
Going through these links, I traced the issue to the following lines:
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key][df[key].Consumption_Category=='GG']
which I then changed to the following
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key].loc[df['2010'].iloc[:,0]=='GG']
and got rid of the warning.
However, when I added another line to drop a certain column from this dataframe, I got the warning again and was unable to sort it out.
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key].loc[df['2010'].iloc[:,0]=='GG']
    df1['GG'][key]=df1['GG'][key].drop(['Consumption_Category'],axis=1,inplace=True)
Finally, after a lot of research and going through the pandas documentation, I found the answer to my question. I was getting the warning because I had put inplace=True in the drop() function. So I removed inplace=True and saved the result into a new dataframe. Now I do not get any warning.
Years=['2010','2011']
for key in Years:
    df1['GG'][key]=df[key].loc[df['2010'].iloc[:,0]=='GG']
    df1['GG'][key]=df1['GG'][key].drop(['Consumption_Category'],axis=1)
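Besides the warning, there is a second reason to avoid inplace=True when the result is assigned: drop() then returns None, so the assignment stores None rather than a DataFrame. A minimal sketch, where the kWh column and the sample rows are hypothetical:

```python
import pandas as pd

# Hypothetical data; only Consumption_Category comes from the question.
df = pd.DataFrame({"Consumption_Category": ["GG", "HH"], "kWh": [1.0, 2.0]})
subset = df.loc[df["Consumption_Category"] == "GG"].copy()

# With inplace=True, drop() modifies `scratch` and returns None.
scratch = subset.copy()
result = scratch.drop(columns=["Consumption_Category"], inplace=True)

# Without inplace, drop() returns a new DataFrame and leaves `subset` as it was.
trimmed = subset.drop(columns=["Consumption_Category"])
```

Here result is None, while trimmed holds the DataFrame without the dropped column.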

SettingWithCopyWarning Python changing column datatype in Dataframe

I have the following code:
block_table[[compared_attribute]] = block_table[[compared_attribute]].astype(int)
I want to change the datatype of a column. The code is working, but I get a warning from Python: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[k1] = value[k2]
I looked into this warning and I was reading it may be creating a copy of the dataframe, instead of just overwriting it, so I tried the following solutions with no luck...
block_table.loc[[compared_attribute]] = block_table[[compared_attribute]].astype(int)
block_table.loc[:,compared_attribute] = block_table[[compared_attribute]].astype(int)
It should be as simple as:
block_table.loc[:,compared_attribute] = block_table[compared_attribute].astype(int)
This assumes compared_attribute is a column label; if it is a row label instead, swap the colon and compared_attribute in the .loc call.
It is also hard to answer without an example of what the data and compared_attribute look like.
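Since the question does not show the data, here is a hedged sketch with made-up values. It assumes block_table was sliced from another DataFrame (called source here, a hypothetical name), which is the usual cause of this warning. Plain column assignment is used on an explicit copy because, as far as I know, recent pandas versions may write .loc[:, col] = ... into the existing column and keep its old dtype.

```python
import pandas as pd

# Hypothetical data; the column names and values are assumptions.
source = pd.DataFrame({"size": ["1", "2", "3"], "label": ["a", "b", "c"]})
compared_attribute = "size"

# An explicit copy makes block_table independent of source.
block_table = source[source["label"] != "c"].copy()

# Replace the whole column with its integer version.
block_table[compared_attribute] = block_table[compared_attribute].astype(int)
```

The source DataFrame keeps its original string column; only block_table is converted.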
