datetime conversion of a column results in pandas warning - python

I am trying to convert a column in a pandas dataframe to datetime format as follows:
df["date"] = pd.to_datetime(df["date"])
Although this works as expected, pandas gives the following warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
if sys.path[0] == '':
Is there a better way to do a datetime conversion of a pandas column that does not produce this warning?

This should get rid of the warning:
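df.loc[:, "date"] = pd.to_datetime(df["date"])
pandas discourages assigning to a slice of a DataFrame. In general, .loc is the recommended way to select and assign data; its first indexer selects rows and its second selects columns. If df was itself created by slicing another DataFrame, the warning can persist, in which case take an explicit .copy() of that slice first.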
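A minimal sketch of how .loc addresses rows versus columns, assuming a small hypothetical frame (the row labels "a" and "b" and the "value" column are made up for illustration). It only reads data, so it shows the indexer order without touching the warning itself:

```python
import pandas as pd

# Hypothetical frame with row labels "a" and "b" and a string "date" column.
df = pd.DataFrame(
    {"date": ["2021-01-05", "2021-02-10"], "value": [1, 2]},
    index=["a", "b"],
)

date_column = df.loc[:, "date"]  # all rows, one column -> Series of dates
first_row = df.loc["a"]          # one row label, all columns -> Series of fields

print(date_column.tolist())
print(first_row["value"])
```

A single label passed to .loc is treated as a row label, so a whole column needs the leading colon.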

Related

How to avoid getting the SettingWithCopyWarning with pandas to_datetime method

I am using pandas 1.0.1. I am creating a new column that converts the Date column to datetime, and I get the warning below. I also tried data.loc[:, "Datetime"] and still got the same warning. How can this be avoided?
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
data["Datetime"] = pd.to_datetime(data["Date"], infer_datetime_format=True)
Most likely you created your source DataFrame as a view of another
DataFrame (only some columns and / or only some rows).
Find in your code the place where your DataFrame is created and append .copy() there.
Then your DataFrame will be created as a fully independent DataFrame (with its
own data buffer) and this warning should not appear any more.
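The advice above can be sketched as follows, assuming a hypothetical source frame named raw with a "Price" column used for filtering (neither appears in the question). The warnings are recorded only to show that the assignment on the copied frame stays quiet:

```python
import warnings

import pandas as pd

# Hypothetical source data; only the "Date" column comes from the question.
raw = pd.DataFrame(
    {
        "Date": ["2020-01-15", "2020-02-20", "2020-03-25"],
        "Price": [10.0, 11.5, 9.8],
    }
)

# Filtering rows produces a derived frame; .copy() makes it independent of raw.
data = raw[raw["Price"] > 9.9].copy()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data["Datetime"] = pd.to_datetime(data["Date"])

copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(len(copy_warnings))
print(data.dtypes)
```

The new column exists only on data; raw is left unchanged.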

Why can't I change the values in one pandas column only?

I have a column in my data frame that contains both emails and non-email values.
With this slice I can select only the fields without an email:
df[~df['email'].str.contains('#', case=False)]['email']
But when I try to replace it with a value of my preference:
df[~df['email'].str.contains('#', case=False)]['email'] = 'No'
The column does not receive the change.
I don't get any error, just the following warning:
/home/rockstar/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
"""Entry point for launching an IPython kernel.
Also, df[~df['email'].str.contains('#',case=False)] = 'No' does change the values, but it overwrites the whole row, so I lose the data in the other columns.
Refer to the following example code.
import pandas as pd
df = pd.DataFrame({"E-mail":["abc#de", "abcde"]})
df.loc[~df['E-mail'].str.contains('#', case=False), 'E-mail'] = 'No'
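A short sketch of the same idea on a frame with more than one column, to show that the rest of each row survives. The "name" column and the sample values are hypothetical; the row mask and the column label go into one .loc call, so the assignment is made on df itself rather than on an intermediate selection:

```python
import pandas as pd

# Hypothetical data; '#' stands in for the email marker used in the question.
df = pd.DataFrame(
    {
        "name": ["Ann", "Ben", "Cid"],
        "email": ["ann#example.org", "none", "cid#example.org"],
    }
)

# Boolean mask of rows whose "email" value lacks the marker.
no_email = ~df["email"].str.contains("#", regex=False)

# Rows and column in a single .loc call: only "email" is overwritten.
df.loc[no_email, "email"] = "No"

print(df)
```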

SettingWithCopyWarning after using copy()

I have code as below.
import pandas as pd
import numpy as np
data = [['Alex',10,5,0],['Bob',12,4,1],['Clarke',13,6,0],['brke',15,1,0]]
df = pd.DataFrame(data,columns=['Name','Age','weight','class'],dtype=float)
df_numeric=df.select_dtypes(include='number')
df_non_numeric=df.select_dtypes(exclude='number')
df_non_numeric['class']=df_numeric['class'].copy()
It gives me the message below:
__main__:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I want df_non_numeric to be independent of df_numeric.
I used df_numeric['class'].copy() based on suggestions given in other posts.
How can I avoid the message?
I think you need copy() because DataFrame.select_dtypes is a slicing operation that filters columns by type (check Question 3):
df_numeric=df.select_dtypes(include='number').copy()
df_non_numeric=df.select_dtypes(exclude='number').copy()
Without copy(), if you later modify values in df_non_numeric, the modifications do not propagate back to the original data (df), and pandas issues the warning.
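A minimal sketch of the copy() approach, using a cut-down version of the question's data. The value 99.0 is arbitrary and only serves to check that a later edit stays local to df_non_numeric:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alex", "Bob"], "Age": [10.0, 12.0], "class": [0.0, 1.0]}
)

# Each selection is copied, so both frames own their data.
df_numeric = df.select_dtypes(include="number").copy()
df_non_numeric = df.select_dtypes(exclude="number").copy()

# Add the column to the independent frame, then change one value in it.
df_non_numeric["class"] = df_numeric["class"].copy()
df_non_numeric.loc[0, "class"] = 99.0

print(df_non_numeric)
print(df.loc[0, "class"], df_numeric.loc[0, "class"])
```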

SettingWithCopyWarning Python changing column datatype in Dataframe

I have the following code:
block_table[[compared_attribute]] = block_table[[compared_attribute]].astype(int)
I want to change the datatype of a column. The code is working, but I get a warning from Python: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[k1] = value[k2]
I looked into this warning and I was reading it may be creating a copy of the dataframe, instead of just overwriting it, so I tried the following solutions with no luck...
block_table.loc[[compared_attribute]] = block_table[[compared_attribute]].astype(int)
block_table.loc[:,compared_attribute] = block_table[[compared_attribute]].astype(int)
It should be as simple as:
block_table.loc[:,compared_attribute] = block_table[compared_attribute].astype(int)
This assumes compared_attribute is a column label; if it is a row label instead, swap the colon and compared_attribute in the .loc call.
It is also hard to answer without an example of what the data and compared_attribute look like.
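Since the question does not show the data, here is a sketch under stated assumptions: block_table is an independent frame and compared_attribute is a single column label (the "score" column and its values are hypothetical). It shows the difference between double and single brackets, and uses plain column assignment for the conversion:

```python
import pandas as pd

# Hypothetical data; compared_attribute is assumed to be one column label.
block_table = pd.DataFrame({"id": ["a", "b", "c"], "score": [1.0, 2.0, 3.0]})
compared_attribute = "score"

as_frame = block_table[[compared_attribute]]  # double brackets -> DataFrame
as_series = block_table[compared_attribute]   # single brackets -> Series

# Replace the column with its integer version.
block_table[compared_attribute] = as_series.astype(int)

print(type(as_frame).__name__, type(as_series).__name__)
print(block_table.dtypes)
```

If block_table was produced by filtering another frame, taking a .copy() first is what keeps the warning away.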

How to read_excel with a dayfirst condition?

I'm trying to read an Excel file with pandas read_excel. I have a date column in the format DD/MM/YYYY. Pandas automatically reads this as month first and, as far as I've been able to tell, there is no dayfirst option like there is with read_csv.
Is there a way to do read_excel while specifying date format?
xlxs_data = pd.DataFrame()
df = pd.read_excel('new.xlsx')
xlsx_data = xlxs_data.append(df, ignore_index=True, dayfirst=True)
TypeError: append() got an unexpected keyword argument 'dayfirst'
The dayfirst parameter does not work with read_excel in version 1.1.4 of pandas. The docs state "For non-standard datetime parsing, use pd.to_datetime after pd.read_excel."
So read in your data:
df = pd.read_excel('new.xlsx', engine="openpyxl")
Then use this:
df['col_name'] = pd.to_datetime(df['col_name'], dayfirst=True)
Or this:
df['col_name'] = pd.to_datetime(df['col_name'], format='%d/%m/%Y')
Some info on format codes can be found here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
Remember that pandas displays dates in ISO format (YYYY-MM-DD). To show a different format you need to convert the datetime values to strings, but doing so loses all the datetime functionality, so it is best done only at export time.
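The two parsing options can be tried without an Excel file. This sketch assumes the column holds DD/MM/YYYY strings (the sample dates are made up) and converts back to strings only at the end, as one would for export:

```python
import pandas as pd

# Hypothetical column of day-first date strings.
df = pd.DataFrame({"col_name": ["03/02/2021", "25/12/2021"]})

by_flag = pd.to_datetime(df["col_name"], dayfirst=True)
by_format = pd.to_datetime(df["col_name"], format="%d/%m/%Y")

# Both read the first value as 3 February 2021, not 2 March 2021.
print(by_flag.iloc[0], by_format.iloc[0])

# Format as strings only when a display or export format is needed.
exported = by_format.dt.strftime("%d/%m/%Y")
print(exported.tolist())
```

An explicit format is the stricter choice: values that do not match it raise an error instead of being guessed.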
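Alternatively, you can pass dayfirst=True as a parameter to read_excel. Although the docs do not list it as a recognised parameter, read_excel accepts kwargs, and this will resolve your problem (this may depend on the pandas version, since the answer above reports that it does not work in 1.1.4):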
df = pd.read_excel('new.xlsx', dayfirst=True)
