Add new column to dataframe - SettingWithCopyWarning - python

I have a pandas DataFrame (pandas version '0.24.2') and a list of the same length.
I want to add this list as a column to the dataframe.
I do this:
df.loc[:, 'new_column'] = pd.Series(my_List, index=df.index)
but I receive this warning:
.../anaconda/lib/python3.7/site-packages/pandas/core/indexing.py:362: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
.../anaconda/lib/python3.7/site-packages/pandas/core/indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
Am I doing something wrong?
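The question does not show how df was created, so this is only a guess at the cause: the warning usually appears when df was itself obtained by filtering or slicing another DataFrame. A minimal sketch under that assumption, with hypothetical data (parent and my_list are made up for illustration), takes an explicit copy before adding the column:

```python
import pandas as pd

# Hypothetical data: the question does not show how df was built.
parent = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})
my_list = ["x", "y"]

# An explicit copy makes df an independent DataFrame, so adding a column
# cannot be mistaken for a write into a slice of parent.
df = parent[parent["a"] > 2].copy()

# A plain list of matching length is enough; wrapping it in a Series is optional.
df["new_column"] = my_list

print(df)
```

If df really is a fresh DataFrame and not a slice of anything, the original assignment should work without the copy.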

Related

"A value is trying to be set on a copy of a slice from a DataFrame" Warning while creating a new column

After a simple operation of creating a new column by combining two string columns, I got a warning message.
Should I manage it somehow?
df['City-State'] = df['City'] + '-' + df['State[c]']
C:\Users\Lenovo\AppData\Local\Temp\ipykernel_3976\1005561990.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
UPDATE
I tried using .iloc when creating the new column, following the warning's suggestion:
Option 1:
df['City-State'] = df[combine_city_state].iloc[:].apply(lambda row: '-'.join(row.values.astype(str)), axis=1)
Option 2:
df['City-State'] = df['City'].iloc[:] + '-' + df['State[c]'].iloc[:]
The warning remained.
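The warning concerns the left-hand side of the assignment (df itself), so changing how the right-hand side is indexed would not be expected to silence it. One possible way around it, sketched here on the assumption that df is a filtered subset of a larger frame (source and the Population column are hypothetical), is DataFrame.assign, which returns a new DataFrame and does not write into the existing one:

```python
import pandas as pd

# Hypothetical data; only the 'City' and 'State[c]' names come from the question.
source = pd.DataFrame({
    "City": ["Austin", "Boston", "Denver"],
    "State[c]": ["TX", "MA", "CO"],
    "Population": [1, 2, 3],
})

# Assumed stand-in for however df was derived in the question.
df = source[source["Population"] > 1]

# assign() builds and returns a new DataFrame with the extra column.
# The column name contains a hyphen, so it is passed via dict unpacking.
df = df.assign(**{"City-State": df["City"] + "-" + df["State[c]"]})

print(df)
```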

Discretize all the columns in a dataframe python

I have a dataframe where all the columns are continuous variables, and I want to discretize them into bins based on frequency (so that each bin holds the same number of rows).
To do this I apply the pd.qcut function while iterating through the columns, but I get the following warnings:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_q[column] = pd.qcut(df_q[column], 3)
<ipython-input-46-87e2efb9d039>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df_q[column] = pd.qcut(df_q[column], 3)
<ipython-input-46-87e2efb9d039>:3: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
Here is a reproducible example:
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt
# Load data
data = datasets.load_breast_cancer()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Remove categorical variable and bin
df_q = df.loc[:, df.columns != "target"]
for column in df_q:
    df_q[column] = pd.qcut(df_q[column], 3)
I do not get any error or warning message when running your code. Try making a copy of df before creating df_q:
df2 = df.copy()
df_q = df2.loc[:, df2.columns != "target"]
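A variation on the same idea is to call .copy() on the selection itself, so that df_q is explicitly independent of df. The sketch below uses synthetic data instead of the breast cancer dataset so that it runs without scikit-learn; the column names f1, f2 and target are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset in the question (hypothetical columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(30, 3)), columns=["f1", "f2", "target"])

# Copy the selection so df_q owns its data.
df_q = df.loc[:, df.columns != "target"].copy()

# Replace each column with three equal-frequency bins.
for column in df_q:
    df_q[column] = pd.qcut(df_q[column], 3)

print(df_q.head())
```

With 30 distinct values per column, each of the three bins ends up with 10 rows, and df keeps its original float columns.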

fillna and copy of a slice problem even after .loc

I am trying to fill the NaN values in a column of a dataframe like this:
df['temp'] = df['temp'].fillna(method='ffill')
and I am getting,
var/folders/qp/lp_5yt3s65q_pj__6v_kdvnh0000gn/T/ipykernel_10842/2929940072.py:1: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['temp'] = df['temp'].fillna(method='ffill')
I revised the code to the following, but I am still getting the same warning. Do you have any suggestions?
df.loc[:,'temp'] = df['temp'].fillna(method='ffill')
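Switching to .loc on the same df would not be expected to help if df was itself created by filtering another DataFrame, which is an assumption here since the question does not show where df comes from. A minimal sketch with hypothetical data (raw and the sensor column are made up) copies the subset first and uses Series.ffill(), the shorthand for a forward fill:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the 'temp' column name comes from the question.
raw = pd.DataFrame({
    "sensor": ["a", "a", "b", "b", "b"],
    "temp": [20.5, np.nan, 18.0, np.nan, np.nan],
})

# Assumed origin of df: a filtered subset, copied so it is independent of raw.
df = raw[raw["sensor"] == "b"].copy()

# Forward-fill missing values within the copied subset.
df["temp"] = df["temp"].ffill()

print(df)
```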

Getting rid of Error from Fillna in pandas

I tried filling the NA values of a column in a dataframe with:
df1 = data.copy()
df1.columns = data.columns.str.lower()
df2 = df1[['passangerid', 'trip_cost','class']]
df2['class'] = df2['class'].fillna(0)
df2
But I get this warning:
:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
df2['class'] = df2['class'].fillna(0, axis = 0)
Can someone please help?
First of all I'd advise you to follow the warning message and read up on the caveats in the provided link.
You're getting this warning (not an error) because your df2 is a slice of your df1, not a separate DataFrame.
To avoid the warning, make df2 an explicit copy with the .copy() method:
df2 = df1[['passangerid', 'trip_cost','class']].copy()
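To illustrate what the copy buys you, here is a small sketch with made-up values (the notes column and all numbers are hypothetical): after .copy(), filling df2 leaves df1 untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1 from the question.
df1 = pd.DataFrame({
    "passangerid": [1, 2, 3],
    "trip_cost": [100.0, 250.0, 80.0],
    "class": [1.0, np.nan, 3.0],
    "notes": ["a", "b", "c"],
})

# Select the columns of interest and copy them into an independent DataFrame.
df2 = df1[["passangerid", "trip_cost", "class"]].copy()
df2["class"] = df2["class"].fillna(0)

print(df2["class"].tolist())      # filled values in the copy
print(df1["class"].isna().sum())  # the original still has its missing value
```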

pandas error using df.astype

I'm just trying to convert a column of numeric strings to ints. This is what I'm trying:
df.date = df.date.astype(np.int64)
But I'm getting the warning:
/Users/austin/anaconda/lib/python3.5/site-packages/pandas/core/generic.py:2773:
SettingWithCopyWarning: A value is trying to be set on a copy of a
slice from a DataFrame. Try using .loc[row_indexer,col_indexer] =
value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[name] = value
Not sure what this means. I also tried:
df.date = df.date.apply(int)
And I get the same warning as above.
Why doesn't this work and what's the proper way?
astype returns a new object and does not convert in place, so you need to assign the result:
date = date.astype(int)
x = pd.DataFrame(['20.1','19.1','12.3'])
# convert_objects has been removed from pandas; pd.to_numeric does the same job here
pd.to_numeric(x[0])
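For the original question, a sketch along these lines may help. It assumes df was a subset of a larger frame (raw and the keep column are hypothetical), copies it, and assigns with bracket syntax; pd.to_numeric with errors="coerce" is shown as an option when some strings might not be numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the 'date' column name comes from the question.
raw = pd.DataFrame({
    "date": ["20160101", "20160102", "20160103"],
    "keep": [True, False, True],
})

# Copy the subset so the conversion is applied to an independent DataFrame.
df = raw[raw["keep"]].copy()
df["date"] = df["date"].astype(np.int64)

# If some entries may not be numeric, to_numeric can turn them into NaN.
messy = pd.Series(["1", "2", "not a number"])
cleaned = pd.to_numeric(messy, errors="coerce")

print(df.dtypes)
print(cleaned)
```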
