I'm trying to lowercase and strip a column in Python 3 using pandas, but I keep getting the warning below. What is the right way to do this so the warning does not come up?
df["col1"] = df[["col1"]].apply(lambda x: x.str.strip())
df["col1"] = df[["col1"]].apply(lambda x: x.str.lower())
The warning
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[k1] = value[k2]
How can I get rid of the warning?
To get rid of this warning, apply the string methods to a Series instead of a DataFrame. Using df[["col1"]] creates a new one-column DataFrame that you then assign back to the column. If you operate on the column itself (df["col1"]) instead, it will be fine. I have also chained the two operations together:
df["col1"] = df["col1"].str.strip().str.lower()
I am trying to first slice some columns from the original DataFrame and then add an additional column 'INDEX' as the last column.
df = df.iloc[:, np.r_[10:17]] #col 0~6
df['INDEX'] = df.index #col 7
On the second line I get this message: 'A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead'
Why am I seeing this and how should I solve it?
I would do
df.loc[:,'INDEX'] = df.index
By default, pandas does not guarantee that a slice is an independent copy: it may be a view (a shallow copy) of the original DataFrame, so operations performed on the slice may actually act on the original DataFrame. That ambiguity is exactly what the message is pointing at.
Either of the following will make pandas happy 😃:
df = df.iloc[:, np.r_[10:17]].copy()
or
df.loc[:, ['INDEX']] = df.index
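Here is a runnable sketch of the .copy() option, assuming a hypothetical source frame with 20 integer-labelled columns so that np.r_[10:17] selects seven of them:

```python
import numpy as np
import pandas as pd

# Hypothetical source frame with columns labelled 0..19
original = pd.DataFrame(np.arange(60).reshape(3, 20))

# np.r_[10:17] expands to columns 10..16; .copy() detaches the result from `original`
df = original.iloc[:, np.r_[10:17]].copy()

# Adding a column to an independent copy is unambiguous, so no warning is raised
df["INDEX"] = df.index

print(df.shape)  # (3, 8)
```

Because df is a real copy, the new INDEX column exists only in df; original is left unchanged.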
I tried filling the NA values of a column in a dataframe with:
df1 = data.copy()
df1.columns = data.columns.str.lower()
df2 = df1[['passangerid', 'trip_cost','class']]
df2['class'] = df2['class'].fillna(0)
df2
However, I get this warning:
:5: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
df2['class'] = df2['class'].fillna(0, axis = 0)
Can someone please help?
First of all, I'd advise you to follow the warning message and read up on the caveats in the provided link.
You're getting this warning (not an error) because your df2 is a slice of your df1, not a separate DataFrame.
To avoid getting this warning, use the .copy() method:
df2 = df1[['passangerid', 'trip_cost','class']].copy()
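A self-contained sketch of the whole flow with .copy(), assuming a small hypothetical data frame whose column names mirror the question (including its 'passangerid' spelling):

```python
import numpy as np
import pandas as pd

# Hypothetical data; names lowercase to the ones used in the question
data = pd.DataFrame({
    "PassangerId": [1, 2, 3],
    "Trip_Cost": [10.0, 12.5, 8.0],
    "Class": [1.0, np.nan, 3.0],
})

df1 = data.copy()
df1.columns = data.columns.str.lower()

# .copy() gives df2 its own data, so assigning to one of its columns is unambiguous
df2 = df1[["passangerid", "trip_cost", "class"]].copy()
df2["class"] = df2["class"].fillna(0)
```

The missing value is filled only in df2; df1 still contains its NaN.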
I have something like this:
df1 = ...
df1['NEW_COLUMN'] = df1['SOME_COLUMN'].apply(lambda x: ...)
Although this works and the column 'NEW_COLUMN' is added to the dataframe, I get the following annoying warning. Why? And what is the solution?
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you simply want to stop being warned, you can change a pandas option. If you understand what the warning means and why it is happening, you can silence it by adding this after importing pandas:
pd.options.mode.chained_assignment = None
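If you would rather silence the warning around a single assignment than globally, pd.option_context can set the same mode.chained_assignment option temporarily and restore the previous value when the with block exits. A minimal sketch, assuming a hypothetical base frame and a hypothetical x * 10 function standing in for the elided lambda:

```python
import pandas as pd

# Hypothetical frame; boolean-mask slicing is the pattern that typically triggers the warning
base = pd.DataFrame({"SOME_COLUMN": [1, 2, 3, 4]})
df1 = base[base["SOME_COLUMN"] > 1]

# Silence the chained-assignment check only inside this block
with pd.option_context("mode.chained_assignment", None):
    # x * 10 is a hypothetical stand-in for the question's elided lambda body
    df1["NEW_COLUMN"] = df1["SOME_COLUMN"].apply(lambda x: x * 10)
```

Silencing the warning does not change the behaviour: df1 is still a separate object from base, so base never receives NEW_COLUMN.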
Add .copy() to avoid getting this warning:
df = pd.DataFrame({"Value" : [0.12,0.22,0.32,0.11,0.54,0.55,0.98]})
df['Category'] = df.Value.apply(lambda x: 'Neg' if x < 0.5 else 'Pos').copy()
I have a dataframe where three of the columns are the coordinates of data points ('H_x', 'H_y' and 'H_z'). I want to calculate the length of the radius vector of each point and add it as a new column in my dataframe, but I have a problem with the pandas apply function.
My code is:
import numpy as np

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    rv = np.sqrt(x**2 + y**2 + z**2)
    return rv
halo_field['rh_field']=halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)
The warning I'm getting is:
group_sh.py:78: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
halo_field['rh_field']=halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)
I get the column that I want, but I'm still confused by this warning message.
I'm aware there are similar questions here, but I couldn't find how to solve my problem. I'm fairly new to Python. Can you help?
Edit: halo_field is a slice of another dataframe:
halo_field = halo_res[halo_res.N_subs==1]
The problem is you're working with a slice, which can be ambiguous:
halo_field = halo_res[halo_res.N_subs==1]
You have two options:
Work on a copy
You can explicitly copy your dataframe to avoid the warning and ensure your original dataframe is unaffected:
halo_field = halo_res[halo_res.N_subs==1].copy()
halo_field['rh_field'] = halo_field.apply(...)
Work on the original dataframe conditionally
Use pd.DataFrame.loc with a Boolean mask to update your original dataframe:
mask = halo_res['N_subs'] == 1
halo_res.loc[mask, 'rh_field'] = halo_res.loc[mask].apply(...)
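Both options can be tried end to end on a small hypothetical halo_res (the coordinate values below are made up for illustration), reusing the question's radvec and its row-wise apply:

```python
import numpy as np
import pandas as pd

def radvec(x, y, z):
    # Euclidean length of the vector (x, y, z)
    return np.sqrt(x**2 + y**2 + z**2)

# Hypothetical stand-in for halo_res
halo_res = pd.DataFrame({
    "N_subs": [1, 2, 1],
    "H_x": [3.0, 1.0, 0.0],
    "H_y": [4.0, 1.0, 0.0],
    "H_z": [0.0, 1.0, 2.0],
})

# Option 1: work on an explicit copy of the filtered rows
halo_field = halo_res[halo_res.N_subs == 1].copy()
halo_field["rh_field"] = halo_field.apply(
    lambda row: radvec(row["H_x"], row["H_y"], row["H_z"]), axis=1
)

# Option 2: write into the original frame only where the mask is True;
# rows outside the mask get NaN in the new column
mask = halo_res["N_subs"] == 1
halo_res.loc[mask, "rh_field"] = halo_res.loc[mask].apply(
    lambda row: radvec(row["H_x"], row["H_y"], row["H_z"]), axis=1
)
```

With option 2 the right-hand side is computed from the masked rows themselves; the target column 'rh_field' does not need to exist beforehand, since .loc creates it.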
Don't use apply
As a side note, in either scenario you can avoid apply for your function. For example:
halo_field['rh_field'] = (halo_field[['H_x', 'H_y', 'H_z']]**2).sum(1)**0.5
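A runnable sketch of the vectorized expression on hypothetical coordinates. The np.linalg.norm variant is my own addition, shown only as an equivalent NumPy spelling of the same row-wise length:

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates
halo_field = pd.DataFrame({
    "H_x": [3.0, 0.0, 1.0],
    "H_y": [4.0, 0.0, 2.0],
    "H_z": [0.0, 2.0, 2.0],
})

coords = halo_field[["H_x", "H_y", "H_z"]]

# Square every entry, sum across the columns of each row, then take the square root
halo_field["rh_field"] = (coords**2).sum(axis=1) ** 0.5

# Equivalent row-wise Euclidean norm computed by NumPy
rh_np = np.linalg.norm(coords.to_numpy(), axis=1)
```

Both forms work on whole columns at once, so no Python function is called per row as with apply(..., axis=1).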
I'm just trying to convert a column of numeric strings to ints. This is what I'm trying:
df.date = df.date.astype(np.int64)
But I'm getting the warning:
/Users/austin/anaconda/lib/python3.5/site-packages/pandas/core/generic.py:2773:
SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation:
http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[name] = value
Not sure what this means. I also tried:
df.date = df.date.apply(int)
And I get the same warning as above.
Why doesn't this work and what's the proper way?
The astype method returns a new Series; it does not convert the column in place. You need to assign the result:
df['date'] = df['date'].astype(int)
import pandas as pd
x = pd.DataFrame(['20.1', '19.1', '12.3'])
x[0] = pd.to_numeric(x[0])  # convert_objects() was removed from pandas; to_numeric replaces it
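A self-contained sketch of both conversions, assuming a hypothetical date column of numeric strings and a hypothetical unparseable 'n/a' entry; errors="coerce" is an addition here that turns such entries into NaN instead of raising:

```python
import pandas as pd

# Hypothetical column of numeric strings
df = pd.DataFrame({"date": ["20170101", "20170215", "20171231"]})

# astype returns a new Series; assign it back with [] indexing
df["date"] = df["date"].astype("int64")

# pd.to_numeric infers an int or float dtype; errors="coerce" maps bad strings to NaN
x = pd.DataFrame(["20.1", "19.1", "12.3", "n/a"])
x[0] = pd.to_numeric(x[0], errors="coerce")
```

Bracket assignment is the safer habit: attribute access such as df.date can update an existing column but cannot create a new one. If df is itself a slice of another DataFrame, the warning will still appear until you create it with .copy().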