Replace pandas iterrows() with apply & lambda - python

Here is working code that adds a new column to a Pandas DataFrame using iterrows(), but it is extremely slow:
df1['Kite_Token'] =0
for row, col in df.iterrows():
    df1.loc[row, 'Kite_Token'] = df_instrument[(df_instrument.exchange_token==col.SC_CODE) & (df_instrument.exchange == 'BSE')].instrument_token.to_string(index=False)
Here is my attempt to use apply and a lambda instead, but I am getting the copy warning. Is there a faster way to accomplish this without the warning message?
df1['Kite_Token']= df1['SC_CODE'].apply(lambda x : df_instrument[(df_instrument.exchange_token==x) & (df_instrument.exchange == 'BSE')].instrument_token.to_string(index=False))
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
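One way to avoid the per-row lookup entirely is to build the mapping once and use Series.map. The following is a minimal sketch under assumptions: the sample rows are invented for illustration, bse and lookup are hypothetical helper names, and it assumes only one BSE instrument_token is wanted per exchange_token. Unlike the to_string() version, it leaves unmatched codes as NaN instead of producing a string.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_instrument = pd.DataFrame({
    "exchange_token": [500325, 500325, 532540],
    "exchange": ["BSE", "NSE", "BSE"],
    "instrument_token": [128083204, 738561, 136236548],
})
df1 = pd.DataFrame({"SC_CODE": [500325, 532540, 999999]})

# Build the lookup once: exchange_token -> instrument_token, BSE rows only.
bse = df_instrument[df_instrument["exchange"] == "BSE"]
lookup = (
    bse.drop_duplicates(subset="exchange_token")  # map() needs a unique index
       .set_index("exchange_token")["instrument_token"]
)

# Take an explicit copy in case df1 was itself sliced from another frame,
# which is the usual trigger for SettingWithCopyWarning.
df1 = df1.copy()
df1["Kite_Token"] = df1["SC_CODE"].map(lookup)
```

If string values are required, as in the original loop, the mapped column could be converted afterwards; that step is left out here because the desired format for unmatched codes is not stated in the question.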

Related

Subtracting Pandas columns caveat

There are many similar questions to this one, but I couldn't find the one that answers my questions specifically.
Firstly, when I run something like this
df['new_col'] = df['col2'] - df['col1']
I get a warning saying "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead".
If I then try to run something like this
df.loc[:, 'new_col'] = df['col2'] - df['col1']
I get a "SettingWithCopyWarning" warning with the same message "A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead".
Using the apply and lambda functions, as suggested by some answers in other posts, also raises a "SettingWithCopyWarning" warning and seems to be a slow operation.
df.loc[:, 'new_col'] = df.apply(lambda x: x['col2'] - x['col1'], axis=1)
I read the documentation pages, but I'm afraid I don't completely understand them; otherwise it would be clear to me what the correct way to make such a calculation is.
So my question is: how do I subtract two columns of a Pandas dataframe to create a new column in the same dataframe in the correct way, so that Pandas is happy? Thank you!
Try adding df = df.copy():
df = df.copy()
df['new_col'] = df['col2'] - df['col1']
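The warning usually means that df was itself produced by slicing another DataFrame, which the question does not show. A minimal sketch of that situation, assuming a hypothetical parent frame raw and a hypothetical boolean column keep:

```python
import pandas as pd

# Hypothetical parent frame; the question does not show where df comes from.
raw = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [10, 20, 30, 40],
    "keep": [True, False, True, True],
})

# Filtering returns a subset of raw; the explicit copy makes df independent.
df = raw[raw["keep"]].copy()

# Plain vectorized subtraction; no apply or lambda is needed.
df["new_col"] = df["col2"] - df["col1"]
```

After this, raw is left unchanged and only df carries the new column.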

Why do I get "a value is trying to be set on a copy of a slice" when creating a new column with apply?

I have something like this,
df1 = ...
df1['NEW_COLUMN'] = df1['SOME_COLUMN'].apply(lambda x: ...)
Although this works and the column 'NEW_COLUMN' is added to the dataframe, I get the following annoying warning. Why? And what is the solution?
SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
If you simply want to avoid the warning, you can disable it in the pandas options. If you understand what the warning means and why it is raised in your case, you can silence it by adding this after importing pandas:
pd.options.mode.chained_assignment = None
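Add copy() to avoid getting this warning. The copy has to be made on the DataFrame being assigned into, not on the result of apply():
df = pd.DataFrame({"Value" : [0.12,0.22,0.32,0.11,0.54,0.55,0.98]})
df = df.copy()  # only matters when df was sliced from another DataFrame
df['Category'] = df.Value.apply(lambda x: 'Neg' if x < 0.5 else 'Pos')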

New column using apply function on other columns in dataframe

I have a dataframe where three of the columns are coordinates of data ('H_x', 'H_y' and 'H_z'). I want to calculate the radius vector of the data and add it as a new column in my dataframe, but I have some kind of problem with the pandas apply function.
My code is:
def radvec(x, y, z):
    rv = np.sqrt(x**2 + y**2 + z**2)
    return rv
halo_field['rh_field']=halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)
The error I'm getting is:
group_sh.py:78: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
halo_field['rh_field']=halo_field.apply(lambda row: radvec(row['H_x'], row['H_y'], row['H_z']), axis=1)
I get the column that I want, but I'm still confused by this message.
I'm aware there are similar questions here, but I couldn't find how to solve my problem. I'm fairly new to python. Can you help?
Edit: halo_field is a slice of another dataframe:
halo_field = halo_res[halo_res.N_subs==1]
The problem is you're working with a slice, which can be ambiguous:
halo_field = halo_res[halo_res.N_subs==1]
You have two options:
Work on a copy
You can explicitly copy your dataframe to avoid the warning and ensure your original dataframe is unaffected:
halo_field = halo_res[halo_res.N_subs==1].copy()
halo_field['rh_field'] = halo_field.apply(...)
Work on the original dataframe conditionally
Use pd.DataFrame.loc with a Boolean mask to update your original dataframe:
mask = halo_res['N_subs'] == 1
halo_res.loc[mask, 'rh_field'] = halo_res.loc[mask].apply(...)
Don't use apply
As a side note, in either scenario you can avoid apply for your function. For example:
halo_field['rh_field'] = (halo_field[['H_x', 'H_y', 'H_z']]**2).sum(1)**0.5
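Both options can be combined with the vectorized calculation. The following is a minimal sketch with invented sample values (only the column names come from the question); it uses np.sqrt in place of **0.5, which is equivalent here.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the question.
halo_res = pd.DataFrame({
    "N_subs": [1, 2, 1],
    "H_x": [3.0, 1.0, 0.0],
    "H_y": [4.0, 1.0, 0.0],
    "H_z": [0.0, 1.0, 2.0],
})
coords = ["H_x", "H_y", "H_z"]

# Option 1: work on an explicit copy of the filtered rows.
halo_field = halo_res[halo_res["N_subs"] == 1].copy()
halo_field["rh_field"] = np.sqrt((halo_field[coords] ** 2).sum(axis=1))

# Option 2: write into the original frame through a Boolean mask.
# Rows outside the mask get NaN in the new column.
mask = halo_res["N_subs"] == 1
halo_res.loc[mask, "rh_field"] = np.sqrt((halo_res.loc[mask, coords] ** 2).sum(axis=1))
```

Option 1 leaves halo_res without the new column until option 2 runs, so in practice only one of the two would normally be used.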

how to remove this warning in python 3

I'm trying to lowercase and strip a column in Python 3 using pandas, but I get the warning below. What is the right way to do this so that the warning does not come up?
df["col1"] = df[["col1"]].apply(lambda x: x.str.strip())
df["col1"] = df[["col1"]].apply(lambda x: x.str.lower())
The warning
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self[k1] = value[k2]
how to remove the warning
To get rid of this warning, apply the operation to a Series instead of a DataFrame. Using df[["col1"]] creates a new DataFrame that you are then assigning to the column. If you instead just modify the column directly, it'll be fine. Additionally, I chained the two operations together.
df["col1"] = df["col1"].str.strip().str.lower()
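The difference between the two selections can be checked directly. A small sketch with made-up values, assuming a single string column named col1 as in the question:

```python
import pandas as pd

# Hypothetical sample data with stray whitespace and mixed case.
df = pd.DataFrame({"col1": ["  Foo ", "BAR", " baz"]})

# Double brackets select a one-column DataFrame; single brackets a Series.
is_frame = isinstance(df[["col1"]], pd.DataFrame)
is_series = isinstance(df["col1"], pd.Series)

# The .str accessor works on the Series, so the calls can be chained.
df["col1"] = df["col1"].str.strip().str.lower()
```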

Python Pandas SettingWithCopyWarning copies vs new objects

I'm working with a dataframe 'copy' created by sub-setting a previous one - see below:
import random
import pandas as pd
df = pd.DataFrame({'data':list(random.sample(range(10,100),25))})
df_filtered = df.query('data > 20 and data < 80')
df_filtered.rename(columns={'data':'observations'},inplace=True)
The problem is, when the rename method is called I receive a SettingWithCopy warning that, as I understand it, means I'm operating on a copy of the original (df in this case) object. The warning text is: "A value is trying to be set on a copy of a slice from a DataFrame"
I found this question that was answered using a different approach to subsetting. I prefer the DataFrame.query() method myself (syntax-wise). Is there a way I can create a new DataFrame object using the .query() method rather than the method suggested in the question I linked? I've tried a few options with iloc but haven't been successful so far.
You can always explicitly make a copy by calling .copy() on your filtered dataframe. Concretely, replace
df_filtered = df.query('data > 20 and data < 80')
with
df_filtered = df.query('data > 20 and data < 80').copy()
Does that get rid of the warning?
try this instead of using inplace=True:
In [12]: df_filtered = df.query('data > 20 and data < 80')
In [13]: df_filtered = df_filtered.rename(columns={'data':'observations'})
The .rename() function returns a new object, so you can simply overwrite your DataFrame with the returned one.
For reference, this is what the docs say about inplace:
inplace : boolean, default False
Whether to return a new DataFrame. If True then value of copy is ignored.
Returns:
renamed : DataFrame (new object)
PS: basically, you should try to avoid using inplace=True and use the df = df.function(...) technique instead.
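The two suggestions above can be written as a single chained expression. A minimal sketch, using fixed sample values instead of random.sample so the result is reproducible:

```python
import pandas as pd

# Fixed sample values (an assumption; the question uses random data).
df = pd.DataFrame({"data": [5, 25, 50, 79, 80, 95]})

# query() filters the rows and rename() returns a new DataFrame,
# so nothing is modified in place and df keeps its original column name.
df_filtered = (
    df.query("data > 20 and data < 80")
      .rename(columns={"data": "observations"})
)
```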
