Using loc still raises SettingWithCopyWarning while changing column - python

I want to remove URLs from the text column of my df by stripping everything that starts with http or https, like below:
data.loc[:,'text_'] = data['text_'].str.replace(r'\s*https?://\S+(\s+|$)', ' ').str.strip()
I used the loc as advised in other answers but I still keep getting the warning.
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: The default value of regex will change from True to False in a future version.
"""Entry point for launching an IPython kernel.
time: 9.81 s (started: 2022-03-19 06:35:42 +00:00)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexing.py:1773: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self._setitem_single_column(ilocs[0], value, pi)
How do I do this operation correctly, i.e. without the warning?
UPDATE:
I've generated data from kaggle dataset:
kaggle datasets download clmentbisaillon/fake-and-real-news-dataset
and then:
true_df.drop_duplicates(keep='first')
fake_df.drop_duplicates(keep='first')
true_df['is_fake'] = 0
fake_df['is_fake'] = 1
news_df = pd.concat([true_df, fake_df])
news_df = news_df.sample(frac=1).reset_index(drop=True)
drop_list = ['subject', 'date']
column_filter = news_df.filter(drop_list)
news_df.drop(column_filter, axis=1)
news_df['text_'] = news_df['title'] + news_df['text']
data = news_df[['text_', 'is_fake']]
Next for the following line:
data.loc[:,'text_'] = data['text_'].str.replace(r'\s*https?://\S+(\s+|$)', ' ').str.strip()
I get the warnings shown at the start of the post.
UPDATE 2:
As mentioned by @Riley, adding
data = data.copy()
fixes the SettingWithCopyWarning. However, this warning:
/usr/local/lib/python3.7/dist-packages/ipykernel_launcher.py:1: FutureWarning: The default value of regex will change from True to False in a future version.
"""Entry point for launching an IPython kernel.
still remains. To fix it, pass regex=True to replace:
data.loc[:,'text_'] = data['text_'].str.replace(r'\s*https?://\S+(\s+|$)', ' ', regex=True).str.strip()
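Putting both fixes together, a minimal sketch might look like the following. The two-row news_df is a made-up stand-in for the Kaggle data (an assumption for illustration); only the .copy() call and the regex=True argument come from the updates above.

```python
import pandas as pd

# Hypothetical stand-in for the concatenated Kaggle frame built above.
news_df = pd.DataFrame({
    "text_": ["Breaking news https://example.com/a read more", "No link here"],
    "is_fake": [0, 1],
})

# Take an explicit copy so later assignments target an independent frame.
data = news_df[["text_", "is_fake"]].copy()

# regex=True states the intent explicitly, which avoids the FutureWarning.
data["text_"] = (
    data["text_"]
    .str.replace(r"\s*https?://\S+(\s+|$)", " ", regex=True)
    .str.strip()
)
print(data["text_"].tolist())
```

Because data is a copy, news_df keeps its original text; if the cleaned column is needed there as well, the same replace can be applied to news_df directly.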

Related

df.at warning message would like to optimize the code

I am using the following code to add a nuts value into new columns in the dfEU dataframe. I came up with this code from the following post:
pandas .at versus .loc
Is there a way to solve this warning?
dfEU was created using the following query:
dfEU = df.query('continent == "EU" & country_id == "Belgium"')
for row in dfEU.itertuples():
    lati=float(getattr(row, 'locationlatitude'))
    longi=float(getattr(row, 'locationlongitude'))
    nuts = nf.find(lat=lati, lon=longi)
    if nuts:
        dfEU.at[row.Index, 'nuts1'] = nuts[0].get('NUTS_NAME')
        dfEU.at[row.Index, 'nuts1id'] = nuts[0].get('FID')
        dfEU.at[row.Index, 'nuts2'] = nuts[1].get('NUTS_NAME')
        dfEU.at[row.Index, 'nuts2id'] = nuts[1].get('FID')
        dfEU.at[row.Index, 'nuts3'] = nuts[2].get('NUTS_NAME')
        dfEU.at[row.Index, 'nuts3id'] = nuts[2].get('FID')
    else:
        dfEU.at[row.Index, 'nuts1'] = 'Nan'
When launching the code I receive the following warning:
C:\Users\win\anaconda3\lib\site-packages\pandas\core\indexing.py:1596: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
self.obj[key] = _infer_fill_value(value)
C:\Users\win\anaconda3\lib\site-packages\pandas\core\indexing.py:1765: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
isetter(loc, value)

Python DataFrame Issue with Warning

I am having trouble finding a solution for SettingWithCopyWarning in Jupyter Notebook. I would appreciate any insight and/or solutions. Thank you in advance.
Code:
matches2['players'] = list(zip(matches2['player_1_name'], matches2['player_2_name']))
g = matches2.groupby('players')
df_list = []
for group, df in g:
    df = df[['winner']]
    n = df.shape[0]
    player_1_h2h = np.zeros(n)
    player_2_h2h = np.zeros(n)
    p1 = group[0]
    p2 = group[1]
    for i in range(1,n):
        if df.iloc[i-1,0] == p1:
            player_1_h2h[i] = player_1_h2h[i-1] + 1
            player_2_h2h[i] = player_2_h2h[i-1]
        else:
            player_1_h2h[i] = player_1_h2h[i-1]
            player_2_h2h[i] = player_2_h2h[i-1] + 1
    df['player_1_h2h'] = player_1_h2h
    df['player_2_h2h'] = player_2_h2h
    df_list.append(df)
Error:
<ipython-input-214-d8e04df2295c>:32: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['player_1_h2h'] = player_1_h2h
<ipython-input-214-d8e04df2295c>:33: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['player_2_h2h'] = player_2_h2h
I would recommend disabling the warning.
import pandas as pd
pd.options.mode.chained_assignment = None
For more information on this behavior, see this question and look for Garrett's answer.
You can ignore this warning, as it's a false positive in this case, but if you want to avoid it entirely, you can change
df['player_1_h2h'] = player_1_h2h
df['player_2_h2h'] = player_2_h2h
to
df = df.assign(
    player_1_h2h=player_1_h2h,
    player_2_h2h=player_2_h2h
)
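As a rough illustration of why assign() sidesteps the warning, here is a small self-contained sketch. The three-row matches2 frame and the zero-filled arrays are placeholders (assumptions, not the asker's data); the point is only that assign() returns a new DataFrame instead of writing into a selection.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of matches2; only the 'winner' column matters here.
matches2 = pd.DataFrame({
    "player_1_name": ["A", "A", "A"],
    "player_2_name": ["B", "B", "B"],
    "winner": ["A", "B", "A"],
})

df = matches2[["winner"]]  # a selection that pandas may treat as a slice
n = df.shape[0]

# assign() builds and returns a new frame, so nothing is set on the slice.
df = df.assign(
    player_1_h2h=np.zeros(n),
    player_2_h2h=np.zeros(n),
)
print(df.columns.tolist())
```

The original matches2 is left untouched, which matches what the loop in the question needs, since each per-group frame is appended to df_list anyway.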

The pandas warning still shows, but the code is correct and the visualization loads normally

I could use pd.options.mode.chained_assignment = None, but I would rather have code that is free of warnings.
My start code:
import datetime
import altair as alt
import operator
import pandas as pd
s = pd.read_csv('../../data/aparecida-small-sample.csv', parse_dates=['date'])
city = s[s['city'] == 'Aparecida']
Based on @dpkandy's code:
city['total_cases'] = city['totalCases']
city['total_deaths'] = city['totalDeaths']
city['total_recovered'] = city['totalRecovered']
tempTotalCases = city[['date','total_cases']]
tempTotalCases["title"] = "Confirmed"
tempTotalDeaths = city[['date','total_deaths']]
tempTotalDeaths["title"] = "Deaths"
tempTotalRecovered = city[['date','total_recovered']]
tempTotalRecovered["title"] = "Recovered"
temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)
totalCases = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_cases:Q', title = None))
totalDeaths = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_deaths:Q', title = None))
totalRecovered = alt.Chart(temp).mark_bar().encode(alt.X('date:T', title = None), alt.Y('total_recovered:Q', title = None))
(totalCases + totalRecovered + totalDeaths).encode(color=alt.Color('title', scale = alt.Scale(range = ['#106466','#DC143C','#87C232']), legend = alt.Legend(title="Legend colour"))).properties(title = "Cumulative number of confirmed cases, deaths and recovered", width = 800)
This code works and the visualization loads normally, but pandas still shows the warning, suggesting that I use .loc[row_indexer,col_indexer] = value instead. I read the linked documentation section, "Returning a view versus a copy", and tried the code below, but the same warning still appears. Here is the code with loc:
# 1st attempt
tempTotalCases.loc["title"] = "Confirmed"
tempTotalDeaths.loc["title"] = "Deaths"
tempTotalRecovered.loc["title"] = "Recovered"
# 2nd attempt
tempTotalCases["title"].loc = "Confirmed"
tempTotalDeaths["title"].loc = "Deaths"
tempTotalRecovered["title"].loc = "Recovered"
Here is the error message:
<ipython-input-6-f16b79f95b84>:6: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tempTotalCases["title"] = "Confirmed"
<ipython-input-6-f16b79f95b84>:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tempTotalDeaths["title"] = "Deaths"
<ipython-input-6-f16b79f95b84>:12: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
tempTotalRecovered["title"] = "Recovered"
Jupyter and Pandas version:
$ jupyter --version
jupyter core : 4.7.1
jupyter-notebook : 6.3.0
qtconsole : 5.0.3
ipython : 7.22.0
ipykernel : 5.5.3
jupyter client : 6.1.12
jupyter lab : 3.1.0a3
nbconvert : 6.0.7
ipywidgets : 7.6.3
nbformat : 5.1.3
traitlets : 5.0.5
$ pip show pandas
Name: pandas
Version: 1.2.4
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: None
Author-email: None
License: BSD
Location: /home/gus/PUC/.env/lib/python3.9/site-packages
Requires: pytz, python-dateutil, numpy
Required-by: ipychart, altair
Update 2
I followed the answer, it worked, but there is another problem:
temp = tempTotalCases.append(tempTotalDeaths)
temp = temp.append(tempTotalRecovered)
Error log:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
iloc._setitem_with_indexer(indexer, value, self.name)
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-7-b2649a676837> in <module>
17 tempTotalRecovered.loc["title"] = _("Recovered")
18
---> 19 temp = tempTotalCases.append(tempTotalDeaths)
20 temp = temp.append(tempTotalRecovered)
21
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/frame.py in append(self, other, ignore_index, verify_integrity, sort)
7980 to_concat = [self, other]
7981 return (
-> 7982 concat(
7983 to_concat,
7984 ignore_index=ignore_index,
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
296 )
297
--> 298 return op.get_result()
299
300
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/reshape/concat.py in get_result(self)
514 obj_labels = obj.axes[1 - ax]
515 if not new_labels.equals(obj_labels):
--> 516 indexers[ax] = obj_labels.get_indexer(new_labels)
517
518 mgrs_indexers.append((obj._mgr, indexers))
~/GitLab/Gustavo/global/.env/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
3169
3170 if not self.is_unique:
-> 3171 raise InvalidIndexError(
3172 "Reindexing only valid with uniquely valued Index objects"
3173 )
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
This SettingWithCopyWarning is a warning, not an error. The distinction matters: a warning means pandas isn't sure whether your code will produce the intended output and leaves that decision to the programmer, whereas an error means that something is definitely wrong.
The SettingWithCopyWarning is warning you about the difference between doing something like df['First selection']['Second selection'] and df.loc[:, ('First selection', 'Second selection')].
In the first case two separate events occur: df['First selection'] is evaluated first, then the object it returns is used for the next selection, returned_df['Second selection']. pandas has no way to know whether returned_df is the original df or just a temporary 'view' of this object. Most of the time it doesn't matter (see the docs for more info), but if you change a value on a temporary object, you'll be confused as to why your code runs error-free yet the changes you made are not reflected. Using .loc bundles 'First selection' and 'Second selection' into one call, so pandas can guarantee that the assignment acts on the original df rather than on a temporary object.
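To make the difference concrete, here is a small sketch on a made-up two-column frame (the column names a and b are assumptions for illustration). The chained form is left as a comment because its outcome depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained form: two separate calls, so the write may land on a temporary object.
#   df[df["a"] > 1]["b"] = 0

# Single .loc call: row and column selection happen together on df itself.
df.loc[df["a"] > 1, "b"] = 0
print(df["b"].tolist())
```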
The documentation you linked shows why your attempts to use .loc didn't work as you intended (example taken from the docs):
def do_something(df):
    foo = df[['bar', 'baz']]  # Is foo a view? A copy? Nobody knows!
    # ... many lines here ...
    # We don't know whether this will modify df or not!
    foo['quux'] = value
    return foo
You have something similar in your code. Look at how tempTotalCases is created:
city = s[s['city'] == 'Aparecida']
# some lines of code
tempTotalCases = city[['date','total_cases']]
And then some more lines of code before you attempt to do:
tempTotalCases.loc["title"] = "Confirmed"
So pandas throws the warning.
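One way to avoid the ambiguity, sketched here under the assumption that you don't need the changes to flow back into s, is to take explicit copies at each selection step. The three-row s frame is a hypothetical stand-in for the CSV. Note also that .loc["title"] addresses a row label, whereas .loc[:, "title"] addresses a column.

```python
import pandas as pd

# Hypothetical sample standing in for aparecida-small-sample.csv.
s = pd.DataFrame({
    "date": pd.to_datetime(["2020-04-01", "2020-04-02", "2020-04-01"]),
    "city": ["Aparecida", "Aparecida", "Other"],
    "totalCases": [1, 3, 7],
})

# .copy() makes city an independent frame rather than a possible view of s.
city = s[s["city"] == "Aparecida"].copy()
city["total_cases"] = city["totalCases"]

tempTotalCases = city[["date", "total_cases"]].copy()
# Row indexer ":" plus column label "title" adds a column, not a row.
tempTotalCases.loc[:, "title"] = "Confirmed"
print(tempTotalCases.columns.tolist())
```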
Separately from your original question, you might find df.rename() useful (see the docs).
You'll be able to do something like:
city = city.rename(columns={'totalCases': 'total_cases',
                            'totalDeaths': 'total_deaths',
                            'totalRecovered': 'total_recovered'})

How to assign Numpy array values to other variable

This is my code:
y_predForThisMatchType = model.predict(X_test, num_iteration=model.best_iteration)
print(type(y_predForThisMatchType))
y_predForThisMatchType = y_predForThisMatchType.reshape(-1)
print(type(y_predForThisMatchType))
count = 0
for i in range(len(y_pred)):
    if y_pred.loc[i] == abType:
        y_pred.loc[i] = y_predForThisMatchType[count]
        count = count + 1
Output:
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py:189: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Python just prints the above output, and that's all. The program technically runs, but the code below this point does not get executed, and no real error is shown.
Error Line:
y_pred.loc[i] = y_predForThisMatchType[count]
y_pred variable is a pandas dataframe.
Have you checked your outputs completely?
In my experience, your code is working.
The displayed message is just a warning and can be disabled with:
pandas.options.mode.chained_assignment = None # default='warn'

Pandas SettingWithCopyWarning over re-ordering column's categorical values [duplicate]

Jupyter notebook is returning this warning:
C:\anaconda\lib\site-packages\pandas\core\indexing.py:337: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[key] = _infer_fill_value(value)
C:\anaconda\lib\site-packages\pandas\core\indexing.py:517: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
After running the following code:
def group_df(df,num):
    ln = len(df)
    rang = np.arange(ln)
    splt = np.array_split(rang,num)
    lst = []
    finel_lst = []
    for i,x in enumerate(splt):
        lst.append([i for x in range(len(x))])
    for k in lst:
        for j in k:
            finel_lst.append(j)
    df['group'] = finel_lst
    return df
def KNN(dafra,folds,K,fi,target):
    df = group_df(dafra,folds)
    avarge_e = []
    for i in range(folds):
        train = df.loc[df['group'] != i]
        test = df.loc[df['group'] == i]
        test.loc[:,'pred_price'] = np.nan
        test.loc[:,'rmse'] = np.nan
        print(test.columns)
KNN(data,5,5,'GrLivArea','SalePrice')
The warning recommends using .loc indexing, which I did, but it did not help. What is the problem? I have gone through the related questions and read the documentation, but I still don't get it.
I think you need copy:
train = df.loc[df['group'] != i].copy()
test = df.loc[df['group'] == i].copy()
If you modify values in test later, you will find that the modifications do not propagate back to the original data (df), and that is what pandas warns about.
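A minimal sketch of the fold loop with the copies in place, using a made-up four-row frame (an assumption for illustration) in place of the housing data:

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the grouped frame returned by group_df.
df = pd.DataFrame({"group": [0, 0, 1, 1], "SalePrice": [100, 200, 300, 400]})

for i in range(2):
    # Each fold gets independent copies, so new columns never touch df.
    train = df.loc[df["group"] != i].copy()
    test = df.loc[df["group"] == i].copy()
    test["pred_price"] = np.nan
    test["rmse"] = np.nan

print(test.columns.tolist())
```

The new columns exist only on each fold's test frame; df itself keeps its two original columns.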
