Pandas gives me a SettingWithCopyWarning [duplicate] - python

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 2 years ago.
I am trying to create two new columns in my dataframe depending on the values of the columns Subscribers, External Party and Direction. If the Direction is I for Incoming, column a should become External Party and col B should become Subscriber. If the Direction is O for Outgoing, it should be the other way around. I use the code:
import pandas as pd
import numpy as np
...
df['a'] = np.where((df.Direction == 'I'), df['External Party'], df['Subscriber'])
df['b'] = np.where((df.Direction == 'O'), df['External Party'], df['Subscriber'])
I get a SettingWithCopyWarning from pandas, but the code does what it needs to do. How can I improve this operation to avoid the warning?
Thanks in advance!
Jo

Inspect the place in your code where df is created.
Most probably, it is a view of another DataFrame, something like:
df = df_src[...]
Then any attempt to assign into df triggers exactly this warning.
To avoid it, create df as a truly independent DataFrame, with its
own data buffer. Something like:
df = df_src[...].copy()
Now df has its own data buffer, and can be modified without the
above warning.
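As a minimal sketch of the advice above, applied to the question's two np.where assignments: the sample rows and the isin filter are assumptions made for illustration, and only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical source data; only the column names come from the question.
df_src = pd.DataFrame({
    "Subscriber": ["s1", "s2", "s3"],
    "External Party": ["e1", "e2", "e3"],
    "Direction": ["I", "O", "X"],
})

# Filter, then take an explicit copy so that df owns its data.
df = df_src[df_src["Direction"].isin(["I", "O"])].copy()

# Incoming: a = External Party, b = Subscriber. Outgoing: the other way around.
incoming = df["Direction"] == "I"
df["a"] = np.where(incoming, df["External Party"], df["Subscriber"])
df["b"] = np.where(incoming, df["Subscriber"], df["External Party"])
```

Because df is an independent object, the new columns a and b land on df only, and df_src is left untouched.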

If you plan to keep working with the same df later in your code, it is sometimes useful to create a deep copy of it before making any modifications.
The native pandas copy method does not always behave as one would expect; here is a similar question that might give more insight.
You can use the copy module from the Python standard library to copy the entire object, so that the two DataFrames share no links.
import copy
df_copy = copy.deepcopy(df)

Related

Provide sample DataFrame from csv file when asking question on stackoverflow [duplicate]

This question already has answers here:
How to make good reproducible pandas examples
(5 answers)
Closed 17 days ago.
When asking a Python/pandas question on Stack Overflow, I often like to provide a sample DataFrame.
I usually have a local CSV file that I use for testing.
So for a DataFrame I would like to provide code in my question such as:
df = pd.DataFrame()
Is there an easy way or tool to turn a CSV file into code in a format like this, so that another user can easily recreate the DataFrame?
For now I usually do it manually, which is tedious and time consuming: I copy and paste the data from Excel to Stack Overflow, remove tabs and spaces, rearrange the numbers into a list or dictionary, and so on.
Example csv file:
col1,col2
1,3
2,4
If I want to provide this table, I can provide code like:
d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
I have to create the dictionary and DataFrame manually and type the code into the Stack Overflow editor.
For a more complex table this could be a lot of work.
I hope the problem is clear.
Thank you.
You can make a dict from the .csv and pass it to the pandas.DataFrame constructor:
import pandas as pd
N = 5 # <- adjust here to choose the number of rows (must not exceed the number of rows in the file)
dico = pd.read_csv("f.csv").sample(N).to_dict("list")
S = f"df = pd.DataFrame({dico})" # <- the text to paste into Stack Overflow
print(S) # <- copy this output
You can also use pyperclip to copy the text straight to the clipboard, ready to paste into your question:
#pip install pyperclip
import pyperclip
pyperclip.copy(S)
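A self-contained sketch of the same idea, so it can be tried without a file on disk. The in-memory CSV text stands in for a local file and is an assumption, as is the use of head() instead of sample() to keep the output deterministic.

```python
import io
import pandas as pd

# Stand-in for a local file such as "f.csv" (hypothetical contents).
csv_text = "col1,col2\n1,3\n2,4\n"
df = pd.read_csv(io.StringIO(csv_text))

# head() keeps the snippet deterministic and tolerates short files,
# whereas sample(N) raises if N exceeds the number of rows.
data = df.head(5).to_dict("list")
snippet = f"df = pd.DataFrame({data})"
print(snippet)
```

The printed line is a plain Python statement, so a reader can paste it after import pandas as pd and get the same DataFrame back.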

Debug: Dataframe Column Referencing and Indexing [duplicate]

This question already has answers here:
Deleting multiple columns based on column names in Pandas
(11 answers)
Closed 4 years ago.
I can't figure this bug out. I think it is my misunderstanding of a dataframe and indexing through one. Also, maybe a misunderstanding of a for loop. (I am used to matlab for loops... iterations are, intuitively, way easier :D)
Here is the error:
KeyError: "['United States' 'Canada' 'Mexico'] not found in axis"
This happens at the line: as_df=as_df.drop(as_df[column])
But this makes no sense... I am calling an individual column not the entire set of dummy variables.
The following code can be copied and run. I made sure of it.
MY CODE:
import pandas as pd
import numpy as np
df=pd.DataFrame({"country": ['United States','Canada','Mexico'], "price": [23,32,21], "points": [3,4,4.5]})
df=df[['country','price','points']]
df2=df[['country']]
features=df2.columns
print(features)
target='points'
#------_-__-___---____________________
as_df=pd.concat([df[features],df[target]],axis=1)
#Now for Column Check
for column in as_df[features]:
    col=as_df[[column]]
    #Categorical Data Conversion
    #This will split the countries into their own column with 1 being when it
    #is true and 0 being when it is false
    col.select_dtypes(include='object')
    dummies=pd.get_dummies(col)
    #ML Check:
    dumcols=dummies.drop(dummies.columns[1],axis=1)
    if dumcols.shape[1] > 1:
        print(column)
        as_df=as_df.drop(as_df[column])
    else:
        dummydf=col
        as_df=pd.concat([as_df,dummydf],axis=1)
as_df.head()
I would comment instead of answering, but I do not have enough reputation to do so. (I need clarification to help you and Stack Exchange does not provide me with a way to do so "properly".)
I'm not entirely sure what your end-goal is. Could you clarify what your end result for as_df would look like? Including after the for loop ends, and after the entire code is finished running?
Found my mistake.
as_df=as_df.drop(as_df[column])
should be
as_df=as_df.drop(column,axis=1)
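To illustrate why the first form fails, here is a small sketch using the question's sample data. The variable names raised and without_country are introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["United States", "Canada", "Mexico"],
    "price": [23, 32, 21],
    "points": [3, 4, 4.5],
})

# drop() expects labels. Passing df["country"] hands it the column's values,
# which it then looks up as row labels, hence the KeyError.
try:
    df.drop(df["country"])
    raised = False
except KeyError:
    raised = True

# Passing the column name works; columns="country" is equivalent to
# drop("country", axis=1).
without_country = df.drop(columns="country")
```

Note that drop returns a new DataFrame, so the result has to be assigned, as the corrected line does.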

Pandas map to a new column, SettingWithCopyWarning [duplicate]

This question already has an answer here:
df.loc causes a SettingWithCopyWarning warning message
(1 answer)
Closed 6 years ago.
In a pandas DataFrame, I'm trying to map df['old_column'] through a user-defined function f, applied to each row, to create a new column.
df['new_column'] = df['old_column'].map(lambda x: f(x))
This gives the warning "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame."
I tried the following:
df.loc[:, 'new_column'] = df['old_column'].map(lambda x: f(x))
which doesn't help. What can I do?
A SettingWithCopy warning is raised for certain operations in pandas which may not have the expected result because they may be acting on copies rather than the original datasets. Unfortunately there is no easy way for pandas itself to tell whether or not a particular call will or won't do this, so this warning tends to be raised in many, many cases where (from my perspective as a user) nothing is actually amiss.
Both of your method calls are fine. If you want to get rid of the warning entirely, you can specify:
pd.options.mode.chained_assignment = None
See this StackOverflow Q&A for more information on this.
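If you would rather not switch the warning off, one alternative (not taken from the answer above) is to build the new column with DataFrame.assign, which returns a new DataFrame instead of writing into a possibly sliced one. In this sketch f, df_src and the filter are hypothetical stand-ins.

```python
import pandas as pd

def f(x):
    # Hypothetical user-defined function.
    return x * 2

df_src = pd.DataFrame({"old_column": [1, 2, 3, 4]})
subset = df_src[df_src["old_column"] > 1]

# assign() builds a new DataFrame rather than writing into subset in place.
df = subset.assign(new_column=subset["old_column"].map(f))
```

Since nothing is set on subset itself, there is no assignment for pandas to flag.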

Getting SettingWithCopyWarning warning even after using .loc in pandas [duplicate]

This question already has answers here:
Pandas still getting SettingWithCopyWarning even after using .loc
(3 answers)
Closed 6 years ago.
df_masked.loc[:, col] = df_masked.groupby([df_masked.index.month, df_masked.index.day])[col].\
transform(lambda y: y.fillna(y.median()))
Even after using .loc, I get the following warning. How do I fix it?
Anaconda\lib\site-packages\pandas\core\indexing.py:476: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self.obj[item] = s
You could get this UserWarning if df_masked is a sub-DataFrame of some other DataFrame.
In particular, if data was copied from the original DataFrame into df_masked, pandas emits the UserWarning to alert you that modifying df_masked will not affect the original DataFrame.
If you do not intend to modify the original DataFrame, then you are free to ignore the UserWarning.
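There are ways to shut off the UserWarning on a per-statement basis. In particular, you could use df_masked.is_copy = False (this applies to older pandas versions; the is_copy attribute was deprecated in pandas 0.23 and removed in later releases).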
If you run into this UserWarning a lot, then instead of silencing the UserWarnings one-by-one, I think it is better to leave them be as you are developing your code. Be aware of what the UserWarning means, and if the modifying-the-child-does-not-affect-the-parent issue does not affect you, then ignore it. When your code is ready for production, or if you are experienced enough to not need the warnings, shut them off entirely with
pd.options.mode.chained_assignment = None
near the top of your code.
Here is a simple example which demonstrates the problem and (a) solution:
import pandas as pd
df = pd.DataFrame({'swallow':['African','European'], 'cheese':['gouda', 'cheddar']})
df_masked = df.iloc[1:]
df_masked.is_copy = False # comment-out this line to see the UserWarning
df_masked.loc[:, 'swallow'] = 'forest'
The reason why the UserWarning exists is to help alert new users to the fact that
chained-indexing such as
df.iloc[1:].loc[:, 'swallow'] = 'forest'
will not affect df when the result of the first indexer (e.g. df.iloc[1:])
returns a copy.
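When the intent really is to modify the original df, a sketch of the single-step form looks like this. It reuses the example data above, and selecting the rows with df.index[1:] is just one way to express the same selection.

```python
import pandas as pd

df = pd.DataFrame({"swallow": ["African", "European"],
                   "cheese": ["gouda", "cheddar"]})

# One indexing step on the original: rows and column are selected together,
# so the assignment is applied to df itself, not to an intermediate object.
df.loc[df.index[1:], "swallow"] = "forest"
```

After this, the second row of df["swallow"] is "forest" and the cheese column is unchanged.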

Cannot change nan values in a dataframe [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 3 years ago.
I have been reading this link on "Returning a view versus a copy". I do not really get how the chained assignment concept in Pandas works and how the usage of .ix(), .iloc(), or .loc() affects it.
I get SettingWithCopyWarning for the following lines of code, where data is a pandas DataFrame and amount is the name of a column (Series) in that DataFrame:
data['amount'] = data['amount'].astype(float)
data["amount"].fillna(data.groupby("num")["amount"].transform("mean"), inplace=True)
data["amount"].fillna(mean_avg, inplace=True)
Looking at this code, is it obvious that I am doing something suboptimal? If so, can you let me know the replacement code lines?
I am aware of the below warning and like to think that the warnings in my case are false positives:
The chained assignment warnings / exceptions are aiming to inform the
user of a possibly invalid assignment. There may be false positives;
situations where a chained assignment is inadvertently reported.
EDIT: the code leading to the first SettingWithCopyWarning.
data['amount'] = data.apply(lambda row: function1(row,date,qty), axis=1)
data['amount'] = data['amount'].astype(float)
def function1(row, date, qty):
    try:
        if row['currency'] == 'A':
            result = row[qty]
        else:
            # look up the rate for this row's date and currency
            rate = lookup[lookup['Date'] == row[date]][row['currency']]
            result = float(rate) * float(row[qty])
        return result
    except ValueError:  # generic exception clause
        print("The current row causes an exception:")
The point of the SettingWithCopy is to warn the user that you may be doing something that will not update the original data frame as one might expect.
Here, data is a DataFrame, possibly of a single dtype (or not). You are then taking a reference to data['amount'], which is a Series, and updating it. This probably works in your case because you are returning data of the same dtype as already existed.
However, it could create a copy and update that copy of data['amount'], which you would not see; you would then be wondering why nothing is updating.
Pandas returns a copy of an object in almost all method calls. The inplace operations are a convenience that works, but in general they do not make it clear that data is being modified, and they could potentially operate on copies.
It is much clearer to do this:
data['amount'] = data["amount"].fillna(data.groupby("num")["amount"].transform("mean"))
data["amount"] = data['amount'].fillna(mean_avg)
A further advantage of working on copies is that you can chain operations, which is not possible with inplace ones,
e.g.
data['amount'] = data['amount'].fillna(mean_avg)*2
And just an FYI: inplace operations are neither faster nor more memory efficient. My two cents: they should be banned, but it is too late to change that API.
You can of course turn this off:
pd.set_option('chained_assignment',None)
FYI, pandas runs its entire test suite with this option set to raise, so we know if chaining is happening.
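Putting the non-inplace pattern together, here is a small runnable sketch. The sample values are made up, and mean_avg, which the question leaves undefined, is assumed here to be the overall column mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in "amount".
data = pd.DataFrame({
    "num": [1, 1, 2, 2, 3],
    "amount": [10.0, np.nan, 4.0, 6.0, np.nan],
})
mean_avg = data["amount"].mean()  # assumed fallback: overall mean

# Fill from the per-group mean first, then fall back to the overall mean
# for groups that have no known value. Both steps return new Series,
# so they can be chained and assigned back once.
group_mean = data.groupby("num")["amount"].transform("mean")
data["amount"] = data["amount"].fillna(group_mean).fillna(mean_avg)
```

The single assignment back to data["amount"] makes it explicit which object is being updated.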
