Pandas SettingWithCopyWarning When Using loc [duplicate] - python

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 2 years ago.
I have a general question about assignments with indexing/slicing using .loc.
Assume the below DataFrame, df:
   A  B     C
0  a  b  None
1  a  b  None
2  b  a  None
3  c  c  None
4  c  a  None
code to reproduce:
import pandas as pd
df = pd.DataFrame({'A':list('aabcc'), 'B':list('bbaca'), 'C':5*[None]})
I create df1 using:
df1=df.loc[df.A=='c']
df1:
   A  B     C
3  c  c  None
4  c  a  None
I then assign a value to C based upon a value in B using:
df1.loc[df1.B=='a','C']='d'
The assignment works, but I receive a SettingWithCopyWarning. Am I doing something wrong, or is this the expected behavior? I thought that using .loc would avoid chained assignment. Is there something I am missing? I am using pandas 0.14.1.

@EdChum's answer in the comments on the question solved the issue,
i.e. replace
df1=df.loc[df.A=='c']
with
df1=df.loc[df.A=='c'].copy()
This makes your intention explicit (df1 is an independent copy, so modifying it will not affect df), and no warning is raised.
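For reference, here is a minimal end-to-end sketch of the fix, using the DataFrame from the question. It only shows that assigning into the explicit copy changes df1 and leaves df alone. Whether a warning is emitted without .copy() depends on the pandas version, so the sketch does not check for it.

```python
import pandas as pd

# DataFrame from the question
df = pd.DataFrame({"A": list("aabcc"), "B": list("bbaca"), "C": 5 * [None]})

# Take an explicit copy of the filtered rows so df1 owns its data
df1 = df.loc[df.A == "c"].copy()

# Assign into the copy; only the row where B == 'a' is changed
df1.loc[df1.B == "a", "C"] = "d"

print(df1)
print(df)  # the original frame is untouched
```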

Related

Transpose a table using pandas [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 7 months ago.
I have a dataframe that looks like this:
A       type  val
first   B     20
second  B     30
first   C     200
second  C     300
I need to get it to look like this:
A       B   C
first   20  200
second  30  300
How do I do this using Pandas? I tried using transpose, but couldn't get it to this exact table.
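# keyword arguments are required by recent pandas versions
df = df.pivot(index='A', columns='type')
# the pivot leaves two-level columns like ('val', 'B'); keep only the second level
df.columns = [x[1] for x in df.columns]
# reset_index returns a new frame, so assign the result
df = df.reset_index()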
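As a self-contained illustration, the sketch below rebuilds the sample table from the question and pivots it. Passing values='val' is a variation on the answer above (an assumption about what the asker wants): it yields single-level columns directly, so no column flattening is needed.

```python
import pandas as pd

# Sample data from the question, in long format
df = pd.DataFrame({
    "A": ["first", "second", "first", "second"],
    "type": ["B", "B", "C", "C"],
    "val": [20, 30, 200, 300],
})

# One row per value of A, one column per value of type
wide = df.pivot(index="A", columns="type", values="val").reset_index()

# Drop the leftover "type" label on the column axis
wide.columns.name = None

print(wide)
```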

Merge two dataframes with different sizes [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two dataframes with different sizes and I want to merge them.
It's like an "update" to a dataframe column based on another dataframe with different size.
This is an example input:
dataframe 1
CODUSU Situação TIPO1
0 1AB P A0
1 2C3 C B1
2 3AB P C1
dataframe 2
CODUSU Situação ABC
0 1AB A 3
1 3AB A 4
My output should be like this:
dataframe 3
CODUSU Situação TIPO1
0 1AB A A0
1 2C3 C B1
2 3AB A C1
PS: I did it with a loop, but I think there should be a better and easier way to do it!
I read "Pandas Merging 101" and wrote this code:
import numpy as np
df3=df1.merge(df2, on=['CODUSU'], how='left', indicator=False)
df3['Situação'] = np.where((df3['Situação_x'] == 'P') & (df3['Situação_y'] == 'A') , df3['Situação_y'] , df3['Situação_x'])
df3=df3.drop(columns=['Situação_x', 'Situação_y','ABC'])
df3 = df3[['CODUSU','Situação','TIPO1']]
And voilà, df3 is exactly what I needed!
Thanks, everyone!
PS: I have already found my answer. Is there a better place to answer my own question?
df1.merge(df2,how='left', left_on='CODUSU', right_on='CODUSU')
This should do the trick.
Also worth noting: if you don't want the resulting DataFrame to contain the column ABC, pass df2.drop(columns="ABC") instead of just df2.
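Putting the pieces together, here is a runnable sketch of the merge-based update on the sample data from the question. It follows the asker's own approach (left merge, then np.where to pick the updated value). The intermediate name merged is illustrative and does not appear in the original posts.

```python
import numpy as np
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({
    "CODUSU": ["1AB", "2C3", "3AB"],
    "Situação": ["P", "C", "P"],
    "TIPO1": ["A0", "B1", "C1"],
})
df2 = pd.DataFrame({
    "CODUSU": ["1AB", "3AB"],
    "Situação": ["A", "A"],
    "ABC": [3, 4],
})

# Left merge keeps every row of df1; the clashing column gets _x / _y suffixes
merged = df1.merge(df2.drop(columns="ABC"), on="CODUSU", how="left")

# Take the value from df2 only where df1 has 'P' and df2 has 'A'
merged["Situação"] = np.where(
    (merged["Situação_x"] == "P") & (merged["Situação_y"] == "A"),
    merged["Situação_y"],
    merged["Situação_x"],
)

df3 = merged[["CODUSU", "Situação", "TIPO1"]]
print(df3)
```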

How to aggregate rows in a dataframe [duplicate]

This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 2 years ago.
I have a dataframe that contains values by country (and by region in certain countries) and which looks like this:
For each country that is repeated, I would add the values by regions so that there is only one row per country and obtain the following file:
How can I do this in Python? Since I'm really new to Python, I don't mind having a long set of instructions, as long as the procedure is clear, rather than a single line of code, compacted but hard to understand.
Thanks for your help.
You want to study the split-apply-combine paradigm of Pandas DataFrame manipulation. You can do a lot with it. What you want to do is common, and can be accomplished in one line.
>>> import pandas as pd
>>> df = pd.DataFrame({"foo": ["a","b","a","b","c"], "bar": [6,5,4,3,2]})
>>> df
  foo  bar
0   a    6
1   b    5
2   a    4
3   b    3
4   c    2
>>> df.groupby("foo").sum()
     bar
foo
a     10
b      8
c      2
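Since the asker preferred clear steps over a compact one-liner, the same idea can be spelled out one stage at a time. The question's actual data is not shown, so the column names (country, region, value) and all the numbers below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical data: some countries appear once per region
df = pd.DataFrame({
    "country": ["France", "France", "Spain", "Italy", "Italy"],
    "region": ["North", "South", "All", "North", "South"],
    "value": [1, 2, 3, 4, 5],
})

# Split: group the rows that share the same country
groups = df.groupby("country")

# Apply: add up the "value" column within each group
totals = groups["value"].sum()

# Combine: turn the result back into a regular table, one row per country
result = totals.reset_index()

print(result)
```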

Better alternative to a groupby with a merge [duplicate]

This question already has answers here:
Pandas grouby and transform('count') gives placement error - works fine on smaller dataset
(1 answer)
Merging a pandas groupby result back into DataFrame
(3 answers)
Closed 4 years ago.
I was wondering if anyone knew of a better method than what I am currently doing. Here is an example data set:
ID Number
a 1
a 2
a 3
b 4
c 5
c 6
c 7
c 8
For example, to get a count of Number values per ID in the table above, I would first group by ID and count Number, then merge the results back into the original table like so:
df2 = df.groupby('ID').agg({'Number':'count'}).reset_index()
df2 = df2.rename(columns = {'Number':'Number_Count'})
df = pd.merge(df, df2, on = ['ID'])
This results in:
It feels like a roundabout way of doing this. Does anyone know a better alternative? I ask because, when working with large data sets, this method can chew up a lot of memory (by creating another table and then merging the two).
You can do that quite simply with transform:
import pandas as pd
df = pd.DataFrame({'ID': list('aaabcccc'),
                   'Number': range(1, 9)})
# select the column before transform so the result is a Series, one value per row
df['Number_Count'] = df.groupby('ID')['Number'].transform('count')
df
#   ID  Number  Number_Count
# 0  a       1             3
# 1  a       2             3
# 2  a       3             3
# 3  b       4             1
# 4  c       5             4
# 5  c       6             4
# 6  c       7             4
# 7  c       8             4
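As a quick sanity check, the sketch below computes the count column both ways (the groupby-plus-merge route from the question and the transform route) and compares the results on the sample data. The variable names are illustrative. It says nothing about memory use, which would need measuring on real data.

```python
import pandas as pd

df = pd.DataFrame({"ID": list("aaabcccc"), "Number": range(1, 9)})

# Route 1: aggregate into a second table, then merge it back
counts = (
    df.groupby("ID")
    .agg({"Number": "count"})
    .reset_index()
    .rename(columns={"Number": "Number_Count"})
)
via_merge = pd.merge(df, counts, on=["ID"])

# Route 2: transform broadcasts the group count to every original row
via_transform = df.assign(
    Number_Count=df.groupby("ID")["Number"].transform("count")
)

# Both routes should give the same counts, row for row
same = via_merge["Number_Count"].tolist() == via_transform["Number_Count"].tolist()
print(same)
```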

how to use map in index of pandas dataframe [duplicate]

This question already has answers here:
Map dataframe index using dictionary
(6 answers)
Closed 1 year ago.
I want to create a new column in a pandas DataFrame using the values in the index and a dictionary that translates these values into something more meaningful. My initial idea was to use map. I arrived at a solution, but it is very convoluted and there must be a more elegant way to do it. Suggestions?
import pandas as pd
# DataFrame and dict definition
df=pd.DataFrame({'foo':[1,2,3],'boo':[3,4,5]},index=['a','b','c'])
d={'a':'aa','b':'bb','c':'cc'}
df['new column']=df.reset_index().set_index('index',drop=False)['index'].map(d)
Creating a new series explicitly is a bit shorter:
df['new column'] = pd.Series(df.index, index=df.index).map(d)
After to_series, you can use map or replace:
df.index = df.index.to_series().map(d)
df
Out[806]:
    boo  foo
aa    3    1
bb    4    2
cc    5    3
Or, another way:
df['New'] = pd.Series(d).get(df.index)
df
Out[818]:
   boo  foo New
a    3    1  aa
b    4    2  bb
c    5    3  cc
