Pandas - return a dataframe after groupby - python

I have a Pandas df:
Name No
A 1
A 2
B 2
B 2
B 3
I want to group by column Name, sum column No and then return a 2-column dataframe like this:
Name No
A 3
B 7
I tried:
df.groupby(['Name'])['No'].sum()
but it does not return the DataFrame I want (it returns a Series indexed by Name), and I can't add the result to a DataFrame as a column.
Really appreciate any help

Add the parameter as_index=False to groupby:
print (df.groupby(['Name'], as_index=False)['No'].sum())
Name No
0 A 3
1 B 7
Or call reset_index:
print (df.groupby(['Name'])['No'].sum().reset_index())
Name No
0 A 3
1 B 7
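A minimal self-contained sketch, rebuilding the sample frame from the question, shows that the plain groupby returns a Series indexed by Name, while both fixes return the same two-column DataFrame:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'B'],
                   'No': [1, 2, 2, 2, 3]})

# Plain groupby + sum: a Series whose index is Name
s = df.groupby(['Name'])['No'].sum()
print(type(s).__name__)  # Series

# Option 1: keep Name as a regular column during the groupby
out1 = df.groupby(['Name'], as_index=False)['No'].sum()

# Option 2: aggregate first, then move the Name index back into a column
out2 = df.groupby(['Name'])['No'].sum().reset_index()

print(out1)
print(out1.equals(out2))  # True
```

as_index=False avoids creating the index in the first place, while reset_index converts it afterwards; for a single aggregated column both give the same result.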

Related

pandas drop_duplicates with conditions on two other columns' values

I have a DataFrame with columns A, B and C.
Column A contains duplicates. Column B contains either an email address or NaN. Column C contains either the string 'wait' or a number.
For each duplicated value in A, I would like to keep the non-NaN value from B and the non-'wait' value (i.e. the number) from C.
How could I do that on a DataFrame df?
I have tried df.drop_duplicates('A'), but I don't see any way to add conditions on the other columns.
Edit:
Sample data:
import numpy as np
import pandas as pd
df=pd.DataFrame({'A':[1,1,2,2,3,3],'B':['a#b.com',np.nan,np.nan,'c#d.com','np.nan',np.nan],'C':[123,456,567,'wait','wait','wait']})
>>> df
A B C
0 1 a#b.com 123
1 1 NaN 456
2 2 NaN 567
3 2 c#d.com wait
4 3 np.nan wait
5 3 NaN wait
I would like a resulting dataframe as
>>> df
A B C
0 1 a#b.com 123
1 2 c#d.com 567
2 3 np.nan wait
Thank you
Best,
Sort by A and C, using a key that tests whether the value equals 'wait' so that rows with a number in C come first, then take the first non-missing value of each column per group of column A:
df = df.sort_values(['A', 'C'], key = lambda x: x.eq('wait')).groupby('A').first()
print (df)
B C
A
1 a#b.com 123
2 c#d.com 567
3 np.nan wait
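A runnable sketch of the answer on the question's sample data. The final reset_index() is an addition (not part of the answer) that turns A back into a regular column, matching the layout of the desired output:

```python
import numpy as np
import pandas as pd

# Sample data from the question (row 4 of B holds the *string* 'np.nan')
df = pd.DataFrame({'A': [1, 1, 2, 2, 3, 3],
                   'B': ['a#b.com', np.nan, np.nan, 'c#d.com', 'np.nan', np.nan],
                   'C': [123, 456, 567, 'wait', 'wait', 'wait']})

# The key is applied to every column in `by`, turning each into a boolean
# "equals 'wait'" mask, so rows with a number in C sort before 'wait' rows.
# A itself no longer drives the order, but groupby('A') regroups anyway.
ordered = df.sort_values(['A', 'C'], key=lambda x: x.eq('wait'))

# first() takes the first non-missing value of each column within each A group
result = ordered.groupby('A').first().reset_index()
print(result)
```

For A = 2, B and C come from different rows: first() picks the first non-missing value per column, not per row, which is what lets c#d.com and 567 end up together.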

Python Pandas: compare 2 DataFrames [duplicate]

This question already has an answer here:
Find unique column values out of two different Dataframes
I'm working in Python with pandas and I have 2 DataFrames:
df1:
1 'A'
2 'B'
df2:
1 'A'
2 'B'
3 'C'
4 'D'
and I want to return the difference:
1 'C'
2 'D'
You can concatenate the two DataFrames and drop the duplicates; keep=False removes every copy of a duplicated row:
pd.concat([df1, df2]).drop_duplicates(keep=False)
If your DataFrames contain more columns, you can restrict the comparison to one column with subset:
pd.concat([df1, df2]).drop_duplicates(subset='col_name', keep=False)
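A small sketch of the concat-and-drop approach, using single-column frames modelled on the question; the column name 'N' is an assumption borrowed from the asker's later output:

```python
import pandas as pd

# Single-column frames modelled on the question (column name 'N' assumed)
df1 = pd.DataFrame({'N': ['A', 'B']})
df2 = pd.DataFrame({'N': ['A', 'B', 'C', 'D']})

# keep=False drops every copy of a duplicated row, so only rows that occur
# in exactly one of the two frames survive
diff = pd.concat([df1, df2]).drop_duplicates(keep=False).reset_index(drop=True)
print(diff)

# When both frames hold exactly the same rows, the result is empty
same = pd.concat([df1, df1]).drop_duplicates(keep=False)
print(same.empty)  # True
```

Note that this is a symmetric difference: rows present only in df1 would also be kept, and a row repeated inside df2 itself would be dropped even if it is absent from df1.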
What I get with pd.concat([df1, df2]).drop_duplicates(keep=False):
(N = name of column)
df1:
N
0 A
1 B
2 C
df2:
N
0 A
1 B
2 C
df3
N
0 A
1 B
2 C
0 A
1 B
2 C
The values in the DataFrames are phone numbers without the leading '+', so I can't show them.
I load them with:
df1 = pd.DataFrame(ListResponse, columns=['33000000000'])
df2 = pd.read_csv('number.csv')
ListResponse returns a list of numbers, and number.csv is the ListResponse I saved to CSV the last time I ran the script.
Edit:
(what I want in this case is an empty DataFrame)
Tested with a new value:
df3:
N
0 A
1 B
2 C
3 D
0 B
1 C
2 D
Edit 2: I think drop_duplicates is not working because my function gives the new values index 0 instead of continuing at length+1, as you can see just above. But even when both DataFrames hold the same values, it doesn't return an empty DataFrame...
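Two points about Edit 2 can be checked with a small sketch. First, the index is not the cause: drop_duplicates compares column values only and ignores index labels. Second, one possible cause (an assumption, since the real numbers are not shown) is a dtype mismatch: if ListResponse holds strings while read_csv parses the same digits as integers, '33000000001' and 33000000001 never compare equal. The number-like values below are made up for illustration:

```python
import pandas as pd

# Different index labels, same values: still treated as duplicates
a = pd.DataFrame({'N': ['A', 'B']}, index=[0, 1])
b = pd.DataFrame({'N': ['A', 'B']}, index=[5, 6])
index_ignored = pd.concat([a, b]).drop_duplicates(keep=False)
print(index_ignored.empty)  # True

# Hypothetical dtype mismatch: the same digits stored as str and as int
as_str = pd.DataFrame({'N': ['33000000001', '33000000002']})
as_int = pd.DataFrame({'N': [33000000001, 33000000002]})
mismatch = pd.concat([as_str, as_int]).drop_duplicates(keep=False)
print(len(mismatch))  # 4: str and int never match, so nothing is dropped

# Converting both sides to the same dtype gives the expected empty result
fixed = pd.concat([as_str, as_int.astype(str)]).drop_duplicates(keep=False)
print(fixed.empty)  # True
```

Comparing df1.dtypes with df2.dtypes (and df1.columns with df2.columns) before concatenating is a quick way to see whether this applies.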

How to add value of dataframe to another dataframe?

I want to add the single row of one DataFrame to every row of another DataFrame.
import pandas as pd
df1=pd.DataFrame({"a": [1,2],
"b": [3,4]})
df2=pd.DataFrame({"a":[4], "b":[5]})
I want to add the df2 values to every row of df1.
Using df1 + df2 gives the following result, because pandas aligns both frames on their index and df2 has no row 1:
a b
0 5.0 8.0
1 NaN NaN
But I want to get the following result:
a b
0 5 7
1 7 9
Any help would be greatly appreciated!
If you really need to add each df2 value to a whole row of df1 (4 to row 0, 5 to row 1; this requires the number of columns in df2 to equal the number of rows in df1), use:
df = df1.add(df2.loc[0].to_numpy(), axis=0)
print (df)
a b
0 5 7
1 7 9
If instead you need to add the df2 row to every row of df1 (column a of df2 to column a of df1, and so on), the output is different:
df = df1.add(df2.loc[0], axis=1)
print (df)
a b
0 5 8
1 6 9
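A self-contained sketch of the three variants, rebuilding df1 and df2 from the question. to_numpy() strips the column labels from df2.loc[0], so with axis=0 its two values line up positionally with the two rows of df1, whereas the labelled Series with axis=1 lines up with df1's columns:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"a": [4], "b": [5]})

# Index alignment: only row 0 exists in both frames, so row 1 becomes NaN
aligned = df1 + df2

# Unlabelled array along axis=0: 4 is added to row 0, 5 to row 1
out_axis0 = df1.add(df2.loc[0].to_numpy(), axis=0)

# Labelled Series along axis=1: df2's row is added to every row of df1
out_axis1 = df1.add(df2.loc[0], axis=1)

print(out_axis0)
print(out_axis1)
```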

How to reverse the content of a specific dataframe column in pandas?

I have a pandas DataFrame:
import pandas as pd
df1 = {'A':['a','b','c','d','e'],'no.':[0,1,2,3,4]}
df1 = pd.DataFrame(df1,columns=['A','no.'])
and I would like to reverse, in place, the contents of the second column, so that the result looks like this:
df2 = {'A':['a','b','c','d','e'],'no.':[4,3,2,1,0]}
df2 = pd.DataFrame(df2,columns=['A','no.'])
Convert the column to a NumPy array and reverse it with slicing; because the array has no index, pandas does not re-align it on assignment:
df1['no.'] = df1['no.'].to_numpy()[::-1]
print (df1)
A no.
0 a 4
1 b 3
2 c 2
3 d 1
4 e 0
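A short sketch of why the NumPy conversion matters: assigning a reversed Series back to the column has no visible effect, because pandas re-aligns it on the index before storing it:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['a', 'b', 'c', 'd', 'e'], 'no.': [0, 1, 2, 3, 4]})

# A reversed Series keeps its index labels, so assignment re-aligns it
df1['no.'] = df1['no.'][::-1]
print(df1['no.'].tolist())  # [0, 1, 2, 3, 4]

# A NumPy array carries no index, so the reversed order is kept
df1['no.'] = df1['no.'].to_numpy()[::-1]
print(df1['no.'].tolist())  # [4, 3, 2, 1, 0]
```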

code multiple columns based on lists and dictionaries in Python

I have the following dataframe in Pandas
OfferPreference_A OfferPreference_B OfferPreference_C
A B A
B C C
C S G
I have the following dictionary of unique values under all the columns
dict1 = {'A': 1, 'B': 2, 'C': 3, 'S': 4, 'G': 5, 'D': 6}
I also have a list of the columnames
columnlist=['OfferPreference_A', 'OfferPreference_B', 'OfferPreference_C']
I am trying to get the following table as the output:
OfferPreference_A OfferPreference_B OfferPreference_C
1 2 1
2 3 3
3 4 5
How do I do this?
Use:
# values not found in dict1 become missing (None/NaN)
# (since pandas 2.1, DataFrame.map replaces the deprecated applymap)
df = df[columnlist].applymap(dict1.get)
Or:
# values not found in dict1 keep their original value
df = df[columnlist].replace(dict1)
Or:
# values not found in dict1 become NaN
df = df[columnlist].stack().map(dict1).unstack()
print (df)
OfferPreference_A OfferPreference_B OfferPreference_C
0 1 2 1
1 2 3 3
2 3 4 5
You can also use map on each column, as shown below, assuming every value has a match in dict1 (unmatched values become NaN):
for col in columnlist:
df[col] = df[col].map(dict1)
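A runnable sketch of the per-column map on the question's table, followed by a check of how map and replace treat a value missing from the dictionary. The probe value 'X' is hypothetical and not part of the question's data:

```python
import pandas as pd

# Sample table and lookup from the question
df = pd.DataFrame({'OfferPreference_A': ['A', 'B', 'C'],
                   'OfferPreference_B': ['B', 'C', 'S'],
                   'OfferPreference_C': ['A', 'C', 'G']})
dict1 = {'A': 1, 'B': 2, 'C': 3, 'S': 4, 'G': 5, 'D': 6}
columnlist = ['OfferPreference_A', 'OfferPreference_B', 'OfferPreference_C']

# Map each listed column through the dictionary
for col in columnlist:
    df[col] = df[col].map(dict1)
print(df)

# Hypothetical unmatched value 'X': map turns it into NaN,
# replace leaves it unchanged
probe = pd.Series(['A', 'X'])
mapped = probe.map(dict1)
replaced = probe.replace(dict1)
```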
