Pandas merge two dataframes summing values [duplicate]

This question already has answers here:
how to merge two dataframes and sum the values of columns
(2 answers)
Closed 4 years ago.
Suppose I have two dataframes with partly repeated entries:
import pandas
source1 = pandas.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})
# key value
#0 a 1
#1 b 2
source2 = pandas.DataFrame({'key': ['b', 'c'], 'value': [3, 0]})
# key value
#0 b 3
#1 c 0
What do I need to do with source1 and source2 to get a resulting frame with the following entries:
# key value
#0 a 1
#1 b 5
#2 c 0

Just add
source1.set_index('key').add(source2.set_index('key'), fill_value=0)
If key is already the index, just use
source1.add(source2, fill_value=0)
You may want to .reset_index() at the end if you don't want key as the index.

With grouping:
>>> pd.concat([source1, source2]).groupby('key', as_index=False).sum()
key value
0 a 1
1 b 5
2 c 0
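For reference, a complete runnable version of both approaches. Note that add with fill_value typically promotes the summed column to float here (alignment introduces intermediate NaNs), so you may want to cast back to int:

import pandas as pd

source1 = pd.DataFrame({'key': ['a', 'b'], 'value': [1, 2]})
source2 = pd.DataFrame({'key': ['b', 'c'], 'value': [3, 0]})

# Approach 1: index-aligned addition. fill_value=0 keeps keys that
# appear in only one frame instead of producing NaN for them.
added = source1.set_index('key').add(source2.set_index('key'), fill_value=0)
added = added.reset_index()
added['value'] = added['value'].astype(int)  # add() with fill_value yields floats

# Approach 2: stack both frames, then sum values per key.
grouped = pd.concat([source1, source2]).groupby('key', as_index=False).sum()

print(added)
print(grouped)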


drop rows based on a condition based on another [duplicate]

This question already has answers here:
Get the row(s) which have the max value in groups using groupby
(15 answers)
Closed 6 months ago.
I have the following data frame:
user_id  value
1        5
1        7
1        11
1        15
1        35
2        8
2        9
2        14
I want to drop all rows that are not the maximum value for each user_id,
resulting in a 2-row data frame:
user_id  value
1        35
2        14
How can I do that?
You can take the max of the value column after grouping.
Assuming that your original dataframe is named df, try the code below:
out = df.groupby('user_id', as_index=False)['value'].max()
>>> print(out)
   user_id  value
0        1     35
1        2     14
Edit:
If you want to group by more than one column, use this:
out = df.groupby(['user_id', 'sex'], as_index=False, sort=False)['value'].max()
>>> print(out)
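A minimal runnable sketch using the data from the question. The idxmax variant keeps whole rows rather than per-group maxima, which becomes relevant as soon as the frame has more columns than user_id and value:

import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 1, 1, 2, 2, 2],
                   'value': [5, 7, 11, 15, 35, 8, 9, 14]})

# Maximum value per user_id (sufficient with only these two columns).
out = df.groupby('user_id', as_index=False)['value'].max()

# Alternative that keeps the entire row: find the index label of each
# group's maximum, then select those rows by label.
rows = df.loc[df.groupby('user_id')['value'].idxmax()]

print(out)
print(rows)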

Python Pandas Compare 2 DataFrames [duplicate]

This question already has an answer here:
Find unique column values out of two different Dataframes
(1 answer)
Closed 1 year ago.
I'm working in Python with pandas and I have two DataFrames:
df1:
1 'A'
2 'B'
df2:
1 'A'
2 'B'
3 'C'
4 'D'
and I want to return the difference:
1 'C'
2 'D'
You can concatenate two dataframes and drop duplicates:
pd.concat([df1, df2]).drop_duplicates(keep=False)
If your dataframes contain more columns, you can restrict the comparison to a certain column by passing it as subset:
pd.concat([df1, df2]).drop_duplicates(subset='col_name', keep=False)
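A self-contained sketch of the technique on toy data mirroring the question (the column name N is taken from the listings below):

import pandas as pd

df1 = pd.DataFrame({'N': ['A', 'B']})
df2 = pd.DataFrame({'N': ['A', 'B', 'C', 'D']})

# keep=False drops *every* copy of a duplicated row, so only rows
# present in exactly one of the two frames survive.
diff = pd.concat([df1, df2]).drop_duplicates(keep=False)
print(diff)  # the rows with 'C' and 'D'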
What I retrieve with pd.concat([df1, df2]).drop_duplicates(keep=False)
(N = name of the column):
df1:
N
0 A
1 B
2 C
df2:
N
0 A
1 B
2 C
df3:
N
0 A
1 B
2 C
0 A
1 B
2 C
The values in the dfs are phone numbers without the '+', so I can't show them.
I import them with:
df1 = pd.DataFrame(ListResponse, columns=['33000000000'])
df2 = pd.read_csv('number.csv')
ListResponse returns a list of numbers, and number.csv is the ListResponse I saved to CSV the last time I ran the script.
Edit: what I want in this case is an empty DataFrame.
I just tested with new values:
df3:
N
0 A
1 B
2 C
3 D
0 B
1 C
2 D
Edit 2: I think drop_duplicates is not working because my function inserts new values at index 0 rather than at index length+1, as you can see just above. But even when the same values are in both dfs, it does not return an empty df...
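Note that drop_duplicates compares row values, not index labels, so the restarting index above is not the cause. A more likely culprit, given the import path described, is a dtype mismatch: read_csv re-infers the numbers as integers while ListResponse may hold strings, and rows only count as duplicates when their values compare equal. A hedged check and normalization (the column name '33000000000' is taken from the snippet above):

# If the dtypes differ, identical-looking numbers never match.
print(df1.dtypes)
print(df2.dtypes)

# Normalize both columns to strings before comparing.
col = '33000000000'
df1[col] = df1[col].astype(str)
df2[col] = df2[col].astype(str)
print(pd.concat([df1, df2]).drop_duplicates(keep=False))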

filter a dataframe by the values of another dataframe in python (pandas) [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have two data frames:
df1:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
3 168337 7
and another dataframe
df2:
ID
0 202485
1 215893
2 181840
I want to filter / left-join the two dataframes.
The desired result is:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
I tried df1.merge(df2, how='inner', on='ID') and got an error like "You are trying to merge on object and int64 columns".
I also used isin, but it didn't work:
list=df1['ID'].drop_duplicates().to_list()
df1[df1['ID'].isin(list)]
Any help?
df1 = pd.DataFrame({'ID':[202485,215893,181840,168337],'COUNT':[6,8,8,7]})
df2 = pd.DataFrame({"ID":[202485,215893,181840]})
out_df = pd.merge(df1, df2)
print(out_df)
This gives the desired result:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
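The original error ("trying to merge on object and int64 columns") means the two ID columns had different dtypes, typically strings on one side. A hedged fix is to align the dtypes before merging; likewise, the isin attempt works once it filters by df2's IDs rather than df1's own:

# Cast df2's ID to df1's dtype so the merge keys compare equal.
df2['ID'] = df2['ID'].astype(df1['ID'].dtype)
out_df = df1.merge(df2, on='ID')  # inner join keeps only matching IDs

# isin-based equivalent of the same filter:
out_df = df1[df1['ID'].isin(df2['ID'])]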

Pandas Dataframe: How to create a column of incremental unique value count of another column [duplicate]

This question already has answers here:
Creating a new column assigning same index to repeated values in Pandas DataFrame [closed]
(2 answers)
Closed 2 years ago.
Consider the sample dataframe ('value' column is of no significance here):
df = pd.DataFrame({'key':list('AABBBC'), 'value': [1, 2, 3, 4, 5, 6]})
What I want is a column that counts the unique values of the 'key' column only, with the caveat that the count ascends incrementally: it only goes up when a cell value has not appeared in any previous row. So here "A" is assigned 1, "B" 2, and "C" 3.
The desired result looks like this:
  key  value  count_unique
0   A      1             1
1   A      2             1
2   B      3             2
3   B      4             2
4   B      5             2
5   C      6             3
Right now I can only achieve this with a couple of steps:
df1 = df.drop_duplicates('key').reset_index(drop=True).drop(columns=['value'])
df1['count_unique'] = df1.index + 1
pd.merge(df, df1.set_index('key'), left_on='key', right_index=True, how='left')
It doesn't look very Pythonic and is not the most efficient. Any advice is appreciated.
You can do this in one step with factorize:
df['count_unique'] = df['key'].factorize()[0] + 1
Output:
key value count_unique
0 A 1 1
1 A 2 1
2 B 3 2
3 B 4 2
4 B 5 2
5 C 6 3
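An equivalent alternative numbers the groups with ngroup, which counts them in order of first appearance when sort=False:

import pandas as pd

df = pd.DataFrame({'key': list('AABBBC'), 'value': [1, 2, 3, 4, 5, 6]})
# ngroup() labels each group 0..n-1 in first-appearance order,
# so adding 1 gives the desired 1-based count.
df['count_unique'] = df.groupby('key', sort=False).ngroup() + 1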

replacing values in a df with the values from another df if a match exists in a column [duplicate]

This question already has an answer here:
Replace value in a specific column with corresponding value
(1 answer)
Closed 3 years ago.
I have the following df1:
Id value
'so' 5
'fe' 6
'd1' 4
Then I have a ref_df:
Id value
'so' 3
'fe' 3
'ju' 2
'd1' 1
I want to check whether any of the Ids in ref_df appear in df1 and, if so, replace the value in df1 with the one from ref_df.
The desired output would be:
Id value
'so' 3
'fe' 3
'd1' 1
How can I achieve this?
Try this:
df1['value'] = df1['Id'].map(ref_df.set_index('Id')['value'])
Output:
  Id  value
0 so      3
1 fe      3
2 d1      1
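For completeness, a runnable sketch of the map-based lookup using the frames from the question (quotes around the Ids dropped for brevity):

import pandas as pd

df1 = pd.DataFrame({'Id': ['so', 'fe', 'd1'], 'value': [5, 6, 4]})
ref_df = pd.DataFrame({'Id': ['so', 'fe', 'ju', 'd1'], 'value': [3, 3, 2, 1]})

# Build an Id -> value lookup from ref_df, then map df1's Ids through it.
# Any Id missing from ref_df would become NaN; here every Id is covered.
df1['value'] = df1['Id'].map(ref_df.set_index('Id')['value'])
print(df1)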
