Python: Pandas merge 2 dataframes using a common column [duplicate] - python

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
How can I merge 2 dataframes df1 and df2 using a common column 'ADD' into df3?
Both df1 and df2 have a common column 'ADD'.
I want to use df2 as a mapping table to convert ADD into an ST value.
I have tried to convert df2 into a Series or a dictionary, but neither seems to work.
df1 =
Name ADD
1 A 12
2 B 54
3 C 34
4 D 756
5 E 43
df2 =
ADD ST
1 12 CA
2 54 CA
3 34 TX
df3 =
Name ADD ST
1 A 12 CA
2 B 54 CA
3 C 34 TX
4 D 756 nan
5 E 43 nan

You can do an outer merge (join) on the common column:
In [11]: df1.merge(df2, how='outer')
Out[11]:
Name ADD ST
0 A 12 CA
1 B 54 CA
2 C 34 TX
3 D 756 NaN
4 E 43 NaN
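Because df2 is meant as a mapping table, a left merge (how='left') is arguably the closer fit. It keeps exactly the rows of df1, whereas an outer merge would also add rows for any ADD values that exist only in df2. There are none in this example, so both give the same result here. The question also mentions trying a Series/dictionary lookup, and that approach works with Series.map. A minimal sketch that rebuilds the example frames by hand; the names mapping and df3_map are just illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['A', 'B', 'C', 'D', 'E'],
                    'ADD': [12, 54, 34, 756, 43]},
                   index=[1, 2, 3, 4, 5])
df2 = pd.DataFrame({'ADD': [12, 54, 34],
                    'ST': ['CA', 'CA', 'TX']},
                   index=[1, 2, 3])

# Left merge: every row of df1 is kept; unmatched ADD values get NaN in ST.
# merge does not keep df1's index, so the result has a fresh 0..4 index.
df3 = df1.merge(df2, on='ADD', how='left')

# Mapping alternative: turn df2 into an ADD -> ST Series and look each ADD up.
# Unlike merge, this keeps df1's original index (1..5).
mapping = df2.set_index('ADD')['ST']
df3_map = df1.assign(ST=df1['ADD'].map(mapping))

print(df3)
print(df3_map)
```

The map version assumes ADD is unique in df2; with duplicate keys, set_index would produce a non-unique mapping and the lookup would fail, while merge would instead duplicate the matching rows of df1.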

Related

Merge or concat two df by index [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 7 months ago.
I have the following issue: I want to concat or merge two dataframes with different lengths and partly different indexes:
data1:
index data1
1 16
2 37
3 18
7 49
data2:
index data2
2 74
3 86
4 12
6 97
12 35
They should be merged so that the output looks like:
index data1 data2
1 16 NaN
2 37 74
3 18 86
4 NaN 12
6 NaN 97
7 49 NaN
12 NaN 35
I hope you can help me out.
Thanks in advance
You can use join:
out = df1.join(df2, how='outer')
print(out)
# Output
data1 data2
index
1 16.0 NaN
2 37.0 74.0
3 18.0 86.0
4 NaN 12.0
6 NaN 97.0
7 49.0 NaN
12 NaN 35.0
Or you can use merge:
out = df1.merge(df2, left_index=True, right_index=True, how='outer')
Or you can use concat:
out = pd.concat([df1, df2], axis=1).sort_index()
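To check that the three options really agree, here is a small sketch that rebuilds the two frames from the question (assuming the index is named 'index', as in the printed output) and compares the results:

```python
import pandas as pd

# Rebuild the two frames from the question; the index name 'index' is assumed
df1 = pd.DataFrame({'data1': [16, 37, 18, 49]},
                   index=pd.Index([1, 2, 3, 7], name='index'))
df2 = pd.DataFrame({'data2': [74, 86, 12, 97, 35]},
                   index=pd.Index([2, 3, 4, 6, 12], name='index'))

# All three align on the index and keep the union of index values (outer)
out_join = df1.join(df2, how='outer')
out_merge = df1.merge(df2, left_index=True, right_index=True, how='outer')
out_concat = pd.concat([df1, df2], axis=1).sort_index()

print(out_join)
```

The columns become float64 because NaN fills the gaps, which is why the output above shows 16.0 rather than 16.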

Merge dataframes but without intersecting values from the right dataframe

I have two dataframes A and B
import pandas as pd
A = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [11, 22, 33, 44, 55]})
B = pd.DataFrame({'a': [7, 2, 3, 4, 9], 'b': [123, 234, 456, 789, 1122]})
I want to combine A and B so that rows of B whose value in column 'a' also appears in A are dropped; only rows of B with non-intersecting values in column 'a' should be added. The final dataframe should look like:
a b
1 11
2 22
3 33
4 44
5 55
7 123
9 1122
If 'a' is unique within both A and B (some sort of unique ID, for example), you can use concat and drop_duplicates:
pd.concat([A,B]).drop_duplicates('a')
Output:
a b
0 1 11
1 2 22
2 3 33
3 4 44
4 5 55
0 7 123
4 9 1122
In the general case, use isin to check for existence of B['a'] in A['a']:
pd.concat([A,B[~B['a'].isin(A['a'])])

Extract corresponding df value with reference from another df

There are two dataframes with a one-to-one row correspondence. I can retrieve the idxmax of each value column in df1 and look up the matching 'ref':
Input:
df1 = pd.DataFrame({'ref':[2,4,6,8,10,12,14],'value1':[76,23,43,34,0,78,34],'value2':[1,45,8,0,76,45,56]})
df2 = pd.DataFrame({'ref':[2,4,6,8,10,12,14],'value1_pair':[0,0,0,0,180,180,90],'value2_pair':[0,0,0,0,90,180,90]})
df=df1.loc[df1.iloc[:,1:].idxmax(), 'ref']
Output: df1, df2 and df
ref value1 value2
0 2 76 1
1 4 23 45
2 6 43 8
3 8 34 0
4 10 0 76
5 12 78 45
6 14 34 56
ref value1_pair value2_pair
0 2 0 0
1 4 0 0
2 6 0 0
3 8 0 0
4 10 180 90
5 12 180 180
6 14 90 90
5 12
4 10
Name: ref, dtype: int64
Now I want to create a df that contains 3 columns:
Desired Output df:
ref max value corresponding value
12 78 180
10 76 90
What are the best options to extract the corresponding values from df2?
Your main problem is matching the columns between df1 and df2. Let's rename them properly, melt both dataframes, merge and extract:
(df1.melt('ref')
.merge(df2.rename(columns={'value1_pair':'value1',
'value2_pair':'value2'})
.melt('ref'),
on=['ref', 'variable'])
.sort_values('value_x')
.groupby('variable').last()
)
Output:
ref value_x value_y
variable
value1 12 78 180
value2 10 76 90
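If there are only a few value columns, a more direct alternative is to skip the melt and look each maximum up by 'ref'. A minimal sketch, assuming 'ref' is unique in df2 and that every valueN column in df1 has a matching valueN_pair column in df2 (the naming convention in this example). Note that idxmax returns the first occurrence if the maximum is tied.

```python
import pandas as pd

df1 = pd.DataFrame({'ref': [2, 4, 6, 8, 10, 12, 14],
                    'value1': [76, 23, 43, 34, 0, 78, 34],
                    'value2': [1, 45, 8, 0, 76, 45, 56]})
df2 = pd.DataFrame({'ref': [2, 4, 6, 8, 10, 12, 14],
                    'value1_pair': [0, 0, 0, 0, 180, 180, 90],
                    'value2_pair': [0, 0, 0, 0, 90, 180, 90]})

# Assumption: one row per ref in df2, so it can be indexed by ref
pairs = df2.set_index('ref')

rows = []
for col in ['value1', 'value2']:
    idx = df1[col].idxmax()            # row label of the column maximum
    ref = df1.at[idx, 'ref']           # ref of that row
    rows.append({'ref': ref,
                 'max value': df1.at[idx, col],
                 # assumed naming convention: value1 -> value1_pair
                 'corresponding value': pairs.at[ref, f'{col}_pair']})

result = pd.DataFrame(rows)
print(result)
```

The melt/merge answer scales better to many columns; this loop trades that generality for readability.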

Pandas - For Each Index, Put All Columns Into Rows [duplicate]

This question already has answers here:
Convert columns into rows with Pandas
(6 answers)
Closed 3 years ago.
I'm trying to avoid looping, but the title sort of explains the issue.
import pandas as pd
# DataFrame.append was removed in pandas 2.0, so build both rows at once
df = pd.DataFrame([{'Index': 333, 1: 'A', 2: 'C', 3: 'F', 4: 'B', 5: 'D'},
                   {'Index': 234, 1: 'B', 2: 'D', 3: 'C', 4: 'A', 5: 'Z'}])
df.set_index('Index', inplace=True)
print(df)
1 2 3 4 5
Index
333 A C F B D
234 B D C A Z
I want to preserve the index, and for each column turn it into a row with the corresponding value like this:
newcol value
Index
333 1 A
333 2 C
333 3 F
333 4 B
333 5 D
234 1 B
234 2 D
234 3 C
234 4 A
234 5 Z
It's somewhat of a transpose issue, but not exactly like that. Any ideas?
You need:
df.stack().reset_index(1, name='value').rename(columns={'level_1':'newcol'})
# OR df.reset_index().melt('Index', var_name='newcol', value_name='value').set_index('Index')
#    (same rows, but melt groups them by column rather than by index)
#(cc: #anky_91)
Output:
newcol value
Index
333 1 A
333 2 C
333 3 F
333 4 B
333 5 D
234 1 B
234 2 D
234 3 C
234 4 A
234 5 Z
Another solution using to_frame and rename_axis:
df.stack().to_frame('value').rename_axis(index=['','newcol']).reset_index(1)
newcol value
333 1 A
333 2 C
333 3 F
333 4 B
333 5 D
234 1 B
234 2 D
234 3 C
234 4 A
234 5 Z
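The melt alternative mentioned in the first answer's comment returns the same (index, column, value) rows as stack, only in a different order: melt walks column by column, so the index values alternate (333, 234, 333, ...) instead of being grouped. A small sketch to compare the two:

```python
import pandas as pd

df = pd.DataFrame([{'Index': 333, 1: 'A', 2: 'C', 3: 'F', 4: 'B', 5: 'D'},
                   {'Index': 234, 1: 'B', 2: 'D', 3: 'C', 4: 'A', 5: 'Z'}]).set_index('Index')

# stack(): one row per (index, column) pair, grouped by index
stacked = (df.stack()
             .reset_index(1, name='value')
             .rename(columns={'level_1': 'newcol'}))

# melt(): the same rows, but grouped by column (column 1 for every index first)
melted = (df.reset_index()
            .melt('Index', var_name='newcol', value_name='value')
            .set_index('Index'))

print(stacked)
print(melted)
```

If the grouped-by-index order from the question matters, the stack-based answers produce it directly.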

how to join two Pandas dataframes on one column and one index [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
Suppose I have two DataFrames df1 and df2; the join key in df1 is a column, but the key in df2 is the index.
df1
Out[88]:
A B C
0 1 A 10
1 2 B 20
2 3 C 30
3 4 D 40
4 5 E 50
df2
Out[89]:
D E
A 22 2
B 33 3
C 44 4
D 55 5
E 66 6
I want to do something like,
pd.merge(df1, df2, how='outer', left_on='B', right_on=df2.index)
I know this is sure to fail. I can work around it by resetting the index on df2, but in the application I would then have to set the index back.
df2=df2.reset_index()
I am wondering whether it is possible to just join one column and one index together easily?
You can specify right_index=True to merge on the index for the rhs:
In [193]:
pd.merge(df1, df2, how='outer', left_on='B', right_index=True)
Out[193]:
A B C D E
0 1 A 10 22 2
1 2 B 20 33 3
2 3 C 30 44 4
3 4 D 40 55 5
4 5 E 50 66 6
I think you can also use join:
df1.join(df2, on='B', how='outer')
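Both answers match column B of df1 against the index of df2. A short sketch, rebuilding the frames from the question, to confirm they produce the same columns and values (the row index of the two results is not compared here):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': ['A', 'B', 'C', 'D', 'E'],
                    'C': [10, 20, 30, 40, 50]})
df2 = pd.DataFrame({'D': [22, 33, 44, 55, 66],
                    'E': [2, 3, 4, 5, 6]},
                   index=['A', 'B', 'C', 'D', 'E'])

# merge: left key is column 'B', right key is df2's index
merged = pd.merge(df1, df2, how='outer', left_on='B', right_index=True)

# join(on=...): the caller's column 'B' is looked up in df2's index
joined = df1.join(df2, on='B', how='outer')

print(merged)
print(joined)
```

Neither call needs df2.reset_index(), so df2 keeps its index for later use.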
