Python: how to merge two dataframe based only on different columns?

I have two dataframes, df1 and df2:
df1
   A  B
0  4  2
1  3  3
2  1  2
df2
   B  AB  C
0  4   8  3
1  3   9  2
2  1   2  4
I would like to join them, adding only the columns of df2 that are not already in df1:
df3
   A  B  AB  C
0  4  2   8  3
1  3  3   9  2
2  1  2   2  4

Use Index.isin with inverse mask or Index.difference:
df22 = df2.loc[:, ~df2.columns.isin(df1.columns)]
df = df1.join(df22)
Or:
df22 = df2[df2.columns.difference(df1.columns)]
df = df1.join(df22)
print (df)
   A  B  AB  C
0  4  2   8  3
1  3  3   9  2
2  1  2   2  4
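As a self-contained check of the two approaches above, here is a minimal sketch that rebuilds the question's frames and compares the results. The variable names only_in_df2, extra_cols, joined_mask and joined_diff are illustrative, not from the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 3, 1], "B": [2, 3, 2]})
df2 = pd.DataFrame({"B": [4, 3, 1], "AB": [8, 9, 2], "C": [3, 2, 4]})

# Boolean mask: keeps df2's columns in their original order.
only_in_df2 = df2.loc[:, ~df2.columns.isin(df1.columns)]
joined_mask = df1.join(only_in_df2)

# Index.difference: returns the remaining labels sorted by default,
# so the column order can differ from df2's original order.
extra_cols = df2.columns.difference(df1.columns)
joined_diff = df1.join(df2[extra_cols])

print(joined_mask)
```

Both variants give the same frame here because AB and C are already in sorted order. When the order of df2's columns matters, the mask version is the safer choice; Index.difference also accepts sort=False.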

You can also use merge as an alternative. Note that this matches df1's A against df2's B, so it only works here because those two columns hold the same values:
df3=pd.merge(df1,df2, left_on='A', right_on='B', how ='left', suffixes=('','_')).drop('B_',axis=1)
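If the rows should be matched by position rather than by column values, a merge on the index is one option. The following is a sketch under that assumption, not the original answerer's method; the name shared is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [4, 3, 1], "B": [2, 3, 2]})
df2 = pd.DataFrame({"B": [4, 3, 1], "AB": [8, 9, 2], "C": [3, 2, 4]})

# Columns present in both frames ("B" here); drop them from df2 first
# so the merge adds no suffixed duplicates.
shared = df1.columns.intersection(df2.columns)

# Align rows by index label instead of by the values in A and B.
df3 = pd.merge(
    df1,
    df2.drop(columns=shared),
    left_index=True,
    right_index=True,
    how="left",
)
print(df3)
```

This does not depend on any column of df1 happening to equal a column of df2.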

Related

Concatenate dataframe to a master dataframe only if the values do not exist in the master dataframe

Let's say I have two dataframes:
Df1 =
  Batch  Result
0     A       3
1     A       5
2     B       5
3     B       6
4     C       8
5     C       3
Df2 =
  Batch  Result
0     C       8
1     C       3
2     D       6
3     D       1
I want to concat Df2 to Df1, but only the rows for batch D, which Df1 does not already contain. The output should look like this:
Df1 =
  Batch  Result
0     A       3
1     A       5
2     B       5
3     B       6
4     C       8
5     C       3
2     D       6
3     D       1
How can I do this with Pandas?

You can remove the batches that are already in Df1 from Df2 before concatenating:
pd.concat([Df1, Df2[ ~Df2['Batch'].isin(Df1['Batch'])] ])
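A runnable version of the one-liner above, as a minimal sketch using the question's data. The names new_rows, combined and renumbered are illustrative; ignore_index=True is an optional extra for when the repeated index labels 2 and 3 are not wanted.

```python
import pandas as pd

Df1 = pd.DataFrame({"Batch": list("AABBCC"), "Result": [3, 5, 5, 6, 8, 3]})
Df2 = pd.DataFrame({"Batch": list("CCDD"), "Result": [8, 3, 6, 1]})

# Keep only the Df2 rows whose batch does not occur anywhere in Df1.
new_rows = Df2[~Df2["Batch"].isin(Df1["Batch"])]

# Original index labels are kept, as in the question's expected output.
combined = pd.concat([Df1, new_rows])

# Alternative: renumber the rows 0..n-1.
renumbered = pd.concat([Df1, new_rows], ignore_index=True)

print(combined)
```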

create a new column with a value from rows

I have a data table as below
I want to put "a" to a column like below:

Delete row:
df = df.drop(0, axis=0)
Add column :
df.insert(0,"",["a","","",""])
First, you need to learn how to use pandas.

df1 = pd.DataFrame({'a':[1,2,3,4],'b':[1,2,3,4],'c':[1,2,3,4]})
print(df1)
df2 = df1.drop([0])
df2 = df2.reset_index(drop=True)
print(df2)
df3 = pd.DataFrame({'': [1,' ', ' ']})
df4 = pd.concat([df3,df2],axis = 1)
print(df4)
   a  b  c
0  1  1  1
1  2  2  2
2  3  3  3
3  4  4  4
   a  b  c
0  2  2  2
1  3  3  3
2  4  4  4
      a  b  c
0  1  2  2  2
1     3  3  3
2     4  4  4
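The two answers above can be combined into one short sketch. The question's tables did not survive, so the intent is assumed here: drop the first row, renumber the rest, then put a new first column holding "a" in its first row. The column name label and the variable trimmed are hypothetical (the first answer uses an empty column name).

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 2, 3, 4], "c": [1, 2, 3, 4]})

# Drop the row labelled 0 and renumber the remaining rows from 0.
trimmed = df.drop(index=0).reset_index(drop=True)

# Insert a new first column: "a" in the first row, empty strings below.
# "label" is a hypothetical column name chosen for this sketch.
trimmed.insert(0, "label", ["a"] + [""] * (len(trimmed) - 1))

print(trimmed)
```

Building the value list from len(trimmed) avoids a length mismatch, which DataFrame.insert would reject.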

Merge of 4 dataframes within groups

I have 4 dataframes:
df1
   a  b
1  0  3
2  1  4
df2
   a  b
1  0  5
2  0  6
3  1  7
df3
   a  b
1  0  2
2  1  6
3  1  5
...
Within groups of 'a', I want to merge all 4 dataframes on a and keep every value of b by putting each frame's values in a separate column. The merge of df1 and df2 should look like:
   a  b1  b2
1  0   3   5
2  0   3   6
3  1   4   7
Merge of df1, df2, df3:
   a  b1  b2  b3
1  0   3   5   2
2  0   3   6   2
3  1   4   7   6
4  1   4   7   5
I tried:
df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy', how='outer').drop('dummy', axis=1)
but this is ignoring the groups and 'a' disappears.

This is not a Cartesian product, but a simple merge across multiple dataframes.
Try this:
In [846]: df1.merge(df2, on='a').merge(df3, on='a').rename(columns={'b_x':'b1', 'b_y':'b2', 'b':'b3'})
Out[846]:
   a  b1  b2  b3
0  0   3   5   2
1  0   3   6   2
2  1   4   7   6
3  1   4   7   5
Or, if the number of dataframes can grow, you can do this:
In [851]: from functools import reduce
In [852]: reduce(lambda x,y: pd.merge(x,y, on='a'), [df1, df2, df3])
Out[852]:
   a  b_x  b_y  b
0  0    3    5  2
1  0    3    6  2
2  1    4    7  6
3  1    4    7  5

Modify the b column name before merging, then use reduce to expand to an arbitrary number of dataframes.
from functools import reduce
dfs = [df.rename(columns={'b':f'b{num+1}'}) for num, df in enumerate([df1, df2, df3])]
reduce(lambda x,y: pd.merge(x,y), dfs)
Note that by default pd.merge joins on the columns shared by both frames, which here is only a.
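To show how the rename-then-reduce idea scales, here is a minimal sketch with four frames. The question elides its fourth dataframe, so df4 below is invented purely for illustration. Renaming first also sidesteps the default _x/_y suffixes, which can collide once more than three frames share the column name b.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"a": [0, 1], "b": [3, 4]})
df2 = pd.DataFrame({"a": [0, 0, 1], "b": [5, 6, 7]})
df3 = pd.DataFrame({"a": [0, 1, 1], "b": [2, 6, 5]})
df4 = pd.DataFrame({"a": [0, 1], "b": [9, 8]})  # hypothetical fourth frame

# Give each frame's b column a unique name (b1, b2, ...) before merging.
frames = [
    df.rename(columns={"b": f"b{i}"})
    for i, df in enumerate([df1, df2, df3, df4], start=1)
]

# Fold the list into one frame, merging pairwise on the group column a.
merged = reduce(lambda left, right: pd.merge(left, right, on="a"), frames)

print(merged)
```

Passing on="a" explicitly is optional here, since a is the only shared column after the rename, but it guards against accidental joins on other shared names.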

Use pd.DataFrame.join:
First set the index of each dataframe to 'a'. A list comprehension can do this in place:
[i.set_index('a', inplace=True) for i in [df1, df2, df3]]
Next, use join:
df1.join([df2, df3])
Output:
   a  b_x  b_y  b
0  0    3    5  2
1  0    3    6  2
2  1    4    7  6
3  1    4    7  5

Combine two distinct dataframes to show all possible iterations

I'm looking to combine dataframes df1 and df2 to get df3 in Python, most preferably in a one-liner (that is, no "for all x in df1.LETS...").
I'm at a current loss for words to use with my Google-fu, so here I am at StackExchange, hoping another programmer can help fill in my mental blank with this predicament.
Thank you!
df1       df2       df3
LETS      NUMS      LETS  NUMS
A         1         A     1
B         2         A     2
          3         A     3
          4         A     4
                    B     1
                    B     2
                    B     3
                    B     4

You can use:
df1 = pd.DataFrame({'LETS':list('AB')})
df2 = pd.DataFrame({'NUMS':range(1,5)})
Cross join with merge: assign a helper column A holding the same constant to both frames, merge on it, then drop it:
df = pd.merge(df1.assign(A=1), df2.assign(A=1), on='A').drop('A', axis=1)
print (df)
  LETS  NUMS
0    A     1
1    A     2
2    A     3
3    A     4
4    B     1
5    B     2
6    B     3
7    B     4
Another solution with MultiIndex.from_product and new function in pandas 0.20.1 - MultiIndex.to_frame
df = pd.MultiIndex.from_product([df1['LETS'], df2['NUMS']]).to_frame()
df.columns = ['LETS','NUMS']
print (df)
    LETS  NUMS
A 1    A     1
  2    A     2
  3    A     3
  4    A     4
B 1    B     1
  2    B     2
  3    B     3
  4    B     4
print (df.reset_index(drop=True))
  LETS  NUMS
0    A     1
1    A     2
2    A     3
3    A     4
4    B     1
5    B     2
6    B     3
7    B     4

pd.DataFrame(index=pd.MultiIndex.from_product([df1.LETS, df2.NUMS],
names=("LETS", "NUMS"))).reset_index()
#  LETS  NUMS
#0    A     1
#1    A     2
#2    A     3
#3    A     4
#4    B     1
#5    B     2
#6    B     3
#7    B     4
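The answers above predate built-in cross joins. In pandas 1.2 and later, merge accepts how="cross", which removes the need for a helper column. A minimal sketch on the same data, assuming a recent enough pandas version:

```python
import pandas as pd

df1 = pd.DataFrame({"LETS": list("AB")})
df2 = pd.DataFrame({"NUMS": range(1, 5)})

# Every row of df1 is paired with every row of df2; no key column is needed.
df3 = df1.merge(df2, how="cross")

print(df3)
```

This stays a one-liner, as the question asked, and the result already has a plain 0..7 index.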

Rearrange dataframe structure

I get a dataframe
df
   A  B
0  1  4
1  2  5
2  3  6
For further processing, it would be more convenient to have the df restructured
as follows:
  letters  numbers
0       A        1
1       A        2
2       A        3
3       B        4
4       B        5
5       B        6
How can I achieve that?

Use unstack with reset_index :
df = df.unstack().reset_index(level=1, drop=True).reset_index()
df.columns = ['letters','numbers']
print (df)
  letters  numbers
0       A        1
1       A        2
2       A        3
3       B        4
4       B        5
5       B        6
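Or numpy.concatenate + numpy.repeat + DataFrame. The values must be transposed first so they are read column by column; without the transpose they come out row by row (1, 4, 2, 5, 3, 6) and no longer line up with the repeated column names:
a = np.concatenate(df.values.T)
b = np.repeat(df.columns,len(df.index))
df = pd.DataFrame({'letters':b, 'numbers':a})
print (df)
  letters  numbers
0       A        1
1       A        2
2       A        3
3       B        4
4       B        5
5       B        6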

Probably simplest to melt:
In [36]: pd.melt(df, var_name="letters", value_name="numbers")
Out[36]:
  letters  numbers
0       A        1
1       A        2
2       A        3
3       B        4
4       B        5
5       B        6
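For completeness, a self-contained sketch of the melt approach with the question's data, using the method form DataFrame.melt. The name long_df is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# melt stacks the columns one after another: all of A's values, then B's.
# var_name labels the column of former column names,
# value_name labels the column of values.
long_df = df.melt(var_name="letters", value_name="numbers")

print(long_df)
```

Because melt walks through the frame column by column, each value stays paired with the column it came from.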
