This question already has an answer here:
What are the 'levels', 'keys', and names arguments for in Pandas' concat function?
(1 answer)
Closed 4 years ago.
I have two dataframes with the same index but different columns. How do I combine them into one with the same index but containing all the columns?
I have:
A
1 10
2 11
B
1 20
2 21
and I need the following output:
A B
1 10 20
2 11 21
pandas.concat([df1, df2], axis=1)
You've got a few options depending on how complex the dataframe is:
Option 1:
df1.join(df2, how='outer')
Option 2:
pd.merge(df1, df2, left_index=True, right_index=True, how='outer')
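To compare the approaches above side by side, here is a minimal sketch. The frames are rebuilt from the tables in the question, so the index values 1 and 2 and the variable names `via_concat`, `via_join`, and `via_merge` are assumptions made for illustration.

```python
import pandas as pd

# Assumed reconstruction of the question's data: same index, different columns.
df1 = pd.DataFrame({"A": [10, 11]}, index=[1, 2])
df2 = pd.DataFrame({"B": [20, 21]}, index=[1, 2])

# Three ways to combine the frames column-wise, all aligning on the index.
via_concat = pd.concat([df1, df2], axis=1)
via_join = df1.join(df2, how="outer")
via_merge = pd.merge(df1, df2, left_index=True, right_index=True, how="outer")

print(via_concat)
```

For frames that share an identical index, as here, all three calls should give the same result. They differ mainly in how much control they offer when the indexes only partly overlap.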
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 1 year ago.
I have two data frames:
df1 :
ID COUNT
0 202485 6
1 215893 8
2 181840 8
3 168337 7
and another dataframe
df2:
ID
0 202485
1 215893
2 181840
I want to filter (left join) the two dataframes.
The desired result is:
ID COUNT
0 202485 6
1 215893 8
2 181840 8
I tried df1.merge(df2, how='inner', on='ID'), but got an error like "You are trying to merge on object and int64 columns".
I also tried isin, but it didn't work:
list=df1['ID'].drop_duplicates().to_list()
df1[df1['ID'].isin(list)]
Any help?
import pandas as pd
df1 = pd.DataFrame({'ID':[202485,215893,181840,168337],'COUNT':[6,8,8,7]})
df2 = pd.DataFrame({"ID":[202485,215893,181840]})
out_df = pd.merge(df1,df2)
print(out_df)
This gives the desired result
ID COUNT
0 202485 6
1 215893 8
2 181840 8
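The error quoted in the question suggests that the ID columns have different dtypes in the asker's real data. The sketch below is only a guess at that situation: it assumes df2's IDs were loaded as strings, and the names `df2_fixed`, `merged`, and `filtered` are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [202485, 215893, 181840, 168337], "COUNT": [6, 8, 8, 7]})
# Assumption: the IDs in df2 arrived as strings (object dtype), which
# would trigger the "merge on object and int64 columns" error.
df2 = pd.DataFrame({"ID": ["202485", "215893", "181840"]})

# Cast the key so both sides share the same dtype before merging.
df2_fixed = df2.astype({"ID": "int64"})
merged = df1.merge(df2_fixed, on="ID", how="inner")

# The isin approach also works, provided it checks against df2's IDs
# rather than df1's own IDs.
filtered = df1[df1["ID"].isin(df2_fixed["ID"])]

print(merged)
```

Checking `df1.dtypes` and `df2.dtypes` first would show which side actually needs the cast.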
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
We have the following two dataframes.
Dataframe 1:
id =[30,30]
month =[1,3]
less_data =['pravin','shashi']
df = pd.DataFrame(list(zip(id,month,less_data)),columns =['id','month','less_data'])
Dataframe 2:
id =[30,30]
month =[1,2]
less_data =['amol','pinak']
df2 = pd.DataFrame(list(zip(id,month,less_data)),columns =['id','month','zero_data'])
and expected output:
id month less_data zero_data
30 1 pravin amol
30 2 pinak
30 3 shashi
How can I use pd.concat to achieve this? Or is there a better solution?
Use pd.concat:
dfn = pd.concat([
    df.set_index(['id','month']),
    df2.set_index(['id','month'])
], axis = 1).reset_index()
You can do an outer join:
df.join(df2.set_index(['id', 'month']), how='outer', on=['id', 'month'])
You can use pd.merge on month and id:
import pandas as pd
id =[30,30]
month =[1,3]
less_data =['pravin','shashi']
df = pd.DataFrame(list(zip(id,month,less_data)),columns =['id','month','less_data'])
id =[30,30]
month =[1,2]
less_data =['amol','pinak']
df2 = pd.DataFrame(list(zip(id,month,less_data)),columns =['id','month','zero_data'])
# merge works like a join in SQL
df_merge = pd.merge(df, df2, on=['id', 'month'], how='outer')
print(df_merge)
id month less_data zero_data
0 30 1 pravin amol
1 30 3 shashi NaN
2 30 2 NaN pinak
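The row order of an outer merge can differ between pandas versions, so if the result has to match the expected output in the question, sorting explicitly is the safer route. The following is a sketch rather than part of the original answer; the `indicator=True` flag is added only to show which frame each row came from.

```python
import pandas as pd

df = pd.DataFrame({"id": [30, 30], "month": [1, 3], "less_data": ["pravin", "shashi"]})
df2 = pd.DataFrame({"id": [30, 30], "month": [1, 2], "zero_data": ["amol", "pinak"]})

merged = (
    pd.merge(df, df2, on=["id", "month"], how="outer", indicator=True)
    .sort_values(["id", "month"])   # order rows by the join keys
    .reset_index(drop=True)         # renumber the rows after sorting
)

print(merged)
```

The `_merge` column added by `indicator=True` holds `both`, `left_only`, or `right_only`, and can be dropped once it is no longer needed.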
This question already has answers here:
Concatenate rows of two dataframes in pandas
(3 answers)
Closed 5 years ago.
I have two Pandas DataFrames, each with different columns. I want to basically glue them together horizontally (they each have the same number of rows so this shouldn't be an issue).
There must be a simple way of doing this but I've gone through the docs and concat isn't what I'm looking for (I don't think).
Any ideas?
Thanks!
concat is indeed what you're looking for; you just have to pass axis=1 instead of the default axis=0. Code sample below:
import pandas as pd
df1 = pd.DataFrame({
'A': [1,2,3,4,5],
'B': [1,2,3,4,5]
})
df2 = pd.DataFrame({
'C': [1,2,3,4,5],
'D': [1,2,3,4,5]
})
df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)
With the result being:
A B C D
0 1 1 1 1
1 2 2 2 2
2 3 3 3 3
3 4 4 4 4
4 5 5 5 5
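One caveat worth illustrating: concat with axis=1 aligns rows by index label, not by position. The sketch below uses a hypothetical second frame whose index does not match the first, and shows that resetting both indexes restores a purely positional "glue".

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3]}, index=[0, 1, 2])
# Hypothetical frame with a non-matching index, e.g. left over from a filter.
df2 = pd.DataFrame({"C": [4, 5, 6]}, index=[10, 11, 12])

# Aligning on the index yields the union of labels, padded with NaN.
aligned = pd.concat([df1, df2], axis=1)

# Dropping the old indexes first glues the frames together by row position.
glued = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)],
    axis=1,
)

print(aligned.shape, glued.shape)
```

When both frames already carry the default 0..n-1 index, as in the sample above, the reset is unnecessary.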
This question already has answers here:
Merge two dataframes by index
(7 answers)
pandas: merge (join) two data frames on multiple columns
(6 answers)
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I have two dataframes, df and df1. The first one contains the information for every possible combination in a dataset, while the second one is just a subset without that information.
df
x y distance
0 1 4
0 2 3
0 3 2
1 2 2
1 3 5
2 3 1
df1
x y
1 3
2 3
2 3
I would like to merge df and df1 in order to have the following:
df1
x y distance
1 3 5
2 3 1
2 3 1
You can use the merge command
df.merge(df1, left_on=['x','y'], right_on=['x','y'], how='right')
Here you're merging df on the left with df1 on the right, using the columns x and y as the merge keys and keeping only the rows that are present in the right dataframe.
You can read more about merging and joining dataframes here.
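An equivalent way to write this, sketched here as an alternative rather than as the answer's own method, is to start from df1 and do a left merge. The frames are rebuilt from the tables in the question, and `validate="many_to_one"` is an optional extra check that each (x, y) pair appears only once in df.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [0, 0, 0, 1, 1, 2],
    "y": [1, 2, 3, 2, 3, 3],
    "distance": [4, 3, 2, 2, 5, 1],
})
df1 = pd.DataFrame({"x": [1, 2, 2], "y": [3, 3, 3]})

# Keep every row of df1 (duplicates included) and look up its distance in df.
result = df1.merge(df, on=["x", "y"], how="left", validate="many_to_one")

print(result)
```

Starting from df1 keeps its row order and its duplicate rows, which is what the desired output in the question shows.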
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
How can I merge two pandas DataFrames on two columns with different names and keep one of the columns?
df1 = pd.DataFrame({'UserName': [1,2,3], 'Col1':['a','b','c']})
df2 = pd.DataFrame({'UserID': [1,2,3], 'Col2':['d','e','f']})
pd.merge(df1, df2, left_on='UserName', right_on='UserID')
This provides a DataFrame like this
But clearly I am merging on UserName and UserID, so they are the same. I want it to look like this. Is there a clean way to do this?
The only ways I can think of are either renaming the columns to be the same before the merge, or dropping one of them after the merge. It would be nice if pandas automatically dropped one of them, or if I could do something like
pd.merge(df1, df2, left_on='UserName', right_on='UserID', keep_column='left')
How about setting UserID as the index of the second data frame and then joining on that index?
pd.merge(df1, df2.set_index('UserID'), left_on='UserName', right_index=True)
# Col1 UserName Col2
# 0 a 1 d
# 1 b 2 e
# 2 c 3 f
There is no really nice way to do this: merge keeps both key columns because, in the general case (left, right, or outer joins), the two columns can carry different information. Don't try to over-engineer your merge line; be explicit, as you suggest.
Solution 1:
df2 = df2.rename(columns={'UserID': 'UserName'})
pd.merge(df1, df2,on='UserName')
Out[67]:
Col1 UserName Col2
0 a 1 d
1 b 2 e
2 c 3 f
Solution 2:
pd.merge(df1, df2, left_on='UserName', right_on='UserID').drop('UserID', axis=1)
Out[71]:
Col1 UserName Col2
0 a 1 d
1 b 2 e
2 c 3 f
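If this pattern comes up often, the rename-then-merge idea can be wrapped in a small function. The helper below, `merge_keep_left_key`, is hypothetical and not part of pandas; it is a sketch that assumes a single key column on each side.

```python
import pandas as pd

df1 = pd.DataFrame({"UserName": [1, 2, 3], "Col1": ["a", "b", "c"]})
df2 = pd.DataFrame({"UserID": [1, 2, 3], "Col2": ["d", "e", "f"]})


def merge_keep_left_key(left, right, left_key, right_key, **kwargs):
    """Hypothetical helper: merge on differently named keys, keeping the left name."""
    # rename returns a copy, so the caller's right frame is left untouched.
    renamed = right.rename(columns={right_key: left_key})
    return left.merge(renamed, on=left_key, **kwargs)


out = merge_keep_left_key(df1, df2, "UserName", "UserID")
print(out)
```

Extra keyword arguments such as `how="left"` are passed straight through to `DataFrame.merge`.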