Want to merge two dataframes column-wise - python

I have two dataframes, df and df2, and I want to merge them so that I get the result shown below.
I tried pd.concat, but it didn't work. With pd.merge I get this error:
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
df
df2
Expected Output:

It's most likely because the columns that you want to merge on do not have the same dtype. In pd.merge, specify which columns you want to join on, and use df.dtypes and df2.dtypes to check whether those columns have the same dtype.
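As a rough illustration of that check, the sketch below uses made-up data (the column names id, a and b are assumptions, since the original tables are not shown). It prints the dtypes of both frames, casts the key column to a common dtype, and then merges on it:

```python
import pandas as pd

# Hypothetical data: the key is stored as strings in df and as floats in df2,
# which is the kind of mismatch that makes pd.merge raise the ValueError.
df = pd.DataFrame({"id": ["1", "2", "3"], "a": [10, 20, 30]})
df2 = pd.DataFrame({"id": [1.0, 2.0, 4.0], "b": ["x", "y", "z"]})

# Inspect the dtypes of the join column in both frames.
print(df.dtypes)
print(df2.dtypes)

# Cast both key columns to the same dtype before merging.
df["id"] = pd.to_numeric(df["id"]).astype("int64")
df2["id"] = df2["id"].astype("int64")

# Name the join column explicitly instead of relying on the default.
merged = pd.merge(df, df2, on="id", how="inner")
print(merged)
```

Which dtype to cast to depends on the real data; if the key contains missing values or non-numeric strings, casting to str on both sides may be the safer choice.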

Related

Concatenating two dataframes with no common columns but same row dimension

I have two dataframes df1 (dimension: 2x3) and df2 (dimension: 2x239) taken for example - each having the same number of rows but a different number of columns.
I need to concatenate them to get a new dataframe df3 (dimension 2x242).
I used the concat function but ended up getting a (4x242) which has NaN values.
Need help.
Screenshot attached.
You need to set the axis=1 parameter
pd.concat([df1.reset_index(), df2.reset_index()], axis=1)
You need to concat along the columns. From your picture, the index of df2 is duplicated, so you need to reset it to a normal one with .reset_index(drop=True):
out = pd.concat([df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1)
If you want to add these new columns to the same rows, try the code below:
pd.concat([df1,df2],axis=1)
The output shape will be (2, total_cols), provided both dataframes have the same index.
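The answers above can be compared side by side with a small sketch. The frames here are hypothetical stand-ins (2x3 and 2x2 rather than 2x3 and 2x239), and df2 is given different index labels to reproduce the NaN-padded result from the question:

```python
import pandas as pd

# Hypothetical stand-ins for the frames in the question.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})            # index 0, 1
df2 = pd.DataFrame({"x": [7, 8], "y": [9, 10]}, index=[2, 3])          # index 2, 3

# concat with axis=1 aligns rows on index labels, so labels that do not
# match produce extra rows filled with NaN.
misaligned = pd.concat([df1, df2], axis=1)
print(misaligned.shape)

# Resetting both indexes (dropping the old ones) lines the rows up by position.
aligned = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)
print(aligned.shape)
```

Without drop=True, reset_index keeps the old index as an extra column in each frame, which is usually not wanted here.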

compare two dataframes using three columns

I have two dataframes: df_users, like below
and df1, like below
I need to create a third dataframe called df2 that contains the corresponding usernames from the USER_NAME column in df_users, matched on three columns: 'InterfaceDesc', 'TESVLAN' and 'CVLAN'.
I tried to use merge, concat and the datacompy Compare functions, but all of them failed with different errors. Please help.
If you want to merge the two DataFrames only where the columns "InterfaceDesc", "TESVLAN" and "CVLAN" are all equal, you need to merge them on multiple columns, and it should work:
df2 = pd.merge(df_users, df1, on=["InterfaceDesc", "TESVLAN", "CVLAN"])
If you want df2 to have only these 4 columns:
df2 = df2[["USER_NAME", "InterfaceDesc", "TESVLAN", "CVLAN"]]
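A minimal sketch of this multi-column merge, using invented rows (the user names and VLAN values are assumptions, since the original tables are not shown):

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df_users = pd.DataFrame({
    "USER_NAME": ["alice", "bob", "carol"],
    "InterfaceDesc": ["eth0", "eth0", "eth1"],
    "TESVLAN": [100, 100, 200],
    "CVLAN": [10, 20, 10],
})
df1 = pd.DataFrame({
    "InterfaceDesc": ["eth0", "eth1", "eth1"],
    "TESVLAN": [100, 200, 200],
    "CVLAN": [20, 10, 99],
})

keys = ["InterfaceDesc", "TESVLAN", "CVLAN"]

# Inner merge: a row survives only if all three key columns match.
df2 = pd.merge(df_users, df1, on=keys, how="inner")

# Keep just the user name and the three key columns.
df2 = df2[["USER_NAME"] + keys]
print(df2)
```

If the merge raises a dtype error, it is worth checking that each of the three key columns has the same dtype in both frames.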

Join column in dataframe to another dataframe - Pandas

I have 2 dataframes. One has a bunch of columns including f_uuid. The other dataframe has 2 columns, f_uuid and i_uuid.
The first dataframe may contain some f_uuids that the second dataframe doesn't, and vice versa.
I want the first dataframe to have a new column i_uuid (from the second dataframe) populated with the appropriate values for the matching f_uuid in that first dataframe.
How would I achieve this?
df1 = pd.merge(df1,
               df2,
               on='f_uuid')
If you want to keep all f_uuid values from df1 (including those not available in df2), you can run
df1 = pd.merge(df1,
               df2,
               on='f_uuid',
               how='left')
I think what you're looking for is a merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge
In your case, that would look like :
bunch_of_col_df.merge(other_df, on="f_uuid")
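To make the difference between the default inner merge and how='left' concrete, here is a small sketch with invented values (the value column and the uuid strings are assumptions). The validate argument is optional; it is included on the assumption that each f_uuid appears at most once in the second frame:

```python
import pandas as pd

# Hypothetical frames; only the names f_uuid and i_uuid come from the question.
df1 = pd.DataFrame({"f_uuid": ["f1", "f2", "f3"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"f_uuid": ["f1", "f3", "f9"], "i_uuid": ["i1", "i3", "i9"]})

# Left merge: every row of df1 is kept; i_uuid is NaN where df2 has no match.
# validate="many_to_one" raises if f_uuid is duplicated in df2, which would
# otherwise silently multiply rows.
result = df1.merge(df2, on="f_uuid", how="left", validate="many_to_one")
print(result)
```

Rows of df2 whose f_uuid does not appear in df1 (f9 here) are simply ignored by a left merge.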

How to inner join in pandas as SQL , Stuck in a problem below

I have two df named "df" and second as "topwud".
df
topwud
When I join these two dataframes with an inner join, using BOMCPNO and PRTNO as the join columns,
like
second_level=pd.merge(df,top_wud ,left_on='BOMCPNO', right_on='PRTNO', how='inner').drop_duplicates()
Then I got this data frame
Result
I don't want the common column to appear as PRTNO_x and PRTNO_y. I want to keep only PRTNO_x in my result dataframe, under its default name "PRTNO".
Kindly help me :)
Try this:
pd.merge(df, top_wud, on=['BOMCPNO', 'PRTNO'])
Note that this returns only the rows where BOMCPNO and PRTNO exist in both dataframes, because the default merge type is an inner merge.
So you could compare the size of this merged dataframe with your first one. If they are the same, you can merge on both columns; otherwise just drop or rename the columns with the _x/_y suffixes.
I would spend some time determining whether these values are indeed the same and exist in both dataframes. If they are not, you may wish to perform an outer merge instead:
pd.merge(df, top_wud, on=['BOMCPNO', 'PRTNO'], how='outer')
You could then drop duplicate rows (and possibly any NaN rows), and that should give you a clean merged dataframe.
merged_df.drop_duplicates(subset=['BOMCPNO', 'PRTNO'], inplace=True)
Also try the other join types (for example how='left'), as I don't know exactly what you want.
Check whether this solves your problem.
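If the join really has to stay left_on='BOMCPNO', right_on='PRTNO' as in the question, one possible way to avoid the PRTNO_x/PRTNO_y pair is to give only the right-hand column a suffix and then drop it. This is a sketch with made-up rows; the WUD column is a hypothetical placeholder for whatever else top_wud contains:

```python
import pandas as pd

# Hypothetical rows; BOMCPNO and PRTNO are the column names from the question.
df = pd.DataFrame({"PRTNO": ["P1", "P2"], "BOMCPNO": ["C1", "C2"]})
top_wud = pd.DataFrame({"PRTNO": ["C1", "C3"], "WUD": [5, 7]})

second_level = (
    pd.merge(
        df,
        top_wud,
        left_on="BOMCPNO",
        right_on="PRTNO",
        how="inner",
        # Left columns keep their names; clashing right columns get "_right".
        suffixes=("", "_right"),
    )
    # The right-hand key duplicates BOMCPNO after the join, so drop it.
    .drop(columns="PRTNO_right")
    .drop_duplicates()
)
print(second_level)
```

An alternative with the same effect is to rename PRTNO in top_wud before merging, so that no name clash occurs in the first place.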

Merging data frames, highlighting the problematic column

I'm trying to merge two data frames with the aim of finding the value that causes the merging error. Most of the columns are not common across both data frames.
The following highlights what rows have a "NaN" value, how can I then find what column caused the merging issue? Thanks
df3 = pd.merge(df1, df2, how='outer')
df4 = (df3[df3.isnull().any(axis=1)])
It is difficult to tell from the question, but the call shown is equivalent to pd.merge(df1, df2, on=None, how='outer').
If on is None and not merging on indexes, then this defaults to the intersection of the columns in both DataFrames.
This means that the columns shared by both DataFrames must have matching dtypes. If they do not, an error will occur indicating a type issue:
ValueError: You are trying to merge on int64 and object columns. If you wish to proceed you should use pd.concat
Presupposing there is a conflict of type interfering with the outer join, the difference between the types of the intersecting columns should be examined.
dtypes_diff = pd.concat([df1.dtypes,df2.dtypes]).drop_duplicates(keep=False)
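Note that drop_duplicates on a Series compares values only, not column names, so the line above can miss a mismatch when the same dtype also appears in another column. A sketch that compares dtypes column by column, restricted to the shared columns, might look like this (the data and the column names key, a and b are assumptions):

```python
import pandas as pd

# Hypothetical frames: the shared column "key" is int64 in df1 and text in df2.
df1 = pd.DataFrame({"key": [1, 2], "a": [0.1, 0.2]})
df2 = pd.DataFrame({"key": ["1", "2"], "b": ["x", "y"]})

# Columns that pd.merge would join on when on=None.
common = df1.columns.intersection(df2.columns)

# Put the dtypes of the shared columns side by side.
dtypes = pd.DataFrame({"df1": df1.dtypes[common], "df2": df2.dtypes[common]})

# Compare by dtype name and keep only the columns that disagree.
mismatched = dtypes[dtypes["df1"].astype(str) != dtypes["df2"].astype(str)]
print(mismatched)

# After fixing the dtype, indicator=True adds a _merge column that shows
# whether each row came from the left frame, the right frame, or both.
df2["key"] = df2["key"].astype("int64")
df3 = pd.merge(df1, df2, how="outer", indicator=True)
print(df3)
```

Rows marked left_only or right_only in _merge are the ones that end up with NaN values after the outer join.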
