compare two dataframes using three columns - python

I have two dataframes: df_users (shown below)
and df1 (shown below).
I need to create a third dataframe called df2 that contains the corresponding usernames from the USER_NAME column in df_users, filtering on three columns: 'InterfaceDesc', 'TESVLAN', 'CVLAN'.
I tried to use merge, concat, and datacompy's Compare, but each failed with a different error. Please help.

If you want to merge the two DataFrames only where the columns "InterfaceDesc", "TESVLAN", and "CVLAN" match, merge them on multiple columns:
df2 = pd.merge(df_users, df1, on=["InterfaceDesc", "TESVLAN", "CVLAN"])
If you want df2 to have only these 4 columns:
df2 = df2[["USER_NAME", "InterfaceDesc", "TESVLAN", "CVLAN"]]
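A minimal sketch with made-up data (the user names and VLAN numbers are assumptions; only the column names come from the question):
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_users = pd.DataFrame({
    "USER_NAME": ["alice", "bob"],
    "InterfaceDesc": ["Gi0/1", "Gi0/2"],
    "TESVLAN": [100, 200],
    "CVLAN": [10, 20],
})
df1 = pd.DataFrame({
    "InterfaceDesc": ["Gi0/1"],
    "TESVLAN": [100],
    "CVLAN": [10],
})

# Keep only rows where all three key columns match in both frames.
df2 = pd.merge(df_users, df1, on=["InterfaceDesc", "TESVLAN", "CVLAN"])
df2 = df2[["USER_NAME", "InterfaceDesc", "TESVLAN", "CVLAN"]]
print(df2)  # one row: alice, Gi0/1, 100, 10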

Related

Want to merge two dataframes column wise

I have two dataframes, df and df2, and I want to merge them so that I get the result shown below.
I have tried pd.concat, but it didn't work out. When using pd.merge I get this error:
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
(df, df2, and the expected output were shown as tables in the original question.)
It's most likely because the columns you want to merge on are not the same dtype. In pd.merge, specify which columns you want to join on, and check df.dtypes and df2.dtypes to see whether those join columns have the same dtypes.
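A sketch of that check and fix; the column name "key" and the cast to float are assumptions made for illustration:
import pandas as pd

# Hypothetical frames where the join key has mismatched dtypes (object vs. float64).
df = pd.DataFrame({"key": ["1", "2"], "a": [10, 20]})
df2 = pd.DataFrame({"key": [1.0, 2.0], "b": ["x", "y"]})

print(df.dtypes)   # key: object
print(df2.dtypes)  # key: float64

# Cast both keys to a common dtype before merging.
df["key"] = df["key"].astype(float)
merged = pd.merge(df, df2, on="key")
print(merged)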

Join column in dataframe to another dataframe - Pandas

I have 2 dataframes. One has a bunch of columns including f_uuid. The other dataframe has 2 columns, f_uuid and i_uuid.
The first dataframe may contain some f_uuids that the second doesn't, and vice versa.
I want the first dataframe to have a new column i_uuid (from the second dataframe) populated with the appropriate values for the matching f_uuid in that first dataframe.
How would I achieve this?
df1 = pd.merge(df1,
               df2,
               on='f_uuid')
If you want to keep all f_uuid from df1 (e.g. those not available in df2), you may run
df1 = pd.merge(df1,
               df2,
               on='f_uuid',
               how='left')
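A small sketch of the left join behaviour, with made-up uuid values:
import pandas as pd

# Made-up data to illustrate the left join.
df1 = pd.DataFrame({"f_uuid": ["a", "b", "c"], "other": [1, 2, 3]})
df2 = pd.DataFrame({"f_uuid": ["a", "b"], "i_uuid": ["x", "y"]})

out = pd.merge(df1, df2, on="f_uuid", how="left")
print(out)  # row "c" keeps its data and gets NaN in i_uuid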
I think what you're looking for is a merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge
In your case, that would look like:
bunch_of_col_df.merge(other_df, on="f_uuid")

Finding non-matching rows between two dataframes

I have a scenario where I want to find non-matching rows between two dataframes. Both dataframes will have around 30 columns and an id column that uniquely identifies each record/row. So, I want to check if a row in df1 is different from the one in df2. df1 is an updated dataframe and df2 is the previous version.
I have tried pd.concat([df1, df2]).drop_duplicates(keep=False), but it just combines both dataframes. Is there a way to do this? I would really appreciate the help.
The sample data looks like this for both dfs.
id user_id type status
There will be 39 columns in total, which may contain NULL values.
Thanks.
P.S. df2 will always be a subset of df1.
If your df1 and df2 have the same shape, you can easily compare them element-wise with this code (np is numpy, imported as import numpy as np):
df3 = pd.DataFrame(np.where(df1 == df2, True, False), columns=df1.columns)
You will see the boolean output False for every cell value that does not match.
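To turn that boolean frame into the actual non-matching rows, a minimal sketch (assuming df1 and df2 share the same index and columns) keeps the rows where any cell differs:
# Rows of df1 where at least one cell differs from df2 (assumes aligned index and columns).
mismatched = df1[~(df1 == df2).all(axis=1)]
Note that NaN never compares equal to NaN, so rows containing NULL values will always show up as non-matching here.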

Is it possible to find common values in two dataframes using Python?

I have a dataframe df1 that is the list of e-mails of people that downloaded a certain e-book, and another dataframe df2 that is the e-mails of people that downloaded a different e-book.
I want to find the people that downloaded both e-books, or the common values between df1 and df2, using Python.
Is it possible to do that? How?
This was already discussed. See the question linked below:
Find the common values in columns in Pandas dataframe
Assuming the two data frames are df1 and df2, each with an email column, you can do the following:
intersected_df = pd.merge(df1, df2, how='inner')
This data frame will have the values corresponding to emails found in both df1 and df2.
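Note that pd.merge without on joins on every column the two frames share; if they have other columns in common besides the e-mail one, it is safer to name the key explicitly (the column name 'email' here is an assumption):
intersected_df = pd.merge(df1, df2, on='email', how='inner')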
Dump the emails from df1 into a set, in order to avoid duplicates.
Dump the emails from df2 into a set, for the same reason.
Find the intersection of these two sets, as such:
set1 = set(df1.Emails)
set2 = set(df2.Emails)
common = set1.intersection(set2)
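If you then want the matching rows rather than just the addresses, the set can be fed back into a filter (using the same Emails column name as above):
both_books = df1[df1.Emails.isin(common)]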
I believe you should merge the two dataframes:
merged = pd.merge(df1, df2, how='inner', on=['e-mails'])
and then drop the NaN values:
merged.dropna(inplace=True)

Find where three separate DataFrames overlap and create a new DataFrame

I have three separate DataFrames. Each DataFrame has the same columns - ['Email', 'Rating']. There are duplicate row values across all three DataFrames for the column Email. I'm trying to find the emails that appear in all three DataFrames and then create a new DataFrame from those rows. So far I have all three DataFrames saved to a list like this: dfs = [df1, df2, df3], and have concatenated them using df = pd.concat(dfs). I tried using groupby from here, but to no avail. Any help would be greatly appreciated.
You want to do a merge. Similar to a join in SQL, you can do an inner merge and treat the email like a foreign key. Here are the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
It would look something like this:
in_common = pd.merge(df1, df2, on=['Email'], how='inner')
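Since there are three DataFrames here, the same inner merge can be chained so that only emails present in all three survive (a sketch; pandas will suffix the duplicate Rating columns automatically):
# Chain two inner merges to keep only emails present in all three frames.
in_all = pd.merge(pd.merge(df1, df2, on=['Email'], how='inner'),
                  df3, on=['Email'], how='inner')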
You could try using .isin from pandas, e.g.:
df1[df1['Email'].isin(df2['Email'])]
This retrieves the rows of df1 whose Email value also appears in df2.
Another idea is to try an inner merge.
Good luck, and post code next time.
