Is it possible to find common values in two dataframes using Python?

I have a dataframe df1 that is the list of e-mails of people that downloaded a certain e-book, and another dataframe df2 that is the e-mails of people that downloaded a different e-book.
I want to find the people that downloaded both e-books, or the common values between df1 and df2, using Python.
Is it possible to do that? How?

This has already been discussed; see the question linked below:
Find the common values in columns in Pandas dataframe

Assuming the two dataframes df1 and df2 each have an email column, you can do the following:
intersected_df = pd.merge(df1, df2, how='inner', on='email')
This dataframe will contain the rows whose e-mails appear in both df1 and df2.
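A minimal runnable sketch of the inner merge above, using made-up e-mail addresses and assuming the shared column is named email:

```python
import pandas as pd

# Hypothetical sample data; the column name 'email' is an assumption
df1 = pd.DataFrame({'email': ['ann@example.com', 'bob@example.com', 'cy@example.com']})
df2 = pd.DataFrame({'email': ['bob@example.com', 'cy@example.com', 'dee@example.com']})

# An inner merge keeps only the e-mails present in both frames
# (in the order they appear in df1)
intersected_df = pd.merge(df1, df2, how='inner', on='email')
print(intersected_df['email'].tolist())
```

If the same e-mail appears more than once in either frame, an inner merge returns one row per matching pair, so you may want to call .drop_duplicates() on the result.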

Dump the emails from df1 into a set, in order to avoid duplicates.
Dump the emails from df2 into a set, for the same reason.
Find the intersection of these two sets, as such:
set1 = set(df1.Emails)
set2 = set(df2.Emails)
common = set1.intersection(set2)
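A small sketch of the set approach with hypothetical data and the Emails column name used in that answer. The last step is not part of the answer: it uses isin to get the matching rows of df1 back, since the set only holds the e-mails themselves.

```python
import pandas as pd

# Hypothetical frames; 'bob@example.com' appears twice in df1
df1 = pd.DataFrame({'Emails': ['ann@example.com', 'bob@example.com', 'bob@example.com']})
df2 = pd.DataFrame({'Emails': ['bob@example.com', 'dee@example.com']})

# Sets drop duplicates automatically; & does the same as .intersection()
common = set(df1.Emails) & set(df2.Emails)

# To recover the full rows (duplicates included), filter df1 with isin
rows_in_both = df1[df1['Emails'].isin(common)]
```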

I believe you should merge the two dataframes
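merged = pd.merge(df1, df2, how='inner', on=['e-mails'])
and then drop the rows whose e-mail is missing (pandas matches NaN keys with each other, so such rows can survive an inner merge):
merged.dropna(subset=['e-mails'], inplace=True)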
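A sketch, with made-up data, of why a NaN step can matter after the merge: pandas' merge matches missing keys with each other (unlike a SQL join), so rows with a missing e-mail can pair up. Passing subset= restricts dropna to the key column, so rows with gaps only in other columns are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data; each frame has one row with a missing e-mail
df1 = pd.DataFrame({'e-mails': ['ann@example.com', np.nan], 'book': ['A', 'A']})
df2 = pd.DataFrame({'e-mails': ['ann@example.com', np.nan], 'book': ['B', 'B']})

# pandas pairs null keys with each other, so the NaN rows can match here
merged = pd.merge(df1, df2, how='inner', on=['e-mails'], suffixes=('_1', '_2'))

# Drop only rows whose key is missing, not rows with NaN elsewhere
merged = merged.dropna(subset=['e-mails'])
```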

Related

compare two dataframes using three columns

I have two dataframes, df_users and df1.
I need to create a third dataframe called df2 that gets the corresponding user names from the USER_NAME column of df_users by matching on three columns: 'InterfaceDesc', 'TESVLAN' and 'CVLAN'.
I tried the merge, concat and datacompy Compare functions, but each one failed with a different error. Please help.
If you want to merge the two DataFrames only where the columns "InterfaceDesc", "TESVLAN" and "CVLAN" are all equal, merge on multiple columns and it should work:
df2 = pd.merge(df_users, df1, on=["InterfaceDesc", "TESVLAN", "CVLAN"])
If you want df2 to have only these 4 columns:
df2 = df2[["USER_NAME", "InterfaceDesc", "TESVLAN", "CVLAN"]]
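A self-contained version of the multi-column merge above. The sample values are invented; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df_users = pd.DataFrame({
    'USER_NAME': ['alice', 'bob'],
    'InterfaceDesc': ['ge-0/0/1', 'ge-0/0/2'],
    'TESVLAN': [100, 200],
    'CVLAN': [10, 20],
})
df1 = pd.DataFrame({
    'InterfaceDesc': ['ge-0/0/2', 'ge-0/0/3'],
    'TESVLAN': [200, 300],
    'CVLAN': [20, 30],
})

# A row matches only when all three key columns are equal
df2 = pd.merge(df_users, df1, on=['InterfaceDesc', 'TESVLAN', 'CVLAN'])
df2 = df2[['USER_NAME', 'InterfaceDesc', 'TESVLAN', 'CVLAN']]
```

The question doesn't show the errors, but one frequent cause of merge failures is key columns with different dtypes on each side (for example int64 in one frame and strings in the other); casting both sides with astype so they match is worth trying in that case.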

Join column in dataframe to another dataframe - Pandas

I have 2 dataframes. One has a bunch of columns including f_uuid. The other dataframe has 2 columns, f_uuid and i_uuid.
The first dataframe may contain some f_uuids that the second dataframe doesn't, and vice versa.
I want the first dataframe to have a new column i_uuid (from the second dataframe) populated with the appropriate values for the matching f_uuid in that first dataframe.
How would I achieve this?
df1 = pd.merge(df1,
df2,
on='f_uuid')
If you want to keep all f_uuid from df1 (e.g. those not available in df2), you may run
df1 = pd.merge(df1,
df2,
on='f_uuid',
how='left')
I think what you're looking for is a merge: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html?highlight=merge#pandas.DataFrame.merge
In your case, that would look like:
bunch_of_col_df.merge(other_df, on="f_uuid")
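A runnable sketch of the left merge with invented IDs. It also shows Series.map as an alternative way to add just the i_uuid column; that variant is not from the answers above and assumes each f_uuid appears only once in df2.

```python
import pandas as pd

# Hypothetical data: 'f3' exists only in df1, 'f9' only in df2
df1 = pd.DataFrame({'f_uuid': ['f1', 'f2', 'f3'], 'other': [1, 2, 3]})
df2 = pd.DataFrame({'f_uuid': ['f1', 'f2', 'f9'], 'i_uuid': ['i1', 'i2', 'i9']})

# Left merge: every df1 row is kept; unmatched rows get NaN in i_uuid
merged = df1.merge(df2, on='f_uuid', how='left')

# Alternative: look each f_uuid up in a Series indexed by f_uuid
# (requires f_uuid to be unique in df2)
df1['i_uuid'] = df1['f_uuid'].map(df2.set_index('f_uuid')['i_uuid'])
```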

Given 2 data frames search for matching value and return value in second data frame

Given two dataframes like the linked example, I need to add the "index income" from df2 to df1. I need to search df2 for each combined key in df1 and, if there is a match, return the value into a new column in df1. df1 and df2 do not have the same number of rows: there are about 700 rows in df1 and 1000 rows in df2.
I was able to do this in excel with a vlookup but I am trying to apply it to python code now.
This should solve your issue:
df1.merge(df2, how='left', on='combind_key')
This (left join) will give you all the records of df1 and matching records from df2.
https://www.geeksforgeeks.org/how-to-do-a-vlookup-in-python-using-pandas/
Here is an answer using joins. I modified my df2 to include only the useful columns, then used a pandas left join.
Left_join = pd.merge(df,
zip_df,
on='State County',
how='left')
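A VLOOKUP-style sketch with hypothetical column names and values (combined_key, 'index income'). The validate argument is an extra safeguard not used in the answers above: it makes pandas raise a MergeError if a key is repeated in df2, because a left merge would otherwise duplicate the matching df1 rows, whereas a VLOOKUP returns only the first match.

```python
import pandas as pd

# Hypothetical frames; the key values and column names are invented
df1 = pd.DataFrame({'combined_key': ['TX|Travis', 'CA|Kern', 'NY|Kings']})
df2 = pd.DataFrame({
    'combined_key': ['TX|Travis', 'NY|Kings', 'WA|King'],
    'index income': [1.10, 0.95, 1.30],
})

# Keep only the key and the column to bring over, like a VLOOKUP's return column
lookup = df2[['combined_key', 'index income']]

# Left merge keeps every df1 row; keys missing from df2 get NaN.
# validate='many_to_one' checks that each key appears at most once in df2.
result = df1.merge(lookup, on='combined_key', how='left', validate='many_to_one')
```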

Find where three separate DataFrames overlap and create a new DataFrame

I have three separate DataFrames. Each DataFrame has the same columns: ['Email', 'Rating']. There are duplicate values in the Email column across all three DataFrames. I'm trying to find the e-mails that appear in all three DataFrames and then create a new DataFrame from those rows. So far, I have saved all three DataFrames to a list, dfs = [df1, df2, df3], and concatenated them with df = pd.concat(dfs). I tried using groupby, following another answer, but to no avail. Any help would be greatly appreciated.
You want to do a merge. As with a join in SQL, you can do an inner merge and treat the e-mail as a foreign key. Here are the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
It would look something like this:
in_common = pd.merge(df1, df2, on=['Email'], how='inner')
You could try using .isin from pandas, e.g.:
df[df['Email'].isin(df2['Email'])]
This retrieves the rows of df whose Email value also appears in df2.
Another idea is maybe try an inner merge.
Good luck, and please post your code next time.
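The merge and isin answers each compare only two frames. A sketch for all three, assuming the ['Email', 'Rating'] layout from the question and invented rows: intersect the e-mail sets of every frame, then filter the concatenated frame.

```python
from functools import reduce

import pandas as pd

# Hypothetical frames with the ['Email', 'Rating'] columns from the question
df1 = pd.DataFrame({'Email': ['a@x.com', 'b@x.com', 'b@x.com'], 'Rating': [5, 4, 3]})
df2 = pd.DataFrame({'Email': ['b@x.com', 'c@x.com'], 'Rating': [2, 5]})
df3 = pd.DataFrame({'Email': ['b@x.com', 'a@x.com'], 'Rating': [1, 4]})
dfs = [df1, df2, df3]

# E-mails present in every frame: intersect the per-frame sets pairwise
common = reduce(lambda acc, s: acc & s, (set(d['Email']) for d in dfs))

# Keep every row, from all three frames, whose e-mail is in the common set
df = pd.concat(dfs, ignore_index=True)
in_all = df[df['Email'].isin(common)]
```

Here a@x.com is dropped because it is missing from df2, while all four b@x.com rows are kept.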

Pandas: merge_asof-like solutions for merging two multi-indexed DataFrames?

I have two dataframes, df1 and df2 say, which are both multi-indexed.
At the first index level, both dataframes share the same keys (i.e. df1.index.get_level_values(0) and df2.index.get_level_values(0) contain the same elements). Those keys are unordered strings, such as ['foo','bar','baz'].
At the second index level, both dataframes have timestamps which are ordered, but unequally spaced.
My question is as follows. I would like to merge df1 and df2 in such a way that, for each key at level 1, the values of df2 are inserted into df1 without changing the order of df1.
I tried using pd.merge, pd.merge_asof and pd.MultiIndex.searchsorted. From the descriptions of those methods, it seems like one of them should do the trick for me, but I cannot figure out how. Ideally, I would like to find a solution that avoids looping over the keys in index.get_level_values(0), since my dataframes can get large.
A few failed attempts for illustration:
df_merged = pd.merge(left=df1.reset_index(), right=df2.reset_index(),
left_on=[['some_keys', 'timestamps_df1']], right_on=[['some_keys', 'timestamps_df2']],
suffixes=('', '_2')
) # after sorting
# FAILED
df2.index.searchsorted(df1, side='right') # after sorting
# FAILED
Any help is greatly appreciated!
Based on your description, here is a solution using merge_asof:
df_merged = pd.merge_asof(left=df1.reset_index(), right=df2.reset_index(),
left_on=['timestamps_df1'], right_on=['timestamps_df2'],by='some_keys',
suffixes=('', '_2')
)
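A runnable sketch of the merge_asof answer with small invented frames. Two details are assumptions layered on top of the answer: both sides are sorted by timestamp first, which merge_asof requires, and a temporary row_order column is used to put the result back in df1's original order, as the question asks.

```python
import pandas as pd

# Hypothetical multi-indexed frames; level names follow the answer
idx1 = pd.MultiIndex.from_tuples(
    [('foo', pd.Timestamp('2021-01-01 00:00')),
     ('foo', pd.Timestamp('2021-01-01 00:10')),
     ('bar', pd.Timestamp('2021-01-01 00:05'))],
    names=['some_keys', 'timestamps_df1'])
df1 = pd.DataFrame({'a': [1, 2, 3]}, index=idx1)

idx2 = pd.MultiIndex.from_tuples(
    [('foo', pd.Timestamp('2021-01-01 00:02')),
     ('bar', pd.Timestamp('2021-01-01 00:01'))],
    names=['some_keys', 'timestamps_df2'])
df2 = pd.DataFrame({'b': [10, 20]}, index=idx2)

left = df1.reset_index()
left['row_order'] = range(len(left))  # remember df1's original order

# merge_asof requires both sides to be sorted by their 'on' columns
left = left.sort_values('timestamps_df1')
right = df2.reset_index().sort_values('timestamps_df2')

# Within each key, take the last df2 row at or before each df1 timestamp
merged = pd.merge_asof(left, right,
                       left_on='timestamps_df1', right_on='timestamps_df2',
                       by='some_keys')

# Restore df1's original row order and MultiIndex
merged = merged.sort_values('row_order').drop(columns='row_order')
merged = merged.set_index(['some_keys', 'timestamps_df1'])
```

The first foo row gets NaN in b because df2 has no foo timestamp at or before 00:00; passing direction='nearest' or 'forward' to merge_asof changes that matching rule.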
