Pandas: Suffixes created on merge with same name columns - python

I am trying to do a left join using pandas merge however for some reason, I get suffixes on my join even though the series/columns have the same name
penalty_kicks dataframe:
backup dataframe:
Here is how I am using pd.merge to do the left join as detailed before:
penalty_kicks = pd.merge(penalty_kicks,backup, on=['video_name','image_name'],how='left')
After that, here is the output table:
Is there a way to just make the data all go into their repective columns?

Related

Pandas Dataframes: Combining Columns from Two Global Datasets when the rows hold different Countries

My Problem is that these two CSV files have different countries at different rows, so I can't just append the column in question to the other data frame.
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv
https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
I'm trying to think of some way to use a for loop, checking every row, and add the recovered cases to the correct row where the country name is the same in both data frames, but I don't know how to put that idea in to code. Help?
You can do this a couple of ways:
Option 1: use pd.concat with set_index
pd.concat([df_confirmed.set_index(['Province/State', 'Country/Region']),
df_recovered.set_index(['Province/State', 'Country/Region'])],
axis=1, keys=['Confirmed', 'Recovered'])
Option 2: use pd.DataFrame.merge with an left join or outer join using how parameter
df_confirmed.merge(df_recovered, on=['Province/State', 'Country/Region'], how='left',
suffixes=('_confirmed','_recovered'))
Using pd.read_csv from github raw format:
df_recovered = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
df_confirmed = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')

Find where three separate DataFrames overlap and create a new DataFrame

I have three separate DataFrames. Each DataFrame has the same columns - ['Email', 'Rating']. There are duplicate row values in all three DataFrames for the column Email. I'm trying to find those emails that appear in all three DataFrames and then create a new DataFrame based off those rows. So far I have I had all three DataFrames saved to a list like this dfs = [df1, df2, df3], and then concatenated them together using df = pd.concat(dfs). I tried using groupby from here but to no avail. Any help would be greatly appreciated
You want to do a merge. Similar to a join in sql you can do an inner merge and treat the email like a foreign key. Here is the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
It would look something like this:
in_common = pd.merge(df1, df2, on=['Email'], how='inner')
you could try using .isin from pandas, e.g:
df[df['Email'].isin(df2['Email'])]
This would retrieve row entries where the values for the column email are the same in the two dataframes.
Another idea is maybe try an inner merge.
Goodluck, post code next time.

How to inner join in pandas as SQL , Stuck in a problem below

I have two df named "df" and second as "topwud".
df
topwud
when I join these two dataframes bt inner join using BOMCPNO and PRTNO as the join column
like
second_level=pd.merge(df,top_wud ,left_on='BOMCPNO', right_on='PRTNO', how='inner').drop_duplicates()
Then I got this data frame
Result
I don't want common name coming as PRTNO_x and PRTNO_y , I want to keep only PRTNO_x in my result dataframe as name "PRTNO" which is default name.
Kindly help me :)
try This -
pd.merge(df1, top_wud, on=['BOMCPNO', 'PRTNO'])
What this will do though is return only the values where BOMCPNO and PRTNO exist in both dataframes as the default merge type is an inner merge.
So what you could do is compare this merged df size with your first one and see if they are the same and if so you could do a merge on both columns or just drop/rename the _x/_y suffix B columns.
I would spend time though determining if these values are indeed the same and exist in both dataframes, in which case you may wish to perform an outer merge:
pd.merge(df1, df2, on=['A', 'B'], how='outer')
Then what you could do is then drop duplicate rows (and possibly any NaN rows) and that should give you a clean merged dataframe.
merged_df.drop_duplicates(cols=['BOMCPNO', 'PRTNO'],inplace=True)
also try other types of join , as i dont know what exactly you want, i think its left inner .
check this if it solved your problem.

Joining two multi-index dataframes with pandas

I have two multi-index dataframes which should have the same index names (barring any format issues) that I want to join together.
DF1 =
DF2 =
I want to join DF2 to DF1 only where the NAME and code match so on this dataframe for example, it would pull in qty for 10'' of 69.0
I've tried different variations of join, concat and multiindex.join but can't see to figure it out. Any help much appreciated!

python pandas - joining specific columns

I have a main dataframe (MbrKPI4), and I want to left join it with another dataframe (mbrsdf). They have the same index. I am successful with the below.
MbrKPI4.join(mbrsdf['Gender'])
However, I want to join more columns from mbrsdf, and the below does not work (MemoryError). Is there a way to join that I can select the columns I want from mbrsdf?
MbrKPI4.join(mbrsdf['Gender'], mbrsdf['Marital Status'])
Based on documentation for join() I think you want to pass in an array of dataframes to left join by, or chain join calls.
d1.join([d2['Gender'], d2['Marital Status']])
d1.join(d2['Gender']).join(d2['Marital Status'])
Hope that works.

Categories