Python Dataframes not merging on index

I'm trying to merge 2 dataframes, but for some reason it's throwing KeyError: Player_Id
I'm trying to merge on Striker_Id and Player_Id
This is what my Dataframe looks like:
Merge Code:
player_runs.merge(matches_played_by_players,left_on='Striker_Id',right_on='Player_Id',how='left')
What am I doing wrong?

Hmm, from looking at your problem, it seems like you're trying to merge on the indexes but treating them as columns. Try changing your merge code a bit:
player_runs.merge(matches_played_by_players,
                  left_index=True,
                  right_index=True,
                  how='left')
Furthermore, make sure that both indexes are of the same type (in this case, str vs. int):
player_runs.index = player_runs.index.astype(int)
And,
matches_played_by_players.index = matches_played_by_players.index.astype(int)
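For illustration, a minimal runnable sketch of the index-based merge with made-up toy data (the real frames come from the question's dataset, so the values here are only placeholders):
import pandas as pd

# Toy frames, each indexed by the player id, with deliberately mismatched index dtypes.
player_runs = pd.DataFrame({'Runs': [120, 85]},
                           index=pd.Index(['1', '2'], name='Striker_Id'))
matches_played_by_players = pd.DataFrame({'Matches': [10, 7]},
                                         index=pd.Index([1, 2], name='Player_Id'))

# Align the index dtypes first (str vs. int), then merge on the indexes.
player_runs.index = player_runs.index.astype(int)
matches_played_by_players.index = matches_played_by_players.index.astype(int)

merged = player_runs.merge(matches_played_by_players,
                           left_index=True, right_index=True, how='left')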

You're basically merging on non-existent columns. This is because reset_index creates a new dataframe rather than changing the dataframe it's applied to. Setting the parameter inplace=True when using reset_index should resolve this issue; alternatively, merge on the index of each dataframe, i.e.
pd.merge(df1, df2, left_index=True, right_index=True, how='left')
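A quick sketch of the reset_index route, assuming player_runs and matches_played_by_players are currently indexed by Striker_Id and Player_Id respectively:
# Turn the indexes back into real columns in place, then merge on those columns.
player_runs.reset_index(inplace=True)
matches_played_by_players.reset_index(inplace=True)

result = player_runs.merge(matches_played_by_players,
                           left_on='Striker_Id', right_on='Player_Id',
                           how='left')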

Related

Merging Two Dataframes stacks rows instead of merging into one

I'm attempting to merge two dataframes using two columns as keys: "Date" and "Instrument"
Here is my code:
merge_df = pd.merge(df1 , df2, how='outer', left_on=['Date','Instrument'], right_on = ['Date','Instrument'])
df1:
df2:
You'll notice that the row in each dataframe has the same instrument and date value: AEA000201011 & 2008-01-31.
The merged dataframe is stacking the two rows instead of combining them:
merged_df:
I have ensured that the dataframe key columns dtypes match:
df1:
df2:
Any advice would be much appreciated!
I wish I could use the comment section.
Even though you've probably already tried, have you tried using "left" or "right" instead of "outer"?
Or check the values directly, like:
df1["Instrument"].iloc[0] == df2["Instrument"].iloc[0]
Maybe they have some invisible characters in them. If that's the case, you can try using strip().
Nothing other than these comes to mind.
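A small sketch of that cleanup, assuming both key columns are stored as strings (column names taken from the question):
# Strip stray whitespace / invisible characters from the key columns before merging.
for frame in (df1, df2):
    frame['Instrument'] = frame['Instrument'].str.strip()
    frame['Date'] = frame['Date'].str.strip()

merge_df = pd.merge(df1, df2, how='outer', on=['Date', 'Instrument'])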

Unable to merge datasets

I have scraped data from two different pharma websites. So, I have 2 datasets in hand:
Both datasets have a name column in common. What I am trying to achieve is combining these two datasets. My final objective is to get all the tables from the first dataset and product descriptions from the second dataset wherever the name is the same in both tables.
I tried using information from GeeksforGeeks: https://www.geeksforgeeks.org/different-types-of-joins-in-pandas/
and https://pandas.pydata.org/docs/user_guide/merging.html
but I'm not getting the expected result.
Also, I tried it using a for loop, but to no avail:
new_df['Product_description'] = ''
for i in range(len(new_df['Name'])):
    for j in range(len(match_data['Name'])):
        if type(new_df['Name'][i]) != float:
            if new_df['Name'][i] == match_data['Name'][j].split(' ')[0].strip():
                new_df['Product_description'][i] = match_data['Product_Description'][j]
I also tried:
but it's giving me 106 results, which is from the older dataset, and I need 251 results as in new_df.
I want something like this, but matched from the match_df dataframe.
Can anyone suggest what I am doing wrong here?
Result with left join
Also, below are the values I am getting after finding the unique values sorted.
If you want to keep the size of the first dataframe constant, you need to use a left join. If there are mismatched values, they will be set to NaN, but this will keep the size constant.
Also remember that the first parameter of the merge method is the dataframe whose size you want to keep constant when 'how' is 'left'.
If you want to keep new_df's length, I would suggest using the how='left' argument in
pd.merge(new_df, match_data, on="Name", how="left")
So it will do a left join on new_df.
Based on the screenshots you shared, I would double-check that there are names in common in both dataframes' "Name" column.
Did you try these?
desc_df1 = pd.merge(new_df, match_data, on='Name', how='inner')
desc_df1 = pd.merge(new_df, match_data, on='Name', how='left')
After trying these options, let us know, because I was not able to understand from your data preview. Can you sort Name.value_counts() ascending and check whether there are any duplicates in both dfs? If so, this is why you got this problem.
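A quick sketch of that duplicate check (column name taken from the question):
# Names that appear more than once in either dataframe will multiply rows on merge.
print(new_df['Name'].value_counts().sort_values(ascending=False).head())
print(match_data['Name'].value_counts().sort_values(ascending=False).head())
print(new_df['Name'].duplicated().sum(), match_data['Name'].duplicated().sum())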

Multiindex Filterings of grouped data

I have a pandas dataframe where I have done a groupby. The groupby results look like this:
As you can see, this dataframe has a multilevel index ('ga:dimension3', 'ga:date') and a single column ('ga:sessions').
I am looking to create a dataframe with the first level of the index ('ga:dimension3') and the first date for each first level index value :
I can't figure out how to do this.
Guidance appreciated.
Thanks in advance.
Inspired by ggaurav's suggestion of using first(), I think that the following should do the work (df is the data you provided, after the groupby):
result=df.reset_index(1).groupby('ga:dimension3').first()
You can directly use first(). As you need data based on just 'ga:dimension3', you need to group by it (or level=0):
df.groupby(level=0).first()
Without groupby, you can get the level 0 index values, drop the duplicated ones, and keep the first one:
df[~df.index.get_level_values(0).duplicated(keep='first')]
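A small self-contained sketch with made-up data (the real frame comes from a Google Analytics groupby, so the values are placeholders), showing that both approaches pick out the first date per 'ga:dimension3':
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('A', '2021-01-01'), ('A', '2021-01-02'), ('B', '2021-01-03')],
    names=['ga:dimension3', 'ga:date'])
df = pd.DataFrame({'ga:sessions': [5, 3, 7]}, index=idx)

# groupby route: first row (and therefore first date) per 'ga:dimension3'.
first_per_dim = df.reset_index(1).groupby('ga:dimension3').first()

# index route: same rows, but the original MultiIndex is preserved.
same_rows = df[~df.index.get_level_values(0).duplicated(keep='first')]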

Merging 3 datasets together

I have 3 datasets (.csv) that I need to merge together in order to make a motion chart.
They all have countries as the columns and the year as rows.
The other two datasets look the same, except one is population and the other is income. I've tried looking around to see what I can find to get the dataset out like I'd like, but can't seem to find anything.
I tried using pd.concat, but it just lists it all one after the other, not in separate columns.
# merge all 3 data sets in preparation for making motion chart using pd.concat
mc_data = pd.concat([df2, pop_data3, income_data3], sort = True)
Any sort of help would be appreciated.
EDIT: I have used the code as suggested; however, I get a heap of NaN values that shouldn't be there:
mc_data = pd.concat([df2, pop_data3, income_data3], axis = 1, keys = ['df2', 'pop_data3', 'income_data3'])
EDIT 2: When I run .info() and .index on them I get these results. Could it be to do with the data types? Or the column entries?
From this answer:
You can do it with concat (the keys argument will create the hierarchical columns index):
pd.concat([df2, pop_data3, income_data3], axis=1, keys=['df2', 'pop_data3', 'income_data3'])
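If the edit's NaN rows come from the three frames not sharing identical index values (for example, the year stored as an int in one file and as a string in another, which is only an assumption here), aligning the indexes before the concat should fix it:
# Force all three frames onto the same index dtype so the axis=1 concat can line them up.
for frame in (df2, pop_data3, income_data3):
    frame.index = frame.index.astype(int)

mc_data = pd.concat([df2, pop_data3, income_data3], axis=1,
                    keys=['df2', 'pop_data3', 'income_data3'])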

How to inner join in pandas as in SQL, stuck on a problem below

I have two dataframes, the first named "df" and the second "topwud".
df
topwud
When I join these two dataframes with an inner join, using BOMCPNO and PRTNO as the join columns, like
second_level=pd.merge(df,top_wud ,left_on='BOMCPNO', right_on='PRTNO', how='inner').drop_duplicates()
Then I got this dataframe:
Result
I don't want the common column coming out as PRTNO_x and PRTNO_y; I want to keep only PRTNO_x in my result dataframe under the name "PRTNO", which is the default name.
Kindly help me :)
Try this:
pd.merge(df1, top_wud, on=['BOMCPNO', 'PRTNO'])
What this will do, though, is return only the rows where BOMCPNO and PRTNO exist in both dataframes, as the default merge type is an inner merge.
So what you could do is compare this merged df's size with your first one and see if they are the same; if so, you could do a merge on both columns, or just drop/rename the _x/_y suffixed columns.
I would spend time though determining if these values are indeed the same and exist in both dataframes, in which case you may wish to perform an outer merge:
pd.merge(df1, df2, on=['A', 'B'], how='outer')
Then you could drop duplicate rows (and possibly any NaN rows), and that should give you a clean merged dataframe.
merged_df.drop_duplicates(subset=['BOMCPNO', 'PRTNO'], inplace=True)
Also try other types of join, as I don't know exactly what you want; I think it's a left or inner join.
Check if this solved your problem.
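A hedged sketch of that suffix cleanup for the original left_on/right_on merge (column names from the question; the _x/_y columns only appear if PRTNO exists in both frames):
second_level = (pd.merge(df, top_wud, left_on='BOMCPNO', right_on='PRTNO',
                         how='inner')
                .drop_duplicates())

# Collapse the duplicated key column back to a single 'PRTNO'.
if 'PRTNO_x' in second_level.columns:
    second_level = (second_level.drop(columns=['PRTNO_y'])
                                .rename(columns={'PRTNO_x': 'PRTNO'}))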
