I am trying to merge two DataFrames that have a MultiIndex, while preserving the highest index level. The problem is that merging on axis=1 results in the two columns shown below. Merging/joining on axis=0 drops any value in the 0_y column that has the same sub-index as an entry in 0_x. An example below is (226, 0), where the value 1510123295301 gets dropped if I merge/join on axis=0.
Is there any way to merge two MultiIndex DataFrames into a single column, preserving the primary index (e.g. 226), but expanding to include non-duplicates in the right-hand column (e.g. 226(0-6))?
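A possible approach (not from the original post; df_left and df_right are stand-in names for the two MultiIndex DataFrames, each assumed to have a (primary, sub) index and a single value column) is to stack the rows with concat and then renumber the sub-index within each primary key, so the right-hand rows are appended as 226(0-6) instead of being dropped on a sub-index collision:

import pandas as pd

combined = pd.concat([df_left, df_right])  # keep every row from both frames
# renumber the sub-index 0..n within each primary key (e.g. 226)
new_sub = combined.groupby(level=0).cumcount()
combined.index = pd.MultiIndex.from_arrays([combined.index.get_level_values(0), new_sub])
combined = combined.sort_index()  # group the primary keys back together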
I have three DataFrames that I am trying to merge and output the result. The common column in each DataFrame that I am merging on is COUNTRY.
Case 1:
Before merging the three DataFrames, I set the index of each DataFrame to COUNTRY and did
pd.merge(leftdf,rightdf,left_index=True,right_index=True,how="inner")
I am getting the required answer. But when I do not set the index of each DataFrame to Country, leaving it as a column, and perform the merge
pd.merge(leftdf,rightdf,on="Country",how="inner")
the resultant DataFrame is reduced in size. I am losing some rows. Why is this happening? I do not understand.
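One way to see which rows the inner join discards (a diagnostic sketch, not part of the original question, reusing the leftdf/rightdf names from above) is an outer merge with indicator=True, which flags keys that exist on only one side:

import pandas as pd

check = pd.merge(leftdf, rightdf, on="Country", how="outer", indicator=True)
print(check["_merge"].value_counts())  # rows tagged left_only / right_only are the ones the inner join drops
print(check.loc[check["_merge"] != "both", "Country"].unique())  # keys that fail to match, e.g. case, whitespace or NaN differences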
I have a MultiIndex DataFrame with 2 index levels and 2 column levels.
The first index level and the first column level are the same. The second levels share elements but are not equal. This gives me a non-square DataFrame (I have more elements in my 2nd-level columns than in my 2nd-level index).
I want to set all elements of my DataFrame to 0 wherever the first-level index label is not equal to the first-level column label. I have done it recursively, but I am sure there is a better way.
Can you help?
Thanks
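One vectorized way to do this, shown as a sketch with made-up data (the original frame isn't given), is to compare the first-level row labels against the first-level column labels with broadcasting and zero out the mismatching cells with DataFrame.mask:

import numpy as np
import pandas as pd

# hypothetical frame: 2 index levels, 2 column levels, more 2nd-level columns than 2nd-level index entries
idx = pd.MultiIndex.from_product([["A", "B"], [1, 2]])
cols = pd.MultiIndex.from_product([["A", "B"], ["x", "y", "z"]])
df = pd.DataFrame(np.arange(4 * 6).reshape(4, 6), index=idx, columns=cols)

# True wherever the first-level row label differs from the first-level column label
row_lvl0 = df.index.get_level_values(0).to_numpy()
col_lvl0 = df.columns.get_level_values(0).to_numpy()
mask = row_lvl0[:, None] != col_lvl0[None, :]

df_zeroed = df.mask(mask, 0)  # set those cells to 0 in one shot, no recursion or loops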
I have two DataFrames that I want to merge. I have read about merging on multiple columns, and preserving the index when merging. My problem needs to cater for both, and I am having difficulty figuring out the best way to do this.
The first DataFrame looks like this
and the second looks like this
I want to merge these based on the Date and the ID. In the first DataFrame the Date is the index and the ID is a column; in the second DataFrame both Date and ID are part of a MultiIndex.
Essentially, as a result I want a DataFrame that looks like DataFrame 2 with an additional column for the Events from DataFrame 1.
I'd suggest resetting the index (reset_index) and then merging the DataFrames, as you've read. Then you can set the index (set_index) to reproduce your desired MultiIndex.
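A sketch of that suggestion, with df1 standing in for the first DataFrame (Date index, ID and Events columns) and df2 for the second (Date/ID MultiIndex); the names are placeholders since the original frames aren't shown:

merged = (
    df2.reset_index()  # bring Date and ID out of the MultiIndex
       .merge(df1.reset_index(), on=["Date", "ID"], how="left")  # Date comes out of df1's index as a column
       .set_index(["Date", "ID"])  # rebuild the desired MultiIndex
)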
I have a sparse DataFrame with duplicate indices. How can I merge the same-indexed rows in a way that keeps all the non-NaN data from the conflicting rows?
I know that you can achieve something very close with the built-in drop_duplicates function, but you can only keep either the first or the last row with the same index:
df.reset_index().drop_duplicates(subset='index', keep='first').set_index('index').sort_index()
What I'd need is all the non-NaN values, from any of the conflicting rows.
Before:
After:
df.reset_index().groupby('index').max()
This selects the non-NaN values from the conflicting rows, or, if multiple conflicting rows have values for the same column, the maximum of them.
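For example, with made-up data (two rows sharing the label 'x', each holding the non-NaN half of the data):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 5.0], "b": [np.nan, 2.0, np.nan]}, index=["x", "x", "y"])
out = df.reset_index().groupby("index").max().sort_index()
# max() skips NaN by default, so 'x' ends up with a=1.0 and b=2.0 in a single row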
I have two Series that I need to join in one DataFrame.
Each series has a date index and corresponding price.
When I use concat I get a DataFrame that has one index (good) but two columns that have the same values (bad).
zee_nbp = pd.concat([zee_da_df,nbp_da_df],axis=1)
The values are correct for zee_da_df but are duplicated for nbp_da_df. Any ideas? I have checked, and each Series has different values before they are concatenated.
Thanks in advance
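Not an answer from the thread, but a sketch of how the two columns can at least be labelled so they stay distinguishable (zee_da_df and nbp_da_df as in the question, assumed to be Series with a date index); this doesn't explain the duplication, but it makes it obvious which column came from which Series:

import pandas as pd

# keys= names the resulting columns; the join on the date index is outer by default,
# so dates present in only one Series come through as NaN in the other column
zee_nbp = pd.concat([zee_da_df, nbp_da_df], axis=1, keys=["zee", "nbp"])
print(zee_nbp.head())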