Pandas merge is giving different answers with index merge and column merge - python

I have three DataFrames that I am trying to merge. The common column in each DataFrame is COUNTRY.
Case 1:
Before merging the three DataFrames, I set the index of each DataFrame to COUNTRY and ran
pd.merge(leftdf, rightdf, left_index=True, right_index=True, how="inner")
This gives the result I need.
Case 2:
When I do not set the index of each DataFrame to Country, leave it as a column, and run
pd.merge(leftdf, rightdf, on="Country", how="inner")
the resulting DataFrame is smaller: I am losing some rows. Why is this happening? I do not understand.
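One way to investigate, offered as a sketch rather than a diagnosis: an outer merge with indicator=True labels every row as matched or unmatched, which shows exactly which key values fail to pair up. The frames and values below are made up for illustration (the real data is not shown in the question), and the trailing space in one key is only one possible cause.

```python
import pandas as pd

# Hypothetical data; the real frames are not shown in the question.
leftdf = pd.DataFrame({"Country": ["India", "Chile", "Japan "], "gdp": [1, 2, 3]})
rightdf = pd.DataFrame({"Country": ["India", "Chile", "Japan"], "pop": [10, 20, 30]})

# Column merge and index merge on the same key values.
by_column = pd.merge(leftdf, rightdf, on="Country", how="inner")
by_index = pd.merge(
    leftdf.set_index("Country"),
    rightdf.set_index("Country"),
    left_index=True,
    right_index=True,
    how="inner",
)

# An outer merge with indicator=True adds a "_merge" column whose value is
# "both", "left_only" or "right_only" for each row.
check = pd.merge(leftdf, rightdf, on="Country", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched)
```

With identical key values the two approaches return the same rows, so a difference in row count may point at the keys themselves differing between the two setups (for example in spelling, case, whitespace or dtype).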

Related

Split a large dataframe into a list of dataframes based on values of row of multiple columns

I have a large time-series dataframe (df) in which the columns Year#, Month#, Week# are derived from the Date column.
Question: how can I accomplish the following?
For all rows which have the same Year#, Month# and Week#, I would like to extract the rows of df into a new dataframe df_subsetx as shown in the attached photo.
I would like to collect all these subset dataframes in a 'list of dataframes'.
PS: do chime in if there is a better way to handle multiple dataframes than making them a 'list of dataframes'.
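A minimal sketch of one common approach, assuming the three derived columns already exist: DataFrame.groupby yields one sub-DataFrame per distinct (Year#, Month#, Week#) combination, and these can be collected into a list or, if lookup by key is useful, into a dict. The sample rows and the value column are invented for illustration.

```python
import pandas as pd

# Hypothetical sample; column names follow the question, values are made up.
df = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-12", "2021-02-02"]),
        "Year#": [2021, 2021, 2021, 2021],
        "Month#": [1, 1, 1, 2],
        "Week#": [1, 1, 2, 5],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

keys = ["Year#", "Month#", "Week#"]

# One DataFrame per distinct combination of the three key columns.
subsets = [group for _, group in df.groupby(keys)]

# Alternative: keep the grouping key so each subset can be looked up by it.
by_key = {key: group for key, group in df.groupby(keys)}
```

The dict variant may be easier to work with than a bare list, since each subset stays attached to the year, month and week it came from.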

Pandas Merge Multiple Columns

I am struggling to merge two pandas dataframes to replicate a VLOOKUP function that uses two columns as the lookup value.
The first dataframe df has 6 columns including three columns: perf, ticker and date. The perf column is empty and this is the one I would like to see populated. The second dataframe u includes the same three columns, including values in the perf column but only for a specific date.
I have tried this:
df=pd.merge(df,u,how='left',on=['ticker_and_exch_code', 'date'])
But the result I get is a dataframe with new perf columns instead of populating the one existing perf column. Would really appreciate insights into what I am missing, thanks!
Vincent
If the 'perf' column is empty in the first DataFrame, may I suggest removing it before merging the two DataFrames?
df = pd.merge(
    df.drop(columns='perf'),
    u,
    how='left',
    on=['ticker_and_exch_code', 'date'],
)
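To illustrate why the extra columns appear, here is a small sketch with invented tickers and values: when both frames carry a non-key column with the same name, merge keeps both and appends the default suffixes _x and _y. Dropping the empty column first leaves a single perf column.

```python
import pandas as pd

# Hypothetical data; tickers, dates and values are made up.
df = pd.DataFrame(
    {
        "ticker_and_exch_code": ["AAA US", "BBB US"],
        "date": ["2020-01-31", "2020-01-31"],
        "perf": [None, None],
    }
)
u = pd.DataFrame(
    {
        "ticker_and_exch_code": ["AAA US"],
        "date": ["2020-01-31"],
        "perf": [0.05],
    }
)

on = ["ticker_and_exch_code", "date"]

# Both frames have a 'perf' column, so merge keeps both as perf_x and perf_y.
clashing = pd.merge(df, u, how="left", on=on)

# Dropping the empty column first leaves one 'perf' column, filled from u.
merged = pd.merge(df.drop(columns="perf"), u, how="left", on=on)
```

Rows of df with no match in u keep a missing value in perf, which mirrors a lookup that finds nothing.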

Merge/Join Multi-index Dataframes and combine columns

I am trying to merge two DataFrames that are multi-indexed, while preserving the highest-level index. The problem is that merging on axis=1 results in the two columns below. Merging/joining on axis=0 drops any value in the 0_y column that has the same sub-index as an entry in 0_x. An example below is (226,0), where the value 1510123295301 gets dropped if I merge/join on axis=0.
Is there any way to merge two multi-index dataframes into a single column, preserving the primary index (e.g. 226), but expanding to include non-duplicates in the right-hand column (e.g. 226(0-6))?

Pandas merge DataFrames based on index/column combination

I have two DataFrames that I want to merge. I have read about merging on multiple columns, and preserving the index when merging. My problem needs to cater for both, and I am having difficulty figuring out the best way to do this.
The first DataFrame looks like this
and the second looks like this
I want to merge these based on the Date and the ID. In the first DataFrame the Date is the index and the ID is a column; in the second DataFrame both Date and ID are part of a MultiIndex.
Essentially, as a result I want a DataFrame that looks like DataFrame 2 with an additional column for the Events from DataFrame 1.
I'd suggest resetting the index (reset_index) on both DataFrames and then merging them on the Date and ID columns, as you've read. Then you can set the index again (set_index) to reproduce your desired MultiIndex.
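The suggestion above could look roughly like this. The two frames are stand-ins, since the originals are not shown: df1 has Date as its index and ID as a column, df2 has a (Date, ID) MultiIndex, and the Value column is invented.

```python
import pandas as pd

# Hypothetical first frame: Date is the index, ID is a column.
df1 = pd.DataFrame(
    {"ID": [1, 2], "Events": ["a", "b"]},
    index=pd.Index(pd.to_datetime(["2021-01-01", "2021-01-02"]), name="Date"),
)

# Hypothetical second frame: Date and ID together form a MultiIndex.
df2 = pd.DataFrame(
    {"Value": [10.0, 20.0, 30.0]},
    index=pd.MultiIndex.from_arrays(
        [pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-02"]), [1, 2, 3]],
        names=["Date", "ID"],
    ),
)

# Turn both indexes into ordinary columns, merge on them, then rebuild
# the MultiIndex. how="left" keeps every row of df2.
merged = (
    df2.reset_index()
    .merge(df1.reset_index(), on=["Date", "ID"], how="left")
    .set_index(["Date", "ID"])
)
```

The result has the shape of the second DataFrame plus an Events column; rows with no matching (Date, ID) pair in the first DataFrame get a missing value there.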

Pandas Join Two Series

I have two Series that I need to join in one DataFrame.
Each series has a date index and corresponding price.
When I use concat I get a DataFrame that has one index (good) but two columns that have the same values (bad).
zee_nbp = pd.concat([zee_da_df, nbp_da_df], axis=1)
The values are correct for zee_da_df but are duplicated for nbp_da_df. Any ideas? I have checked, and each series has different values before they are concatenated.
Thanks in advance
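A small check with made-up prices, not a diagnosis of the original data: concat with axis=1 aligns on the index and keeps each Series' values in its own column. If the resulting columns are identical, it may be that the two inputs were already equal when concat ran (for instance, both names bound to the same object), which Series.equals can test. The Series names and values here are assumptions.

```python
import pandas as pd

# Hypothetical Series sharing a date index; names and prices are invented.
dates = pd.to_datetime(["2021-01-01", "2021-01-02"])
zee_da = pd.Series([10.0, 11.0], index=dates, name="zee")
nbp_da = pd.Series([20.0, 21.0], index=dates, name="nbp")

# axis=1 places each Series in its own column, aligned on the shared index.
zee_nbp = pd.concat([zee_da, nbp_da], axis=1)

# Quick check that the two inputs really differ before concatenation.
same = zee_da.equals(nbp_da)
```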
