Pandas Merge Multiple Columns - python

I am struggling to merge two pandas dataframes to replicate a vlookup function using two columns as the lookup key.
The first dataframe, df, has 6 columns, three of which are perf, ticker and date. The perf column is empty, and it is the one I would like to populate. The second dataframe, u, has the same three columns, with values in the perf column but only for a specific date.
I have tried this:
df=pd.merge(df,u,how='left',on=['ticker_and_exch_code', 'date'])
But the result I get is a dataframe with new perf columns instead of populating the one existing perf column. Would really appreciate insights into what I am missing, thanks!
Vincent

If the 'perf' column is empty in the first DataFrame, may I suggest removing it before merging the two DataFrames?
df = pd.merge(
    df.drop(columns='perf'),
    u,
    how='left',
    on=['ticker_and_exch_code', 'date'],
)
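As a minimal sketch of why dropping the column helps: when both frames carry a non-key column named perf, merge keeps both and suffixes them (perf_x and perf_y by default). The sample rows below are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "ticker_and_exch_code": ["AAA US", "BBB US", "AAA US"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "perf": [None, None, None],  # empty column to be populated
})
u = pd.DataFrame({
    "ticker_and_exch_code": ["AAA US", "BBB US"],
    "date": ["2020-01-01", "2020-01-01"],
    "perf": [0.05, -0.02],
})

# Drop the empty 'perf' first so the merge yields a single 'perf' column
# instead of the suffixed pair perf_x / perf_y.
merged = pd.merge(
    df.drop(columns="perf"),
    u,
    how="left",
    on=["ticker_and_exch_code", "date"],
)
print(merged)
```

Rows of df with no match in u (here the 2020-01-02 row) keep NaN in perf.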

Related

How to get rows from one dataframe based on another dataframe

I just edited the question as maybe I didn't make myself clear.
I have two dataframes (MR and DT)
The column 'A' in dataframe DT is a subset of the column 'A' in dataframe MR. The two dataframes are similar (not equal) only in this ID column; the rest of the columns are different, as is the number of rows.
How can I get the rows from dataframe MR whose 'ID' values also appear in DT['ID']? Note that values in 'ID' can appear several times in the same column.
DT has 1538 rows and MR has 2060 rows.
I tried some lines proposed here: https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe but I got bizarre results, as I don't fully understand the methods they proposed (and the goal is a little different).
Thanks!
Take a look at pandas.Series.isin() method. In your case you'd want to use something like:
matching_id = MR.ID.isin(DT.ID) # This returns a boolean Series of whether values match or not
# Now filter your dataframe to keep only matching rows
new_df = MR.loc[matching_id, :]
Or if you want to just get a new dataframe of combined records for the same ID you need to use merge():
new_df = pd.merge(MR, DT, on='ID')
This will create a new dataframe with columns from both original dfs but only where ID is the same.
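A small sketch of how the two approaches differ when IDs repeat, which the question says can happen. The values and the mr_value / dt_value columns are hypothetical; only the frame names and the ID column come from the question.

```python
import pandas as pd

# Hypothetical data; mr_value and dt_value are illustrative column names.
MR = pd.DataFrame({"ID": [1, 2, 2, 3, 4], "mr_value": ["a", "b", "c", "d", "e"]})
DT = pd.DataFrame({"ID": [2, 2, 4], "dt_value": [10, 20, 30]})

# isin: keeps each MR row whose ID occurs in DT, so it never returns
# more rows than MR has.
filtered = MR.loc[MR["ID"].isin(DT["ID"])]

# merge (inner join by default): one output row per matching
# (MR row, DT row) pair, so repeated IDs multiply the row count.
joined = pd.merge(MR, DT, on="ID")

print(len(filtered), len(joined))
```

With repeated IDs on both sides, the merge result can be longer than either input, so isin is usually the safer choice when the goal is only to filter MR.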

When left joining pandas dataframes, why does the result have fewer rows than the left dataframe

I am working on a project and at one point I need to left join two dataframes: df and temp.
df has around 20 columns and 47576 rows while temp has 4 columns and 446829 rows; the two dataframe have to be joined on three columns (shared by the both of them).
To avoid creating extra rows I first run the following:
temp = temp.drop_duplicates(subset=['A','B','C'])
Then I join the two dataframes running the function:
df_1 = pd.merge(df, temp, how='left', left_on=['A','B','C'], right_on=['A','B','C'])
I would then assume that the df_1 dataframe has exactly as many rows as df (it can't have more, as I have already dropped the duplicates in temp, and it shouldn't have fewer, as it is a left join).
But I see that the df_1 dataframe actually has 30259 rows, which is far fewer than the 47576 rows the df dataframe had.
How is this possible?
(Also, thinking it could somehow help, I filled in the NaN values of the columns 'A','B','C' in the df dataframe, but it doesn't seem to help.)
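One way to investigate, offered as a sketch rather than a diagnosis: a left merge against a right table whose keys are unique should return exactly len(df) rows, and the validate and indicator parameters of pd.merge can confirm both halves of that expectation. The sample data and the left_value / right_value columns below are hypothetical.

```python
import pandas as pd

# Hypothetical data; only the key column names 'A', 'B', 'C' come from the question.
df = pd.DataFrame({
    "A": [1, 1, 2, 3],
    "B": ["x", "x", "y", "z"],
    "C": [10, 10, 20, 30],
    "left_value": [0.1, 0.2, 0.3, 0.4],
})
temp = pd.DataFrame({
    "A": [1, 1, 2],
    "B": ["x", "x", "y"],
    "C": [10, 10, 20],
    "right_value": ["p", "q", "r"],
})

temp = temp.drop_duplicates(subset=["A", "B", "C"])

df_1 = pd.merge(
    df,
    temp,
    how="left",
    on=["A", "B", "C"],
    validate="many_to_one",  # raises MergeError if keys in temp are not unique
    indicator=True,          # adds a '_merge' column: 'both' or 'left_only'
)

print(len(df), len(df_1))
print(df_1["_merge"].value_counts())
```

If the two lengths match in a check like this but not in the real pipeline, it may be worth looking at steps before or after the merge (for example a later filter or dropna), since the left join by itself keeps every left row.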

appending in pandas - row wise

I'm trying to append two columns of my dataframe to an existing dataframe with this:
dataframe.append(df2, ignore_index = True)
and this does not seem to be working.
This is what I'm looking for (kind of) --> a dataframe with 2 columns and 6 rows:
although this is not correct and it's using two print statements to print the two dataframes, I thought it might be helpful to have a selection of the data in mind.
I tried to use concat(), but that leads to some issues as well.
dataframe = pd.concat([dataframe, df2])
but that appears to concatenate the second dataframe as columns rather than rows, in addition to giving NaN values:
Any ideas on what I should do?
I assume this happened because your dataframes have different column names. Try assigning the first dataframe's column names to the second dataframe.
df2.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2], ignore_index=True)
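A short sketch of the effect, with made-up column names: concat aligns on column labels, so mismatched names produce the union of columns padded with NaN, while matching names stack the rows. Note also that DataFrame.append returned a new frame rather than modifying the original (and was removed in pandas 2.0), so its result had to be assigned.

```python
import pandas as pd

# Hypothetical frames; the column names are illustrative only.
dataframe = pd.DataFrame({"name": ["a", "b", "c"], "score": [1, 2, 3]})
df2 = pd.DataFrame({"label": ["d", "e", "f"], "points": [4, 5, 6]})

# Different column names: concat builds the union of columns and pads with NaN.
misaligned = pd.concat([dataframe, df2])
print(misaligned.shape)

# Same column names: rows are stacked under the existing two columns.
df2.columns = dataframe.columns
dataframe_new = pd.concat([dataframe, df2], ignore_index=True)
print(dataframe_new.shape)
```

Renaming by position like this assumes the columns of df2 are in the same order, and mean the same thing, as those of the first dataframe.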

Pandas merge is giving different answers with index merge and column merge

I have three DataFrames that I am trying to merge and output the result. The common column in each DataFrame I am trying to merge on is COUNTRY.
Case1:
Before merging the three DataFrames I set the index of each DataFrame to COUNTRY and ran
pd.merge(leftdf,rightdf,left_index=True,right_index=True,how="inner")
I am getting the required answer. But when I am not setting the indices of each DataFrame to Country, leaving them as columns, and performing the merge
pd.merge(leftdf,rightdf,on="Country",how="inner")
the resultant DataFrame is reduced in size. I am losing some rows. Why is this happening? I do not understand.

Pandas merge DataFrames based on index/column combination

I have two DataFrames that I want to merge. I have read about merging on multiple columns, and preserving the index when merging. My problem needs to cater for both, and I am having difficulty figuring out the best way to do this.
The first DataFrame looks like this
and the second looks like this
I want to merge these based on the Date and the ID. In the first DataFrame the Date is the index and the ID is a column; in the second DataFrame both Date and ID are part of a MultiIndex.
Essentially, as a result I want a DataFrame that looks like DataFrame 2 with an additional column for the Events from DataFrame 1.
I'd suggest resetting the index (reset_index) and then merging the DataFrames, as you've read. Then you can set the index (set_index) to reproduce your desired MultiIndex.
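The reset, merge, set sequence might look like the sketch below. The question's example tables are not available, so the data is invented: Date, ID and Events are named in the question, while the Value column and all the rows are assumptions.

```python
import pandas as pd

# Hypothetical first DataFrame: Date is the index, ID is a column.
df1 = pd.DataFrame(
    {"ID": [1, 2, 1], "Events": ["open", "close", "split"]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-01", "2020-01-02"], name="Date"),
)

# Hypothetical second DataFrame: Date and ID form a MultiIndex.
df2 = pd.DataFrame(
    {"Value": [10.0, 20.0, 30.0, 40.0]},
    index=pd.MultiIndex.from_arrays(
        [
            pd.DatetimeIndex(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]),
            [1, 2, 1, 2],
        ],
        names=["Date", "ID"],
    ),
)

merged = (
    df2.reset_index()                                         # Date, ID become columns
    .merge(df1.reset_index(), how="left", on=["Date", "ID"])  # add Events where matched
    .set_index(["Date", "ID"])                                # restore the MultiIndex
)
print(merged)
```

A left join is used here so the result keeps every row of the second DataFrame, with NaN in Events where the first DataFrame has no matching Date and ID.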
