I have two Series that I need to join in one DataFrame.
Each series has a date index and corresponding price.
When I use concat I get a DataFrame that has one index (good) but two columns that have the same values (bad).
zee_nbp = pd.concat([zee_da_df,nbp_da_df],axis=1)
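For context, here is a minimal self-contained sketch of the same call with made-up series names and prices, which gives what I would expect: one shared date index and one column per Series.

import pandas as pd

# Made-up stand-ins for the two daily-price Series described above
dates = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"])
zee_da_df = pd.Series([10.0, 11.0, 12.0], index=dates, name="zee_da")
nbp_da_df = pd.Series([20.0, 21.0, 22.0], index=dates, name="nbp_da")

# Concatenate side by side: one date index, two distinct price columns
zee_nbp = pd.concat([zee_da_df, nbp_da_df], axis=1)
print(zee_nbp)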
The values are correct for zee_da_df but are duplicated for nbp_da_df. Any ideas? I have checked, and each Series has different values before they are concatenated.
Thanks in advance
I'm getting the following error when trying to group and sum a dataframe by specific columns.
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
I've checked other solutions and it's not a double column name header issue.
See df3 below, which I want to group by on all columns except the last two, which I want to sum().
The head of dfs shows that if I just group by the column names it works fine, but not with iloc, which I thought was the correct way to pull back the columns I want to group by.
I need to use iloc as the final dataframe will have many more columns.
df3.iloc[:,0:3] returns a DataFrame, so you are trying to group a DataFrame by another DataFrame.
What you actually need is a list of column names.
Can you try this:
dfs = df3.groupby(list(df3.iloc[:,0:3].columns))[['Churn_Alive_1','Churn_Alive_0']].sum()
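For illustration, here is a self-contained sketch of that pattern with made-up column names and values (not your actual data):

import pandas as pd

# Hypothetical frame: three grouping columns followed by the two value columns
df3 = pd.DataFrame({
    "Region": ["EU", "EU", "US", "US"],
    "Plan":   ["A", "A", "B", "B"],
    "Month":  ["Jan", "Jan", "Feb", "Feb"],
    "Churn_Alive_1": [1, 2, 3, 4],
    "Churn_Alive_0": [5, 6, 7, 8],
})

# iloc still selects the grouping columns, but their names are passed as a list
group_cols = list(df3.iloc[:, 0:3].columns)   # ['Region', 'Plan', 'Month']
dfs = df3.groupby(group_cols)[["Churn_Alive_1", "Churn_Alive_0"]].sum()
print(dfs)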
I just edited the question as maybe I didn't make myself clear.
I have two dataframes (MR and DT)
The column 'A' in dataframe DT is a subset of the column 'A' in dataframe MR; the two frames are only similar (not equal) in this ID column, and the rest of the columns differ, as does the number of rows.
How can I get the rows of MR whose 'ID' values are equal to values in DT['ID'], knowing that values in 'ID' can appear several times in the same column?
DT has 1538 rows and MR has 2060 rows.
I tried some of the lines proposed here: https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe, but I got bizarre results as I don't fully understand the methods proposed there (and my goal is a little different).
Thanks!
Take a look at the pandas.Series.isin() method. In your case you'd want to use something like:
matching_id = MR.ID.isin(DT.ID) # This returns a boolean Series of whether values match or not
# Now filter your dataframe to keep only matching rows
new_df = MR.loc[matching_id, :]
Or if you want to just get a new dataframe of combined records for the same ID you need to use merge():
new_df = pd.merge(MR, DT, on='ID')
This will create a new dataframe with columns from both original dfs but only where ID is the same.
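For illustration, here is a minimal runnable sketch of both approaches with made-up IDs and values (assumptions, not your data):

import pandas as pd

# Made-up data just to show the two approaches
MR = pd.DataFrame({"ID": [1, 2, 2, 3, 4], "mr_val": [10, 20, 21, 30, 40]})
DT = pd.DataFrame({"ID": [2, 4], "dt_val": ["x", "y"]})

# 1) Keep only the MR rows whose ID also appears in DT
matching_id = MR.ID.isin(DT.ID)
filtered = MR.loc[matching_id, :]        # rows with ID 2, 2 and 4

# 2) Combine columns from both frames where ID matches
combined = pd.merge(MR, DT, on="ID")     # mr_val and dt_val side by side

print(filtered)
print(combined)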
I have three DataFrames that I am trying to merge and output the result. The common column in each DataFrame that I am trying to merge on is COUNTRY.
Case1:
Before merging the three DataFrames I have set the index of each DataFrame to COUNTRY and did
pd.merge(leftdf,rightdf,left_index=True,right_index=True,how="inner")
I am getting the required answer. But when I do not set the index of each DataFrame to Country, leaving it as a column, and perform the merge
pd.merge(leftdf,rightdf,on="Country",how="inner")
the resultant DataFrame is reduced in size; I am losing some rows. Why is this happening? I do not understand.
I have multiple dataframes with a timeseries index in dfList (an example dataframe is shown below).
I tried to concatenate these dataframes into one dataframe with the following command:
db=pd.concat(dfList)
and I got the following dataframe.
The timeseries index contains duplicates (many index entries are 2012-10-12 20:00:00) since the timeseries in the base dataframes overlapped each other.
I want to remove these duplicates. Does anyone know how to do this?
Some example dataframes in which the timeseries indexes overlap are shown below.
Thank you!!
You can simply drop the duplicates by a particular column value as mentioned in the docs here. You may do something like this:
db = db.drop_duplicates(subset="Timestamp")
which will drop all rows with duplicates in the column "Timestamp" except the first occurrence.
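If the duplicated timestamps are actually in the index rather than in a "Timestamp" column (which your description suggests), a common alternative, shown here as a sketch with made-up values, is to filter on the index directly:

import pandas as pd

# Minimal illustration: a DatetimeIndex with a repeated entry
idx = pd.to_datetime(["2012-10-12 20:00:00", "2012-10-12 20:00:00", "2012-10-12 21:00:00"])
db = pd.DataFrame({"value": [1, 2, 3]}, index=idx)

# Keep only the first row for each duplicated index entry
db = db[~db.index.duplicated(keep="first")]
print(db)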
I have two DataFrames that I want to merge. I have read about merging on multiple columns, and preserving the index when merging. My problem needs to cater for both, and I am having difficulty figuring out the best way to do this.
The first DataFrame looks like this
and the second looks like this
I want to merge these based on the Date and the ID. In the first DataFrame the Date is the index and the ID is a column; in the second DataFrame both Date and ID are part of a MultiIndex.
Essentially, as a result I want a DataFrame that looks like DataFrame 2 with an additional column for the Events from DataFrame 1.
I'd suggest resetting the index (reset_index) and then merging the DataFrames, as you've read. Then you can set the index (set_index) to reproduce your desired MultiIndex.
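A minimal sketch of that sequence, with the column names Date, ID and Events assumed from your description and made-up values standing in for your two frames:

import pandas as pd

# Hypothetical stand-ins: df1 has a Date index with ID and Events columns,
# df2 has a (Date, ID) MultiIndex
df1 = pd.DataFrame(
    {"ID": [1, 2], "Events": [5, 7]},
    index=pd.to_datetime(["2020-01-01", "2020-01-02"]).rename("Date"),
)
df2 = pd.DataFrame(
    {"Value": [100, 200]},
    index=pd.MultiIndex.from_tuples(
        [(pd.Timestamp("2020-01-01"), 1), (pd.Timestamp("2020-01-02"), 2)],
        names=["Date", "ID"],
    ),
)

# Flatten the indexes, merge on both keys, then restore the MultiIndex
result = (
    df2.reset_index()
    .merge(df1.reset_index(), on=["Date", "ID"], how="left")
    .set_index(["Date", "ID"])
)
print(result)   # DataFrame 2 plus an Events column from DataFrame 1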