pandas not matching initial index when I try to join/merge/loc

I've got two pd.Series: one is short and contains datetimes, the other is long and contains datetimes with matching values.
I want to get a DataFrame indexed by the first series' datetimes, with the corresponding values from the second series. Both have some duplicates. I can create a new object by looping through the index, but there has to be a better way. I tried join, merge and loc, and each time the resulting DataFrame is longer than the first series of datetimes.

You can try the merge() method:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
It would help a lot if you could provide a snippet of the dataframes you are working with.
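In the meantime, here is a minimal sketch with assumed data (the names short_s and long_s and the values are made up) of how a left merge keeps one row per entry of the short series while pulling in matching values from the long one. Note that if the same datetime is duplicated on both sides, merge multiplies the matching rows, which is the usual reason the result comes out longer than the short series.

import pandas as pd

# hypothetical data: short_s holds the datetimes of interest,
# long_s maps datetimes to values (both may contain duplicates)
short_s = pd.Series(pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-02']))
long_s = pd.Series([10.0, 20.0, 30.0],
                   index=pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']))

# left merge: keeps the rows of the short series, attaches matching values
result = pd.merge(short_s.to_frame('datetime'),
                  long_s.rename('value').rename_axis('datetime').reset_index(),
                  on='datetime', how='left')
print(result)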

Related

Group By and ILOC Errors

I'm getting the following error when trying to group and sum a dataframe by specific columns.
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
I've checked other solutions and it's not a duplicated column name header issue.
See df3 below, which I want to group by on all columns except the last two, which I want to sum().
The head of dfs shows that if I just group by the column names it works fine, but not with iloc, which I understood to be the correct way to pull back the columns I want to group by.
I need to use iloc because the final dataframe will have many more columns.
df.iloc[:, 0:3] returns a dataframe, so you are trying to group a dataframe by another dataframe.
But you just need a list of column names.
Can you try this:
dfs = df3.groupby(list(df3.iloc[:, 0:3].columns))[['Churn_Alive_1', 'Churn_Alive_0']].sum()
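For reference, a self-contained sketch with a made-up df3 (the three key column names are assumptions; only Churn_Alive_1 and Churn_Alive_0 come from the question) showing why groupby needs the column labels rather than the iloc slice itself:

import pandas as pd

# hypothetical df3: three key columns followed by the two columns to sum
df3 = pd.DataFrame({
    'region':  ['N', 'N', 'S'],
    'segment': ['A', 'A', 'B'],
    'plan':    ['x', 'x', 'y'],
    'Churn_Alive_1': [1, 2, 3],
    'Churn_Alive_0': [4, 5, 6],
})

# df3.iloc[:, 0:3] is itself a DataFrame, which groupby cannot use as a grouper;
# its column labels are what groupby expects
group_cols = list(df3.iloc[:, 0:3].columns)   # same as list(df3.columns[0:3])
dfs = df3.groupby(group_cols)[['Churn_Alive_1', 'Churn_Alive_0']].sum()
print(dfs)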

Multiindex Filterings of grouped data

I have a pandas dataframe on which I have done a groupby. The groupby result looks like this:
As you can see, this dataframe has a multi-level index ('ga:dimension3', 'ga:date') and a single column ('ga:sessions').
I am looking to create a dataframe with the first level of the index ('ga:dimension3') and the first date for each first-level index value:
I can't figure out how to do this.
Guidance appreciated.
Thanks in advance.
Inspired by #ggaurav's suggestion to use first(), I think the following should do the job (df is the data you provided, after the groupby):
result = df.reset_index(1).groupby('ga:dimension3').first()
You can use first() directly. As you need data based on just 'ga:dimension3', you group by it (or by level=0):
df.groupby(level=0).first()
Without groupby, you can take the level-0 index values and drop the duplicated ones, keeping the first occurrence:
df[~df.index.get_level_values(0).duplicated(keep='first')]
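To make the difference between the approaches concrete, here is a small sketch with made-up data shaped like the grouped frame described above; groupby(level=0).first() drops the date level, while the duplicated() filter keeps the full MultiIndex:

import pandas as pd

# hypothetical frame shaped like the groupby output in the question
idx = pd.MultiIndex.from_tuples(
    [('alpha', '2021-01-01'), ('alpha', '2021-01-02'),
     ('beta',  '2021-01-03'), ('beta',  '2021-01-04')],
    names=['ga:dimension3', 'ga:date'])
df = pd.DataFrame({'ga:sessions': [5, 7, 2, 9]}, index=idx)

first_per_dim = df.groupby(level=0).first()                     # index: ga:dimension3 only
first_rows    = df[~df.index.get_level_values(0).duplicated()]  # keeps both index levels
print(first_per_dim)
print(first_rows)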

How can I set the index of a generated pandas Series to a column from a DataFrame?

In pandas this operation creates a Series:
q7.loc[:, list(q7)].max(axis=1) - q7.loc[:, list(q7)].min(axis=1)
I would like to be able to set its index to a list of values from a df column, i.e.
list(df['Colname'])
I've tried to create the series and then update it with the series generated from the first code snippet. I've also searched the docs and don't see a method that will let me do this. I would prefer not to iterate over it manually.
Help is appreciated.
You can simply store that series in a variable, say S, and set the index accordingly, as shown below:
S = (q7.loc[:, list(q7)].max(axis=1) - q7.loc[:, list(q7)].min(axis=1))
S.index = df['Colname']
The code assumes the lengths of the series and the column from the dataframe are equal. Hope this helps!
If you want to replace a series s's index, you can do:
s.index = new_index_list
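As a runnable illustration with made-up data (the question's q7.loc[:, list(q7)] selects every column, so plain q7 behaves the same here):

import pandas as pd

# hypothetical stand-ins for the question's q7 and df
q7 = pd.DataFrame({'a': [1, 4, 2], 'b': [3, 0, 5]})
df = pd.DataFrame({'Colname': ['x', 'y', 'z']})

# per-row range (max - min), then re-labelled with the column from df;
# the lengths must match, as noted above
S = q7.max(axis=1) - q7.min(axis=1)
S.index = df['Colname']          # or: S.index = list(df['Colname'])
print(S)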

Pandas concatenate all elements of dataframe into single series

There must be a simple answer to this, but for some reason I can't find it. Apologies if this is a duplicate question.
I have a dataframe with shape on the order of (1000,100). I want to concatenate ALL items in the dataframe into a single series (or list). Order doesn't matter (so it doesn't matter what axis to concatenate along). I don't want/need to keep any column names or indices. Dropping NaNs and duplicates is ok but not required.
What's the easiest way to do this?
The following will yield a 1-dimensional NumPy array with the lowest common dtype of all the elements:
df.values.ravel()
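If a pandas Series rather than a NumPy array is wanted, it can be wrapped directly; here is a short sketch with a small made-up frame, including the optional NaN/duplicate dropping mentioned in the question:

import numpy as np
import pandas as pd

# hypothetical frame; the real one is on the order of (1000, 100)
df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [3.0, 4.0, 1.0]})

flat = pd.Series(df.values.ravel())      # all elements, row by row
flat = flat.dropna().drop_duplicates()   # optional, per the question
print(flat)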

Pandas Join Two Series

I have two Series that I need to join in one DataFrame.
Each series has a date index and corresponding price.
When I use concat I get a DataFrame that has one index (good) but two columns that have the same values (bad).
zee_nbp = pd.concat([zee_da_df,nbp_da_df],axis=1)
The values are correct for zee_da_df but are duplicated for nbp_da_df. Any ideas? I have checked, and each series has different values before they are concatenated.
Thanks in advance
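No answer was posted in this thread, but as a point of comparison, here is a minimal sketch of the setup described, with made-up prices and the zee_da_df / nbp_da_df names from the question; with two independent, named series, concat along axis=1 aligns on the shared date index and produces two distinct columns:

import pandas as pd

# hypothetical day-ahead price series sharing a date index
dates = pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03'])
zee_da_df = pd.Series([50.0, 52.5, 49.0], index=dates, name='zee_da')
nbp_da_df = pd.Series([61.0, 60.5, 63.0], index=dates, name='nbp_da')

# axis=1 aligns on the index; the series names become the column names
zee_nbp = pd.concat([zee_da_df, nbp_da_df], axis=1)
print(zee_nbp)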
