Group By and ILOC Errors - python

I'm getting the following error when trying to group by and sum a dataframe by specific columns:
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
I've checked other solutions and it's not a duplicated column header issue.
See df3 below: I want to group by all columns except the last two, which I want to sum().
dfs.head() shows that if I just group by the column names it works fine, but not with iloc, which I understood to be the correct way to pull back the columns I want to group by.
I need to use iloc because the final dataframe will have many more columns.

df.iloc[:,0:3] returns a dataframe, so you are trying to group a dataframe with another dataframe.
But you just need a list of column names.
Can you try this:
dfs = df3.groupby(list(df3.iloc[:, 0:3].columns))[['Churn_Alive_1', 'Churn_Alive_0']].sum()
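For illustration, a minimal runnable sketch; the first three column names here are invented, and only the Churn_* columns come from the question:

import pandas as pd

df3 = pd.DataFrame({
    'Region':  ['EU', 'EU', 'US', 'US'],   # assumed column
    'Segment': ['A', 'A', 'B', 'B'],       # assumed column
    'Plan':    ['X', 'X', 'Y', 'Y'],       # assumed column
    'Churn_Alive_1': [1, 2, 3, 4],
    'Churn_Alive_0': [5, 6, 7, 8],
})

# Extract a plain list of labels (not a DataFrame) for the first three columns,
# then group by them and sum only the two churn columns.
group_cols = list(df3.iloc[:, 0:3].columns)
dfs = df3.groupby(group_cols)[['Churn_Alive_1', 'Churn_Alive_0']].sum()
print(dfs)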

Related

DataFrame Pandas condition over a column

I'm having difficulty applying a condition to a column in my DataFrame. I want to iterate over the column and extract only the values that start with the number 6; the values in that column are floats.
The column is called "Vendor".
This is my DataFrame, and I want to sum the values of the column "Amount in loc.curr.2", but only for rows where the value in column "Vendor" starts with 6.
This is what I've been trying.
idx = df_spend['Vendor'].apply(lambda x: str(x).startswith('6'))
This should create a Boolean pandas.Series that you can use as an index.
summed_col = df_spend.loc[idx, "Amount in loc.curr.2"].sum()
summed_col contains the sum of the filtered column.
Definitely take a look at the pandas documentation for the apply function: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
Hope this works! :)
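As a small worked example of the approach above, with invented data (the float vendor codes and amounts are assumptions):

import pandas as pd

df_spend = pd.DataFrame({
    'Vendor': [6001.0, 7002.0, 6005.0],
    'Amount in loc.curr.2': [100.0, 50.0, 25.0],
})

# Boolean mask: True where the vendor code, rendered as a string, starts with '6'.
idx = df_spend['Vendor'].apply(lambda x: str(x).startswith('6'))

# Filter the amounts with the mask and reduce with .sum().
summed_col = df_spend.loc[idx, 'Amount in loc.curr.2'].sum()
print(summed_col)  # 125.0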

How to get rows from one dataframe based on another dataframe

I just edited the question, as maybe I didn't make myself clear.
I have two dataframes (MR and DT).
The column 'A' in dataframe DT is a subset of the column 'A' in dataframe MR; they are only similar (not equal) in this ID column, and the rest of the columns are different, as is the number of rows.
How can I get the rows from dataframe MR['ID'] that are equal to the dataframe DT['ID']? Note that values in 'ID' can appear several times in the same column.
(DT is 1538 rows and MR is 2060 rows.)
I tried some lines proposed here: https://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe but I got bizarre results, as I don't fully understand the methods proposed there (and my goal is a little different).
Thanks!
Take a look at pandas.Series.isin() method. In your case you'd want to use something like:
matching_id = MR.ID.isin(DT.ID) # This returns a boolean Series of whether values match or not
# Now filter your dataframe to keep only matching rows
new_df = MR.loc[matching_id, :]
Or, if you want to get a new dataframe of combined records for the same ID, use merge():
new_df = pd.merge(MR, DT, on='ID')
This will create a new dataframe with columns from both original dfs but only where ID is the same.
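A minimal sketch of both approaches side by side; the toy frames and column names below are invented:

import pandas as pd

MR = pd.DataFrame({'ID': [1, 2, 2, 3], 'mr_col': ['a', 'b', 'c', 'd']})
DT = pd.DataFrame({'ID': [2, 3],       'dt_col': ['x', 'y']})

# Filtering: keep every MR row whose ID also appears in DT (duplicates kept).
matching_id = MR.ID.isin(DT.ID)
filtered = MR.loc[matching_id, :]

# Merging: one row per matching ID pair, with columns from both frames.
combined = pd.merge(MR, DT, on='ID')

print(filtered)
print(combined)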

Apply functions to nested list inside pandas rows

I have the following df, and I'm trying to figure out how to extract the unique values from each list in each row in order to simplify my df.
For example, applying unique() to the first row should return 'NEUTRALREGION' only once. Please note that I have another 4 columns with the same requirements.
I solved this using df.applymap(lambda x: set(x)).
That allowed me to check the unique values in each cell.
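A minimal sketch of that self-answer; the column name and list values are invented:

import pandas as pd

df = pd.DataFrame({'region': [['NEUTRALREGION', 'NEUTRALREGION'],
                              ['NORTH', 'SOUTH', 'NORTH']]})

# set() collapses each nested list to its unique values, one cell at a time.
unique_df = df.applymap(lambda x: set(x))
print(unique_df)
# Note: on pandas >= 2.1 the same idea is spelled df.map(lambda x: set(x)).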

accessing specific columns of a dataframe, index specified by idxmax()

I have a dataframe row that I would like to access specific columns of. The index for this row is specified by an idxmax command.
idx_interest=(df['colA']==matchingstring).idxmax()
Using this index, I want to access specific columns, namely colB and colD, of the df at index=idx_interest:
df.loc[idx_interest,['colB','colD']].reset_index().values.tolist()
However, doing so gave me the error: cannot perform reduce on flexible type. How do I go about accessing columns of a df at an index given by an idxmax command?
You need to first apply your filter to your dataframe df correctly in order to return idx_interest, e.g. by filtering and taking the index of the first matching row. If your dataframe has a MultiIndex, be mindful that this will return a tuple:
idx_interest = df[df['colA'] == matchingstring].index[0]
Now that you have idx_interest, you can limit your dataframe to the columns you want and then use .iloc to select the row:
df[['colB','colD']].iloc[idx_interest].values.tolist()
The code you provided above will also work, assuming that idx_interest is an int:
df.loc[idx_interest,['colB','colD']].reset_index().values.tolist()
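A minimal end-to-end sketch of the idxmax-based lookup; the data and matchingstring are invented, and this uses the questioner's original mask-based idxmax:

import pandas as pd

df = pd.DataFrame({'colA': ['x', 'match', 'z'],
                   'colB': [10, 20, 30],
                   'colD': [1.0, 2.0, 3.0]})
matchingstring = 'match'

# idxmax on the boolean mask returns the label of the first True row.
idx_interest = (df['colA'] == matchingstring).idxmax()

# Select just the wanted columns at that row and convert to a plain list.
row_values = df.loc[idx_interest, ['colB', 'colD']].tolist()
print(row_values)  # [20.0, 2.0]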

Pandas Join Two Series

I have two Series that I need to join in one DataFrame.
Each series has a date index and corresponding price.
When I use concat I get a DataFrame that has one index (good) but two columns that have the same values (bad).
zee_nbp = pd.concat([zee_da_df,nbp_da_df],axis=1)
The values are correct for zee_da_df but are duplicated for nbp_da_df. Any ideas? I have checked, and each series has different values before they are concatenated.
Thanks in advance
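For reference, a minimal sketch of the concat call described, with invented dates and prices; giving each Series a distinct name (or passing keys=) is one way to keep the two columns distinguishable:

import pandas as pd

dates = pd.date_range('2021-01-01', periods=3, freq='D')
zee_da_df = pd.Series([40.0, 41.5, 39.8], index=dates, name='zee_da')  # assumed data
nbp_da_df = pd.Series([50.1, 52.0, 49.9], index=dates, name='nbp_da')  # assumed data

# One shared date index, two distinct price columns.
zee_nbp = pd.concat([zee_da_df, nbp_da_df], axis=1)
print(zee_nbp)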
