This question already has answers here:
How to access pandas groupby dataframe by key
(6 answers)
Closed 9 years ago.
I have grouped the data in a dataframe as follows:
groups = df.groupby(['name'])
Now I can get the head of the groups with groups.head(2), which gives the first two rows of each group.
But how do I get a single group by its name? For example, if I want the group whose name is 'ruby', I can't just do groups['ruby'].
How about:
groups.get_group('ruby')
For more elaboration, see this related question
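To make the get_group call concrete, here is a minimal sketch. The sample rows and the 'value' column are made up for illustration; only the 'name' column and the 'ruby' key come from the question.

```python
import pandas as pd

# Hypothetical data: only the 'name' column is taken from the question.
df = pd.DataFrame({
    "name": ["ruby", "ruby", "opal", "jade", "opal"],
    "value": [1, 2, 3, 4, 5],
})

# Group by a single column label (a plain string, not a list).
groups = df.groupby("name")

# get_group returns the rows of one group as an ordinary DataFrame.
ruby = groups.get_group("ruby")
print(ruby)

# groups.groups maps each group key to the row labels in that group,
# which is handy for checking which keys exist before calling get_group.
available_keys = sorted(groups.groups)
print(available_keys)
```

Note that this sketch groups by the string 'name' rather than the list ['name'] used in the question. When the key is a one-element list, recent pandas versions may expect the group key as a one-element tuple such as ('ruby',).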
This question already has answers here:
Python: get a frequency count based on two columns (variables) in pandas dataframe some row appears
(3 answers)
Closed last year.
I'm working on the following dataset:
and I want to count each value in the LearnCode column for each Age category. I tried using the groupby method but couldn't get it to work correctly. Can anyone help with how to do it?
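You can do this using a groupby on two columns:
results = df.groupby(by=['Age', 'LearnCode']).count()
This outputs, for each ['Age', 'LearnCode'] pair, the number of non-null values in every other column. If you only need a single row count per pair, call .size() instead of .count().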
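As a rough sketch of counting rows per pair, the example below uses size() and, as an alternative view, pd.crosstab. The sample values are invented; only the 'Age' and 'LearnCode' column names come from the question.

```python
import pandas as pd

# Hypothetical survey rows: the column names follow the question,
# the values are made up for illustration.
df = pd.DataFrame({
    "Age": ["18-24", "18-24", "18-24", "25-34", "25-34"],
    "LearnCode": ["Books", "Online", "Books", "Online", "Online"],
})

# size() counts the rows in each (Age, LearnCode) group,
# independent of any other columns in the frame.
counts = df.groupby(["Age", "LearnCode"]).size().reset_index(name="count")
print(counts)

# crosstab gives the same counts in wide form:
# one row per Age, one column per LearnCode value, 0 where a pair is absent.
table = pd.crosstab(df["Age"], df["LearnCode"])
print(table)
```

The long form (counts) is easier to filter or merge, while the crosstab is easier to read at a glance.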
This question already has answers here:
Filter pandas DataFrame by substring criteria
(17 answers)
Closed 1 year ago.
df.loc[df['name'] == 'Mary']
The above gets the rows where 'name' is exactly 'Mary'. What if I want the rows where 'name' contains 'Mary', rather than being exactly equal to it?
You can use the pd.Series.str.contains() method to achieve this.
df[df['name'].str.contains('Mary')]
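Here is a small sketch of substring filtering that also handles missing values. The sample names are made up; the na, regex and case arguments are standard parameters of str.contains.

```python
import pandas as pd

# Hypothetical names, including a missing value.
df = pd.DataFrame({"name": ["Mary", "Mary Ann", "Rosemary", "John", None]})

# na=False treats missing names as non-matches, so the mask is purely boolean.
# regex=False matches 'Mary' as literal text rather than as a regular expression.
contains_mary = df[df["name"].str.contains("Mary", na=False, regex=False)]
print(contains_mary)

# case=False makes the match case-insensitive, so 'Rosemary' matches as well.
any_case = df[df["name"].str.contains("mary", case=False, na=False, regex=False)]
print(any_case)
```

Without na=False, a missing value in the column produces a missing entry in the mask, which boolean indexing may reject.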
This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
Closed 2 years ago.
I have a dataframe like the one below with 3 types of status: 'complete', 'start' and 'fail'. I want to create another dataframe from it, keeping only the 'fail' status entries with their corresponding level number.
Let's do this:
fail_df = df[df['status']=='fail']
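or this with str.contains:
fail_df = df[df['status'].str.contains(r'fail', case=False)]
Both ways give a new dataframe containing only the 'fail' rows. However, the str.contains version with case=False also matches values that differ in case or contain extra characters (for example 'Fail' or 'failed'), so it tolerates inconsistent data but can also match more than you intended.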
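To show how the two filters differ, here is a minimal sketch on made-up data. The 'level' column name and the messy status values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: 'level' is an assumed column name, and the status
# values deliberately include case and spelling variations.
df = pd.DataFrame({
    "level": [1, 1, 2, 2, 3],
    "status": ["start", "fail", "complete", "Fail", "failed"],
})

# Exact comparison: keeps only rows whose status is exactly 'fail'.
exact = df[df["status"] == "fail"]

# Substring match, ignoring case: also keeps 'Fail' and 'failed'.
loose = df[df["status"].str.contains("fail", case=False)]

# A middle ground: normalize whitespace and case, then compare exactly.
normalized = df[df["status"].str.strip().str.lower() == "fail"]

print(exact, loose, normalized, sep="\n\n")
```

Which one is appropriate depends on how clean the status column is; the normalized comparison avoids picking up unrelated values that merely contain 'fail'.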
This question already has answers here:
How do I Pandas group-by to get sum?
(11 answers)
Closed 2 years ago.
DataFrame in question
I need to find the total invoice value for each supplier and create a new dataframe with the unique supplier names, as follows.
Final Output desired
Try this:
sum_by_supplier = df.groupby('Supplier Name')['Invoice Value'].sum().reset_index()
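A runnable sketch of the same idea, using invented supplier names and amounts. It uses as_index=False, which is an alternative to calling reset_index() afterwards.

```python
import pandas as pd

# Hypothetical invoices: the column names follow the question,
# the suppliers and amounts are made up.
df = pd.DataFrame({
    "Supplier Name": ["Acme", "Bolt", "Acme", "Core", "Bolt"],
    "Invoice Value": [100.0, 250.0, 50.0, 75.0, 25.0],
})

# as_index=False keeps 'Supplier Name' as a regular column,
# so the result is a DataFrame with one row per supplier.
sum_by_supplier = df.groupby("Supplier Name", as_index=False)["Invoice Value"].sum()
print(sum_by_supplier)
```

The result has one row per unique supplier, sorted by supplier name, with the summed invoice value alongside it.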
This question already has answers here:
Pandas get topmost n records within each group
(6 answers)
Select top n columns based on another column
(1 answer)
Pandas GroupBy : How to get top n values based on a column
(2 answers)
Closed 2 years ago.
I am stuck on a requirement: for each unique col_name in the dataframe, pick the top 5 rows sorted in descending order by the var_count column.
I would like the output to be sorted in descending order of var_count, as in the attached screenshot.
Could someone please help?
My code:
df_ans
#df_ans.groupby('col_name')['var_name'].nlargest(5)
#top_col_values = pd.DataFrame(df_ans.groupby('col_name')['var_count'].nlargest(5))
top_col_values = df_ans.groupby('col_name').head(5)
top_col_values
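One possible approach, sketched here on made-up data, is to sort first and then take the head of each group. groupby(...).head(5) on its own returns the first 5 rows of each group in their existing order, so the sort has to happen before it. The sample values and the variable n are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: the column names follow the question, the values are invented.
df_ans = pd.DataFrame({
    "col_name": ["a", "a", "a", "b", "b", "b"],
    "var_name": ["x1", "x2", "x3", "y1", "y2", "y3"],
    "var_count": [5, 9, 7, 2, 8, 4],
})

n = 2  # small value for this tiny example; use 5 for the question's case

top_col_values = (
    df_ans
    # Order by col_name, then by var_count from largest to smallest.
    .sort_values(["col_name", "var_count"], ascending=[True, False])
    # head(n) keeps the first n rows of each group, preserving the sorted order.
    .groupby("col_name")
    .head(n)
    .reset_index(drop=True)
)
print(top_col_values)
```

Because head keeps rows in their current order, the output stays sorted by var_count in descending order within each col_name.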