I have a dataframe that looks like this:
[1]: https://i.stack.imgur.com/KnHba.png
Essentially, there is a distributor name column, a sales column, and a MM/DD/YYYY Date column.
For each distributor, by month, I want the sum of sales.
What I tried:
df = df.groupby(df['Distributor Name'],df.Date.dt.month)['Sales'].sum()
This throws an error: "unhashable type: Series". It works when I remove Distributor Name, but I don't want just the overall monthly sales; I want the monthly sales BY distributor.
Thanks in advance!!
Joel
The correct way to group by multiple columns is to put them in a list as the first argument:
result = df.groupby(['Distributor Name', df['Date'].dt.month])['Sales'].sum()
This creates a pandas Series with a MultiIndex, with Distributor Name and the month as index levels. If you wish to create a dataframe with three columns (Distributor Name, Date, Sales), you can reset the index of this Series:
result = result.reset_index()
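A minimal, self-contained sketch with made-up data (the column names are taken from the question, the values are invented):

```python
import pandas as pd

# Hypothetical sample data mirroring the question's columns
df = pd.DataFrame({
    'Distributor Name': ['Acme', 'Acme', 'Beta', 'Beta'],
    'Date': pd.to_datetime(
        ['01/05/2021', '01/20/2021', '01/10/2021', '02/01/2021'],
        format='%m/%d/%Y'),
    'Sales': [100, 50, 30, 70],
})

# Group by distributor and calendar month, summing sales
result = df.groupby(['Distributor Name', df['Date'].dt.month])['Sales'].sum()

# Flatten the MultiIndex into ordinary columns
flat = result.reset_index()
print(flat)
```

Note that mixing a column label (`'Distributor Name'`) and a Series (`df['Date'].dt.month`) in the same list is allowed, which is what makes this pattern work.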
Please, I need help; I recently started learning Python. How do I merge rows with the same “PatientID” and the same “Resource” into one row, with “StartDate” and “EndDate” as the average of the merged rows?
Given df is the name for the pandas.DataFrame containing your data.
To get the earliest StartDate and EndDate of each patient's resource (swap .min() for .mean() below if you want the average instead, as the question asks), you can write:
# Group by the 'PatientID' and 'Resource' columns
grouped_df = df.groupby(['PatientID', 'Resource'])
# Select the earliest `StartDate` and `EndDate` per group.
grouped_df = grouped_df[['StartDate', 'EndDate']].min()
# Remove levels from the index.
grouped_df.reset_index(inplace=True)
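A small sketch with made-up data (column names taken from the question, values invented; both datetime columns must already be parsed as datetimes):

```python
import pandas as pd

# Hypothetical data with the question's column names
df = pd.DataFrame({
    'PatientID': [1, 1, 2],
    'Resource': ['Bed', 'Bed', 'Bed'],
    'StartDate': pd.to_datetime(['2021-01-01', '2021-01-03', '2021-02-01']),
    'EndDate': pd.to_datetime(['2021-01-05', '2021-01-07', '2021-02-03']),
})

# One row per (PatientID, Resource), keeping the earliest dates
grouped_df = (df.groupby(['PatientID', 'Resource'])[['StartDate', 'EndDate']]
                .min()
                .reset_index())
print(grouped_df)
```

Replacing `.min()` with `.mean()` also works on datetime columns and gives the average instead of the earliest date.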
Hello, a Python newbie here.
I have a dataframe that shows the product and how much they sold on each date
I need to change this dataframe to show the aggregate amount of units sold.
This is just an example dataframe and the actual dataframe that I am dealing with contains hundreds of products and 3 years worth of sales data.
I would appreciate an efficient way to do this.
Thank you in advance!!
If product is a column, use DataFrame.set_index with DataFrame.cumsum to compute the cumulative sum across the date columns:
df1 = df.set_index('product').cumsum(axis=1)
If product is already the index:
df1 = df.cumsum(axis=1)
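A toy sketch of the wide layout assumed here (one row per product, one column per date; the data is invented):

```python
import pandas as pd

# Hypothetical wide frame: one row per product, one column per date
df = pd.DataFrame({
    'product': ['A', 'B'],
    '2021-01-01': [1, 4],
    '2021-01-02': [2, 5],
    '2021-01-03': [3, 6],
})

# Running total of units sold, accumulated left-to-right across the dates
df1 = df.set_index('product').cumsum(axis=1)
print(df1)
```

Because `cumsum(axis=1)` sums across columns, `product` has to be moved into the index first so only the numeric date columns are accumulated.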
I have a dataframe of Covid-19 deaths by country. Countries are identified in the Country column. Sub-national classification is based on the Province column.
I want to generate a dataframe which sums all columns based on the value in the Country column (except the first 2, which are geographical data). In short, for each date, I want to compress the observations for all provinces of a country such that I get a single number for each country.
Right now, I am able to do that for a single date:
import pandas as pd
url = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
       'csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
raw = pd.read_csv(url)
del raw['Lat']
del raw['Long']
raw.rename({'Country/Region': 'Country', 'Province/State': 'Province'}, axis=1, inplace=True)
raw2 = raw.groupby('Country')['6/29/20'].sum()
How can I achieve this for all dates?
You can use iloc to select all the date columns at once. Since you deleted Lat and Long, only Province and Country remain before the dates, so the date columns start at position 2:
raw2 = raw.iloc[:, 2:].groupby(raw.Country).sum()
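A toy sketch of this column-slice-then-groupby pattern (invented data; as in the cleaned frame above, only Province and Country precede the date columns, so the dates start at position 2):

```python
import pandas as pd

# Hypothetical frame mimicking the cleaned layout: Province, Country, then dates
raw = pd.DataFrame({
    'Province': ['A', 'B', None],
    'Country': ['X', 'X', 'Y'],
    '6/28/20': [1, 2, 3],
    '6/29/20': [4, 5, 6],
})

# Slice off the two geographic columns, then sum every date column per country
raw2 = raw.iloc[:, 2:].groupby(raw.Country).sum()
print(raw2)
```

Grouping a column slice by a Series from the original frame works because pandas aligns the grouping Series with the slice by row index.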
This is my first question here. I have COVID data in Python: the distribution of COVID cases across several provinces in each country.
What should I do if I want each country to have only one row (per date), and drop the province column?
Thank you
You need to group by date and country and sum the cases; selecting only the cases column before aggregating keeps the province column out of the result automatically:
e.g. if you have a dataframe df, with columns date, country, province, and cases, then you should do:
grouped_df = df.groupby(['date', 'country'], as_index=False)['cases'].sum()
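A minimal sketch with made-up data (the column names date, country, province, and cases are assumed from the question). Selecting just `cases` before summing avoids trying to aggregate the text `province` column:

```python
import pandas as pd

# Hypothetical long-format data matching the described columns
df = pd.DataFrame({
    'date': ['2020-06-01', '2020-06-01', '2020-06-01'],
    'country': ['X', 'X', 'Y'],
    'province': ['A', 'B', None],
    'cases': [10, 20, 5],
})

# One row per (date, country); province disappears because it isn't selected
grouped_df = df.groupby(['date', 'country'], as_index=False)['cases'].sum()
print(grouped_df)
```

`as_index=False` keeps date and country as ordinary columns instead of a MultiIndex, so no reset_index is needed afterwards.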
I have a DataFrame with a date_time column. The date_time column contains a date and time. I also managed to convert the column to a datetime object.
I want to create a new DataFrame containing all the rows of a specific DAY.
I managed to do it when I set the date column as the index and used the "loc" method.
Is there a way to do it even if the date column is not set as the index? I only found a method which returns the rows between two days.
You can use the groupby() function. Since the date_time column also contains times, group by the calendar date rather than the full timestamp. Let's say your dataframe is df:
df_group = df.groupby(df['date_time'].dt.date)
Now you can access the rows of any single day by passing that day (as a datetime.date) to the get_group function:
df_group.get_group(date_here)
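A small sketch with made-up data (the column name date_time comes from the question, the values are invented), showing both the groupby approach and a plain boolean mask, which also answers this without touching the index:

```python
import pandas as pd
import datetime

# Hypothetical frame with a datetime column holding dates and times
df = pd.DataFrame({
    'date_time': pd.to_datetime(
        ['2021-03-01 08:00', '2021-03-01 17:30', '2021-03-02 09:15']),
    'value': [1, 2, 3],
})

# Option 1: boolean mask on the normalized (midnight) timestamp
day = pd.Timestamp('2021-03-01')
same_day = df[df['date_time'].dt.normalize() == day]

# Option 2: group by the calendar date and pull out one group
by_day = df.groupby(df['date_time'].dt.date)
also_same_day = by_day.get_group(datetime.date(2021, 3, 1))
print(same_day)
```

The mask version is usually simpler when you only need one day; the groupby version is handy when you want to iterate over every day in turn.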