Hello, a Python newbie here.
I have a dataframe that shows each product and how many units it sold on each date.
I need to transform this dataframe to show the cumulative number of units sold.
This is just an example dataframe; the actual one I am dealing with contains hundreds of products and three years' worth of sales data, so an efficient way to do this would be appreciated.
Thank you in advance!!
If product is a column, use DataFrame.set_index with DataFrame.cumsum for the cumulative sum:
df1 = df.set_index('product').cumsum(axis=1)
If product is the index:
df1 = df.cumsum(axis=1)
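A minimal runnable sketch of the first variant, with made-up data (one row per product, one column per date):

```python
import pandas as pd

# Made-up wide-format sales data: one row per product, one column per date.
df = pd.DataFrame({
    'product': ['A', 'B'],
    '2021-01-01': [3, 1],
    '2021-01-02': [2, 4],
    '2021-01-03': [5, 0],
})

# Move product into the index, then sum left-to-right across the date columns.
df1 = df.set_index('product').cumsum(axis=1)
print(df1)
```

With `axis=1`, each cell becomes the running total of that product's sales up to and including that date.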
See the following data snippet:
The columns from the right are date columns ranging from August 2001 back to August '97.
What I would like to do is merge all these columns together into one 'Date' column. For further context, the columns are of equal length.
If all you need is the dates, how much was purchased, and the id of the material, you could drop the columns that aren't dates (i.e. Del. time - Total) and transpose your dataset.
In pandas
dataframe = dataframe.T
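A sketch of the drop-and-transpose idea on invented data (the column names here, like `Del. time`, stand in for whatever non-date columns your snippet has):

```python
import pandas as pd

# Invented data: an id column, a non-date column to drop, and two date columns.
df = pd.DataFrame({
    'id': [101, 102],
    'Del. time': [2, 3],
    'Aug 2001': [5, 7],
    'Sep 2001': [6, 8],
})

# Drop the non-date columns, key the rows by id, then transpose:
# the date columns become the row index, with one column per material id.
long_df = df.drop(columns=['Del. time']).set_index('id').T
print(long_df)
```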
I have a dataframe that looks like this:
Input dataframe
I want to find the contribution of each category to the Price(USD) column by day. So far I've tried aggregating by Timestamp and Category, with the sum of Price(USD):
df3 = df.groupby(["Timestamp", "Category"]).sum()
Obtaining the following dataset:
Dataset grouped by Timestamp and Category
After this point, I haven't been able to apply a function that divides each Price(USD) value by the sum over all categories for that day and stores the result in a new column.
Ideally, a new column "Percentage" would contain :
Percentage
0.3/(0.3+0.2+0.1)
0.2/(0.3+0.2+0.1)
0.1/(0.3+0.2+0.1)
With the same pattern for the rest of the dataframe.
Thank you
Seems like you need
>>> df.groupby(["Timestamp", "Category"]).sum() / df.groupby(["Timestamp"]).sum()
Here is another way to go about it:
df.groupby(['Timestamp','Category'])['Price(USD)'].transform('sum') / df.groupby(['Timestamp'])['Price(USD)'].transform('sum')
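A runnable sketch of the transform approach on made-up data (column names assumed from the question). `transform('sum')` broadcasts each day's total back onto its rows, so the ratio lines up with the original rows and can be stored as a new Percentage column:

```python
import pandas as pd

# Made-up prices per day and category.
df = pd.DataFrame({
    'Timestamp': ['2021-01-01'] * 3 + ['2021-01-02'] * 3,
    'Category': ['X', 'Y', 'Z'] * 2,
    'Price(USD)': [0.3, 0.2, 0.1, 0.2, 0.2, 0.1],
})

# Each day's total, repeated on every row belonging to that day.
day_total = df.groupby('Timestamp')['Price(USD)'].transform('sum')
df['Percentage'] = df['Price(USD)'] / day_total
print(df)
```

The first three rows come out as 0.3/0.6, 0.2/0.6 and 0.1/0.6, matching the pattern in the question.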
Within one dataframe, I am trying to concatenate rows which have the same price, customer, etc. The only variable that changes is Month.
Would the best solution be to split the dataframe into two and then use a merge function?
I have a list of company names, dates, and pe ratios.
I need to find the average of the previous 10 years of data as of a given date, considering only month-end dates.
For example, to find the average as of 31 Dec 2015, I first need the data for all previous month-ends from 31/12/2005 to 31/12/2015, and then their average.
sample data I have
required output:
required output
Here is what I have done so far:
df = pd.read_csv('daily_valuation_ratios_cc.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
columns = ['pe', 'price_bv', 'mcap_ns', 'ev_ebidta']
df_mean = df.groupby('Company Name')[columns].resample('M').mean()
But this method computes a monthly mean of the daily values, unlike my sample output.
I am new to pandas, please help.
Edit:
df3 = df.groupby(['Company Name','year','month'])
df3.first()
This code works; now I just have one problem: exporting the dataframe with to_csv. Please help.
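The grouped result keeps the group keys in a MultiIndex; calling reset_index() before to_csv writes them back out as ordinary columns. A minimal sketch with invented data (writing to an in-memory buffer only for demonstration; passing a filename to to_csv works the same way):

```python
import pandas as pd
from io import StringIO

# Invented daily data with pre-computed year/month columns.
df = pd.DataFrame({
    'Company Name': ['Acme'] * 4,
    'year': [2015] * 4,
    'month': [11, 11, 12, 12],
    'pe': [10.0, 11.0, 12.0, 13.0],
})

# One row per (company, year, month), as in the edit above.
df3 = df.groupby(['Company Name', 'year', 'month']).first()

# reset_index turns the group keys back into columns before exporting.
buf = StringIO()
df3.reset_index().to_csv(buf, index=False)
print(buf.getvalue())
```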
A dataframe has a method called groupby that groups the rows by a column, after which the groups can be aggregated.
So if you were to run data.groupby('pe'), you would get a grouped object keyed by that column.
Now if you were to tack on .describe(), you would get the standard deviation/mean/min/etc.
Example:
data.groupby('pe').describe()
Edit: You can also use built-in aggregate functions such as .max()/.mean()/etc. with groupby().
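A small sketch of both forms, with made-up numbers:

```python
import pandas as pd

# Made-up data: a grouping key 'pe' and one numeric column to summarize.
df = pd.DataFrame({'pe': [10, 10, 12], 'price': [1.0, 3.0, 5.0]})

# describe() per group yields count, mean, std, min, quartiles and max.
stats = df.groupby('pe').describe()
print(stats)

# Or a single built-in aggregate instead of the full summary:
means = df.groupby('pe')['price'].mean()
print(means)
```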
I have two dataframes with different index lengths to compare.
df1: daily low stock prices (this has one low price per day)
df2: daily purchases of stock (this has more than one buy per day)
I'd like to iterate through the rows of df2, checking for each date whether df2[Price] > df1[low], and adding YES in df2[In_range] for that row if it is, and NO if it's not.
I've included a screenshot of the tables and a simple diagram with the description so you can see.
Picture of tables with simple diagram
If you need more clarification please let me know :)
Thanks,
Elliot
The best way would be to add the column 'low' to the second dataframe.
df2 = df2.merge(df1, on = ['Company', 'time'])
And then performing the check is simple within a single dataframe:
df2['In range'] = df2['Price'] >= df2['low']
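A minimal runnable sketch with invented numbers (column names taken from the answer above); mapping the boolean result gives the YES/NO labels from the question:

```python
import pandas as pd

# Invented data: one low per company/day, and several purchases per day.
df1 = pd.DataFrame({
    'Company': ['ACME', 'ACME'],
    'time': ['2021-01-04', '2021-01-05'],
    'low': [9.5, 10.0],
})
df2 = pd.DataFrame({
    'Company': ['ACME', 'ACME', 'ACME'],
    'time': ['2021-01-04', '2021-01-04', '2021-01-05'],
    'Price': [10.0, 9.0, 10.5],
})

# Bring each day's low onto every purchase row, then compare column-wise.
df2 = df2.merge(df1, on=['Company', 'time'])
df2['In_range'] = (df2['Price'] >= df2['low']).map({True: 'YES', False: 'NO'})
print(df2)
```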