Hello, a Python newbie here.
I have a dataframe that shows each product and how many units it sold on each date.
I need to transform this dataframe to show the cumulative number of units sold.
This is just an example dataframe; the actual one I am dealing with contains hundreds of products and three years' worth of sales data, so an efficient way to do this would be appreciated.
Thank you in advance!!
If product is a column, use DataFrame.set_index with DataFrame.cumsum for the cumulative sum:
df1 = df.set_index('product').cumsum(axis=1)
If product is the index:
df1 = df.cumsum(axis=1)
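A minimal runnable sketch of the first variant, with made-up data (one row per product, one column per date):

```python
import pandas as pd

# Made-up wide-format sales data: one row per product, one column per date.
df = pd.DataFrame({
    'product': ['A', 'B'],
    '2021-01-01': [3, 1],
    '2021-01-02': [2, 4],
    '2021-01-03': [5, 0],
})

# Move product into the index, then sum left-to-right across the date columns.
df1 = df.set_index('product').cumsum(axis=1)
print(df1)
```

With `axis=1`, each cell becomes the running total of that product's sales up to and including that date.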
See the following data snippet:
The columns from the right are date columns ranging from August 2001 back to August '97.
What I would like to do is merge all these columns together into one 'Date' column. For further context, the columns are of equal length.
If all you need is the dates, how much was purchased, and the id of the material, you could drop the columns that aren't dates (i.e. Del. time - Total) and transpose your dataset.
In pandas
dataframe = dataframe.T
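A sketch of the drop-and-transpose idea on invented data (the column names here, like `Del. time`, stand in for whatever non-date columns your snippet has):

```python
import pandas as pd

# Invented data: an id column, a non-date column to drop, and two date columns.
df = pd.DataFrame({
    'id': [101, 102],
    'Del. time': [2, 3],
    'Aug 2001': [5, 7],
    'Sep 2001': [6, 8],
})

# Drop the non-date columns, key the rows by id, then transpose:
# the date columns become the row index, with one column per material id.
long_df = df.drop(columns=['Del. time']).set_index('id').T
print(long_df)
```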
I have a dataframe that looks like this:
Input dataframe
I want to find the contribution of each category to the Price(USD) column by day. So far I've tried aggregating by Timestamp and Category, with the sum of Price(USD):
df3 = df.groupby(["Timestamp", "Category"]).sum()
Obtaining the following dataset:
Dataset grouped by Timestamp and Category
After this point, I haven't been able to apply a function that divides each Price(USD) value by the sum over all categories for that day and stores the result in a new column.
Ideally, a new column "Percentage" would contain :
Percentage
0.3/(0.3+0.2+0.1)
0.2/(0.3+0.2+0.1)
0.1/(0.3+0.2+0.1)
With the same pattern for the rest of the dataframe.
Thank you
Seems like you need
>>> df.groupby(["Timestamp", "Category"]).sum() / df.groupby(["Timestamp"]).sum()
Here is another way to go about it:
df.groupby(['Timestamp','Category'])['Price(USD)'].transform('sum') / df.groupby(['Timestamp'])['Price(USD)'].transform('sum')
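A runnable sketch of the transform approach on made-up data (column names assumed from the question). `transform('sum')` broadcasts each day's total back onto its rows, so the ratio lines up with the original rows and can be stored as a new Percentage column:

```python
import pandas as pd

# Made-up prices per day and category.
df = pd.DataFrame({
    'Timestamp': ['2021-01-01'] * 3 + ['2021-01-02'] * 3,
    'Category': ['X', 'Y', 'Z'] * 2,
    'Price(USD)': [0.3, 0.2, 0.1, 0.2, 0.2, 0.1],
})

# Each day's total, repeated on every row belonging to that day.
day_total = df.groupby('Timestamp')['Price(USD)'].transform('sum')
df['Percentage'] = df['Price(USD)'] / day_total
print(df)
```

The first three rows come out as 0.3/0.6, 0.2/0.6 and 0.1/0.6, matching the pattern in the question.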
Within one dataframe, I am trying to concatenate rows which have the same price, customer, etc. The only variable that changes is Month.
Would the best solution be to split the dataframe into two and then use a merge function?
I have a list of company names, dates, and pe ratios.
I need to find the average of the previous 10 years of data as of a given date, considering only month-end dates.
For example, to find the average as of 31 Dec 2015, I first need the data for all previous month-ends from 31/12/2005 to 31/12/2015, and then their average.
sample data I have
required output:
required output
Here is what I have done so far:
df = pd.read_csv('daily_valuation_ratios_cc.csv')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
columns = ['pe', 'price_bv', 'mcap_ns', 'ev_ebidta']
df_mean = df.groupby('Company Name')[columns].resample('M').mean()
But this method computes a monthly mean of the daily values, unlike my sample output.
I am new to pandas, please help.
Edit:
df3 = df.groupby(['Company Name','year','month'])
df3.first()
This code works; now I just have one problem: exporting the dataframe with to_csv. Please help.
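The grouped result keeps the group keys in a MultiIndex; calling reset_index() before to_csv writes them back out as ordinary columns. A minimal sketch with invented data (writing to an in-memory buffer only for demonstration; passing a filename to to_csv works the same way):

```python
import pandas as pd
from io import StringIO

# Invented daily data with pre-computed year/month columns.
df = pd.DataFrame({
    'Company Name': ['Acme'] * 4,
    'year': [2015] * 4,
    'month': [11, 11, 12, 12],
    'pe': [10.0, 11.0, 12.0, 13.0],
})

# One row per (company, year, month), as in the edit above.
df3 = df.groupby(['Company Name', 'year', 'month']).first()

# reset_index turns the group keys back into columns before exporting.
buf = StringIO()
df3.reset_index().to_csv(buf, index=False)
print(buf.getvalue())
```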
A dataframe has a method called groupby that groups the rows by a column, after which the groups can be aggregated.
So if you were to run data.groupby('pe'), you would get a grouped object keyed by that column.
Now if you were to tack on .describe(), you would get the standard deviation/mean/min/etc.
Example:
data.groupby('pe').describe()
Edit: You can also use built-in aggregate functions such as .max()/.mean()/etc. with groupby().
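A small sketch of both forms, with made-up numbers:

```python
import pandas as pd

# Made-up data: a grouping key 'pe' and one numeric column to summarize.
df = pd.DataFrame({'pe': [10, 10, 12], 'price': [1.0, 3.0, 5.0]})

# describe() per group yields count, mean, std, min, quartiles and max.
stats = df.groupby('pe').describe()
print(stats)

# Or a single built-in aggregate instead of the full summary:
means = df.groupby('pe')['price'].mean()
print(means)
```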
I have two dataframes with different index lengths to compare.
df1: daily low stock prices (this has one low price per day)
df2: daily purchases of stock (this has more than one buy per day)
I'd like to iterate through the rows of df2, checking for each date whether df2[Price] > df1[low], and adding YES in df2[In_range] for that row if it is, and NO if it's not.
I've included a screenshot of the tables and a simple diagram with the description so you can see.
Picture of tables with simple diagram
If you need more clarification please let me know :)
Thanks,
Elliot
The best way would be to add the column 'low' to the second dataframe.
df2 = df2.merge(df1, on = ['Company', 'time'])
And then performing the check is simple within a single dataframe:
df2['In range'] = df2['Price'] >= df2['low']
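A minimal runnable sketch with invented numbers (column names taken from the answer above); mapping the boolean result gives the YES/NO labels from the question:

```python
import pandas as pd

# Invented data: one low per company/day, and several purchases per day.
df1 = pd.DataFrame({
    'Company': ['ACME', 'ACME'],
    'time': ['2021-01-04', '2021-01-05'],
    'low': [9.5, 10.0],
})
df2 = pd.DataFrame({
    'Company': ['ACME', 'ACME', 'ACME'],
    'time': ['2021-01-04', '2021-01-04', '2021-01-05'],
    'Price': [10.0, 9.0, 10.5],
})

# Bring each day's low onto every purchase row, then compare column-wise.
df2 = df2.merge(df1, on=['Company', 'time'])
df2['In_range'] = (df2['Price'] >= df2['low']).map({True: 'YES', False: 'NO'})
print(df2)
```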