Add data based on quartiles - python

df
price vol date
0 2 4 03-04-2020
1 4 24 03-04-2020
2 5 10 03-04-2020
How can I add a column with the average price for rows whose vol is above the 75th percentile or below the 25th percentile of that month's volumes?
I tried:
df.loc[df['avg_price_by_day_and_quartile'] = df[(df['avg_price_by_day_and_quartile'] > vol.quantile(.75) & <vol.quantile(.25) ).groupby('date')['quartile'].transform('mean')
Expected Output:
price vol date quartile avg_price_by_day_and_quartile
0 2 4 03-04-2020 below 2
1 4 24 03-04-2020 above 4
2 5 10 03-04-2020
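One possible approach, sketched under two assumptions: the dates are month-day-year, and "that month" means the calendar month of the date column. It takes each month's 25th and 75th vol percentiles with groupby().transform, labels the rows outside that range, and then averages price per date and label; rows between the two percentiles are left empty, as in the expected output.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [2, 4, 5],
    "vol": [4, 24, 10],
    "date": ["03-04-2020", "03-04-2020", "03-04-2020"],
})

# Assumption: dates are month-day-year; change the format string if not.
df["date"] = pd.to_datetime(df["date"], format="%m-%d-%Y")
month = df["date"].dt.to_period("M")

# Each month's 25th and 75th vol percentiles, broadcast back to every row.
by_month = df.groupby(month)["vol"]
q25 = by_month.transform(lambda s: s.quantile(0.25))
q75 = by_month.transform(lambda s: s.quantile(0.75))

# Label the rows outside the interquartile range; the rest stay None.
df["quartile"] = None
df.loc[df["vol"] > q75, "quartile"] = "above"
df.loc[df["vol"] < q25, "quartile"] = "below"

# Average price per date and label, only for the labelled rows (others get NaN).
labelled = df["quartile"].notna()
df.loc[labelled, "avg_price_by_day_and_quartile"] = (
    df[labelled].groupby(["date", "quartile"])["price"].transform("mean")
)
print(df)
```

With this sample the percentiles are 7 and 17, so vol 4 is "below", vol 24 is "above" and vol 10 gets no label.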

Related

How to write to the cell below level 0 of a multiindex

I've got a df with a MultiIndex like so
import numpy as np
import pandas as pd
nums = np.arange(5)
key = ['kfc'] * 5
mi = pd.MultiIndex.from_arrays([key, nums])
df = pd.DataFrame({'rent': np.arange(10, 60, 10)})
df = df.set_index(mi)
print(df)
rent
kfc 0 10
1 20
2 30
3 40
4 50
How can I write to the cell below kfc? I want to add meta info there, e.g. the address or the monthly rent:
rent
kfc 0 10
NYC 1 20
2 30
3 40
4 50
To match your expected output, you would need to rebuild the df MultiIndex:
df.index = pd.MultiIndex.from_tuples(zip(['kfc'] + ['NYC'] * 4, df.index.get_level_values(1)))
print(df)
rent
kfc 0 10
NYC 1 20
2 30
3 40
4 50
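A self-contained sketch of the same idea, built from the question's setup. It assumes the outer level should read 'kfc' on the first row and the meta label on every other row, and it takes the inner labels with get_level_values(1), which returns one label per row. Note that 'NYC' becomes an ordinary index label here, so this is a display-level trick rather than separate metadata.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame: one 'kfc' label over five integer rows.
mi = pd.MultiIndex.from_arrays([["kfc"] * 5, np.arange(5)])
df = pd.DataFrame({"rent": np.arange(10, 60, 10)}, index=mi)

# Keep 'kfc' on the first row and put the meta label 'NYC' on the
# remaining rows; the inner level is reused unchanged.
outer = ["kfc"] + ["NYC"] * (len(df) - 1)
df.index = pd.MultiIndex.from_arrays([outer, df.index.get_level_values(1)])
print(df)
```

Because pandas sparsifies repeated outer labels when printing, 'NYC' is shown only once, next to row 1.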

How to merge dataframe between dates

I have one dataframe (DF) that contains daily sales data.
I have another dataframe (DF1) that contains quarterly data.
This is what the quarterly dataframe DF1 looks like:
Date Computer Sale In Person Sales Net Sales
1/29/2021 1 2 3
4/30/2021 2 4 6
7/29/2021 3 6 9
1/29/2022 4 8 12
5/1/2022 5 10 15
7/30/2022 6 12 18
This is what the daily dataframe DF looks like:
Date Num of people
1/30/2021 45
1/31/2021 35
2/1/2021 25
5/1/2021 20
5/2/2021 15
The quarterly dataframe has the columns Computer Sale, In Person Sales and Net Sales.
How do I merge these columns into the daily dataframe so that each day shows the matching quarterly data? I want the final result to look like this:
Date Num of people Computer Sale In Person Sales Net Sales
1/30/2021 45 1 2 3
1/31/2021 35 1 2 3
2/1/2021 25 1 2 3
5/1/2021 20 2 4 6
5/2/2021 15 2 4 6
For example, I want 1/30/2021 to get the figures from 1/29/2021, and once the daily data goes past 4/30/2021, the next quarter's figures should be merged in.
Please let me know if I need to be more specific.
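A possible solution, where df1 is the quarterly frame (DF1) and df2 is the daily frame (DF):
import pandas as pd
# merge_asof needs real datetimes, sorted on the key.
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
# For each daily row, take the latest quarterly row dated on or before it.
pd.merge_asof(df2, df1, on='Date', direction='backward')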
Output:
Date Num of people Computer Sale In Person Sales Net Sales
0 2021-01-30 45 1 2 3
1 2021-01-31 35 1 2 3
2 2021-02-01 25 1 2 3
3 2021-05-01 20 2 4 6
4 2021-05-02 15 2 4 6
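A self-contained sketch of the merge_asof approach with the question's sample data. The names quarterly and daily are chosen here for readability and stand in for DF1 and DF; the month/day/year date format is an assumption based on the sample values.

```python
import pandas as pd

quarterly = pd.DataFrame({
    "Date": ["1/29/2021", "4/30/2021", "7/29/2021", "1/29/2022", "5/1/2022", "7/30/2022"],
    "Computer Sale": [1, 2, 3, 4, 5, 6],
    "In Person Sales": [2, 4, 6, 8, 10, 12],
    "Net Sales": [3, 6, 9, 12, 15, 18],
})
daily = pd.DataFrame({
    "Date": ["1/30/2021", "1/31/2021", "2/1/2021", "5/1/2021", "5/2/2021"],
    "Num of people": [45, 35, 25, 20, 15],
})

# Parse the dates explicitly as month/day/year.
for frame in (quarterly, daily):
    frame["Date"] = pd.to_datetime(frame["Date"], format="%m/%d/%Y")

# merge_asof requires both frames to be sorted on the key.
merged = pd.merge_asof(
    daily.sort_values("Date"),
    quarterly.sort_values("Date"),
    on="Date",
    direction="backward",  # most recent quarterly row on or before each day
)
print(merged)
```

direction="backward" is what makes 5/1/2021 pick up the 4/30/2021 row; "forward" or "nearest" would attach a different quarter.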

Getting Rolling Sum per Group

I have a dataframe like this:
Product_ID Quantity Year Quarter
1 100 2021 1
1 100 2021 2
1 50 2021 3
1 100 2021 4
1 100 2022 1
2 100 2021 1
2 100 2021 2
3 100 2021 1
3 100 2021 2
I would like to get the sum of the last three quarters (excluding the current quarter), per Product_ID.
Therefore I tried this:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity'].shift(1,fill_value=0)
.rolling(3).sum().reset_index(0,drop=True)
)
# Shifting 1, because I want to exclude the current row.
# Rolling 3, because I want to have the 3 'rows' before
# Grouping by, because I want to have the calculation PER product
My code fails because the rolling sum is not calculated per product: it also picks up rows from other products (for example, product 2, quarter 1 gets the three rows from product 1).
My expected outcome:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
1 100 2021 1 0 # no historical data for this product yet
1 100 2021 2 100 # sum of the last quarter of this product
1 50 2021 3 200 # sum of the last 2 quarters of this product
1 100 2021 4 250 # sum of the last 3 quarters of this product
1 100 2022 1 250 # sum of the last 3 quarters of this product
2 100 2021 1 0 # no historical data for this product yet
2 100 2021 2 100 # sum of the last quarter of this product
3 100 2021 1 0 # etc
3 100 2021 2 100 # etc
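You need to apply the rolling sum per group. groupby(...).transform runs the function on each product separately and returns a result aligned with the original rows, so it can be assigned straight back:
df['Qty_Sum_3qrts'] = (df.groupby('Product_ID')['Quantity']
.transform(lambda s: s.shift(1, fill_value=0)
.rolling(3, min_periods=1).sum())
)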
output:
Product_ID Quantity Year Quarter Qty_Sum_3qrts
0 1 100 2021 1 0.0
1 1 100 2021 2 100.0
2 1 50 2021 3 200.0
3 1 100 2021 4 250.0
4 1 100 2022 1 250.0
5 2 100 2021 1 0.0
6 2 100 2021 2 100.0
7 3 100 2021 1 0.0
8 3 100 2021 2 100.0
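An alternative sketch that keeps the question's shift-then-rolling idea but regroups the shifted values before rolling, so no window crosses from one product into the next. It relies on GroupBy.rolling, whose result is indexed by (Product_ID, original row); dropping the first level realigns it with df.

```python
import pandas as pd

df = pd.DataFrame({
    "Product_ID": [1, 1, 1, 1, 1, 2, 2, 3, 3],
    "Quantity":   [100, 100, 50, 100, 100, 100, 100, 100, 100],
    "Year":       [2021, 2021, 2021, 2021, 2022, 2021, 2021, 2021, 2021],
    "Quarter":    [1, 2, 3, 4, 1, 1, 2, 1, 2],
})

# Step 1: shift within each product so the current row is excluded.
prev_qty = df.groupby("Product_ID")["Quantity"].shift(1, fill_value=0)

# Step 2: group the shifted values again so each window stays inside one product.
rolled = prev_qty.groupby(df["Product_ID"]).rolling(3, min_periods=1).sum()

# Drop the Product_ID level that GroupBy.rolling adds, then assign by index.
df["Qty_Sum_3qrts"] = rolled.reset_index(level=0, drop=True)
print(df)
```

min_periods=1 lets the first rows of each product sum whatever history exists instead of returning NaN.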

Pandas: Group by and conditional sum based on value of current row

My dataframe looks like this:
customer_nr order_value year_ordered payment_successful
1 50 1980 1
1 75 2017 0
1 10 2020 1
2 55 2000 1
2 300 2007 1
2 15 2010 0
I want to know the total amount a customer has successfully paid in the years before, for a specific order.
The expected output is as follows:
customer_nr order_value year_ordered payment_successful total_successfully_previously_paid
1 50 1980 1 0
1 75 2017 0 50
1 10 2020 1 50
2 55 2000 1 0
2 300 2007 1 55
2 15 2010 0 355
The closest I've gotten is this:
df.groupby(['customer_nr', 'payment_successful'], as_index=False)['order_value'].sum()
That just gives me the summed amount successfully and unsuccessfully paid all time per customer. It doesn't account for selecting only previous orders to participate in the sum.
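Try:
# Multiply by the 0/1 flag so failed payments count as 0, take the running
# total per customer, then shift it down one row so each order only sees
# earlier orders (rows must already be in year order within each customer).
df["total_successfully_previously_paid"] = (df["payment_successful"].mul(df["order_value"])
.groupby(df["customer_nr"])
.transform(lambda x: x.cumsum().shift().fillna(0))
)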
>>> df
customer_nr ... total_successfully_previously_paid
0 1 ... 0.0
1 1 ... 50.0
2 1 ... 50.0
3 2 ... 0.0
4 2 ... 55.0
5 2 ... 355.0
[6 rows x 5 columns]
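A vectorized variant, sketched on the question's sample data: instead of shifting the cumulative sum, subtract each row's own paid amount from the running total, which leaves only what was paid on earlier rows. Like the answer above, it treats "previous" as "earlier rows after sorting by year", so two orders from the same year would still count each other in row order.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_nr": [1, 1, 1, 2, 2, 2],
    "order_value": [50, 75, 10, 55, 300, 15],
    "year_ordered": [1980, 2017, 2020, 2000, 2007, 2010],
    "payment_successful": [1, 0, 1, 1, 1, 0],
})

# Cumulative sums depend on row order, so sort by year within each customer.
df = df.sort_values(["customer_nr", "year_ordered"])

# Amount actually paid per order: 0 when the payment failed.
paid = df["order_value"] * df["payment_successful"]

# Inclusive running total per customer minus the current row's own amount
# gives the total paid on earlier orders only.
df["total_successfully_previously_paid"] = paid.groupby(df["customer_nr"]).cumsum() - paid
print(df)
```

This avoids a Python-level lambda, and the result stays integer because no NaN is introduced.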

Python DataFrame: how to group by three columns consisting of Year, Month and Day data and calculate the sum of a fourth column

My dataframe is given below:
input_df =
index Year Month Day Hour Minute GHI
0 2017 1 1 7 30 100
1 2017 1 1 8 30 200
2 2017 1 2 9 30 300
3 2017 1 2 10 30 400
4 2017 2 1 11 30 500
5 2017 2 1 12 30 600
6 2017 2 2 13 30 700
I want to sum the GHI data for each day. From the input above I expect an output like this:
result_df =
index Year Month Day GHI
0 2017 1 1 300
1 2017 1 2 700
2 2017 2 1 1100
3 2017 2 2 700
My code and its current output:
result_df = input_df.groupby(['Year','Month','Day'])['GHI'].sum()
print(result_df)
result_df =
index Year Month Day GHI
0 2017 1 1 1400
1 2017 2 2 1400
My code above seems to combine the first day of each month and sum that data, which is wrong. How can I fix it?
You are very close. The thing to bear in mind is that DataFrame.groupby() has an as_index parameter whose default value is True, so the grouping columns become the index: your code returns a Series indexed by a (Year, Month, Day) MultiIndex, and the sums in it are already per day. To get the desired output you can either chain the reset_index() method after the aggregation or pass as_index=False to groupby().
result_df = input_df.groupby(['Year','Month','Day'])['GHI'].sum()
result_df
Out[12]:
Year Month Day
2017 1 1 300
2 700
2 1 1100
2 700
Name: GHI, dtype: int64
# Getting the desired output
input_df.groupby(['Year','Month','Day'])['GHI'].sum().reset_index()
Out[16]:
Year Month Day GHI
0 2017 1 1 300
1 2017 1 2 700
2 2017 2 1 1100
3 2017 2 2 700
input_df.groupby(['Year','Month','Day'], as_index=False)['GHI'].sum()
Out[17]:
Year Month Day GHI
0 2017 1 1 300
1 2017 1 2 700
2 2017 2 1 1100
3 2017 2 2 700
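A self-contained check of the answer, built from the question's input. As a hedged guess about where the reported 1400/1400 output could come from: grouping on Day alone produces exactly those totals, because day 1 (and day 2) of January and February fall into the same group.

```python
import pandas as pd

input_df = pd.DataFrame({
    "Year":   [2017] * 7,
    "Month":  [1, 1, 1, 1, 2, 2, 2],
    "Day":    [1, 1, 2, 2, 1, 1, 2],
    "Hour":   [7, 8, 9, 10, 11, 12, 13],
    "Minute": [30] * 7,
    "GHI":    [100, 200, 300, 400, 500, 600, 700],
})

# Group on all three date parts; as_index=False keeps them as ordinary columns.
daily = input_df.groupby(["Year", "Month", "Day"], as_index=False)["GHI"].sum()
print(daily)

# For comparison: grouping on Day alone merges the same day number across
# months, which yields 1400 for both day 1 and day 2.
by_day_only = input_df.groupby("Day")["GHI"].sum()
print(by_day_only)
```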
