Pandas DataFrame Group and Rollup in one operation - python

I have a Pandas DataFrame with two columns: the "close_time" of a trade (datetime format) and the "net_profit" from that trade. I have shared some sample data below. I need to find the count of total trades and the count of profitable trades by day. For example, the output would look like this:
+------------+--------------+--------------------------+
| Close_day  | Total_Trades | Total_Profitable_Trades  |
+------------+--------------+--------------------------+
| 2014-11-03 |            5 |                        4 |
+------------+--------------+--------------------------+
Can this be done using something like groupby? How?
+----+---------------------+------------+
|    | close_time          | net_profit |
+----+---------------------+------------+
|  0 | 2014-10-31 14:41:41 |      20.84 |
|  1 | 2014-11-03 10:50:59 |     238.74 |
|  2 | 2014-11-03 11:05:10 |     491.32 |
|  3 | 2014-11-03 12:31:06 |      55.87 |
|  4 | 2014-11-03 14:31:34 |    -402.29 |
|  5 | 2014-11-03 20:33:29 |     164.18 |
|  6 | 2014-11-04 16:30:24 |    -296.96 |
|  7 | 2014-11-04 23:59:21 |     281.86 |
|  8 | 2014-11-04 23:59:34 |    -296.37 |
|  9 | 2014-11-05 10:14:42 |     517.55 |
| 10 | 2014-11-05 20:38:49 |     350.35 |
| 11 | 2014-11-07 11:23:31 |     710.13 |
| 12 | 2014-11-07 11:23:38 |     137.55 |
| 13 | 2014-11-11 19:00:01 |     201.97 |
| 14 | 2014-11-11 19:00:15 |    -484.77 |
| 15 | 2014-11-12 23:41:04 |   -1346.71 |
| 16 | 2014-11-12 23:41:25 |     514.30 |
| 17 | 2014-11-13 18:55:34 |     103.34 |
| 18 | 2014-11-13 18:55:43 |    -180.37 |
| 19 | 2014-11-26 17:10:59 |   -1756.69 |
+----+---------------------+------------+

Setup
Make sure that your close_time is datetime by using
df.close_time = pd.to_datetime(df.close_time)
You can use groupby and agg here:
out = (df.groupby(df.close_time.dt.date)
         .net_profit.agg(['count', lambda x: x.gt(0).sum()])).astype(int)
out.columns = ['trades', 'profitable_trades']
            trades  profitable_trades
close_time
2014-10-31       1                  1
2014-11-03       5                  4
2014-11-04       3                  1
2014-11-05       2                  2
2014-11-07       2                  2
2014-11-11       2                  1
2014-11-12       2                  1
2014-11-13       2                  1
2014-11-26       1                  0
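If you prefer to name the columns up front, a similar sketch using named aggregation (available since pandas 0.25) avoids the separate rename step; it is equivalent to the code above, not a different method:
out = (df.groupby(df.close_time.dt.date)['net_profit']
         .agg(trades='count',
              profitable_trades=lambda s: int(s.gt(0).sum())))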

Related

How to reindex a datetime-based multiindex in pandas

I have a dataframe that counts the number of times an event has occurred per user per day. Users may have 0 events per day and (since the table is an aggregate from a raw event log) rows with 0 events are missing from the dataframe. I would like to add these missing rows and group the data by week so that each user has one entry per week (including 0 if applicable).
Here is an example of my input:
import numpy as np
import pandas as pd
np.random.seed(42)
df = pd.DataFrame({
    "person_id": np.arange(3).repeat(5),
    "date": pd.date_range("2022-01-01", "2022-01-15", freq="d"),
    "event_count": np.random.randint(1, 7, 15),
})
# end of each week
# Note: week 2022-01-23 is not in df, but should be part of the result
desired_index = pd.to_datetime(["2022-01-02", "2022-01-09", "2022-01-16", "2022-01-23"])
df
| | person_id | date | event_count |
|---:|------------:|:--------------------|--------------:|
| 0 | 0 | 2022-01-01 00:00:00 | 4 |
| 1 | 0 | 2022-01-02 00:00:00 | 5 |
| 2 | 0 | 2022-01-03 00:00:00 | 3 |
| 3 | 0 | 2022-01-04 00:00:00 | 5 |
| 4 | 0 | 2022-01-05 00:00:00 | 5 |
| 5 | 1 | 2022-01-06 00:00:00 | 2 |
| 6 | 1 | 2022-01-07 00:00:00 | 3 |
| 7 | 1 | 2022-01-08 00:00:00 | 3 |
| 8 | 1 | 2022-01-09 00:00:00 | 3 |
| 9 | 1 | 2022-01-10 00:00:00 | 5 |
| 10 | 2 | 2022-01-11 00:00:00 | 4 |
| 11 | 2 | 2022-01-12 00:00:00 | 3 |
| 12 | 2 | 2022-01-13 00:00:00 | 6 |
| 13 | 2 | 2022-01-14 00:00:00 | 5 |
| 14 | 2 | 2022-01-15 00:00:00 | 2 |
This is what my desired result looks like:
| | person_id | level_1 | event_count |
|---:|------------:|:--------------------|--------------:|
| 0 | 0 | 2022-01-02 00:00:00 | 9 |
| 1 | 0 | 2022-01-09 00:00:00 | 13 |
| 2 | 0 | 2022-01-16 00:00:00 | 0 |
| 3 | 0 | 2022-01-23 00:00:00 | 0 |
| 4 | 1 | 2022-01-02 00:00:00 | 0 |
| 5 | 1 | 2022-01-09 00:00:00 | 11 |
| 6 | 1 | 2022-01-16 00:00:00 | 5 |
| 7 | 1 | 2022-01-23 00:00:00 | 0 |
| 8 | 2 | 2022-01-02 00:00:00 | 0 |
| 9 | 2 | 2022-01-09 00:00:00 | 0 |
| 10 | 2 | 2022-01-16 00:00:00 | 20 |
| 11 | 2 | 2022-01-23 00:00:00 | 0 |
I can produce it using:
(
    df
    .groupby(["person_id", pd.Grouper(key="date", freq="w")]).sum()
    .groupby("person_id").apply(
        lambda df: (
            df
            .reset_index(drop=True, level=0)
            .reindex(desired_index, fill_value=0))
    )
    .reset_index()
)
However, according to the docs of reindex, I should be able to use it with level=1 as a kwarg directly, without having to do another groupby. Yet when I do this I get an "inner join" of the two indices instead of an "outer join":
result = (
    df
    .groupby(["person_id", pd.Grouper(key="date", freq="w")]).sum()
    .reindex(desired_index, level=1)
    .reset_index()
)
| | person_id | date | event_count |
|---:|------------:|:--------------------|--------------:|
| 0 | 0 | 2022-01-02 00:00:00 | 9 |
| 1 | 0 | 2022-01-09 00:00:00 | 13 |
| 2 | 1 | 2022-01-09 00:00:00 | 11 |
| 3 | 1 | 2022-01-16 00:00:00 | 5 |
| 4 | 2 | 2022-01-16 00:00:00 | 20 |
Why is that, and how am I supposed to use df.reindex correctly?
I have found a similar SO question on reindexing a multi-index level, but the accepted answer there uses df.unstack, which doesn't work for me, because not every level of my desired index occurs in my current index (and vice versa).
You need to reindex by both levels of the MultiIndex:
mux = pd.MultiIndex.from_product([df['person_id'].unique(), desired_index],
                                 names=['person_id', 'date'])

result = (
    df
    .groupby(["person_id", pd.Grouper(key="date", freq="w")]).sum()
    .reindex(mux, fill_value=0)
    .reset_index()
)
print(result)
person_id date event_count
0 0 2022-01-02 9
1 0 2022-01-09 13
2 0 2022-01-16 0
3 0 2022-01-23 0
4 1 2022-01-02 0
5 1 2022-01-09 11
6 1 2022-01-16 5
7 1 2022-01-23 0
8 2 2022-01-02 0
9 2 2022-01-09 0
10 2 2022-01-16 20
11 2 2022-01-23 0
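If hard-coding desired_index feels brittle, the same target index can be built programmatically (a small sketch; the end date 2022-01-23 is taken from the question, and the weeks end on Sunday just as with freq="w"):
weeks = pd.date_range("2022-01-02", "2022-01-23", freq="W-SUN")
mux = pd.MultiIndex.from_product([df["person_id"].unique(), weeks],
                                 names=["person_id", "date"])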

How to build sequence of purchases for each ID?

I want to create a dataframe that shows the sequence of products each user purchases, according to the sequence column. For example, this is my current df:
user_id | sequence | product | price
1 | 1 | A | 10
1 | 2 | C | 15
1 | 3 | G | 1
2 | 1 | B | 20
2 | 2 | T | 45
2 | 3 | A | 10
...
I want to convert it to the following format:
user_id | source_product | target_product | cum_total_price
1 | A | C | 25
1 | C | G | 26
2 | B | T | 65
2 | T | A | 75
...
How can I achieve this?
shift + cumsum + groupby.apply:
def seq(g):
    g['source_product'] = g['product']
    g['target_product'] = g['product'].shift(-1)
    g['price'] = g.price.cumsum().shift(-1)
    return g[['user_id', 'source_product', 'target_product', 'price']].iloc[:-1]

df.sort_values('sequence').groupby('user_id', group_keys=False).apply(seq)
# user_id source_product target_product price
#0 1 A C 25.0
#1 1 C G 26.0
#3 2 B T 65.0
#4 2 T A 75.0
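For larger frames, here is a sketch of the same idea without groupby.apply, using per-group cumsum and shift (column names are taken from the question, and the result column is named cum_total_price as in the desired format):
out = df.sort_values(['user_id', 'sequence']).copy()
out['source_product'] = out['product']
out['target_product'] = out.groupby('user_id')['product'].shift(-1)
# cumulative price up to and including the target purchase
out['cum_total_price'] = (out.groupby('user_id')['price'].cumsum()
                             .groupby(out['user_id']).shift(-1))
out = out.dropna(subset=['target_product'])[['user_id', 'source_product',
                                             'target_product', 'cum_total_price']]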

groupby/eq compute mean of specific column

I'm trying to figure out how to use groupby/eq to compute the mean of a specific column. I have a df as seen below (original df).
I would like to group by 'Group' and 'players' where class equals 1 and get the mean of 'score'.
example:
Group = a, players =2
(16+13+19)/3 = 16
+-------+---------+-------+-------+------------+
| Group | players | class | score | score_mean |
+-------+---------+-------+-------+------------+
| a | 2 | 2 | 14 | |
| a | 2 | 1 | 16 | 16 |
| a | 2 | 1 | 13 | 16 |
| a | 2 | 2 | 13 | |
| a | 2 | 1 | 19 | 16 |
| a | 2 | 2 | 17 | |
| a | 2 | 2 | 14 | |
+-------+---------+-------+-------+------------+
I've tried:
df['score_mean'] = df['class'].eq(1).groupby(['Group', 'players'])['score'].transform('mean')
but I kept getting a "KeyError".
original df:
+----+-------+---------+-------+-------+
| | Group | players | class | score |
+----+-------+---------+-------+-------+
| 0 | a | 1 | 1 | 10 |
| 1 | c | 2 | 1 | 20 |
| 2 | a | 1 | 3 | 29 |
| 3 | c | 1 | 3 | 22 |
| 4 | a | 2 | 2 | 14 |
| 5 | b | 1 | 2 | 16 |
| 6 | a | 2 | 1 | 16 |
| 7 | b | 2 | 3 | 17 |
| 8 | c | 1 | 2 | 22 |
| 9 | b | 1 | 2 | 23 |
| 10 | c | 2 | 2 | 22 |
| 11 | d | 1 | 1 | 13 |
| 12 | a | 2 | 1 | 13 |
| 13 | d | 1 | 3 | 23 |
| 14 | a | 2 | 2 | 13 |
| 15 | d | 2 | 1 | 34 |
| 16 | b | 1 | 3 | 32 |
| 17 | c | 2 | 2 | 29 |
| 18 | b | 2 | 2 | 28 |
| 19 | a | 2 | 1 | 19 |
| 20 | a | 1 | 1 | 19 |
| 21 | c | 1 | 1 | 27 |
| 22 | b | 1 | 3 | 47 |
| 23 | a | 2 | 2 | 17 |
| 24 | c | 1 | 1 | 14 |
| 25 | c | 2 | 2 | 25 |
| 26 | a | 1 | 3 | 67 |
| 27 | b | 2 | 3 | 21 |
| 28 | a | 1 | 3 | 27 |
| 29 | c | 1 | 1 | 16 |
| 30 | a | 2 | 2 | 14 |
| 31 | b | 1 | 2 | 25 |
+----+-------+---------+-------+-------+
import pandas as pd

data = {'Group': ['a','c','a','c','a','b','a','b','c','b','c','d','a','d','a','d',
                  'b','c','b','a','a','c','b','a','c','c','a','b','a','c','a','b'],
        'players': [1,2,1,1,2,1,2,2,1,1,2,1,2,1,2,2,1,2,2,2,1,1,1,2,1,2,1,2,1,1,2,1],
        'class': [1,1,3,3,2,2,1,3,2,2,2,1,1,3,2,1,3,2,2,1,1,1,3,2,1,2,3,3,3,1,2,2],
        'score': [10,20,29,22,14,16,16,17,22,23,22,13,13,23,13,34,32,29,28,19,19,27,47,17,14,25,67,21,27,16,14,25]}
df = pd.DataFrame(data)
Kindly advise. Many thanks & regards.
Try via set_index(), groupby(), assign() and reset_index():
df = (df.set_index(['Group', 'players'])
        .assign(score_mean=df[df['class'].eq(1)].groupby(['Group', 'players'])['score'].mean())
        .reset_index())
Update:
If you want the first df as your output then use:
grouped = df.groupby(['Group', 'players', 'class']).transform('mean')
grouped = grouped.assign(players=df['players'], Group=df['Group'], Class=df['class']).where(df['Group'] == 'a').dropna()
grouped['score'] = grouped.apply(lambda x: float('NaN') if x['players'] == 1 else x['score'], 1)
grouped = grouped.dropna(subset=['score'])
Now if you print grouped you will get your desired output
If I understood you right, you need values returned only where class equals 1. I'm not sure what that will serve, but the code is below. Use groupby transform and chain where:
df['score_mean'] = df.groupby(['Group', 'players'])['score'].transform('mean').where(df['class'] == 1).fillna('')
Group players class score score_mean
0 a 1 1 10 10
1 a 2 1 20 20
2 a 3 5 29
3 a 4 5 22
4 a 5 5 14
5 b 1 7 16
6 b 2 7 16
7 b 3 7 17
8 c 1 4 22
9 c 2 2 23
10 c 3 2 22
11 d 1 4 13
12 d 2 4 13
13 d 3 3 23
14 d 4 8 13
15 d 5 7 34
16 e 1 7 32
17 e 2 2 29
18 e 3 2 28
19 e 4 1 19 19
20 e 5 1 19 19
21 e 6 1 27 27
22 f 1 5 47
23 f 2 5 17
24 f 3 7 14
25 f 4 7 25
26 g 1 3 67
27 g 2 3 21
28 g 3 3 27
29 g 4 8 16
30 g 5 8 14
31 g 6 8 25
You could first filter by class and then create score_mean by doing a groupby and transform.
(
    df[df['class'] == 1]
    .assign(score_mean=lambda x: x.groupby(['Group', 'players']).score.transform('mean'))
)
Group players class score score_mean
0 a 1 1 10 14.5
1 c 2 1 20 20.0
6 a 2 1 16 16.0
11 d 1 1 13 13.0
12 a 2 1 13 16.0
15 d 2 1 34 34.0
19 a 2 1 19 16.0
20 a 1 1 19 14.5
21 c 1 1 27 19.0
24 c 1 1 14 19.0
29 c 1 1 16 19.0
If you want to keep other classes and set the mean to '', you can do:
(
    df[df['class'] == 1]
    .groupby(['Group', 'players']).score.transform('mean')
    .pipe(lambda x: df.assign(score_mean=x))
    .fillna('')
)
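As a quick sanity check against the worked example in the question (Group 'a', players 2, class 1), either approach should reproduce:
check = df[(df['Group'] == 'a') & (df['players'] == 2) & (df['class'] == 1)]['score'].mean()
print(check)  # (16 + 13 + 19) / 3 = 16.0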

Pandasql with conditions

I have two dataframes:
In the first one I have student information. I will call it df1.
user_id | plan | subplan | matrix_code | student_semester
102532 | GADMSSP | GSP10 | 1501 | 8
106040 | GRINTSP | | 1901 | 4
106114 | GCSOSSULA | | 1901 | 4
106504 | GCSOSSP | | 1902 | 3
106664 | GCINESP | | 1901 | 4
In the second one I have the requirements of electives for an institution. I will call it df2.
plan | subplan | matrix_code | semester | credits| cumulative_credits
GADMSSP | | 1501 | 5 | 4 | 4
GADMSSP | | 1501 | 6 | 4 | 8
GADMSSP | | 1501 | 7 | 4 | 12
GADMSSP | | 1501 | 8 | 0 | 12
GRINTSP | | 1901 | 7 | 2 | 2
GRINTSP | | 1901 | 8 | 0 | 2
GCSOSSULA | | 1901 | 3 | 4 | 4
GCSOSSULA | | 1901 | 4 | 0 | 4
GCSOSSULA | | 1901 | 5 | 0 | 4
GCSOSSULA | GSUL5 | 1901 | 5 | 4 | 8
GCSOSSULA | | 1901 | 6 | 0 | 4
GCSOSSULA | GSUL5 | 1901 | 6 | 0 | 8
GCSOSSULA | | 1901 | 7 | 0 | 4
GCSOSSULA | GSUL5 | 1901 | 7 | 0 | 8
GCSOSSULA | | 1901 | 8 | 0 | 4
GCSOSSULA | GSUL5 | 1901 | 8 | 0 | 8
GCSOSSP | | 1902 | 5 | 4 | 4
GCSOSSP | | 1902 | 6 | 4 | 8
GCSOSSP | | 1902 | 7 | 4 | 12
GCSOSSP | | 1902 | 8 | 0 | 12
GCINESP | | 1901 | 2 | 4 | 4
GCINESP | | 1901 | 3 | 4 | 8
GCINESP | | 1901 | 4 | 4 | 12
GCINESP | | 1901 | 5 | 4 | 16
GCINESP | | 1901 | 6 | 4 | 24
GCINESP | | 1901 | 7 | 4 | 32
GCINESP | | 1901 | 8 | 4 | 40
So I have to merge the dataframes considering some conditions:
plan and matrix_code must be the same for df1 and df2.
df1.subplan is either the same as df2.subplan or it can be null. So user_id 102532 in line 1 of df1 will get the requirements where df2.subplan is null, since there is no indication of specific subplan requirements for this plan and matrix_code.
Get student_semester + 1, but treating max df2.semester as the limit of student_semester. So user_id 102532 in line 1 must remain in semester 8. For this one I cannot add +1 to the semester, but I would like to indicate that this is a user that did not reach the requirements in the last semester.
I am only interested in cumulative_credits.
For these two dfs the result should be something like this:
user_id | plan | subplan | matrix_code | semester | student_semester | cumulative_credits
102532 | GADMSSP | GSP10 | 1501 | 8 | 9 | 12
106040 | GRINTSP | | 1901 | 5 | 4 | 0
106114 | GCSOSSULA | | 1901 | 5 | 4 | 4
106504 | GCSOSSP | | 1902 | 4 | 3 | 0
106664 | GCINESP | | 1901 | 5 | 4 | 16
But if there is no possible way to get the students with 0 cumulative_credits, the result should be:
user_id | plan | subplan | matrix_code | semester | student_semester | cumulative_credits
102532 | GADMSSP | GSP10 | 1501 | 8 | 9 | 12
106114 | GCSOSSULA | | 1901 | 5 | 4 | 4
106664 | GCINESP | | 1901 | 5 | 4 | 16
What I did until now is the following:
pip install -U pandasql
import pandas as pd
from pandasql import sqldf

pysqldf = lambda q: sqldf(q, globals())
df2 = df2.groupby(['plan', 'subplan', 'matrix_code', 'semester']).cumulative_credits.max()
df2 = df2.to_frame()
df2 = df2.reset_index()
electives = """
SELECT user_id
,a.plan
,a.subplan as "student_subplan"
,a.matrix_code
,a.student_semester
,b.subplan as "matrix_subplan"
,b.semester
,cumulative_credits
FROM df1 a
LEFT JOIN df2 b
ON a.plan = b.plan
AND a.matrix_code = b.matrix_code
WHERE (b.subplan = '' OR a.subplan = b.subplan)
"""
electives = pysqldf(electives)
Then I was trying to implement the 3rd condition, but I have no clue about the right way to do this. I think I could use a lambda but I am not sure how.
df_s['semester_x'] = df_s['student_semester'] +1 | df_s['student_semester'] == df_s['semester'].max()
Also, if there is a better way to do the previous condition steps using a merge with a condition, that would be nice.
EDIT - SOLUTION:
I used part of Parfait's solution. I just added conditional logic to get the cumulative credits of the student's next semester instead of the max cumulative credits of the matrix code.
Here is what I've done:
First part - Parfait's solution:
agg = (pd.merge(df1, df2, on=['plano', 'matriz'], suffixes=["", "_"])
         .fillna('')
         .query("(subplano == '') | (subplano_aluno == subplano)")
         .rename({'subplano': 'subplano_matriz', 'semestre_': 'semestre_matriz',
                  'semestre': 'semestre_aluno'}, axis='columns'))
Second part:
y = """
with a as
(
SELECT DISTINCT plan
,CASE
WHEN plan LIKE '%SULB%' OR plan LIKE '%SULC%' THEN 10
WHEN plan LIKE '%SULD%' OR plan LIKE '%SULE%' THEN 12
ELSE 8
END as "semester_max"
FROM agg
)
SELECT DISTINCT
user_id
,student_semester
,plan
,student_subplan
,matrix_code
,matrix_subplan
,cumulative_credits
,matrix_semester
,semester_max
,CASE
WHEN student_semester < semester_max THEN (student_semester)+1
WHEN student_semester = semester_max THEN student_semester
END as "next_semester"
FROM
(
SELECT DISTINCT
user_id
,student_semester
,b.plan
,student_subplan
,matrix_code
,matrix_subplan
,cumulative_credits
,matrix_semester
,semester_max
FROM agg b
INNER JOIN a ON b.plan = a.plan
) x
WHERE matrix_semester = next_semester
"""
z = pysqldf(y)
Consider adding a CASE statement to the SQL query:
SELECT d1.user_id
, d1.plan
, d1.subplan AS student_subplan
, d1.matrix_code
, d1.student_semester
, d2.subplan AS matrix_subplan
, CASE
WHEN d1.student_semester = MAX(d2.semester)
THEN d1.student_semester
ELSE d1.student_semester + 1
END AS semester
, MAX(d2.cumulative_credits) AS cumulative_credits
FROM df1 d1
LEFT JOIN df2 d2
ON d1.plan = d2.plan
AND d1.matrix_code = d2.matrix_code
WHERE (d2.subplan IS NULL OR d1.subplan = d2.subplan)
GROUP BY d1.user_id
, d1.plan
, d1.subplan
, d1.matrix_code
, d1.student_semester
, d2.subplan;
In Pandas, the translation would use merge + groupby + Series.where for the CASE conditional logic:
# MERGE
agg = (pd.merge(df1, df2, on=['plan', 'matrix_code'], suffixes=["", "_"])
         .fillna('')
         .query("(subplan_ == '') | (subplan == subplan_)")
         .rename({'subplan': 'student_subplan', 'subplan_': 'matrix_subplan'}, axis='columns')
      )
# AGGREGATION
agg = (agg.groupby(['user_id', 'plan', 'student_subplan', 'matrix_code',
                    'student_semester', 'matrix_subplan'], as_index=False)
          .agg({'semester': 'max', 'cumulative_credits': 'max'})
      )
# CONDITIONAL LOGIC
agg['semester'] = agg['student_semester'].where(agg['semester'] == agg['student_semester'],
                                                agg['student_semester'].add(1))
agg
# user_id plan student_subplan matrix_code student_semester matrix_subplan semester cumulative_credits
# 0 102532 GADMSSP GSP10 1501 8 8 12
# 1 106040 GRINTSP 1901 4 5 2
# 2 106114 GCSOSSULA 1901 4 5 4
# 3 106504 GCSOSSP 1902 3 4 12
# 4 106664 GCINESP 1901 4 5 40
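For completeness, here is a rough plain-pandas sketch of the asker's final rule (next semester capped at the plan's maximum requirement semester, keeping only the matching requirement row). It is only an outline under the question's column names, not Parfait's answer, and it reproduces the second expected table (students with no matching requirement row are dropped):
m = (pd.merge(df1, df2, on=['plan', 'matrix_code'], suffixes=['', '_req'])
       .fillna('')
       .query("subplan_req == '' or subplan == subplan_req"))
# next semester, capped at the plan's highest requirement semester
cap = m.groupby(['plan', 'matrix_code'])['semester'].transform('max')
m['next_semester'] = (m['student_semester'] + 1).clip(upper=cap)
out = m.loc[m['semester'] == m['next_semester'],
            ['user_id', 'plan', 'subplan', 'matrix_code', 'next_semester',
             'student_semester', 'cumulative_credits']]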

Split a column and combine rows where there are multiple data measures

I'm trying to use python to solve my data analysis problem.
I have a table like this:
+----------+-----+------+--------+-------------+--------------+
| ID | QTR | Year | MEF_ID | Qtr_Measure | Value_column |
+----------+-----+------+--------+-------------+--------------+
| 11 | 1 | 2020 | Name1 | QTRAVG | 5 |
| 11 | 2 | 2020 | Name1 | QTRAVG | 8 |
| 11 | 3 | 2020 | Name1 | QTRAVG | 6 |
| 11 | 4 | 2020 | Name1 | QTRAVG | 9 |
| 15 | 1 | 2020 | Name2 | QTRAVG | 67 |
| 15 | 2 | 2020 | Name2 | QTRAVG | 89 |
| 15 | 3 | 2020 | Name2 | QTRAVG | 100 |
| 15 | 4 | 2020 | Name2 | QTRAVG | 121 |
| 11 | 1 | 2020 | Name1 | QTRMAX | 6 |
| 11 | 2 | 2020 | Name1 | QTRMAX | 9 |
| 11 | 3 | 2020 | Name1 | QTRMAX | 7 |
| 11 | 4 | 2020 | Name1 | QTRMAX | 10 |
+----------+-----+------+--------+-------------+--------------+
I want to arrange the Value_column in a way that captures when there are multiple Qtr_Measures for unique IDs and MEF_IDs. Doing this will reduce the overall size of the table, and I would like columns replacing Qtr_Measure with each measure type, as below:
+----------+-----+------+--------+-------------+--------+--------+
| ID | QTR | Year | MEF_ID | Qtr_Measure | QTRAVG | QTRMAX |
+----------+-----+------+--------+-------------+--------+--------+
| 11 | 1 | 2020 | Name1 | QTRAVG | 5 | 6 |
| 11 | 2 | 2020 | Name1 | QTRAVG | 8 | 9 |
| 11 | 3 | 2020 | Name1 | QTRAVG | 6 | 7 |
| 11 | 4 | 2020 | Name1 | QTRAVG | 9 | 10 |
| 15 | 1 | 2020 | Name2 | QTRAVG | 67 | |
| 15 | 2 | 2020 | Name2 | QTRAVG | 89 | |
| 15 | 3 | 2020 | Name2 | QTRAVG | 100 | |
| 15 | 4 | 2020 | Name2 | QTRAVG | 121 | |
+----------+-----+------+--------+-------------+--------+--------+
How can I do this with python?
Thank you
Use pivot_table with reset_index and rename_axis:
piv = (df.pivot_table(index=['ID', 'QTR', 'Year', 'MEF_ID'],
                      values='Value_column',
                      columns='Qtr_Measure')
         .reset_index()
         .rename_axis(None, axis=1)
      )
print(piv)
print(piv)
ID QTR Year MEF_ID QTRAVG QTRMAX
0 11 1 2020 Name1 5.0 6.0
1 11 2 2020 Name1 8.0 9.0
2 11 3 2020 Name1 6.0 7.0
3 11 4 2020 Name1 9.0 10.0
4 15 1 2020 Name2 67.0 NaN
5 15 2 2020 Name2 89.0 NaN
6 15 3 2020 Name2 100.0 NaN
7 15 4 2020 Name2 121.0 NaN
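If every ID/QTR/Year/MEF_ID/measure combination has exactly one row (as in the sample), a sketch with unstack gives the same shape without pivot_table's implicit mean aggregation:
piv2 = (df.set_index(['ID', 'QTR', 'Year', 'MEF_ID', 'Qtr_Measure'])['Value_column']
          .unstack('Qtr_Measure')
          .reset_index()
          .rename_axis(None, axis=1))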
