Maybe groupby is the wrong approach. It seems like it should work, but I'm not seeing it...
I want to group an event by its outcome. Here is my DataFrame (df):
Status Event
SUCCESS Run
SUCCESS Walk
SUCCESS Run
FAILED Walk
Here is my desired result:
Event  SUCCESS  FAILED
Run          2       0
Walk         1       1
I'm trying to make a grouped object but I can't figure out how to call it to display what I want.
grouped = df['Status'].groupby(df['Event'])
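For anyone reproducing this locally, here is a minimal sketch that rebuilds the sample frame, assuming the usual pandas import:
import pandas as pd

# The four rows shown in the question
df = pd.DataFrame({'Status': ['SUCCESS', 'SUCCESS', 'SUCCESS', 'FAILED'],
                   'Event': ['Run', 'Walk', 'Run', 'Walk']})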
Try this:
pd.crosstab(df.Event, df.Status)
Status FAILED SUCCESS
Event
Run 0 2
Walk 1 1
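crosstab returns an ordinary DataFrame, so if you want the SUCCESS/FAILED column order from the question, plain column selection reorders it; a quick sketch:
out = pd.crosstab(df.Event, df.Status)
out = out[['SUCCESS', 'FAILED']]  # reorder columns to match the desired layout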
len("df.groupby('Event').Status.value_counts().unstack().fillna(0)")
61
len("df.pivot_table(index='Event', columns='Status', aggfunc=len, fill_value=0)")
74
len("pd.crosstab(df.Event, df.Status)")
32
I'd do:
df.groupby('Event').Status.value_counts().unstack().fillna(0)
Or use the fill_value argument, which avoids the float upcast you get when unstack() introduces NaN before fillna(0) runs:
df.groupby('Event').Status.value_counts().unstack(fill_value=0)
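The difference is more than cosmetic; a quick check of the resulting dtypes on the sample frame:
# the FAILED column picks up NaN before fillna, so it ends up float64;
# with fill_value=0 no NaN is ever introduced and the counts stay integer
print(df.groupby('Event').Status.value_counts().unstack().fillna(0).dtypes)
print(df.groupby('Event').Status.value_counts().unstack(fill_value=0).dtypes)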
An alternative solution, using the pivot_table() method:
In [5]: df.pivot_table(index='Event', columns='Status', aggfunc=len, fill_value=0)
Out[5]:
Status FAILED SUCCESS
Event
Run 0 2
Walk 1 1
Timing against a 700K-row DataFrame:
In [74]: df.shape
Out[74]: (700000, 2)
In [75]: # (c) Merlin
In [76]: %%timeit
....: pd.crosstab(df.Event, df.Status)
....:
1 loop, best of 3: 333 ms per loop
In [77]: # (c) piRSquared
In [78]: %%timeit
....: df.groupby('Event').Status.value_counts().unstack().fillna(0)
....:
1 loop, best of 3: 325 ms per loop
In [79]: # (c) MaxU
In [80]: %%timeit
....: df.pivot_table(index='Event', columns='Status',
....: aggfunc=len, fill_value=0)
....:
1 loop, best of 3: 367 ms per loop
In [81]: # (c) ayhan
In [82]: %%timeit
....: (df.assign(ones = np.ones(len(df)))
....: .pivot_table(index='Event', columns='Status',
....: aggfunc=np.sum, values = 'ones')
....: )
....:
1 loop, best of 3: 264 ms per loop
In [83]: # (c) Divakar
In [84]: %%timeit
....: unq1,ID1 = np.unique(df['Event'],return_inverse=True)
....: unq2,ID2 = np.unique(df['Status'],return_inverse=True)
....: # Get linear indices/tags corresponding to grouped headers
....: tag = ID1*(ID2.max()+1) + ID2
....: # Setup 2D Numpy array equivalent of expected Dataframe
....: out = np.zeros((len(unq1),len(unq2)),dtype=int)
....: unqID, count = np.unique(tag,return_counts=True)
....: np.put(out,unqID,count)
....: # Finally convert to Dataframe
....: df_out = pd.DataFrame(out,columns=unq2)
....: df_out.index = unq1
....:
1 loop, best of 3: 2.25 s per loop
Conclusion: ayhan's solution currently wins:
(df.assign(ones=np.ones(len(df)))
   .pivot_table(index='Event', columns='Status', values='ones',
                aggfunc=np.sum, fill_value=0)
)
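As a side note, on pandas versions that accept string aggregator names, the helper column can be skipped entirely by counting cell sizes directly (the second question below uses the same aggfunc='size' idea); a sketch:
# aggfunc='size' counts the rows falling into each (Event, Status) cell
df.pivot_table(index='Event', columns='Status', aggfunc='size', fill_value=0)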
Here's a NumPy-based approach:
import numpy as np
import pandas as pd

# Get the unique labels and inverse indices for both columns
unq1, ID1 = np.unique(df['Event'], return_inverse=True)
unq2, ID2 = np.unique(df['Status'], return_inverse=True)
# Get linear indices/tags corresponding to grouped headers
tag = ID1 * (ID2.max() + 1) + ID2
# Set up a 2D NumPy array equivalent of the expected DataFrame
out = np.zeros((len(unq1), len(unq2)), dtype=int)
unqID, count = np.unique(tag, return_counts=True)
np.put(out, unqID, count)
# Finally, convert to a DataFrame
df_out = pd.DataFrame(out, columns=unq2)
df_out.index = unq1
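The same counting step can also be written with np.bincount, which avoids the second np.unique pass; a sketch, reusing tag, unq1 and unq2 from the block above:
# bincount over the linear tags yields one count per (Event, Status) cell;
# minlength pads cells that never occur with zeros
out = np.bincount(tag, minlength=len(unq1) * len(unq2))
out = out.reshape(len(unq1), len(unq2))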
Sample input and output for a more general case:
In [179]: df
Out[179]:
Event Status
0 Sit PASS
1 Run SUCCESS
2 Walk SUCCESS
3 Run PASS
4 Run SUCCESS
5 Walk FAILED
6 Walk PASS
In [180]: df_out
Out[180]:
FAILED PASS SUCCESS
Run 0 1 2
Sit 0 1 0
Walk 1 1 1
Related
I have the following DataFrame:
df = pd.DataFrame([
(1, 1, 'term1'),
(1, 2, 'term2'),
(1, 1, 'term1'),
(1, 1, 'term2'),
(2, 2, 'term3'),
(2, 3, 'term1'),
(2, 2, 'term1')
], columns=['id', 'group', 'term'])
I want to group it by id and group and count the occurrences of each term for each (id, group) pair.
So in the end I am going to get something like this:
term      term1  term2  term3
id group
1  1          2      1      0
   2          0      1      0
2  2          1      0      1
   3          1      0      0
I was able to achieve what I want by looping over all the rows with df.iterrows() and creating a new DataFrame, but this is clearly inefficient. (If it helps, I know the list of all terms beforehand and there are ~10 of them.)
It looks like I have to group by and then count values, so I tried df.groupby(['id', 'group']).value_counts(), which does not work because value_counts operates on the grouped Series, not on a DataFrame.
Is there any way I can achieve this without looping?
I use groupby and size:
df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
Timing, with 1,000,000 rows:
df = pd.DataFrame(dict(id=np.random.choice(100, 1000000),
group=np.random.choice(20, 1000000),
term=np.random.choice(10, 1000000)))
Using the pivot_table() method:
In [22]: df.pivot_table(index=['id','group'], columns='term', aggfunc='size', fill_value=0)
Out[22]:
term term1 term2 term3
id group
1 1 2 1 0
2 0 1 0
2 2 1 0 1
3 1 0 0
Timing against a 700K-row DataFrame (rebuilt by replicating the 7-row sample):
In [24]: df = pd.concat([df] * 10**5, ignore_index=True)
In [25]: df.shape
Out[25]: (700000, 3)
In [3]: %timeit df.groupby(['id', 'group', 'term'])['term'].size().unstack(fill_value=0)
1 loop, best of 3: 226 ms per loop
In [4]: %timeit df.pivot_table(index=['id','group'], columns='term', aggfunc='size', fill_value=0)
1 loop, best of 3: 236 ms per loop
In [5]: %timeit pd.crosstab([df.id, df.group], df.term)
1 loop, best of 3: 355 ms per loop
In [6]: %timeit df.groupby(['id','group','term'])['term'].size().unstack().fillna(0).astype(int)
1 loop, best of 3: 232 ms per loop
In [7]: %timeit df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
1 loop, best of 3: 231 ms per loop
Timing against a 7M-row DataFrame:
In [9]: df = pd.concat([df] * 10, ignore_index=True)
In [10]: df.shape
Out[10]: (7000000, 3)
In [11]: %timeit df.groupby(['id', 'group', 'term'])['term'].size().unstack(fill_value=0)
1 loop, best of 3: 2.27 s per loop
In [12]: %timeit df.pivot_table(index=['id','group'], columns='term', aggfunc='size', fill_value=0)
1 loop, best of 3: 2.3 s per loop
In [13]: %timeit pd.crosstab([df.id, df.group], df.term)
1 loop, best of 3: 3.37 s per loop
In [14]: %timeit df.groupby(['id','group','term'])['term'].size().unstack().fillna(0).astype(int)
1 loop, best of 3: 2.28 s per loop
In [15]: %timeit df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
1 loop, best of 3: 1.89 s per loop
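One more knob that often helps at this scale, though it isn't timed here, is casting the key columns to categorical before grouping; a sketch:
# Categorical keys typically make large groupbys cheaper;
# observed=True keeps only (id, group, term) combinations that actually occur
cat = df.astype({'id': 'category', 'group': 'category', 'term': 'category'})
cat.groupby(['id', 'group', 'term'], observed=True).size().unstack(fill_value=0)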
Instead of remembering lengthy solutions, how about the one pandas has built in for you:
df.groupby(['id', 'group', 'term']).count()
(Caveat: grouping by all three columns leaves no remaining column to count, so you may want .size() here instead, as in the answers above.)
You can use crosstab:
print(pd.crosstab([df.id, df.group], df.term))
term term1 term2 term3
id group
1 1 2 1 0
2 0 1 0
2 2 1 0 1
3 1 0 0
Another solution: groupby with size aggregation, then reshape with unstack:
df.groupby(['id', 'group', 'term'])['term'].size().unstack(fill_value=0)
term term1 term2 term3
id group
1 1 2 1 0
2 0 1 0
2 2 1 0 1
3 1 0 0
Timings:
df = pd.concat([df]*10000).reset_index(drop=True)
In [48]: %timeit (df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0))
100 loops, best of 3: 12.4 ms per loop
In [49]: %timeit (df.groupby(['id', 'group', 'term'])['term'].size().unstack(fill_value=0))
100 loops, best of 3: 12.2 ms per loop
If you want to use value_counts, you can use it on a given Series, and resort to the following:
df.groupby(["id", "group"])["term"].value_counts().unstack(fill_value=0)
or in an equivalent fashion, using the .agg method:
df.groupby(["id", "group"]).agg({"term": "value_counts"}).unstack(fill_value=0)
Another option is to directly use value_counts on the DataFrame itself without resorting to groupby:
df.value_counts().unstack(fill_value=0)
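For reference, DataFrame.value_counts only exists on reasonably recent pandas (it was added in 1.1); on the sample frame it yields the same table:
# Requires pandas >= 1.1; value_counts groups over all columns,
# then unstack pivots the trailing 'term' level into columns
print(df.value_counts().unstack(fill_value=0))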
Another alternative:
df.assign(count=1).groupby(['id', 'group', 'term']).sum().unstack(fill_value=0).xs('count', axis=1)
term term1 term2 term3
id group
1 1 2 1 0
2 0 1 0
2 2 1 0 1
3 1 0 0
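The xs step can also be avoided by naming the helper column as the pivot_table values; a sketch of an equivalent spelling:
# Summing a column of ones per (id, group, term) cell is the same count
df.assign(count=1).pivot_table(index=['id', 'group'], columns='term',
                               values='count', aggfunc='sum', fill_value=0)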