Unroll a matrix in Pandas

I've got a matrix like this:
df = pd.DataFrame({'a':[7, 0, 3], 'b':[0, 4, 2], 'c':[3, 2, 9]})
df.index = list(df)
df
a b c
a 7 0 3
b 0 4 2
c 3 2 9
And I'd like to get something like this:
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9
To do this, I've written the following code:
vv = pd.DataFrame(columns=['C1', 'C2', 'V'])
i = 0
for cat1 in df.index:
    for cat2 in df.index:
        vv.loc[i] = [cat1, cat2, df[cat1][cat2]]
        i += 1
vv['V'] = vv['V'].astype(int)
Is there a better/faster/more elegant way of doing this?

In [90]: df = df.stack().reset_index()
In [91]: df.columns = ['C1', 'C2', 'v']
In [92]: df
Out[92]:
C1 C2 v
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9

You can use the stack() method, then reset the index and rename the columns.
df = pd.DataFrame({'a':[7, 0, 3], 'b':[0, 4, 2], 'c':[3, 2, 9]})
df.index = list(df)
result = df.stack().reset_index().rename(columns={'level_0':'C1', 'level_1':'C2',0:'V'})
print(result)
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9
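A slightly shorter variant of the same idea, offered as a sketch: naming the two index levels with rename_axis before reset_index makes the columns come out as C1, C2 and V directly, so there is no need to map the default level_0/level_1/0 names with rename(columns=...).

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 0, 3], 'b': [0, 4, 2], 'c': [3, 2, 9]})
df.index = list(df)

# Name the two index levels first; reset_index then creates C1 and C2 directly,
# and name='V' sets the name of the value column.
result = df.stack().rename_axis(['C1', 'C2']).reset_index(name='V')
print(result)
```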

Use:
df = (df.rename_axis('C2')
.reset_index()
.melt('C2', var_name='C1', value_name='V')
.reindex(columns=['C1','C2','V']))
print (df)
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9
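One thing to watch with the melt version: it takes C1 from the original column labels and C2 from the row labels, whereas stack takes C1 from the rows. That makes no difference for the symmetric matrix in the question, but for a non-symmetric matrix the values differ. The sketch below uses a small hypothetical 2x2 matrix and a melt variant that keeps the row labels in C1, so after sorting it lines up with the stack result.

```python
import pandas as pd

# Hypothetical non-symmetric matrix so the row/column roles are visible
m = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['a', 'b'])

# Keep the row labels as C1 and melt the column labels into C2,
# so V == m.loc[C1, C2], the same orientation as stack()
long = (m.rename_axis('C1')
         .reset_index()
         .melt('C1', var_name='C2', value_name='V')
         .sort_values(['C1', 'C2'], ignore_index=True))

stacked = m.stack().rename_axis(['C1', 'C2']).reset_index(name='V')
print(long)
print(stacked)
```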

You can use stack:
df.stack()
a  a    7
   b    0
   c    3
b  a    0
   b    4
   c    2
c  a    3
   b    2
   c    9
dtype: int64
Setting pd.set_option('display.multi_sparse', False) turns off the sparsified display of the MultiIndex, so the outer label is repeated on every row.
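The option can also be applied only temporarily with pd.option_context, which restores the previous setting when the with block exits. A minimal sketch, reusing the matrix from the question:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 0, 3], 'b': [0, 4, 2], 'c': [3, 2, 9]})
df.index = list(df)
s = df.stack()

# Switch off sparsified MultiIndex display only inside this block
with pd.option_context('display.multi_sparse', False):
    dense_text = s.to_string()

# Every line now shows both index labels plus the value
print(dense_text)
```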
Additionally, with proper renaming in a pipeline (wrapped in parentheses so the chain can span several lines):
(df.stack()
   .reset_index()
   .rename(columns={'level_0': 'C1', 'level_1': 'C2', 0: 'V'}))
yields:
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9

To complete the answer and get the same output, I've added the following code:
vv = df.stack().reset_index()
vv.columns = ['C1', 'C2', 'V']

Related

Pandas generate numeric sequence for groups in new column

I am working on a data frame as below,
import pandas as pd
df=pd.DataFrame({'A':['A','A','A','B','B','C','C','C','C'],
'B':['a','a','b','a','b','a','b','c','c'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B b
5 C a
6 C b
7 C c
8 C c
I want to create a new column that numbers the Column B subgroups within each Column A group, like below:
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 1
4 B b 2
5 C a 3
6 C b 1
7 C c 2
8 C c 2
I tried this, but it does not give me the desired output:
df['C'] = df.groupby(['A','B']).cumcount()+1

IIUC, I think you want something like this:
df['C'] = df.groupby('A')['B'].transform(lambda x: (x != x.shift()).cumsum())
Output:
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 1
4 B b 2
5 C a 1
6 C b 2
7 C c 3
8 C c 3
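The shift/cumsum answer starts a new number whenever B changes value. If a B value can reappear later in the same A group and should get its original number back, a different technique (swapped in here, not from the answer) is pd.factorize, which numbers distinct values by first appearance. A sketch comparing both on the question's data, plus a hypothetical three-row Series where they differ:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'C'],
                   'B': ['a', 'a', 'b', 'a', 'b', 'a', 'b', 'c', 'c']})

# Run-based numbering (the answer above): +1 each time B changes within A
df['C_runs'] = df.groupby('A')['B'].transform(lambda x: (x != x.shift()).cumsum())

# Appearance-based numbering: pd.factorize returns 0-based codes in order of
# first appearance, so a repeated value reuses its earlier number
df['C_first'] = df.groupby('A')['B'].transform(lambda x: pd.factorize(x)[0] + 1)
print(df)

# Hypothetical case where the two approaches disagree
demo = pd.Series(['a', 'b', 'a'])
runs = (demo != demo.shift()).cumsum().tolist()
first = (pd.factorize(demo)[0] + 1).tolist()
print(runs, first)
```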

Rank by group after sorting in pandas

I have a dataframe which looks like this
pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
              'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
              'X': [1, 2, 1, 2, 2, 3, 4, 5],
              'Y': [2, 1, 2, 2, 7, 5, 7, 7],
              'Z': [2, 1, 2, 1, 5, 8, 1, 9]})
Out[10]:
A B X Y Z
0 A C1 1 2 2
1 B C1 2 1 1
2 C C1 1 2 2
3 D C1 2 2 1
4 E C2 2 7 5
5 F C2 3 5 8
6 G C2 4 7 1
7 H C2 5 7 9
I need to sort the dataframe by columns B, X, Y, Z and then rank within each group of B.
Resulting dataframe should look like this.
Out[12]:
A B X Y Z R
1 B C1 2 1 1 1
3 D C1 2 2 1 2
0 A C1 1 2 2 3
2 C C1 1 2 2 4
6 G C2 4 7 1 1
5 F C2 3 5 2 2
4 E C2 2 1 5 3
7 H C2 5 7 9 4
I know I can use df.sort_values(['B', 'Z', 'Y', 'X']) to get the right order, but I'm struggling to apply the rank.
What is the one line of code for sorting and ranking?

You can use groupby().cumcount():
df['R'] = df.sort_values(['B','Z','Y','X']).groupby('B').cumcount() + 1
Output:
A B X Y Z R
0 A C1 1 2 2 3
1 B C1 2 1 1 1
2 C C1 1 2 2 4
3 D C1 2 2 1 2
4 E C2 2 7 5 2
5 F C2 3 5 8 3
6 G C2 4 7 1 1
7 H C2 5 7 9 4
To also get the rows in sorted order, as in your expected output, separate sort_values and groupby():
df = df.sort_values(['B','Z','Y','X'])
df['R'] = df.groupby('B').cumcount() + 1
Output:
A B X Y Z R
1 B C1 2 1 1 1
3 D C1 2 2 1 2
0 A C1 1 2 2 3
2 C C1 1 2 2 4
6 G C2 4 7 1 1
4 E C2 2 7 5 2
5 F C2 3 5 8 3
7 H C2 5 7 9 4
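Since the question asks for a single line, one way to get both the sorted order and the rank, sketched here, is to chain assign after sort_values. assign calls the lambda with the already sorted frame, so cumcount numbers the rows in sorted order within each B group, and the result keeps that order.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
                   'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
                   'X': [1, 2, 1, 2, 2, 3, 4, 5],
                   'Y': [2, 1, 2, 2, 7, 5, 7, 7],
                   'Z': [2, 1, 2, 1, 5, 8, 1, 9]})

# Sort, then rank within each B group in one chained expression;
# the lambda receives the sorted frame, so cumcount follows the sorted order
out = (df.sort_values(['B', 'Z', 'Y', 'X'])
         .assign(R=lambda d: d.groupby('B').cumcount() + 1))
print(out)
```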

Get values and column names

I have a pandas data frame that looks something like this:
data = {'1' : [0, 2, 0, 0], '2' : [5, 0, 0, 2], '3' : [2, 0, 0, 0], '4' : [0, 7, 0, 0]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd'])
df
1 2 3 4
a 0 5 2 0
b 2 0 0 7
c 0 0 0 0
d 0 2 0 0
I know I can get the maximum value and the corresponding column name for each row by doing (respectively):
df.max(1)
df.idxmax(1)
How can I get the values and the column name for every cell that is not zero?
So in this case, I'd want 2 tables, one giving me each value != 0 for each row:
a 5
a 2
b 2
b 7
d 2
And one giving me the column names for those values:
a 2
a 3
b 1
b 4
d 2
Thanks!

You can use stack to get a Series, filter it with boolean indexing, then use rename_axis and reset_index; finally, drop a column or select a subset of columns:
s = df.stack()
df1 = s[s!= 0].rename_axis(['a','b']).reset_index(name='c')
print (df1)
a b c
0 a 2 5
1 a 3 2
2 b 1 2
3 b 4 7
4 d 2 2
df2 = df1.drop('b', axis=1)
print (df2)
a c
0 a 5
1 a 2
2 b 2
3 b 7
4 d 2
df3 = df1.drop('c', axis=1)
print (df3)
a b
0 a 2
1 a 3
2 b 1
3 b 4
4 d 2
df3 = df1[['a','c']]
print (df3)
a c
0 a 5
1 a 2
2 b 2
3 b 7
4 d 2
df3 = df1[['a','b']]
print (df3)
a b
0 a 2
1 a 3
2 b 1
3 b 4
4 d 2
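A NumPy-based alternative (a swapped-in technique, not the approach used above) is np.nonzero on the underlying array, which returns the row and column positions of the non-zero cells in row-major order; those positions can then be mapped back to labels. The output column names row, col and value are illustrative only.

```python
import numpy as np
import pandas as pd

data = {'1': [0, 2, 0, 0], '2': [5, 0, 0, 2], '3': [2, 0, 0, 0], '4': [0, 7, 0, 0]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd'])

arr = df.to_numpy()
# Positions (row, column) of every non-zero cell, in row-major order
rows, cols = np.nonzero(arr)

# Map positions back to index/column labels and pick out the values
out = pd.DataFrame({'row': df.index[rows],
                    'col': df.columns[cols],
                    'value': arr[rows, cols]})
print(out)
```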

Merge and split columns in pandas dataframe

I want to know how to merge multiple columns, and split them again.
Input data
A B C
1 3 5
2 4 6
Merge A, B, C into one column X:
X
1
2
3
4
5
6
Process X somehow, then split X back into A, B, C. A, B and C all have the same number of rows (2).
A B C
1 3 5
2 4 6
Is there a simple way to do this?

Start with df:
A B C
0 1 3 5
1 2 4 6
Next, get all values in one column:
df2 = df.unstack().reset_index(drop=True).rename('X').to_frame()
print(df2)
X
0 1
1 2
2 3
3 4
4 5
5 6
Then convert back to the original shape:
df3 = pd.DataFrame(df2.values.reshape(2,-1, order='F'), columns=list('ABC'))
print(df3)
A B C
0 1 3 5
1 2 4 6

Setup
df=pd.DataFrame({'A': {0: 1, 1: 2}, 'B': {0: 3, 1: 4}, 'C': {0: 5, 1: 6}})
df
Out[684]:
A B C
0 1 3 5
1 2 4 6
Solution
Merge df into one column:
df2 = pd.DataFrame(df.values.flatten('F'),columns=['X'])
Out[686]:
X
0 1
1 2
2 3
3 4
4 5
5 6
Split it back into 3 columns:
pd.DataFrame(df2.values.reshape(-1,3,order='F'),columns=['A','B','C'])
Out[701]:
A B C
0 1 3 5
1 2 4 6

To unwind the frame in the way you'd like, you need to either unstack or ravel with order='F'.
Option 1
def proc1(df):
    v = df.values
    s = v.ravel('F')  # column-major: all of A, then B, then C
    s = s * 2         # example processing step on the merged column
    return pd.DataFrame(s.reshape(v.shape, order='F'), df.index, df.columns)
proc1(df)
A B C
0 2 6 10
1 4 8 12
Option 2
def proc2(df):
    return df.unstack().mul(2).unstack(0)
proc2(df)
A B C
0 2 6 10
1 4 8 12
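If the row and column labels should survive the round trip without relying on the reshape order, a label-based alternative (not from the answers above) is to melt into a long frame that records where each value came from, process X, and pivot back. The column name col and the doubling step are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Merge: one long column X; ignore_index=False keeps the original row labels,
# and 'col' records which original column each value came from
long = df.melt(var_name='col', value_name='X', ignore_index=False)

long['X'] = long['X'] * 2  # example processing step

# Split: pivot back to wide form using the saved row/column labels,
# then drop the axis names that pivot leaves behind
back = (long.reset_index()
            .pivot(index='index', columns='col', values='X')
            .rename_axis(index=None, columns=None))
print(back)
```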

Pandas number rows within group in increasing order

Given the following data frame:
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B'],
'B':['a','a','b','a','a','a'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B a
5 B a
I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
I've tried this so far:
df['C']=df.groupby(['A','B'])['B'].transform('rank')
...but it doesn't work!

Use groupby/cumcount:
In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
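cumcount numbers the rows of each group in their current order, even when the rows of a group are not next to each other. A small sketch with hypothetical interleaved data:

```python
import pandas as pd

# Hypothetical data where rows of the same (A, B) group are not adjacent
df = pd.DataFrame({'A': ['A', 'B', 'A', 'B', 'A'],
                   'B': ['a', 'a', 'a', 'a', 'b']})

# cumcount() gives 0, 1, 2, ... within each group in row order; +1 makes it 1-based
df['C'] = df.groupby(['A', 'B']).cumcount() + 1
print(df)
```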

Use the GroupBy.rank method.
Here is a working example:
df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df
C1 C2
0 a 1
1 a 2
2 a 3
3 b 4
4 b 5
df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df
C1 C2 RANK
0 a 1 1.0
1 a 2 2.0
2 a 3 3.0
3 b 4 1.0
4 b 5 2.0
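GroupBy.rank returns floats (1.0, 2.0, ...). A short sketch that casts the result to int and checks that, with method='first', it matches cumcount() + 1 when C2 is already ascending within each group:

```python
import pandas as pd

df = pd.DataFrame({'C1': ['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})

# rank() returns float64; cast to int for integer positions
df['RANK'] = df.groupby('C1')['C2'].rank(method='first', ascending=True).astype(int)

# C2 is already ascending within each group here, so this equals cumcount() + 1
assert (df['RANK'] == df.groupby('C1').cumcount() + 1).all()
print(df)
```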
