Unroll a matrix in Pandas - python

I've got a matrix like this:
df = pd.DataFrame({'a':[7, 0, 3], 'b':[0, 4, 2], 'c':[3, 2, 9]})
df.index = list(df)
df
a b c
a 7 0 3
b 0 4 2
c 3 2 9
And I'd like to get something like this:
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9
For which I've written the following code:
vv = pd.DataFrame(columns=['C1', 'C2', 'V'])
i = 0
for cat1 in df.index:
for cat2 in df.index:
vv.loc[i] = [cat1, cat2, d[cat1][cat2]]
i += 1
vv['V'] = vv['V'].astype(int)
Is there a better/faster/more elegant way of doing this?

In [90]: df = df.stack().reset_index()
In [91]: df.columns = ['C1', 'C2', 'v']
In [92]: df
Out[92]:
C1 C2 v
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9

YOu can use the stack() method followed by resetting the index and renaming the columns.
df = pd.DataFrame({'a':[7, 0, 3], 'b':[0, 4, 2], 'c':[3, 2, 9]})
df.index = list(df)
result = df.stack().reset_index().rename(columns={'level_0':'C1', 'level_1':'C2',0:'V'})
print(result)
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9

Use:
df = (df.rename_axis('C2')
.reset_index()
.melt('C2', var_name='C1', value_name='V')
.reindex(columns=['C1','C2','V']))
print (df)
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9

You can use stack:
df.stack()
a a 7
b 0
c 3
b a 0
b 4
c 2
c a 3
b 2
c 9
dtype: int64
The pd.set_option('display.multi_sparse', False) will desparsen the series, showing the values in every row
Additionally, with proper renaming in a pipeline
df.stack()
.reset_index()
.rename(columns={'level_0': 'C1', 'level_1': 'C2', 0:'V'})
yields:
C1 C2 V
0 a a 7
1 a b 0
2 a c 3
3 b a 0
4 b b 4
5 b c 2
6 c a 3
7 c b 2
8 c c 9

To complete the answer and get the same output, I've added the following code:
vv = df.stack().reset_index()
vv.columns = ['C1', 'C2', 'V']

Related

Pandas generate numeric sequence for groups in new column

I am working on a data frame as below,
import pandas as pd
df=pd.DataFrame({'A':['A','A','A','B','B','C','C','C','C'],
'B':['a','a','b','a','b','a','b','c','c'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B b
5 C a
6 C b
7 C c
8 C c
I want to create a new column with the sequence value for Column B subgroups based on Column A groups like below
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 1
4 B b 2
5 C a 3
6 C b 1
7 C c 2
8 C c 2
I tried this , but does not give me desired output
df['C'] = df.groupby(['A','B']).cumcount()+1
IIUC, I think you want something like this:
df['C'] = df.groupby('A')['B'].transform(lambda x: (x != x.shift()).cumsum())
Output:
A B C
0 A a 1
1 A a 1
2 A b 2
3 B a 1
4 B b 2
5 C c 1
6 C b 2
7 C c 3
8 C c 3

Rank by group after sorting in pandas

I have a dataframe which looks like this
pd.DataFrame({'A': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
...: 'B': ['C1', 'C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C2'],
...: 'X': [1, 2, 1, 2, 2, 3, 4, 5],
...: 'Y': [2, 1, 2, 2, 7, 5, 7, 7],
...: 'Z': [2, 1, 2, 1, 5, 8, 1, 9]})
Out[10]:
A B X Y Z
0 A C1 1 2 2
1 B C1 2 1 1
2 C C1 1 2 2
3 D C1 2 2 1
4 E C2 2 7 5
5 F C2 3 5 8
6 G C2 4 7 1
7 H C2 5 7 9
I need to sort the dataframe by columns B, X, Y, Z and then rank within each group of B.
Resulting dataframe should look like this.
Out[12]:
A B X Y Z R
1 B C1 2 1 1 1
3 D C1 2 2 1 2
0 A C1 1 2 2 3
2 C C1 1 2 2 4
6 G C2 4 7 1 1
5 F C2 3 5 2 2
4 E C2 2 1 5 3
7 H C2 5 7 9 4
I know I can use df.sort_values(['B', 'Z', 'Y', 'X']) to bring in right order but struggling to apply the rank.
what is the 1 line of code for sorting and ranking?
You can use groupby().cumcount():
df['R'] = df.sort_values(['B','X','Y','Z']).groupby('B').cumcount() + 1
Output:
A B X Y Z R
0 A C1 1 2 2 3
1 B C1 2 1 1 1
2 C C1 1 2 2 4
3 D C1 2 2 1 2
4 E C2 2 7 5 2
5 F C2 3 5 8 3
6 G C2 4 7 1 1
7 H C2 5 7 9 4
To match your output, separate sort_values and groupby():
df = df.sort_values(['B','Z','Y','X'])
df['R'] = df.groupby('B').cumcount() + 1
Output:
A B X Y Z R
1 B C1 2 1 1 1
3 D C1 2 2 1 2
0 A C1 1 2 2 3
2 C C1 1 2 2 4
6 G C2 4 7 1 1
4 E C2 2 7 5 2
5 F C2 3 5 8 3
7 H C2 5 7 9 4

Get values and column names

I have a pandas data frame that looks something like this:
data = {'1' : [0, 2, 0, 0], '2' : [5, 0, 0, 2], '3' : [2, 0, 0, 0], '4' : [0, 7, 0, 0]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd'])
df
1 2 3 4
a 0 5 2 0
b 2 0 0 7
c 0 0 0 0
d 0 2 0 0
I know I can get the maximum value and the corresponding column name for each row by doing (respectively):
df.max(1)
df.idxmax(1)
How can I get the values and the column name for every cell that is not zero?
So in this case, I'd want 2 tables, one giving me each value != 0 for each row:
a 5
a 2
b 2
b 7
d 2
And one giving me the column names for those values:
a 2
a 3
b 1
b 4
d 2
Thanks!
You can use stack for Series, then filter by boolean indexing, rename_axis, reset_index and last drop column or select columns by subset:
s = df.stack()
df1 = s[s!= 0].rename_axis(['a','b']).reset_index(name='c')
print (df1)
a b c
0 a 2 5
1 a 3 2
2 b 1 2
3 b 4 7
4 d 2 2
df2 = df1.drop('b', axis=1)
print (df2)
a c
0 a 5
1 a 2
2 b 2
3 b 7
4 d 2
df3 = df1.drop('c', axis=1)
print (df3)
a b
0 a 2
1 a 3
2 b 1
3 b 4
4 d 2
df3 = df1[['a','c']]
print (df3)
a c
0 a 5
1 a 2
2 b 2
3 b 7
4 d 2
df3 = df1[['a','b']]
print (df3)
a b
0 a 2
1 a 3
2 b 1
3 b 4
4 d 2

Merge and split columns in pandas dataframe

I want to know how to merge multiple columns, and split them again.
Input data
A B C
1 3 5
2 4 6
Merge A, B, C to one column X
X
1
2
3
4
5
6
Process something with X, then split X into A, B, C again. The number of rows for A, B, C is same(2).
A B C
1 3 5
2 4 6
Is there any simple way for this work?
Start with df:
A B C
0 1 3 5
1 2 4 6
Next, get all values in one column:
df2 = df.unstack().reset_index(drop=True).rename('X').to_frame()
print(df2)
X
0 1
1 2
2 3
3 4
4 5
5 6
And, convert back to original shape:
df3 = pd.DataFrame(df2.values.reshape(2,-1, order='F'), columns=list('ABC'))
print(df3)
A B C
0 1 3 5
1 2 4 6
Setup
df=pd.DataFrame({'A': {0: 1, 1: 2}, 'B': {0: 3, 1: 4}, 'C': {0: 5, 1: 6}})
df
Out[684]:
A B C
0 1 3 5
1 2 4 6
Solution
Merge df to 1 column:
df2 = pd.DataFrame(df.values.flatten('F'),columns=['X'])
Out[686]:
X
0 1
1 2
2 3
3 4
4 5
5 6
Split it back to 3 columns:
pd.DataFrame(df2.values.reshape(-1,3,order='F'),columns=['A','B','C'])
Out[701]:
A B C
0 1 3 5
1 2 4 6
un unwind in the way you'd like, you need to either unstack or ravel with order='F'
Option 1
def proc1(df):
v = df.values
s = v.ravel('F')
s = s * 2
return pd.DataFrame(s.reshape(v.shape, order='F'), df.index, df.columns)
proc1(df)
A B C
0 2 6 10
1 4 8 12
Option 2
def proc2(df):
return df.unstack().mul(2).unstack(0)
proc2(df)
A B C
0 2 6 10
1 4 8 12

Pandas number rows within group in increasing order

Given the following data frame:
import pandas as pd
import numpy as np
df=pd.DataFrame({'A':['A','A','A','B','B','B'],
'B':['a','a','b','a','a','a'],
})
df
A B
0 A a
1 A a
2 A b
3 B a
4 B a
5 B a
I'd like to create column 'C', which numbers the rows within each group in columns A and B like this:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
I've tried this so far:
df['C']=df.groupby(['A','B'])['B'].transform('rank')
...but it doesn't work!
Use groupby/cumcount:
In [25]: df['C'] = df.groupby(['A','B']).cumcount()+1; df
Out[25]:
A B C
0 A a 1
1 A a 2
2 A b 1
3 B a 1
4 B a 2
5 B a 3
Use groupby.rank function.
Here the working example.
df = pd.DataFrame({'C1':['a', 'a', 'a', 'b', 'b'], 'C2': [1, 2, 3, 4, 5]})
df
C1 C2
a 1
a 2
a 3
b 4
b 5
df["RANK"] = df.groupby("C1")["C2"].rank(method="first", ascending=True)
df
C1 C2 RANK
a 1 1
a 2 2
a 3 3
b 4 1
b 5 2

Categories