Split matrix in Python into square matrices?

Is there a quick and easy way to split an MxN matrix into matrices of size AxA (square matrices), starting greedily from the top left, in Python specifically? I have a 2D numpy array.
For example
1 2 3 4
6 7 8 9
1 2 3 4
6 7 8 9
0 0 0 0
If I want to split it into 2x2 blocks, the outcome should be a list like:
1 2
6 7
3 4
8 9
1 2
6 7
3 4
8 9
(Notice the 0 0 0 0 at the bottom gets left out)
Is there a "clean" way to write this? I can write it in brute force but it is not at all pretty.

You can do this in one line with numpy:
import numpy as np

test = np.arange(35).reshape(5, 7)
M, N = test.shape
A = 2
print(test)
print('\n')
split_test = test[0:M-M%A, 0:N-N%A].reshape(M//A, A, -1, A).swapaxes(1, 2).reshape(-1, A, A)
print(split_test)
The output of the above code is:
[[ 0 1 2 3 4 5 6]
[ 7 8 9 10 11 12 13]
[14 15 16 17 18 19 20]
[21 22 23 24 25 26 27]
[28 29 30 31 32 33 34]]
[[[ 0 1]
[ 7 8]]
[[ 2 3]
[ 9 10]]
[[ 4 5]
[11 12]]
[[14 15]
[21 22]]
[[16 17]
[23 24]]
[[18 19]
[25 26]]]
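Applied to the 5x4 matrix from the question, the same one-liner produces exactly the desired list of blocks (the all-zero row is trimmed away because it doesn't fill a complete 2x2 block):

```python
import numpy as np

# the 5x4 matrix from the question; the all-zero row doesn't fit a 2x2 block
a = np.array([[1, 2, 3, 4],
              [6, 7, 8, 9],
              [1, 2, 3, 4],
              [6, 7, 8, 9],
              [0, 0, 0, 0]])
M, N = a.shape
A = 2
# trim to multiples of A, then carve into A x A blocks, left-to-right, top-to-bottom
blocks = a[:M - M % A, :N - N % A].reshape(M // A, A, -1, A).swapaxes(1, 2).reshape(-1, A, A)
print(blocks)
```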

If you are OK with using scikit-image:
import numpy as np
from skimage.util import view_as_windows

a = np.r_[np.add.outer((1, 6, 1, 6), range(4)), [[0, 0, 0, 0]]]
sz = 2, 2
view_as_windows(a, sz, sz)
# array([[[[1, 2],
# [6, 7]],
#
# [[3, 4],
# [8, 9]]],
#
#
# [[[1, 2],
# [6, 7]],
#
# [[3, 4],
# [8, 9]]]])

Related

Reshape data frame, so the index column values become the columns

I want to reshape the data so that the values in the index column become the columns
My Data frame:
Gender_Male Gender_Female Location_london Location_North Location_South
Cat
V 5 4 4 2 3
W 15 12 12 7 8
X 11 15 16 4 6
Y 22 18 21 9 9
Z 8 7 7 4 4
Desired Data frame:
V W X Y Z
Gender Male 5 15 11 22 8
Female 4 12 15 18 7
Location london 4 12 16 21 7
North 2 7 4 9 4
South 3 8 6 9 4
Is there an easy way to do this? I also have 9 other categorical variables in my data set in addition to the Gender and Location variables. I have only included two variables to keep the example simple.
Code to create the example dataframe:
import pandas as pd

df1 = pd.DataFrame({
    'Cat': ['V', 'W', 'X', 'Y', 'Z'],
    'Gender_Male': [5, 15, 11, 22, 8],
    'Gender_Female': [4, 12, 15, 18, 7],
    'Location_london': [4, 12, 16, 21, 7],
    'Location_North': [2, 7, 4, 9, 4],
    'Location_South': [3, 8, 6, 9, 4]
}).set_index('Cat')
df1
You can transpose the dataframe and then split and set the new index:
Transpose
dft = df1.T
print(dft)
Cat V W X Y Z
Gender_Male 5 15 11 22 8
Gender_Female 4 12 15 18 7
Location_london 4 12 16 21 7
Location_North 2 7 4 9 4
Location_South 3 8 6 9 4
Split and set the new index
dft.index = dft.index.str.split('_', expand=True)
dft.columns.name = None
print(dft)
V W X Y Z
Gender Male 5 15 11 22 8
Female 4 12 15 18 7
Location london 4 12 16 21 7
North 2 7 4 9 4
South 3 8 6 9 4
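Both steps combined into one runnable sketch:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Cat': ['V', 'W', 'X', 'Y', 'Z'],
    'Gender_Male': [5, 15, 11, 22, 8],
    'Gender_Female': [4, 12, 15, 18, 7],
    'Location_london': [4, 12, 16, 21, 7],
    'Location_North': [2, 7, 4, 9, 4],
    'Location_South': [3, 8, 6, 9, 4]
}).set_index('Cat')

dft = df1.T                                        # categories become rows
dft.index = dft.index.str.split('_', expand=True)  # 'Gender_Male' -> ('Gender', 'Male')
dft.columns.name = None                            # drop the leftover 'Cat' label
print(dft)
```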

Shuffle columns in a matrix in Python

Is there a way to randomly permute the columns of a matrix? I tried np.random.permutation, but the result is not what I need.
What I would like is to randomly change the position of the matrix's columns, without changing the order of the values within each column.
E.g.
starting matrix:
1 6 11 16
2 7 12 17
3 8 13 18
4 9 14 19
5 10 15 20
Resulting matrix:
11 6 1 16
12 7 2 17
13 8 3 18
14 9 4 19
15 10 5 20
You could shuffle the transposed array:
import numpy as np

q = np.array([1, 6, 11, 16, 2, 7, 12, 17, 3, 8, 13, 18, 4, 9, 14, 19, 5, 10, 15, 20])
q = q.reshape((5,4))
print(q)
# [[ 1 6 11 16]
# [ 2 7 12 17]
# [ 3 8 13 18]
# [ 4 9 14 19]
# [ 5 10 15 20]]
# np.transpose returns a view, so shuffling its rows reorders q's columns in place
np.random.shuffle(np.transpose(q))
print(q)
# [[ 1 16 6 11]
# [ 2 17 7 12]
# [ 3 18 8 13]
# [ 4 19 9 14]
# [ 5 20 10 15]]
Another option, which generalizes to any axis, is indexing with a random permutation:
import numpy as np

q = np.array([1, 6, 11, 16, 2, 7, 12, 17, 3, 8, 13, 18, 4, 9, 14, 19, 5, 10, 15, 20])
q = q.reshape((5,4))
q = q[:, np.random.permutation(q.shape[1])]
print(q)
# [[ 6 11 16 1]
# [ 7 12 17 2]
# [ 8 13 18 3]
# [ 9 14 19 4]
# [10 15 20 5]]
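If you prefer the newer Generator API (NumPy 1.17+), permutation takes an axis argument and returns a shuffled copy instead of working in place; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
q = np.arange(1, 21).reshape(4, 5).T   # columns are 1-5, 6-10, 11-15, 16-20
shuffled = rng.permutation(q, axis=1)  # shuffled copy of the columns; q is untouched
print(shuffled)
```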

Is there an opposite function of pandas.DataFrame.droplevel (like keeplevel)?

Is there an opposite function of pandas.DataFrame.droplevel where I can keep some levels of the multi-level index/columns using either the level name or index?
Example:
import pandas as pd

df = pd.DataFrame([
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16]
], columns=['a', 'b', 'c', 'd']).set_index(['a', 'b', 'c']).T
a 1 5 9 13
b 2 6 10 14
c 3 7 11 15
d 4 8 12 16
Both the following commands can return the following dataframe:
df.droplevel(['a','b'], axis=1)
df.droplevel([0, 1], axis=1)
c 3 7 11 15
d 4 8 12 16
I am looking for a "keeplevel" command such that both the following commands can return the following dataframe:
df.keeplevel(['a','b'], axis=1)
df.keeplevel([0, 1], axis=1)
a 1 5 9 13
b 2 6 10 14
d 4 8 12 16
There is no keeplevel because it would be redundant: in a closed, well-defined set, specifying what you want to drop automatically specifies what you want to keep.
You can take the difference between the levels you have and the ones you want to keep, and pass that to droplevel:
def keeplevel(df, levels, axis=1):
    return df.droplevel(df.axes[axis].droplevel(levels).names, axis=axis)
>>> keeplevel(df, [0, 1])
a 1 5 9 13
b 2 6 10 14
d 4 8 12 16
Using set to find the difference:
df.droplevel(list(set(df.columns.names)-set(['a','b'])),axis=1)
Out[134]:
a 1 5 9 13
b 2 6 10 14
d 4 8 12 16
You can rebuild the Index objects directly, which should be fast. Note that this modifies the dataframe in place (in pandas 2.0+ the inplace= keyword of set_axis is gone, so assign the result of df.set_axis(idx, axis=axis) instead).
def keep_level(df, keep, axis):
    idx = pd.MultiIndex.from_arrays([df.axes[axis].get_level_values(x) for x in keep])
    df.set_axis(idx, axis=axis, inplace=True)
    return df
keep_level(df.copy(), ['a', 'b'], 1) # Copy to not modify original for illustration
#a 1 5 9 13
#b 2 6 10 14
#d 4 8 12 16
keep_level(df.copy(), [0, 1], 1)
#a 1 5 9 13
#b 2 6 10 14
#d 4 8 12 16
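The first answer's keeplevel, runnable end to end against the example frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12],
                   [13, 14, 15, 16]],
                  columns=['a', 'b', 'c', 'd']).set_index(['a', 'b', 'c']).T

def keeplevel(df, levels, axis=1):
    # drop every level of the chosen axis except the requested ones
    return df.droplevel(df.axes[axis].droplevel(levels).names, axis=axis)

kept = keeplevel(df, ['a', 'b'])
print(kept)
```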

Lookup values of one Pandas dataframe in another

I have two dataframes, and I want to do a lookup, much like a VLOOKUP in Excel.
df_orig.head()
A
0 3
1 4
2 6
3 7
4 8
df_new
Combined Length Group_name
0 [8, 9, 112, 114, 134, 135] 6 Group 1
1 [15, 16, 17, 18, 19, 20] 6 Group 2
2 [15, 16, 17, 18, 19] 5 Group 3
3 [16, 17, 18, 19, 20] 5 Group 4
4 [15, 16, 17, 18] 4 Group 5
5 [8, 9, 112, 114] 4 Group 6
6 [18, 19, 20] 3 Group 7
7 [28, 29, 30] 3 Group 8
8 [21, 22] 2 Group 9
9 [28, 29] 2 Group 10
10 [26, 27] 2 Group 11
11 [24, 25] 2 Group 12
12 [3, 4] 2 Group 13
13 [6, 7] 2 Group 14
14 [11, 14] 2 Group 15
15 [12, 13] 2 Group 16
16 [0, 1] 2 Group 17
How can I add the values in df_new["Group_name"] to df_orig["A"]?
The "Group_name" must be based on the lookup of the values from df_orig["A"] in df_new["Combined"].
So it would look like:
df_orig.head()
A Looked_up
0 3 Group 13
1 4 Group 13
2 6 Group 14
3 7 Group 14
4 8 Group 1
Thank you!
Two steps: *unnest* + merge.
df = pd.DataFrame({'Combined': df_new.Combined.sum(),
                   'Group_name': df_new['Group_name'].repeat(df_new.Length)})
df_orig.merge(df.groupby('Combined').head(1).rename(columns={'Combined': 'A'}))
Out[77]:
A Group_name
0 3 Group 13
1 4 Group 13
2 6 Group 14
3 7 Group 14
4 8 Group 1
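Newer pandas (0.25+) has DataFrame.explode, which does the unnest step directly; a sketch with a subset of the question's values (assuming each value should map to the first group that lists it):

```python
import pandas as pd

df_orig = pd.DataFrame({'A': [3, 4, 6, 7, 8]})
df_new = pd.DataFrame({'Combined': [[8, 9, 112, 114, 134, 135], [3, 4], [6, 7]],
                       'Group_name': ['Group 1', 'Group 13', 'Group 14']})

# one row per (value, group), then a value -> group mapping
m = df_new.explode('Combined').set_index('Combined')['Group_name']
m = m[~m.index.duplicated()]  # keep the first group if a value appears twice
df_orig['Looked_up'] = df_orig['A'].map(m)
print(df_orig)
```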
Here is one way which mimics a VLOOKUP; a minimal example is below.
import pandas as pd
df_origin = pd.DataFrame({'A': [3, 11, 0, 12, 6]})
df_new = pd.DataFrame({'Combined': [[3, 4, 5], [6, 7], [11, 14, 20],
                                    [12, 13], [3, 1], [0, 4]],
                       'Group_name': ['Group 13', 'Group 14', 'Group 15',
                                      'Group 16', 'Group 17', 'Group 18']})
df_new['ID'] = list(zip(*df_new['Combined'].tolist()))[0]
df_origin['Group_name'] = df_origin['A'].map(df_new.drop_duplicates('ID')\
.set_index('ID')['Group_name'])
Result
A Group_name
0 3 Group 13
1 11 Group 15
2 0 Group 18
3 12 Group 16
4 6 Group 14
Explanation
Extract the first element of lists in df_new['Combined'] via zip.
Use drop_duplicates and then create a series mapping ID to Group_name.
Finally, use pd.Series.map to map df_origin['A'] to Group_name via this series.

How to return max value of quantile cut range instead of quantile label

I need to bin continuous data into an arbitrary number of quantiles. However, my application needs the maximum value of each quantile bin returned:
import pandas as pd
import numpy as np
In [1]: s = pd.Series(np.random.randint(0,20,20)); s[:5]
Out[1]:
0 0
1 15
2 5
3 19
4 15
Let's say I create 5 quantiles using pandas.qcut:
In [2]: bins = pd.qcut(s,5); bins
Out[2]:
Categorical:
array([[0, 1.8], (9.8, 15.2], (1.8, 6.2], (15.2, 19], (9.8, 15.2],
(1.8, 6.2], (6.2, 9.8], (6.2, 9.8], (15.2, 19], (9.8, 15.2],
[0, 1.8], (6.2, 9.8], (1.8, 6.2], [0, 1.8], (9.8, 15.2], [0, 1.8],
(15.2, 19], (15.2, 19], (6.2, 9.8], (1.8, 6.2]], dtype=object)
Levels (5): Index([[0, 1.8], (1.8, 6.2], (6.2, 9.8], (9.8, 15.2],
(15.2, 19]], dtype=object)
With bin labels:
In [3]: bins.labels
Out[3]: array([0, 3, 1, 4, 3, 1, 2, 2, 4, 3, 0, 2, 1, 0, 3, 0, 4, 4, 2, 1])
Rather than return the number of the quantile, is there a way I can return the upper bin edge that each value belongs to? Here's an example of my desired output:
original bin_max
0 0 1
1 15 15
2 5 5
3 19 19
4 15 15
5 2 5
6 7 9
7 7 9
8 16 19
9 12 15
10 0 1
11 8 9
12 5 5
13 1 1
14 11 15
15 1 1
16 18 19
17 16 19
18 9 9
19 3 5
This is the solution I'm currently using, but it seems inefficient to groupby the qcut when the value I need is already found in the qcut labels:
In [4]: s.groupby(pd.qcut(s,5)).transform(max)
Out[4]:
0 1
1 15
2 5
3 19
4 15
5 5
You could use retbins=True to get the bin edges back as a numpy array:
import pandas as pd
import numpy as np
np.random.seed(1)
s = pd.Series(np.random.randint(0,20,20))
categories, edges = pd.qcut(s, 5, retbins=True)
df = pd.DataFrame({'original': s,
                   'bin_max': edges[1:][categories.labels]},
                  columns=['original', 'bin_max'])
print(df)
yields
original bin_max
0 5 5.0
1 11 11.0
2 12 13.4
3 8 8.6
4 9 11.0
5 11 11.0
6 5 5.0
7 15 18.0
8 0 5.0
9 16 18.0
10 1 5.0
11 12 13.4
12 7 8.6
13 13 13.4
14 6 8.6
15 18 18.0
16 5 5.0
17 18 18.0
18 11 11.0
19 10 11.0
For me it worked better with labels=False (newer pandas removed the .labels attribute; with labels=False, qcut returns the integer bin codes directly):
import pandas as pd
import numpy as np
np.random.seed(1)
s = pd.Series(np.random.randint(0,20,20))
codes, edges = pd.qcut(s, 5, retbins=True, labels=False)
df = pd.DataFrame({'original': s,
                   'bin_max': edges[1:][codes]},
                  columns=['original', 'bin_max'])
print(df)
