I'm searching for a fast way to resize a matrix in a special way, without using for-loops:
I have a square matrix:
matrix = [[ 1, 2, 3, 4, 5],
          [ 6, 7, 8, 9,10],
          [11,12,13,14,15],
          [16,17,18,19,20],
          [21,22,23,24,25]]
My goal is to enlarge it by a factor of 3 (or n), so that each value becomes a diagonal n x n block in the result and all other entries are zero:
goal_matrix = [[ 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0],
[ 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0],
[ 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5],
[ 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0,10, 0, 0],
[ 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0,10, 0],
[ 0, 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0,10],
[11, 0, 0,12, 0, 0,13, 0, 0,14, 0, 0,15, 0, 0],
[ 0,11, 0, 0,12, 0, 0,13, 0, 0,14, 0, 0,15, 0],
[ 0, 0,11, 0, 0,12, 0, 0,13, 0, 0,14, 0, 0,15],
[16, 0, 0,17, 0, 0,18, 0, 0,19, 0, 0,20, 0, 0],
[ 0,16, 0, 0,17, 0, 0,18, 0, 0,19, 0, 0,20, 0],
[ 0, 0,16, 0, 0,17, 0, 0,18, 0, 0,19, 0, 0,20],
[21, 0, 0,22, 0, 0,23, 0, 0,24, 0, 0,25, 0, 0],
[ 0,21, 0, 0,22, 0, 0,23, 0, 0,24, 0, 0,25, 0],
[ 0, 0,21, 0, 0,22, 0, 0,23, 0, 0,24, 0, 0,25]]
It should do something like this question, but without unnecessary zero padding.
Is there any mapping, padding or resizing function for doing this in a fast way?
IMO, it is inappropriate to reject the for loop blindly. First, here is a solution without a for loop. When n is small, it performs better than the solutions of Michael Szczesny and Salvatore Daniele Bianco:
def mechanic(mat, n):
    # One flat n*n block per input element; every (n + 1)-th slot is a diagonal entry.
    ar = np.zeros((*mat.shape, n * n), mat.dtype)
    ar[..., ::n + 1] = mat[..., None]
    return ar.reshape(
        *mat.shape,
        n,
        n
    ).transpose(0, 3, 1, 2).reshape([s * n for s in mat.shape])
This solution produces the expected output through a slice assignment followed by a transpose and a reshape. However, the final reshape has to copy the data, which makes it inefficient when n is large.
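To make the slice assignment less mysterious, here is a minimal sketch of the stride trick on a single block. The names `block` and `block2d` are only illustrative. In a flattened n x n block, the diagonal entries sit at indices 0, n + 1, 2(n + 1), and so on, which is why `::n + 1` hits exactly the diagonal.

```python
import numpy as np

n = 3
# A single n x n block stored flat, as in the last axis of `ar` above.
block = np.zeros(n * n, dtype=int)
# For n = 3 this writes to flat indices 0, 4 and 8, i.e. the diagonal.
block[::n + 1] = 7
block2d = block.reshape(n, n)
print(block2d)
```

The printed block should be 7 on the diagonal and 0 elsewhere.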
After a simple test, I found that the plain for-loop solution performs best:
def mechanic_for_loop(mat, n):
    ar = np.zeros([s * n for s in mat.shape], mat.dtype)
    # Each pass fills one diagonal position of every n x n block.
    for i in range(n):
        ar[i::n, i::n] = mat
    return ar
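As a quick sanity check, the loop version can be compared against the Kronecker product with an identity matrix. This is only a sketch: `expand_with_loop` is a renamed copy of the function above so that the snippet is self-contained, and the small non-square input is an arbitrary choice.

```python
import numpy as np

def expand_with_loop(mat, n):
    # Same idea as mechanic_for_loop: strided assignment, one diagonal offset per pass.
    out = np.zeros([s * n for s in mat.shape], mat.dtype)
    for i in range(n):
        out[i::n, i::n] = mat
    return out

mat = np.arange(1, 7).reshape(2, 3)   # non-square on purpose
expanded = expand_with_loop(mat, 3)
reference = np.kron(mat, np.eye(3, dtype=mat.dtype))
print(np.array_equal(expanded, reference))
```

The loop only runs n times regardless of the matrix size, which is presumably why it holds up well in the benchmark.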
Next is a benchmark test using perfplot. The test functions are as follows:
import numpy as np


def mechanic(mat, n):
    ar = np.zeros((*mat.shape, n * n), mat.dtype)
    ar[..., ::n + 1] = mat[..., None]
    return ar.reshape(
        *mat.shape,
        n,
        n
    ).transpose(0, 3, 1, 2).reshape([s * n for s in mat.shape])


def mechanic_for_loop(mat, n):
    ar = np.zeros([s * n for s in mat.shape], mat.dtype)
    for i in range(n):
        ar[i::n, i::n] = mat
    return ar


def michael_szczesny(mat, n):
    return np.einsum(
        'ij,kl->ikjl',
        mat,
        np.eye(n, dtype=mat.dtype)
    ).reshape([s * n for s in mat.shape])


def salvatore_daniele_bianco(mat, n):
    repeated_matrix = mat.repeat(n, axis=0).repeat(n, axis=1)
    # meshgrid takes the x (column) range first, then the y (row) range.
    col_ids, row_ids = np.meshgrid(
        np.arange(repeated_matrix.shape[1]),
        np.arange(repeated_matrix.shape[0])
    )
    repeated_matrix[(col_ids % n) - (row_ids % n) != 0] = 0
    return repeated_matrix


functions = [
    mechanic,
    mechanic_for_loop,
    michael_szczesny,
    salvatore_daniele_bianco
]
Resize factor fixed, array size varies:
if __name__ == '__main__':
    from itertools import accumulate, repeat
    from operator import mul
    from perfplot import bench

    bench(
        functions,
        list(accumulate(repeat(2, 11), mul)),  # 2, 4, ..., 2048
        lambda n: (np.arange(n * n).reshape(n, n), 5),
        xlabel='ar.shape[0]'
    ).show()
Output:
Resize factor varies, array size fixed:
if __name__ == '__main__':
    from itertools import accumulate, repeat
    from operator import mul
    from perfplot import bench

    ar = np.arange(25).reshape(5, 5)
    bench(
        functions,
        list(accumulate(repeat(2, 11), mul)),  # 2, 4, ..., 2048
        lambda n: (ar, n),
        xlabel='resize times'
    ).show()
Output:
Input:
matrix = np.array([[ 1, 2, 3, 4, 5],
                   [ 6, 7, 8, 9,10],
                   [11,12,13,14,15],
                   [16,17,18,19,20],
                   [21,22,23,24,25]])
Solution:
repeated_matrix = matrix.repeat(3, axis=0).repeat(3, axis=1)
# meshgrid takes the column range first, then the row range
col_ids, row_ids = np.meshgrid(np.arange(repeated_matrix.shape[1]), np.arange(repeated_matrix.shape[0]))
repeated_matrix[(col_ids % 3) - (row_ids % 3) != 0] = 0
Output (repeated_matrix):
array([[ 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0, 0],
[ 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5, 0],
[ 0, 0, 1, 0, 0, 2, 0, 0, 3, 0, 0, 4, 0, 0, 5],
[ 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0, 10, 0, 0],
[ 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0, 10, 0],
[ 0, 0, 6, 0, 0, 7, 0, 0, 8, 0, 0, 9, 0, 0, 10],
[11, 0, 0, 12, 0, 0, 13, 0, 0, 14, 0, 0, 15, 0, 0],
[ 0, 11, 0, 0, 12, 0, 0, 13, 0, 0, 14, 0, 0, 15, 0],
[ 0, 0, 11, 0, 0, 12, 0, 0, 13, 0, 0, 14, 0, 0, 15],
[16, 0, 0, 17, 0, 0, 18, 0, 0, 19, 0, 0, 20, 0, 0],
[ 0, 16, 0, 0, 17, 0, 0, 18, 0, 0, 19, 0, 0, 20, 0],
[ 0, 0, 16, 0, 0, 17, 0, 0, 18, 0, 0, 19, 0, 0, 20],
[21, 0, 0, 22, 0, 0, 23, 0, 0, 24, 0, 0, 25, 0, 0],
[ 0, 21, 0, 0, 22, 0, 0, 23, 0, 0, 24, 0, 0, 25, 0],
[ 0, 0, 21, 0, 0, 22, 0, 0, 23, 0, 0, 24, 0, 0, 25]])
Basically, you can define a custom function that does this for any matrix:
def what_you_whant(your_matrix, n_repeats):
    repeated_matrix = your_matrix.repeat(n_repeats, axis=0).repeat(n_repeats, axis=1)
    col_ids, row_ids = np.meshgrid(np.arange(repeated_matrix.shape[1]), np.arange(repeated_matrix.shape[0]))
    # Zero every entry that is not on the diagonal of its n_repeats x n_repeats block.
    repeated_matrix[(col_ids % n_repeats) - (row_ids % n_repeats) != 0] = 0
    return repeated_matrix
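The same mask can also be built with broadcasting instead of `np.meshgrid`, which avoids allocating two full index grids. The following is a sketch under that assumption. `expand_with_mask` is a hypothetical name, not part of the answer above.

```python
import numpy as np

def expand_with_mask(mat, n):
    repeated = mat.repeat(n, axis=0).repeat(n, axis=1)
    # Column vector of row offsets and row vector of column offsets;
    # comparing them broadcasts to the full 2-D mask.
    rows = np.arange(repeated.shape[0])[:, None] % n
    cols = np.arange(repeated.shape[1])[None, :] % n
    repeated[rows != cols] = 0
    return repeated

mat = np.arange(1, 7).reshape(2, 3)
result = expand_with_mask(mat, 2)
print(result)
```

Because the row and column ranges are handled separately, this variant should also work for non-square inputs.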
As Michael Szczesny suggested in his comment:
The fastest way is to use einsum: multiply the matrix by an identity matrix of the block size, then reshape the result to the expanded size:
np.einsum('ij,kl->ikjl', matrix, np.eye(3)).reshape(len(matrix) * 3, -1)
Another, more straightforward answer (but ~4x slower) is to use the Kronecker product, again multiplying the matrix by the identity matrix:
np.kron(matrix, np.eye(3))
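A small check that the two expressions agree, plus one detail worth knowing: `np.eye(3)` is a float array by default, so both one-liners return floats for integer input. Passing `dtype=matrix.dtype` to `np.eye`, as in the sketch below, keeps the integer type. The variable names are illustrative only.

```python
import numpy as np

matrix = np.arange(1, 26).reshape(5, 5)
n = 3

# Integer identity so the result keeps the dtype of `matrix`.
eye = np.eye(n, dtype=matrix.dtype)
via_einsum = np.einsum('ij,kl->ikjl', matrix, eye).reshape(len(matrix) * n, -1)
via_kron = np.kron(matrix, eye)

# With the default float identity the result is promoted to float64.
float_result = np.kron(matrix, np.eye(n))

print(np.array_equal(via_einsum, via_kron), via_kron.dtype, float_result.dtype)
```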
I am feeding y_test and y_pred into a confusion matrix. My data is for multi-label classification, so the row values are one-hot encodings.
My data has 30 labels, but the confusion matrix output only has 11 rows and columns, which confuses me. I thought I should get a 30x30 matrix.
Both are NumPy arrays. (y_test and y_pred are DataFrames that I convert to NumPy arrays using dataframe.values.)
y_test.shape
(8680, 30)
y_test
array([[1, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
y_pred.shape
(8680, 30)
y_pred
array([[1, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 0, 0],
[1, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]])
I transform them into a format that confusion_matrix can use:
y_test2 = y_test.argmax(axis=1)
y_pred2 = y_pred.argmax(axis=1)
conf_mat = confusion_matrix(y_test2, y_pred2)
Here is what my confusion matrix looks like:
conf_mat.shape
(11, 11)
conf_mat
array([[4246, 77, 13, 72, 81, 4, 6, 3, 0, 0, 4],
[ 106, 2010, 20, 23, 21, 0, 5, 2, 0, 0, 0],
[ 143, 41, 95, 32, 10, 3, 14, 1, 1, 1, 2],
[ 101, 1, 0, 351, 36, 0, 0, 0, 0, 0, 0],
[ 346, 23, 7, 10, 746, 5, 6, 4, 3, 3, 2],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
Why does my confusion matrix only have shape 11x11? Shouldn't it be 30x30?
I think you are not quite clear on the definition of confusion_matrix:
y_true = [2, 0, 2, 2, 0, 1]
y_pred = [0, 0, 2, 2, 0, 2]
confusion_matrix(y_true, y_pred)
array([[2, 0, 0],
[0, 0, 1],
[1, 0, 2]])
As a DataFrame, this is:
pd.DataFrame(confusion_matrix(y_true, y_pred),columns=[0,1,2],index=[0,1,2])
Out[245]:
0 1 2
0 2 0 0
1 0 0 1
2 1 0 2
The columns and index are the categories of the input.
You have (11, 11), which means there are only 11 categories in your data.
All this means is that some labels are unused.
y_test.any(axis=0)
y_pred.any(axis=0)
This should show that only 11 of the columns have any 1s in them.
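A small self-contained sketch of that check, using made-up data in which only the first 11 of 30 label columns are ever set. The array `y` and the sizes are hypothetical stand-ins for y_test and y_pred.

```python
import numpy as np

n_samples, n_labels = 220, 30
# Hypothetical one-hot data: labels cycle through 0..10, columns 11..29 stay empty.
used = np.arange(n_samples) % 11
y = np.zeros((n_samples, n_labels), dtype=int)
y[np.arange(n_samples), used] = 1

# Columns that contain at least one 1 ...
used_columns = np.flatnonzero(y.any(axis=0))
# ... are exactly the class indices that survive argmax.
classes_after_argmax = np.unique(y.argmax(axis=1))
print(len(used_columns), len(classes_after_argmax))
```

One caveat: `argmax` returns 0 for a row that is all zeros, so such rows are silently counted as class 0.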
Here is what it looks like when all 30 labels are actually used:
import numpy as np
from sklearn.metrics import confusion_matrix

# Random one-hot data in which every label is (almost surely) used at least once
y_test = np.zeros((8680, 30))
y_pred = np.zeros((8680, 30))
y_test[np.arange(8680), np.random.randint(0, 30, 8680)] = 1
y_pred[np.arange(8680), np.random.randint(0, 30, 8680)] = 1
y_test2 = y_test.argmax(axis=1)
y_pred2 = y_pred.argmax(axis=1)
confusion_matrix(y_test2, y_pred2).shape # (30, 30)
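If a 30x30 matrix is wanted even when some labels never occur, the label set has to be fixed up front rather than inferred from the data. The sketch below does this with plain NumPy. `fixed_size_confusion` is a hypothetical helper, and the small index arrays are made up for illustration.

```python
import numpy as np

def fixed_size_confusion(y_true_idx, y_pred_idx, n_classes):
    # Rows are true classes, columns are predicted classes.
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same (row, col) pair repeats.
    np.add.at(cm, (y_true_idx, y_pred_idx), 1)
    return cm

y_true_idx = np.array([0, 1, 2, 2, 4])
y_pred_idx = np.array([0, 2, 2, 1, 10])
cm = fixed_size_confusion(y_true_idx, y_pred_idx, 30)
print(cm.shape)
```

scikit-learn's confusion_matrix also accepts a `labels` argument, so passing `labels=np.arange(30)` should give the same fixed shape without a custom helper.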
How can I use one array to filter another, keeping only the values at positions where the first array is non-zero?
from numpy import array
a = array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
b = array([[0, 0, 1, 0, 0],
[0, 0, 2, 0, 0],
[0, 0, 3, 0, 0],
[0, 0, 4, 0, 0],
[0, 0, 5, 0, 0]])
Expected result:
array([[ 0, 0, 2, 0, 0],
[ 0, 0, 7, 0, 0],
[ 0, 0, 12, 0, 0],
[ 0, 0, 17, 0, 0],
[ 0, 0, 22, 0, 0]])
Thank you
The easiest way if you want a new array would be np.where with 3 arguments:
>>> import numpy as np
>>> np.where(b, a, 0)
array([[ 0, 0, 2, 0, 0],
[ 0, 0, 7, 0, 0],
[ 0, 0, 12, 0, 0],
[ 0, 0, 17, 0, 0],
[ 0, 0, 22, 0, 0]])
If you want to change a in-place you could instead use boolean indexing based on b:
>>> a[b == 0] = 0
>>> a
array([[ 0, 0, 2, 0, 0],
[ 0, 0, 7, 0, 0],
[ 0, 0, 12, 0, 0],
[ 0, 0, 17, 0, 0],
[ 0, 0, 22, 0, 0]])
One line solution:
a * (b != 0)
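A short sketch confirming that the three approaches give the same result on the arrays from the question. The arrays are rebuilt here so the snippet runs on its own, and the in-place variant works on a copy so that `a` stays intact.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)
b = np.zeros((5, 5), dtype=int)
b[:, 2] = np.arange(1, 6)          # non-zero only in the middle column

with_where = np.where(b, a, 0)     # new array, non-zero b is treated as True
with_product = a * (b != 0)        # new array, boolean mask used as 0/1
in_place = a.copy()
in_place[b == 0] = 0               # modifies the copy in place

print(with_where[:, 2])
```

For float arrays that contain NaN or inf, the product form gives NaN at masked positions instead of 0, so `np.where` or boolean indexing is the safer choice there.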