How can I get uneven submatrices from NxN matrix? - python

I have a large NxN matrix that I'm looking to retrieve multiple submatrices from. Each of these submatrices can be different sizes but they can't overlap (see attached pic). Is there a function in Python that could remotely do what I'm looking to achieve?
[Image: example of submatrices in NxN matrix]
This is what I've written so far; however, it doesn't give me back a square submatrix
import numpy as np
# Create a 10x10 matrix
matrix = np.arange(0, 100).reshape((10, 10))
print(matrix)
# Define the sizes of the submatrices
submatrix_sizes = [4, 4, 5]
# Calculate the starting and ending indices for each submatrix
starts = np.cumsum([0] + submatrix_sizes[:-1])
ends = np.cumsum(submatrix_sizes)
# Split the matrix into submatrices of the specified sizes
submatrices = np.split(matrix, ends, axis=1)[:-1]
# Print the submatrices
for i, submatrix in enumerate(submatrices):
    print(f"Submatrix {i+1}:")
    print(submatrix)
Output
[[ 0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 23 24 25 26 27 28 29]
[30 31 32 33 34 35 36 37 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 88 89]
[90 91 92 93 94 95 96 97 98 99]]
Submatrix 1:
[[ 0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]
[50 51 52 53]
[60 61 62 63]
[70 71 72 73]
[80 81 82 83]
[90 91 92 93]]
Submatrix 2:
[[ 4 5 6 7]
[14 15 16 17]
[24 25 26 27]
[34 35 36 37]
[44 45 46 47]
[54 55 56 57]
[64 65 66 67]
[74 75 76 77]
[84 85 86 87]
[94 95 96 97]]
Submatrix 3:
[[ 8 9]
[18 19]
[28 29]
[38 39]
[48 49]
[58 59]
[68 69]
[78 79]
[88 89]
[98 99]]

Your starts and ends are not calculated correctly:
With submatrix_sizes = [4, 4, 5], ends is [4, 8, 13], and it is impossible to have an index of 13 on any axis of a 10x10 matrix.
You also don't use the calculated starts while slicing, and np.split with axis=1 only cuts along the columns.
starts = np.cumsum([0] + submatrix_sizes[:-1])
# it has to be decided how to calculate these correctly
ends = np.cumsum(submatrix_sizes)
breaks = list(zip(starts, ends))
# slice both the row axis and the column axis, not only one of them
submatrices = [matrix[start:end, start:end] for start, end in breaks]
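A minimal sketch of the idea above, assuming the wanted submatrices are the square blocks along the main diagonal (the picture from the question is not available) and that the sizes sum to at most N. The sizes [4, 4, 2] and the names sizes and blocks are illustrative choices, not taken from the question.

```python
import numpy as np

matrix = np.arange(100).reshape(10, 10)

# Assumed block sizes; for a 10x10 matrix they must sum to at most 10.
sizes = np.array([4, 4, 2])
ends = np.cumsum(sizes)   # [4, 8, 10]
starts = ends - sizes     # [0, 4, 8]

# Slice rows and columns with the same bounds to get square, non-overlapping blocks.
blocks = [matrix[s:e, s:e] for s, e in zip(starts, ends)]

for i, block in enumerate(blocks, start=1):
    print(f"Submatrix {i}: shape {block.shape}")
    print(block)
```

Each block is a view into matrix, so writing into a block also changes the original array; call .copy() on a block if that is not wanted.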

Related

Why is 4D realisation of Max-Pooling in numpy misleading?

I'm trying to understand how max pooling is implemented in numpy. Many answers suggest reshaping a two-dimensional image into a four-dimensional array and then calling np.max on axes 1 and 3:
window = (2, 4)
arr = np.random.randint(99, size=(1,8,12))
shape = (arr.shape[1]//window[0], window[0], arr.shape[2]//window[1], window[1])
out = arr.reshape(shape).max(axis=(1, 3))
According to my visual understanding, I should operate on axis=(0, 2) so it will shrink to the size 1 and produce an output like so:
That makes a lot of sense but it's not correct:
arr = np.random.randint(99, size=(1,8,12)) =
[[[ 7 55 21 88 69 35 7 7 73 54 16 80]
[70 79 62 55 42 5 77 81 38 52 69 39]
[58 78 48 35 5 93 47 64 18 25 73 25]
[14 8 63 27 28 46 29 68 28 38 51 79]
[70 15 37 51 72 27 44 79 1 79 75 9]
[ 4 27 0 90 15 30 95 62 14 8 69 57]
[24 29 26 44 72 89 74 78 39 29 6 2]
[82 12 0 11 54 38 61 79 91 92 53 28]]]
--------------------------------------------------
arr.reshape(4, 2, 3, 4).max(axis=(0, 2)) =
[[73 93 75 88]
[91 92 95 90]]
--------------------------------------------------
arr.reshape(4, 2, 3, 4).max(axis=(1, 3)) =
[[88 81 80]
[78 93 79]
[90 95 79]
[82 89 92]]
So it doesn't ever agree with my picture in reality. What is the source of this disagreement? What are the reasons it's not working as expected?
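One way to see where the disagreement comes from is to compare the reshape trick with an explicit loop over windows. The sketch below assumes a plain 2D image of shape (8, 12) and the (2, 4) window from the question; the names img, blocks, pooled and expected are illustrative. With C-ordered reshape, a row index r is split as r = block_row * 2 + row_in_block, so axes 0 and 2 count the windows while axes 1 and 3 count positions inside a window.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 99, size=(8, 12))
wh, ww = 2, 4  # window height and width

# Axes after reshape: (block_row, row_in_block, block_col, col_in_block)
blocks = img.reshape(8 // wh, wh, 12 // ww, ww)

# Reducing over the in-window axes leaves one value per window: shape (4, 3).
pooled = blocks.max(axis=(1, 3))

# Reference: take the max of every window with explicit slicing.
expected = np.array([
    [img[r * wh:(r + 1) * wh, c * ww:(c + 1) * ww].max() for c in range(12 // ww)]
    for r in range(8 // wh)
])
assert np.array_equal(pooled, expected)

# Reducing over the window-counting axes instead compares the same position
# across all windows, so the result has the shape of one window: (2, 4).
across_windows = blocks.max(axis=(0, 2))
print(pooled.shape, across_windows.shape)
```

This matches the shapes in the question: axis=(1, 3) gives a 4x3 result and axis=(0, 2) gives a 2x4 result.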

Is there a numpy function which takes from each line i of a matrix an element on the column y[i] and puts them all into an array?

So basically, I need a numpy function which will do this or something similar to this:
correct_answers = np.array([scores[i][y[i]] for i in range(num_train)])
but using numpy, because Python list comprehension is too slow for me
scores is a num_train X columns matrix and y is an array of length num_train and takes values from 0 to columns - 1 inclusive
Is there a workaround using arange or something similar? Thanks.
import numpy as np
y = np.arange(81).reshape(9, 9)
correct_answers = y[np.arange(9), np.arange(9)]
output:
y =
[[ 0 1 2 3 4 5 6 7 8]
[ 9 10 11 12 13 14 15 16 17]
[18 19 20 21 22 23 24 25 26]
[27 28 29 30 31 32 33 34 35]
[36 37 38 39 40 41 42 43 44]
[45 46 47 48 49 50 51 52 53]
[54 55 56 57 58 59 60 61 62]
[63 64 65 66 67 68 69 70 71]
[72 73 74 75 76 77 78 79 80]]
correct_answers =
[ 0 10 20 30 40 50 60 70 80]
correct_answers = scores[np.arange(num_train), y]
This does the thing I wanted to do, props to the other dude who gave me the idea
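A small self-contained check of the indexing above, using made-up sizes and random data (num_train = 5 and columns = 4 are assumptions for illustration). It also shows np.take_along_axis as an alternative spelling of the same selection.

```python
import numpy as np

rng = np.random.default_rng(0)
num_train, columns = 5, 4
scores = rng.random((num_train, columns))
y = rng.integers(0, columns, size=num_train)

# Pair row i with column y[i] using integer array indexing.
picked = scores[np.arange(num_train), y]

# Same selection via take_along_axis; y needs a trailing axis to line up with scores.
picked_alt = np.take_along_axis(scores, y[:, None], axis=1)[:, 0]

# The original list comprehension, kept as a reference.
reference = np.array([scores[i][y[i]] for i in range(num_train)])

print(np.array_equal(picked, reference), np.array_equal(picked_alt, reference))
```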

How tf.data.experimental.group_by_window() operates in Tensorflow 2.0

I am trying to understand the tf.data.experimental.group_by_window() method in Tensorflow 2 but I have some difficulties.
For a reproducible example I use the one presented in the documentation:
import numpy as np
import tensorflow as tf
components = np.arange(100).astype(np.int64)
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(
    key_func=lambda x: x % 2,
    reduce_func=lambda _, els: els.batch(10),
    window_size=100))
i = 0
for elem in dataset20:
    print('i is {0}\n'.format(i))
    print('elem is {0}'.format(elem.numpy()))
    i += 1
    print('\n--------------------------------\n')
i is 0
elem is [0 2 4 6 8]
--------------------------------
i is 1
elem is [1 3 5 7 9]
--------------------------------
Part of the confusion may be that the output doesn't correspond to the example code. The actual output from this:
components = np.arange(100).astype(np.int64)
dataset20 = tf.data.Dataset.from_tensor_slices(components)
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(key_func=lambda x: x%2, reduce_func=lambda _,els: els.batch(10), window_size=100))
for i, d in enumerate(dataset20):
    print(i, d.numpy())
is
0 [ 0 2 4 6 8 10 12 14 16 18]
1 [20 22 24 26 28 30 32 34 36 38]
2 [40 42 44 46 48 50 52 54 56 58]
3 [60 62 64 66 68 70 72 74 76 78]
4 [80 82 84 86 88 90 92 94 96 98]
5 [ 1 3 5 7 9 11 13 15 17 19]
6 [21 23 25 27 29 31 33 35 37 39]
7 [41 43 45 47 49 51 53 55 57 59]
8 [61 63 65 67 69 71 73 75 77 79]
9 [81 83 85 87 89 91 93 95 97 99]
As described in the documentation, key_func separates the data into groups with associated key values. In the example, key_func splits the values 0 to 99 into an even group and an odd group. reduce_func then operates on each (key, group) pair to produce another dataset. Note, though, that reduce_func only ever receives at most window_size elements of a group at a time. In the example, the window size is greater than either group size (100 vs. 50 elements), so it has no effect, and all the evens are given in batches of 10, followed by all the odds.
If the window size is changed to a value less than 50, it does have an effect. For example, if the window size is changed to 5 and the batching is moved outside the group_by_window function:
dataset20 = dataset20.apply(tf.data.experimental.group_by_window(key_func=lambda x: x%2, reduce_func=lambda _, els: els, window_size=5)).batch(10)
then the following output is produced:
0 [0 2 4 6 8 1 3 5 7 9]
1 [10 12 14 16 18 11 13 15 17 19]
2 [20 22 24 26 28 21 23 25 27 29]
3 [30 32 34 36 38 31 33 35 37 39]
4 [40 42 44 46 48 41 43 45 47 49]
5 [50 52 54 56 58 51 53 55 57 59]
6 [60 62 64 66 68 61 63 65 67 69]
7 [70 72 74 76 78 71 73 75 77 79]
8 [80 82 84 86 88 81 83 85 87 89]
9 [90 92 94 96 98 91 93 95 97 99]
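The two outputs above can be reproduced with a simplified pure-Python model of the windowing behaviour. This is only a sketch of the semantics described in this answer, not TensorFlow's implementation: the functions group_by_window and batch below are hypothetical stand-ins written for illustration, and the end-of-input flush order (keys in order of first appearance) is an assumption chosen to match the output shown.

```python
def batch(items, size):
    """Hypothetical stand-in for Dataset.batch: split items into lists of `size`."""
    items = list(items)
    return [items[i:i + size] for i in range(0, len(items), size)]


def group_by_window(items, key_func, reduce_func, window_size):
    """Simplified model: collect items per key, emit a window once it is full."""
    windows = {}
    for item in items:
        key = key_func(item)
        window = windows.setdefault(key, [])
        window.append(item)
        if len(window) == window_size:
            # A full window is handed to reduce_func and its output is emitted.
            yield from reduce_func(key, window)
            del windows[key]
    # Windows that never filled up are flushed at the end of the input.
    for key, window in windows.items():
        yield from reduce_func(key, window)


# window_size=100 never fills, so each 50-element group is batched as a whole.
big = list(group_by_window(range(100), lambda x: x % 2,
                           lambda _, els: batch(els, 10), window_size=100))

# window_size=5 emits alternating runs of 5 evens and 5 odds, batched afterwards.
small = batch(group_by_window(range(100), lambda x: x % 2,
                              lambda _, els: els, window_size=5), 10)

print(big[0], big[5])
print(small[0])
```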

Issue with matrix slicing in n-dimensional matrix

I was stuck on a Python function but later solved it. I still have a question about Python's n-dimensional indexing notation. The array A has shape (2, 4, 4, 3). What is the difference between accessing it as A[:][0:3, 0:3, 3] and as A[:][0:3, 0:3][3]?
Test array(2,4,4,3):
[[[[ 0 1 2] [[[48 49 50]
[ 3 4 5] [51 52 53]
[ 6 7 8] [54 55 56]
[ 9 10 11]] [57 58 59]]
[[12 13 14] [[60 61 62]
[15 16 17] [63 64 65]
[18 19 20] [66 67 68]
[21 22 23]] [69 70 71]]
[[24 25 26] [[72 73 74]
[27 28 29] [75 76 77]
[30 31 32] [78 79 80]
[33 34 35]] [81 82 83]]
[[36 37 38] [[84 85 86]
[39 40 41] [87 88 89]
[42 43 44] [90 91 92]
[45 46 47]]] [93 94 95]]]
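With data[0:4, 0:4, 1] all three indices are applied at once, one per axis, so you get the row at index 1 (the second row) of every 4x3 block; the result has shape (2, 4, 3):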
[[[ 3 4 5] [[51 52 53]
[15 16 17] [63 64 65]
[27 28 29] [75 76 77]
[39 40 41]] [87 88 89]]]
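On the other hand, data[0:4, 0:4][1] first takes the slice data[0:4, 0:4], which is still the whole (2, 4, 4, 3) array, and then indexes its first axis with [1], so you get the second (4, 4, 3) half: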
[[[48 49 50]
[51 52 53]
[54 55 56]
[57 58 59]]
[[60 61 62]
[63 64 65]
[66 67 68]
[69 70 71]]
[[72 73 74]
[75 76 77]
[78 79 80]
[81 82 83]]
[[84 85 86]
[87 88 89]
[90 91 92]
[93 94 95]]]
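The difference can be checked directly by comparing shapes. This sketch rebuilds the test array with np.arange (an assumption consistent with the values printed above); the names one_step and two_step are illustrative.

```python
import numpy as np

data = np.arange(96).reshape(2, 4, 4, 3)

# One indexing operation: slice axis 0, slice axis 1, pick index 1 on axis 2.
one_step = data[0:4, 0:4, 1]

# Two indexing operations: the slice returns the full array, then [1] indexes axis 0.
two_step = data[0:4, 0:4][1]

print(one_step.shape)  # (2, 4, 3)
print(two_step.shape)  # (4, 4, 3)
print(np.array_equal(two_step, data[1]))
```

Note that a slice such as 0:4 on an axis of length 2 is silently clipped to the axis length, which is why data[0:4, 0:4] returns the whole array.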

Fill in missing values with nearest neighbour in Python numpy masked arrays?

I am working with a 2D Numpy masked_array in Python.
I need to change the data values in the masked area such that they equal the nearest unmasked value.
NB. If there is more than one nearest unmasked value, it can take any of them (whichever one turns out to be easiest to code…)
e.g.
import numpy
import numpy.ma as ma
a = numpy.arange(100).reshape(10,10)
fill_value=-99
a[2:4,3:8] = fill_value
a[8,8] = fill_value
a = ma.masked_array(a,a==fill_value)
>>> a
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 -- -- -- -- -- 28 29]
[30 31 32 -- -- -- -- -- 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 -- 89]
[90 91 92 93 94 95 96 97 98 99]],
I need it to look like this:
>>> a.data
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 ? 14 15 16 ? 28 29]
[30 31 32 ? 44 45 46 ? 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 ? 89]
[90 91 92 93 94 95 96 97 98 99]],
NB. where "?" could take any of the adjacent unmasked values.
What is the most efficient way to do this?
Thanks for your help.
I generally use a distance transform, as wisely suggested by Juh_ in this question.
This does not directly apply to masked arrays, but I do not think it would be hard to adapt, and it is quite efficient: I've had no problem applying it to large 100 MPix images.
Copying the relevant method here for reference:
import numpy as np
from scipy import ndimage as nd
def fill(data, invalid=None):
    """
    Replace the value of invalid 'data' cells (indicated by 'invalid')
    by the value of the nearest valid data cell
    Input:
        data:    numpy array of any dimension
        invalid: a binary array of same shape as 'data'. True cells set where data
                 value should be replaced.
                 If None (default), use: invalid = np.isnan(data)
    Output:
        Return a filled array.
    """
    #import numpy as np
    #import scipy.ndimage as nd
    if invalid is None: invalid = np.isnan(data)
    ind = nd.distance_transform_edt(invalid, return_distances=False, return_indices=True)
    return data[tuple(ind)]
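A minimal usage sketch for the masked array from the question, assuming SciPy is installed. Passing the raw data and the boolean mask separately is an assumed way of adapting the function to masked arrays; the function is repeated so the block runs on its own.

```python
import numpy as np
import numpy.ma as ma
from scipy import ndimage as nd


def fill(data, invalid=None):
    """Same function as above, repeated here so the sketch is self-contained."""
    if invalid is None:
        invalid = np.isnan(data)
    ind = nd.distance_transform_edt(invalid, return_distances=False, return_indices=True)
    return data[tuple(ind)]


# Rebuild the example array from the question.
a = np.arange(100).reshape(10, 10)
fill_value = -99
a[2:4, 3:8] = fill_value
a[8, 8] = fill_value
a = ma.masked_array(a, a == fill_value)

# Assumed adaptation: the mask marks the invalid cells, the raw data holds the values.
mask = ma.getmaskarray(a)
filled = fill(a.data, invalid=mask)
print(filled)
```

The result is a plain ndarray; cells with several equally near valid neighbours receive whichever one the distance transform happens to pick.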
You could use np.roll to make shifted copies of a, then use boolean logic on the masks to identify the spots to be filled in:
import numpy as np
import numpy.ma as ma
a = np.arange(100).reshape(10,10)
fill_value=-99
a[2:4,3:8] = fill_value
a[8,8] = fill_value
a = ma.masked_array(a,a==fill_value)
print(a)
# [[0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 -- -- -- -- -- 28 29]
# [30 31 32 -- -- -- -- -- 38 39]
# [40 41 42 43 44 45 46 47 48 49]
# [50 51 52 53 54 55 56 57 58 59]
# [60 61 62 63 64 65 66 67 68 69]
# [70 71 72 73 74 75 76 77 78 79]
# [80 81 82 83 84 85 86 87 -- 89]
# [90 91 92 93 94 95 96 97 98 99]]
for shift in (-1,1):
    for axis in (0,1):
        a_shifted=np.roll(a,shift=shift,axis=axis)
        idx=~a_shifted.mask * a.mask
        a[idx]=a_shifted[idx]
print(a)
# [[0 1 2 3 4 5 6 7 8 9]
# [10 11 12 13 14 15 16 17 18 19]
# [20 21 22 13 14 15 16 28 28 29]
# [30 31 32 43 44 45 46 47 38 39]
# [40 41 42 43 44 45 46 47 48 49]
# [50 51 52 53 54 55 56 57 58 59]
# [60 61 62 63 64 65 66 67 68 69]
# [70 71 72 73 74 75 76 77 78 79]
# [80 81 82 83 84 85 86 87 98 89]
# [90 91 92 93 94 95 96 97 98 99]]
If you'd like to use a larger set of nearest neighbors, you could perhaps do something like this:
neighbors=((0,1),(0,-1),(1,0),(-1,0),(1,1),(-1,1),(1,-1),(-1,-1),
(0,2),(0,-2),(2,0),(-2,0))
Note that the order of the elements in neighbors is important. You probably want to fill in missing values with the nearest neighbor, not just any neighbor. There's probably a smarter way to generate the neighbors sequence, but I'm not seeing it at the moment.
a_copy=a.copy()
for hor_shift,vert_shift in neighbors:
    if not np.any(a.mask): break
    a_shifted=np.roll(a_copy,shift=hor_shift,axis=1)
    a_shifted=np.roll(a_shifted,shift=vert_shift,axis=0)
    idx=~a_shifted.mask*a.mask
    a[idx]=a_shifted[idx]
Note that np.roll happily rolls the lower edge to the top, so a missing value at the top may be filled in by a value from the very bottom. If this is a problem, I'd have to think more about how to fix it. The obvious but not very clever solution would be to use if statements and feed the edges a different sequence of admissible neighbors...
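One possible way to generate the neighbors sequence mentioned above is to enumerate all offsets in a square and sort them by squared distance, so that nearer neighbors are always tried first. This is only a sketch: the helper name neighbor_offsets and the radius parameter are hypothetical, and a square neighborhood contains more offsets than the hand-written tuple above.

```python
import itertools


def neighbor_offsets(radius):
    """Hypothetical helper: all (hor_shift, vert_shift) offsets within `radius`, nearest first."""
    span = range(-radius, radius + 1)
    offsets = [(dx, dy) for dx, dy in itertools.product(span, span) if (dx, dy) != (0, 0)]
    # Sort by squared Euclidean distance; ties keep their enumeration order.
    return sorted(offsets, key=lambda o: o[0] ** 2 + o[1] ** 2)


neighbors = neighbor_offsets(2)
print(neighbors[:8])
```

The resulting list can be used in place of the hand-written neighbors tuple in the loop above; it does not change the wrap-around behaviour of np.roll.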
For more complicated cases you could use scipy.spatial:
from scipy.spatial import KDTree
x,y=np.mgrid[0:a.shape[0],0:a.shape[1]]
xygood = np.array((x[~a.mask],y[~a.mask])).T
xybad = np.array((x[a.mask],y[a.mask])).T
a[a.mask] = a[~a.mask][KDTree(xygood).query(xybad)[1]]
print(a)
[[0 1 2 3 4 5 6 7 8 9]
[10 11 12 13 14 15 16 17 18 19]
[20 21 22 13 14 15 16 17 28 29]
[30 31 32 32 44 45 46 38 38 39]
[40 41 42 43 44 45 46 47 48 49]
[50 51 52 53 54 55 56 57 58 59]
[60 61 62 63 64 65 66 67 68 69]
[70 71 72 73 74 75 76 77 78 79]
[80 81 82 83 84 85 86 87 78 89]
[90 91 92 93 94 95 96 97 98 99]]
