Reusable code to iterate along different array dimensions - python

Say I have an N-dimensional array my_array[D1][D2]...[DN].
For a certain application, such as sensitivity analysis, I need to fix a point p = (d1, d2, ..., dN) and iterate along one dimension at a time.
The resulting behavior is
for x1 in range(0, D1):
    do_something(my_array[x1][d2][d3]...[dN])
for x2 in range(0, D2):
    do_something(my_array[d1][x2][d3]...[dN])
.
.
.
for xN in range(0, DN):
    do_something(my_array[d1][d2][d3]...[xN])
As you can see, there is a lot of duplicated code here. How can I reduce the work and write something more elegant?
For example, is there a way to write code similar to the following?
for d in range(0, N):
    iterate along the (d+1)th dimension of my_array, denoting the element as x:
        do_something(x)

You can use numpy.take and do something like the following. Go through the documentation for reference.
https://docs.scipy.org/doc/numpy/reference/generated/numpy.take.html
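import numpy as np
arr = np.asarray(my_array)
for axis in range(arr.ndim):
    line = arr
    # Fix every other axis at its coordinate in p. Going from the last axis
    # to the first keeps the remaining axis numbers valid after each take.
    for other in reversed(range(arr.ndim)):
        if other != axis:
            line = np.take(line, p[other], axis=other)
    # line is now the 1-D run of elements through p along this axis
    for x in line:
        do_something(x)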

I don't understand what d1, d2, d3 are, but I guess you can do something like this:
def get_list_item_by_indexes_list(in_list, indexes_list):
    # Follow the indexes one nesting level at a time
    if len(indexes_list) <= 1:
        return in_list[indexes_list[0]]
    else:
        return get_list_item_by_indexes_list(in_list[indexes_list[0]], indexes_list[1:])
def do_to_each_dimension(multi_list, func, dimensions_lens):
    # The fixed point; here the last index along each dimension, since I don't know what d1..dN are
    d0_to_dN_list = [l - 1 for l in dimensions_lens]
    for dimension_index in range(0, len(dimensions_lens)):
        dimension_len = dimensions_lens[dimension_index]
        for x in range(0, dimension_len):
            curr_d0_to_dN_list = d0_to_dN_list.copy()
            curr_d0_to_dN_list[dimension_index] = x
            func(get_list_item_by_indexes_list(multi_list, curr_d0_to_dN_list))
def do_something(n):
    print(n)
dimensions_lens = [3, 5]
my_array = [
[1, 2, 3, 4, 5],
[6, 7, 8, 9, 10],
[11, 12, 13, 14, 15]
]
do_to_each_dimension(my_array, do_something, dimensions_lens)
Output:
5 10 15 11 12 13 14 15
This code iterates through the last column and the last row of a 2d array.
Now, to iterate through the last line of each dimension of 3d array:
dimensions_lens = [2, 4, 3]
my_array = [
[
[1, 2, 3],
[4, 5, 6],
[7, 8, 9],
[10, 11, 12]
],
[
[13, 14, 15],
[16, 17, 18],
[19, 20, 21],
[22, 23, 24]
],
]
do_to_each_dimension(my_array, do_something, dimensions_lens)
Output:
12 24 15 18 21 24 22 23 24
(Note: don't use zero-length dimensions with this code)

You could build the string representation of your array access (my_arr[d1][d2]...[dN]) and eval it afterwards to get the values you want. This is fairly "hacky", but it works on arrays with arbitrary dimensions and lets you supply the indices as a list while the nested array access is handled under the hood, which leaves a clean double for loop.
def access_at(arr, point):
    # build 'arr[p1][p2]...[pN]'
    access_str = 'arr' + ''.join([f'[{p}]' for p in point])
    return eval(access_str)
Using this access method is pretty straightforward:
p = [p1, ..., pN]
D = [D1, ..., DN]
for i in range(N):
    # copy p so the fixed point itself is not modified
    pt = p[:]
    for x in range(D[i]):
        pt[i] = x
        do_something(access_at(my_array, pt))
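If the data can be held in a NumPy array, another option is to index with a tuple in which one entry is a full slice. The following is only a minimal sketch under that assumption; the helper name lines_through_point is made up for illustration and does not come from any of the answers above.

```python
import numpy as np

def lines_through_point(arr, p):
    """Yield (axis, 1-D array) for the line through point p along each axis.

    Hypothetical helper: assumes arr can be converted to a regular
    (non-ragged) NumPy array and that p has one index per dimension.
    """
    arr = np.asarray(arr)
    for axis in range(arr.ndim):
        index = list(p)
        index[axis] = slice(None)      # vary this axis, keep the others fixed
        yield axis, arr[tuple(index)]  # tuple indexing selects the whole line

my_array = np.arange(24).reshape(2, 3, 4)
p = (1, 2, 3)
for axis, line in lines_through_point(my_array, p):
    for x in line:
        pass  # call do_something(x) here
```

Each yielded line is a view into the array, so nothing is copied while iterating.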

Related

Dividing numpy 2d array into equal sections

I have a 30*30px image that I converted to a NumPy array. Now I want to divide this 30*30 image into 9 equal pieces (imagine a tic-tac-toe board). I wrote the code below for that purpose, but it has two nested loops, and in Python that means a straight ticket to lower-performance town (especially for large amounts of data). Is there a better way of doing this using NumPy indexing?
# factor is the number of sections per side: factor = 3 gives 3 * 3 = 9 sections (3 rows, 3 columns)
def section(img, factor=3):
    secs = []
    # Check that the image can actually be divided into equal sections
    if img.shape[0] % factor != 0:
        return False
    # Number of pixels in each row and column of a section
    pix_num = img.shape[0] // factor
    ptr_x_a = 0
    ptr_x_b = pix_num  # slice ends are exclusive, so no "- 1" here
    for i in range(factor):
        ptr_y_a = 0
        ptr_y_b = pix_num
        for j in range(factor):
            secs.append(img[ptr_x_a:ptr_x_b, ptr_y_a:ptr_y_b])
            ptr_y_a += pix_num
            ptr_y_b += pix_num
        ptr_x_a += pix_num
        ptr_x_b += pix_num
    return np.array(secs, dtype="int16")
P.S: Don't mind reading the whole code, just know that it uses pointers to select different areas of the image.
P.S2: See the image below to get an idea of what's happening. It is a 6*6 image divided into 9 pieces (factor = 3)
If you have an array of shape (K * M, K * N), you can transform it into something of shape (K * K, M, N) using reshape and transpose. For example, if you have K = M = N = 3, you want to transform
>>> a = np.arange(81).reshape(9, 9)
into
[[[ 0, 1, 2],
[ 9, 10, 11],
[18, 19, 20]],
[[ 3, 4, 5],
[12, 13, 14],
[21, 22, 23]],
[[ 6, 7, 8],
[15, 16, 17],
[24, 25, 26]],
...
]]]
The idea is that you need to get the elements lined up in memory in the order shown here (i.e. 0, 1, 2, 9, 10, 11, 18, ...). You can do this by adding the appropriate auxiliary dimensions and transposing:
b = a.reshape(K, M, K, N)
c = b.transpose(0, 2, 1, 3)
d = c.reshape(-1, M, N)
As a one-liner:
a.reshape(K, M, K, N).transpose(0, 2, 1, 3).reshape(-1, M, N)
The order of the transpose determines the order of the blocks. The first two dimensions, 0, 2, represent the fact that your inner loop iterates the columns faster than the rows. If you wanted to arrange the blocks by column (iterate the rows faster), you could do
c = b.transpose(2, 0, 1, 3)
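Reshaping a contiguous array and transposing both return views and do not move any data. The copy happens in the final reshape, because after the transpose the elements are no longer laid out contiguously in the required order.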
In your particular example, K = 3 and M = N = 10. The code above does not change in any way besides that.
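To make the reshape and transpose steps concrete, here is a small self-contained sketch. The function name split_blocks and the divisibility check are my own additions for illustration, not part of the answer above.

```python
import numpy as np

def split_blocks(a, k):
    """Split a 2-D array into k * k equal blocks, ordered row by row.

    Hypothetical helper: assumes both dimensions of a are divisible by k.
    """
    rows, cols = a.shape
    if rows % k or cols % k:
        raise ValueError("array dimensions must be divisible by k")
    m, n = rows // k, cols // k
    # (k*m, k*n) -> (k, m, k, n) -> (k, k, m, n) -> (k*k, m, n)
    return a.reshape(k, m, k, n).transpose(0, 2, 1, 3).reshape(-1, m, n)

a = np.arange(81).reshape(9, 9)
blocks = split_blocks(a, 3)
print(blocks.shape)  # (9, 3, 3)
print(blocks[0])     # top-left 3x3 block: rows [0 1 2], [9 10 11], [18 19 20]
```

For the 30*30 image from the question, split_blocks(img, 3) would return an array of shape (9, 10, 10).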
As an aside, your loops can be improved by making the ranges run directly over the indices you want rather than over auxiliary quantities, as well as by pre-allocating the output:
result = np.zeros((factor * factor, pix_num, pix_num), dtype=img.dtype)  # the shape must be a tuple
n = 0
for r in range(0, img.shape[0], pix_num):
    for c in range(0, img.shape[1], pix_num):
        result[n, :, :] = img[r:r + pix_num, c:c + pix_num]
        n += 1
a = np.arange(36)
a.resize(6, 6)
print(a)
b = list(map(lambda x: np.array_split(x, 3, axis=1), np.array_split(a, 3, axis=0)))
print(np.array(b).reshape(9,2,2))
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]
[24 25 26 27 28 29]
[30 31 32 33 34 35]]
[[[ 0 1]
[ 6 7]]
[[ 2 3]
[ 8 9]]
[[ 4 5]
[10 11]]
[[12 13]
[18 19]]
[[14 15]
[20 21]]
[[16 17]
[22 23]]
[[24 25]
[30 31]]
[[26 27]
[32 33]]
[[28 29]
[34 35]]]
You can do something like this to get one section:
sec = img[:10]
sec = list(zip(*sec))[:10]
sec = list(zip(*sec))
This will pick out the first 10x10 section.

Probability of an element in list

My data
List = [[[12,1,6],[12,1,6],15],[[12,2,6],[12,2,6],18]],[[12,3,6],[12,3,6],24]]
I have a data containing
number of rows having a transition from 12,1,6 to 12,1,6 is 15
number of rows having a transition from 12,2,6 to 12,2,6 is 18
number of rows having a transition from 12,3,6 to 12,3,6 is 24
as list
This data is not generated. There are many other possible combinations in my data; the above is just a sample.
I want my output to be a list having probabilities of this transition
for example
P1 = the probability of transition from 12,1,6 to 12,1,6
= 15/total length of rows/elements in the list.(In this case it is 3)
P2 = the probability of transition from 12,2,6 to 12,2,6
= 18/total length of rows in the list
my output needs to be
List =[[[12,1,6],[12,1,6],15,P1=(15/3)*100],[[12,2,6],[12,2,6],18,P2]],[[12,3,6],[12,3,6],24,P3]]
I have tried a lot and would appreciate any suggestions.
def Sort(sub_li):
    sub_li.sort(reverse=True, key=lambda x: x[1])
    return sub_li
print(Sort(List))
List had an extra ] after 18, so I removed it before writing this piece of code, which assumes that the length is the same for all the rows.
List = [[[12,1,6],[12,1,6],15],[[12,2,6],[12,2,6],18],[[12,3,6],[12,3,6],24]]
for i, value in enumerate(List):
    value.append(("P%d=%f" % ((i + 1), value[2] / len(value[0]) * 100)))
print(List)
Output:
[[[12, 1, 6], [12, 1, 6], 15, 'P1=500.000000'], [[12, 2, 6], [12, 2, 6], 18, 'P2=600.000000'], [[12, 3, 6], [12, 3, 6], 24, 'P3=800.000000']]

Splitting arrays depending on unique values in an array

I currently have two arrays, one of which has several repeated values and another with unique values.
Eg array 1 : a = [1, 1, 2, 2, 3, 3]
Eg array 2 : b = [10, 11, 12, 13, 14, 15]
I was developing a code in python that looks at the first array and distinguishes the elements that are all the same and remembers the indices. A new array is created that contains the elements of array b at those indices.
Eg: As array 'a' has three unique values at positions 1,2... 3,4... 5,6, then three new arrays would be created such that it contains the elements of array b at positions 1,2... 3,4... 5,6. Thus, the result would be three new arrays:
b1 = [10, 11]
b2 = [12, 13]
b3 = [14, 15]
I have managed to develop a code, however, it only works for when there are three unique values in array 'a'. In the case there are more or less unique values in array 'a', the code has to be physically modified.
import itertools
import numpy as np
import matplotlib.tri as tri
import sys
a = [1, 1, 2, 2, 3, 3]
b = [10, 10, 20, 20, 30, 30]
b_1 = []
b_2 = []
b_3 = []
unique = []
# collect the distinct values of a in order of first appearance
for vals in a:
    if vals not in unique:
        unique.append(vals)
if len(unique) != 3:
    sys.exit("More than 3 'a' values - check dimension")
# route each element of b to the list that matches its value in a
for j in range(0, len(a)):
    if a[j] == unique[0]:
        b_1.append(b[j])
    elif a[j] == unique[1]:
        b_2.append(b[j])
    elif a[j] == unique[2]:
        b_3.append(b[j])
    else:
        sys.exit("More than 3 'a' values - check dimension")
print(b_1)
print(b_2)
print(b_3)
I was wondering if there is perhaps a more elegant way to perform this task such that the code is able to cope with an n number of unique values.
Well given that you are also using numpy, here's one way using np.unique. You can set return_index=True to get the indices of the unique values, and use them to split the array b with np.split:
a = np.array([1, 1, 2, 2, 3, 3])
b = np.array([10, 11, 12, 13, 14, 15])
u, s = np.unique(a, return_index=True)
np.split(b,s[1:])
Output
[array([10, 11]), array([12, 13]), array([14, 15])]
You can use the function groupby():
from itertools import groupby
from operator import itemgetter
a = [1, 1, 2, 2, 3, 3]
b = [10, 11, 12, 13, 14, 15]
[[i[1] for i in g] for _, g in groupby(zip(a, b), key=itemgetter(0))]
# [[10, 11], [12, 13], [14, 15]]
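Both approaches above rely on equal values of a sitting next to each other (np.split with the indices from np.unique additionally expects a to be sorted). If that cannot be guaranteed, a dictionary of lists is one way to group by value. This is only a sketch; the function name group_by_key is made up for illustration.

```python
from collections import defaultdict

def group_by_key(keys, values):
    """Group values by the key at the same position, for any number of distinct keys.

    Hypothetical helper: keys do not need to be sorted or adjacent.
    """
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return dict(groups)

a = [1, 1, 2, 2, 3, 3]
b = [10, 11, 12, 13, 14, 15]
print(group_by_key(a, b))
# {1: [10, 11], 2: [12, 13], 3: [14, 15]}
```

The groups come out keyed by the value in a, in order of first appearance, so list(group_by_key(a, b).values()) gives the lists b1, b2, b3 from the question.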

Get Maximum Value across rows and columns of a python Matrix

Consider the question:
The grid is:
[ [3, 0, 8, 4],
[2, 4, 5, 7],
[9, 2, 6, 3],
[0, 3, 1, 0] ]
The max viewed from top (i.e. max across columns) is: [9, 4, 8, 7]
The max viewed from left (i.e. max across rows) is: [8, 7, 9, 3]
I know how to define a grid in Python:
maximums = [[0 for x in range(len(grid[0]))] for x in range(len(grid))]
Getting the maximum across rows looks easy:
max_left = [max(x) for x in grid]
But how to get maximum across columns?
Further, I need to find a way to do so in linear space O(M+N) where MxN is the size of the Matrix.
Use zip:
result = [max(i) for i in zip(*grid)]
In Python, * is not a pointer. In a function definition it lets the function receive a variable number of positional arguments, and in a call it unpacks an iterable into separate arguments. For instance:
def f(*args):
    print(args)
f(434, 424, "val", 233, "another val")
Output:
(434, 424, 'val', 233, 'another val')
Or, given an iterable, each item can be passed as its own argument:
def f(*args):
    print(args)
f(*["val", "val3", 23, 23])
Output:
('val', 'val3', 23, 23)
zip "transposes" the data, i.e. each row becomes a column and vice versa.
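Applied to the grid from the question, the zip approach can be sketched as follows. The variable names max_left and max_top follow the question's wording ("viewed from left" and "viewed from top") and are otherwise arbitrary.

```python
grid = [
    [3, 0, 8, 4],
    [2, 4, 5, 7],
    [9, 2, 6, 3],
    [0, 3, 1, 0],
]

# max across each row ("viewed from left")
max_left = [max(row) for row in grid]

# zip(*grid) yields one column at a time as a tuple,
# so max(col) is the max across that column ("viewed from top")
max_top = [max(col) for col in zip(*grid)]

print(max_left)  # [8, 7, 9, 3]
print(max_top)   # [9, 4, 8, 7]
```

In Python 3, zip is lazy, so only one column tuple exists at a time in addition to the result list.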
You could use numpy:
import numpy as np
x = np.array([ [3, 0, 8, 4],
[2, 4, 5, 7],
[9, 2, 6, 3],
[0, 3, 1, 0] ])
print(x.max(axis=0))
Output:
[9 4 8 7]
You said that you need to do this in O(m+n) space (not using numpy), so here's a solution that doesn't recreate the matrix:
col_max = list(x[0])  # copy the first row so x itself is not overwritten
for i in x:
    for j, k in enumerate(i):
        if k > col_max[j]:
            col_max[j] = k
print(col_max)
Output:
[9, 4, 8, 7]
I figured out a shortcut too:
transpose the matrix and then just take the maximum over its rows:
grid_transposed = [[grid[j][i] for j in range(len(grid))] for i in range(len(grid[0]))]
max_top = [max(x) for x in grid_transposed]
But then again, this takes O(M*N) space because I have to build a transposed copy of the matrix.
I don't want to use numpy as external libraries are not allowed in any assignments.
Easiest way is to use numpy's array max:
array.max(0)
Something like this works both ways and is quite easy to read:
# 1.
maxLR, maxTB = [], []
maxlr, maxtb = 0, 0
# max across rows
for i, x in enumerate(grid):
    maxlr = 0
    for j, y in enumerate(grid[0]):
        maxlr = max(maxlr, grid[i][j])
    maxLR.append(maxlr)
# max across columns
for j, y in enumerate(grid[0]):
    maxtb = 0
    for i, x in enumerate(grid):
        maxtb = max(maxtb, grid[i][j])
    maxTB.append(maxtb)
# 2.
row_maxes = [max(row) for row in grid]
col_maxes = [max(col) for col in zip(*grid)]

Slice 2d array into smaller 2d arrays

Is there a way to slice a 2d array in numpy into smaller 2d arrays?
Example
[[1,2,3,4], -> [[1,2] [3,4]
[5,6,7,8]] [5,6] [7,8]]
So I basically want to cut down a 2x4 array into 2 2x2 arrays. Looking for a generic solution to be used on images.
There was another question a couple of months ago which clued me in to the idea of using reshape and swapaxes. The h//nrows makes sense since this keeps the first block's rows together. It also makes sense that you'll need nrows and ncols to be part of the shape. -1 tells reshape to fill in whatever number is necessary to make the reshape valid. Armed with the form of the solution, I just tried things until I found the formula that works.
You should be able to break your array into "blocks" using some combination of reshape and swapaxes:
def blockshaped(arr, nrows, ncols):
    """
    Return an array of shape (n, nrows, ncols) where
    n * nrows * ncols = arr.size
    If arr is a 2D array, the returned array should look like n subblocks with
    each subblock preserving the "physical" layout of arr.
    """
    h, w = arr.shape
    assert h % nrows == 0, f"{h} rows is not evenly divisible by {nrows}"
    assert w % ncols == 0, f"{w} cols is not evenly divisible by {ncols}"
    return (arr.reshape(h//nrows, nrows, -1, ncols)
               .swapaxes(1,2)
               .reshape(-1, nrows, ncols))
turns c
np.random.seed(365)
c = np.arange(24).reshape((4, 6))
print(c)
[out]:
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]
into
print(blockshaped(c, 2, 3))
[out]:
[[[ 0 1 2]
[ 6 7 8]]
[[ 3 4 5]
[ 9 10 11]]
[[12 13 14]
[18 19 20]]
[[15 16 17]
[21 22 23]]]
I've posted an inverse function, unblockshaped, here, and an N-dimensional generalization here. The generalization gives a little more insight into the reasoning behind this algorithm.
Note that there is also superbatfish's
blockwise_view. It arranges the
blocks in a different format (using more axes) but it has the advantage of (1)
always returning a view and (2) being capable of handling arrays of any
dimension.
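To illustrate the "more axes, always a view" idea for the 2-D case: stopping before the final reshape leaves a 4-D array indexed as (block row, block column, row in block, column in block) that shares memory with the input. This is only a sketch of that idea, not superbatfish's implementation; the name block_view_2d is made up, and the no-copy property assumes the input is C-contiguous.

```python
import numpy as np

def block_view_2d(arr, nrows, ncols):
    """Return a 4-D view of shape (h // nrows, w // ncols, nrows, ncols).

    Hypothetical helper: assumes arr is a C-contiguous 2-D array whose
    dimensions are divisible by the block size.
    """
    h, w = arr.shape
    if h % nrows or w % ncols:
        raise ValueError("array shape must be divisible by the block shape")
    # reshape of a contiguous array and swapaxes both return views
    return arr.reshape(h // nrows, nrows, w // ncols, ncols).swapaxes(1, 2)

c = np.arange(24).reshape(4, 6)
view = block_view_2d(c, 2, 3)
print(view.shape)                # (2, 2, 2, 3)
print(view[0, 1])                # block in block-row 0, block-column 1
print(np.shares_memory(view, c)) # True
```

Because the result is a view, writing to view[i, j] changes the original array as well.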
It seems to me that this is a task for numpy.split or some variant.
e.g.
a = np.arange(30).reshape([5,6]) #a.shape = (5,6)
a1 = np.split(a,3,axis=1)
#'a1' is a list of 3 arrays of shape (5,2)
a2 = np.split(a, [2,4])
#'a2' is a list of three arrays of shape (2,6), (2,6), (1,6)
If you have a NxN image you can create, e.g., a list of 2 NxN/2 subimages, and then divide them along the other axis.
numpy.hsplit and numpy.vsplit are also available.
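The "split along one axis, then along the other" idea can be sketched like this. The function name split_grid is made up for illustration; np.split with an integer raises a ValueError if the axis cannot be divided equally.

```python
import numpy as np

def split_grid(a, row_parts, col_parts):
    """Split a 2-D array into a row_parts x col_parts nested list of sub-arrays.

    Hypothetical helper: assumes both axes divide evenly.
    """
    # first cut into horizontal strips, then cut every strip into columns
    strips = np.split(a, row_parts, axis=0)
    return [np.split(strip, col_parts, axis=1) for strip in strips]

a = np.arange(16).reshape(4, 4)
tiles = split_grid(a, 2, 2)
print(tiles[0][1])
# [[2 3]
#  [6 7]]
```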
There are some other answers that already seem well suited to your specific case, but your question piqued my interest in a memory-efficient solution usable up to the maximum number of dimensions that numpy supports, and I ended up spending most of the afternoon coming up with a possible method. (The method itself is relatively simple. I still haven't used most of the really fancy features that numpy supports, so most of the time went into researching what numpy had available and how much it could do so that I didn't have to do it myself.)
def blockgen(array, bpa):
    """Creates a generator that yields multidimensional blocks from the given
    array(_like); bpa is an array_like consisting of the number of blocks per axis
    (minimum of 1, must be a divisor of the corresponding axis size of array). As
    the blocks are selected using normal numpy slicing, they will be views rather
    than copies; this is good for very large multidimensional arrays that are being
    blocked, and for very large blocks, but it also means that the result must be
    copied if it is to be modified (unless modifying the original data as well is
    intended)."""
    bpa = np.asarray(bpa) # in case bpa wasn't already an ndarray
    # parameter checking
    if array.ndim != bpa.size: # bpa doesn't match array dimensionality
        raise ValueError("Size of bpa must be equal to the array dimensionality.")
    # np.int was removed from recent NumPy releases, so test the dtype kind instead
    if (not np.issubdtype(bpa.dtype, np.integer) # bpa must be all integers
            or (bpa < 1).any() # all values in bpa must be >= 1
            or (array.shape % bpa).any()): # % != 0 means not evenly divisible
        raise ValueError("bpa ({0}) must consist of nonzero positive integers "
                         "that evenly divide the corresponding array axis "
                         "size".format(bpa))
    # generate block edge indices
    rgen = (np.r_[:array.shape[i]+1:array.shape[i]//blk_n]
            for i, blk_n in enumerate(bpa))
    # build slice sequences for each axis (unfortunately broadcasting
    # can't be used to make the items easy to operate over)
    c = [[np.s_[i:j] for i, j in zip(r[:-1], r[1:])] for r in rgen]
    # Now to get the blocks; this is slightly less efficient than it could be
    # because numpy doesn't like jagged arrays and I didn't feel like writing
    # a ufunc for it.
    for idxs in np.ndindex(*bpa):
        blockbounds = tuple(c[j][idxs[j]] for j in range(bpa.size))
        yield array[blockbounds]
Your question is practically the same as this one. You can use the one-liner with np.ndindex() and reshape():
def cutter(a, r, c):
    # integer division: np.ndindex and reshape need ints
    lenr = a.shape[0] // r
    lenc = a.shape[1] // c
    return np.array([a[i*r:(i+1)*r, j*c:(j+1)*c] for (i, j) in np.ndindex(lenr, lenc)]).reshape(lenr, lenc, r, c)
To create the result you want:
a = np.arange(1,9).reshape(2,4)
#array([[1, 2, 3, 4],
# [5, 6, 7, 8]])
cutter( a, 1, 2 )
#array([[[[1, 2]],
# [[3, 4]]],
# [[[5, 6]],
# [[7, 8]]]])
Some minor enhancement to TheMeaningfulEngineer's answer that handles the case when the big 2d array cannot be perfectly sliced into equally sized subarrays
def blockfy(a, p, q):
    '''
    Divides array a into subarrays of size p-by-q
    p: block row size
    q: block column size
    '''
    m = a.shape[0] #image row size
    n = a.shape[1] #image column size
    # pad array with NaNs so it can be divided by p row-wise and by q column-wise
    bpr = ((m-1)//p + 1) #number of block rows
    bpc = ((n-1)//q + 1) #number of block columns
    M = p * bpr
    N = q * bpc
    A = np.nan * np.ones([M,N])
    A[:a.shape[0],:a.shape[1]] = a
    block_list = []
    previous_row = 0
    for row_block in range(bpr): # iterate over block rows (was bpc, which skipped the last block row)
        previous_row = row_block * p
        previous_column = 0
        for column_block in range(bpc):
            previous_column = column_block * q
            block = A[previous_row:previous_row+p, previous_column:previous_column+q]
            # remove nan columns and nan rows
            nan_cols = np.all(np.isnan(block), axis=0)
            block = block[:, ~nan_cols]
            nan_rows = np.all(np.isnan(block), axis=1)
            block = block[~nan_rows, :]
            ## append
            if block.size:
                block_list.append(block)
    return block_list
Examples:
a = np.arange(25)
a = a.reshape((5,5))
out = blockfy(a, 2, 3)
a->
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19],
[20, 21, 22, 23, 24]])
out[0] ->
array([[0., 1., 2.],
[5., 6., 7.]])
out[1]->
array([[3., 4.],
[8., 9.]])
out[-1]->
array([[23., 24.]])
For now it just works when the big 2d array can be perfectly sliced into equally sized subarrays.
The code below slices
a ->array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
into this
block_array->
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]]])
p and q determine the block size
Code
import numpy as np
a = np.arange(24)
a = a.reshape((4,6))
m = a.shape[0] #image row size
n = a.shape[1] #image column size
p = 2 #block row size
q = 3 #block column size
# these two definitions were missing; assumed to be the block counts along each axis
blocks_per_row = m // p
blocks_per_column = n // q
block_array = []
previous_row = 0
for row_block in range(blocks_per_row):
    previous_row = row_block * p
    previous_column = 0
    for column_block in range(blocks_per_column):
        previous_column = column_block * q
        block = a[previous_row:previous_row+p,previous_column:previous_column+q]
        block_array.append(block)
block_array = np.array(block_array)
If you want a solution that also handles the cases when the matrix is
not equally divided, you can use this:
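from functools import reduce  # reduce is not a built-in in Python 3
from operator import add
import numpy as np
# arr is the 2-D input array (renamed from input, which shadows a built-in)
half_split = np.array_split(arr, 2)
res = map(lambda x: np.array_split(x, 2, axis=1), half_split)
res = reduce(add, res)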
Here is a solution based on unutbu's answer that handles the case where the matrix cannot be divided equally. In that case, it first resizes the matrix using interpolation. You need OpenCV for this. Note that each block has size_h // r_nbrs rows and size_w // c_nbrs columns; mixing these two up only works when r_nbrs equals c_nbrs.
import numpy as np
import cv2
def blockshaped(arr, r_nbrs, c_nbrs, interp=cv2.INTER_LINEAR):
    """
    arr     a 2D array, typically an image
    r_nbrs  number of block rows
    c_nbrs  number of block columns
    """
    arr_h, arr_w = arr.shape
    # largest size not exceeding the original that divides evenly
    size_w = (arr_w // c_nbrs) * c_nbrs
    size_h = (arr_h // r_nbrs) * r_nbrs
    if size_w != arr_w or size_h != arr_h:
        # cv2.resize expects the target size as (width, height)
        arr = cv2.resize(arr, (size_w, size_h), interpolation=interp)
    block_h = size_h // r_nbrs
    block_w = size_w // c_nbrs
    return (arr.reshape(r_nbrs, block_h, -1, block_w)
               .swapaxes(1,2)
               .reshape(-1, block_h, block_w))
a = np.random.randint(1, 9, size=(9,9))
out = [np.hsplit(x, 3) for x in np.vsplit(a,3)]
print(a)
print(out)
yields
[[7 6 2 4 4 2 5 2 3]
[2 3 7 6 8 8 2 6 2]
[4 1 3 1 3 8 1 3 7]
[6 1 1 5 7 2 1 5 8]
[8 8 7 6 6 1 8 8 4]
[6 1 8 2 1 4 5 1 8]
[7 3 4 2 5 6 1 2 7]
[4 6 7 5 8 2 8 2 8]
[6 6 5 5 6 1 2 6 4]]
[[array([[7, 6, 2],
[2, 3, 7],
[4, 1, 3]]), array([[4, 4, 2],
[6, 8, 8],
[1, 3, 8]]), array([[5, 2, 3],
[2, 6, 2],
[1, 3, 7]])], [array([[6, 1, 1],
[8, 8, 7],
[6, 1, 8]]), array([[5, 7, 2],
[6, 6, 1],
[2, 1, 4]]), array([[1, 5, 8],
[8, 8, 4],
[5, 1, 8]])], [array([[7, 3, 4],
[4, 6, 7],
[6, 6, 5]]), array([[2, 5, 6],
[5, 8, 2],
[5, 6, 1]]), array([[1, 2, 7],
[8, 2, 8],
[2, 6, 4]])]]
Here is my solution. Notice that this code doesn't actually create copies of the original array, so it works well with big data. Moreover, it doesn't crash if the array cannot be divided evenly (but you can easily add a check for that by removing ceil and testing whether the array divides into v_slices and h_slices without remainder).
import numpy as np
from math import ceil
a = np.arange(9).reshape(3, 3)
p, q = 2, 2
height, width = a.shape  # shape is (rows, columns)
v_slices = ceil(height / p)  # number of block rows
h_slices = ceil(width / q)   # number of block columns
for v in range(v_slices):
    for h in range(h_slices):
        block = a[v * p : v * p + p, h * q : h * q + q]
        # do something with a block
This code changes (or, more precisely, gives you direct access to part of an array) this:
[[0 1 2]
[3 4 5]
[6 7 8]]
Into this:
[[0 1]
[3 4]]
[[2]
[5]]
[[6 7]]
[[8]]
If you need actual copies, Aenaon code is what you are looking for.
If you are sure that big array can be divided evenly, you can use numpy splitting tools.
To add to @Aenaon's answer and his blockfy function: if you are working with color images (3D arrays), here is my pipeline to create crops of 224 x 224 for 3-channel input.
def blockfy(a, p, q):
    '''
    Divides array a into subarrays of size p-by-q
    p: block row size
    q: block column size
    '''
    m = a.shape[0] #image row size
    n = a.shape[1] #image column size
    # pad array with NaNs so it can be divided by p row-wise and by q column-wise
    bpr = ((m-1)//p + 1) #number of block rows
    bpc = ((n-1)//q + 1) #number of block columns
    M = p * bpr
    N = q * bpc
    A = np.nan * np.ones([M,N])
    A[:a.shape[0],:a.shape[1]] = a
    block_list = []
    previous_row = 0
    for row_block in range(bpr): # iterate over block rows (was bpc, which skipped the last block row)
        previous_row = row_block * p
        previous_column = 0
        for column_block in range(bpc):
            previous_column = column_block * q
            block = A[previous_row:previous_row+p, previous_column:previous_column+q]
            # remove nan columns and nan rows
            nan_cols = np.all(np.isnan(block), axis=0)
            block = block[:, ~nan_cols]
            nan_rows = np.all(np.isnan(block), axis=1)
            block = block[~nan_rows, :]
            ## append
            if block.size:
                block_list.append(block)
    return block_list
Then I extended the above to:
# Imports assumed from the calls used below (skimage for io.imread, Pillow for Image)
import os
import numpy as np
from skimage import io
from PIL import Image
for file in os.listdir(path_to_crop): ### list files in your folder
    img = io.imread(path_to_crop + file, as_gray=False) ### open image
    r = blockfy(img[:,:,0],224,224) ### crop blocks of 224 x 224 for red channel
    g = blockfy(img[:,:,1],224,224) ### crop blocks of 224 x 224 for green channel
    b = blockfy(img[:,:,2],224,224) ### crop blocks of 224 x 224 for blue channel
    for x in range(0,len(r)):
        img = np.array((r[x],g[x],b[x])) ### combine each channel into one patch by patch
        img = img.astype(np.uint8) ### cast back to proper integers
        img_swap = img.swapaxes(0, 2) ### need to swap axes due to the way things were processed
        img_swap_2 = img_swap.swapaxes(0, 1) ### do it again
        Image.fromarray(img_swap_2).save(path_save_crop+str(x)+"bounding" + file,
                                         format = 'jpeg',
                                         subsampling=0,
                                         quality=100) ### save patch with new name etc
