numpy method to join two meshgrids and their result arrays - python
Consider two n-dimensional, possibly overlapping, numpy meshgrids, say
m1 = (x1, y1, z1, ...)
m2 = (x2, y2, z2, ...)
Within m1 and m2 there are no duplicate coordinate tuples. Each meshgrid has an associated result array, possibly produced by a different function:
r1 = f1(m1)
r2 = f2(m2)
such that f1(m) != f2(m). Now I would like to join those two meshgrids and their result arrays, e.g. m=m1&m2 and r=r1&r2 (where & would denote some kind of union), such that the coordinate tuples in m are still sorted and the values in r still correspond to the original coordinate tuples. Newly created coordinate tuples should be identifiable (for instance with a special value).
To elaborate on what I'm after, I have two examples that kind of do what I want with simple for and if statements. Here's a 1D example:
x1 = [1, 5, 7]
r1 = [i**2 for i in x1]
x2 = [2, 4, 6]
r2 = [i*3 for i in x2]
x,r = list(zip(*sorted([(i,j) for i,j in zip(x1+x2,r1+r2)],key=lambda x: x[0])))
which gives
x = (1, 2, 4, 5, 6, 7)
r = (1, 6, 12, 25, 18, 49)
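For reference, the same 1D join can be written with numpy alone by concatenating and sorting with argsort (a sketch using the x1/x2/r1/r2 values above):

```python
import numpy as np

x1 = np.array([1, 5, 7]); r1 = x1**2
x2 = np.array([2, 4, 6]); r2 = x2*3

x = np.concatenate((x1, x2))
r = np.concatenate((r1, r2))
order = np.argsort(x)       # permutation that sorts the coordinates
x, r = x[order], r[order]
# x -> [1 2 4 5 6 7], r -> [ 1  6 12 25 18 49]
```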
For 2D it starts getting quite complicated:
import numpy as np
a1 = [1, 5, 7]
b1 = [2, 5, 6]
x1,y1 = np.meshgrid(a1,b1)
r1 = x1*y1
a2 = [2, 4, 6]
b2 = [1, 3, 8]
x2, y2 = np.meshgrid(a2,b2)
r2 = 2*x2
a = [1, 2, 4, 5, 6, 7]
b = [1, 2, 3, 5, 6, 8]
x,y = np.meshgrid(a,b)
r = np.ones(x.shape)*-1
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        if x[i,j] in a1 and y[i,j] in b1:
            r[i,j] = r1[a1.index(x[i,j]),b1.index(y[i,j])]
        elif x[i,j] in a2 and y[i,j] in b2:
            r[i,j] = r2[a2.index(x[i,j]),b2.index(y[i,j])]
This gives the desired result, with new coordinate pairs having the value -1:
x=
[[1 2 4 5 6 7]
[1 2 4 5 6 7]
[1 2 4 5 6 7]
[1 2 4 5 6 7]
[1 2 4 5 6 7]
[1 2 4 5 6 7]]
y=
[[1 1 1 1 1 1]
[2 2 2 2 2 2]
[3 3 3 3 3 3]
[5 5 5 5 5 5]
[6 6 6 6 6 6]
[8 8 8 8 8 8]]
r=
[[ -1. 4. 4. -1. 4. -1.]
[ 2. -1. -1. 5. -1. 6.]
[ -1. 8. 8. -1. 8. -1.]
[ 10. -1. -1. 25. -1. 30.]
[ 14. -1. -1. 35. -1. 42.]
[ -1. 12. 12. -1. 12. -1.]]
but this will also become slow quickly with increasing dimensions and array sizes. So, finally, the question: how can this be done using only numpy functions? If that is not possible, what would be the fastest way to implement it in Python? If it is at all relevant, I prefer Python 3. Note that the functions I use in the examples are not the actual functions I use.
We can use masking to find the "A in B" parts as 1D masks along each axis. Then, we can feed those masks to np.ix_ to extend the selection to the desired number of dimensions.
Thus, for a 2D case, it would be something along these lines -
# Initialize o/p array (meshgrid layout: rows follow b, columns follow a)
r_out = np.full([len(b), len(a)], -1)
# Assign for the IF part
mask_a1 = np.in1d(a,a1)
mask_b1 = np.in1d(b,b1)
r_out[np.ix_(mask_b1, mask_a1)] = r1.T
# Assign for the ELIF part
mask_a2 = np.in1d(a,a2)
mask_b2 = np.in1d(b,b2)
r_out[np.ix_(mask_b2, mask_a2)] = r2.T
a could be created, like so -
a = np.concatenate((a1,a2))
a.sort()
Similarly, for b.
Also, we could use indices instead of masks with np.ix_. For that, np.searchsorted helps: instead of the mask np.in1d(a, a1), we can get the corresponding indices with np.searchsorted(a, a1), and so on for the rest of the masks. This should be considerably faster. Note that np.searchsorted only yields the right positions because every element of a1 is guaranteed to be present in a (a is built from a1 and a2).
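A small sketch of the two variants side by side, using the 1D axes from the question; the mask and the index array select the same positions:

```python
import numpy as np

a1 = np.array([1, 5, 7])
a2 = np.array([2, 4, 6])
a = np.sort(np.concatenate((a1, a2)))   # [1 2 4 5 6 7]

mask = np.in1d(a, a1)          # boolean mask: [ True False False  True False  True]
idx = np.searchsorted(a, a1)   # integer indices of the same positions: [0 3 5]
assert (a[mask] == a[idx]).all()
```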
For a 3D case, I would assume there is a third axis array, say c. The initialization would then also use len(c); there would be one more mask/index array corresponding to c, and hence one more term in np.ix_; and r1 and r2 would need their axes arranged to match the output layout, as with the transposes above.
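A minimal 3D sketch of that recipe, with made-up axes and placeholder functions; here r1 and r2 are kept in the default np.meshgrid layout (whose first two output axes are ordered (b, a, c)), so both sides of the assignment line up and no transposes are needed:

```python
import numpy as np

# made-up axes for two 3D grids
a1, b1, c1 = [1, 5], [2, 6], [3, 7]
a2, b2, c2 = [2, 4], [1, 8], [5, 9]

a = np.sort(np.concatenate((a1, a2)))
b = np.sort(np.concatenate((b1, b2)))
c = np.sort(np.concatenate((c1, c2)))

x1, y1, z1 = np.meshgrid(a1, b1, c1)
r1 = x1 * y1 * z1                      # placeholder for f1
x2, y2, z2 = np.meshgrid(a2, b2, c2)
r2 = x2 + y2 + z2                      # placeholder for f2

# default meshgrid ('xy' indexing) orders the output axes as (b, a, c)
r_out = np.full((len(b), len(a), len(c)), -1)
r_out[np.ix_(np.in1d(b, b1), np.in1d(a, a1), np.in1d(c, c1))] = r1
r_out[np.ix_(np.in1d(b, b2), np.in1d(a, a2), np.in1d(c, c2))] = r2
```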
Divakar's answer is exactly what I needed. However, I also wanted to try out the second suggestion in that answer, and on top of that I did some profiling. I thought the results might be interesting to others. Here is the code I used for profiling:
import numpy as np
import timeit
import random
def for_join_2d(x1, y1, r1, x2, y2, r2):
    """
    The algorithm from the question.
    """
    a1 = list(x1[0, :])
    b1 = list(y1[:, 0])
    a2 = list(x2[0, :])
    b2 = list(y2[:, 0])
    a = sorted(a1 + a2)
    b = sorted(b1 + b2)
    x, y = np.meshgrid(a, b)
    r = np.ones(x.shape)*-1
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            if x[i, j] in a1 and y[i, j] in b1:
                r[i, j] = r1[a1.index(x[i, j]), b1.index(y[i, j])]
            elif x[i, j] in a2 and y[i, j] in b2:
                r[i, j] = r2[a2.index(x[i, j]), b2.index(y[i, j])]
    return x, y, r
def mask_join_2d(x1, y1, r1, x2, y2, r2):
    """
    Divakar's original answer.
    """
    a = np.sort(np.concatenate((x1[0, :], x2[0, :])))
    b = np.sort(np.concatenate((y1[:, 0], y2[:, 0])))
    # Initialize o/p array (rows follow b, columns follow a)
    x, y = np.meshgrid(a, b)
    r_out = np.full([len(b), len(a)], -1)
    # Assign for the IF part
    mask_a1 = np.in1d(a, x1[0, :])
    mask_b1 = np.in1d(b, y1[:, 0])
    r_out[np.ix_(mask_b1, mask_a1)] = r1.T
    # Assign for the ELIF part
    mask_a2 = np.in1d(a, x2[0, :])
    mask_b2 = np.in1d(b, y2[:, 0])
    r_out[np.ix_(mask_b2, mask_a2)] = r2.T
    return x, y, r_out
def searchsort_join_2d(x1, y1, r1, x2, y2, r2):
    """
    Divakar's second suggested solution using searchsorted.
    """
    a = np.sort(np.concatenate((x1[0, :], x2[0, :])))
    b = np.sort(np.concatenate((y1[:, 0], y2[:, 0])))
    # Initialize o/p array (rows follow b, columns follow a)
    x, y = np.meshgrid(a, b)
    r_out = np.full([len(b), len(a)], -1)
    # the IF part
    ind_a1 = np.searchsorted(a, x1[0, :])
    ind_b1 = np.searchsorted(b, y1[:, 0])
    r_out[np.ix_(ind_b1, ind_a1)] = r1.T
    # the ELIF part
    ind_a2 = np.searchsorted(a, x2[0, :])
    ind_b2 = np.searchsorted(b, y2[:, 0])
    r_out[np.ix_(ind_b2, ind_a2)] = r2.T
    return x, y, r_out
## the profiling code:
if __name__ == '__main__':
    N1 = 100
    N2 = 100
    coords_a = [i for i in range(N1)]
    coords_b = [i*2 for i in range(N2)]
    a1 = random.sample(coords_a, N1//2)
    b1 = random.sample(coords_b, N2//2)
    a2 = [i for i in coords_a if i not in a1]
    b2 = [i for i in coords_b if i not in b1]
    x1, y1 = np.meshgrid(a1, b1)
    r1 = x1*y1
    x2, y2 = np.meshgrid(a2, b2)
    r2 = 2*x2
    print("original for loop")
    print(min(timeit.Timer(
        'for_join_2d(x1,y1,r1,x2,y2,r2)',
        setup='from __main__ import for_join_2d,x1,y1,r1,x2,y2,r2',
    ).repeat(7, 1000)))
    print("with masks")
    print(min(timeit.Timer(
        'mask_join_2d(x1,y1,r1,x2,y2,r2)',
        setup='from __main__ import mask_join_2d,x1,y1,r1,x2,y2,r2',
    ).repeat(7, 1000)))
    print("with searchsort")
    print(min(timeit.Timer(
        'searchsort_join_2d(x1,y1,r1,x2,y2,r2)',
        setup='from __main__ import searchsort_join_2d,x1,y1,r1,x2,y2,r2',
    ).repeat(7, 1000)))
For each function I used 7 sets of 1000 iterations and picked the fastest set for evaluation. The results for two 10x10 arrays were:
original for loop
0.5114614190533757
with masks
0.21544912096578628
with searchsort
0.12026709201745689
and for two 100x100 arrays it was:
original for loop
247.88183582702186
with masks
0.5245905339252204
with searchsort
0.2439237720100209
For big matrices, the use of numpy functionality unsurprisingly makes a huge difference, and indeed searchsorted with indexing instead of masking roughly halves the run time again.
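For completeness, the recipe also generalizes to arbitrary dimensions. The sketch below (join_grids is a made-up name) assumes the result arrays are stored in indexing='ij' meshgrid layout, so every axis of r1/r2 lines up with its coordinate array and no transposes are needed:

```python
import numpy as np

def join_grids(axes1, r1, axes2, r2, fill=-1):
    """Join two n-D grids given their 1D coordinate arrays (axes1, axes2)
    and result arrays (r1, r2) stored in indexing='ij' meshgrid layout."""
    # merged, sorted coordinate axes
    axes = [np.sort(np.concatenate((u, v))) for u, v in zip(axes1, axes2)]
    r_out = np.full([len(ax) for ax in axes], fill)
    # scatter each grid's values into the joined grid via index arrays
    r_out[np.ix_(*(np.searchsorted(ax, u) for ax, u in zip(axes, axes1)))] = r1
    r_out[np.ix_(*(np.searchsorted(ax, u) for ax, u in zip(axes, axes2)))] = r2
    return axes, r_out

# 1D example from the question
(x,), r = join_grids([np.array([1, 5, 7])], np.array([1, 25, 49]),
                     [np.array([2, 4, 6])], np.array([6, 12, 18]))
# x -> [1 2 4 5 6 7], r -> [ 1  6 12 25 18 49]
```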
Related
Vectorize a for loop based on condition using numpy
I have 2 numpy arrays l1 and l2 as follows:

start, jump = 1, 2
L = 8
l1 = np.arange(start, L)
l2 = np.arange(start + jump, L + jump)

This results in:

l1 = [1 2 3 4 5 6 7]
l2 = [3 4 5 6 7 8 9]

Now I want two resultant arrays r1 and r2 such that, while appending elements of l1 and l2 one by one to r1 and r2 respectively, the i-th element of l1 is only appended if r2 does not already contain it. Implementing this using a for loop is easy, but I am stuck on how to implement it using only numpy (without loops), as I am new to it. This is what I tried and what I am expecting:

r1 = []
r2 = []
for i in range(len(l1)):
    if l1[i] not in r2:
        r1.append(l1[i])
        r2.append(l2[i])

This gives:

r1 = [1, 2, 5, 6]
r2 = [3, 4, 7, 8]

Thanks in advance :)
As suggested by @Chrysoplylaxs in the comments, I made a boolean mask and it worked like a charm!

mask = np.tile([True]*jump + [False]*jump, len(l1)//jump).astype(bool)
r1 = l1[mask[:len(l1)]]
r2 = l2[mask[:len(l2)]]
Change Numpy array values in-place
Say we have a randomly generated 2D 3x2 numpy array, e.g. a = np.random.rand(3, 2), and I want to change the value of the element in the first row and column (i.e. a[0,0]) to 10. If I do

a[0][0] = 10

then it works and a[0,0] is changed to 10. But if I do

a[np.arange(1)][0] = 10

then nothing is changed. Why is this? I want to change some column values of a selected list of rows (indicated by a numpy array) to other values (like a[row_indices][:,0] = 10), but it doesn't work because I'm passing in an array (or list) that indicates rows.
a[x][y] is wrong. It happens to work in the first case, a[0][0] = 10, because a[0] returns a view, hence doing view[y] = whatever modifies the original array. However, in the second case, a[np.arange(1)][0] = 10, a[np.arange(1)] returns a copy (because you are using array indexing). You should be using:

a[0, 0] = 10
# or
a[np.arange(1), 0] = 10
Advanced indexing always returns a copy, as a view cannot be guaranteed: "Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view)." If you replace np.arange(1) with something that returns a view (or equivalent slicing), you get back to basic indexing, and hence when you chain two views, the change is reflected in the original array. For example:

import numpy as np

arr = np.arange(2 * 3).reshape((2, 3))
arr[0][0] = 10
print(arr)
# [[10  1  2]
#  [ 3  4  5]]

arr = np.arange(2 * 3).reshape((2, 3))
arr[:1][0] = 10
print(arr)
# [[10 10 10]
#  [ 3  4  5]]

arr = np.arange(2 * 3).reshape((2, 3))
arr[0][:1] = 10
print(arr)
# [[10  1  2]
#  [ 3  4  5]]

etc. If you have some row indices you want to use to modify the array, you can just use them, but you cannot chain the indexing, e.g.:

arr = np.arange(5 * 3).reshape((5, 3))
row_indices = (0, 2)
arr[row_indices, 0] = 10
print(arr)
# [[10  1  2]
#  [ 3  4  5]
#  [10  7  8]
#  [ 9 10 11]
#  [12 13 14]]
Pad list of arrays with zeros so that all arrays have the same size
I have created this array (or I think it's a list) consisting of many arrays of different sizes, which is the reason I put dtype=object:

m = [data[a:b] for a, b in zip(z[0:-1:2], z[1:-1:2])]
array = np.array(m, dtype=object)

I need to pad each array with zeros so that they all have the same size (let's say size=smax) and become a "proper" array. My definitions are a little off, and I am sorry in advance.
You can do this using np.pad on each row. For example:

import numpy as np

data = np.arange(10)
z = [0, 2, 1, 4, 6, 10, 8, 9]
m = [data[a:b] for a, b in zip(z[0:-1:2], z[1:-1:2])]

max_length = max(len(row) for row in m)
result = np.array([np.pad(row, (0, max_length - len(row))) for row in m])
print(result)
# [[0 1 0 0]
#  [1 2 3 0]
#  [6 7 8 9]]
Error on slicing array in Python
I have a problem with slicing an array: if I do some operation inside a function and return the modified array, the array outside the function is also changed. I could not really understand this behavior of the numpy slicing operation on the array. Here is the code:

import numpy as np

def do_smth(a):
    x = np.concatenate((a[..., 1], a[..., 3]))
    y = np.concatenate((a[..., 2], a[..., 4]))
    xmin, xmax = np.floor(min(x)), np.ceil(max(x))
    ymin, ymax = np.floor(min(y)), np.ceil(max(y))
    a[..., 1:3] = a[..., 1:3] - np.array([xmin, ymin])
    a[..., 3:5] = a[..., 3:5] - np.array([xmin, ymin])
    return a

def main():
    old_a = np.array([[ 0,  1,  2,  3,  4],
                      [ 5,  6,  7,  8,  9],
                      [10, 11, 12, 13, 14]])
    new_a = do_smth(old_a)
    print("new_a:\n", new_a, '\n\n')
    print("old_a:\n", old_a)

This gives the output:

new_a:
[[ 0  0  0  2  2]
 [ 5  5  5  7  7]
 [10 10 10 12 12]]

old_a:
[[ 0  0  0  2  2]
 [ 5  5  5  7  7]
 [10 10 10 12 12]]

Can anyone tell me why old_a has been changed? And how can I keep old_a unchanged? Thank you
The problem is not slicing the array, but the fact that numpy arrays are mutable objects. This means that whenever you do something like

import numpy as np
a = np.array([1, 2, 3, 4])
b = a

the object b is a view of the array a; more precisely, b is just another name for a. Note that when you pass a mutable Python object (like a Python list or dict) to a function that does some operation changing this object, the original object passed is also changed. This is how Python (not only numpy) treats mutable objects. To get a feeling for it you can do something like this:

from __future__ import print_function
import numpy as np

a = np.ones(3)
b = a
c = [1, 1, 1]
d = c

print("a before:", a)
print("b before:", b)
print("c before:", c)
print("d before:", d)

def change_it(x):
    x[0] = 10

change_it(a)
change_it(c)

print("a after:", a)
print("b after:", b)
print("c after:", c)
print("d after:", d)

which gives:

a before: [ 1.  1.  1.]
b before: [ 1.  1.  1.]
c before: [1, 1, 1]
d before: [1, 1, 1]
a after: [ 10.   1.   1.]
b after: [ 10.   1.   1.]
c after: [10, 1, 1]
d after: [10, 1, 1]

Note that the function change_it doesn't even return anything and was only applied to a and c, but it also changed b and d. If you want to avoid this, you must make an explicit copy of your array. That is easily done by:

def do_smth(original_a):  # change the argument name
    a = original_a.copy()  # make an explicit copy
    x = np.concatenate((a[..., 1], a[..., 3]))
    y = np.concatenate((a[..., 2], a[..., 4]))
    xmin, xmax = np.floor(min(x)), np.ceil(max(x))
    ymin, ymax = np.floor(min(y)), np.ceil(max(y))
    a[..., 1:3] = a[..., 1:3] - np.array([xmin, ymin])
    a[..., 3:5] = a[..., 3:5] - np.array([xmin, ymin])
    return a
Although values in Python are passed by assignment, the object you are sending in is actually a reference to a memory location. Thus, when you change the object in the function, you are altering the same object you sent into the function. You will want to clone the object first, and then do your operations on the copy.
Slice 2d array into smaller 2d arrays
Is there a way to slice a 2d array in numpy into smaller 2d arrays? Example:

[[1,2,3,4],   ->   [[1,2] [3,4]
 [5,6,7,8]]         [5,6] [7,8]]

So I basically want to cut down a 2x4 array into two 2x2 arrays. Looking for a generic solution to be used on images.
There was another question a couple of months ago which clued me in to the idea of using reshape and swapaxes. The h//nrows makes sense, since this keeps the first block's rows together. It also makes sense that you'll need nrows and ncols to be part of the shape. -1 tells reshape to fill in whatever number is necessary to make the reshape valid. Armed with the form of the solution, I just tried things until I found the formula that works.

You should be able to break your array into "blocks" using some combination of reshape and swapaxes:

def blockshaped(arr, nrows, ncols):
    """
    Return an array of shape (n, nrows, ncols) where
    n * nrows * ncols = arr.size

    If arr is a 2D array, the returned array should look like n subblocks
    with each subblock preserving the "physical" layout of arr.
    """
    h, w = arr.shape
    assert h % nrows == 0, f"{h} rows is not evenly divisible by {nrows}"
    assert w % ncols == 0, f"{w} cols is not evenly divisible by {ncols}"
    return (arr.reshape(h//nrows, nrows, -1, ncols)
               .swapaxes(1, 2)
               .reshape(-1, nrows, ncols))

turns c

c = np.arange(24).reshape((4, 6))
print(c)
# [[ 0  1  2  3  4  5]
#  [ 6  7  8  9 10 11]
#  [12 13 14 15 16 17]
#  [18 19 20 21 22 23]]

into

print(blockshaped(c, 2, 3))
# [[[ 0  1  2]
#   [ 6  7  8]]
#
#  [[ 3  4  5]
#   [ 9 10 11]]
#
#  [[12 13 14]
#   [18 19 20]]
#
#  [[15 16 17]
#   [21 22 23]]]

I've posted an inverse function, unblockshaped, here, and an N-dimensional generalization here. The generalization gives a little more insight into the reasoning behind this algorithm.

Note that there is also superbatfish's blockwise_view. It arranges the blocks in a different format (using more axes), but it has the advantage of (1) always returning a view and (2) being capable of handling arrays of any dimension.
It seems to me that this is a task for numpy.split or some variant, e.g.:

a = np.arange(30).reshape([5, 6])  # a.shape = (5, 6)
a1 = np.split(a, 3, axis=1)  # 'a1' is a list of 3 arrays of shape (5, 2)
a2 = np.split(a, [2, 4])     # 'a2' is a list of three arrays of shape (2, 6), (2, 6), (1, 6)

If you have an NxN image you can create, e.g., a list of 2 NxN/2 subimages, and then divide them along the other axis. numpy.hsplit and numpy.vsplit are also available.
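For the 2x4 example from the question, the two split steps can be nested in one line (a sketch; vsplit cuts rows, hsplit then cuts each row strip into column blocks):

```python
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8]])

# one row strip of blocks, each strip split into 2 column blocks
blocks = [blk for row in np.vsplit(a, 1) for blk in np.hsplit(row, 2)]
# blocks[0] -> [[1 2] [5 6]], blocks[1] -> [[3 4] [7 8]]
```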
There are some other answers that seem well-suited for your specific case already, but your question piqued my interest in the possibility of a memory-efficient solution usable up to the maximum number of dimensions that numpy supports, and I ended up spending most of the afternoon coming up with a possible method. (The method itself is relatively simple; it's just that I still haven't used most of the really fancy features that numpy supports, so most of the time was spent researching what numpy has available and how much it can do, so that I didn't have to do it myself.)

def blockgen(array, bpa):
    """Creates a generator that yields multidimensional blocks from the given
    array(_like); bpa is an array_like consisting of the number of blocks
    per axis (minimum of 1, must be a divisor of the corresponding axis size
    of array). As the blocks are selected using normal numpy slicing, they
    will be views rather than copies; this is good for very large
    multidimensional arrays that are being blocked, and for very large
    blocks, but it also means that the result must be copied if it is to be
    modified (unless modifying the original data as well is intended)."""
    bpa = np.asarray(bpa)  # in case bpa wasn't already an ndarray

    # parameter checking
    if array.ndim != bpa.size:  # bpa doesn't match array dimensionality
        raise ValueError("Size of bpa must be equal to the array dimensionality.")
    if (not np.issubdtype(bpa.dtype, np.integer)  # bpa must be all integers
            or (bpa < 1).any()                    # all values in bpa must be >= 1
            or (array.shape % bpa).any()):        # % != 0 means not evenly divisible
        raise ValueError("bpa ({0}) must consist of nonzero positive integers "
                         "that evenly divide the corresponding array axis "
                         "size".format(bpa))

    # generate block edge indices
    rgen = (np.r_[:array.shape[i]+1:array.shape[i]//blk_n]
            for i, blk_n in enumerate(bpa))

    # build slice sequences for each axis (unfortunately broadcasting
    # can't be used to make the items easy to operate over)
    c = [[np.s_[i:j] for i, j in zip(r[:-1], r[1:])] for r in rgen]

    # Now to get the blocks; this is slightly less efficient than it could be
    # because numpy doesn't like jagged arrays and I didn't feel like writing
    # a ufunc for it.
    for idxs in np.ndindex(*bpa):
        blockbounds = tuple(c[j][idxs[j]] for j in range(bpa.size))
        yield array[blockbounds]
Your question is practically the same as this one. You can use the one-liner with np.ndindex() and reshape():

def cutter(a, r, c):
    lenr = a.shape[0] // r
    lenc = a.shape[1] // c
    return np.array([a[i*r:(i+1)*r, j*c:(j+1)*c]
                     for (i, j) in np.ndindex(lenr, lenc)]).reshape(lenr, lenc, r, c)

To create the result you want:

a = np.arange(1, 9).reshape(2, 4)
# array([[1, 2, 3, 4],
#        [5, 6, 7, 8]])

cutter(a, 1, 2)
# array([[[[1, 2]],
#         [[3, 4]]],
#        [[[5, 6]],
#         [[7, 8]]]])
Some minor enhancement to TheMeaningfulEngineer's answer that handles the case when the big 2d array cannot be perfectly sliced into equally sized subarrays:

def blockfy(a, p, q):
    '''
    Divides array a into subarrays of size p-by-q
    p: block row size
    q: block column size
    '''
    m = a.shape[0]  # image row size
    n = a.shape[1]  # image column size

    # pad array with NaNs so it can be divided by p row-wise and by q column-wise
    bpr = ((m-1)//p + 1)  # blocks per row
    bpc = ((n-1)//q + 1)  # blocks per column
    M = p * bpr
    N = q * bpc

    A = np.nan * np.ones([M, N])
    A[:a.shape[0], :a.shape[1]] = a

    block_list = []
    previous_row = 0
    for row_block in range(bpr):
        previous_row = row_block * p
        previous_column = 0
        for column_block in range(bpc):
            previous_column = column_block * q
            block = A[previous_row:previous_row+p, previous_column:previous_column+q]

            # remove nan columns and nan rows
            nan_cols = np.all(np.isnan(block), axis=0)
            block = block[:, ~nan_cols]
            nan_rows = np.all(np.isnan(block), axis=1)
            block = block[~nan_rows, :]

            # append
            if block.size:
                block_list.append(block)

    return block_list

Examples:

a = np.arange(25)
a = a.reshape((5, 5))
out = blockfy(a, 2, 3)

a ->
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

out[0] ->
array([[0., 1., 2.],
       [5., 6., 7.]])

out[1] ->
array([[3., 4.],
       [8., 9.]])

out[-1] ->
array([[23., 24.]])
For now it just works when the big 2d array can be perfectly sliced into equally sized subarrays. The code below slices

a ->
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

into this:

block_array ->
array([[[ 0,  1,  2],
        [ 6,  7,  8]],
       [[ 3,  4,  5],
        [ 9, 10, 11]],
       [[12, 13, 14],
        [18, 19, 20]],
       [[15, 16, 17],
        [21, 22, 23]]])

p and q determine the block size.

Code:

import numpy as np

a = np.arange(24)
a = a.reshape((4, 6))

m = a.shape[0]  # image row size
n = a.shape[1]  # image column size

p = 2  # block row size
q = 3  # block column size

blocks_per_row = m // p
blocks_per_column = n // q

block_array = []
previous_row = 0
for row_block in range(blocks_per_row):
    previous_row = row_block * p
    previous_column = 0
    for column_block in range(blocks_per_column):
        previous_column = column_block * q
        block = a[previous_row:previous_row+p, previous_column:previous_column+q]
        block_array.append(block)

block_array = np.array(block_array)
If you want a solution that also handles the case when the matrix is not equally divided, you can use this (in Python 3, reduce lives in functools):

from functools import reduce
from operator import add

half_split = np.array_split(input, 2)
res = map(lambda x: np.array_split(x, 2, axis=1), half_split)
res = reduce(add, res)
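A quick sketch of what this two-level np.array_split approach produces on an unevenly divisible 3x3 matrix (mat is a made-up example; functools.reduce is needed in Python 3):

```python
import numpy as np
from functools import reduce
from operator import add

mat = np.arange(9).reshape(3, 3)
half_split = np.array_split(mat, 2)                               # 2 row strips
quarters = reduce(add, [np.array_split(x, 2, axis=1) for x in half_split])
# four blocks of shapes (2, 2), (2, 1), (1, 2), (1, 1)
```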
Here is a solution based on unutbu's answer that handles the case where the matrix cannot be equally divided. In this case, it will resize the matrix before using some interpolation. You need OpenCV for this. Note that I had to swap ncols and nrows to make it work; I didn't figure out why.

import math

import cv2
import numpy as np

def blockshaped(arr, r_nbrs, c_nbrs, interp=cv2.INTER_LINEAR):
    """
    arr     a 2D array, typically an image
    r_nbrs  number of rows
    c_nbrs  number of cols
    """
    arr_h, arr_w = arr.shape

    size_w = int(math.floor(arr_w // c_nbrs) * c_nbrs)
    size_h = int(math.floor(arr_h // r_nbrs) * r_nbrs)

    if size_w != arr_w or size_h != arr_h:
        arr = cv2.resize(arr, (size_w, size_h), interpolation=interp)

    nrows = int(size_w // r_nbrs)
    ncols = int(size_h // c_nbrs)

    return (arr.reshape(r_nbrs, ncols, -1, nrows)
               .swapaxes(1, 2)
               .reshape(-1, ncols, nrows))
a = np.random.randint(1, 9, size=(9, 9))
out = [np.hsplit(x, 3) for x in np.vsplit(a, 3)]
print(a)
print(out)

yields

[[7 6 2 4 4 2 5 2 3]
 [2 3 7 6 8 8 2 6 2]
 [4 1 3 1 3 8 1 3 7]
 [6 1 1 5 7 2 1 5 8]
 [8 8 7 6 6 1 8 8 4]
 [6 1 8 2 1 4 5 1 8]
 [7 3 4 2 5 6 1 2 7]
 [4 6 7 5 8 2 8 2 8]
 [6 6 5 5 6 1 2 6 4]]

[[array([[7, 6, 2],
         [2, 3, 7],
         [4, 1, 3]]),
  array([[4, 4, 2],
         [6, 8, 8],
         [1, 3, 8]]),
  array([[5, 2, 3],
         [2, 6, 2],
         [1, 3, 7]])],
 [array([[6, 1, 1],
         [8, 8, 7],
         [6, 1, 8]]),
  array([[5, 7, 2],
         [6, 6, 1],
         [2, 1, 4]]),
  array([[1, 5, 8],
         [8, 8, 4],
         [5, 1, 8]])],
 [array([[7, 3, 4],
         [4, 6, 7],
         [6, 6, 5]]),
  array([[2, 5, 6],
         [5, 8, 2],
         [5, 6, 1]]),
  array([[1, 2, 7],
         [8, 2, 8],
         [2, 6, 4]])]]
I publish my solution. Notice that this code doesn't actually create copies of the original array, so it works well with big data. Moreover, it doesn't crash if the array cannot be divided evenly (but you can easily add a condition for that by deleting ceil and checking whether v_slices and h_slices divide without remainder).

import numpy as np
from math import ceil

a = np.arange(9).reshape(3, 3)

p, q = 2, 2
width, height = a.shape

v_slices = ceil(width / p)
h_slices = ceil(height / q)

for h in range(h_slices):
    for v in range(v_slices):
        block = a[h * p : h * p + p, v * q : v * q + q]
        # do something with a block

This code changes (or, more precisely, gives you direct access to part of an array) this:

[[0 1 2]
 [3 4 5]
 [6 7 8]]

into this:

[[0 1]
 [3 4]]
[[2]
 [5]]
[[6 7]]
[[8]]

If you need actual copies, Aenaon's code is what you are looking for. If you are sure that the big array can be divided evenly, you can use the numpy splitting tools.
To add to @Aenaon's answer and his blockfy function: if you are working with color images / 3D arrays, here is my pipeline to create crops of 224 x 224 for 3-channel input:

import os

import numpy as np
from PIL import Image
from skimage import io

def blockfy(a, p, q):
    '''
    Divides array a into subarrays of size p-by-q
    p: block row size
    q: block column size
    '''
    m = a.shape[0]  # image row size
    n = a.shape[1]  # image column size

    # pad array with NaNs so it can be divided by p row-wise and by q column-wise
    bpr = ((m-1)//p + 1)  # blocks per row
    bpc = ((n-1)//q + 1)  # blocks per column
    M = p * bpr
    N = q * bpc

    A = np.nan * np.ones([M, N])
    A[:a.shape[0], :a.shape[1]] = a

    block_list = []
    previous_row = 0
    for row_block in range(bpr):
        previous_row = row_block * p
        previous_column = 0
        for column_block in range(bpc):
            previous_column = column_block * q
            block = A[previous_row:previous_row+p, previous_column:previous_column+q]

            # remove nan columns and nan rows
            nan_cols = np.all(np.isnan(block), axis=0)
            block = block[:, ~nan_cols]
            nan_rows = np.all(np.isnan(block), axis=1)
            block = block[~nan_rows, :]

            # append
            if block.size:
                block_list.append(block)

    return block_list

then extended above to

for file in os.listdir(path_to_crop):  # list files in your folder
    img = io.imread(path_to_crop + file, as_gray=False)  # open image
    r = blockfy(img[:, :, 0], 224, 224)  # crop blocks of 224 x 224 for red channel
    g = blockfy(img[:, :, 1], 224, 224)  # crop blocks of 224 x 224 for green channel
    b = blockfy(img[:, :, 2], 224, 224)  # crop blocks of 224 x 224 for blue channel

    for x in range(len(r)):
        img = np.array((r[x], g[x], b[x]))  # combine each channel into one patch
        img = img.astype(np.uint8)  # cast back to proper integers
        img_swap = img.swapaxes(0, 2)  # need to swap axes due to the way things were processed
        img_swap_2 = img_swap.swapaxes(0, 1)  # do it again
        Image.fromarray(img_swap_2).save(
            path_save_crop + str(x) + "bounding" + file,
            format='jpeg', subsampling=0, quality=100)  # save patch with new name etc