split a numpy array both horizontally and vertically - python

What is the most pythonic way of splitting a NumPy matrix (a 2-D array) into equal chunks both vertically and horizontally?
For example:
aa = np.reshape(np.arange(270), (18, 15)) # an 18x15 matrix
then a "function" like
ab = np.split2d(aa, (2, 3))
would result in a list of 6 matrices shaped (9, 5) each. The first guess is to combine hsplit, map and vsplit, but how does the map have to be applied if there are two parameters to define for it, like:
map(np.vsplit(#, 3), np.hsplit(aa, 2))
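For reference, the hsplit/vsplit combination hinted at above can be written as a nested comprehension instead of map — a minimal sketch (variable names are illustrative):

```python
import numpy as np

aa = np.reshape(np.arange(270), (18, 15))

# Split into 2 row bands, then split each band into 3 column blocks
blocks = [b for band in np.vsplit(aa, 2) for b in np.hsplit(band, 3)]
# 6 blocks, each shaped (9, 5)
```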

Here's one approach staying within the NumPy environment -
def view_as_blocks(arr, BSZ):
    # arr is the input array, BSZ is the block size
    m, n = arr.shape
    M, N = BSZ
    return arr.reshape(m//M, M, n//N, N).swapaxes(1, 2).reshape(-1, M, N)
Sample runs
1) Actual big case to verify shapes :
In [41]: aa = np.reshape(np.arange(270),(18,15))
In [42]: view_as_blocks(aa, (9,5)).shape
Out[42]: (6, 9, 5)
2) Small case to manually verify values:
In [43]: aa = np.reshape(np.arange(36),(6,6))
In [44]: aa
Out[44]:
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
In [45]: view_as_blocks(aa, (2,3)) # Blocks of shape (2,3)
Out[45]:
array([[[ 0,  1,  2],
        [ 6,  7,  8]],

       [[ 3,  4,  5],
        [ 9, 10, 11]],

       [[12, 13, 14],
        [18, 19, 20]],

       [[15, 16, 17],
        [21, 22, 23]],

       [[24, 25, 26],
        [30, 31, 32]],

       [[27, 28, 29],
        [33, 34, 35]]])
If you are willing to work with other libraries, scikit-image could be of use here, like so -
from skimage.util import view_as_blocks as viewB
out = viewB(aa, tuple(BSZ)).reshape(-1, *BSZ)  # BSZ is the block-size tuple, e.g. (9, 5)
Runtime test -
In [103]: aa = np.reshape(np.arange(270),(18,15))
# #EFT's soln
In [99]: %timeit split_2d(aa, (2,3))
10000 loops, best of 3: 23.3 µs per loop
# #glegoux's soln-1
In [100]: %timeit list(get_chunks(aa, 2,3))
100000 loops, best of 3: 3.7 µs per loop
# #glegoux's soln-2
In [111]: %timeit list(get_chunks2(aa, 9, 5))
100000 loops, best of 3: 3.39 µs per loop
# Proposed in this post
In [101]: %timeit view_as_blocks(aa, (9,5))
1000000 loops, best of 3: 1.86 µs per loop
Please note that I have used (2,3) for split_2d and get_chunks because, by their definitions, they take the number of blocks. In my case with view_as_blocks, the parameter BSZ indicates the block size, so I have (9,5) there. get_chunks2 follows the same format as view_as_blocks. The outputs represent the same blocks in each case.
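As a sanity check on the reshape/swapaxes trick, the blocks can be reassembled by reversing the same two steps — a sketch reusing the function from above:

```python
import numpy as np

def view_as_blocks(arr, BSZ):
    # arr is the input array, BSZ is the block size
    m, n = arr.shape
    M, N = BSZ
    return arr.reshape(m//M, M, n//N, N).swapaxes(1, 2).reshape(-1, M, N)

aa = np.reshape(np.arange(270), (18, 15))
blocks = view_as_blocks(aa, (9, 5))  # shape (6, 9, 5)

# Reverse: restore the (row-blocks, col-blocks) grid, undo the swap, flatten
restored = blocks.reshape(2, 3, 9, 5).swapaxes(1, 2).reshape(18, 15)
assert (restored == aa).all()
```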

You could use np.split & np.concatenate, the latter to allow the second split to be conducted in a single step:
def split_2d(array, splits):
    x, y = splits
    return np.split(np.concatenate(np.split(array, y, axis=1)), x*y)
ab = split_2d(aa,(2,3))
ab[0].shape
Out[95]: (9, 5)
len(ab)
Out[96]: 6
This also seems like it should be relatively straightforward to generalize to the n-dim case, though I haven't followed that thought all the way through just yet.
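One possible n-dim generalization, as a hedged sketch (not the concatenate trick above — a recursive variant that applies np.split once per axis; split_nd is a made-up name):

```python
import numpy as np

def split_nd(arr, splits):
    # splits[i] = number of equal chunks along axis i; recurse axis by axis
    if not splits:
        return [arr]
    axis = arr.ndim - len(splits)
    out = []
    for piece in np.split(arr, splits[0], axis=axis):
        out.extend(split_nd(piece, splits[1:]))
    return out

aa = np.reshape(np.arange(270), (18, 15))
blocks = split_nd(aa, (2, 3))  # 6 blocks of shape (9, 5)
```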
Edit:
For a single array as output, just add np.stack:
np.stack(ab).shape
Out[99]: (6, 9, 5)

To cut this (18,15) matrix:
+---------------+
|               |
|               |
+---------------+
into 2x3 blocks of shape (9,5), like this:
+-----+-----+-----+
|     |     |     |
+-----+-----+-----+
|     |     |     |
+-----+-----+-----+
do:
from pprint import pprint
import numpy as np

M = np.reshape(np.arange(18*15), (18, 15))

def get_chunks(M, n, p):
    # n, p = number of blocks per axis; convert to block sizes
    n = len(M)//n
    p = len(M[0])//p
    for i in range(0, len(M), n):
        for j in range(0, len(M[0]), p):
            yield M[i:i+n, j:j+p]

def get_chunks2(M, n, p):
    # here n, p are the block sizes directly
    for i in range(0, len(M), n):
        for j in range(0, len(M[0]), p):
            yield M[i:i+n, j:j+p]

# list(get_chunks2(M, 9, 5)) gives the same result, faster
chunks = list(get_chunks(M, 2, 3))
pprint(chunks)
Output:
[array([[  0,   1,   2,   3,   4],
        [ 15,  16,  17,  18,  19],
        [ 30,  31,  32,  33,  34],
        [ 45,  46,  47,  48,  49],
        [ 60,  61,  62,  63,  64],
        [ 75,  76,  77,  78,  79],
        [ 90,  91,  92,  93,  94],
        [105, 106, 107, 108, 109],
        [120, 121, 122, 123, 124]]),
 array([[  5,   6,   7,   8,   9],
        [ 20,  21,  22,  23,  24],
        [ 35,  36,  37,  38,  39],
        [ 50,  51,  52,  53,  54],
        [ 65,  66,  67,  68,  69],
        [ 80,  81,  82,  83,  84],
        [ 95,  96,  97,  98,  99],
        [110, 111, 112, 113, 114],
        [125, 126, 127, 128, 129]]),
 array([[ 10,  11,  12,  13,  14],
        [ 25,  26,  27,  28,  29],
        [ 40,  41,  42,  43,  44],
        [ 55,  56,  57,  58,  59],
        [ 70,  71,  72,  73,  74],
        [ 85,  86,  87,  88,  89],
        [100, 101, 102, 103, 104],
        [115, 116, 117, 118, 119],
        [130, 131, 132, 133, 134]]),
 array([[135, 136, 137, 138, 139],
        [150, 151, 152, 153, 154],
        [165, 166, 167, 168, 169],
        [180, 181, 182, 183, 184],
        [195, 196, 197, 198, 199],
        [210, 211, 212, 213, 214],
        [225, 226, 227, 228, 229],
        [240, 241, 242, 243, 244],
        [255, 256, 257, 258, 259]]),
 array([[140, 141, 142, 143, 144],
        [155, 156, 157, 158, 159],
        [170, 171, 172, 173, 174],
        [185, 186, 187, 188, 189],
        [200, 201, 202, 203, 204],
        [215, 216, 217, 218, 219],
        [230, 231, 232, 233, 234],
        [245, 246, 247, 248, 249],
        [260, 261, 262, 263, 264]]),
 array([[145, 146, 147, 148, 149],
        [160, 161, 162, 163, 164],
        [175, 176, 177, 178, 179],
        [190, 191, 192, 193, 194],
        [205, 206, 207, 208, 209],
        [220, 221, 222, 223, 224],
        [235, 236, 237, 238, 239],
        [250, 251, 252, 253, 254],
        [265, 266, 267, 268, 269]])]

For a simpler solution, I used np.array_split together with transposing the matrices. So let's say I want the matrix split into 2 equal chunks vertically and 3 equal chunks horizontally; then:
# Create your matrix
matrix = np.reshape(np.arange(270), (18, 15)) # an 18x15 matrix
# Container for your final matrices
final_matrices = []
# First split into 2 equal chunks vertically
vertically_split_matrices = np.array_split(matrix, 2)
for v_m in vertically_split_matrices:
    # Then split the transposed chunks into 3 equal parts
    m1, m2, m3 = np.array_split(v_m.T, 3)
    # And transpose the matrices back
    final_matrices.append(m1.T)
    final_matrices.append(m2.T)
    final_matrices.append(m3.T)
So I end up with 6 chunks of shape (9, 5), all of the same height and the same width.

Related

Advanced 3d numpy array slicing with alternation

So, I want to slice my 3d array to skip the first 2 arrays and then return the next two arrays. And I want the slice to keep following this pattern: alternately skipping 2 and taking 2 arrays, etc. I have found a solution, but I was wondering if there is a more elegant way to go about this? Preferably without having to reshape?
arr = np.arange(1, 251).reshape((10, 5, 5))
sliced_array = np.concatenate((arr[2::4], arr[3::4]), axis=1).ravel().reshape((4, 5, 5))
You can use boolean indexing using a mask that repeats [False, False, True, True, ...]:
import numpy as np
arr = np.arange(1, 251).reshape((10, 5, 5))
mask = np.arange(arr.shape[0]) % 4 >= 2
out = arr[mask]
out:
array([[[ 51,  52,  53,  54,  55],
        [ 56,  57,  58,  59,  60],
        [ 61,  62,  63,  64,  65],
        [ 66,  67,  68,  69,  70],
        [ 71,  72,  73,  74,  75]],

       [[ 76,  77,  78,  79,  80],
        [ 81,  82,  83,  84,  85],
        [ 86,  87,  88,  89,  90],
        [ 91,  92,  93,  94,  95],
        [ 96,  97,  98,  99, 100]],

       [[151, 152, 153, 154, 155],
        [156, 157, 158, 159, 160],
        [161, 162, 163, 164, 165],
        [166, 167, 168, 169, 170],
        [171, 172, 173, 174, 175]],

       [[176, 177, 178, 179, 180],
        [181, 182, 183, 184, 185],
        [186, 187, 188, 189, 190],
        [191, 192, 193, 194, 195],
        [196, 197, 198, 199, 200]]])
Since you want to select, and skip, the same numbers, reshaping works.
For a 1d array:
In [97]: np.arange(10).reshape(5,2)[1::2]
Out[97]:
array([[2, 3],
       [6, 7]])
which can then be ravelled.
Generalizing to more dimensions:
In [98]: x = np.arange(100).reshape(10,10)
In [99]: x.reshape(5,2,10)[1::2,...].reshape(-1,10)
Out[99]:
array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
I won't go on to 3d because the display would be longer, but it should be straightforward.
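For the 3d case in the question, the same trick would look like this (a sketch; it selects the same slabs as the boolean-mask answer above):

```python
import numpy as np

arr = np.arange(1, 251).reshape((10, 5, 5))

# Group the first axis into pairs, keep every second pair, flatten back
out = arr.reshape(5, 2, 5, 5)[1::2].reshape(-1, 5, 5)  # shape (4, 5, 5)
```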

Efficient way to pick first 'n' non-repeating elements in every row of a 2d numpy array

I have a 2d numpy array of integers and I want to pick the first 5 unique elements in every row.
a = np.array([[193, 64, 64, 139, 180, 180, 104, 152, 69, 22, 192, 92],
[ 1, 36, 156, 152, 152, 37, 46, 143, 141, 114, 25, 134],
[110, 96, 52, 53, 35, 147, 3, 116, 20, 11, 137, 5]])
Notice the repeating elements in the first and second rows. The repeating elements appear next to each other. The output should be
array([[193, 64, 139, 180, 104], [1, 36, 156, 152, 37], [110, 96, 52, 53, 35]])
This is a sample array and the actual array has 20,000 rows. I'm looking for an efficient way to do this without the use of loops. Thanks in advance.
Update
To get rid of the for loop (which I had used because the program is more efficient with a break statement in place), you can use the itertools.takewhile() method to act as a break statement within the list comprehension. I timed two versions of the code, one with takewhile() and one without; the former turned out faster:
import numpy as np
from itertools import groupby, takewhile
a = np.array([[193, 64, 64, 139, 180, 180, 104, 152, 69, 22, 192, 92],
[ 1, 36, 156, 152, 152, 37, 46, 143, 141, 114, 25, 134],
[110, 96, 52, 53, 35, 147, 3, 116, 20, 11, 137, 5]])
result = [[k[0] for i, k in takewhile(lambda x: x[0] != 5, enumerate(groupby(row)))] for row in a]
print(np.array(result))
Output:
[[193  64 139 180 104]
 [  1  36 156 152  37]
 [110  96  52  53  35]]
(Using for loops)
You can try using the built-in enumerate() function along with the itertools.groupby() method:
import numpy as np
from itertools import groupby
a = np.array([[193, 64, 64, 139, 180, 180, 104, 152, 69, 22, 192, 92],
[ 1, 36, 156, 152, 152, 37, 46, 143, 141, 114, 25, 134],
[110, 96, 52, 53, 35, 147, 3, 116, 20, 11, 137, 5]])
def get_unique(a, amt):
    for row in a:
        r = []
        for i, k in enumerate(groupby(row)):
            if i == amt:
                break
            r.append(k[0])
        yield r

for row in get_unique(a, 5):
    print(row)
Output:
[193, 64, 139, 180, 104]
[1, 36, 156, 152, 37]
[110, 96, 52, 53, 35]
Omitting the function:
import numpy as np
from itertools import groupby
a = np.array([[193, 64, 64, 139, 180, 180, 104, 152, 69, 22, 192, 92],
[ 1, 36, 156, 152, 152, 37, 46, 143, 141, 114, 25, 134],
[110, 96, 52, 53, 35, 147, 3, 116, 20, 11, 137, 5]])
result = []
for row in a:
    r = []
    for i, k in enumerate(groupby(row)):
        if i == 5:
            break
        r.append(k[0])
    result.append(r)
print(np.array(result))
Output:
[[193  64 139 180 104]
 [  1  36 156 152  37]
 [110  96  52  53  35]]
Try with groupby:
from itertools import groupby
>>> np.array([np.array([k for k, g in groupby(row)])[:5] for row in a])
array([[193,  64, 139, 180, 104],
       [  1,  36, 156, 152,  37],
       [110,  96,  52,  53,  35]])
Using NumPy alone, you can vectorize the unique function along each row, but the per-row results then need to be padded to a common length while preserving the original order. Then just take the first 5 columns of the result:
>>> a = np.array([[193, 64, 64, 139, 180, 180, 104, 152, 69, 22, 192, 92], [ 1, 36, 156, 152, 152, 37, 46, 143, 141, 114, 25, 134], [110, 96, 52, 53, 35, 147, 3, 116, 20, 11, 137, 5]])
>>> np.apply_along_axis(lambda x: np.pad(u := x[np.sort(np.unique(x, return_index=1)[1])], (0, a[0].size - u.size)), 1, a)[:, :5]
array([[193,  64, 139, 180, 104],
       [  1,  36, 156, 152,  37],
       [110,  96,  52,  53,  35]])
Try numpy.apply_along_axis + itertools.groupby + itertools.islice:
import numpy as np
from itertools import groupby, islice
a = np.array([[193, 64, 64, 139, 180, 180, 104, 152, 69, 22, 192, 92],
[1, 36, 156, 152, 152, 37, 46, 143, 141, 114, 25, 134],
[110, 96, 52, 53, 35, 147, 3, 116, 20, 11, 137, 5]])
first_5_unique = lambda x: [k for k, _ in islice(groupby(x), 5)]
res = np.apply_along_axis(first_5_unique, axis=1, arr=a)
print(res)
Output
[[193  64 139 180 104]
 [  1  36 156 152  37]
 [110  96  52  53  35]]
Or, a NumPy-only approach using numpy.argpartition and numpy.argsort:
def first_k_unique(arr, k, axis=1):
    # Mark the last position of each run of equal values (where the next value
    # differs); earlier runs get more negative scores, so argpartition keeps
    # the first k. The weights are hard-coded for the 12-column input above.
    val = (np.diff(arr) != 0) * np.arange(start=10, stop=-1, step=-1) * -1
    ind = np.argpartition(val, k, axis=axis)[:, :k]
    res = np.take_along_axis(arr, indices=ind, axis=axis)
    # Reorder the k picks back into first-occurrence order
    return np.take_along_axis(res, np.take_along_axis(val, indices=ind, axis=axis).argsort(axis), axis)

print(first_k_unique(a, 5))
Output
[[193  64 139 180 104]
 [  1  36 156 152  37]
 [110  96  52  53  35]]
The core explanation of the NumPy-only solution can be found here.

How to convert different numpy arrays to sets?

I have one numpy array that looks like this:
array([  0,   1,   2,   6,   8,   9,  10,  11,  12,  13,  14,  15,  16,
        18,  19,  20,  22,  27,  28,  29,  32,  33,  34,  36,  37,  38,
        39,  42,  43,  44,  45,  47,  48,  51,  52,  54,  55,  56,  60,
        65,  66,  67,  68,  69,  70,  71,  73,  74,  75,  77,  78,  80,
        81,  82,  83,  84,  85,  86,  87,  88,  89,  92,  94,  95,  97,
        98, 100, 101, 102, 105, 106, 108, 109, 113, 114, 117, 118, 119,
       121, 123, 124, 126, 127, 128, 129, 131, 132, 133, 134, 135, 137,
       138, 141, 142, 143, 144, 145, 147, 148, 149, 152, 154, 156, 157,
       159, 160, 161, 163, 165, 166, 167, 168, 169, 170, 172, 176, 177,
       179, 180, 182, 183, 185, 186, 187, 188, 191, 192, 194, 196, 197,
       199, 200, 201, 202, 204, 205, 206, 207, 208])
I'm able to convert this to a set using set() no problem
However, I have another numpy array that looks like:
array([[  2],
       [  4],
       [ 10],
       [ 10],
       [ 12],
       [ 13],
       [ 14],
       [ 16],
       [ 19],
       [ 21],
       [ 21],
       [ 22],
       [ 29],
       [209]])
When I try to use set() this gives me an error: TypeError: unhashable type: 'numpy.ndarray'
How can I convert my second numpy array to look like the first array and so I will be able to use set()?
For reference my second array is converted from a PySpark dataframe column using:
np.array(data2.select('row_num').collect())
And both arrays are used with set() in:
count = sorted(set(range(data1)) - set(np.array(data2.select('row_num').collect())))
As mentioned, use ravel to return a contiguous flattened array.
import numpy as np
arr = np.array(
[[2], [4], [10], [10], [12], [13], [14], [16], [19], [21], [21], [22], [29], [209]]
)
print(set(arr.ravel()))
Outputs:
{2, 4, 10, 12, 13, 14, 16, 209, 19, 21, 22, 29}
This is somewhat equivalent to doing a reshape with a single dimension being the array size:
print(set(arr.reshape(arr.size)))
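If the final goal is the set difference shown in the question, NumPy can also compute it directly without round-tripping through Python sets — a sketch using np.setdiff1d (the range bound 30 is made up for illustration):

```python
import numpy as np

arr = np.array([[2], [4], [10], [10], [12], [13], [14], [16],
                [19], [21], [21], [22], [29], [209]])

# setdiff1d flattens and deduplicates its inputs itself
missing = np.setdiff1d(np.arange(30), arr)  # row numbers not present in arr
```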

How to do mapping of 3d arrays

I have a numpy array of shape (20,512,512).
I want to get a list of arrays of length 20 matching the array at points (:,100,100) (:,105,100), (:,110,100)
and so on.
I understand i need to use map for it, but how do I do it exactly?
many thanks
Yuval
You could achieve this by index slicing.
Here's an example for a 1D array:
a = np.arange(200)
a[100::5] # from index 100 to the end, in increments of 5
>>> array([100, 105, 110, 115, 120, 125, 130, 135, 140, 145, 150, 155, 160,
       165, 170, 175, 180, 185, 190, 195])
For your case:
shape = [20,512,512]
arr = np.random.randint(0, 100, shape) # creating an array with the required shape
arr[:, 100::5, 100] # index slicing
>>> array([[88, 74, 45, ..., 72, 33, 63],
       [88, 26, 53, ..., 47, 78, 16],
       [26, 54, 85, ..., 89, 81, 66],
       ...,
       [76,  1, 11, ...,  3, 74,  7],
       [93, 34, 84, ..., 84, 73, 79],
       [77, 10, 61, ...,  4, 21, 19]])
arr[:, 100::5, 100].shape
>>> (20, 83)
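If an actual Python list of the length-20 vectors is wanted, as the question phrases it, the slice above can be transposed and unpacked — a sketch:

```python
import numpy as np

arr = np.random.randint(0, 100, (20, 512, 512))

# Each column of the (20, 83) slice is one length-20 vector at (:, 100+5*k, 100)
vectors = list(arr[:, 100::5, 100].T)  # 83 arrays, each of shape (20,)
```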

Reshaping a 1D bytes object into a 3D numpy array

I'm using FFmpeg to decode a video, and am piping the RGB24 raw data into python.
So the format of the binary data is:
RGBRGBRGBRGB...
I need to convert this into a (640, 360, 3) numpy array, and was wondering if I could use reshape for this and, especially, how.
If rgb is a bytearray with 3 * 360 * 640 bytes, all you need is:
np.array(rgb).reshape(640, 360, 3)
As an example:
>>> import random
>>> import numpy as np
>>> bytearray(random.getrandbits(8) for _ in range(3 * 4 * 4))
bytearray(b'{)jg\xba\xbe&\xd1\xb9\xdd\xf9#\xadL?GV\xca\x19\xfb\xbd\xad\xc2C\xa8,+\x8aEGpo\x04\x89=e\xc3\xef\x17H#\x90]\xd5^\x94~/')
>>> rgb = bytearray(random.getrandbits(8) for _ in range(3 * 4 * 4))
>>> np.array(rgb)
array([112,  68,   7,  41, 175, 109, 124, 111, 116,   6, 124, 168, 146,
        60, 125, 133,   1,  74, 251, 194,  79,  14,  72, 236, 188,  56,
        52, 145, 125, 236,  86, 108, 235,   9, 215,  49, 190,  16,  90,
         9, 114,  43, 214,  65, 132, 128, 145, 214], dtype=uint8)
>>> np.array(rgb).reshape(4,4,3)
array([[[112,  68,   7],
        [ 41, 175, 109],
        [124, 111, 116],
        [  6, 124, 168]],

       [[146,  60, 125],
        [133,   1,  74],
        [251, 194,  79],
        [ 14,  72, 236]],

       [[188,  56,  52],
        [145, 125, 236],
        [ 86, 108, 235],
        [  9, 215,  49]],

       [[190,  16,  90],
        [  9, 114,  43],
        [214,  65, 132],
        [128, 145, 214]]], dtype=uint8)
You might want to look at existing numpy and scipy methods for image processing. misc.imread could be interesting.
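A side note on the conversion itself: for a raw bytes/bytearray buffer, np.frombuffer avoids the element-by-element copy of np.array and makes the dtype explicit — a sketch (the frame below is synthetic, and it assumes a 360-row, 640-column layout; swap the first two axes if your frames are stored the other way):

```python
import numpy as np

# Synthetic stand-in for the piped RGB24 frame (3 * 360 * 640 bytes)
rgb = bytes(range(256)) * (3 * 360 * 640 // 256)

frame = np.frombuffer(rgb, dtype=np.uint8).reshape(360, 640, 3)
```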
