Reshaping a 1D bytes object into a 3D numpy array - python

I'm using FFmpeg to decode a video, and am piping the RGB24 raw data into python.
So the format of the binary data is:
RGBRGBRGBRGB...
I need to convert this into a (640, 360, 3) numpy array, and was wondering if I could use reshape for this and, especially, how.

If rgb is a bytearray with 3 * 360 * 640 bytes, all you need is:
np.array(rgb).reshape(640, 360, 3)
As an example:
>>> import random
>>> import numpy as np
>>> bytearray(random.getrandbits(8) for _ in range(3 * 4 * 4))
bytearray(b'{)jg\xba\xbe&\xd1\xb9\xdd\xf9#\xadL?GV\xca\x19\xfb\xbd\xad\xc2C\xa8,+\x8aEGpo\x04\x89=e\xc3\xef\x17H#\x90]\xd5^\x94~/')
>>> rgb = bytearray(random.getrandbits(8) for _ in range(3 * 4 * 4))
>>> np.array(rgb)
array([112, 68, 7, 41, 175, 109, 124, 111, 116, 6, 124, 168, 146,
60, 125, 133, 1, 74, 251, 194, 79, 14, 72, 236, 188, 56,
52, 145, 125, 236, 86, 108, 235, 9, 215, 49, 190, 16, 90,
9, 114, 43, 214, 65, 132, 128, 145, 214], dtype=uint8)
>>> np.array(rgb).reshape(4,4,3)
array([[[112, 68, 7],
[ 41, 175, 109],
[124, 111, 116],
[ 6, 124, 168]],
[[146, 60, 125],
[133, 1, 74],
[251, 194, 79],
[ 14, 72, 236]],
[[188, 56, 52],
[145, 125, 236],
[ 86, 108, 235],
[ 9, 215, 49]],
[[190, 16, 90],
[ 9, 114, 43],
[214, 65, 132],
[128, 145, 214]]], dtype=uint8)
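You might also want to look at existing NumPy and SciPy-ecosystem tools for image processing. Note that scipy.misc.imread was deprecated in SciPy 1.0 and removed in SciPy 1.2; imageio.imread is the usual replacement for reading image files.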
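If the data arrives as an immutable bytes object (for example from a pipe read) rather than a bytearray, np.frombuffer is one way to get a uint8 view without copying. The sketch below is not from the original answer: it assumes a frame that is 640 pixels wide and 360 pixels tall, and it uses placeholder bytes instead of real FFmpeg output. Because packed RGB24 is stored row by row, the natural shape under that assumption is (height, width, 3):

```python
import numpy as np

# Assumed frame geometry (hypothetical; match it to your FFmpeg output size).
WIDTH, HEIGHT = 640, 360

# Placeholder for one raw RGB24 frame read from the pipe: R, G, B, R, G, B, ...
# Here byte i simply holds the value i % 256 so the layout is easy to inspect.
raw = bytes(range(256)) * (WIDTH * HEIGHT * 3 // 256)

# Interpret the buffer as unsigned bytes, then give it (rows, columns, channels).
frame = np.frombuffer(raw, dtype=np.uint8).reshape(HEIGHT, WIDTH, 3)

# frame[y, x] is the (R, G, B) triple of the pixel in row y, column x.
print(frame.shape)   # (360, 640, 3)
print(frame[0, 1])   # bytes 3, 4 and 5 of the stream
```

An array created from a bytes object this way is read-only; call frame.copy() if you need to modify the pixels.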

Related

Advanced 3d numpy array slicing with alternation

So, I want to slice my 3d array to skip the first 2 arrays and then return the next two arrays. And I want the slice to keep following this pattern, alternating between skipping 2 and giving 2 arrays, and so on. I have found a solution, but I was wondering if there is a more elegant way to go about this? Preferably without having to reshape?
arr = np.arange(1, 251).reshape((10, 5, 5))
sliced_array = np.concatenate((arr[2::4], arr[3::4]), axis=1).ravel().reshape((4, 5, 5))
You can use boolean indexing using a mask that repeats [False, False, True, True, ...]:
import numpy as np
arr = np.arange(1, 251).reshape((10, 5, 5))
mask = np.arange(arr.shape[0]) % 4 >= 2
out = arr[mask]
out:
array([[[ 51, 52, 53, 54, 55],
[ 56, 57, 58, 59, 60],
[ 61, 62, 63, 64, 65],
[ 66, 67, 68, 69, 70],
[ 71, 72, 73, 74, 75]],
[[ 76, 77, 78, 79, 80],
[ 81, 82, 83, 84, 85],
[ 86, 87, 88, 89, 90],
[ 91, 92, 93, 94, 95],
[ 96, 97, 98, 99, 100]],
[[151, 152, 153, 154, 155],
[156, 157, 158, 159, 160],
[161, 162, 163, 164, 165],
[166, 167, 168, 169, 170],
[171, 172, 173, 174, 175]],
[[176, 177, 178, 179, 180],
[181, 182, 183, 184, 185],
[186, 187, 188, 189, 190],
[191, 192, 193, 194, 195],
[196, 197, 198, 199, 200]]])
Since you want to select, and skip, the same numbers, reshaping works.
For a 1d array:
In [97]: np.arange(10).reshape(5,2)[1::2]
Out[97]:
array([[2, 3],
[6, 7]])
which can then be ravelled.
Generalizing to more dimensions:
In [98]: x = np.arange(100).reshape(10,10)
In [99]: x.reshape(5,2,10)[1::2,...].reshape(-1,10)
Out[99]:
array([[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]])
I won't go on to 3d because the display will be longer, but it should be straightforward.
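For completeness, here is a minimal sketch of the same reshape idea applied to the 3d array from the question. It assumes the length of the first axis is divisible by the group size (2 here), and it compares the result against the boolean-mask approach:

```python
import numpy as np

arr = np.arange(1, 251).reshape(10, 5, 5)

# Group the first axis into 5 pairs, keep every second pair,
# then merge the pair axis back into the first axis.
out = arr.reshape(5, 2, 5, 5)[1::2].reshape(-1, 5, 5)

# Reference result: skip 2, take 2, skip 2, take 2, ...
mask = np.arange(arr.shape[0]) % 4 >= 2

print(out.shape)                       # (4, 5, 5)
print(np.array_equal(out, arr[mask]))  # True
```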

How to convert different numpy arrays to sets?

I have one numpy array that looks like this:
array([ 0, 1, 2, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16,
18, 19, 20, 22, 27, 28, 29, 32, 33, 34, 36, 37, 38,
39, 42, 43, 44, 45, 47, 48, 51, 52, 54, 55, 56, 60,
65, 66, 67, 68, 69, 70, 71, 73, 74, 75, 77, 78, 80,
81, 82, 83, 84, 85, 86, 87, 88, 89, 92, 94, 95, 97,
98, 100, 101, 102, 105, 106, 108, 109, 113, 114, 117, 118, 119,
121, 123, 124, 126, 127, 128, 129, 131, 132, 133, 134, 135, 137,
138, 141, 142, 143, 144, 145, 147, 148, 149, 152, 154, 156, 157,
159, 160, 161, 163, 165, 166, 167, 168, 169, 170, 172, 176, 177,
179, 180, 182, 183, 185, 186, 187, 188, 191, 192, 194, 196, 197,
199, 200, 201, 202, 204, 205, 206, 207, 208])
I'm able to convert this to a set using set() with no problem.
However, I have another numpy array that looks like:
array([[ 2],
[ 4],
[ 10],
[ 10],
[ 12],
[ 13],
[ 14],
[ 16],
[ 19],
[ 21],
[ 21],
[ 22],
[ 29],
[209]])
When I try to use set() this gives me an error: TypeError: unhashable type: 'numpy.ndarray'
How can I convert my second numpy array to look like the first array, so that I can use set() on it?
For reference my second array is converted from a PySpark dataframe column using:
np.array(data2.select('row_num').collect())
And both arrays are used with set() in:
count = sorted(set(range(data1)) - set(np.array(data2.select('row_num').collect())))
As mentioned, use ravel to return a contiguous flattened array.
import numpy as np
arr = np.array(
[[2], [4], [10], [10], [12], [13], [14], [16], [19], [21], [21], [22], [29], [209]]
)
print(set(arr.ravel()))
Outputs:
{2, 4, 10, 12, 13, 14, 16, 209, 19, 21, 22, 29}
This is somewhat equivalent to doing a reshape with a single dimension being the array size:
print(set(arr.reshape(arr.size)))
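Applied to the set difference from the question, the flattening step might look like the sketch below. The values and the name n_rows are made up for illustration; row_nums stands in for the column collected from the PySpark dataframe. Calling tolist() first turns the elements into plain Python ints, and np.setdiff1d is shown as a NumPy-only alternative:

```python
import numpy as np

n_rows = 10  # hypothetical total number of rows
# Column-shaped array, like the one built from the collected column.
row_nums = np.array([[2], [4], [4], [7]])

# Flatten to 1d, convert to Python ints, then use ordinary set arithmetic.
present = set(row_nums.ravel().tolist())
missing = sorted(set(range(n_rows)) - present)
print(missing)  # [0, 1, 3, 5, 6, 8, 9]

# NumPy-only variant: sorted unique values of the first array not in the second.
missing_np = np.setdiff1d(np.arange(n_rows), row_nums.ravel())
print(missing_np.tolist())  # [0, 1, 3, 5, 6, 8, 9]
```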

Combine numpy subarrays of varying dimensions

I have a nested numpy array (dtype=object). It contains 333 arrays that increase consistently from size 52x1 to size 52x333.
I would like to efficiently extract and concatenate these arrays so that I have a single 52x55611 array.
I imagine this may be straightforward, but my attempts using numpy.reshape have been unsuccessful.
If you want to stack them along the second axis, you can use numpy.hstack.
list_of_arrays = [ array_1, ..., array_n] #all these arrays have same shape[0]
big_array = np.hstack( list_of_arrays)
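If I have understood you correctly, you could use numpy.concatenate. Note that it joins along the first axis by default, so the example below stacks rows; pass axis=1 to join along the second axis as in the question.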
>>> import numpy as np
>>> a = np.array([range(52)])
>>> b = np.array([range(52,104), range(104, 156)])
>>> np.concatenate((a,b))
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51],
[ 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
[104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116,
117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129,
130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155]])
>>>
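To make the hstack suggestion concrete, here is a small sketch that builds a stand-in for the nested object array described in the question (333 arrays of shape 52x1 up to 52x333, filled with arbitrary placeholder values) and joins them along the second axis. The variable names are illustrative only:

```python
import numpy as np

# Stand-in for the nested object array: element i has shape (52, i + 1).
nested = np.empty(333, dtype=object)
for i in range(333):
    nested[i] = np.full((52, i + 1), i + 1, dtype=np.int16)

# hstack accepts a sequence of arrays that agree on shape[0].
big_array = np.hstack(list(nested))

print(big_array.shape)  # (52, 55611), since 1 + 2 + ... + 333 = 55611
```

np.concatenate(list(nested), axis=1) should give the same result for 2d inputs.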

split a numpy array both horizontally and vertically

What is the most pythonic way of splitting a NumPy matrix (a 2-D array) into equal chunks both vertically and horizontally?
For example:
aa = np.reshape(np.arange(270),(18,15)) # a 18x15 matrix
then a "function" like
ab = np.split2d(aa,(2,3))
would result in a list of 6 matrices shaped (9,5) each. My first guess is to combine hsplit, map and vsplit, but how should map be applied when there are two parameters to define for it, like:
map(np.vsplit(#,3),np.hsplit(aa,2))
Here's one approach staying within NumPy environment -
def view_as_blocks(arr, BSZ):
    # arr is input array, BSZ is block-size
    m,n = arr.shape
    M,N = BSZ
    return arr.reshape(m//M, M, n//N, N).swapaxes(1,2).reshape(-1,M,N)
Sample runs
1) Actual big case to verify shapes :
In [41]: aa = np.reshape(np.arange(270),(18,15))
In [42]: view_as_blocks(aa, (9,5)).shape
Out[42]: (6, 9, 5)
2) Small case to manually verify values:
In [43]: aa = np.reshape(np.arange(36),(6,6))
In [44]: aa
Out[44]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35]])
In [45]: view_as_blocks(aa, (2,3)) # Blocks of shape (2,3)
Out[45]:
array([[[ 0, 1, 2],
[ 6, 7, 8]],
[[ 3, 4, 5],
[ 9, 10, 11]],
[[12, 13, 14],
[18, 19, 20]],
[[15, 16, 17],
[21, 22, 23]],
[[24, 25, 26],
[30, 31, 32]],
[[27, 28, 29],
[33, 34, 35]]])
If you are willing to work with other libraries, scikit-image could be of use here, like so -
from skimage.util import view_as_blocks as viewB
out = viewB(aa, tuple(BSZ)).reshape(-1,*BSZ)
Runtime test -
In [103]: aa = np.reshape(np.arange(270),(18,15))
# @EFT's soln
In [99]: %timeit split_2d(aa, (2,3))
10000 loops, best of 3: 23.3 µs per loop
# @glegoux's soln-1
In [100]: %timeit list(get_chunks(aa, 2,3))
100000 loops, best of 3: 3.7 µs per loop
# @glegoux's soln-2
In [111]: %timeit list(get_chunks2(aa, 9, 5))
100000 loops, best of 3: 3.39 µs per loop
# Proposed in this post
In [101]: %timeit view_as_blocks(aa, (9,5))
1000000 loops, best of 3: 1.86 µs per loop
Please note that I have used (2,3) for split_2d and get_chunks as by their definitions, they are using that as the number of blocks. In my case with view_as_blocks, I have the parameter BSZ indicating the block size. So, I have (9,5) there. get_chunks2 follows the same format as view_as_blocks. The outputs should represent the same there.
You could use np.split & np.concatenate, the latter to allow the second split to be conducted in a single step:
def split_2d(array, splits):
    x, y = splits
    return np.split(np.concatenate(np.split(array, y, axis=1)), x*y)
ab = split_2d(aa,(2,3))
ab[0].shape
Out[95]: (9, 5)
len(ab)
Out[96]: 6
This also seems like it should be relatively straightforward to generalize to the n-dim case, though I haven't followed that thought all the way through just yet.
Edit:
For a single array as output, just add np.stack:
np.stack(ab).shape
Out[99]: (6, 9, 5)
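One detail worth checking when comparing these approaches is block order. The sketch below restates view_as_blocks and split_2d as given in the answers and suggests that the reshape version returns blocks row by row, while split_2d returns them one column strip at a time. The reordering step at the end is an addition for illustration, not part of either answer:

```python
import numpy as np

def view_as_blocks(arr, block_shape):
    # Blocks come out in row-major order: index = row_block * n_col_blocks + col_block.
    m, n = arr.shape
    M, N = block_shape
    return arr.reshape(m // M, M, n // N, N).swapaxes(1, 2).reshape(-1, M, N)

def split_2d(array, splits):
    # Splits into column strips first, stacks them, then cuts the stack into x*y pieces.
    x, y = splits
    return np.split(np.concatenate(np.split(array, y, axis=1)), x * y)

aa = np.arange(270).reshape(18, 15)

row_major = view_as_blocks(aa, (9, 5))        # block size (9, 5)
col_major = np.stack(split_2d(aa, (2, 3)))    # 2 x 3 grid of blocks

# Same blocks, different order: (strip, row_block) -> (row_block, strip).
reordered = col_major.reshape(3, 2, 9, 5).swapaxes(0, 1).reshape(-1, 9, 5)

print(np.array_equal(row_major, col_major))   # False
print(np.array_equal(row_major, reordered))   # True
```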
To cut this (18,15) matrix:
+-+-+-+
+ +
+-+-+-+
into a 2x3 grid of (9,5) blocks like this:
+-+-+-+
+-+-+-+
+-+-+-+
Do:
from pprint import pprint
import numpy as np
M = np.reshape(np.arange(18*15),(18,15))
def get_chunks(M, n, p):
    # n, p: number of blocks along rows and columns
    n = len(M)//n
    p = len(M[0])//p
    for i in range(0, len(M), n):
        for j in range(0, len(M[0]), p):
            yield M[i:i+n,j:j+p]
def get_chunks2(M, n, p):
    # n, p: block size in rows and columns
    for i in range(0, len(M), n):
        for j in range(0, len(M[0]), p):
            yield M[i:i+n,j:j+p]
# list(get_chunks2(M, 9, 5)) gives the same result, slightly faster
chunks = list(get_chunks(M, 2, 3))
pprint(chunks)
Output:
[array([[ 0, 1, 2, 3, 4],
[ 15, 16, 17, 18, 19],
[ 30, 31, 32, 33, 34],
[ 45, 46, 47, 48, 49],
[ 60, 61, 62, 63, 64],
[ 75, 76, 77, 78, 79],
[ 90, 91, 92, 93, 94],
[105, 106, 107, 108, 109],
[120, 121, 122, 123, 124]]),
array([[ 5, 6, 7, 8, 9],
[ 20, 21, 22, 23, 24],
[ 35, 36, 37, 38, 39],
[ 50, 51, 52, 53, 54],
[ 65, 66, 67, 68, 69],
[ 80, 81, 82, 83, 84],
[ 95, 96, 97, 98, 99],
[110, 111, 112, 113, 114],
[125, 126, 127, 128, 129]]),
array([[ 10, 11, 12, 13, 14],
[ 25, 26, 27, 28, 29],
[ 40, 41, 42, 43, 44],
[ 55, 56, 57, 58, 59],
[ 70, 71, 72, 73, 74],
[ 85, 86, 87, 88, 89],
[100, 101, 102, 103, 104],
[115, 116, 117, 118, 119],
[130, 131, 132, 133, 134]]),
array([[135, 136, 137, 138, 139],
[150, 151, 152, 153, 154],
[165, 166, 167, 168, 169],
[180, 181, 182, 183, 184],
[195, 196, 197, 198, 199],
[210, 211, 212, 213, 214],
[225, 226, 227, 228, 229],
[240, 241, 242, 243, 244],
[255, 256, 257, 258, 259]]),
array([[140, 141, 142, 143, 144],
[155, 156, 157, 158, 159],
[170, 171, 172, 173, 174],
[185, 186, 187, 188, 189],
[200, 201, 202, 203, 204],
[215, 216, 217, 218, 219],
[230, 231, 232, 233, 234],
[245, 246, 247, 248, 249],
[260, 261, 262, 263, 264]]),
array([[145, 146, 147, 148, 149],
[160, 161, 162, 163, 164],
[175, 176, 177, 178, 179],
[190, 191, 192, 193, 194],
[205, 206, 207, 208, 209],
[220, 221, 222, 223, 224],
[235, 236, 237, 238, 239],
[250, 251, 252, 253, 254],
[265, 266, 267, 268, 269]])]
For a simpler solution, I used np.array_split together with transposing the matrices. So let's say that I want it split into 3 equal chunks vertically and 2 chunks horizontally, then:
import numpy as np
# Create your matrix
matrix = np.reshape(np.arange(270),(18,15)) # a 18x15 matrix
# Container for your final matrices
final_matrices = []
# Then split into 3 equal chunks vertically
vertically_split_matrices = np.array_split(matrix, 3)
for v_m in vertically_split_matrices:
    # Then split the transposed matrices into 2 chunks
    m1, m2 = np.array_split(v_m.T, 2)
    # And transpose the matrices back
    final_matrices.append(m1.T)
    final_matrices.append(m2.T)
So I end up with 6 chunks, all of the same height. Because 15 columns do not divide evenly by 2, np.array_split makes them 8 and 7 columns wide; with an even column count the widths would match too.
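The same idea can be written without transposing by passing axis=1 to np.array_split. The helper below is a sketch, and split_grid is a hypothetical name rather than a NumPy function. It also shows what np.array_split does when a dimension does not divide evenly:

```python
import numpy as np

def split_grid(matrix, n_row_chunks, n_col_chunks):
    """Split a 2d array into n_row_chunks x n_col_chunks blocks, row by row."""
    blocks = []
    # First cut the matrix into horizontal bands...
    for band in np.array_split(matrix, n_row_chunks, axis=0):
        # ...then cut each band into blocks along the column axis.
        blocks.extend(np.array_split(band, n_col_chunks, axis=1))
    return blocks

matrix = np.arange(270).reshape(18, 15)
blocks = split_grid(matrix, 3, 2)

print(len(blocks))                # 6
print([b.shape for b in blocks])  # [(6, 8), (6, 7), (6, 8), (6, 7), (6, 8), (6, 7)]
```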

How to convert a set of bytes in an array from hexa to decimal (python)?

I have a file which contains in each line a set of bytes for example:
4655d16c690f2789c2d3e803e388637f
16161b1504137217336d403e2a03c669
fa79c5ffe35d112915f0f3243fc68fb4
87d57d0a63e52b6df869eb5c0aac4328
640c2eefb7829d863f7aa686bc513acc
4024767c463558b7c7cd0ffd4f0aaa6d
18ee0b17f5b5206df0443e658b105990
7b40bf42d2cfc290eed4c4edcb9d3e91
b57dad9833c3e174e05a5ae75cac70ed
I want to split each line into an array of bytes, then convert each byte to decimal. For example:
4655d16c690f2789c2d3e803e388637f
The result is:
46 55 d1 6c 69 0f 27 89 c2 d3 e8 03 e3 88 63 7f
Then convert each byte in decimal:
[70,85,209,108,105,15,39,137,194,211,232,3,227,136,99,127]
I tried using this code:
with open(Srcpath, 'r') as f:
    with open(Destpath, 'w') as fp:
        for key in f:
            key_Separated=[key[i:i+2] for i in range(0, len(key), 2)]
            rejoined = ' '.join(key_Separated)
            Decimal= [i for i, b in enumerate(rejoined ) if b=='1']
            print(Decimal)
            fp.write(str(Decimal))
but it gives these wrong results:
[43]
[21, 45]
[27, 42]
[13, 31, 37, 42]
[16, 21, 28]
[13, 36]
[12, 43, 46]
[0, 6, 18, 27, 37]
How could I correct them please?
This should do the trick:
import re
data = '''4655d16c690f2789c2d3e803e388637f
16161b1504137217336d403e2a03c669
fa79c5ffe35d112915f0f3243fc68fb4
87d57d0a63e52b6df869eb5c0aac4328
640c2eefb7829d863f7aa686bc513acc
4024767c463558b7c7cd0ffd4f0aaa6d
18ee0b17f5b5206df0443e658b105990
7b40bf42d2cfc290eed4c4edcb9d3e91
b57dad9833c3e174e05a5ae75cac70ed'''
data = [re.findall('..', item) for item in data.split('\n') if item]
result = [[int(x, 16) for x in item] for item in data]
Python has a built-in library for the conversion you want. Given your data as data.txt:
#!python3
from binascii import unhexlify
from pprint import pprint
with open('data.txt') as f:
    pprint([list(unhexlify(line.strip())) for line in f])
[[70, 85, 209, 108, 105, 15, 39, 137, 194, 211, 232, 3, 227, 136, 99, 127],
[22, 22, 27, 21, 4, 19, 114, 23, 51, 109, 64, 62, 42, 3, 198, 105],
[250, 121, 197, 255, 227, 93, 17, 41, 21, 240, 243, 36, 63, 198, 143, 180],
[135, 213, 125, 10, 99, 229, 43, 109, 248, 105, 235, 92, 10, 172, 67, 40],
[100, 12, 46, 239, 183, 130, 157, 134, 63, 122, 166, 134, 188, 81, 58, 204],
[64, 36, 118, 124, 70, 53, 88, 183, 199, 205, 15, 253, 79, 10, 170, 109],
[24, 238, 11, 23, 245, 181, 32, 109, 240, 68, 62, 101, 139, 16, 89, 144],
[123, 64, 191, 66, 210, 207, 194, 144, 238, 212, 196, 237, 203, 157, 62, 145],
[181, 125, 173, 152, 51, 195, 225, 116, 224, 90, 90, 231, 92, 172, 112, 237]]
If using Python 2 a byte string doesn't convert into a list of integers, so there is another loop to do the conversion:
#!python2
from binascii import unhexlify
from pprint import pprint
with open('data.txt') as f:
    pprint([[ord(b) for b in unhexlify(line.strip())] for line in f])
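In Python 3 the built-in bytes.fromhex does the same job without an import. A minimal sketch, using the first two lines of the sample data as an in-memory list instead of a file (hex_line_to_ints is a hypothetical helper name):

```python
def hex_line_to_ints(line):
    """Convert a line of hex digits into a list of byte values (0-255)."""
    # strip() drops the trailing newline; fromhex reads two hex digits per byte.
    return list(bytes.fromhex(line.strip()))

lines = [
    "4655d16c690f2789c2d3e803e388637f\n",
    "16161b1504137217336d403e2a03c669\n",
]

result = [hex_line_to_ints(line) for line in lines]
print(result[0])
# [70, 85, 209, 108, 105, 15, 39, 137, 194, 211, 232, 3, 227, 136, 99, 127]
```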
