Efficiently unpack mono12packed bitstring format with python - python

I have raw data from a camera in the mono12packed format. This is an interleaved bit format that stores two 12-bit integers in 3 bytes to eliminate overhead. Explicitly, the memory layout for each group of 3 bytes looks like this:
Byte 1 = Pixel0 Bits 11-4
Byte 2 = Pixel1 Bits 3-0 + Pixel0 Bits 3-0
Byte 3 = Pixel1 Bits 11-4
I have a file from which all the bytes can be read in binary mode; let's assume it is called binfile.
To get the pixel data from the file I do:
from bitstring import BitArray as Bit
with open(binfile, 'rb') as f:
    bytestring = f.read()
a = []
for i in range(len(bytestring) // 3):  # reading 2 pixels = 3 bytes at a time
    s = Bit(bytes=bytestring[i*3:i*3+3], length=24)
    p0 = s[0:8] + s[12:16]
    p1 = s[16:] + s[8:12]
    a.extend(p0.unpack('uint:12'))  # unpack() returns a list
    a.extend(p1.unpack('uint:12'))
This works, but it is horribly slow, and I have to process a huge amount of data, so I would like to do it more efficiently.
My idea is that reading more than 3 bytes at a time could save some time in the conversion step, but I can't figure out how to do that.
Another idea: since the bits come in groups of 4, maybe there is a way to work on nibbles rather than on bits.
Data example:
The bytes
'\x07\x85\x07\x05\x9d\x06'
lead to the data
[117, 120, 93, 105]

Have you tried bitwise operators? Maybe that's a faster way:
with open('binfile.txt', 'rb') as binfile:
    bytestring = list(bytearray(binfile.read()))
a = []
for i in range(0, len(bytestring), 3):
    px_bytes = bytestring[i:i+3]
    p0 = (px_bytes[0] << 4) | (px_bytes[1] & 0x0F)
    p1 = (px_bytes[2] << 4) | ((px_bytes[1] >> 4) & 0x0F)
    a.append(p0)
    a.append(p1)
print(a)
This also outputs:
[117, 120, 93, 105]
Hope it helps!
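If the per-pixel loop is still too slow, one further option is to handle each of the three byte positions in bulk with strided slices. This is only a minimal sketch, assuming Python 3 (where iterating over bytes yields integers). The function name unpack_mono12packed is made up for illustration, and an incomplete trailing group of bytes is silently ignored:

```python
def unpack_mono12packed(data):
    """Unpack mono12packed bytes into a flat list of 12-bit pixel values."""
    usable = len(data) - len(data) % 3  # ignore an incomplete trailing group
    data = bytes(data[:usable])
    pixels = [0] * (usable // 3 * 2)
    # Strided slices pick out the first, second and third byte of every group.
    first, middle, last = data[0::3], data[1::3], data[2::3]
    # Pixel0: byte 1 holds bits 11-4, the low nibble of byte 2 holds bits 3-0.
    pixels[0::2] = [(hi << 4) | (mid & 0x0F) for hi, mid in zip(first, middle)]
    # Pixel1: byte 3 holds bits 11-4, the high nibble of byte 2 holds bits 3-0.
    pixels[1::2] = [(hi << 4) | (mid >> 4) for hi, mid in zip(last, middle)]
    return pixels


print(unpack_mono12packed(b'\x07\x85\x07\x05\x9d\x06'))  # [117, 120, 93, 105]
```

The same slicing idea should carry over to an array library if one is available, but that is not shown here.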

Related

Little to big endian buffer at once python [duplicate]

I've created a buffer of words represented in little endian (assuming each word is 2 bytes):
A000B000FF0A
I've split the buffer into 3 words (2 bytes each):
A000
B000
FF0A
and then converted each word to its big-endian representation:
00A0
00B0
0AFF
Is there a way to convert the whole buffer to big endian at once, instead of splitting it into words?
Code:
def endian(num):
    # Format as 4 hex digits, reverse the byte order, and parse back to an int.
    hex_str = '{:04X}'.format(num)
    swapped = bytearray.fromhex(hex_str)
    swapped.reverse()  # reverse() works in place and returns None
    return int(''.join(format(x, '02x') for x in swapped), 16)

buffer = 'A000B000FF0A'
for i in range(0, len(buffer), 4):
    value = endian(int(buffer[i:i + 4], 16))
Using the struct or array module is probably the easiest way to do this.
The hex string needs to be converted to bytes first.
Here is an example of how it could be done:
from array import array
import struct
hex_str = 'A000B000FF0A'
raw_data = bytes.fromhex(hex_str)
print("orig string: ", hex_str.casefold())
# With array lib
arr = array('h')
arr.frombytes(raw_data)
# arr = array('h', [160, 176, 2815])
arr.byteswap()
array_str = arr.tobytes().hex()
print(f"Swap using array: ", array_str)
# With struct lib
arr2 = [x[0] for x in struct.iter_unpack('<h', raw_data)]
# arr2 = [160, 176, 2815]
struct_str = struct.pack(f'>{len(arr2) * "h"}', *arr2).hex()
print("Swap using struct:", struct_str)
Gives transcript:
orig string: a000b000ff0a
Swap using array: 00a000b00aff
Swap using struct: 00a000b00aff
You can use the struct module to pack your values as big or little endian. Then you can use the hex() method of the resulting bytes object to get a nice string representation.
Docs for struct.
import struct
# little endian
a = struct.pack("<HHH",0xA000,0xB000,0xFF0A)
# big endian
b = struct.pack(">HHH",0xA000,0xB000,0xFF0A)
print(a)
print(b)
# convert back to string
print( a.hex() )
print( b.hex() )
Which gives:
b'\x00\xa0\x00\xb0\n\xff'
b'\xa0\x00\xb0\x00\xff\n'
00a000b00aff
a000b000ff0a
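To apply this to a buffer of arbitrary length rather than three hard-coded words, the format string can carry a repeat count. The following is a rough sketch; the helper name swap16_words is made up here, and it assumes the hex string describes a whole number of 2-byte words (struct raises an error otherwise):

```python
import struct


def swap16_words(hex_str):
    """Reverse the byte order of every 2-byte word in a hex string."""
    raw = bytes.fromhex(hex_str)
    count = len(raw) // 2
    # Read all words as little endian in one call...
    words = struct.unpack('<%dH' % count, raw)
    # ...and write them back out as big endian.
    return struct.pack('>%dH' % count, *words).hex().upper()


print(swap16_words('A000B000FF0A'))  # 00A000B00AFF
```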

Building up a 64-bit int with predetermined structure with Python

I have a 64-bit int which encodes some data in binary. I need to be able to reproduce this data so I can build test data streams.
I'm trying to use the bitstruct library to pack/unpack the data. Bitstruct seems to work well when the parts of the 64 bit int are multiples of 8, but less so when they are not. For example,
expected = 12987458926396440779
Within this data set are the following:
f = 11 (a constant; it's always 11), bits 63-60
e = 17355, bits 59-44
d = 9301, bits 43-30
c = 45, bits 29-20
b = 6, bits 19-16
a = 203, bits 15-0
In binary, this 64-bit int looks like this:
f    e                d              c          b    a
1011 0100001111001011 10010001010101 0000101101 0110 0000000011001011
which you can get with :
>>> bin(expected)
What I'm trying to do is create a format string that reproduces this format so that I can both read and write binary using your tool. So far, the best I've been able to come up with is:
import bitstruct
from binascii import hexlify
rep = bitstruct.pack(
    "u16 u4 u10 u14 u16 u4",
    203,
    6,
    45,
    9301,
    17355,
    11,
)
x = int.from_bytes(rep, byteorder="little")
print(rep, x, hexlify(rep), bin(x))
which outputs:
b'\x00\xcb`\xb6ET<\xbb' 13491751242084436736 b'00cb60b645543cbb' 0b1011101100111100010101000100010110110110011000001100101100000000
Clearly 13491751242084436736 != 12987458926396440779 so I've done something wrong. Could anybody suggest what?
Also: I'm using this strategy because it provides an elegant solution for both packing and unpacking - i.e. once I have the format string then it can be used for the round trip. I'm open to other solutions, however.
Thanks in advance.
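One possible way around this, sketched with plain integer shifts instead of bitstruct: list the fields from the most significant bits down and shift each one in. Judging from the output above, bitstruct puts the first field of the format string into the first bytes (the buffer starts with 00 cb, i.e. 203), so the fields were effectively packed in reverse order, and reading the bytes as little endian scrambled them further. The names FIELDS, pack_fields and unpack_fields below are made up for this sketch:

```python
# (name, width in bits), listed from the most significant field to the least.
FIELDS = [("f", 4), ("e", 16), ("d", 14), ("c", 10), ("b", 4), ("a", 16)]


def pack_fields(values):
    """Combine the named fields into a single 64-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError("%s does not fit in %d bits" % (name, width))
        word = (word << width) | value
    return word


def unpack_fields(word):
    """Split a 64-bit integer back into its named fields."""
    values = {}
    # Peel fields off the low end, so walk the list in reverse.
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


values = {"f": 11, "e": 17355, "d": 9301, "c": 45, "b": 6, "a": 203}
word = pack_fields(values)
print(word)                           # 12987458926396440779
print(unpack_fields(word) == values)  # True
```

Because both directions are driven by the same FIELDS table, it plays the role of the single format string for the round trip.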

How to get multiple 32bit values with byte array?

I need to extract some numeric values from a binary data stream.
The code below works for me, but surely there is a more suitable way to do this in Python. In particular, I struggled to find a better way to iterate over the buffer and get each group of 4 bytes as a byte array.
Any hints?
outfile = io.BytesIO()
outfile.writelines(some binary data stream)
buf = outfile.getvalue()
blen = len(buf) // 4
for i in range(blen):
    a = bytearray([0, 0, 0, 0])
    a[0] = buf[i*4]
    a[1] = buf[i*4+1]
    a[2] = buf[i*4+2]
    a[3] = buf[i*4+3]
    data = struct.unpack('<l', a)[0]
    # do something with data
Your question and accompanying pseudo-code are somewhat hazy in my opinion, but here's something that uses slices of buf to obtain each group of 4 bytes needed, so if nothing else it's at least a bit more succinct (assuming I've correctly interpreted what you're asking):
import io
import struct
outfile = io.BytesIO()
outfile.writelines([b'\x00\x01\x02\x03',
b'\x04\x05\x06\x07'])
buf = outfile.getvalue()
for i in range(0, len(buf), 4):
    data = struct.unpack('<l', buf[i:i+4])[0]
    print(hex(data))
Output:
0x3020100
0x7060504
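As a possible variation on the same idea, struct can also walk the buffer itself, which avoids the manual index arithmetic. A small sketch, assuming Python 3.4 or later (for struct.iter_unpack) and the same example bytes as above; any incomplete trailing group is dropped:

```python
import struct

buf = b'\x00\x01\x02\x03\x04\x05\x06\x07'

# iter_unpack requires the length to be a multiple of the item size (4 here),
# so drop any incomplete trailing group first.
usable = len(buf) - len(buf) % 4

# Each item comes back as a 1-tuple, hence the (value,) unpacking.
values = [value for (value,) in struct.iter_unpack('<l', buf[:usable])]
print([hex(v) for v in values])  # ['0x3020100', '0x7060504']

# Alternatively, unpack everything in one call with a repeat count.
same_values = list(struct.unpack('<%dl' % (usable // 4), buf[:usable]))
```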

Python - Efficient way to flip bytes in a file?

I've got a folder full of very large files that need to be byte flipped by a power of 4. So essentially, I need to read the files as a binary, adjust the sequence of bits, and then write a new binary file with the bits adjusted.
In essence, what I'm trying to do is read a hex string hexString that looks like this:
"00112233AABBCCDD"
And write a file that looks like this:
"33221100DDCCBBAA"
(i.e. every two characters is a byte, and I need to flip the bytes by a power of 4)
I am very new to python and coding in general, and the way I am currently accomplishing this task is extremely inefficient. My code currently looks like this:
import binascii
with open(myFile, 'rb') as f:
    content = f.read()
hexString = str(binascii.hexlify(content))
flippedBytes = ""
inc = 0
while inc < len(hexString):
    flippedBytes += hexString[inc + 6:inc + 8]
    flippedBytes += hexString[inc + 4:inc + 6]
    flippedBytes += hexString[inc + 2:inc + 4]
    flippedBytes += hexString[inc:inc + 2]
    inc += 8
# ..... write the flippedBytes to file, etc
The code I pasted above accurately accomplishes what I need (note, my actual code has a few extra lines of: "hexString.replace()" to remove unnecessary hex characters - but I've left those out to make the above easier to read). My ultimate problem is that it takes EXTREMELY long to run my code with larger files. Some of my files I need to flip are almost 2gb in size, and the code was going to take almost half a day to complete one single file. I've got dozens of files I need to run this on, so that timeframe simply isn't practical.
Is there a more efficient way to flip the HEX values in a file by a power of 4?
.... for what it's worth, there is a tool called WinHEX that can do this manually, and only takes a minute max to flip the whole file.... I was just hoping to automate this with python so we didn't have to manually use WinHEX each time
You want to convert your 4-byte integers from little-endian to big-endian, or vice-versa. You can use the struct module for that:
import struct
with open(myfile, 'rb') as infile, open(myoutput, 'wb') as of:
    while True:
        d = infile.read(4)
        if not d:
            break
        le = struct.unpack('<I', d)
        be = struct.pack('>I', *le)
        of.write(be)
Here is a little struct awesomeness to get you started:
>>> import struct
>>> s = b'\x00\x11\x22\x33\xAA\xBB\xCC\xDD'
>>> a, b = struct.unpack('<II', s)
>>> s = struct.pack('>II', a, b)
>>> ''.join([format(x, '02x') for x in s])
'33221100ddccbbaa'
To do this at full speed for a large input, use struct.iter_unpack (available since Python 3.4).
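Reading 4 bytes per call is likely to be the bottleneck for multi-gigabyte files, so another option is to read large chunks and reverse each 4-byte group with strided slice assignment. This is a sketch under a few assumptions: the name swap32_stream and the chunk_size parameter are made up here, the streams are ordinary binary file objects, and any trailing bytes that do not fill a group of 4 are copied unchanged:

```python
import io


def swap32_stream(src, dst, chunk_size=1 << 20):
    """Copy src to dst, reversing the bytes of every 4-byte group."""
    if chunk_size % 4:
        raise ValueError("chunk_size must be a multiple of 4")
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        usable = len(chunk) - len(chunk) % 4
        swapped = bytearray(chunk)  # leftover tail bytes stay as they are
        # Byte k of each group receives byte 3 - k of the same group.
        for k in range(4):
            swapped[k:usable:4] = chunk[3 - k:usable:4]
        dst.write(swapped)


# Small in-memory demonstration using the example from the question.
src = io.BytesIO(bytes.fromhex("00112233AABBCCDD"))
dst = io.BytesIO()
swap32_stream(src, dst)
print(dst.getvalue().hex())  # 33221100ddccbbaa
```

With real files, src and dst would be opened with open(..., 'rb') and open(..., 'wb') respectively.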

How to byte-swap a 32-bit integer in python?

Take this example:
i = 0x12345678
print("{:08x}".format(i))
# shows 12345678
i = swap32(i)
print("{:08x}".format(i))
# should print 78563412
What would be the swap32-function()? Is there a way to byte-swap an int in python, ideally with built-in tools?
One method is to use the struct module:
import struct
def swap32(i):
    return struct.unpack("<I", struct.pack(">I", i))[0]
First you pack your integer into a binary format using one endianness, then you unpack it using the other (it doesn't even matter which combination you use, since all you want to do is swap endianness).
Big endian means the layout of a 32 bit int has the most significant byte first,
e.g. 0x12345678 has the memory layout
msb               lsb
+-------------------+
| 12 | 34 | 56 | 78 |
+-------------------+
while on little endian, the memory layout is
lsb               msb
+-------------------+
| 78 | 56 | 34 | 12 |
+-------------------+
So you can just convert between them with some bit masking and shifting:
def swap32(x):
    return (((x << 24) & 0xFF000000) |
            ((x << 8) & 0x00FF0000) |
            ((x >> 8) & 0x0000FF00) |
            ((x >> 24) & 0x000000FF))
From Python 3.2 on, you can define swap32() as follows:
def swap32(x):
    return int.from_bytes(x.to_bytes(4, byteorder='little'), byteorder='big', signed=False)
It represents the value as an array of bytes and reverses the byte order by changing the endianness when converting back to an integer.
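As a quick sanity check, the struct, shift-and-mask and to_bytes approaches can be put side by side and compared on a few values. The suffixed function names below are only there to keep the three variants apart:

```python
import struct


def swap32_struct(i):
    # Pack as big endian, then reinterpret the same bytes as little endian.
    return struct.unpack("<I", struct.pack(">I", i))[0]


def swap32_shift(x):
    # Move each byte to its mirrored position with shifts and masks.
    return (((x << 24) & 0xFF000000) |
            ((x << 8) & 0x00FF0000) |
            ((x >> 8) & 0x0000FF00) |
            ((x >> 24) & 0x000000FF))


def swap32_bytes(x):
    # Serialize with one byte order and parse with the other.
    return int.from_bytes(x.to_bytes(4, byteorder='little'), byteorder='big')


for value in (0x12345678, 0x00000000, 0xFFFFFFFF, 0x000000FF):
    assert swap32_struct(value) == swap32_shift(value) == swap32_bytes(value)

print("{:08x}".format(swap32_struct(0x12345678)))  # 78563412
```

All three expect an unsigned value that fits in 32 bits; the struct and to_bytes variants raise an error for larger inputs, while the shift variant simply discards the extra bits.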
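A simpler option may be the socket library:
from socket import htonl
swapped = htonl(i)
print(hex(swapped))
That's it. Note that htonl converts from host to network (big-endian) byte order, so it only swaps the bytes on a little-endian host.
The library also works in the other direction with ntohl.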
The array module provides a byteswap() method for fixed sized items.
The array module appears to be in versions back to Python 2.7
array.byteswap()
“Byteswap” all items of the array. This is only supported for values which are 1, 2, 4, or 8 bytes in size;
Along with the fromfile() and tofile() methods, this module is quite easy to use:
import array
# Open a data file.
filename = 'my_data_file.bin'
input_file = open(filename, 'rb')
# Create an empty array of unsigned integers
# ('I' is 4 bytes on most platforms; check all_data.itemsize).
all_data = array.array('I')
# Read all the data from the file.
all_data.fromfile(input_file, 16000)  # assumes the file holds 16000 items
# Swap the bytes in all the data items.
all_data.byteswap()
# Write all the data to a new file.
output_file = open(filename[:-4] + '.new', 'wb')  # assumes a three-letter extension
all_data.tofile(output_file)
# Close the files.
input_file.close()
output_file.close()
The above code worked for me since I have fixed-size data files. There are more Pythonic ways to handle variable length files.
