Building up a 64-bit int with predetermined structure with Python

I have a 64-bit int which encodes some data in binary. I need to be able to reproduce this data so I can build test data streams.
I'm trying to use the bitstruct library to pack/unpack the data. Bitstruct seems to work well when the parts of the 64 bit int are multiples of 8, but less so when they are not. For example,
expected = 12987458926396440779
Within this data set are the following:
f = a constant. It's always 11. Bits 63 to 60
e = 17355, bits 59 to 44
d = 9301, bits 43 to 30
c = 45, bits 29 to 20
b = 6, bits 19 to 16
a = 203, bits 15 to 0
In binary, this 64-bit int looks like this:
f e d c b a
1011 0100001111001011 10010001010101 0000101101 0110 0000000011001011
which you can get with:
>>> bin(expected)
What I'm trying to do is create a format string that reproduces this format so that I can both read and write binary using your tool. So far, the best I've been able to come up with is:
import bitstruct
from binascii import hexlify
rep = bitstruct.pack(
"u16 u4 u10 u14 u16 u4",
203,
6,
45,
9301,
17355,
11,
)
x = int.from_bytes(rep, byteorder="little")
print(rep, x, hexlify(rep), bin(x))
which outputs:
b'\x00\xcb`\xb6ET<\xbb' 13491751242084436736 b'00cb60b645543cbb' 0b1011101100111100010101000100010110110110011000001100101100000000
Clearly 13491751242084436736 != 12987458926396440779 so I've done something wrong. Could anybody suggest what?
Also: I'm using this strategy because it provides an elegant solution for both packing and unpacking - i.e. once I have the format string then it can be used for the round trip. I'm open to other solutions, however.
Thanks in advance.
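No answer to this question is shown here, so the following is only a minimal sketch, not the bitstruct author's method: it builds the value with plain shifts and masks, listing the fields from the most significant (f) down to the least significant (a). The `FIELDS` table and the helper names are hypothetical. Judging from the bitstruct output above (203 lands in the first two bytes as `00 cb`), bitstruct also puts the first field into the most significant bits, so the likely bitstruct fix is to reverse the field order to "u4 u16 u14 u10 u4 u16", pass 11, 17355, 9301, 45, 6, 203 in that order, and convert with byteorder="big". That is an inference from the output, not something verified here.

```python
# Hypothetical field table: (name, width in bits), most significant field first.
# Widths come from the bit ranges in the question (4 + 16 + 14 + 10 + 4 + 16 = 64).
FIELDS = [("f", 4), ("e", 16), ("d", 14), ("c", 10), ("b", 4), ("a", 16)]


def pack_fields(values):
    """Build a 64-bit int by shifting each field in, most significant first."""
    result = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        result = (result << width) | value
    return result


def unpack_fields(x):
    """Inverse of pack_fields: peel fields off the least significant end."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = x & ((1 << width) - 1)
        x >>= width
    return values


fields = {"f": 11, "e": 17355, "d": 9301, "c": 45, "b": 6, "a": 203}
packed = pack_fields(fields)
print(packed)                           # 12987458926396440779
print(packed.to_bytes(8, "big").hex())  # b43cb91542d600cb
print(unpack_fields(packed) == fields)  # True
```

For a test data stream, `packed.to_bytes(8, "big")` gives the raw bytes, and `unpack_fields(int.from_bytes(raw, "big"))` reads them back, so one field table drives the round trip.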

Related

I have a problem with the unpack function in Python

Excuse me, my English is not good.
I am trying to decode some binary messages with unpack in Python, but I have a problem.
The first message looks like this:
from struct import *
firstMessage = b'\x00\x00\x00\x00\xff\xff\xff\x00' #without tags
decodeFirstMessage = unpack('1q',firstMessage)
print(decodeFirstMessage[0])
The second message looks like this:
from struct import *
secondMessage = b'*xxyyzz \x03 \x00\x00\x00\x00\xff\xff\xff\x00 tago1;' #with tags
decodeSecondMessage = unpack('7s1s1B1sq1s6s', secondMessage)
print(decodeSecondMessage[0])
For the first code I get
72057589742960640
as the answer.
For the second code I get
unpack requires a buffer of 31 bytes
as the answer.
I tried to verify the size of the format passed to unpack with this code:
print(calcsize('1q'))
print(calcsize('7s1s1B1sq1s6s'))
I get
8
and
31
I calculated the bytes myself and got
8
and
25
When I replace q with b or h in the format, calcsize() gives the expected value of 18 or 19 bytes,
but with l and q I have a problem.
What is wrong with my function, and how can I solve this?
The reason for this is padding.
Read the whole "Byte Order, Size, and Alignment" section of the struct documentation.
An example:
>>> print(calcsize('1s1q'))
16
>>> print(calcsize('=1s1q'))
9
The short version is, use this for format instead:
"=7s1s1B1s1q1s6s"
The longer version is alignment. By default (the @ prefix, meaning "native" byte order, size, and alignment), the format is interpreted to match the layout the corresponding C struct would have on the platform, so padding is inserted before q to put it on its natural boundary (typically 8 bytes). Using = switches the format to standard sizes and turns off alignment.
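Applied to the second message from the question, a minimal check might look like this (the variable names are made up for illustration). It uses '<' rather than '=': both mean standard sizes with no padding, but '<' also fixes the byte order to little-endian, whereas '=' uses the machine's native order (which gives the same result on little-endian machines such as x86).

```python
from struct import calcsize, unpack

# The tagged message from the question: 7 + 1 + 1 + 1 + 8 + 1 + 6 = 25 bytes.
second_message = b'*xxyyzz \x03 \x00\x00\x00\x00\xff\xff\xff\x00 tago1;'

# '=' and '<' both use standard sizes without alignment padding;
# '<' additionally pins the byte order to little-endian.
fmt = '<7s1s1B1s1q1s6s'
print(len(second_message), calcsize(fmt), calcsize('=7s1s1B1s1q1s6s'))  # 25 25 25

tag, sep1, count, sep2, value, sep3, trailer = unpack(fmt, second_message)
print(tag, count, value, trailer)  # b'*xxyyzz' 3 72057589742960640 b'tago1;'
```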

Efficiently unpack mono12packed bitstring format with Python

I have raw data from a camera in the mono12packed format. This is an interlaced bit format that stores two 12-bit integers in 3 bytes to eliminate overhead. Explicitly, the memory layout of each group of 3 bytes looks like this:
Byte 1 = Pixel0 Bits 11-4
Byte 2 = Pixel1 Bits 3-0 + Pixel0 Bits 3-0
Byte 3 = Pixel1 Bits 11-4
I have a file whose bytes can all be read in binary mode; let's assume it is called binfile.
To get the pixeldata from the file I do:
from bitstring import BitArray as Bit
f = open(binfile, 'rb')
bytestring = f.read()
f.close()
a = []
for i in range(len(bytestring)//3): #reading 2 pixels = 3 bytes at a time
    s = Bit(bytes = bytestring[i*3:i*3+3], length = 24)
    p0 = s[0:8]+s[12:16]
    p1 = s[16:]+s[8:12]
    a.append(p0.unpack('uint:12'))
    a.append(p1.unpack('uint:12'))
This works, but it is horribly slow, and I need it to be much faster because I have to process a huge amount of data.
My idea is that by reading more than 3 bytes at a time I could save some time in the conversion step, but I can't figure out how to do that.
Another idea: since the bits come in groups of 4, maybe there is a way to work on nibbles rather than on bits.
Data example:
The bytes
'\x07\x85\x07\x05\x9d\x06'
lead to the data
[117, 120, 93, 105]
Have you tried bitwise operators? Maybe that's a faster way:
with open('binfile.txt', 'rb') as binfile:
    bytestring = list(bytearray(binfile.read()))
a = []
for i in range(0, len(bytestring), 3):
    px_bytes = bytestring[i:i+3]
    p0 = (px_bytes[0] << 4) | (px_bytes[1] & 0x0F)
    p1 = (px_bytes[2] << 4) | (px_bytes[1] >> 4 & 0x0F)
    a.append(p0)
    a.append(p1)
print(a)
This also outputs:
[117, 120, 93, 105]
Hope it helps!
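Building on the bitwise answer, a sketch that also follows the question's "read more than 3 bytes at a time" idea: it slices the whole buffer once into the first, second and third byte of every group (`data[0::3]` and so on) and zips them, instead of slicing 3 bytes inside the loop. `unpack_mono12packed` is a hypothetical name, and the byte layout is the one given in the question. It is still a pure-Python loop; any speedup over the answer above is an expectation, not a measurement.

```python
def unpack_mono12packed(data: bytes) -> list:
    """Decode mono12packed bytes: every 3 bytes hold two 12-bit pixels."""
    if len(data) % 3:
        raise ValueError("data length must be a multiple of 3")
    pixels = []
    # One pass over three strided views of the buffer, one byte of each group per view.
    for hi0, mid, hi1 in zip(data[0::3], data[1::3], data[2::3]):
        pixels.append((hi0 << 4) | (mid & 0x0F))  # Pixel0: bits 11-4, then low nibble of byte 2
        pixels.append((hi1 << 4) | (mid >> 4))    # Pixel1: bits 11-4, then high nibble of byte 2
    return pixels


print(unpack_mono12packed(b'\x07\x85\x07\x05\x9d\x06'))  # [117, 120, 93, 105]
```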

How to interpret (read) signed 24-bit data packed into 32 bits

I have a raw file containing signed 24-bit data packed into 32 bits.
Example:
00 4D 4A FF
00 FF FF FF
I would like to read this data and get signed integers in the range [-2^23, 2^23 - 1].
For now I write:
int32_1 = file1.read(4)
val1 = (unpack('=l', int32_1)[0] & 0xFFFFFF00) >> 8
But how do I take two's complement into account, so that 00FFFFFF is interpreted as -1?
Your code is making things more complicated than they need to be. However, you really should specify the byte order (endianness) explicitly in the unpack format string.
from binascii import hexlify
from struct import unpack
data = ('\x00\x03\x02\x01', '\x00\x4D\x4A\xFF', '\x00\xFF\xFF\xFF')
for b in data:
    i = unpack('<l', b)[0] >> 8
    print hexlify(b), i
output
00030201 66051
004d4aff -46515
00ffffff -1
FWIW, here's a version that works in Python 3 or Python 2; the output is slightly different in Python 3, since normal strings in Python 3 are Unicode; byte strings are "special".
from __future__ import print_function
from binascii import hexlify
from struct import unpack
data = (b'\x00\x03\x02\x01', b'\x00\x4D\x4A\xFF', b'\x00\xFF\xFF\xFF')
for b in data:
    i = unpack('<l', b)[0] >> 8
    print(hexlify(b), i)
Python 3 output
b'00030201' 66051
b'004d4aff' -46515
b'00ffffff' -1
And here's a version that only runs on Python 3:
from binascii import hexlify
data = (b'\x00\x03\x02\x01', b'\x00\x4D\x4A\xFF', b'\x00\xFF\xFF\xFF')
for b in data:
    i = int.from_bytes(b[1:], 'little', signed=True)
    print(hexlify(b), i)
You can shift 8 bits to the left, take the result as a signed 32-bit integer (using the ctypes library), and divide by 256:
>>> import ctypes
>>> i = 0x00ffffff
>>> i
16777215
>>> i<<8
4294967040
>>> ctypes.c_int32(i<<8).value
-256
>>> ctypes.c_int32(i<<8).value//256
-1
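Without ctypes, the same sign extension can be done with plain integer arithmetic. This is a minimal sketch; `sign_extend_24` is a hypothetical helper, and the loop assumes the record layout from the question (one padding byte, then 24 little-endian data bits). XOR with 0x800000 flips the sign bit, and subtracting 0x800000 then maps 0x800000..0xFFFFFF to -2^23..-1 while leaving 0..0x7FFFFF unchanged.

```python
def sign_extend_24(x: int) -> int:
    """Interpret the low 24 bits of x as a two's-complement signed integer."""
    x &= 0xFFFFFF                     # keep only the 24 data bits
    return (x ^ 0x800000) - 0x800000  # flip the sign bit, then subtract its weight


# Each 4-byte record: a padding byte followed by 24 little-endian data bits.
for record in (b'\x00\x03\x02\x01', b'\x00\x4D\x4A\xFF', b'\x00\xFF\xFF\xFF'):
    raw = int.from_bytes(record[1:], 'little')
    print(record.hex(), sign_extend_24(raw))
# prints: 00030201 66051 / 004d4aff -46515 / 00ffffff -1 (one per line)
```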

Base 64 decode from raw binary

I am trying to decode base64 from raw binary:
As input, I have four 6-bit values:
010000 001010 000000 011001
which I convert to decimal, giving
16 10 0 25
and finally decode using the base 64 table, giving
Q K A Z
This is verified to be the correct result.
I would like to use Python's base64 module to automate this, but using
import base64
base64.b64decode( bytearray([16,10,0,25]) )
returns an empty string.
What is the proper way to use this library with the given inputs?
[16, 10, 0, 25] isn't a base64 string, really; I don't think base64 has any functions for converting numeric representations of the base64 alphabet to their alphabetic representations. It's not difficult to roll your own, though:
def to_characters(numeric_arr):
    target = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ' + b'abcdefghijklmnopqrstuvwxyz' + b'0123456789' + b'+/'
    return bytes(target[n] for n in numeric_arr)
Then:
>>> to_characters(bytearray([16, 10, 0, 25]))
b'QKAZ'
>>> to_characters([16, 10, 0, 25]) # <- or just this
b'QKAZ'
You can now pass this bytes object to base64.b64decode:
>>> base64.b64decode(b'QKAZ')
b'@\xa0\x19'
(Note that you had a syntax issue in your example use of bytearray - don't do bytearray[...]; do bytearray([...]). Python doesn't use C-like int array[size] syntax.)
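If the goal is the decoded bytes rather than the base64 text, the 6-bit values can also be concatenated directly with shifts, skipping the alphabet lookup: four 6-bit values make 24 bits, which are exactly 3 bytes. A minimal sketch; `sextets_to_bytes` is a hypothetical helper that assumes the input length is a multiple of four (it does not handle the '=' padding case), and it cross-checks itself against `base64.b64decode`.

```python
import base64

# Standard base64 alphabet: index n maps to ALPHABET[n].
ALPHABET = b'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'


def sextets_to_bytes(values):
    """Concatenate 6-bit values into bytes, 4 values -> 3 bytes (no padding handling)."""
    if len(values) % 4:
        raise ValueError("need a multiple of four 6-bit values")
    out = bytearray()
    for i in range(0, len(values), 4):
        group = 0
        for v in values[i:i + 4]:
            if not 0 <= v < 64:
                raise ValueError(f"{v} is not a 6-bit value")
            group = (group << 6) | v     # append 6 more bits
        out += group.to_bytes(3, 'big')  # 24 bits -> 3 bytes
    return bytes(out)


sextets = [16, 10, 0, 25]
direct = sextets_to_bytes(sextets)
via_base64 = base64.b64decode(bytes(ALPHABET[n] for n in sextets))
print(direct, direct == via_base64)  # b'@\xa0\x19' True
```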

Reorder byte order in hex string (Python)

I want to build a small formatter in Python that gives me back the numeric values embedded in lines of hex strings.
It is a central part of my formatter and should be reasonably fast: more than 100 lines/sec (each line about 100 chars).
The code below shows where I'm currently stuck.
'data_string_in_orig' shows the given input format. The two bytes of each 16-bit word have to be swapped, which turns 'data_string_in_orig' into 'data_string_in_swapped'. In the end I need the struct access as shown. The expected result is in the comment.
Thanks in advance
Wolfgang R
#!/usr/bin/python
import binascii
import struct
## 'uint32 double'
data_string_in_orig = 'b62e000052e366667a66408d'
data_string_in_swapped = '2eb60000e3526666667a8d40'
print(data_string_in_orig)
packed_data = binascii.unhexlify(data_string_in_swapped)
s = struct.Struct('<Id')
unpacked_data = s.unpack_from(packed_data, 0)
print('Unpacked Values:', unpacked_data)
## Unpacked Values: (46638, 943.29999999943209)
exit(0)
array.arrays have a byteswap method:
import binascii
import struct
import array
x = binascii.unhexlify('b62e000052e366667a66408d')
y = array.array('h', x)
y.byteswap()
s = struct.Struct('<Id')
print(s.unpack_from(y))
# (46638, 943.2999999994321)
The h in array.array('h', x) was chosen because it tells array.array to regard the data in x as an array of 2-byte shorts. The important thing is that each item be regarded as being 2-bytes long. H, which signifies 2-byte unsigned short, works just as well.
This should do exactly what unutbu's version does, but might be slightly easier to follow for some...
from binascii import unhexlify
from struct import pack, unpack
orig = unhexlify('b62e000052e366667a66408d')
swapped = pack('<6h', *unpack('>6h', orig))
print(unpack('<Id', swapped))
# (46638, 943.2999999994321)
Basically, unpack 6 shorts big-endian, repack as 6 shorts little-endian.
Again, same thing that unutbu's code does, and you should use his.
Edit: I just realized I get to use my favorite Python idiom for this... Don't do this either:
orig = 'b62e000052e366667a66408d'
swap =''.join(sum([(c,d,a,b) for a,b,c,d in zip(*[iter(orig)]*4)], ()))
# '2eb60000e3526666667a8d40'
The swap from 'data_string_in_orig' to 'data_string_in_swapped' may also be done with comprehensions without using any imports:
>>> d = 'b62e000052e366667a66408d'
>>> "".join([m[2:4]+m[0:2] for m in [d[i:i+4] for i in range(0,len(d),4)]])
'2eb60000e3526666667a8d40'
The comprehension works for swapping byte order in hex strings representing 16-bit words. Modifying it for a different word-length is trivial. We can make a general hex digit order swap function also:
def swap_order(d, wsz=4, gsz=2):
    return "".join(["".join([m[i:i+gsz] for i in range(wsz-gsz, -gsz, -gsz)]) for m in [d[i:i+wsz] for i in range(0, len(d), wsz)]])
The input params are:
d : the input hex string
wsz: the word size in nibbles (e.g. for 16-bit words wsz=4, for 32-bit words wsz=8)
gsz: the number of nibbles that stay together (e.g. for reordering bytes gsz=2, for reordering 16-bit words gsz=4)
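For the speed requirement in the question, the 16-bit swap can also be done on the decoded bytes instead of the hex text, using extended-slice assignment on a bytearray. A minimal sketch; `swap_16bit_words` is a hypothetical helper, and it assumes the input has an even number of bytes. Whether it beats `array.byteswap` has not been measured here.

```python
import struct


def swap_16bit_words(data: bytes) -> bytes:
    """Swap the two bytes inside every 16-bit word (len(data) must be even)."""
    if len(data) % 2:
        raise ValueError("data length must be a multiple of 2")
    out = bytearray(len(data))
    out[0::2] = data[1::2]  # odd-position bytes move to even positions
    out[1::2] = data[0::2]  # even-position bytes move to odd positions
    return bytes(out)


line = 'b62e000052e366667a66408d'
swapped = swap_16bit_words(bytes.fromhex(line))
print(swapped.hex())                  # 2eb60000e3526666667a8d40
print(struct.unpack('<Id', swapped))  # (46638, 943.2999999994321)
```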
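import array
import binascii
from tkinter import filedialog

infile = filedialog.askopenfilename()
with open(infile, 'rb') as infile_:
    infile_read = infile_.read()
# 'I' (C unsigned int) is 4 bytes on common platforms; 'l' is 8 bytes on 64-bit Linux.
# The file length must be a multiple of the item size.
y = array.array('I', infile_read)
y.byteswap()
swapped = binascii.hexlify(y.tobytes())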
This swaps 32-bit words with code very much like unutbu's answer, just a little easier to understand. Technically binascii is not needed for the swap; only array.byteswap is.
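Because array type codes such as 'l' have platform-dependent sizes, a platform-independent alternative for 32-bit words is the struct approach shown earlier in this thread, scaled to 4-byte items: with the '>' and '<' prefixes struct uses standard sizes, so 'I' is always exactly 4 bytes. A minimal sketch; `swap_32bit_words` is a hypothetical helper and assumes the input length is a multiple of 4.

```python
import struct


def swap_32bit_words(data: bytes) -> bytes:
    """Reverse the byte order inside every 32-bit word, independent of platform."""
    if len(data) % 4:
        raise ValueError("data length must be a multiple of 4")
    count = len(data) // 4
    # '>' and '<' use standard sizes, so 'I' is exactly 4 bytes on every platform.
    words = struct.unpack(f'>{count}I', data)
    return struct.pack(f'<{count}I', *words)


print(swap_32bit_words(bytes.fromhex('0102030405060708')).hex())  # 0403020108070605
```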
