Reading 4 bytes with struct.unpack - python

I have a file whose bytes #11-15 hold an integer that is 4 bytes long. Using struct.unpack, I want to read it as a 4-byte integer. Right now, with PACK_FORMAT set to 8s2s4B2s16B96s40B40B, I read 4 separate bytes:
PACK_FORMAT = '8s2s4B2s16B96s40B40B'
fd = open('./myfile', 'r')
hdrBytes = fd.read(208)
print(repr(hdrBytes))
foo = struct.unpack(PACK_FORMAT, hdrBytes)
(Pdb) foo[0]
'MAGICSTR'
(Pdb) foo[1]
'01'
(Pdb) foo[2:6]
(48, 50, 48, 48)
(Pdb) print repr(hdrBytes)
'MAGICSTR010200a0000000001e100010........'
Now I can convert these 4 bytes to an int as:
(Pdb) int(''.join([chr(x) for x in foo[2:6]]), 16)
512
When I modified PACK_FORMAT to use i instead of 4B to read those 4 bytes as one integer, I always get an error:
foo = struct.unpack(PACK_FORMAT, hdrBytes)
error: unpack requires a string argument of length 210

It looks like you're running afoul of the alignment requirement: integers must be on a 4-byte boundary on your machine.
You can turn off alignment by starting your format string with an equals sign:
PACK_FORMAT = '=8s2si2s16B96s40B40B'
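A quick way to see the padding (interactive session; exact sizes are platform-dependent in native mode):
>>> import struct
>>> struct.calcsize('2si')   # native mode pads 'i' to a 4-byte boundary
8
>>> struct.calcsize('=2si')  # '=' switches to standard sizes with no padding
6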

It has to do with alignment — see the docs.
import struct
PACK_FORMAT1 = '8s 2s 4B 2s 16B 96s 40B 40B'
print(struct.Struct(PACK_FORMAT1).size) # -> 208
PACK_FORMAT2 = '8s 2s i 2s 16B 96s 40B 40B'
print(struct.Struct(PACK_FORMAT2).size) # -> 210
PACK_FORMAT3 = '=8s 2s i 2s 16B 96s 40B 40B'
print(struct.Struct(PACK_FORMAT3).size) # -> 208
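Putting it together, a minimal sketch (assuming ./myfile exists and the header really is 208 bytes). '=' means standard sizes, no padding, native byte order; use '<' instead if the field must be read as little-endian regardless of platform:
import struct

PACK_FORMAT = '=8s2si2s16B96s40B40B'
with open('./myfile', 'rb') as fd:  # binary mode, so the bytes come through unchanged
    hdrBytes = fd.read(208)
foo = struct.unpack(PACK_FORMAT, hdrBytes)
print(foo[2])  # the 4-byte field as a single int
Note that if those four bytes actually hold ASCII hex digits, as the unpacked values (48, 50, 48, 48) suggest, keeping a 4s format and calling int(foo[2], 16) is the closer match.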

How to convert the string between numpy.array and bytes

I want to convert the string to bytes first, and then convert it to numpy array:
utf8 string -> bytes -> numpy.array
And then:
numpy.array -> bytes -> utf8 string
Here is the test:
import numpy as np
string = "any_string_in_utf8: {}".format(123456)
test = np.frombuffer(bytes(string, 'utf-8'))
print(test)
Here is the output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/tmp/ipykernel_9055/3077694159.py in <cell line: 5>()
3 string = "any_string_in_utf8: {}".format(123456)
4
----> 5 test = np.frombuffer(bytes(string, 'utf-8'))
6 print(test)
ValueError: buffer size must be a multiple of element size
How to convert the string between numpy.array and bytes?
Solution:
The main problem in your code is that you haven't specified a dtype. The default dtype is float64, so np.frombuffer tries to split the buffer into 8-byte elements, and since the UTF-8 string's byte length is not a multiple of 8, it raises ValueError: buffer size must be a multiple of element size.
If we interpret the buffer as unsigned 8-bit integers instead, each byte becomes one array element and the conversion works. See the code snippet below:
# Import the required module
import numpy as np

# Initialize the string
utf_8_string = "any_string_in_utf8: {}".format(123456)

# utf8 string -> bytes -> numpy.array
np_array = np.frombuffer(bytes(utf_8_string, 'utf-8'), dtype=np.uint8)

# Print 'np_array'
print("Numpy Array after Bytes Conversion : -")
print(np_array)

# numpy.array -> bytes -> utf8 string
result_str = np_array.tobytes().decode("utf-8")

# Print the result to verify there was no data loss
print("\nOriginal String After Conversion : - \n" + result_str)
For more about np.frombuffer(), see the NumPy documentation.
Output of the above code:
Numpy Array after Bytes Conversion : -
[ 97 110 121 95 115 116 114 105 110 103 95 105 110 95 117 116 102 56
58 32 49 50 51 52 53 54]
Original String After Conversion : -
any_string_in_utf8: 123456
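For the record, the original failure is purely arithmetic: the encoded string is 26 bytes long, which is not a multiple of the 8-byte element size of the default float64 dtype. A quick interactive check:
>>> len(bytes(utf_8_string, 'utf-8'))
26
>>> np.frombuffer(bytes(utf_8_string, 'utf-8'))
Traceback (most recent call last):
...
ValueError: buffer size must be a multiple of element size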

Define a BitStruct inside a BitStruct in python using construct package

I am trying to read the following format (Intel -> Little-Endian):
X: 0 -> 31, size 32 bits
Offset: 32 -> 43, size 12 bits
Index: 44 -> 47, size 4 bits
Time: 48 -> 55, size 8 bits
Radius: 56 -> 63, size 8 bits
For this parser I defined:
from construct import BitStruct, BitsInteger, Bytewise
from construct import Int32sl, Int8ul

BitStruct(
    "X" / Bytewise(Int32sl),
    "Offset" / BitsInteger(12),
    "Index" / BitsInteger(4),
    "Time" / Bytewise(Int8ul),
    "Radius" / Bytewise(Int8ul),
)
from the following bytes:
bytearray(b'\xca\x11\x01\x00\x00\x07\xffu')
What I get is:
Container:
X = 70090
Offset = 0
Index = 7
Time = 255
Radius = 117
What I should have gotten is:
Container:
X = 70090
Offset = 1792
Index = 0
Time = 255
Radius = 117
As you can see, the values of Offset and Index that I get do not match with the expected values, the rest is correct.
From what I saw, I need to swap the two bytes which contain the Offset and Index values.
How could I define a struct inside a struct and swap the two bytes as well?
BitsInteger is big-endian by default.
From the documentation on BitsInteger:
Note that little-endianness is only defined for multiples of 8 bits.
You would have to set the parameter swapped to True:
swapped – bool, whether to swap byte order (little endian), default is False (big endian)
As such:
BitStruct(
    "X" / Bytewise(Int32sl),
    "Offset" / BitsInteger(12, swapped=True),
    "Index" / BitsInteger(4, swapped=True),
    "Time" / Bytewise(Int8ul),
    "Radius" / Bytewise(Int8ul),
)
BUT you are not using multiples of 8 bits, so you should just swap the initial byte array around and be done with it.
By swapping the bytes in the bytearray and reversing the order of the fields in the BitStruct, I am able to get the correct values.
from construct import BitStruct, BitsInteger, Bytewise
from construct import Int32sb, Int8ul

data = bytearray(b'\xca\x11\x01\x00\x00\x07\xffu')
data_reversed = data[::-1]

fmt = BitStruct(
    "Radius" / Bytewise(Int8ul),
    "Time" / Bytewise(Int8ul),
    "Index" / BitsInteger(4),
    "Offset" / BitsInteger(12),
    "X" / Bytewise(Int32sb),
)

print(fmt.parse(data_reversed))
This returns:
Container:
Radius = 117
Time = 255
Index = 0
Offset = 1792
X = 70090
If someone has a better solution, I would be more than happy to hear it.
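One possible refinement (an untested sketch): construct also ships a ByteSwapped wrapper, which byte-reverses a fixed-size subconstruct, so the manual data[::-1] step can be folded into the declaration. The fields still have to be declared in reversed order, exactly as above:
from construct import BitStruct, BitsInteger, Bytewise, ByteSwapped
from construct import Int32sb, Int8ul

# Sketch: ByteSwapped reverses the 8 bytes before the BitStruct parses them,
# so this should be equivalent to parsing data[::-1] by hand.
fmt = ByteSwapped(BitStruct(
    "Radius" / Bytewise(Int8ul),
    "Time" / Bytewise(Int8ul),
    "Index" / BitsInteger(4),
    "Offset" / BitsInteger(12),
    "X" / Bytewise(Int32sb),
))

print(fmt.parse(bytearray(b'\xca\x11\x01\x00\x00\x07\xffu')))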

Using strings and byte-like objects compatibly in code to run in both Python 2 & 3

I'm trying to modify the code shown far below, which works in Python 2.7.x, so it will also work unchanged in Python 3.x. However I'm encountering the following problem I can't solve in the first function, bin_to_float() as shown by the output below:
float_to_bin(0.000000): '0'
Traceback (most recent call last):
File "binary-to-a-float-number.py", line 36, in <module>
float = bin_to_float(binary)
File "binary-to-a-float-number.py", line 9, in bin_to_float
return struct.unpack('>d', bf)[0]
TypeError: a bytes-like object is required, not 'str'
I tried to fix that by adding a bf = bytes(bf) right before the call to struct.unpack(), but doing so produced its own TypeError:
TypeError: string argument without an encoding
So my questions are: is it possible to fix this issue and achieve my goal? And if so, how? Preferably in a way that would work in both versions of Python.
Here's the code that works in Python 2:
import struct

def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack('>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    bytes = []
    for _ in range(nbytes):
        bytes.append(chr(n & 0xff))
        n >>= 8
    if minlen > 0 and len(bytes) < minlen:  # zero pad?
        bytes.extend((minlen-len(bytes)) * '0')
    return ''.join(reversed(bytes))  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack('>d', f)
    ba = bytearray(ba)
    s = ''.join('{:08b}'.format(b) for b in ba)
    s = s.lstrip('0')  # strip leading zeros
    return s if s else '0'  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print('float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print('bin_to_float(%r): %f' % (binary, float))
    print('')
To make portable code that works with bytes in both Python 2 and 3, using libraries that literally use different data types between the two, you need to explicitly declare every string with the appropriate literal mark (or add from __future__ import unicode_literals to the top of every module doing this). This step ensures your data types are correct internally in your code.
Secondly, make the decision to support Python 3 going forward, with fallbacks specific to Python 2. This means overriding str with unicode, and any methods/functions that do not return the same type in both Python versions should be modified to return the Python 3 type. Do note that bytes is a built-in name, too, so don't shadow it.
Putting this together, your code will look something like this:
import struct
import sys

if sys.version_info < (3, 0):
    str = unicode
    chr = unichr

def bin_to_float(b):
    """ Convert binary string to a float. """
    bf = int_to_bytes(int(b, 2), 8)  # 8 bytes needed for IEEE 754 binary64
    return struct.unpack(b'>d', bf)[0]

def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    ba = bytearray(b'')
    for _ in range(nbytes):
        ba.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(ba) < minlen:  # zero pad?
        ba.extend((minlen-len(ba)) * b'0')
    return u''.join(str(chr(b)) for b in reversed(ba)).encode('latin1')  # high bytes at beginning

# tests

def float_to_bin(f):
    """ Convert a float into a binary string. """
    ba = struct.pack(b'>d', f)
    ba = bytearray(ba)
    s = u''.join(u'{:08b}'.format(b) for b in ba)
    s = s.lstrip(u'0')  # strip leading zeros
    return (s if s else u'0').encode('latin1')  # but leave at least one

for f in 0.0, 1.0, -14.0, 12.546, 3.141593:
    binary = float_to_bin(f)
    print(u'float_to_bin(%f): %r' % (f, binary))
    float = bin_to_float(binary)
    print(u'bin_to_float(%r): %f' % (binary, float))
    print(u'')
I used the latin1 codec simply because that's how the byte mappings are originally defined, and it seems to work:
$ python2 foo.py
float_to_bin(0.000000): '0'
bin_to_float('0'): 0.000000
float_to_bin(1.000000): '11111111110000000000000000000000000000000000000000000000000000'
bin_to_float('11111111110000000000000000000000000000000000000000000000000000'): 1.000000
float_to_bin(-14.000000): '1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float('1100000000101100000000000000000000000000000000000000000000000000'): -14.000000
float_to_bin(12.546000): '100000000101001000101111000110101001111110111110011101101100100'
bin_to_float('100000000101001000101111000110101001111110111110011101101100100'): 12.546000
float_to_bin(3.141593): '100000000001001001000011111101110000010110000101011110101111111'
bin_to_float('100000000001001001000011111101110000010110000101011110101111111'): 3.141593
Again, but this time under Python 3.5:
$ python3 foo.py
float_to_bin(0.000000): b'0'
bin_to_float(b'0'): 0.000000
float_to_bin(1.000000): b'11111111110000000000000000000000000000000000000000000000000000'
bin_to_float(b'11111111110000000000000000000000000000000000000000000000000000'): 1.000000
float_to_bin(-14.000000): b'1100000000101100000000000000000000000000000000000000000000000000'
bin_to_float(b'1100000000101100000000000000000000000000000000000000000000000000'): -14.000000
float_to_bin(12.546000): b'100000000101001000101111000110101001111110111110011101101100100'
bin_to_float(b'100000000101001000101111000110101001111110111110011101101100100'): 12.546000
float_to_bin(3.141593): b'100000000001001001000011111101110000010110000101011110101111111'
bin_to_float(b'100000000001001001000011111101110000010110000101011110101111111'): 3.141593
It's a lot more work, but in Python 3 you can see more clearly that the types are handled as proper bytes. I also changed your bytes = [] to a bytearray to express more clearly what you were trying to do.
I had a different approach from @metatoaster's answer. I just modified int_to_bytes to use and return a bytearray:
def int_to_bytes(n, minlen=0):  # helper function
    """ Int/long to byte string. """
    nbits = n.bit_length() + (1 if n < 0 else 0)  # plus one for any sign bit
    nbytes = (nbits+7) // 8  # number of whole bytes
    b = bytearray()
    for _ in range(nbytes):
        b.append(n & 0xff)
        n >>= 8
    if minlen > 0 and len(b) < minlen:  # zero pad?
        b.extend([0] * (minlen-len(b)))
    return bytearray(reversed(b))  # high bytes at beginning
This seems to work without any other modifications under both Python 2.7.11 and Python 3.5.1, since struct.unpack accepts a bytearray in both.
Note that I zero padded with 0 instead of '0'. I didn't do much testing, but surely that's what you meant?
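A quick interactive check that the bytearray version round-trips through struct.unpack (the bit string below is float_to_bin(-14.0)'s output):
>>> import struct
>>> bf = int_to_bytes(int('1100000000101100' + '0' * 48, 2), 8)
>>> struct.unpack('>d', bf)[0]
-14.0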
In Python 3, integers have a to_bytes() method that can perform the conversion in a single call. However, since you asked for a solution that works on Python 2 and 3 unmodified, here's an alternative approach.
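For reference, the Python 3 built-in equivalent of int_to_bytes(n, 8) is a single call:
>>> (1024).to_bytes(8, byteorder='big')
b'\x00\x00\x00\x00\x00\x00\x04\x00'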
If you take a detour via hexadecimal representation, the function int_to_bytes() becomes very simple:
import codecs

def int_to_bytes(n, minlen=0):
    hex_str = format(n, "0{}x".format(2 * minlen))
    return codecs.decode(hex_str, "hex")
You might need some special case handling to deal with the case when the hex string gets an odd number of characters.
Note that I'm not sure this works with all versions of Python 3. I remember that pseudo-encodings weren't supported in some 3.x version, but I don't remember the details. I tested the code with Python 3.5.
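One possible way to handle that odd-length case (a sketch along the same lines, only lightly tested): pad the hex string to an even number of digits before decoding.
import codecs

def int_to_bytes(n, minlen=0):
    hex_str = format(n, "0{}x".format(2 * minlen))
    if len(hex_str) % 2:  # odd number of hex digits: add a leading zero
        hex_str = "0" + hex_str
    return codecs.decode(hex_str, "hex")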

How can I convert this value to hex? (AMF value)

The context: I'm decoding an AMF response from a Flex app with Python. With PyAMF I can decode the whole response, but one value got my attention.
The value \xa2C is transformed to 4419:
#\xa2C -> 4419
#\xddI -> 11977
I know \x marks a hex byte, but I can't work out the function that transforms 4419 back to \xa2C. The 4419 is an integer.
--- Update 1
These original values are not hex, because the value \xa2I transforms to 4425. So what kind of value is \xa2I???
Thanks!
-- Update 2
DJ = 5834
0F = 15
0G = error
1F = 31
a1f = 4294
adI = 5833
adg = 5863
adh = 5864
It's strange that it sometimes accepts values after F and in other situations shows an error. But they are not hex values, that's for sure.
What you're seeing is the string representation of the bytes of an AmfInteger. The first example, \xa2C, consists of two bytes: 0xa2, aka 162, and C, which is the ASCII representation of 67:
>>> ord("\xa2C"[0])
162
>>> ord("\xa2C"[1])
67
To convert this into an AmfInteger, we have to follow the AMF3 specifications, section 1.3.1 (the format of an AmfInteger is the same in AMF0 and AMF3, so it doesn't matter what specification we look at).
In that section, a U29 (variable length unsigned 29-bit integer, which is what AmfIntegers use internally to represent the value) is defined as either a 1-, 2-, 3- or 4-byte sequence. Each byte encodes information about the value itself, as well as whether another byte follows. To figure out whether another byte follows the current one, one just needs to check whether the most significant bit is set:
>>> (162 & 0x80) == 0x80
True
>>> (67 & 0x80) == 0x80
False
So we now confirmed that the byte sequence you see is indeed a full U29: the first byte has its high bit set, to indicate that it's followed by another byte. The second byte has the bit unset, to indicate the end of the sequence. To get the actual value from those bytes, we now only need to combine their values, while masking out the high bit of the first byte:
>>> 162 & 0x7f
34
>>> 34 << 7
4352
>>> 4352 | 67
4419
From this, it should be easy to figure out why the other values give the results you observe.
For completeness' sake, here's also a Python 2 snippet with an example implementation that parses a U29, including all corner cases:
def parse_u29(byte_sequence):
    value = 0
    # Handle the initial bytes
    for byte in byte_sequence[:-1]:
        # Ensure it has its high bit set.
        assert ord(byte) & 0x80
        # Extract the value and add it to the accumulator.
        value <<= 7
        value |= ord(byte) & 0x7F
    # Handle the last byte.
    value <<= 8 if len(byte_sequence) > 3 else 7
    value |= ord(byte_sequence[-1])
    # Handle sign.
    value = (value + 2**28) % 2**29 - 2**28
    return value

print parse_u29("\xa2C"), 4419
print parse_u29(map(chr, [0x88, 0x00])), 1024
print parse_u29(map(chr, [0xFF, 0xFF, 0x7E])), 0x1ffffe
print parse_u29(map(chr, [0x80, 0xC0, 0x80, 0x00])), 0x200000
print parse_u29(map(chr, [0xBF, 0xFF, 0xFF, 0xFE])), 0xffffffe
print parse_u29(map(chr, [0xC0, 0x80, 0x80, 0x01])), -268435455
print parse_u29(map(chr, [0xFF, 0xFF, 0xFF, 0x81])), -127
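And since the original question asked for the opposite direction (turning 4419 back into \xa2C), here is a sketch of an encoder built from the same section of the spec. Like parse_u29 above it is Python 2 style, and negative values are mapped into the 29-bit two's-complement range first:
def encode_u29(value):
    value %= 2**29  # map negatives into the 29-bit two's-complement range
    if value < 0x80:      # 1 byte: 7 bits
        return chr(value)
    if value < 0x4000:    # 2 bytes: 7 + 7 bits
        return chr(0x80 | (value >> 7)) + chr(value & 0x7F)
    if value < 0x200000:  # 3 bytes: 7 + 7 + 7 bits
        return (chr(0x80 | (value >> 14)) +
                chr(0x80 | ((value >> 7) & 0x7F)) +
                chr(value & 0x7F))
    # 4 bytes: 7 + 7 + 7 + 8 bits
    return (chr(0x80 | (value >> 22)) +
            chr(0x80 | ((value >> 15) & 0x7F)) +
            chr(0x80 | ((value >> 8) & 0x7F)) +
            chr(value & 0xFF))

print repr(encode_u29(4419))  # '\xa2C'
print repr(encode_u29(-127))  # '\xff\xff\xff\x81'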

Why is deviceID getting set to an empty string by the time I print it?

This one is driving me crazy:
I defined these two functions to convert between bytes and int, and between bytes and bits:
def bytes2int(bytes):
    return int(bytes.encode('hex'), 16)

def bytes2bits(bytes):
    # returns the bits as an 8-zero-filled string
    return ''.join('{0:08b}'.format(bytes2int(i)) for i in bytes)
This works as expected:
>>> bytes2bits('\x06')
'00000110'
Now I'm reading in a binary file (with a well defined data structure) and printing out some values, which works too. Here is an example:
The piece of code which reads the bytes from the file:
dataItems = f.read(dataSize)
for i in range(10):  # for now, only the first 10 items
    dataItemBits = bytes2bits(dataItems[i*6:(i+1)*6])  # each item is 6 bytes long
    dataType = dataItemBits[:3]
    deviceID = dataItemBits[3:8]
    # and here printing out the strings...
    # ...
    print(" => data type: %8s" % (dataType))
    print(" => device ID: %8s" % (deviceID))
    # ...
with this output:
-----------------------
Item #9: 011000010000000000111110100101011111111111111111
97 bits: 01100001
0 bits: 00000000
62 bits: 00111110
149 bits: 10010101
255 bits: 11111111
255 bits: 11111111
=> data type: 011 // first 3 bits
=> device ID: 00001 // next 5 bits
My problem is that I'm unable to convert the 'bit strings' to decimal numbers; if I try to print this:
print int(deviceID, 2)
it gives me a ValueError:
ValueError: invalid literal for int() with base 2: ''
although deviceID is definitely a string ('00001'; I use it as a string and print it out just before), so it's not ''.
I also checked deviceID and dataType with __doc__ and they are strings.
This one works well in the console:
>>> int('000010', 2)
2
>>> int('000110', 2)
6
What is going on here?
UPDATE:
This is really weird: when I wrap it in a try/except block, it prints out the correct value.
try:
    print int(pmtNumber, 2)  # prints the correct value
except:
    print "ERROR!"  # no exception, so this is never printed
Any ideas?
