I am trying to read one short and one long from a binary file using Python's struct module.
But
print(struct.calcsize("hl")) # o/p 16
looks wrong to me: it should be 2 bytes for the short plus 8 bytes for the long, 10 in total. I am not sure whether I am using the struct module the wrong way.
When I print the size of each one separately, I get
print(struct.calcsize("h")) # o/p 2
print(struct.calcsize("l")) # o/p 8
Is there a way to force Python to keep the exact sizes of the data types?
Under struct's default (native) alignment rules, 16 is the correct answer. Each field is aligned to a multiple of its own size, so you end up with two bytes for the short, then six bytes of padding (to reach the next address aligned to a multiple of eight bytes), then eight bytes for the long.
You can use a byte order prefix (any of them disables padding), but the prefixes also disable machine-native sizes (so struct.calcsize("=l") will be a fixed 4 bytes on all systems, and struct.calcsize("=hl") will be 6 bytes on all systems, not 10, even on systems with 8-byte longs).
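To see both effects side by side, you can compare the native and the prefixed sizes. This is only a small sketch: the native result depends on the platform, so it is printed rather than assumed, while the standard sizes are fixed by the struct documentation.

```python
import struct

# Native mode (no prefix, same as "@"): platform sizes plus alignment padding.
print("native hl:", struct.calcsize("hl"))

# Standard mode: fixed sizes (h=2, l=4, q=8) and no padding on any platform.
std_hl = struct.calcsize("=hl")  # 2 + 4
std_hq = struct.calcsize("=hq")  # 2 + 8
print("standard =hl:", std_hl)
print("standard =hq:", std_hq)
```

If the data really contains an 8-byte integer, "q" is the standard-mode code to reach for, since "l" is always 4 bytes once a prefix is used.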
If you want to compute struct sizes for arbitrary structures using machine native types with non-default padding rules, you'll need to go to the ctypes module, define your ctypes.Structure subclass with the desired _pack_ setting, then use ctypes.sizeof to check the size, e.g.:
from ctypes import Structure, c_long, c_short, sizeof
class HL(Structure):
    _pack_ = 1 # Disables padding for field alignment
    # Defines (unnamed) fields, a short followed by long
    _fields_ = [("", c_short),
                ("", c_long)]
print(sizeof(HL))
which outputs 10 as desired.
This could be factored out as a utility function if needed (this is a simplified example that doesn't handle all struct format codes, but you can expand if needed):
from ctypes import *
FMT_TO_TYPE = dict(zip("cb?hHiIlLqQnNfd",
                       (c_char, c_byte, c_bool, c_short, c_ushort, c_int, c_uint,
                        c_long, c_ulong, c_longlong, c_ulonglong,
                        c_ssize_t, c_size_t, c_float, c_double)))
def calcsize(fmt, pack=None):
    '''Compute size of a format string with arbitrary padding (defaults to native)'''
    class _(Structure):
        if pack is not None:
            _pack_ = pack
        _fields_ = [("", FMT_TO_TYPE[c]) for c in fmt]
    return sizeof(_)
which, once defined, lets you compute sizes padded or unpadded like so:
>>> calcsize("hl") # Defaults to native "natural" alignment padding
16
>>> calcsize("hl", 1) # pack=1 means no alignment padding between members
10
This is what the doc says:
By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment
Changing it from native to standard is pretty easy: you just prepend the prefix = to the format characters.
print(struct.calcsize("=hl"))
EDIT
Since some default sizes change when going from native to standard mode (here l shrinks from 8 bytes to 4), you have two options:
Keep native mode but swap the format characters: struct.calcsize("lh"). As in C, the order of the members inside a struct matters. Here the long must start at an offset that is a multiple of 8 bytes, while the short only needs a multiple of 2, so putting the long first leaves no gap, and struct adds no padding after the last field.
Use the format characters of standard mode: struct.calcsize("=hq").
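For the original goal of reading a short and an 8-byte integer from a binary file, the standard-mode option could look like the sketch below. The little-endian layout and the in-memory stand-in for the file are assumptions for illustration; use whatever byte order the file was actually written with.

```python
import io
import struct

# Assumed layout: little-endian, a 2-byte short followed by an 8-byte integer.
RECORD = struct.Struct("<hq")

# Stand-in for a real file opened with open(path, "rb").
fake_file = io.BytesIO(RECORD.pack(-2, 1234567890123))

# Read exactly one record and decode it.
short_value, long_value = RECORD.unpack(fake_file.read(RECORD.size))
print(RECORD.size, short_value, long_value)
```

Because the prefix fixes both the sizes and the padding, the record is 10 bytes on every platform.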
I am running this code in Python:
import struct
res = struct.pack('hhl', 1, 2, 3)
print(res)
and I get the following output:
b'\x01\x00\x02\x00\x00\x00\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00'
but I don't understand why this is the output. After all, the format h means 2 bytes and the format l means 4 bytes, so why do I get this output?
From the struct doc,
To handle platform-independent data formats or omit implicit pad
bytes, use standard size and alignment instead of native size and
alignment
By default (native mode), pad bytes may be inserted after the h fields so that the l is aligned; how many depends on the platform you are running on. Select "big-endian" (>), "little-endian" (<) or the alternate native style (=) to remove padding. For example,
>>> struct.Struct('hhl').size
16
>>> struct.Struct('<hhl').size
8
>>> struct.Struct('>hhl').size
8
>>> struct.Struct('=hhl').size
8
You would choose one depending on what layout you are trying to match. If it's a C structure for a natively compiled app, it depends on the native memory layout (e.g., 16-bit architectures) and whether the compiler packs or pads. If it's a network protocol, use big- or little-endian, depending on its specification.
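To make the difference between the prefixes visible, you can pack the same values both ways and look at the bytes in hex. A small sketch using only the format from the question:

```python
import struct

values = (1, 2, 3)

# Standard sizes: h is 2 bytes, l is 4 bytes, no padding in between.
little = struct.pack("<hhl", *values)
big = struct.pack(">hhl", *values)

print(little.hex())  # 0100020003000000
print(big.hex())     # 0001000200000003
```

Both results are 8 bytes long; only the byte order inside each field differs.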
In Ruby, I could easily pack an array representing some sequence into a binary string:
# for int
# "S!*" directive means native-size unsigned short (16-bit here) in native endianness, repeated to the end
# each number is represented by two bytes: "\x01\x00" and "\x02\x00"
# here the native endianness is "little endian", so the low byte comes first:
# "\x01\x00" is 0x0001 and "\x02\x00" is 0x0002
"\x01\x00\x02\x00".unpack("S!*")
# [1, 2]
# for hex
# "H*" treats the whole string as a stream of hex digits, high nibble first
["037fea0651b358c361de"].pack("H*")
# "\x03\x7F\xEA\x06Q\xB3X\xC3a\xDE"
API doc for pack and unpack.
I couldn't find a uniform, equivalent way of transforming a sequence to bytes (or vice versa) in Python.
While struct provides functions for packing into bytes objects, its format strings have no option for a hex stream.
EDIT: What I really want is something as versatile as Ruby's arr.pack and str.unpack, which support multiple formats and endianness control.
For a hex string that encodes UTF-8 text it would be:
from binascii import unhexlify
strg = "464F4F"
unhexlify(strg).decode() # FOO (str)
If your content is just binary:
strg = "037fea0651b358c361de"
unhexlify(strg) # b'\x03\x7f\xea\x06Q\xb3X\xc3a\xde' (bytes)
also bytes.fromhex (as in Davis Herring's answer) may be worth checking out.
struct does only fixed-width encodings that correspond to a memory dump of something like a C struct. You want bytes.fromhex or binascii.unhexlify, depending on the source type (which is never a list).
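As a rough illustration, the two Ruby snippets from the question could be mirrored like this. The explicit "<" prefix is an assumption that the 16-bit values are little-endian, as they were on the asker's machine.

```python
import struct

# Ruby: ["037fea0651b358c361de"].pack("H*")
raw = bytes.fromhex("037fea0651b358c361de")
print(raw)        # b'\x03\x7f\xea\x06Q\xb3X\xc3a\xde'
print(raw.hex())  # and back again: 037fea0651b358c361de

# Ruby: "\x01\x00\x02\x00".unpack("S!*") on a little-endian machine
data = b"\x01\x00\x02\x00"
count = len(data) // 2  # number of 16-bit values in the buffer
numbers = struct.unpack("<%dH" % count, data)
print(numbers)    # (1, 2)
```

The repeat count is computed from the buffer length because struct has no direct equivalent of Ruby's "*".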
After any such conversion, you can use struct.unpack on a byte string whose length exactly matches the format string; each field is decoded into an element of the returned tuple. If the buffer holds any number of “records” of the same format, struct.iter_unpack yields one tuple per record. The format string supports the usual integer sizes and endianness choices; it is of course possible to construct a format dynamically to do things like read a matrix whose dimensions are chosen at runtime:
mat=list(struct.iter_unpack("%dd"%cols,buf)) # rows determined from len(buf)
It’s also possible to construct a lower-memory array if the element type is primitive; then you can follow up with byteswap as Alec A mentioned. NumPy offers similar facilities.
Try memoryview.cast, which lets you reinterpret an array or bytes object as items of another native format without copying. Note that it does not swap byte order by itself.
Storing values as arrays makes things easier, as you can use the byteswap function.
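A minimal sketch of the array approach, assuming the data holds little-endian unsigned 16-bit values and that the "H" type code is 2 bytes wide on your platform (it is on the common ones, but array only guarantees a minimum size):

```python
import sys
from array import array

raw = b"\x01\x00\x02\x00"  # two unsigned 16-bit values, stored little-endian

values = array("H")        # "H" is unsigned short; assumed 2 bytes wide here
values.frombytes(raw)      # interprets the bytes in native byte order
if sys.byteorder == "big":
    values.byteswap()      # data is little-endian, so swap on big-endian hosts
print(values.tolist())     # [1, 2]
```

Unlike struct, array always works in native byte order, which is why the explicit byteswap check is needed.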
I'm trying to read binary files containing a stream of float and int16 values. Those values are stored alternating.
[float][int16][float][int16]... and so on
Now I want to read this data file from a Python program using the struct functions.
To read a block of, say, one such float-int16 pair, I assume the format string would be "fh".
The following output makes sense; the total size is 6 bytes:
In [73]: struct.calcsize('fh')
Out[73]: 6
Now I'd like to read larger blocks at once to speed up the program...
In [74]: struct.calcsize('fhfh')
Out[74]: 14
Why is this not returning 12?
Quoting the documentation:
Note By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment: see Byte Order, Size, and Alignment for details.
https://docs.python.org/2/library/struct.html
If you want calcsize('fhfh') to be exactly twice calcsize('fh'), then you'll need to specify a byte order prefix, which also switches off alignment padding.
Try '<fhfh' or '>fhfh', instead.
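For reading many pairs at once, a possible sketch is to let struct.iter_unpack walk over a larger block. The little-endian prefix and the sample values are assumptions; pick the prefix that matches how the file was written.

```python
import struct

# One float followed by one int16, little-endian, no padding: 6 bytes.
PAIR = struct.Struct("<fh")

# Stand-in for a block read from the file with f.read(n * PAIR.size).
stream = PAIR.pack(1.5, 2) + PAIR.pack(2.5, -3) + PAIR.pack(0.25, 7)

# iter_unpack needs len(stream) to be a multiple of PAIR.size.
records = list(PAIR.iter_unpack(stream))
print(PAIR.size, records)
```

This avoids building a long 'fhfh...' format string by hand, and the block size stays an exact multiple of 6.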
You have to specify the byte order (endianness), because size and alignment depend on it. So if you try this:
>>> struct.calcsize('fhfh')
14
>>> struct.calcsize('>fhfh')
12
The reason is that struct defaults to native byte order, size and alignment when no prefix is given, and native alignment inserts pad bytes before the second float.
for more details check here: https://docs.python.org/3.0/library/struct.html#struct.calcsize
I'm having trouble using the struct.pack() for packing an integer.
With
struct.pack("BIB", 1, 0x1234, 0)
I'm expecting
'\x01\x00\x00\x034\x12\x00'
but instead I got
'\x01\x00\x00\x004\x12\x00\x00\x00'
I'm probably missing something here. Please help.
'\x01\x00\x00\x004\x12\x00\x00\x00'
^ this '4' is not part of a hex escape
is actually the same as:
'\x01\x00\x00\x00\x34\x12\x00\x00\x00'
Because the ASCII code for "4" is 0x34.
Because you used the default (native) format, Python used native alignment for the data, so the second field was aligned to offset 4 and 3 zeroes were added before it.
To get a result more like what you wanted, use the format >BIB or <BIB (for big-endian or little-endian respectively). This gives you '\x01\x00\x00\x12\x34\x00' or '\x01\x34\x12\x00\x00\x00'. Neither of those is exactly what you specified, because the example you gave was not a proper big-endian or little-endian representation of 0x1234.
See also: section Byte Order, Size, and Alignment in the documentation.
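Printing the packed bytes as hex sidesteps the confusion with printable characters such as '4'. A small sketch using the values from the question:

```python
import struct

big = struct.pack(">BIB", 1, 0x1234, 0)
little = struct.pack("<BIB", 1, 0x1234, 0)

# One byte, then a 4-byte unsigned int, then one byte: 6 bytes, no padding.
print(big.hex())     # 010000123400
print(little.hex())  # 013412000000
```

In both cases the I field occupies exactly four bytes and follows the first byte directly.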
From the docs
Note By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types
involved; similarly, alignment is taken into account when unpacking.
This behavior is chosen so that the bytes of a packed struct
correspond exactly to the layout in memory of the corresponding C
struct. To handle platform-independent data formats or omit implicit
pad bytes, use standard size and alignment instead of native size and
alignment: see Byte Order, Size, and Alignment for details.
You can get your desired result by forcing the byte order. (chr(0x34) == '4')
>>> struct.pack(">BIB", 1, 0x1234, 0)
'\x01\x00\x00\x124\x00'
I need to create/send binary data in python using a given protocol.
The protocol calls for fixed-width fields, with space padding thrown in.
Using Python's struct.pack, the only thing I can think of is calculating the space padding and adding it in myself.
Is there a better way to achieve this?
thanks
struct has a placeholder (x) for a padding byte you can use:
# pack 2 16 bit values plus one pad byte
from struct import pack
packedStrWithOneBytePad = pack("hhx", 1000, 2000)
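Note that x inserts a zero byte, and the s format also fills short strings with zero bytes, not spaces. If the protocol really wants space characters, one option is to pad the field yourself with ljust before packing. The record layout below (a 16-bit id followed by a 10-byte name field) and the helper name pack_record are made up for illustration only.

```python
import struct

def pack_record(record_id, name, width=10):
    """Pack a hypothetical record: little-endian uint16 id + space-padded name."""
    field = name.encode("ascii")
    if len(field) > width:
        raise ValueError("name does not fit into the fixed-width field")
    # ljust pads with spaces; "s" alone would pad with zero bytes instead.
    return struct.pack("<H%ds" % width, record_id, field.ljust(width, b" "))

packed = pack_record(7, "abc")
print(packed)       # b'\x07\x00abc       '
print(len(packed))  # 12
```

The explicit length check is there because struct would otherwise silently truncate a string that is too long for the field.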
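On a 64-bit platform, you can put '0l' (the long format code with a repeat count of zero) at the end of the format to pad the result out to the alignment of a long.
Example:
data = struct.pack('???0l',1,2,3)
print(len(data)) # prints 8 where a native long is 8 bytes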