I need to create/send binary data in python using a given protocol.
The protocol calls for fixed-width fields, with space padding thrown in.
Using Python's struct.pack, the only approach I can think of is calculating the space padding and adding it in myself.
Is there a better way to achieve this?
thanks
struct has a placeholder, x, for a pad byte you can use (note it packs as a zero byte, b'\x00', not a space):
# pack two 16-bit values plus one pad byte
from struct import pack
packedStrWithOneBytePad = pack("hhx", 1000, 2000)
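If the protocol really wants space padding rather than zero bytes, one option is to space-pad the value yourself and pack it with the fixed-width s format. This is only a sketch; the 10-byte name field and the count value are made up for illustration:

```python
import struct

# Hypothetical record layout: a 10-byte space-padded text field
# followed by a 16-bit count.
name = b"ABC".ljust(10)            # pad with spaces to the fixed width
record = struct.pack("10sh", name, 42)
print(len(record))                 # 12: 10 bytes of text + 2-byte short
```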
On a 64-bit CPU, you can use '0l' (the l format with a repeat count of zero) to force alignment to a long boundary without consuming any data.
Example:
import struct
data = struct.pack('???0l', 1, 2, 3)  # three bools, then pad to long alignment
print(len(data))  # prints 8 on a typical 64-bit platform
What is a pad byte (x) in the python struct type? Is it an unsigned char with value 0, or what exactly does it look like / why is it one of the types that are available in the struct object?
For example, what would be the difference between doing one of the following:
>>> struct.pack('BB', 0, ord('a'))
b'\x00a'
>>> struct.pack('xB', ord('a'))
b'\x00a'
It's useful for matching the required length expected by another system.
For example, in my work there is a server that sends a fixed-size header and expects fixed-size messages. This guarantees that, let's say, the first 20 bytes are a header, with bytes 0-8 being the message size.
It doesn't really matter what the pad is; it's basically junk data. An unsigned char with value 0 is a good choice, though, and it is the one that struct.pack uses.
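To see concretely that x both packs as a zero byte and is skipped on unpacking, a quick check:

```python
import struct

packed = struct.pack('xB', ord('a'))
print(packed)                       # b'\x00a'
# unpacking skips the pad byte entirely; only the B value comes back
print(struct.unpack('xB', packed))  # (97,)
```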
I am trying to read one short and long from a binary file using python struct.
But
print(struct.calcsize("hl"))  # output: 16
which seems wrong; it should have been 2 bytes for the short and 8 bytes for the long. I am not sure whether I am using the struct module the wrong way.
When I print the size of each type separately, it is
print(struct.calcsize("h"))  # output: 2
print(struct.calcsize("l"))  # output: 8
Is there a way to force Python to keep the exact sizes of these datatypes?
Under the default struct alignment rules, 16 is the correct answer. Each field is aligned to match its size, so you end up with two bytes for the short, then six bytes of padding (to reach the next address that is a multiple of eight bytes), then eight bytes for the long.
You can use a byte order prefix (any of them disables padding), but they also disable machine-native sizes (so struct.calcsize("=l") will be a fixed 4 bytes on all systems, and struct.calcsize("=hl") will be 6 bytes on all systems, not 10, even on systems with 8-byte longs).
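A quick demonstration of the trade-off (the native result shown is what a typical 64-bit Linux build prints; yours may differ):

```python
import struct

# Native mode: platform-dependent sizes plus alignment padding.
print(struct.calcsize("hl"))    # e.g. 16 on a 64-bit Linux build
# Standard mode: fixed sizes (h=2, l=4) and no padding, on every platform.
print(struct.calcsize("=hl"))   # always 6
```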
If you want to compute struct sizes for arbitrary structures using machine native types with non-default padding rules, you'll need to go to the ctypes module, define your ctypes.Structure subclass with the desired _pack_ setting, then use ctypes.sizeof to check the size, e.g.:
from ctypes import Structure, c_long, c_short, sizeof
class HL(Structure):
    _pack_ = 1  # disables padding for field alignment
    # defines (unnamed) fields: a short followed by a long
    _fields_ = [("", c_short),
                ("", c_long)]

print(sizeof(HL))
which outputs 10 as desired.
This could be factored out as a utility function if needed (this is a simplified example that doesn't handle all struct format codes, but you can expand if needed):
from ctypes import *

FMT_TO_TYPE = dict(zip("cb?hHiIlLqQnNfd",
                       (c_char, c_byte, c_bool, c_short, c_ushort, c_int, c_uint,
                        c_long, c_ulong, c_longlong, c_ulonglong,
                        c_ssize_t, c_size_t, c_float, c_double)))

def calcsize(fmt, pack=None):
    '''Compute size of a format string with arbitrary padding (defaults to native).'''
    class _(Structure):
        if pack is not None:
            _pack_ = pack
        _fields_ = [("", FMT_TO_TYPE[c]) for c in fmt]
    return sizeof(_)
which, once defined, lets you compute sizes padded or unpadded like so:
>>> calcsize("hl") # Defaults to native "natural" alignment padding
16
>>> calcsize("hl", 1) # pack=1 means no alignment padding between members
10
This is what the doc says:
By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment
Switching from native to standard mode is pretty easy: you just prepend the prefix = to the format characters.
print(struct.calcsize("=hl"))
EDIT
Since some default sizes change between native and standard mode, you have two options:
Keeping the native mode but reordering the format characters, like this: struct.calcsize("lh"). In C, too, the order of the variables inside a struct matters. Here each field is aligned to its own size, so placing the 8-byte long first avoids any interior padding.
Using the format characters of the standard mode, with q for an 8-byte integer: struct.calcsize("=hq")
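For the second option, q is the standard-mode 8-byte integer, so the total comes out as intended on every platform:

```python
import struct

# '=' selects standard sizes and disables padding; q is always 8 bytes.
print(struct.calcsize("=hq"))   # 10 on every platform
```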
I'm trying to read binary files containing a stream of float and int16 values. Those values are stored alternating.
[float][int16][float][int16]... and so on
Now I want to read this data file from a Python program using the struct functions.
For reading a block of say one such float-int16-pairs I assume the format string would be "fh".
The following output makes sense; the total size is 6 bytes:
In [73]: struct.calcsize('fh')
Out[73]: 6
Now I'd like to read larger blocks at once to speed up the program...
In [74]: struct.calcsize('fhfh')
Out[74]: 14
Why is this not returning 12?
Quoting the documentation:
Note By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use standard size and alignment instead of native size and alignment: see Byte Order, Size, and Alignment for details.
https://docs.python.org/2/library/struct.html
If you want calcsize('fhfh') to be exactly twice calcsize('fh'), you'll need to specify a byte order prefix, which disables the alignment padding.
Try '<fhfh' or '>fhfh' instead.
You have to specify the byte order or endianness, since size and alignment are based on that. So if you try this:
>>> struct.calcsize('fhfh')
14
>>> struct.calcsize('>fhfh')
12
The reason is that when no byte order is specified, struct defaults to native mode.
For more details, check here: https://docs.python.org/3.0/library/struct.html#struct.calcsize
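For actually reading the alternating stream, struct.iter_unpack (Python 3.4+) walks a buffer in fixed-size records; a little-endian prefix keeps each record at exactly 6 bytes. A sketch, using an in-memory buffer as a stand-in for the file contents:

```python
import struct

# Stand-in for the bytes read from the binary file.
raw = struct.pack('<fhfh', 1.5, 2, 3.5, 4)
# '<fh' is 6 bytes with no padding, so the stream splits cleanly.
pairs = list(struct.iter_unpack('<fh', raw))
print(pairs)   # [(1.5, 2), (3.5, 4)]
```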
Can someone tell me how to add padding to the data to make it acceptable for the AES-256 encryption algorithm in the pycrypto library (Python)?
Thanks a lot in advance.. :)
Looking at the documentation, it seems that it's up to you, the library user, to pad the data yourself. The documentation states that the block size for AES is always 16 bytes, so you need to pad the data to a multiple of 16 bytes.
How the padding is done depends on the type of the data. For strings the best approach is probably to encode the string to a specific encoding and then take the length of that encoding. That way you're not relying on all characters being represented by an 8-bit codepoint:
plaintext = data.encode('utf-8')
l = len(plaintext)
ciphertext = cipher.encrypt(plaintext + (16 - l % 16) * PADDING_BYTE)
A similar approach will work when your data is an array of bytes.
0 should work fine as the PADDING_BYTE, but you need to take care to remove the padding when you're decrypting the data. It might be worth while including the length of the data in the ciphertext, e.g. prepend the length of the data to the plaintext before encryption, but then you need to jump through some hoops to make sure the padding is generated correctly.
Edit: oh yes, just like the RFC GregS links to mentions, the standard way of handling the length problem is to use the length of the padding as the padding byte. I.e. if you need 6 bytes of padding, the padding byte is 0x06. Note that if you don't need any padding, you still add a whole block of padding bytes (16 bytes of 0x10) so that you can recover the message correctly.
Use a standard padding scheme, such as the scheme outlined in PKCS-5, section 6.1.1 step #4 (replace the 8 in that example with 16 if you are using AES).
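A minimal sketch of that scheme (PKCS#7-style padding for AES's 16-byte block; hypothetical helper names, and not a substitute for a vetted library routine):

```python
BLOCK = 16  # AES block size in bytes

def pkcs7_pad(data):
    # Pad length is always between 1 and 16, never 0: a full
    # block of padding is added when the data is already aligned.
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n

def pkcs7_unpad(data):
    # The last byte tells us how many padding bytes to strip.
    n = data[-1]
    if not 1 <= n <= BLOCK or data[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return data[:-n]
```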
serial.write() method in pyserial seems to only send string data. I have arrays like [0xc0,0x04,0x00] and want to be able to send/receive them via the serial port? Are there any separate methods for raw I/O?
I think I might need to change the arrays to ['\xc0','\x04','\x00'], but the null character might still pose a problem.
An alternative method, without using the array module:
def a2s(arr):
    """Array of integer byte values --> binary string."""
    return ''.join(chr(b) for b in arr)
You need to convert your data to a string
"\xc0\x04\x00"
Null characters are not a problem in Python -- strings are not null-terminated; the zero byte "\x00" behaves just like any other byte.
One way to do this:
>>> import array
>>> array.array('B', [0xc0, 0x04, 0x00]).tostring()
'\xc0\x04\x00'
(In Python 3, tostring() has been renamed tobytes().)
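On Python 3, where pyserial's write() takes bytes, the conversion is even simpler, since bytes() accepts an iterable of integers directly; ser here is assumed to be an already-opened serial.Serial object:

```python
payload = bytes([0xc0, 0x04, 0x00])
print(payload)        # b'\xc0\x04\x00'
# ser.write(payload)  # ser: an already-opened serial.Serial instance
```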
I faced a similar (but arguably worse) issue, having to send control bits through a UART from a Python script to test an embedded device. My data definition was "field1: 8 bits, field2: 3 bits, field3: 7 bits", etc. It turns out you can build a robust and clean interface for this using the bitstring library's BitArray class. Here's a snippet (minus the serial set-up):
from bitstring import BitArray
cmdbuf = BitArray(length=50)  # a 50-bit BitArray, initialised to zero
cmdbuf.overwrite('0xAA', 0)   # init the marker byte at the head
Here's where it gets flexible. The command below replaces the 4 bits at bit position 23 with the 4 bits passed; note that it takes a binary bit value, given in string form. I can set or clear any bits at any location in the buffer this way, without having to worry about stepping on values in adjacent bytes or bits.
cmdbuf.overwrite('0b0110', 23)
# To send on the (previously opened) serial port
ser.write( cmdbuf )