Unspecified byte lengths in Python

Unspecified byte lengths in Python - python

I'm writing a client for a P2P application at the minute and the spec for the protocol says that the header for each packet should have each field with a particular byte length like so:
Version: 1 Byte
Type: 1 Byte
Length: 2 Bytes
And then the data
I've got the way of packing and unpacking the header fields (I think) like this:
packed = struct.pack('cch' , '1' , '1' , 26)
This constructs a header for a packet with a data length of 26, but when it comes to unpacking the data I'm unsure how to go about getting the rest of the data afterwards. To unpack we need to know the size of all the fields, unless I'm missing something? I guess to pack the data I'd use a format indicator 'cch26s' meaning:
1 Byte char
1 Byte char
2 Byte short
26 Byte char array
But how do I unpack the data when I don't know how much data will be included in the packet first?

The way you're describing the protocol, you should unpack the first four bytes first, and extract Length (a 16-bit int). This tells you how many bytes to unpack in a second step.
version, type, length = struct.unpack("cch", packed[:4])
content, = struct.unpack("%ds" % length, packed[4:])
This is if everything checks out. unpack() requires that the packed buffer contain exactly as much data as you unpack. Also, check whether the 4 header bytes are included in the length count.

You can surmise the number of characters to unpack by inspecting len(data).
Here is a helper function which does this for you:
def unpack(fmt, astr):
"""
Return struct.unpack(fmt, astr) with the optional single * in fmt replaced with
the appropriate number, given the length of astr.
"""
# http://stackoverflow.com/a/7867892/190597
try:
return struct.unpack(fmt, astr)
except struct.error:
flen = struct.calcsize(fmt.replace('*', ''))
alen = len(astr)
idx = fmt.find('*')
before_char = fmt[idx-1]
n = (alen-flen)/struct.calcsize(before_char)+1
fmt = ''.join((fmt[:idx-1], str(n), before_char, fmt[idx+1:]))
return struct.unpack(fmt, astr)
You can use it like this:
unpack('cchs*', data)

Related

load hex file with intelhex format in python

I saved one data set(200 double data values) from Keil, it turns to be a .hex file with IntelHex format, I installed IntelHex in python and load it. Now the problem is I do not know how to interpret it, for example, this post
tells you to use dict, but it does not work for hex file including double data. my code:
from intelhex import IntelHex
ih = IntelHex() # create empty object
ih.loadhex('output.hex')
ihdict = ih.todict()
datastr = ""
startAddress = 536871952
while ihdict.get(startAddress) != None:
datastr += str("%0.2X" %ihdict.get(startAddress))
startAddress += 1
the file output.hex:
:020000042000DA
:0802A8003FB7809F5BC03F409F
:1002B000DFB56EEF5AB73F407F717CBF38BE3F401D
:1002C000DFD369EFE9B43F407F717CBF38BE3F4068
:1002D0003F895E9F44AF3F401F706A0F38B53F4073
:1002E0009F20584F10AC3F405F5F72AF2FB93F4027
:1002F000DFB56EEF5AB73F40DF5B7DEFADBE3F40ED
:10030000BFA364DF51B23F40DF62676FB1B33F40CC
:100310001F9E8C0F4FC63F405F0C6B2F86B53F4032
:10032000BF7542DF3AA13F403F4D689F26B43F4032
:100330009F2742CF13A13F40DF2D5BEF96AD3F409B
:100340009F915ACF48AD3F40DF874CEF43A63F40D7
:100350007FD2573FE9AB3F40FD721E7F398F3F4050
:10036000FF892FFFC4973F409D5311CFA9883F407D
:100370001F706A0F38B53F407F78663F3CB33F40FF
:100380001DFD148F7E8A3F401F954F8FCAA73F40A7
:100390005FC04D2FE0A63F401F0D3C8F069E3F40A3
:1003A0007F4A443F25A23F40DFE13DEFF09E3F40C2
:1003B0003F185C1F0CAE3F403F79379FBC9B3F40CE
:1003C000FF2F3EFF179F3F40DFBC586F5EAC3F40A2
:1003D000FD36287F1B943F403F3D419F9EA03F40FC
:1003E000FFFA317FFD983F409FF2354FF99A3F4029
:1003F0007D0511BF82883F40DF703B6FB89D3F4055
:10040000FF1143FF88A13F40DD60146F308A3F40F9
:100410001F49328F24993F407D230CBF11863F40F6
:100420009DBD29CFDE943F40BFED2EDF76973F4044
:10043000DDBA056FDD823F407D58183F2C8C3F4070
:100440007F3333BF99993F40DD9C0A6F4E853F4013
:100450007F3333BF99993F403DBC171FDE8B3F4030
:10046000BDF4185F7A8C3F403D16091F8B843F40D6
:100470003DE1FC9E707E3F40DD0D0DEF86863F40E6
:100480003F1F469F0FA33F403DFFF79EFF7B3F402E
:10049000DD42196FA18C3F40BDC6F65E637B3F40D5
:1004A0009D5AFB4EAD7D3F40FD4BE6FE25733F4020
:1004B0001D12D30E89693F403D8EF51EC77A3F401D
:1004C0003DBC171FDE8B3F409D7FE0CE3F703F401D
:1004D000BD6C055FB6823F40DD4903EFA4813F401C
:1004E000DD14F76E8A7B3F40DDF6FB6EFB7D3F40FF
:1004F000BD20E85E10743F40DD6EE86E37743F400B
:100500001DB1F78ED87B3F407DFCD33EFE693F4056
:100510009D4C274FA6933F407DD7EEBE6B773F4063
:10052000DDAADE6E556F3F401D55B38EAA593F4080
:100530009DF7CCCE7B663F40DDCFC3EEE7613F4009
:10054000BDF2C55EF9623F40BD7AD95EBD6C3F40E9
:100550005D90D82E486C3F40BD7AD95EBD6C3F405F
:100560003D67BD9EB35E3F403D0DCC9E06663F405D
:10057000BD88AD5EC4563F40BDA6A85E53543F4003
:10058000BDD4CA5E6A653F40FD95B0FE4A583F4003
:100590003DA3B39ED1593F405DF89D2EFC4E3F4098
:1005A000FD69E1FEB4703F40FD59BAFE2C5D3F404D
:1005B000BD63C8DE31643F407DB0B63E585B3F400E
:1005C0001DCD9F8EE64F3F405DEAC92EF5643F404A
:1005D000FD4993FEA4493F405D8E852EC7423F40B2
:1005E0007D4D88BE26443F401D3EA20E1F513F4018
:1005F0003D938C9E49463F40FD7E9F7EBF4F3F40CE
:10060000DDB1C8EE58643F40BD7F70DE3F383F40EB
:100610005DBCA72EDE533F409D4197CEA04B3F408F
:100620003D0B799E853C3F403D0B799E853C3F408C
:10063000BD7F70DE3F383F407D6B83BEB5413F409C
:10064000FD32827E19413F409D2A864E15433F4030
:10065000FDEFA1FEF7503F401DC4620E62313F40E6
:100660003D476F9EA3373F401D98930ECC493F40B6
:100670001D53608E29303F403DF4671EFA333F40E2
:100680003D048F1E82473F407D726D3EB9363F402C
:10069000FDF68B7EFB453F40FD5767FEAB333F4089
:1006A0005D7774AE3B3A3F40BD2C695E96343F4067
:1006B0009DF579CEFA3C3F40DD4C476EA6233F4086
:1006C0001D1E540E0F2A3F407DE36FBEF1373F40A1
:1006D000FD7C4C7E3E263F40DD8153EEC0293F40ED
:1006E0009D034ECE01273F409D1375CE893A3F4072
:1006F000DDC4336EE2193F407D444B3EA2253F40AE
:100700005DD84F2EEC273F407D0F3FBE871F3F40F7
:100710007DF143BEF8213F40BDE04B5EF0253F40F8
:100720005D8548AE42243F405DD84F2EEC273F40C8
:100730007D5B5CBE2D2E3F40FD40567E202B3F4012
:10074000BD514EDE28273F40FD2945FE94223F4003
:100750001DD9208E6C103F409DE552CE72293F403E
:100760003D91399EC81C3F407D16293E8B143F4069
:100770007D6930BE34183F409D9935CECC1A3F403C
:100780005DFD34AE7E1A3F409DB730CE5B183F40D2
:100790003DAF349E571A3F405D8C322E46193F4084
:1007A0003D81129E40093F405DC8282E64143F40A1
:1007B0007D34243E1A123F405DC13EAE601F3F4073
:1007C0003D2E0B1E97053F405D6E372EB71B3F40F9
:1007D0003D09269E04133F403DCD2F9EE6173F4026
:1007E000BD6F49DEB7243F403D5C2D1EAE163F4035
:1007F0001DE00A0E70053F403DBD089E5E043F406F
:100800009D890ECE44073F401D86190EC30C3F4004
:10081000BDD70EDE6B073F401DBB258EDD123F406E
:100820001DBB258EDD123F407D06023E03013F4089
:10083000BDD0245E68123F40FDA81B7ED40D3F4012
:100840005D14462E0A233F40FD12347E091A3F40B4
:100850009D180C4E0C063F401D681E0E340F3F4085
:100860001D510D8EA8063F403DD4191EEA0C3F4095
:100870001B94ED0DCAF63E405D40152EA00A3F4088
:100880009D64294EB2143F407DF82D3EFC163F403A
:100890001DD9208E6C103F405DE6232EF3113F40A2
:1008A0007DA526BE52133F40BDB913DEDC093F4093
:1008B0005D2904AE14023F40BBE5E2DD72F13E402B
:1008C0009D180C4E0C063F40BD84075EC2033F409E
:1008D0005D221A2E110D3F40FDC6167E630B3F4070
:0108E0009D7A
:00000001FF

Assuming the data represents a list of 64-bit floating point numbers that you want to decode, the process is to collect the appropriate number of octets and decode them as a double.
Reusing the structure you presented:
from intelhex import IntelHex
import struct
ih = IntelHex()
ih.loadhex('output.hex')
ihdict = ih.todict()
# Read all the data into a long list of int octets
data = []
startAddress = 536871952
while ihdict.get(startAddress) is not None:
data.append(ihdict.get(startAddress))
startAddress += 1
# slice the list into 8-byte bytearrays
bin_arr = [bytearray(data[n:n+8]) for n in range(0, len(data), 8)]
# unpack each bytearray as a double
# Filter for 8 byte arrays because len(data) is not divisible by 8.
# Is the data properly aligned?
doubles_list = [struct.unpack('d', b) for b in bin_arr if len(b) == 8]
print(doubles_list)
It may be worth mentioning that the above assumes a big endian byte ordering. I believe you can use < as part of the format definition to assume a little endian ordering. More information is available in the struct.unpack docs.

Get int value from each two bytes

I am trying to read bytes from an image, and get all the int (16 bit) values from that image.
After I parsed the image header, I got to the pixel values. The values that I get when the pair of bytes are like b"\xd4\x00" is incorrect. In this case it should be 54272, not 3392.
This are parts of the code:
I use a generator to get the bytes:
import itertools
def osddef_generator(in_file):
with open(in_file, mode='rb') as f:
dat = f.read()
for byte in dat:
yield byte
def take_slice(in_generator, size):
return ''.join(str(chr(i)) for i in itertools.islice(in_generator, size))
def take_single_pixel(in_generator):
pix = itertools.islice(in_generator, 2)
hex_list = [hex(i) for i in pix]
hex_str = "".join(hex_list)[2:].replace("0x", '')
intval = int(hex_str, 16)
print("hex_list: ", hex_list)
print("hex_str: ", hex_str)
print("intval: ", intval)
After I get the header correctly using the take_slice method, I get to the part with the pixel values, where I use the take_single_pixel method.
Here, I get the bad results.
This is what I get:
hex_list: ['0xd4', '0x0']
hex_str: d40
intval: 3392
But the actual sequence of bytes that should be interpreted is: \xd4\x00, which equals to 54272, so that my hex_list = ['0xd4', '0x00'] and hex_str = d400.
Something happens when I have a sequence of bytes when the second one is \x00.
Got any ideas? Thanks!

There are much better ways of converting bytes to integters:
int.from_bytes() takes bytes input, and a byte order argument:
>>> int.from_bytes(b"\xd4\x00", 'big')
54272
>>> int.from_bytes(b"\xd4\x00", 'little')
212
The struct.unpack() function lets you convert a whole series of bytes to integers following a pattern:
>>> import struct
>>> struct.unpack('!4H', b'\xd4\x00\xd4\x00\xd4\x00\xd4\x00')
(54272, 54272, 54272, 54272)
The array module lets you read binary data representing homogenous integer data into a memory structure efficiently:
>>> array.array('H', fileobject)
However, array can't be told what byte order to use. You'd have to determine the current architecture byte order and call arr.byteswap() to reverse order if the machine order doesn't match the file order.
When reading image data, it is almost always preferable to use the struct module to do the parsing. You generally then use file.read() calls with specific sizes; if the header consists of 10 bytes, use:
headerinfo = struct.unpack('<expected header pattern for 10 bytes>', f.read(10))
and go from there. For examples, look at the Pillow / PIL image plugins source code; here is how the Blizzard Mipmap image format header is read:
def _read_blp_header(self):
self._blp_compression, = struct.unpack("<i", self.fp.read(4))
self._blp_encoding, = struct.unpack("<b", self.fp.read(1))
self._blp_alpha_depth, = struct.unpack("<b", self.fp.read(1))
self._blp_alpha_encoding, = struct.unpack("<b", self.fp.read(1))
self._blp_mips, = struct.unpack("<b", self.fp.read(1))
self._size = struct.unpack("<II", self.fp.read(8))
if self.magic == b"BLP1":
# Only present for BLP1
self._blp_encoding, = struct.unpack("<i", self.fp.read(4))
self._blp_subtype, = struct.unpack("<i", self.fp.read(4))
self._blp_offsets = struct.unpack("<16I", self.fp.read(16 * 4))
self._blp_lengths = struct.unpack("<16I", self.fp.read(16 * 4))
Because struct.unpack() always returns tuples, you can assign individual elements in a tuple to name1, name2, ... names on the left-hand size, including single_name, = assignments to extract a single result.
The separate set of read calls above could also be compressed into fewer calls:
comp, enc, adepth, aenc, mips, *size = struct.unpack("<i4b2I", self.fp.read(16))
if self.magic == b"BLP1":
# Only present for BLP1
enc, subtype = struct.unpack("<2i", self.fp.read(8))
followed by specific attribute assignments.

Python struct as networking data packets (uknown byte sequence)

I am working on a server engine in Python, for my game made in GameMaker Studio 2. I'm currently having some issues with making and sending a packet.
I've successfully managed to establish a connection and send the first packet, but I can't find a solution for sending data in a sequence of which if the first byte in the packed struct is equal to a value, then unpack other data into a given sequence.
Example:
types = 'hhh' #(message_id, x, y) example
message_id = 0
x = 200
y = 200
buffer = pack(types, 0,x, y)
On the server side:
data = conn.recv(BUFFER_SIZE)
mid = unpack('h', data)[0]
if not data: break
if mid == 0:
sequnce = 'hhh'
x = unpack(sequnce, data)[1]
y = unpack(sequnce, data)[2]

It looks like your subsequent decoding is going to vary based on the message ID?
If so, you will likely want to use unpack_from which allows you to pull only the first member from the data (as written now, your initial unpack call will generate an exception because the buffer you're handing it is not the right size). You can then have code that varies the unpacking format string based on the message ID. That code could look something like this:
from struct import pack, unpack, unpack_from
while True:
data = conn.recv(BUFFER_SIZE)
# End of file, bail out of loop
if not data: break
mid = unpack_from('!h', data)[0]
if mid == 0:
# Message ID 0
types = '!hhh'
_, x, y = unpack(types, data)
# Process message type 0
...
elif mid == 1:
types = '!hIIq'
_, v, w, z = unpack(types, data)
# Process message type 1
...
elif mid == 2:
...
Note that we're unpacking the message ID again in each case along with the ID-specific parameters. You could avoid that if you like by using the optional offset argument to unpack_from:
x, y = unpack_from('!hh', data, offset=2)
One other note of explanation: If you are sending messages between two different machines, you should consider the "endianness" (byte ordering). Not all machines are "little-endian" like x86. Accordingly it's conventional to send integers and other structured numerics in a certain defined byte order - traditionally that has been "network byte order" (which is big-endian) but either is okay as long as you're consistent. You can easily do that by prepending each format string with '!' or '<' as shown above (you'll need to do that for every format string on both sides).
Finally, the above code probably works fine for a simple "toy" application but as your program increases in scope and complexity, you should be aware that there is no guarantee that your single recv call actually receives all the bytes that were sent and no other bytes (such as bytes from a subsequently sent buffer). In other words, it's often necessary to add a buffering layer, or otherwise ensure that you have received and are operating on exactly the number of bytes you intended.

Could you unpack whole data to list, and then check its elements in the loop? What is the reason to unpack it 3 times? I guess, you could unpack it once, and then work with that list - check its length first, if not empty -> check first element -> if equal to special one, continue on list parsing. Did you try like that?

How to convert b'\xc8\x00' to float?

I actually get a value (b'\xc8\x00') from a temperature sensor. I want to convert it to a float value. Is it right, that I need to decode it?
Here is my function:
def ToFloat(data):
s = data.decode('utf-8')
print(s)
return s
But when I try to compile it, I get the error:
'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte

You seem to be having packed bytes not unicode objects. Use struct.unpack:
In [3]: import struct
In [4]: struct.unpack('h', b'\xc8\x00')[0]
Out[4]: 200
Format h specifies a short value (2 bytes). If your temperature values will always be positive, you can use H for unsigned short:
import struct
def to_float(data):
return float(struct.unpack('H', data)[0])

Notice that ToFloat() is a bit irritating as it returns a float but interpretes the data as integer values. If the bytes are representing a float, it would be necessary to know in which format the float is packed into these two bytes (usually float takes more than two bytes).
data = b'\xc8\x00'
def ToFloat(data):
byte0 = int(data[0])
print(byte0)
byte1 = int(data[1])
print(byte1)
number = byte0 + 256*byte1
print(number)
return float(number)
returns: 200.0 what seems to be reasonable. If not, just see what the data mean and process accordingly.

How does "\x"+* work?

I'm trying to make a filesharing program, so I open the files in readbinary, and read it, establish a connection, and I try to send byte for byte.
How can I send b"\x" + (encoded bytes of the int from dataread[i])?
I always gives me an error, also, if it won't work, how can I read exactly a byte? So that I don't get an int? (like dataread[0], if the value is "\x01", I get 1).
My code:
for g in range(len(datar)):
esc = str(datar[g])
if len(esc) == 1:
esc = "0"+esc
esc = "\x"+bytes(esc,"utf8")
c.send(esc)
c.recv(500)
print(g,"Bytes von",len(datar),"gesendet")

The '\xhh' notation only works in string or byte literals. If you have an integer, just pass this to a bytes() object in a list:
bytes(dataread) # if dataread is a list of integers
or
bytes([dataread]) # if dataread is a single integer
bytes objects are sequences of integer values, each limited to the range 0-255.
To send individual bytes from datar, that translates to:
for byte in datar:
c.send(bytes([esc]))
c.recv(500)
print(g,"Bytes von",len(datar),"gesendet")

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Unspecified byte lengths in Python - python

Related

load hex file with intelhex format in python

Get int value from each two bytes

Python struct as networking data packets (uknown byte sequence)

How to convert b'\xc8\x00' to float?

How does "\x"+* work?

Categories

Resources