unpacking a struct - python

I have a list of arithmetic expressions that I encode and pack into a struct, as follows:
import struct
expressions = ["1+12", "16-5", "1-3+4", "12-5", "16+4"] # 5 expressions; but could be any number
s = struct.pack('h', len(expressions)) # Prepend the number of expressions to the struct
for e in expressions:
    s += struct.pack(f'h{len(e)}s', len(e), e.encode('utf-8'))
Later, I want to undo this process and unpack them two bytes at a time. But I can't seem to get this right. Here's my unpacking and decoding code:
d = []
for n in range(0, len(expressions), 2): # Iterate over the struct
    s = struct.unpack('h', expressions[n])
    s = s.decode('utf-8')
    d.append(s)
Python is giving me the following error at the unpack call:
TypeError: a bytes-like object is required, not 'int'
I don't understand the reason for the error. I thought that I'd already encoded the elements into Unicode, so they should be bytes-like. My goal is to build up the original list of expressions again. How can I do this?
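One plausible reading, assuming `expressions` in the unpacking loop refers to the packed bytes object: indexing a bytes object in Python 3 yields an int, which `struct.unpack` rejects. A minimal sketch of one way to reverse the packing is to walk the buffer with an explicit offset and `unpack_from`. The helper names `pack_expressions` and `unpack_expressions` are hypothetical, and the sketch prefixes each string with its encoded byte length (identical to `len(e)` for ASCII expressions like the ones above).

```python
import struct

def pack_expressions(expressions):
    """Pack a count followed by length-prefixed UTF-8 strings (as in the question)."""
    packed = struct.pack('h', len(expressions))
    for e in expressions:
        raw = e.encode('utf-8')
        packed += struct.pack(f'h{len(raw)}s', len(raw), raw)
    return packed

def unpack_expressions(packed):
    """Reverse pack_expressions by walking the buffer with an explicit offset."""
    header = struct.Struct('h')
    (count,) = header.unpack_from(packed, 0)
    offset = header.size
    result = []
    for _ in range(count):
        # Read the 2-byte length prefix, then slice out that many bytes.
        (length,) = header.unpack_from(packed, offset)
        offset += header.size
        result.append(packed[offset:offset + length].decode('utf-8'))
        offset += length
    return result

expressions = ["1+12", "16-5", "1-3+4", "12-5", "16+4"]
restored = unpack_expressions(pack_expressions(expressions))
print(restored)
```

The offset advances by the header size and then by each string's length, so the records do not need to have a fixed size.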

Related

How can I get a variable containing a byte sequence of several fields (unicode character + 32 bits integer + unicode string)

I want to get a variable containing a byte sequence of several fields (they will later be transmitted via socket).
The byte sequence will include the following three fields:
Character SOH (ANSI code 0x01)
32-bit integer
Unicode string 'Straße'
I have tried:
# -*- coding: UTF-8 -*-
message = b''
soh = u'\0001'
a = 1143
c = u'Straße'
message = message + soh + a + c
print(type(message))
But I get:
TypeError: can't concat str to bytes
I am also not sure that soh = u'\0001' is the right way to define the SOH character.
I am using Python 3.7
Binary data for transfer over a socket connection is best combined using the struct module.
The struct module provides a pack function to create the data structure. You need to provide a format string that describes the data being packed. It's worth studying the format string documentation to ensure that the data is unpacked as expected on the receiving side.
>>> soh = b'\x01'
>>> a = 1143
>>> c = u'Straße'
>>> import struct
>>> pattern = 'ci7s' # 1 byte, 1 int, 1 bytestring of length 7
>>> packed = struct.pack(pattern, soh, a, c.encode('utf-8'))
>>> packed
b'\x01\x00\x00\x00w\x04\x00\x00Stra\xc3\x9fe'
The module provides an unpack function to reverse the packing:
>>> soh_, a_, c_ = struct.unpack(pattern, packed)
>>> soh_
b'\x01'
>>> a
1143
>>> a_
1143
>>> c_.decode('utf-8')
'Straße'
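The three zero bytes after `\x01` in the packed output above are alignment padding, which `struct` inserts in its default native mode. As a hedged sketch (not part of the original answer), prefixing the format with a byte-order character such as `!` switches to standard sizes with no padding, which is usually what you want for data sent over a socket:

```python
import struct

text = 'Straße'.encode('utf-8')  # 7 bytes in UTF-8, because 'ß' takes two bytes

# '!' = network (big-endian) byte order, standard sizes, no alignment padding.
pattern = '!ci7s'
packed = struct.pack(pattern, b'\x01', 1143, text)
print(len(packed), packed)

soh_, a_, c_ = struct.unpack(pattern, packed)
print(soh_, a_, c_.decode('utf-8'))
```

With this format the message is 12 bytes long (1 + 4 + 7) regardless of the platform that packs it.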
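The error occurs because message is bytes, while soh and c are str and a is an int; none of these can be concatenated to bytes directly.
What you should do is convert soh, a and c to bytes before concatenating them to message: .encode() turns a str into bytes, but an int has no .encode() method, so a needs something like int.to_bytes() or struct.pack() instead.
(In Python 3.x the unicode type doesn't exist anymore (it's the same as str), so you have to use either str or bytes.)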
Just in case it is helpful for anyone else, I finally did this:
message = soh.encode('utf-8') + a.to_bytes(4, 'big') + c.encode('utf-8')
struct.pack is a really interesting solution, but I did not manage to force the integer to be 32 bits, and in my particular format the field structure is not known in advance (hence a mechanism to share it between client and server would be needed anyway).
I therefore mixed .to_bytes with .encode for unicode strings.
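For comparison, here is a minimal sketch of how `struct` can force a 32-bit integer and cope with a string whose length is not known in advance. The length prefix and the helper names `build_message` and `parse_message` are assumptions added for illustration; they are not part of the format described in the question. The sketch also checks a detail worth knowing: `'\0001'` is the octal escape `'\000'` followed by the digit `'1'`, so it is two characters, whereas SOH is `'\x01'`.

```python
import struct

a = 1143
# '>i' = big-endian, standard size: always exactly 4 bytes, no padding.
assert struct.pack('>i', a) == a.to_bytes(4, 'big')

# '\0001' is NUL followed by '1' (two characters); SOH is '\x01'.
assert '\0001' == '\x00' + '1'

def build_message(number, text):
    """Hypothetical helper: SOH + 32-bit int + length-prefixed UTF-8 text."""
    raw = text.encode('utf-8')
    return struct.pack(f'>ciI{len(raw)}s', b'\x01', number, len(raw), raw)

def parse_message(message):
    """Hypothetical helper: reverse build_message."""
    soh, number, length = struct.unpack_from('>ciI', message, 0)
    start = struct.calcsize('>ciI')
    return soh, number, message[start:start + length].decode('utf-8')

message = build_message(a, 'Straße')
print(parse_message(message))
```

Because the string length travels inside the message, the receiver does not need to know it beforehand.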

How to pack a character and a number correctly?

I'm learning about client-server communication in Python, and I want to send some packed structures. I want to pack a mathematical sign and a number. I tried like this:
idx = 50
value1 = "<"
value2 = idx
packer = struct.Struct('1s I')
packed_data = packer.pack(*value1, *value2)
But I got the error:
packed_data = packer.pack(*value1, *value2)
TypeError: 'int' object is not iterable
or this error:
packed_data = packer.pack(*value1, *value2)
struct.error: argument for 's' must be a bytes object
If I try like this:
value2 = [idx]
I don't know how to do this correctly.
The first problem is that you are unnecessarily trying to (sequence-)unpack your arguments. The Struct format expects a bytes and an int, and you (almost) already have them.
The second problem is that "<" is a Unicode string, and pack expects bytes instead. You need to properly encode the string first.
packed_data = packer.pack(value1.encode('utf-8'), value2)
The particular encoding you use doesn't matter, as long as you use the same one to unpack the data.
Note that if you did have a Unicode character that couldn't be encoded in one byte, your string format would be wrong. The struct module doesn't handle variable-length strings by itself, so it would probably be simpler to just encode the int by itself and concatenate that with your encoded string.
packed_data = value1.encode('utf-8') + struct.pack("I", value2)
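A minimal round-trip sketch of the fix described above, under the assumption that the sign always fits in one byte. It uses the `!` prefix (network byte order, no padding), which is a choice made here for portability between client and server rather than something the original code specified:

```python
import struct

idx = 50
value1 = "<"
value2 = idx

# 'c' = a single byte, 'I' = 4-byte unsigned int; '!' = network byte order.
packer = struct.Struct('!cI')

packed_data = packer.pack(value1.encode('ascii'), value2)
sign, number = packer.unpack(packed_data)
print(len(packed_data), sign.decode('ascii'), number)
```

The receiving side must use the same format string and the same encoding to recover the original values.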

Divide an extremely long byte stream into smaller bytes

I need to unpack an extremely long byte stream (from USB) into 4-byte values.
I have it working, but I feel there's a better way to do this.
Currently I have:
l=[]
for i in range(int(len(mybytes)/4)):
    l.append(struct.unpack_from('>i',mybytes,i*4))
This feels very resource-expensive, and I'm doing this for 16k bytes A LOT.
I also feel like this has probably been asked before; I just don't really know how to word it for searching.
You could also try the array module which has the ability to load directly from binary data:
import array
arr = array.array("I",mybytes) # "I" stands for unsigned integer
arr.byteswap() # only if you're reading endian coding different from your platform
l = list(arr)
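A hedged, self-contained version of the array approach, using the signed typecode `'i'` to match the `'>i'` format in the question. The item size of `'i'` is platform-dependent, so the sketch checks it explicitly and swaps bytes only on little-endian machines, since the data is assumed to be big-endian:

```python
import array
import sys

mybytes = bytes([1, 2, 3, 4, 5, 6, 7, 8])

arr = array.array('i')          # signed int; size depends on the platform
assert arr.itemsize == 4, "this sketch assumes 4-byte ints"
arr.frombytes(mybytes)          # length must be a multiple of itemsize

# The data is big-endian; swap only if the machine is little-endian.
if sys.byteorder == 'little':
    arr.byteswap()

values = arr.tolist()
print(values)
```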
You can specify a repeat count for the integers to unpack, and build the format string from the length of the data:
>>> import struct
>>> mybytes = bytes([1,2,3,4,5,6,7,8])
>>> struct.unpack('>2i',mybytes)
(16909060, 84281096)
>>> n = len(mybytes) // 4
>>> struct.unpack(f'>{n}i',mybytes) # Python 3.6+ f-strings
(16909060, 84281096)
>>> struct.unpack('>{}i'.format(n),mybytes) # Older Pythons
(16909060, 84281096)
>>> [hex(i) for i in _]
['0x1020304', '0x5060708']
Wrap it in a BytesIO object, then use iter to call its read method until it returns an empty bytes value.
>>> import io, struct
>>> bio = io.BytesIO(b'abcdefgh')
>>> int_fmt = struct.Struct(">i")
>>> list(map(int_fmt.unpack, iter(lambda: bio.read(4), b'')))
[(1633837924,), (1701209960,)]
You can tweak this to extract the single int value from each tuple, or switch to the from_bytes class method.
>>> bio = io.BytesIO(b'abcdefgh')
>>> list(map(lambda i: int.from_bytes(i, 'big'), iter(lambda: bio.read(4), b'')))
[1633837924, 1701209960]
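Another option, not mentioned in the answers above, is `struct.iter_unpack` (available since Python 3.4), which lazily yields one tuple per fixed-size record. A minimal sketch, assuming the buffer length is a multiple of 4 (otherwise `iter_unpack` raises `struct.error`):

```python
import struct

mybytes = bytes([1, 2, 3, 4, 5, 6, 7, 8])

# Each iteration yields a 1-tuple; unpack it to get the bare integer.
values = [value for (value,) in struct.iter_unpack('>i', mybytes)]
print(values)
```

This avoids building a format string from the length and avoids slicing the buffer by hand.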

Python 3 compatibility issue

Description of problem
I have to migrate some code to Python 3. The compilation terminated with success. But I have a problem on the runtime:
static PyObject* Parser_read(PyObject * const self, PyObject * unused0, PyObject * unused1) {
    //Retrieve bytes from the underlying data stream.
    //In this case, an iterator
    PyObject * const i = PyIter_Next(self->readIterator);
    //If the iterator returns NULL, then no more data is available.
    if(i == NULL)
    {
        Py_RETURN_NONE;
    }
    //Treat the returned object as just bytes
    PyObject * const bytes = PyObject_Bytes(i);
    Py_DECREF(i);
    if( not bytes )
    {
        //fprintf(stderr, "try to read %s\n", PyObject_Str(bytes));
        PyErr_SetString(PyExc_ValueError, "iterable must return bytes like objects");
        return NULL;
    }
    ....
}
In my python code, I have something like that:
for data in Parser(open("file.txt")):
    ...
The code works well on Python 2. But on Python 3, I got:
ValueError: iterable must return bytes like objects
Update
The solution of @casevh works well in all test cases except one: when I wrap the stream:
def wrapper(stream):
    for data in stream:
        for i in data:
            yield i
for data in Parser(wrapper(open("file.txt", "rb"))):
    ...
and I got:
ValueError: iterable must return bytes like objects
One option is to open the file in binary mode:
open("file.txt", "rb")
That should create an iterator that returns a sequence of bytes.
Python 3 strings are assumed to be Unicode and without proper encoding/decoding, they shouldn't be interpreted as a sequence of bytes. If you are reading plain ASCII text, and not a binary data stream, you could also convert from Unicode to ASCII. See PyUnicode_AsASCIIString() and related functions.
As noted by @casevh, in Python 3 you need to decide whether your data is binary or text. The fact that you are iterating lines makes me think that the latter is the case.
def wrapper(stream):
    for data in stream:
        for i in data:
            yield i
works in Python 2, because iterating a str yields 1-character strings; in Python 3, iterating over a bytes object yields individual bytes as integers in the range 0-255. You can get the code to work identically in Python 2 and 3 (and identically to the Python 2 behaviour of the code above) by using range and slicing 1 byte/character at a time:
def wrapper(stream):
    for data in stream:
        for i in range(len(data)):
            yield data[i:i + 1]
P.S. You also have a mistake in your C extension code: Parser_read takes 3 arguments, 2 of which are named unused_x. Only a method annotated with METH_KEYWORDS takes 3 arguments (PyCFunctionWithKeywords); all others, including METH_NOARGS must be functions taking 2 arguments (PyCFunction).
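The difference between iterating and slicing a bytes object can be checked directly in Python 3. The sketch below uses `io.BytesIO` as a stand-in for the binary file (an assumption made so the example is self-contained):

```python
import io

data = b'abc'
print(list(data))                                  # iterating yields ints
print([data[i:i + 1] for i in range(len(data))])   # slicing yields bytes

def wrapper(stream):
    """Yield the stream's content one single-byte bytes object at a time."""
    for chunk in stream:
        for i in range(len(chunk)):
            yield chunk[i:i + 1]

pieces = list(wrapper(io.BytesIO(b'ab\ncd\n')))
print(pieces)
```

Every item produced by this `wrapper` is a bytes object of length 1, so passing it to `PyObject_Bytes` on the C side simply returns bytes rather than converting an int.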

Python 3.3 binary to hex function

def bintohex(path):
    hexvalue = []
    pkmfile = open(path,'rb')
    while True:
        buffhex = pkmfile.read(16)
        bufflen = len(buffhex)
        if bufflen == 0: break
        for i in range(bufflen):
            hexvalue.append("%02X" % (ord(buffhex[i])))
I am making a function that will return a list of hex values of a specific file. However, this function doesn't work properly in Python 3.3. How should I modify this code?
File "D:\pkmfile_web\pkmtohex.py", line 12, in bintohex
    hexvalue.append("%02X" % (ord(buffhex[i])))
TypeError: ord() expected string of length 1, but int found
There's a module for that :-)
>>> import binascii
>>> binascii.hexlify(b'abc')
b'616263'
In Python 3, indexing a bytes object returns the integer value; there is no need to call ord:
hexvalue.append("%02X" % buffhex[i])
Additionally, there is no need to loop over the indices manually. Just loop over the bytes object. I've also modified it to use format rather than %:
buffhex = pkmfile.read(16)
if not buffhex:
    break
for byte in buffhex:
    hexvalue.append(format(byte, '02X'))
You may want to even make bintohex a generator. To do that, you could start yielding values:
yield format(byte, '02X')
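Putting the pieces above together, a complete generator version might look like the following sketch. The `chunk_size` parameter and the temporary-file demo are additions for illustration, not part of the original answer:

```python
import os
import tempfile

def bintohex(path, chunk_size=16):
    """Yield each byte of the file at `path` as a two-digit uppercase hex string."""
    with open(path, 'rb') as pkmfile:
        while True:
            buffhex = pkmfile.read(chunk_size)
            if not buffhex:
                break
            # Iterating over bytes in Python 3 yields ints, so no ord() is needed.
            for byte in buffhex:
                yield format(byte, '02X')

# Demo: write a small temporary file and convert it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abc\xff')
hex_values = list(bintohex(tmp.name))
os.remove(tmp.name)
print(hex_values)
```

Because the function is a generator, the caller can iterate over a large file without holding every hex string in memory at once.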
