How to pack a character and a number correctly? - python

I'm learning about client-server communication in Python, and I want to send some packed structures. I want to pack a mathematical sign and a number. I tried it like this:
idx = 50
value1 = "<"
value2 = idx
packer = struct.Struct('1s I')
packed_data = packer.pack(*value1, *value2)
But I got the error:
packed_data = packer.pack(*value1, *value2)
TypeError: 'int' object is not iterable
or this error:
packed_data = packer.pack(*value1, *value2)
struct.error: argument for 's' must be a bytes object
If I try like this:
value2 = [idx]
I don't know how to do this correctly.

The first problem is that you are unnecessarily trying to (sequence-)unpack your arguments. The Struct format expects a bytes and an int, and you (almost) already have them.
The second problem is that "<" is a Unicode string, and pack expects bytes instead. You need to properly encode the string first.
packed_data = packer.pack(value1.encode('utf-8'), value2)
The particular encoding you use doesn't matter, as long as you use the same one to unpack the data.
Note that if you did have a Unicode character that couldn't be encoded in one byte, your string format would be wrong. The struct module doesn't handle variable-length strings by itself, so it would probably be simpler to just encode the int by itself and concatenate that with your encoded string.
packed_data = value1.encode('utf-8') + struct.pack("I", value2)
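A minimal round-trip sketch of both approaches, assuming a one-character ASCII sign and a value that fits in an unsigned 32-bit integer. The `=` prefix in the second format is an addition that does not appear in the answer above; it selects standard sizes so that `I` is always 4 bytes.

```python
import struct

sign = "<"
idx = 50

# Approach 1: one Struct for both fields. In native mode ('1s I'), padding
# is typically inserted between the 1-byte string and the aligned int.
packer = struct.Struct("1s I")
packed = packer.pack(sign.encode("utf-8"), idx)
raw_sign, number = packer.unpack(packed)

# Approach 2: the encoded sign followed by a standard-size 4-byte integer.
# "=" (an assumption of this sketch) disables native sizes and alignment.
packed2 = sign.encode("utf-8") + struct.pack("=I", idx)
sign2 = packed2[:-4].decode("utf-8")        # everything before the int
(number2,) = struct.unpack("=I", packed2[-4:])  # the last 4 bytes
```

The second layout only works when the integer sits at a known position, here the last four bytes, because the sign itself may encode to more than one byte.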

Related

What is `readUInt32BE` of nodeJS in Python?

I am translating a source of NodeJS to Python. However there is a function readUInt32BE that I do not quite understand how it works
Original Code
const buf = Buffer.from("vgEAAwAAAA1kZXYubG9yaW90LmlvzXTUl6ESlOrvJST-gsL_xQ==", 'base64');
const appId = parseInt(buf.slice(0, 4).toString('hex'), 16);
const serverIdLength = buf.slice(4, 8).readUInt32BE(0);
Here is what I have tried so far in Python
encodeToken = base64.b64decode("vgEAAwAAAA1kZXYubG9yaW90LmlvzXTUl6ESlOrvJST-gsL_xQ==")
appId = encodeToken[:4]
appId = appId.hex()
serverIdLength = ......
If possible, can you write a function that works the same as readUInt32BE(0) and explain it for me ? Thanks
I'm assuming from the name that the function interprets an arbitrary sequence of 4 bytes as an unsigned 32-bit (big-endian) integer.
The corresponding Python function would be struct.unpack with an appropriate format string.
import struct
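appId = struct.unpack(">I", encodeToken[:4])[0]  # bytes 0-3, like buf.slice(0, 4)
serverIdLength = struct.unpack(">I", encodeToken[4:8])[0]  # bytes 4-7, like buf.slice(4, 8)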
# ">" means "big-endian"
# "I" means 4-byte unsigned integer
There is no need to get a hex representation of the bytes first. unpack always returns a tuple, even if the format string produces only one value, so you need to take the first element of that tuple as the final value.
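As a sketch of the function the question asks for, the helper below mirrors `readUInt32BE(offset)`. The name `read_uint32_be` and the sample header bytes are illustrative choices, not part of the original code.

```python
import struct

def read_uint32_be(buf: bytes, offset: int = 0) -> int:
    """Read 4 bytes starting at offset as a big-endian unsigned 32-bit int."""
    (value,) = struct.unpack_from(">I", buf, offset)
    return value

# Sample 8-byte header chosen for illustration: two big-endian uint32 fields.
header = bytes.fromhex("be0100030000000d")
app_id = read_uint32_be(header, 0)
server_id_length = read_uint32_be(header, 4)

# int.from_bytes gives the same result without the struct module.
same_length = int.from_bytes(header[4:8], byteorder="big")
```

`struct.unpack_from` raises `struct.error` if fewer than 4 bytes remain at the given offset, which roughly corresponds to the range error Node.js raises for an out-of-bounds read.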

unpacking a struct

I have a list of arithmetic expressions that I encode and pack into a struct, as follows:
expressions = ["1+12", "16-5", "1-3+4", "12-5", "16+4"] # 5 expressions; but could be any number
s = struct.pack('h', len(expressions)) # Prepend the number of expressions to the struct
for e in expressions:
    s += struct.pack(f'h{len(e)}s', len(e), e.encode('utf-8'))
Later, I want to undo this process and unpack them two bytes at a time. But I can't seem to get this right. Here's my unpacking and decoding code:
d = []
for n in range(0, len(expressions), 2): # Iterate over the struct
    s = struct.unpack('h', expressions[n])
    s = s.decode('utf-8')
    d.append(s)
Python is giving me the following error at the unpack call:
TypeError: a bytes-like object is required, not 'int'
I don't understand the reason for the error. I thought that I'd already encoded the elements into Unicode, so they should be bytes-like. My goal is to build up the original list of expressions again. How can I do this?
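One possible way to reverse the packing is sketched below. It assumes the buffer was built exactly as in the question (a count, then a length and the encoded bytes for each expression) and walks the packed bytes with an explicit offset instead of iterating over the original list. The function name `unpack_expressions` is made up for this sketch.

```python
import struct

expressions = ["1+12", "16-5", "1-3+4", "12-5", "16+4"]

# Pack as in the question: a count, then (length, bytes) pairs.
# The length is taken from the encoded bytes so non-ASCII text also works.
packed = struct.pack("h", len(expressions))
for e in expressions:
    data = e.encode("utf-8")
    packed += struct.pack(f"h{len(data)}s", len(data), data)

def unpack_expressions(buf):
    """Rebuild the list by reading sizes and payloads at a moving offset."""
    size_of_h = struct.calcsize("h")
    offset = 0
    (count,) = struct.unpack_from("h", buf, offset)
    offset += size_of_h
    result = []
    for _ in range(count):
        (length,) = struct.unpack_from("h", buf, offset)
        offset += size_of_h
        (raw,) = struct.unpack_from(f"{length}s", buf, offset)
        offset += length
        result.append(raw.decode("utf-8"))
    return result

decoded = unpack_expressions(packed)
```

The error in the question comes from indexing: `struct.unpack` needs the packed bytes, and indexing a bytes object with a single integer yields an int rather than a bytes slice.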

How can I get a variable containing a byte sequence of several fields (unicode character + 32 bits integer + unicode string)

I want to get a variable containing a byte sequence of several fields (they will later be transmitted via socket).
The byte sequence will include the following three fields:
Character SOH (ANSI code 0x01)
32bits integer
Unicode string 'Straße'
I have tried:
# -*- coding: UTF-8 -*-
message = b''
soh = u'\0001'
a = 1143
c = u'Straße'
message = message + soh + a + c
print(type(message))
But I get:
TypeError: can't concat str to bytes
I am also not sure that soh = u'\0001' is the right way to define the SOH character.
I am using Python 3.7
Binary data for transfer over a socket connection is best combined using the struct module.
The struct module provides a pack function to create the data structure. You need to provide a format string that describes the data being packed. It's worth studying the format string documentation to ensure that the data is unpacked as expected on the receiving side.
>>> soh = b'\x01'
>>> a = 1143
>>> c = u'Straße'
>>> import struct
>>> pattern = 'ci7s' # 1 byte, 1 int, 1 bytestring of length 7
>>> packed = struct.pack(pattern, soh, a, c.encode('utf-8'))
>>> packed
b'\x01\x00\x00\x00w\x04\x00\x00Stra\xc3\x9fe'
The module provides an unpack function to reverse the packing:
>>> soh_, a_, c_ = struct.unpack(pattern, packed)
>>> soh_
b'\x01'
>>> a
1143
>>> a_
1143
>>> c_.decode('utf-8')
'Straße'
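The error occurs because a is an int and soh and c are str; neither type can be concatenated to bytes directly.
Call .encode() on soh and c to turn them from str into bytes, and convert a separately (an int has no .encode() method; a.to_bytes(4, 'big') or struct.pack both work), then concatenate everything to message.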
(In python 3.x unicode type doesn't exist anymore (it's the same as str) so you have to use either str or bytes)
Just in case it is helpful for anyone else, I finally did this:
message = soh.encode('utf-8') + a.to_bytes(4, 'big') + c.encode('utf-8')
struct.pack is a really interesting solution, but I did not manage to force the integer to be 32 bits, and in my particular format the field structure is not known in advance (hence a mechanism to share it between client and server would be needed anyway).
I therefore mixed .to_bytes with .encode for unicode strings.
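For readers who do want struct here, the sketch below shows one way to pin the integer to 32 bits: a byte-order prefix such as `>` switches struct to standard sizes with no padding, so `I` is always 4 bytes. The extra length field in front of the string is an assumption added for this example so the receiver knows where the text ends; it is not part of the format described above.

```python
import struct

soh = b"\x01"
a = 1143
text = "Straße".encode("utf-8")

# ">" means big-endian, standard sizes, no alignment padding.
# The second "I" is a hypothetical length prefix for the string.
message = soh + struct.pack(">II", a, len(text)) + text

# Receiving side: a fixed 9-byte header, then the variable-length text.
header_size = struct.calcsize(">cII")
soh_, a_, text_len = struct.unpack_from(">cII", message, 0)
text_ = message[header_size:header_size + text_len].decode("utf-8")
```

With this prefix, `struct.pack(">I", a)` produces the same four bytes as `a.to_bytes(4, 'big')`.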

Byte formatting in python 3 [duplicate]

I know this question has been asked before, but couldn't get it working for me though.
What I want to do is send a prefix with my message like so:
msg = pickle.dumps(message)
prefix = b'{:0>5d}'.format(len(msg))
message = prefix + msg
This gives me
AttributeError: 'bytes' object has no attribute 'format'
I tried formatting with % and encoding but none of them worked.
You can't call .format on a bytes literal, because bytes objects have no such method. You also can't concatenate bytes objects with str objects. Instead, put the whole thing together as a str, and then convert it to bytes using the proper encoding.
msg = 'hi there'
prefix = '{:0>5d}'.format(len(msg)) # No b at the front--this is a str
str_message = prefix + msg # still a str
encoded_message = str_message.encode('utf-8') # or whatever encoding
print(encoded_message) # prints: b'00008hi there'
Or if you're a fan of one-liners:
encoded_message = bytes('{:0>5d}{:1}'.format(len(msg), msg), 'utf-8')
According to your comment on @Jan-Philip's answer, you need to specify how many bytes you're about to transfer? Given that, you'll need to encode the message first, so you can properly determine how many bytes it will be when you send it. The len function produces a proper byte count when called on bytes, so something like this should work for arbitrary text:
msg = 'ü' # len(msg) is 1 character
encoded_msg = msg.encode('utf-8') # len(encoded_msg) is 2 bytes
encoded_prefix = '{:0>5d}'.format(len(encoded_msg)).encode('utf-8')
full_message = encoded_prefix + encoded_msg # both are bytes, so we can concat
print(full_message) # prints: b'00002\xc3\xbc'
Edit: I think I misunderstood your question. Your issue is that you can't get the length into a bytes object, right?
Okay, you would usually use the struct module for that, in this fashion:
struct.pack("!i", len(bindata)) + bindata
This writes the length of the (binary!) message into a four byte integer object. The return value of pack() is this object (of type bytes). For decoding this on the receiving end you need to read exactly the first 4 bytes of your message into a bytes object. Let's call this first_four_bytes. Decoding is done using struct.unpack, using the same format specifier (!i) in this case:
messagesize, = struct.unpack("!i", first_four_bytes)
Then you know exactly how many of the following bytes belong to the message: messagesize. Read exactly that many bytes, and decode the message.
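The framing scheme described above can be sketched as follows. The helper names `frame`, `read_exactly`, and `read_frame` are made up for this example, and an in-memory `io.BytesIO` stands in for the network connection; with a real socket, a file object from `sock.makefile('rb')` could plausibly play the same role.

```python
import io
import struct

HEADER = struct.Struct("!i")  # 4-byte signed int, network byte order

def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length in bytes."""
    return HEADER.pack(len(payload)) + payload

def read_exactly(stream, n: int) -> bytes:
    """Keep reading until n bytes have arrived or the stream ends."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_frame(stream) -> bytes:
    """Read the 4-byte size, then exactly that many payload bytes."""
    (size,) = HEADER.unpack(read_exactly(stream, HEADER.size))
    return read_exactly(stream, size)

# Two messages back to back in one stream, as they might arrive on a socket.
stream = io.BytesIO(frame(b"hello") + frame("ü".encode("utf-8")))
first = read_frame(stream)
second = read_frame(stream)
```

The read loop matters because a single read on a network stream may return fewer bytes than requested.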
Old answer:
In Python 3, the __add__ operator returns what we want:
>>> a = b"\x61"
>>> b = b"\x62"
>>> a + b
b'ab'

Python 3.3 binary to hex function

def bintohex(path):
    hexvalue = []
    file = open(path,'rb')
    while True:
        buffhex = pkmfile.read(16)
        bufflen = len(buffhex)
        if bufflen == 0: break
        for i in range(bufflen):
            hexvalue.append("%02X" % (ord(buffhex[i])))
I am making a function that will return a list of hex values of a specific file. However, this function doesn't work properly in Python 3.3. How should I modify this code?
File "D:\pkmfile_web\pkmtohex.py", line 12, in bintohex
    hexvalue.append("%02X" % (ord(buffhex[i])))
TypeError: ord() expected string of length 1, but int found
There's a module for that :-)
>>> import binascii
>>> binascii.hexlify(b'abc')
b'616263'
In Python 3, indexing a bytes object returns the integer value; there is no need to call ord:
hexvalue.append("%02X" % buffhex[i])
Additionally, there is no need to be manually looping over the indices. Just loop over the bytes object. I've also modified it to use format rather than %:
buffhex = pkmfile.read(16)
if not buffhex:
    break
for byte in buffhex:
    hexvalue.append(format(byte, '02X'))
You may want to even make bintohex a generator. To do that, you could start yielding values:
yield format(byte, '02X')
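Putting the pieces together, a generator version might look like the sketch below. The `chunk_size` parameter, the `with` block, and the temporary demo file are additions for this example rather than part of the original function.

```python
import os
import tempfile

def bintohex(path, chunk_size=16):
    """Yield each byte of the file as a two-digit uppercase hex string."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Iterating over bytes in Python 3 yields ints, so no ord().
            for byte in chunk:
                yield format(byte, "02X")

# Usage with a throwaway file (the contents are made up for the demo).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00abc\xff")
hex_values = list(bintohex(tmp.name))
os.remove(tmp.name)
```

Wrapping the call in `list(...)` recovers the original list-returning behavior, while callers that only loop over the values avoid holding the whole file in memory.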
