Byte formatting in python 3 [duplicate]


This question already has answers here:
Python 3 bytes formatting
(6 answers)
Closed 8 years ago.
I know this question has been asked before, but I couldn't get the existing answers to work for me.
What I want to do is send a prefix with my message, like so:
msg = pickle.dumps(message)
prefix = b'{:0>5d}'.format(len(msg))
message = prefix + msg
This gives me
AttributeError: 'bytes' object has no attribute 'format'
I tried formatting with % and encoding but none of them worked.

You can't call .format() on a bytes literal, because bytes objects have no format() method. You also can't concatenate bytes objects with str objects. Instead, build the whole message as a str, and then convert it to bytes using the proper encoding.
msg = 'hi there'
prefix = '{:0>5d}'.format(len(msg)) # No b at the front--this is a str
str_message = prefix + msg # still a str
encoded_message = str_message.encode('utf-8') # or whatever encoding
print(encoded_message) # prints: b'00008hi there'
Or if you're a fan of one-liners:
encoded_message = bytes('{:0>5d}{}'.format(len(msg), msg), 'utf-8')
According to your comment on Jan-Philip's answer, you need to specify how many bytes you're about to transfer. In that case, you'll need to encode the message first, so you can properly determine how many bytes it will occupy when you send it. The len function returns a byte count when called on bytes, so something like this should work for arbitrary text:
msg = 'ü' # len(msg) is 1 character
encoded_msg = msg.encode('utf-8') # len(encoded_msg) is 2 bytes
encoded_prefix = '{:0>5d}'.format(len(encoded_msg)).encode('utf-8')
full_message = encoded_prefix + encoded_msg # both are bytes, so we can concat
print(full_message) # prints: b'00002\xc3\xbc'
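A minimal sketch that stays in bytes the whole time, assuming Python 3.5 or later: bytes objects have no .format() method, but PEP 461 added %-formatting for them in 3.5. The helper names frame and unframe are hypothetical, and a 5-digit prefix limits the payload to 99999 bytes.

```python
def frame(payload: bytes) -> bytes:
    # bytes support %-formatting since Python 3.5 (PEP 461), though not .format()
    prefix = b'%05d' % len(payload)  # zero-padded 5-digit ASCII length
    return prefix + payload

def unframe(message: bytes) -> bytes:
    # The first 5 bytes hold the ASCII length; int() accepts bytes directly
    length = int(message[:5])
    return message[5:5 + length]

encoded_msg = 'ü'.encode('utf-8')
framed = frame(encoded_msg)
print(framed)                           # b'00002\xc3\xbc'
print(unframe(framed).decode('utf-8'))  # ü
```

The length is computed on the encoded payload, so it counts bytes, not characters.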

Edit: I think I misunderstood your question. Your issue is that you can't get the length into a bytes object, right?
Okay, you would usually use the struct module for that, in this fashion:
struct.pack("!i", len(bindata)) + bindata
This writes the length of the (binary!) message as a four-byte, big-endian (network byte order) signed integer. The return value of pack() is this object (of type bytes). To decode this on the receiving end, read exactly the first 4 bytes of your message into a bytes object; call it first_four_bytes. Decoding is done with struct.unpack, using the same format specifier (!i in this case):
messagesize, = struct.unpack("!i", first_four_bytes)
Then you know exactly how many of the following bytes belong to the message: messagesize. Read exactly that many bytes, and decode the message.
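A minimal sketch of both ends of this framing scheme, assuming the receiver reads from a file-like object with a read(n) method. io.BytesIO stands in for the connection here; with a real socket you could wrap it with sock.makefile('rb'). The helper names pack_message, read_exactly and read_message are hypothetical. The loop in read_exactly matters because a single read (or socket recv()) may return fewer bytes than requested.

```python
import io
import struct

def pack_message(bindata: bytes) -> bytes:
    # 4-byte signed big-endian ("!i") length prefix, then the payload
    return struct.pack("!i", len(bindata)) + bindata

def read_exactly(stream, n: int) -> bytes:
    # Keep reading until exactly n bytes have arrived
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def read_message(stream) -> bytes:
    first_four_bytes = read_exactly(stream, 4)
    messagesize, = struct.unpack("!i", first_four_bytes)
    return read_exactly(stream, messagesize)

stream = io.BytesIO(pack_message(b'hello') + pack_message(b'world'))
print(read_message(stream))  # b'hello'
print(read_message(stream))  # b'world'
```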
Old answer:
In Python 3, the + operator (bytes.__add__) concatenates bytes objects and returns what we want:
>>> a = b"\x61"
>>> b = b"\x62"
>>> a + b
b'ab'

Related

How can I get a variable containing a byte sequence of several fields (unicode character + 32 bits integer + unicode string)

I want to get a variable containing a byte sequence of several fields (they will later be transmitted via a socket).
The byte sequence will include the following three fields:
Character SOH (ASCII code 0x01)
32bits integer
Unicode string 'Straße'
I have tried:
# -*- coding: UTF-8 -*-
message = b''
soh = u'\0001'
a = 1143
c = u'Straße'
message = message + soh + a + c
print(type(message))
But I get:
TypeError: can't concat str to bytes
I am also not sure that soh = u'\0001' is the right way to define the SOH character.
I am using Python 3.7

Binary data for transfer over a socket connection is best combined using the struct module.
The struct module provides a pack function to create the data structure. You need to provide a format string that describes the data being packed. It's worth studying the format string documentation to ensure that the data is unpacked as expected on the receiving side.
>>> soh = b'\x01'
>>> a = 1143
>>> c = u'Straße'
>>> import struct
>>> pattern = 'ci7s' # 1 char, 1 int, 1 bytestring of length 7 (native alignment adds padding before the int)
>>> packed = struct.pack(pattern, soh, a, c.encode('utf-8'))
>>> packed
b'\x01\x00\x00\x00w\x04\x00\x00Stra\xc3\x9fe'
The module provides an unpack function to reverse the packing:
>>> soh_, a_, c_ = struct.unpack(pattern, packed)
>>> soh_
b'\x01'
>>> a
1143
>>> a_
1143
>>> c_.decode('utf-8')
'Straße'
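The 'ci7s' pattern above uses native byte order, sizes and alignment, which is why three padding bytes appear after b'\x01' in the output. A sketch assuming both ends agree on network byte order: prefixing the format with '!' selects standard sizes with no padding, so 'i' is always a 4-byte (32-bit) integer and the packed result is 12 bytes.

```python
import struct

soh = b'\x01'
a = 1143
c = 'Straße'

# '!' = network (big-endian) byte order, standard sizes, no alignment padding
pattern = '!ci7s'
packed = struct.pack(pattern, soh, a, c.encode('utf-8'))
print(struct.calcsize(pattern))  # 12
print(packed)                    # b'\x01\x00\x00\x04wStra\xc3\x9fe'

# The receiver unpacks with the same pattern
soh_, a_, c_ = struct.unpack(pattern, packed)
print(soh_, a_, c_.decode('utf-8'))  # b'\x01' 1143 Straße
```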

The error occurs because message is bytes while soh and c are str, and a is an int, which can't be concatenated with either.
What you should do is convert all three to bytes and then concatenate them to message: use .encode() on soh and c (.encode converts a str to bytes), and convert a with a.to_bytes(), since an int has no .encode() method.
(In Python 3.x the separate unicode type no longer exists (it is the same as str), so you have to use either str or bytes.)

Just in case it is helpful for anyone else, I finally did this:
message = soh.encode('utf-8') + a.to_bytes(4, 'big') + c.encode('utf-8')
struct.pack is a really interesting solution, but I did not manage to force the integer to be 32 bits, and in my particular format the field structure is not known in advance (so a mechanism to share it between client and server would be needed anyway).
I therefore mixed .to_bytes with .encode for unicode strings.
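A sketch of this .to_bytes()/.encode() approach together with the matching decode step, assuming both sides agree that the layout is 1 byte, then a 4-byte big-endian integer, then UTF-8 text to the end of the message. It writes SOH as '\x01': in a Python string literal, '\0001' is the octal escape \000 followed by the character 1, i.e. two characters. Note that int.to_bytes() raises OverflowError for negative values unless you pass signed=True.

```python
soh = '\x01'   # SOH; '\0001' would be '\x00' followed by '1'
a = 1143
c = 'Straße'

message = soh.encode('utf-8') + a.to_bytes(4, 'big') + c.encode('utf-8')
print(message)  # b'\x01\x00\x00\x04wStra\xc3\x9fe'

# Receiving side, assuming the agreed layout: 1 byte, 4-byte int, text
soh_ = message[0:1].decode('utf-8')
a_ = int.from_bytes(message[1:5], 'big')
c_ = message[5:].decode('utf-8')
print(a_, c_)   # 1143 Straße
```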

Updating to python3 from 2 TypeError: can only concatenate str (not "bytes") to str

When trying to upgrade a module from Python 2 to Python 3, I am hitting type errors when hashing file data. First I got the TypeError "Unicode-objects must be encoded before hashing"; when I encode the data, it then throws the TypeError "can only concatenate str (not "bytes") to str".
with open(realPath, "rb") as fn:
    while True:
        filedata = fn.read(self.piece_length)
        if len(filedata) == 0:
            break
        length += len(filedata)
        ##First error was here fixed with .decode()
        data += filedata.decode('utf-8')
        if len(data) >= self.piece_length:
            info_pieces += sha1(data[:self.piece_length]).digest()
            data = data[self.piece_length:]
        if check_md5:
            md5sum.update(filedata)
if len(data) > 0:
    ##New error happens here
    info_pieces += sha1(data).digest()

In Python 3, the hash functions work with bytes, not str. So the object you pass to sha1 should be bytes, and the return value of .digest() will also be bytes.
So you should encode the string data to bytes before passing to sha1(), like:
info_pieces += sha1(data[:self.piece_length].encode('utf-8')).digest()
Make sure you've initialized your variables like data = '' and info_pieces = b'', since data is decoded text and info_pieces contains the hash digests.
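An alternative sketch that avoids the str/bytes round trip entirely, assuming the file contents only need to be hashed and never treated as text: keep data and info_pieces as bytes and drop the .decode() call. This also avoids UnicodeDecodeError on binary files and keeps piece boundaries measured in bytes rather than characters. The function name hash_pieces is hypothetical, io.BytesIO stands in for the opened file, and the length and MD5 bookkeeping from the question are left out.

```python
import io
from hashlib import sha1

def hash_pieces(stream, piece_length):
    """Return the concatenated SHA-1 digests of each piece_length-sized piece."""
    data = b''         # raw bytes, never decoded
    info_pieces = b''  # .digest() returns bytes, so accumulate bytes
    while True:
        filedata = stream.read(piece_length)
        if not filedata:
            break
        data += filedata
        if len(data) >= piece_length:
            info_pieces += sha1(data[:piece_length]).digest()
            data = data[piece_length:]
    if data:
        # Hash the final, shorter piece
        info_pieces += sha1(data).digest()
    return info_pieces

pieces = hash_pieces(io.BytesIO(b'abcdefg'), 3)
print(len(pieces))  # 60 (three 20-byte SHA-1 digests)
```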

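.digest() returns a bytes object, not a string. If you need a str to concatenate with, don't decode() the digest: it is arbitrary binary data and usually is not valid UTF-8. Use a hexadecimal string instead (and encode data first, since sha1() needs bytes), like:
info_pieces += sha1(data.encode('utf-8')).hexdigest()
or
info_pieces += sha1(data.encode('utf-8')).digest().hex()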

How to pack a character and a number correctly?

I'm learning about client-server communication in Python, and I want to send some packed structures. I want to pack a mathematical sign and a number. I tried this:
idx = 50
value1 = "<"
value2 = idx
packer = struct.Struct('1s I')
packed_data = packer.pack(*value1, *value2)
But I got the error:
packed_data = packer.pack(*value1, *value2)
TypeError: 'int' object is not iterable
or this error:
packed_data = packer.pack(*value1, *value2)
struct.error: argument for 's' must be a bytes object
If I try like this:
value2 = [idx]
I don't know how to do this correctly.

The first problem is that you are unnecessarily trying to (sequence-)unpack your arguments. The Struct format expects a bytes and an int, and you (almost) already have them.
The second problem is that "<" is a Unicode string, and pack expects bytes instead. You need to properly encode the string first.
packed_data = packer.pack(value1.encode('utf-8'), value2)
The particular encoding you use doesn't matter, as long as you use the same one to unpack the data.
Note that if you had a Unicode character that couldn't be encoded in one byte, your string format would be wrong. The struct module doesn't handle variable-length strings by itself, so it would probably be simpler to encode the int by itself and concatenate it with your encoded string:
packed_data = value1.encode('utf-8') + struct.pack("I", value2)
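A minimal round-trip sketch of the fixed call, using the question's Struct('1s I') format on both ends. Without a byte-order prefix the format uses native byte order and alignment, so both machines must agree on it; a prefix such as '!' avoids that.

```python
import struct

packer = struct.Struct('1s I')  # a 1-byte string followed by an unsigned int
idx = 50
sign = '<'

# Pass a bytes object and an int directly; no * unpacking needed
packed_data = packer.pack(sign.encode('utf-8'), idx)

# The receiver uses the same Struct to get the fields back
sign_bytes, number = packer.unpack(packed_data)
print(sign_bytes.decode('utf-8'), number)  # < 50
```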

ord function in python2.7 and python 3.4 are different?

I have been running a script that uses the ord() function. In Python 2.7, for whatever reason, it accepts the one-character string just as it requires and outputs an integer.
In Python 3.4, this is not the case. This is the error that is produced:
Traceback (most recent call last):
File "udpTransfer.py", line 38, in <module>
buf.append(ord(c))
TypeError: ord() expected string of length 1, but int found
When I look in both documentations, the ord function is explained to be doing the same exact thing.
This is the code that I am using for both python versions:
import socket,sys, ast , os, struct
from time import ctime
import time
import csv
# creating the udp socket necessary to receive data
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
ip = '192.168.10.101' #i.p. of our computer
port = 20000 # socket port opened to connect from the matlab udp send data stream
server_address = (ip, port)
sock.bind(server_address) # bind socket
sock.settimeout(2) # sock configuration
sock.setblocking(1)
print('able to bind')
ii = 0
shotNummer = 0
client = ''
Array = []
byte = 8192
filename = time.strftime("%d_%m_%Y_%H-%M-%S")
filename = filename + '.csv'
try :
    with open(filename,'wb') as csvfile :
        spamwriter = csv.writer(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_MINIMAL)
        # spamwriter.writerow((titles))
        # as long as data comes in, well take it
        while True:
            data,client = sock.recvfrom(byte)
            buf = []
            values = []
            for c in data:
                # print(type(c))
                buf.append(ord(c))
                if len(buf) == 4 :
                    ###
Can anyone explain why python3.4 it says that c is an integer, rather than in Python 2.7 where it is actually a string, just as the ord() function requires?

You are passing in an integer to ord() in Python 3. That's because you are iterating over a bytes object in Python 3 (the first element in the tuple return value from socket.recvfrom()):
>>> for byte in b'abc':
...     print(byte)
...
97
98
99
From the bytes type documentation:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers[.]
and
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer [...].
In Python 2, socket.recvfrom() produces a str object instead, and iteration over such an object gives new one-character string objects, which indeed need to be passed to ord() to be converted to an integer.
You could instead use a bytearray() here to get the same integer sequence in both Python 2 and 3:
for c in bytearray(data):
    # c is now an integer in both Python 2 and 3
You don't need to use ord() at all in that case.
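A short sketch of the difference, runnable under Python 3 (the bytes value is made up for illustration): iteration and indexing give ints, slicing gives bytes, and ord() still accepts a length-1 bytes object.

```python
data = b'\x01\x02AB'

# Iterating over bytes yields ints in Python 3, so ord() is not needed
print([c for c in data])       # [1, 2, 65, 66]

# bytearray() gives the same integer sequence (and behaves this way in Python 2 too)
print(list(bytearray(data)))   # [1, 2, 65, 66]

# Indexing gives an int; slicing gives a bytes object
print(data[2], data[2:3])      # 65 b'A'

# ord() still accepts a bytes object of length 1
print(ord(data[2:3]))          # 65
```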

I think the difference is that in Python 3 the sock.recvfrom(...) call returns bytes while Python 2.7 recvfrom returns a string. So ord did not change but what is being passed to ord has changed.
(See the socket.recvfrom() documentation for Python 2.7 and for Python 3.5.)

Python - AttributeError: 'str' object has no attribute 'append'

I keep receiving this error when I try to run this code, on the line encoded.append(i):
AttributeError: 'str' object has no attribute 'append'
I cannot work out why the list won't append the string. I'm sure the problem is very simple. Thank you for your help.
def encode(code, msg):
    '''Encrypts a message, msg, using the substitutions defined in the
    dictionary, code'''
    msg = list(msg)
    encoded = []
    for i in msg:
        if i in code.keys():
            i = code[i]
            encoded.append(i)
        else:
            encoded.append(i)
            encoded = ''.join(encoded)
    return encoded

You set encoded to string here:
encoded = ''.join(encoded)
And of course it doesn't have attribute 'append'.
Since you're doing this inside the loop, on the next iteration encoded is a str instead of a list...

>>> encoded =["d","4"]
>>> encoded="".join(encoded)
>>> print (type(encoded))
<class 'str'> #It's not a list anymore, you converted it to string.
>>> encoded =["d","4",4] # 4 here as integer
>>> encoded="".join(encoded)
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
encoded="".join(encoded)
TypeError: sequence item 2: expected str instance, int found
>>>
As you can see, your list is converted to a string by "".join(encoded), and append is a method of lists, not strings. That's why you got that error. Also, if your encoded list contains an integer element, you will get a TypeError, because join only accepts str items. It's worth checking the rest of your code as well.
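If the list can legitimately contain non-string items, one option is to convert each item before joining (a sketch using the example list above):

```python
encoded = ["d", "4", 4]  # the last item is an int

# str.join only accepts str items, so convert each one first
joined = "".join(str(item) for item in encoded)
print(joined)  # d44
```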

Your string conversion line is under the else clause. Move it out of the conditional and the for loop, so that it's the last thing done to encoded. As it stands, you are converting encoded to a string partway through your for loop:
def encode(code, msg):
    '''Encrypts a message, msg, using the substitutions defined in the
    dictionary, code'''
    msg = list(msg)
    encoded = []
    for i in msg:
        if i in code.keys():
            i = code[i]
            encoded.append(i)
        else:
            encoded.append(i)
    # after all appends and outside for loop
    encoded = ''.join(encoded)
    return encoded
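For comparison, a more compact sketch of the same function, assuming code is a dict mapping characters to replacement strings: dict.get(ch, ch) returns the substitution when there is one and the original character otherwise, and join runs once, after the whole list has been built.

```python
def encode(code, msg):
    '''Substitute each character of msg using the dict code, leaving
    characters without a substitution unchanged.'''
    # Build the complete list first, then join once at the end
    encoded = [code.get(ch, ch) for ch in msg]
    return ''.join(encoded)

print(encode({'a': '1', 'b': '2'}, 'abc'))  # 12c
```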

You are getting the error because of the second statement in your else clause.
''.join(encoded) returns a string that gets assigned to encoded.
Thus encoded is now of type str.
On the next iteration of the loop, both the if and else branches call .append(i), which can only be applied to lists, not strings.
Your .join() call should appear after the for loop, right before you return.
I apologise if the above text does not appear right. This is my first post and I am still trying to figure out how this works.
