ord function in python2.7 and python 3.4 are different? - python

I have been running a script that uses the ord() function. In Python 2.7 it accepts each character of the received string, as expected, and returns an integer.
In Python 3.4 this is no longer the case. This is the error that is produced:
Traceback (most recent call last):
File "udpTransfer.py", line 38, in <module>
buf.append(ord(c))
TypeError: ord() expected string of length 1, but int found
When I look at the documentation for both versions, ord() is described as doing exactly the same thing.
This is the code that I am using for both python versions:
import socket,sys, ast , os, struct
from time import ctime
import time
import csv
# creating the UDP socket necessary to receive data
sock = socket.socket(socket.AF_INET,socket.SOCK_DGRAM)
ip = '192.168.10.101' #i.p. of our computer
port = 20000 # socket port opened to connect from the matlab udp send data stream
server_address = (ip, port)
sock.bind(server_address) # bind socket
sock.settimeout(2) # sock configuration
sock.setblocking(1)
print('able to bind')
ii = 0
shotNummer = 0
client = ''
Array = []
byte = 8192
filename = time.strftime("%d_%m_%Y_%H-%M-%S")
filename = filename + '.csv'
try :
    with open(filename,'wb') as csvfile :
        spamwriter = csv.writer(csvfile, delimiter=',',quotechar='|', quoting=csv.QUOTE_MINIMAL)
        # spamwriter.writerow((titles))
        # as long as data comes in, we'll take it
        while True:
            data,client = sock.recvfrom(byte)
            buf = []
            values = []
            for c in data:
                # print(type(c))
                buf.append(ord(c))
                if len(buf) == 4 :
                    ###
Can anyone explain why, in Python 3.4, c is an integer, whereas in Python 2.7 it is a string, just as the ord() function requires?

You are passing in an integer to ord() in Python 3. That's because you are iterating over a bytes object in Python 3 (the first element in the tuple return value from socket.recvfrom()):
>>> for byte in b'abc':
... print(byte)
...
97
98
99
From the bytes type documentation:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers[.]
and
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer [...].
In Python 2, socket.recvfrom() produces a str object instead, and iteration over such an object gives new one-character string objects, which indeed need to be passed to ord() to be converted to an integer.
You could instead use a bytearray() here to get the same integer sequence in both Python 2 and 3:
for c in bytearray(data):
    # c is now an integer in both Python 2 and 3
    buf.append(c)
You don't need to use ord() at all in that case.
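As a minimal sketch of the bytearray() approach, assuming a made-up payload in place of the data returned by sock.recvfrom(), the loop from the question can collect integers directly:

```python
# Stand-in for the first element of the tuple returned by sock.recvfrom();
# this payload is invented purely for illustration.
data = b'\x01\x02\x03\x04abc'

# Iterating over a bytearray yields integers in both Python 2 and Python 3,
# so ord() is not needed.
buf = []
for c in bytearray(data):
    buf.append(c)

print(buf)  # [1, 2, 3, 4, 97, 98, 99]
```

The same loop runs unchanged on both interpreters, which is the main reason to prefer it over checking the Python version.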

I think the difference is that in Python 3 the sock.recvfrom(...) call returns bytes while Python 2.7 recvfrom returns a string. So ord did not change but what is being passed to ord has changed.
Python 2.7 recvfrom
Python 3.5 recvfrom

Related

How can I get a variable containing a byte sequence of several fields (unicode character + 32 bits integer + unicode string)

I want to get a variable containing a byte sequence of several fields (they will be later be transmitted via socket).
The byte sequence will include the following three fields:
Character SOH (ANSI code 0x01)
32bits integer
Unicode string 'Straße'
I have tried:
# -*- coding: UTF-8 -*-
message = b''
soh = u'\0001'
a = 1143
c = u'Straße'
message = message + soh + a + c
print(type(message))
But I get:
TypeError: can't concat str to bytes
I am also not sure that soh = u'\0001' is the right way to define the SOH character.
I am using Python 3.7
Binary data for transfer over a socket connection is best combined using the struct module.
The struct module provides a pack function to create the data structure. You need to provide a format string that describes the data being packed. It's worth studying the format string documentation to ensure that the data is unpacked as expected on the receiving side.
>>> soh = b'\x01'
>>> a = 1143
>>> c = u'Straße'
>>> import struct
>>> pattern = 'ci7s' # 1 byte, 1 int, 1 bytestring of length 7
>>> packed = struct.pack(pattern, soh, a, c.encode('utf-8'))
>>> packed
b'\x01\x00\x00\x00w\x04\x00\x00Stra\xc3\x9fe'
The module provides an unpack function to reverse the packing:
>>> soh_, a_, c_ = struct.unpack(pattern, packed)
>>> soh_
b'\x01'
>>> a
1143
>>> a_
1143
>>> c_.decode('utf-8')
'Straße'
The error occurs because message is a bytes object, while soh and c are str and a is an int; none of them can be concatenated to bytes directly.
Call .encode() on soh and c (it converts str to bytes), convert a to bytes as well, for example with int.to_bytes(), and then concatenate the results to message.
(In Python 3.x the unicode type no longer exists (it is the same as str), so you have to use either str or bytes.)
Just in case it is helpful for anyone else, I finally did this:
message = soh.encode('utf-8') + a.to_bytes(4, 'big') + c.encode('utf-8')
struct.pack is a really interesting solution, but I did not manage to force the integer to be 32 bits, and in my particular format the field structure is not known in advance (hence a mechanism to share it between client and server would be needed anyway).
I therefore combined .to_bytes for the integer with .encode for the Unicode strings.
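On forcing the integer to 32 bits: one option, sketched here under the assumption that big-endian byte order is wanted, is to start the struct format string with a byte-order character. With '>' (or '<', '!', '=') struct uses standard sizes and no padding, so 'I' is always 4 bytes. The variable names below mirror the question; the comparison against the hand-built message is only there for illustration.

```python
import struct

soh = b'\x01'                   # SOH as a single byte
a = 1143
c = u'Straße'.encode('utf-8')   # variable-length field, appended as-is

# '>' selects big-endian byte order with standard sizes and no padding,
# so 'c' is 1 byte and 'I' is exactly 4 bytes.
header = struct.pack('>cI', soh, a)
message = header + c

# The same bytes built by hand with int.to_bytes
manual = soh + a.to_bytes(4, 'big') + c

print(struct.calcsize('>cI'))  # 5
print(message == manual)       # True
```

Because the string field has no fixed length, the receiver still needs some way to know where it ends, for example a length prefix or an agreed terminator.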

How to pack a character and a number correctly?

I'm learning about client-server communication in Python, and I want to send some packed structures. I want to pack a mathematical sign and a number. I tried it like this:
idx = 50
value1 = "<"
value2 = idx
packer = struct.Struct('1s I')
packed_data = packer.pack(*value1, *value2)
But I got the error:
packed_data = packer.pack(*value1, *value2)
TypeError: 'int' object is not iterable
or this error:
packed_data = packer.pack(*value1, *value2)
struct.error: argument for 's' must be a bytes object
If I try like this:
value2 = [idx]
I don't know how to do this correctly.
The first problem is that you are unnecessarily trying to (sequence-)unpack your arguments. The Struct format expects a bytes and an int, and you (almost) already have them.
The second problem is that "<" is a Unicode string, and pack expects bytes instead. You need to properly encode the string first.
packed_data = packer.pack(value1.encode('utf-8'), value2)
The particular encoding you use doesn't matter, as long as you use the same one to unpack the data.
Note that if you did have a Unicode character that couldn't be encoded in one byte, your string format would be wrong. The struct module doesn't handle variable-length strings by itself, so it would probably be simpler to just encode the int by itself and concatenate that with your encoded string.
packed_data = value1.encode('utf-8') + struct.pack("I", value2)
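A round-trip sketch of the corrected call, reusing the names from the question. The '<' prefix is an assumption added here: it fixes the byte order and disables native padding, so both ends agree on a 5-byte layout regardless of platform.

```python
import struct

idx = 50
value1 = "<"
value2 = idx

# '<' = little-endian, standard sizes, no padding: 1 byte + 4 bytes
packer = struct.Struct('<1sI')
packed_data = packer.pack(value1.encode('utf-8'), value2)

# The receiving side uses the same format to unpack
sign, number = packer.unpack(packed_data)
print(packer.size)                   # 5
print(sign.decode('utf-8'), number)  # < 50
```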

How to convert a sha256 object to integer and pack it to bytearray in python?

I want to convert a hash256 object to a 32-byte integer first, and then pack it into a bytearray.
>>> import hashlib
>>> hashobj = hashlib.sha256('something')
>>> val_hex = hashobj.hexdigest()
>>> print val_hex
3fc9b689459d738f8c88a3a48aa9e33542016b7a4052e001aaa536fca74813cb
>>> print len(val_hex)
64
The hex string is 64-byte instead of 32-byte, which is not what I want.
>>> val = hashobj.digest()
>>> print val
?ɶ?E?s????????5Bkz#R???6??H?
>>> print len(val)
32
This is a 32-byte string and I want to convert it to a 32-byte integer.
It gave me an error message when I try:
>>> val_int = int(val, 10)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '?\xc9\xb6\x89E\x9ds\x8f\x8c\x88\xa3\xa4\x8a\xa9\xe35B\x01kz#R\xe0\x01\xaa\xa56\xfc\xa7H\x13\xcb'
What should I do to get my int_val?
And how can I use struct to pack it (32-byte) to a bytearray? I found the longest format in python struct document is 'Q' which is only 8-byte.
Thank you very much.
The simplest way in Python 2 to get the integer value of the SHA-256 digest is via the hexdigest. Alternatively, you can loop over the bytearray constructed from the binary digest. Both methods are illustrated below.
import hashlib
hashobj = hashlib.sha256('something')
val_hex = hashobj.hexdigest()
print val_hex
# Build bytearray from binary digest
val_bytes = bytearray(hashobj.digest())
print ''.join(['%02x' % byte for byte in val_bytes])
# Get integer value of digest from the hexdigest
val_int = int(val_hex, 16)
print '%064x' % val_int
# Get integer value of digest from the bytearray
n = 0
for byte in val_bytes:
    n = n<<8 | byte
print '%064x' % n
output
3fc9b689459d738f8c88a3a48aa9e33542016b7a4052e001aaa536fca74813cb
3fc9b689459d738f8c88a3a48aa9e33542016b7a4052e001aaa536fca74813cb
3fc9b689459d738f8c88a3a48aa9e33542016b7a4052e001aaa536fca74813cb
3fc9b689459d738f8c88a3a48aa9e33542016b7a4052e001aaa536fca74813cb
In Python 3, we can't pass a plain text string to the hashlib hash function; we must pass a bytes string or a bytearray, e.g.
b'something'
or
'something'.encode('utf-8')
or
bytearray('something', 'utf-8')
We can simplify the second version to
'something'.encode()
since UTF-8 is the default encoding for str.encode (and bytes.decode()).
To perform the conversion to int, any of the above techniques can be used, but we also have an additional option: the int.from_bytes method. To get the correct integer we need to tell it to interpret the bytes as a big-endian number:
import hashlib
hashobj = hashlib.sha256(b'something')
val = int.from_bytes(hashobj.digest(), 'big')
print('%064x' % val)
output
3fc9b689459d738f8c88a3a48aa9e33542016b7a4052e001aaa536fca74813cb
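For the second half of the question (packing the 32-byte integer into a bytearray), struct has no integer format wider than 8 bytes, so one possible route in Python 3 is int.to_bytes. This is a sketch that assumes big-endian order, matching the int.from_bytes call above:

```python
import hashlib

digest = hashlib.sha256(b'something').digest()

# 32-byte digest -> integer (big-endian)
val = int.from_bytes(digest, 'big')

# integer -> 32 bytes again, wrapped in a mutable bytearray
packed = bytearray(val.to_bytes(32, 'big'))

print(len(packed))              # 32
print(bytes(packed) == digest)  # True
```

Passing the fixed length 32 to to_bytes keeps any leading zero bytes, which would be lost if the length were derived from the integer itself.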
The point of a bytearray is not to fit the whole content in a single cell. That's why cells are only 1 byte big.
And .digest() returns a byte string, so you are fine just using it immediately:
>>> import hashlib
>>> hashobj = hashlib.sha256('something')
>>> val = hashobj.digest()
>>> print bytearray(val)
?ɶ�E�s������5Bkz#R���6��H�
>>> print repr(bytearray(val))
bytearray(b'?\xc9\xb6\x89E\x9ds\x8f\x8c\x88\xa3\xa4\x8a\xa9\xe35B\x01kz#R\xe0\x01\xaa\xa56\xfc\xa7H\x13\xcb')
I did it this way:
import hashlib
x = 'input'
my_hash = int.from_bytes(hashlib.sha256(x.encode('utf-8')).digest(), 'big')
print(my_hash)
# 91106456816457796232999629894661022820411437165637657988648530670402435361824
Let's check the size of the hash:
print(len("{0:b}".format(my_hash)))
# 256
Perfect!

Byte formatting in python 3 [duplicate]

I know this question has been asked before, but I couldn't get it working.
What I want to do is send a prefix with my message, like so:
msg = pickle.dumps(message)
prefix = b'{:0>5d}'.format(len(msg))
message = prefix + msg
This gives me
AttributeError: 'bytes' object has no attribute 'format'
I tried formatting with % and encoding but none of them worked.
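You can't call .format() on a bytes literal: bytes objects have no format method (and before Python 3.5 they did not support % formatting either). You also can't concatenate bytes objects with str objects. Instead, put the whole thing together as a str, and then convert it to bytes using the proper encoding.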
msg = 'hi there'
prefix = '{:0>5d}'.format(len(msg)) # No b at the front--this is a str
str_message = prefix + msg # still a str
encoded_message = str_message.encode('utf-8') # or whatever encoding
print(encoded_message) # prints: b'00008hi there'
Or if you're a fan of one-liners:
encoded_message = bytes('{:0>5d}{:1}'.format(len(msg), msg), 'utf-8')
According to your comment on @Jan-Philip's answer, you need to specify how many bytes you're about to transfer. Given that, you'll need to encode the message first, so you can properly determine how many bytes it will be when you send it. The len function produces a proper byte count when called on bytes, so something like this should work for arbitrary text:
msg = 'ü' # len(msg) is 1 character
encoded_msg = msg.encode('utf-8') # len(encoded_msg) is 2 bytes
encoded_prefix = '{:0>5d}'.format(len(encoded_msg)).encode('utf-8')
full_message = encoded_prefix + encoded_msg # both are bytes, so we can concat
print(full_message) # prints: b'00002\xc3\xbc'
Edit: I think I misunderstood your question. Your issue is that you can't get the length into a bytes object, right?
Okay, you would usually use the struct module for that, in this fashion:
struct.pack("!i", len(bindata)) + bindata
This writes the length of the (binary!) message into a four byte integer object. The return value of pack() is this object (of type bytes). For decoding this on the receiving end you need to read exactly the first 4 bytes of your message into a bytes object. Let's call this first_four_bytes. Decoding is done using struct.unpack, using the same format specifier (!i) in this case:
messagesize, = struct.unpack("!i", first_four_bytes)
Then you know exactly how many of the following bytes belong to the message: messagesize. Read exactly that many bytes, and decode the message.
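The framing scheme described above can be sketched as a pair of helpers. The names frame and unframe are hypothetical, and the sketch assumes the whole message is already in a buffer; a real socket reader would have to loop until enough bytes have arrived.

```python
import struct

def frame(bindata):
    """Prefix bindata (bytes) with its length as a 4-byte network-order int."""
    return struct.pack("!i", len(bindata)) + bindata

def unframe(buffer):
    """Split one framed message off the front of buffer.

    Returns (message, remaining_bytes). Assumes buffer holds at least
    one complete framed message.
    """
    first_four_bytes = buffer[:4]
    (messagesize,) = struct.unpack("!i", first_four_bytes)
    return buffer[4:4 + messagesize], buffer[4 + messagesize:]

bindata = 'ü'.encode('utf-8')         # 1 character, 2 bytes
framed = frame(bindata)
message, rest = unframe(framed + b'extra')

print(framed)                   # b'\x00\x00\x00\x02\xc3\xbc'
print(message.decode('utf-8'))  # ü
print(rest)                     # b'extra'
```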
Old answer:
In Python 3, the __add__ operator returns what we want:
>>> a = b"\x61"
>>> b = b"\x62"
>>> a + b
b'ab'

Python 3.3 binary to hex function

def bintohex(path):
    hexvalue = []
    pkmfile = open(path,'rb')
    while True:
        buffhex = pkmfile.read(16)
        bufflen = len(buffhex)
        if bufflen == 0: break
        for i in range(bufflen):
            hexvalue.append("%02X" % (ord(buffhex[i])))
I am making a function that will return a list of hex values of a specific file. However, this function doesn't work properly in Python 3.3. How should I modify this code?
File "D:\pkmfile_web\pkmtohex.py", line 12, in bintohex
hexvalue.append("%02X" % (ord(buffhex[i])))
TypeError: ord() expected string of length 1, but int found
There's a module for that :-)
>>> import binascii
>>> binascii.hexlify(b'abc')
b'616263'
In Python 3, indexing a bytes object returns the integer value; there is no need to call ord:
hexvalue.append("%02X" % buffhex[i])
Additionally, there is no need to be manually looping over the indices. Just loop over the bytes object. I've also modified it to use format rather than %:
while True:
    buffhex = pkmfile.read(16)
    if not buffhex:
        break
    for byte in buffhex:
        hexvalue.append(format(byte, '02X'))
You may want to even make bintohex a generator. To do that, you could start yielding values:
yield format(byte, '02X')
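Put together, a generator version might look like the sketch below. The with statement and the temporary file used for the demonstration are additions for illustration, not part of the original function.

```python
import os
import tempfile

def bintohex(path):
    """Yield each byte of the file at path as a two-digit uppercase hex string."""
    with open(path, 'rb') as pkmfile:
        while True:
            buffhex = pkmfile.read(16)
            if not buffhex:
                break
            # In Python 3, iterating over bytes yields integers
            for byte in buffhex:
                yield format(byte, '02X')

# Demonstration on a small temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'abc\xff')
    tmp_path = tmp.name

result = list(bintohex(tmp_path))
os.remove(tmp_path)
print(result)  # ['61', '62', '63', 'FF']
```

Callers that still want a list can wrap the call in list(), as the demonstration does.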
