ord() expected string of length 1, but int found - python

I receive a byte-array buffer from the network, containing many fields. When I try to print the buffer, I get the following error:
TypeError: ord() expected string of length 1, but int found
print(" ".join("{:02X}".format(ord(c)) for c in buf))
How can I fix this?

Python bytearray and bytes objects yield integers when iterating or indexing, not characters. Remove the ord() call:
print(" ".join("{:02X}".format(c) for c in buf))
From the Bytes documentation:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 (attempts to violate this restriction will trigger ValueError). This is done deliberately to emphasise that while many binary formats include ASCII based elements and can be usefully manipulated with some text-oriented algorithms, this is not generally the case for arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not ASCII compatible will usually lead to data corruption).
and further on:
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
I'd not use str.format() where a format() function will do; there is no larger string to interpolate the hex digits into:
print(" ".join([format(c, "02X") for c in buf]))
For str.join() calls, using a list comprehension is marginally faster than a generator expression, because str.join() has to convert the iterable to a list anyway; it needs two passes over the items to build the output.
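A minimal sketch of the fix above, assuming Python 3, where iterating over a bytes or bytearray object yields integers. The helper name hex_dump and the sample buffer are made up for illustration:

```python
def hex_dump(buf):
    """Return the bytes in buf as space-separated two-digit uppercase hex."""
    # Iterating over bytes/bytearray yields ints in range(256), so no ord().
    return " ".join([format(c, "02X") for c in buf])


buf = bytearray(b"\x01\xabHi")  # stand-in for data received from the network
print(hex_dump(buf))  # prints: 01 AB 48 69
```

On Python 3.8 and later, buf.hex(" ").upper() should produce the same text without a helper.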

Related

Index binary type of a number

b'1'[0]
The binary sequence of 1, when indexed, is 49, and I can't figure out why. Likewise, b'2'[0] is 50. What is the underlying binary sequence for these numbers?
What you have there is not a "binary sequence"; it is a bytes literal.
>>> type(b'1')
<class 'bytes'>
A bytes object is an immutable sequence of single bytes, so all numbers in this sequence have to be in range(0, 256). You can construct it from a list of numbers as well:
>>> bytes([50, 33])
b'2!'
So what is this b'' notation all about?
Well, sequences of bytes are often related to text. Not always, but often enough that Python supports a lot of string methods on bytes objects, like capitalize, index and split, as well as this convenient literal syntax where you can enter text and have it be equivalent to a series of bytes corresponding to that text encoded in ASCII. It's still an immutable sequence of numbers in range(0, 256) under the hood, though, which is why indexing a bytes object gives a number.
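The points above can be checked with a few lines, assuming Python 3. The variable name one is only for illustration:

```python
one = b"1"

# Indexing gives the integer value of the byte; "1" is code 49 in ASCII.
assert one[0] == 49 == ord("1")

# Slicing gives a bytes object of length 1 instead of an integer.
assert one[0:1] == b"1"

# The literal syntax and the list-of-integers constructor build equal objects.
assert bytes([50, 33]) == b"2!"

# The bit pattern behind 49, for anyone after the actual binary digits.
print(format(one[0], "08b"))  # prints: 00110001
```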

Python 3 string index lookup is O(1)?

Short story:
Is Python 3 unicode string lookup O(1) or O(n)?
Long story:
Index lookup of a character in a C char array is constant time O(1) because we can with certainty jump to a contiguous memory location:
const char* mystring = "abcdef";
char its_d = mystring[3];
It's the same as saying:
char its_d = *(mystring + 3);
This works because sizeof(char) is 1 in C99 and, with ASCII, one character fits in one byte.
Now, in Python 3, now that string literals are unicode strings, we have the following:
>>> mystring = 'ab€cd'
>>> len(mystring)
5
>>> mybytes = mystring.encode('utf-8')
>>> len(mybytes)
7
>>> mybytes
b'ab\xe2\x82\xaccd'
>>> mystring[2]
'€'
>>> mybytes[2]
226
>>> ord(mystring[2])
8364
Being UTF-8 encoded, byte 2 is > 127, so the third character (index 2) uses a multibyte representation.
I can only conclude that an index lookup in a Python string CANNOT be O(1), because of the multibyte representation of characters. That would mean that mystring[2] is O(n), and that some on-the-fly interpretation of the memory array is performed in order to find the character at that index. If that's the case, did I miss some relevant documentation stating this?
I ran a very basic benchmark, but I cannot infer O(n) behaviour from it: https://gist.github.com/carlos-jenkins/e3084a07402ccc25dfd0038c9fe284b5
$ python3 lookups.py
Allocating memory...
Go!
String lookup: 0.513942 ms
Bytes lookup : 0.486462 ms
EDIT: Updated with better example.
UTF-8 is the default source encoding for Python 3, but that only concerns source files. Internally, a string object stores its characters as fixed-size elements in both Python 2 and Python 3. One of the results is that accessing characters in Python (Unicode) string objects by index has O(1) cost.
The code and results you presented do not demonstrate otherwise. You convert a string to a UTF-8-encoded byte sequence, and we all know that UTF-8 uses variable-length code sequences, but none of that says anything about the internal representation of the original string.
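One way to see the fixed-size storage for yourself is sketched below. It assumes CPython 3.3 or later, where each string picks a single element width of 1, 2 or 4 bytes (PEP 393); other implementations may report different sizes. The helper name bytes_per_char is made up for illustration:

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the per-character storage of a string made only of ch."""
    # Comparing two lengths cancels out the fixed object header.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


for ch in ("a", "\u20ac", "\U0001F600"):
    # Each string has one fixed element width, so s[i] is a plain array
    # lookup regardless of how many bytes UTF-8 would need for the text.
    print(repr(ch), bytes_per_char(ch), len(ch.encode("utf-8")))
```

The middle column is the in-memory width per character, which stays constant within a string; the last column is the UTF-8 length, which only matters once the string is encoded.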

How to take an integer array and convert it into other types?

I'm currently trying to take integer arrays that actually represent other data types and convert them into the correct datatype.
So for example, suppose I have the integer array [1196773188, 542327116]. I learn from some other function that this array represents a string, convert it, and find that it spells "DOUGLAS". The first number is 0x47554F44 in hexadecimal and the second is 0x2053414C. Using a hex-to-string converter, these correspond to the strings 'GUOD' and ' SAL' respectively, which spell "DOUGLAS " (with a trailing space) once each group is reversed. The letters are backwards within each element of the array, probably because the bytes are stored in little-endian order, although I might be mistaken on that.
These integer arrays could represent a number of datatypes, including strings, booleans, and floats.
I need to use Python 2.7, so I unfortunately can't use the bytes function.
Is there a simple way to convert an integer array to its corresponding datatype?
It seems that the struct module is the best way to go when converting between different types like this:
import struct
bufferstr = ""
dougarray = [1196773188, 542327116]
for num in dougarray:
    # "<i" packs each number as a 4-byte little-endian signed int
    bufferstr += struct.pack("<i", num)
print bufferstr # Result is 'DOUGLAS ' (note the trailing space)
From this point on we can easily convert 'DOUGLAS' to any datatype we want using struct.unpack():
print struct.unpack("<f", bufferstr[0:4]) # Result is (54607.265625,)
We can only unpack as many bytes at a time as the format string calls for, however; struct.calcsize() reports that size. Thank you all for the suggestions!
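The loop above can also be collapsed into a single struct.pack() call with a repeat count. This is only a sketch: the helper name ints_to_bytes is made up, and it assumes the values are 32-bit signed integers stored little-endian, as in the question. It should run on both Python 2.7 and Python 3:

```python
import struct


def ints_to_bytes(values):
    """Pack 32-bit signed ints into one little-endian byte string."""
    # "<" forces little-endian with standard sizes; "%di" repeats the
    # 4-byte int code once per value, e.g. "<2i" for two values.
    return struct.pack("<%di" % len(values), *values)


raw = ints_to_bytes([1196773188, 542327116])
print(repr(raw))

# The same 8 bytes can be reinterpreted as other types.
print(struct.unpack("<2f", raw))  # as two 32-bit floats
print(struct.unpack("<8?", raw))  # as eight booleans (non-zero means True)
```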

How to extract and change a sequence of bytes from a serial packet to the correct integer representation in python?

374c4f4f00000800ff74**d102**29190300006f00fffffffffffffffffffff
This is the serial packet I am processing using pyserial. The two bytes in bold correspond to a real-world measurement of 721 (decimal), or 02d1 (hex). How do I extract those bytes in Python and get the correct int value, which is 721?
Converting to and from such byte strings is quickly and easily done with the struct library functions pack/pack_into and unpack/unpack_from.
While it is normally best practice to pack/unpack the entire packet, you can use the _into and _from versions to selectively manipulate packets.
In your case:
>>> import struct
>>> val # Generated using binascii.unhexlify
b'7LOO\x00\x00\x08\x00\xfft\xd1\x02)\x19\x03\x00\x00o\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
>>> struct.unpack_from('<H', val, 10)
(721,) # Note the return is a tuple so you need the 0th element
>>> struct.unpack_from('<H', val, 10)[0]
721
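For completeness, here is one way val might have been produced from the hex text in the question. The question's string has an odd number of hex digits, so this sketch drops the dangling final nibble; that is an assumption about where the packet was cut off, not something the question confirms:

```python
import binascii
import struct

s = "374c4f4f00000800ff74d10229190300006f00fffffffffffffffffffff"

# unhexlify needs an even number of hex digits; dropping the dangling
# final nibble is an assumption about where the packet was truncated.
val = binascii.unhexlify(s[: len(s) - len(s) % 2])

# "<H" = little-endian unsigned 16-bit integer, read at byte offset 10.
(measurement,) = struct.unpack_from("<H", val, 10)
print(measurement)  # prints: 721
```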
More info
>>> import struct
>>> help (struct.unpack)
Help on built-in function unpack in module _struct:
unpack(...)
unpack(fmt, buffer) -> (v1, v2, ...)
Return a tuple containing values unpacked according to the format string
fmt. Requires len(buffer) == calcsize(fmt). See help(struct) for more
on format strings.
>>> help (struct.pack)
Help on built-in function pack in module _struct:
pack(...)
pack(fmt, v1, v2, ...) -> bytes
Return a bytes object containing the values v1, v2, ... packed according
to the format string fmt. See help(struct) for more on format strings.
>>> help(struct)
Help on module struct:
NAME
struct
DESCRIPTION
Functions to convert between Python values and C structs.
Python bytes objects are used to hold the data representing the C struct
and also as format strings (explained below) to describe the layout of data
in the C struct.
The optional first format char indicates byte order, size and alignment:
@: native order, size & alignment (default)
=: native order, std. size & alignment
<: little-endian, std. size & alignment
>: big-endian, std. size & alignment
!: same as >
The remaining chars indicate types of args and must match exactly;
these can be preceded by a decimal repeat count:
x: pad byte (no data); c:char; b:signed byte; B:unsigned byte;
?: _Bool (requires C99; if not available, char is used instead)
h:short; H:unsigned short; i:int; I:unsigned int;
l:long; L:unsigned long; f:float; d:double.
Special cases (preceding decimal count indicates length):
s:string (array of char); p: pascal string (with count byte).
Special cases (only available in native format):
n:ssize_t; N:size_t;
P:an integer type that is wide enough to hold a pointer.
Special case (not in native mode unless 'long long' in platform C):
q:long long; Q:unsigned long long
Whitespace between formats is ignored.
The variable struct.error is an exception raised on errors.
Your hex-encoded string is of odd length, so I don't know where the padding is missing from; see below:
In [18]: s = '374c4f4f00000800ff74d10229190300006f00fffffffffffffffffffff0' # a nibble of padding at the end
In [19]: buffer = binascii.unhexlify(s)
In [20]: buffer
Out[20]: b'7LOO\x00\x00\x08\x00\xfft\xd1\x02)\x19\x03\x00\x00o\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xf0'
In [21]: struct.unpack('<10BH18B', buffer)
Out[21]:
(55,
76,
79,
79,
0,
0,
8,
0,
255,
116,
721,
...
For more information on what the format strings in pack and unpack can be, see the documentation. In short, < stands for little-endian with standard sizes, B for unsigned char (1 byte), and H for unsigned short (2 bytes).
Since the format strings look somewhat strange at first sight, here is a solution based on the above answer:
The command unhexlify converts the 4-character ASCII hex representation back to the two-byte binary representation of your integer.
The '<' takes care of the reversed (little-endian) byte order of your integer (Formatting details).
The 'h' means that we are dealing with a two-byte signed integer.
Hope that helps.
import struct
from binascii import unhexlify
s = "374c4f4f00000800ff74d10229190300006f00fffffffffffffffffffff"
s1 = s[20:24]  # hex digits 20..23 are bytes 10 and 11 of the packet
print(struct.unpack('<h', unhexlify(s1))[0])
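If only Python 3 needs to be supported, the same field can be read without struct at all. This is a sketch of an alternative, not what the answers above use; it relies on the built-in bytes.fromhex() and int.from_bytes():

```python
s = "374c4f4f00000800ff74d10229190300006f00fffffffffffffffffffff"

# Hex digits 20..23 are bytes 10 and 11 of the packet: d1 02.
field = bytes.fromhex(s[20:24])

# Little-endian: the first byte is the least significant one.
value = int.from_bytes(field, byteorder="little")
print(value)  # prints: 721
```

int.from_bytes() treats the bytes as unsigned unless signed=True is passed, which matches the 'H' format code rather than 'h'.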

str_to_a32 - What does this function do?

I need to rewrite a Python script in Objective-C. That's not too hard, since Python is easily readable, but I'm struggling a bit with this piece of code.
def str_to_a32(b):
    if len(b) % 4:
        # pad to multiple of 4
        b += '\0' * (4 - len(b) % 4)
    return struct.unpack('>%dI' % (len(b) / 4), b)
What is this function supposed to do?
I'm not positive, but I'm using the documentation to take a stab at it.
Looking at the docs, we're going to return a tuple based on the format string:
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The item coming in (b) is probably a byte buffer (represented as a string). In the documentation's examples these are written with the \x escape, which consumes the next two characters as hex.
It appears the format string is
'>%dI' % (len(b) / 4)
The % and %d are going to put a number into the format string, so if the length of b is 32 the format string becomes
`>8I`
The first part of the format string is >, which the documentation says is setting the byte order to big-endian and size to standard.
The I says it will be an unsigned int with size 4 (docs), and the 8 in front of it means it will be repeated 8 times.
>IIIIIIII
So I think this is saying: take this byte buffer, make sure it's a multiple of 4 by appending as many 0x00s as is necessary, then unpack that into a tuple with as many unsigned integers as there are blocks of 4 bytes in the buffer.
Looks like it's supposed to take an input array of bytes represented as a string and unpack them as big-endian (the '>') unsigned ints (the 'I'). The formatting codes are explained in http://docs.python.org/2/library/struct.html
This takes a string and converts it into a tuple of Unsigned Integers. If you look at the python struct documentation you will see how it works. In a nutshell it handles conversions between Python values and C structs represented as Python strings for handling binary data stored in files (unceremoniously copied from the link provided).
In your case, the function takes a string, b, and adds some extra characters to make sure that its length is a multiple of the standard size of an unsigned int (see link), and then converts it into a tuple of integers using the big-endian representation of the characters. This is the '>' part. The 'I' part says to use unsigned integers.
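To make the behaviour concrete, here is a sketch of the same function adapted to Python 3, where the input has to be a bytes object and the word count needs integer division. The adaptation and the sample input are mine, not part of the original script:

```python
import struct


def str_to_a32(b):
    """Split a byte string into a tuple of big-endian unsigned 32-bit ints."""
    if len(b) % 4:
        # Pad with zero bytes up to the next multiple of 4.
        b += b"\0" * (4 - len(b) % 4)
    # Integer division gives the number of 4-byte words, e.g. ">2I".
    return struct.unpack(">%dI" % (len(b) // 4), b)


# b"abcde" is padded to b"abcde\0\0\0", i.e. 0x61626364 and 0x65000000.
print(str_to_a32(b"abcde"))  # prints: (1633837924, 1694498816)
```

An Objective-C port would need the same three steps: zero-pad to a multiple of 4, walk the buffer in 4-byte steps, and assemble each word most significant byte first.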
