str_to_a32 - What does this function do? - python

I need to rewrite some Python script in Objective-C. It's not that hard since Python is easily readable, but this piece of code trips me up a bit.
def str_to_a32(b):
    if len(b) % 4:
        # pad to multiple of 4
        b += '\0' * (4 - len(b) % 4)
    return struct.unpack('>%dI' % (len(b) / 4), b)
What is this function supposed to do?

I'm not positive, but I'm using the documentation to take a stab at it.
Looking at the docs, we're going to return a tuple based on the format string:
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The item coming in (b) is probably a byte buffer (represented as a string) - looking at the examples, they are represented with the \x escape, which consumes the next two characters as hex.
It appears the format string is
'>%dI' % (len(b) / 4)
The % and %d are going to put a number into the format string, so if the length of b is 32 the format string becomes
>8I
The first part of the format string is >, which the documentation says is setting the byte order to big-endian and size to standard.
The I says it will be an unsigned int with size 4 (docs), and the 8 in front of it means it will be repeated 8 times.
>IIIIIIII
So I think this is saying: take this byte buffer, make sure it's a multiple of 4 by appending as many 0x00s as is necessary, then unpack that into a tuple with as many unsigned integers as there are blocks of 4 bytes in the buffer.
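To see it in action, here is a minimal check (Python 2 style, matching the original script; the 6-byte sample input is my own invention):
import struct

b = '\x00\x00\x02\xd1\xff\xff'              # 6 bytes - not a multiple of 4
b += '\0' * (4 - len(b) % 4)                 # padded to 8 bytes
print(struct.unpack('>%dI' % (len(b) / 4), b))
# (721, 4294901760) - two big-endian unsigned 32-bit ints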

Looks like it's supposed to take an input array of bytes represented as a string and unpack them as big-endian (the ">") unsigned ints (the 'I'). The formatting codes are explained in http://docs.python.org/2/library/struct.html

This takes a string and converts it into a tuple of Unsigned Integers. If you look at the python struct documentation you will see how it works. In a nutshell it handles conversions between Python values and C structs represented as Python strings for handling binary data stored in files (unceremoniously copied from the link provided).
In your case, the function takes a string, b, and adds some extra characters to make sure its length is a multiple of the standard size of an unsigned int (see link), and then converts it into a tuple of integers using the big-endian representation of the characters. This is the '>' part. The I part says to use unsigned integers.

Related

Convert binary to signed, little endian 16bit integer in Python

Trying to convert a binary list into signed 16-bit little-endian integers
input_data = [['1100110111111011','1101111011111111','0010101000000011'],['1100111111111011','1101100111111111','0010110100000011']]
Desired Output =[[-1074, -34, 810],[-1703, -39, 813]]
This is what I've got so far. It's been adapted from: Hex string to signed int in Python 3.2?,
Conversion from HEX to SIGNED DEC in python
results = []
for i in input_data:
    hex_convert = [hex(int(x,2)) for x in i]
    convert = [int(y[4:6] + y[2:4], 16) for y in hex_convert]
    results.append(convert)
print (results)
output: [[64461, 65502, 810], [64463, 65497, 813]]
This works fine, but the above are unsigned integers. I need signed integers capable of handling negative values. I then tried a different approach:
results_2 = []
for i in input_data:
    hex_convert = [hex(int(x,2)) for x in i]
    to_bytes = [bytes(j, 'utf-8') for j in hex_convert]
    split_bits = [int(k, 16) for k in to_bytes]
    convert_2 = [int.from_bytes(b, byteorder = 'little', signed = True) for b in to_bytes]
    results_2.append(convert_2)
print (results_2)
Output: [[108191910426672, 112589973780528, 56282882144304], [108191943981104, 112589235583024, 56282932475952]]
This result is even more wild than the first. I know my approach is wrong, and it doesn't help that I've never been able to get my head around binary conversion etc., but I feel I'm on the right path with:
(b, byteorder = 'little', signed = True)
but can't work out where I'm wrong. Any help explaining this concept would be greatly appreciated.
This result is even more wild than the first. I know my approach is wrong... but can't work out where I'm wrong.
The problem is in the conversion to bytes. Let's look at it a step at a time:
int(x, 2)
Fine; we treat the string as a base-2 representation of the integer value, and get that integer. Only problem is it's a) unsigned and b) big-endian.
hex(int(x,2))
What this does is create a string representation of the integer, in base 16, with a 0x prefix. Notably, there are two text characters per byte that we want. This is already heading us down the wrong path.
You might have thought of using hexadecimal because you've seen \xAB style escapes inside string representations. This is a completely different thing. The string '\xAB' contains one character. The string '0xAB' contains four.
From there, everything else is still nonsense. Converting to bytes with a text encoding just means that the text character 0 for example is replaced with the byte value 48 (since in UTF-8 it's encoded with a single byte with that value). For this data we get the same results with UTF-8 that we would by assuming plain ASCII (since UTF-8 is "ASCII transparent" and there are no non-ASCII characters in the text).
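To make that concrete, here is roughly what the posted code does to the first value; the intermediate comments are mine, but the final number is exactly the first "wild" result from the question:
text = hex(int('1100110111111011', 2))    # '0xcdfb' - six text characters
data = bytes(text, 'utf-8')               # b'0xcdfb' - six bytes: 48, 120, 99, 100, 102, 98
print(int.from_bytes(data, byteorder='little', signed=True))
# 108191910426672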
So how do we do it?
We want to convert the integer from the first step into the bytes used to represent it. Just as there is a .from_bytes class method allowing us to create an integer from underlying bytes, there is an instance method allowing us to get the bytes that would represent the integer.
So, we use .to_bytes, specifying the length, signedness and endianness that was assumed when we created the int from the binary string - that gives us bytes that correspond to that string. Then, we re-create the integer from those bytes, except now specifying the proper signedness and endianness. The reason that .to_bytes makes us specify a length is because the integer doesn't have a particular length - there are a minimum number of bytes required to represent it, but you could use as many more as you like. (This is especially important if you want to handle signed values, since it will do sign-extension automatically.)
Thus:
results_2 = []
for i in input_data:
    values = [int(x,2) for x in i]
    as_bytes = [x.to_bytes(2, byteorder='big', signed=False) for x in values]
    reinterpreted = [int.from_bytes(x, byteorder='little', signed=True) for x in as_bytes]
    results_2.append(reinterpreted)
But let's improve the organization of the code a bit. I will first make a function to handle a single integer value, and then we can use comprehensions to process the list. In fact, we can use nested comprehensions for the nested list.
def as_signed_little(binary_str):
    # This time, taking advantage of positional args and default values.
    as_bytes = int(binary_str, 2).to_bytes(2, 'big')
    return int.from_bytes(as_bytes, 'little', signed=True)

# And now we can do:
results_2 = [[as_signed_little(x) for x in i] for i in input_data]

How to extract and change a sequence of bytes from a serial packet to the correct integer representation in python?

374c4f4f00000800ff74**d102**29190300006f00fffffffffffffffffffff
This is the serial packet I am processing using pyserial. The two bytes in bold correspond to a real-world measurement whose value is 721 (decimal) or 02d1 (hex). How do I extract those bytes in Python and get the correct int value, which is 721?
Processing to and from such byte strings is quickly and easily done with the struct library functions pack/pack_into and unpack/unpack_from:
While it is normally best practice to pack/unpack the entire packet, you can use the _into and _from versions to selectively manipulate packets.
In your case:
>>> import struct
>>> val # Generated using binascii.unhexlify
b'7LOO\x00\x00\x08\x00\xfft\xd1\x02)\x19\x03\x00\x00o\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
>>> struct.unpack_from('<H', val, 10)
(721,) # Note the return is a tuple, so you need the 0th element
>>> struct.unpack_from('<H', val, 10)[0]
721
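Going the other way, struct.pack_into can overwrite just that field in place; this is a small sketch, assuming you copy the packet into a mutable bytearray first (the replacement value 1234 is arbitrary):
>>> buf = bytearray(val)                    # writable copy of the packet
>>> struct.pack_into('<H', buf, 10, 1234)   # overwrite the 2-byte field at offset 10
>>> struct.unpack_from('<H', buf, 10)[0]
1234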
More info
>>> import struct
>>> help (struct.unpack)
Help on built-in function unpack in module _struct:
unpack(...)
unpack(fmt, buffer) -> (v1, v2, ...)
Return a tuple containing values unpacked according to the format string
fmt. Requires len(buffer) == calcsize(fmt). See help(struct) for more
on format strings.
>>> help (struct.pack)
Help on built-in function pack in module _struct:
pack(...)
pack(fmt, v1, v2, ...) -> bytes
Return a bytes object containing the values v1, v2, ... packed according
to the format string fmt. See help(struct) for more on format strings.
>>> help(struct)
Help on module struct:
NAME
struct
DESCRIPTION
Functions to convert between Python values and C structs.
Python bytes objects are used to hold the data representing the C struct
and also as format strings (explained below) to describe the layout of data
in the C struct.
The optional first format char indicates byte order, size and alignment:
@: native order, size & alignment (default)
=: native order, std. size & alignment
<: little-endian, std. size & alignment
>: big-endian, std. size & alignment
!: same as >
The remaining chars indicate types of args and must match exactly;
these can be preceded by a decimal repeat count:
x: pad byte (no data); c:char; b:signed byte; B:unsigned byte;
?: _Bool (requires C99; if not available, char is used instead)
h:short; H:unsigned short; i:int; I:unsigned int;
l:long; L:unsigned long; f:float; d:double.
Special cases (preceding decimal count indicates length):
s:string (array of char); p: pascal string (with count byte).
Special cases (only available in native format):
n:ssize_t; N:size_t;
P:an integer type that is wide enough to hold a pointer.
Special case (not in native mode unless 'long long' in platform C):
q:long long; Q:unsigned long long
Whitespace between formats is ignored.
The variable struct.error is an exception raised on errors.
Your hex-encoded string is of odd length, so I don't know where the padding is missing from; see below:
In [18]: s = '374c4f4f00000800ff74d10229190300006f00fffffffffffffffffffff0' # a nibble of padding at the end
In [19]: buffer = binascii.unhexlify(s)
In [20]: buffer
Out[20]: b'7LOO\x00\x00\x08\x00\xfft\xd1\x02)\x19\x03\x00\x00o\x00\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xf0'
In [21]: struct.unpack('<10BH18B', buffer)
Out[21]:
(55,
76,
79,
79,
0,
0,
8,
0,
255,
116,
721,
...
For more information on what the format strings in pack and unpack can be, see the documentation. In short, < stands for little-endian, B for unsigned char (8-bit width), and H for unsigned short (16-bit width).
Since the actual format strings are somewhat strange at first sight, I have given you a solution based on the above answer:
The unhexlify call converts the 4-character ASCII hex representation back into the two-byte binary representation of your integer.
The '<' takes care of the reversed (little-endian) byte order of your integer (Formatting details).
The 'h' means that we are dealing with a two-byte signed integer.
Hope that helps.
import struct
from binascii import unhexlify
s ="374c4f4f00000800ff74d10229190300006f00fffffffffffffffffffff"
s1= s[20:24]
print(struct.unpack('<h', unhexlify(s1))[0])  # -> 721

Convert hex-string to integer with python

Note that the problem is not hex to decimal but a string of hex values to integer.
Say I've got a string from a hexdump (e.g. '6c 02 00 00'), so I need to convert that into actual bytes first, and then get the integer it represents... (this particular one would be 620 as an int16 and as an int32)
I tried a lot of things but confused myself more. Is there a quick way to do such a conversion in python (preferably 3.x)?
update From Python 3.7 on, bytes.fromhex will ignore whitespace - so the straightforward thing to do is parse the string into a bytes object, and then read it as an integer:
In [10]: int.from_bytes(bytes.fromhex("6c 02 00 00"), byteorder="little")
Out[10]: 620
original answer
Not only is that a string, but it is in little-endian order - meaning that just removing the spaces and using an int(xx, 16) call will not work. Nor does it have the actual byte values as 4 arbitrary 0-255 numbers (in which case struct.unpack would work).
I think a nice approach is to swap the components back into "human readable" order, and use the int call - thus:
number = int("".join("6c 02 00 00".split()[::-1]), 16)
What happens there: the first part of the expression is the split - it breaks the string at the spaces and provides a list with four strings, two digits in each. The [::-1] special slice goes next - it means roughly "give me the elements of the former sequence, starting at the end and going back 1 element at a time" - which is a common Python idiom to reverse any sequence.
This reversed sequence is used in the call to "".join(...) - which basically uses the empty string as a concatenator for every element of the sequence - the result of this call is "0000026c". With this value, we just call Python's int class, which accepts a second, optional parameter denoting the base in which the number in the first argument is expressed.
>>> int("".join("6c 02 00 00".split()[::-1]), 16)
620
Another option is to cumulatively add the conversion of each 2 digits, shifted according to their position - this can also be done in a single expression using reduce, though a 4-line Python for loop would be more readable:
>>> from functools import reduce #not needed in Python2.x
>>> reduce(lambda x, y: x + (int(y[1], 16)<<(8 * y[0]) ), enumerate("6c 02 00 00".split()), 0)
620
update The OP just said he does not actually have the "spaces" in the string - in that case, one can use just about the same method, but taking each two digits instead of the split() call:
reduce(lambda x, y: x + (int(y[1], 16)<<(8 * y[0]//2) ), ((i, a[i:i+2]) for i in range(0, len(a), 2)) , 0)
(where a is the variable with your digits, of course) -
Or, convert it to an actual 4-byte number in memory using the hex codec, and unpack the number with struct - this may be more semantically correct for your code:
import codecs
import struct
struct.unpack("<I", codecs.decode("6c020000", "hex") )[0]
So the approach here is to turn each 2 digits into an actual byte in memory in the bytes object returned by the codecs.decode call, and use struct to read the 4 bytes in the buffer as a single 32-bit integer.
You can use unhexlify() to convert the hex string to its binary form, and then use struct.unpack() to decode the little endian value into an int:
>>> from struct import unpack
>>> from binascii import unhexlify
>>> n = unpack('<i', unhexlify('6c 02 00 00'.replace(' ','')))[0]
>>> n
620
The format string '<i' means little endian signed integer. You can substitute with '<I' or '<L' for unsigned int or long (both 4 bytes).
If the data does not contain spaces this simplifies to
>>> n = unpack('<i', unhexlify('6c020000'))[0]
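Continuing that session, the signed/unsigned distinction only shows up once the top bit is set; a quick illustration with made-up data:
>>> unpack('<i', unhexlify('ffffffff'))[0]
-1
>>> unpack('<I', unhexlify('ffffffff'))[0]
4294967295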

ord() expected string of length 1, but int found

I receive a byte-array buffer from network, containing many fields. When I want to print the buffer, I get the following error:
TypeError: ord() expected string of length 1, but int found
print(" ".join("{:02X}".format(ord(c)) for c in buf))
How can I fix this?
Python bytearray and bytes objects yield integers when iterating or indexing, not characters. Remove the ord() call:
print(" ".join("{:02X}".format(c) for c in buf))
From the Bytes documentation:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256 (attempts to violate this restriction will trigger ValueError). This is done deliberately to emphasise that while many binary formats include ASCII based elements and can be usefully manipulated with some text-oriented algorithms, this is not generally the case for arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not ASCII compatible will usually lead to data corruption).
and further on:
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1)
I'd not use str.format() where a format() function will do; there is no larger string to interpolate the hex digits into:
print(" ".join([format(c, "02X") for c in buf]))
For str.join() calls, using a list comprehension is marginally faster, as the str.join() call has to convert the iterable to a list anyway; it needs to do a double scan to build the output.
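For example, with a small made-up buffer (an interactive sketch; the byte values are arbitrary):
>>> buf = bytes([0x37, 0x4c, 0x4f, 0x4f, 0x00])
>>> buf[0]                                  # indexing bytes yields an int, so ord() is unnecessary
55
>>> " ".join("{:02X}".format(c) for c in buf)
'37 4C 4F 4F 00'
>>> " ".join([format(c, "02X") for c in buf])
'37 4C 4F 4F 00'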

Python unpack binary data, numeric of length 12

I have a file with big-endian binary data. There are two numeric fields; the first is 8 bytes long and the second is 12 bytes long. How can I unpack the two numbers?
I am using the Python module struct (https://docs.python.org/2/library/struct.html) and it works for the first field
num1 = struct.unpack('>Q',payload[0:8])
but I don't know how I can unpack the second number. If I treat it as char(12), then I get something like '\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01'.
Thanks.
I think you should create a new 16-byte string of bytes for the second number: fill the last 12 bytes with the bytes that hold your number and the first 4 with zeros.
Then decode the byte string with unpack using the format >QQ, say into numHI and numLO variables. Then you get the final number as: number = numHI * 2^64 + numLO*. AFAIR integers in Python can be (almost) as large as you wish, so you will have no problems with overflow. That's only a rough idea; please comment if you have problems writing it in actual Python code, and I'll edit my answer to provide more help.
*^ here means exponentiation, so in Python use ** (math.pow returns a float and would lose precision for values this large). Alternatively, you can use a bit shift: number = (numHI << 64) + numLO - the parentheses matter, since + binds tighter than <<.
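A rough sketch of that idea in Python 3, using the 12-byte value shown in the question, with int.from_bytes as a cross-check:
import struct

# the 12-byte big-endian field, as shown in the question
field = b'\x00\xe3AC\x00\x00\x00\x06\x00\x00\x00\x01'

# left-pad to 16 bytes so it fills two big-endian unsigned 64-bit slots
hi, lo = struct.unpack('>QQ', b'\x00' * 4 + field)
number = (hi << 64) + lo                    # numHI * 2**64 + numLO

# the same result without struct, letting int do the work
assert number == int.from_bytes(field, byteorder='big')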
