I am faced with a problem in Python, and I think I don't understand how signed numbers are handled there. My logic works in Java, where everything is signed, so I need some help in Python.
I have some bytes that are encoded in hex, and I need to decode them and interpret them as numbers. The protocol is defined.
Say the input may look like:
raw = '016402570389FFCF008F1205DB2206CA'
And I decode like this:
import binascii
bin_bytes = binascii.a2b_hex(raw)
lsb = bin_bytes[5] & 0xff
msb = bin_bytes[6] << 8
aNumber = int(lsb | msb)
print(" X: " + str(aNumber / 4000.0))
After dividing by 4000.0, X can be in a range of -0.000025 to +0.25.
This logic works when X is in the positive range. When X is expected to be negative, I am getting back a positive number.
I think I am not handling "msb" correctly when it is a signed number.
How should I handle negative signed numbers in Python?
Any tips much appreciated.
You can use Python's struct module to convert the byte string to integers. It takes care of endianness and sign extension for you. I guess you are trying to interpret this 16-byte string as 8 2-byte signed integers, in big-endian byte order. The format string for this is '>8h'. The > character tells Python to interpret the string as big-endian, 8 means 8 of the following data type, and h means signed short integers.
import struct
nums = struct.unpack('>8h', bin_bytes)
Now nums is a tuple of integers that you can process further.
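As a sketch of that further processing, the tuple can be scaled by the 4000.0 factor from the question. This assumes the payload really is eight big-endian signed 16-bit fields, which the question does not confirm.

```python
import binascii
import struct

raw = '016402570389FFCF008F1205DB2206CA'
bin_bytes = binascii.a2b_hex(raw)

# Assumption: eight big-endian signed 16-bit integers.
nums = struct.unpack('>8h', bin_bytes)
print(nums)

# Scale every field the way the question scales X.
scaled = [n / 4000.0 for n in nums]
print(scaled)
```

Fields whose top bit is set (such as the bytes FF CF) come out negative without any extra work.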
I'm not quite sure if your data is little or big endian. If it is little-endian, you can use < to indicate that in the struct.unpack format string.
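The code in the question reads the low byte at index 5 and the high byte at index 6, which corresponds to a little-endian field starting at offset 5. Here is a hedged Python 3 sketch of that reading, once with struct.unpack_from and once with a manual sign fix. The offset and byte order are taken from the question's code, not from a known protocol definition.

```python
import binascii
import struct

raw = '016402570389FFCF008F1205DB2206CA'
bin_bytes = binascii.a2b_hex(raw)

# Little-endian signed short starting at byte offset 5.
(value,) = struct.unpack_from('<h', bin_bytes, 5)

# Manual equivalent: Python ints never overflow, so the sign bit
# of a 16-bit value has to be applied explicitly.
unsigned = bin_bytes[5] | (bin_bytes[6] << 8)
manual = unsigned - 0x10000 if unsigned & 0x8000 else unsigned

print(value, manual, value / 4000.0)
```

This is the step that Java does implicitly with its fixed-width signed types and that Python's unbounded integers leave to you.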
Trying to convert a binary list into a signed 16-bit little-endian integer
input_data = [['1100110111111011','1101111011111111','0010101000000011'],['1100111111111011','1101100111111111','0010110100000011']]
Desired Output =[[-1074, -34, 810],[-1703, -39, 813]]
This is what I've got so far. It's been adapted from: Hex string to signed int in Python 3.2?,
Conversion from HEX to SIGNED DEC in python
results = []
for i in input_data:
    hex_convert = [hex(int(x,2)) for x in i]
    convert = [int(y[4:6] + y[2:4], 16) for y in hex_convert]
    results.append(convert)
print (results)
output: [[64461, 65502, 810], [64463, 65497, 813]]
This works fine, but the above are unsigned integers. I need signed integers capable of handling negative values. I then tried a different approach:
results_2 = []
for i in input_data:
    hex_convert = [hex(int(x,2)) for x in i]
    to_bytes = [bytes(j, 'utf-8') for j in hex_convert]
    split_bits = [int(k, 16) for k in to_bytes]
    convert_2 = [int.from_bytes(b, byteorder = 'little', signed = True) for b in to_bytes]
    results_2.append(convert_2)
print (results_2)
Output: [[108191910426672, 112589973780528, 56282882144304], [108191943981104, 112589235583024, 56282932475952]]
This result is even more wild than the first. I know my approach is wrong, and it doesn't help that I've never been able to get my head around binary conversion etc., but I feel I'm on the right path with:
(b, byteorder = 'little', signed = True)
but can't work out where I'm wrong. Any help explaining this concept would be greatly appreciated.
This result is even more wild than the first. I know my approach is wrong... but can't work out where I'm wrong.
The problem is in the conversion to bytes. Let's look at it a step at a time:
int(x, 2)
Fine; we treat the string as a base-2 representation of the integer value, and get that integer. Only problem is it's a) unsigned and b) big-endian.
hex(int(x,2))
What this does is create a string representation of the integer, in base 16, with a 0x prefix. Notably, there are two text characters per byte that we want. This is already heading us down the wrong path.
You might have thought of using hexadecimal because you've seen \xAB style escapes inside string representations. This is a completely different thing. The string '\xAB' contains one character. The string '0xAB' contains four.
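A quick way to see the difference is to measure both strings:

```python
escaped = '\xAB'   # one character, code point 0xAB
literal = '0xAB'   # four characters: '0', 'x', 'A', 'B'

print(len(escaped), len(literal))
print(ord(escaped) == 0xAB)
```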
From there, everything else is still nonsense. Converting to bytes with a text encoding just means that the text character 0 for example is replaced with the byte value 48 (since in UTF-8 it's encoded with a single byte with that value). For this data we get the same results with UTF-8 that we would by assuming plain ASCII (since UTF-8 is "ASCII transparent" and there are no non-ASCII characters in the text).
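To make that concrete, here is a small sketch that follows the first input value through the question's second attempt. The variable names are mine, chosen for illustration.

```python
text = hex(int('1100110111111011', 2))   # the string '0xcdfb'
encoded = bytes(text, 'utf-8')           # six ASCII bytes, not two data bytes

print(list(encoded))

# Reading those six text bytes as one little-endian integer
# reproduces the first "wild" number from the question.
wild = int.from_bytes(encoded, byteorder='little', signed=True)
print(wild)
```

The six bytes are the character codes of '0', 'x', 'c', 'd', 'f' and 'b', so the integer built from them has nothing to do with the original 16 bits.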
So how do we do it?
We want to convert the integer from the first step into the bytes used to represent it. Just as there is a .from_bytes class method allowing us to create an integer from underlying bytes, there is an instance method allowing us to get the bytes that would represent the integer.
So, we use .to_bytes, specifying the length, signedness and endianness that was assumed when we created the int from the binary string - that gives us bytes that correspond to that string. Then, we re-create the integer from those bytes, except now specifying the proper signedness and endianness. The reason that .to_bytes makes us specify a length is because the integer doesn't have a particular length - there are a minimum number of bytes required to represent it, but you could use as many more as you like. (This is especially important if you want to handle signed values, since it will do sign-extension automatically.)
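A minimal sketch of that round trip for a single value, plus an illustration of why the length argument matters for signed values:

```python
value = int('1100110111111011', 2)   # 52731, read as unsigned big-endian
raw_bytes = value.to_bytes(2, byteorder='big', signed=False)
print(raw_bytes)

# Same two bytes, now interpreted as signed little-endian.
reread = int.from_bytes(raw_bytes, byteorder='little', signed=True)
print(reread)

# The length matters for signed values: extra bytes are filled with the sign.
narrow = (-34).to_bytes(2, 'little', signed=True)
wide = (-34).to_bytes(4, 'little', signed=True)
print(narrow, wide)
```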
Thus:
results_2 = []
for i in input_data:
    values = [int(x,2) for x in i]
    as_bytes = [x.to_bytes(2, byteorder='big', signed=False) for x in values]
    reinterpreted = [int.from_bytes(x, byteorder='little', signed=True) for x in as_bytes]
    results_2.append(reinterpreted)
But let's improve the organization of the code a bit. I will first make a function to handle a single integer value, and then we can use comprehensions to process the list. In fact, we can use nested comprehensions for the nested list.
def as_signed_little(binary_str):
    # This time, taking advantage of positional args and default values.
    as_bytes = int(binary_str, 2).to_bytes(2, 'big')
    return int.from_bytes(as_bytes, 'little', signed=True)

# And now we can do:
results_2 = [[as_signed_little(x) for x in i] for i in input_data]
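As an alternative sketch, the reinterpretation step can also be handed to the struct module. The helper name as_signed_little_struct is made up for this example; '<h' is the struct format for one little-endian signed 16-bit integer.

```python
import struct

input_data = [['1100110111111011', '1101111011111111', '0010101000000011'],
              ['1100111111111011', '1101100111111111', '0010110100000011']]

def as_signed_little_struct(binary_str):
    # Hypothetical helper: same idea as above, with struct doing the reinterpretation.
    as_bytes = int(binary_str, 2).to_bytes(2, 'big')
    return struct.unpack('<h', as_bytes)[0]

results = [[as_signed_little_struct(x) for x in row] for row in input_data]
print(results)
```

For this input the first entry of each row comes out as -1075 and -1073 rather than the -1074 and -1703 given as the desired output, so those two figures in the question may be typos.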
I just learned Python (3.x) and I am stuck on converting a hex string to floats. I have this hex string value:
'0x22354942F31AFA42CE6A494311518A43082CAF437C6BD4C35F78FA433BF10F442A5222448D3D3544200749C438295C4468AF6E4406B4804450518A4423B0934450E99CC4'
And I want to turn it into float.
I have tried to use this code:
bs=bytes.fromhex(row[2:])
fmt = '<' + ('H' * (len(bs) // 2))
res=struct.unpack(fmt, bs)
and it gives me the result of 13602.0,16969.0,6899.0,17146.0,27342.0,17225.0,20753.0,17290.0,11272.0,17327.0,27516.0,50132.0,30815.0,17402.0,61755.0,17423.0,21034.0,17442.0,15757.0,17461.0,1824.0,50249.0,10552.0,17500.0,44904.0,17518.0,46086.0,17536.0,20816.0,17546.0,45091.0,17555.0,59728.0,50332.0
After checking it, I found out that what my current code produces is float in base 16, while I need it in base 32 (or maybe not, because I am not sure which base/format), with expected float results of 50.3018875, 125.052635,201.4172,276.633331,350.344,424.839722,500.9404,575.7692,649.2838,724.961731,804.1113,880.644043,954.7407,1029.62573,106.541,1181.50427,1255.291, values which I got from this Calculator Converter.
What should I change in the coding to get the expected results?
Thank you.
Let's break things down here, because you seem to be confused a bit with all of the juggling of representations. You have some hexadecimal string (that's base 16 encoding) of some binary data. That's your 0x22354942F31AFA42CE6A494311.... You correctly identified that you can convert this from its encoded form to python bytes with bytes.fromhex:
hex_encoded = '0x22354942F31AFA42CE6A494311518A43082CAF437C6BD4C35F78FA433BF10F442A5222448D3D3544200749C438295C4468AF6E4406B4804450518A4423B0934450E99CC4'
binary_data = bytes.fromhex(hex_encoded[2:]) # we do 2: to remove the leading '0x'
At this point, unless we know how binary_data was constructed we can't do anything. But we can take some guesses. You know the first few numbers are floating points: 50.3018875, 125.052635, 201.4172, .... Typically floats are encoded using the IEEE 754 standard. The struct module supports three of its binary encodings: half precision (binary16, 16 bits), single precision or float (32 bits), and double precision or double (64 bits). You can see these in the struct documentation; they are format codes 'e', 'f', and 'd', respectively. We can try each to see which (if any) your binary data is encoded as. By trial and error, we discover your data was encoded as 32-bit floats, so you can decode them with:
import struct

FLOAT = 'f'
fmt = '<' + FLOAT * (len(binary_data) // struct.calcsize(FLOAT))
numbers = struct.unpack(fmt, binary_data)
print(numbers)
Why did what you tried not work? Well you used the format code 'H' which is for an unsigned short. This is an integer, which is why you were getting back numbers with no fractional part!
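The trial and error mentioned above can be sketched as follows. Only the first 8 bytes of the question's hex string are used, and little-endian byte order is assumed, as in the question's own attempt.

```python
import struct

# First 8 bytes of the hex string from the question.
sample = bytes.fromhex('22354942F31AFA42')

# Decode the same bytes as half, single and double precision floats.
for code in ('e', 'f', 'd'):
    count = len(sample) // struct.calcsize(code)
    print(code, struct.unpack('<' + code * count, sample))

as_floats = struct.unpack('<2f', sample)
```

Only the 'f' interpretation gives values close to the expected 50.3018875 and 125.052635.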
Hi, I have a 32-bit value that I need to split into its four bytes, convert each byte to ASCII and combine them into a four-letter string. I also need the reverse process. I have been able to do this in one direction in the following ugly way:
## the variable "binword" is a 32 bit value read directly from an MCU, where each byte is an
## ASCII character
char0 = (binword & 0xFF000000) >> 24
char1 = (binword & 0xFF0000) >> 16
char2 = (binword & 0xFF00) >> 8
char3 = (binword & 0xFF)
fourLetterWord = str(unichr(char0))+str(unichr(char1))+str(unichr(char2))+str(unichr(char3))
Now, I find this method really inelegant and time-consuming, so the question is: how do I do this better? And, I guess the more important question: how do I convert the other way?
You should use the struct module's pack and unpack calls for these conversions:
number = 32424234
import struct
result = struct.pack("I", number)
and back:
number = struct.unpack("I", result)[0]
Please refer to the official docs on the struct module for the format-string syntax, and for the markers that set endianness and number size.
https://docs.python.org/2/library/struct.html
On a side note, this is in no way "ASCII"; it is a bytestring.
ASCII refers to a particular text encoding with codes in the 0-127 numeric range.
The point is that you should not think of bytestrings as text if you need a stream of bytes, and much less think of "ASCII" as an alias for text strings, as it can represent less than 1% of the textual characters existing in the world.
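For the four-letter word from the question, a Python 3 sketch could look like this. The value of binword is an example I made up, and the '>' prefix is chosen to match the question's shifts, where the most significant byte becomes the first character.

```python
import struct

binword = 0x54455354                      # example value, bytes spell "TEST"

packed = struct.pack('>I', binword)       # big-endian: most significant byte first
word = packed.decode('ascii')             # only valid if every byte is < 128
print(word)

# And the reverse: four ASCII letters back to a 32-bit integer.
(roundtrip,) = struct.unpack('>I', word.encode('ascii'))
print(hex(roundtrip))
```

The explicit decode and encode steps keep the byte-level data and the text separate, in line with the side note above.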
I need to rewrite a Python script in Objective-C. It's not that hard, since Python is easily readable, but I'm struggling a bit with this piece of code.
def str_to_a32(b):
    if len(b) % 4:
        # pad to multiple of 4
        b += '\0' * (4 - len(b) % 4)
    return struct.unpack('>%dI' % (len(b) / 4), b)
What is this function supposed to do?
I'm not positive, but I'm using the documentation to take a stab at it.
Looking at the docs, we're going to return a tuple based on the format string:
Unpack the string (presumably packed by pack(fmt, ...)) according to the given format. The result is a tuple even if it contains exactly one item. The string must contain exactly the amount of data required by the format (len(string) must equal calcsize(fmt)).
The item coming in (b) is probably a byte buffer (represented as a string) - looking at the examples, they are represented with the \x escape, which consumes the next two characters as hex.
It appears the format string is
'>%dI' % (len(b) / 4)
The % and %d are going to put a number into the format string, so if the length of b is 32 the format string becomes
`>8I`
The first part of the format string is >, which the documentation says is setting the byte order to big-endian and size to standard.
The I says it will be an unsigned int with size 4 (docs), and the 8 in front of it means it will be repeated 8 times.
>IIIIIIII
So I think this is saying: take this byte buffer, make sure it's a multiple of 4 by appending as many 0x00s as is necessary, then unpack that into a tuple with as many unsigned integers as there are blocks of 4 bytes in the buffer.
Looks like it's supposed to take an input array of bytes represented as a string and unpack them as big-endian (the ">") unsigned ints (the 'I'). The formatting codes are explained in http://docs.python.org/2/library/struct.html
This takes a string and converts it into a tuple of Unsigned Integers. If you look at the python struct documentation you will see how it works. In a nutshell it handles conversions between Python values and C structs represented as Python strings for handling binary data stored in files (unceremoniously copied from the link provided).
In your case, the function takes a string, b, and adds some extra characters to make sure that its length is a multiple of the standard size of an unsigned int (see link), and then converts it into a tuple of integers using the big-endian representation of the characters. This is the '>' part. The I part says to use unsigned integers.
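To see the behaviour on a concrete input, here is a Python 3 adaptation of the function. It is a sketch, not the original: it takes a bytes object instead of a Python 2 str and uses integer division, and the input b'abcde' is just an example.

```python
import struct

def str_to_a32(b):
    # Python 3 sketch: b is a bytes object rather than a Python 2 str.
    if len(b) % 4:
        # Pad with zero bytes up to a multiple of 4.
        b += b'\0' * (4 - len(b) % 4)
    return struct.unpack('>%dI' % (len(b) // 4), b)

words = str_to_a32(b'abcde')
print([hex(w) for w in words])
```

Five input bytes are padded to eight, so the result holds two 32-bit integers, the second one being 0x65 followed by three zero bytes.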
I am currently using an Arduino that's outputting some integers (int) through Serial (using pySerial) to a Python script that I'm writing for the Arduino to communicate with X-Plane, a flight simulation program.
I managed to separate the original into two bytes so that I could send it over to the script, but I'm having a little trouble reconstructing the original integer.
I tried using basic bitwise operators (<<, >> etc.) as I would have done in a C++-like program, but it does not seem to be working.
I suspect it has to do with data types. I may be using integers with bytes in the same operations, but I can't really tell which type each variable holds, since you don't really declare variables in Python, as far as I know (I'm very new to Python).
self.pot=self.myline[2]<<8
self.pot|=self.myline[3]
You can use the struct module to convert between integers and representation as bytes. In your case, to convert from a Python integer to two bytes and back, you'd use:
>>> import struct
>>> struct.pack('>H', 12345)
'09'
>>> struct.unpack('>H', '09')
(12345,)
The first argument to struct.pack and struct.unpack represents how you want your data to be formatted. Here, I ask for it to be in big-endian mode by using the > prefix (you can use < for little-endian, or = for native) and then I say there is a single unsigned short (16-bit integer) represented by the H.
Other possibilities are b for a signed byte, B for an unsigned byte, h for a signed short (16 bits), i for a signed 32-bit integer, I for an unsigned 32-bit integer. You can get the complete list by looking at the documentation of the struct module.
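A few of these format codes side by side, as a Python 3 sketch (where the packed results are bytes objects such as b'09' rather than plain strings):

```python
import struct

big = struct.pack('>H', 12345)       # big-endian unsigned short
little = struct.pack('<H', 12345)    # same value, little-endian
negative = struct.pack('>h', -2)     # signed short
print(big, little, negative)

# The same byte value read as a signed byte, then as an unsigned byte.
signed_byte, unsigned_byte = struct.unpack('>bB', b'\xff\xff')
print(signed_byte, unsigned_byte)
```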
For example, using Big Endian encoding:
int.from_bytes(my_bytes, byteorder='big')
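A slightly fuller Python 3 sketch of the same call follows. Here my_bytes is a stand-in for the data read from the serial port, and the positions 2 and 3 are taken from the question's code.

```python
my_bytes = bytes([0, 1, 2, 3])     # stand-in for data read from the serial port

# Indexing a bytes object in Python 3 already yields ints.
print(type(my_bytes[2]))

pot_big = int.from_bytes(my_bytes[2:4], byteorder='big')
pot_little = int.from_bytes(my_bytes[2:4], byteorder='little')

# If the sender's value can be negative, signed=True applies the sign bit.
pot_signed = int.from_bytes(b'\xff\xfe', byteorder='big', signed=True)
print(pot_big, pot_little, pot_signed)
```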
What you have seems basically like it should work, assuming the data stored in myline has the high byte first:
myline = [0, 1, 2, 3]
pot = myline[2]<<8 | myline[3]
print 'pot: {:d}, 0x{:04x}'.format(pot, pot) # outputs "pot: 515, 0x0203"
Otherwise, if it's low-byte first you'd need to do the opposite way:
myline = [0, 1, 2, 3]
pot = myline[3]<<8 | myline[2]
print 'pot: {:d}, 0x{:04x}'.format(pot, pot) # outputs "pot: 770, 0x0302"
This totally works:
long = 500
first = long & 0xff #244
second = long >> 8 #1
result = (second << 8) + first #500
If you are not sure of types in 'myline' please check Stack Overflow question How to determine the variable type in Python?.
To convert a byte or char to the number it represents, use ord(). Here's a simple round trip from an int to bytes and back:
>>> number = 3**9
>>> hibyte = chr(number / 256)
>>> lobyte = chr(number % 256)
>>> hibyte, lobyte
('L', '\xe3')
>>> print number == (ord(hibyte) << 8) + ord(lobyte)
True
If your myline variable is string or bytestring, you can use the formula in the last line above. If it somehow is a list of integers, then of course you don't need ord.
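The session above is Python 2. A rough Python 3 equivalent of the same round trip, assuming the data is held in a bytes object, might be:

```python
number = 3**9                       # 19683

hibyte, lobyte = divmod(number, 256)
payload = bytes([hibyte, lobyte])   # two bytes, high byte first
print(payload)

# Indexing bytes gives ints in Python 3, so ord() is not needed.
restored = (payload[0] << 8) + payload[1]
print(restored == number)
```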