Byte array to Int in Python 2.x using Standard Libraries - python

I am comfortable with Python 3.x and with converting a bytearray to an integer using int.from_bytes(), and came up with the conversion snippet below. Is there a way to achieve the same functionality in Python 2 for both positive and negative integers?
val = bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00')
a = int.from_bytes(val, byteorder='big', signed=True)
# print(type(a), type(val), val, a)
# <class 'int'> <class 'bytearray'> bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00') -2083330000000000000000
I need to use only the Python 2.7 standard library to convert a byte array to an int.
E.g. bytearray(b'\x00') --> Expected Result: 0
bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00') --> Expected Result: -300000000000000000000
bytearray(b'\x10CV\x1a\x88)0\x00\x00') --> Expected Result: 300000000000000000000

There is no built-in function in Python 2.7 to do the equivalent of int.from_bytes in 3.2+; that's why the method was added in the first place.
If you don't care about handling any cases other than big-endian signed ints, and care about readability more than performance (so you can extend it or maintain it yourself), the simplest solution is probably an explicit loop over the bytes.
For unsigned, this would be easy:
n = 0
for by in b:
    n = n * 256 + by
But to handle negative numbers, you need to do three things:
Take off the sign bit from the highest byte. Since we only care about big-endian, this is the 0x80 bit on b[0].
That makes an empty bytearray a special case, so handle that specially.
At the end, if the sign bit was set, 2's-complement the result.
So:
def int_from_bytes(b):
    '''Convert big-endian signed integer bytearray to int
    int_from_bytes(b) == int.from_bytes(b, 'big', signed=True)'''
    if not b: # special-case 0 to avoid b[0] raising
        return 0
    n = b[0] & 0x7f # skip sign bit
    for by in b[1:]:
        n = n * 256 + by
    if b[0] & 0x80: # if sign bit is set, 2's complement
        bits = 8*len(b)
        offset = 2**(bits-1)
        return n - offset
    else:
        return n
(This works on any iterable of ints. In Python 3, that includes both bytes and bytearray; in Python 2, it includes bytearray but not str.)
Testing your inputs in Python 3:
>>> for b in (bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00'),
... bytearray(b'\x00'),
... bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00'),
... bytearray(b'\x10CV\x1a\x88)0\x00\x00')):
... print(int.from_bytes(b, 'big', signed=True), int_from_bytes(b))
-2083330000000000000000 -2083330000000000000000
0 0
-300000000000000000000 -300000000000000000000
300000000000000000000 300000000000000000000
And in Python 2:
>>> for b in (bytearray(b'\x8f\x0f\xfd\x02\xf4\x95s\x00\x00'),
... bytearray(b'\x00'),
... bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00'),
... bytearray(b'\x10CV\x1a\x88)0\x00\x00')):
... print int_from_bytes(b)
-2083330000000000000000
0
-300000000000000000000
300000000000000000000
If this is a bottleneck, there are almost surely faster ways to do this. Maybe via gmpy2, for example. In fact, even converting the bytes to a hex string with binascii.hexlify and parsing that string with int(..., 16) might be faster, even though it's more than twice the work, because it moves the main loops from Python to C. Or you could merge the results of calling struct.unpack_from on 8 bytes at a time instead of handling each byte one by one. But this version should be easy to understand and maintain, and doesn't require anything outside the stdlib.
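As a rough sketch of the hex-string idea (not benchmarked here, so whether it is actually faster is an assumption to verify), the per-byte work can be handed to binascii.hexlify and int(), with the sign fixed up afterwards. The name int_from_bytes_hex is made up for this example.

```python
import binascii

def int_from_bytes_hex(b):
    """Big-endian signed conversion via a hex string (hypothetical helper)."""
    if not b:  # empty input: nothing to convert
        return 0
    b = bytearray(b)  # makes b[0] an int on both Python 2 and Python 3
    # hexlify() and int() both loop in C rather than in Python
    n = int(binascii.hexlify(bytes(b)), 16)
    if b[0] & 0x80:  # sign bit set: subtract 2**(total number of bits)
        n -= 1 << (8 * len(b))
    return n

print(int_from_bytes_hex(bytearray(b'\xef\xbc\xa9\xe5w\xd6\xd0\x00\x00')))
```

Subtracting 2**(8*len(b)) from the unsigned value is equivalent to masking off the sign bit first and subtracting 2**(bits-1), as the loop version does.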

Related

How do I do a bitwise Not operation in Python?

In order to test building an Xor operation with more basic building blocks (using Nand, Or, and And in my case) I need to be able to do a Not operation. The built-in not only seems to do this with single bits. If I do:
x = 0b1100
x = not x
I should get 0b0011 but instead I just get 0b0. What am I doing wrong? Or is Python just missing this basic functionality?
I know that Python has a built-in Xor function but I've been using Python to test things for an HDL project/course where I need to build an Xor gate. I wanted to test this in Python but I can't without an equivalent to a Not gate.
The problem with using ~ in Python is that it works with signed integers. This is also the only way that really makes sense unless you limit yourself to a particular number of bits. It will work OK with bitwise math, but it can make it hard to interpret the intermediate results.
For 4-bit logic, you should just subtract from 0b1111:
0b1111 - 0b1100 # == 0b0011
For 8 bit logic, subtract from 0b11111111 etc.
The general form is
def bit_not(n, numbits=8):
    return (1 << numbits) - 1 - n
Another way to achieve this, is to assign a mask like this (should be all 1's):
mask = 0b1111
Then xor it with your number like this:
number = 0b1100
mask = 0b1111
print(bin(number ^ mask))
You can refer to the XOR truth table to see why it works.
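To tie this back to the original goal, here is a small sketch of a fixed-width NOT built from an XOR mask, and an XOR gate assembled from NAND, OR and AND on top of it. The helper names (nand, xor_from_gates) and the default 4-bit width are assumptions chosen for illustration.

```python
def bit_not(n, numbits=4):
    # XOR with an all-ones mask flips exactly the low `numbits` bits
    return n ^ ((1 << numbits) - 1)

def nand(a, b, numbits=4):
    return bit_not(a & b, numbits)

def xor_from_gates(a, b, numbits=4):
    # XOR == (a OR b) AND (a NAND b)
    return (a | b) & nand(a, b, numbits)

print(format(bit_not(0b1100), '04b'))                 # 0011
print(format(xor_from_gates(0b1100, 0b1010), '04b'))  # 0110
```

For inputs that fit in numbits bits, the XOR-mask form gives the same result as subtracting from the all-ones value.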
Python's bitwise ~ operator inverts all bits of an integer, but we can't see the raw result because all integers in Python have a signed representation.
We can verify this indirectly:
>>> a = 65
>>> a ^ ~a
-1
Or the same:
>>> a + ~a
-1
The result -1 means all bits are set. But the leading minus sign doesn't allow us to examine this fact directly:
>>> bin(-1)
'-0b1'
The solution is simple: we must use unsigned integers.
The first way is to import the numpy or ctypes modules, which both support unsigned integers. numpy is simpler to use than ctypes (at least for me):
import numpy as np
x = np.uint8(0b1100)
y = ~x
Check result:
>>> bin(x)
'0b1100'
>>> bin(y)
'0b11110011'
And finally check:
>>> x + y
255
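The unsigned integer 255 means the same for 8-bit integers (bytes) as -1 does for signed ones, because all bits are set to 1. To check (on older NumPy releases; NumPy 2 raises OverflowError for an out-of-range Python integer here):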
>>> np.uint8(-1)
255
Another, simpler solution (not quite as rigorous, but useful if you don't want to include additional modules) is to invert all bits with an XOR operation whose second argument has all bits set to 1:
a = 0b1100
b = a ^ 0xFF
This operation also drops the sign bit of the signed integer, so we can see the result like this:
>>> print('{:>08b}'.format(a))
00001100
>>> print('{:>08b}'.format(b))
11110011
The final solution contains one more operation and is therefore not optimal:
>>> b = ~a & 0xFF
>>> print('{:>08b}'.format(b))
11110011
Try this, it's called the bitwise complement operator:
~0b1100
The answers here each contain great nuggets, but none of them scales well across edge cases.
Rather than fixing on an 8-bit mask or requiring the programmer to change how many bits are in the mask, simply create a mask based on the input via bit_length():
def bit_not(num):
    return num ^ ((1 << num.bit_length()) - 1)
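One thing worth checking before relying on this: the mask width is inferred from the value itself, so leading zeros are not counted. The short sketch below only illustrates that behaviour; whether it matters depends on whether you think in terms of a fixed word size.

```python
def bit_not(num):
    # Mask is as wide as the highest set bit of num
    return num ^ ((1 << num.bit_length()) - 1)

print(bin(bit_not(0b1100)))  # 0b11
print(bin(bit_not(0b0011)))  # 0b0, not 0b1100: the two leading zeros are ignored
print(bit_not(0))            # 0, because bit_length() of 0 is 0
```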
A binary string can be used to preserve the leading zeros, since integer literals drop them:
bin(0b000101) # '0b101'
bin(0b101) # '0b101'
This function returns the NOT of the input number in string format:
def not_bitwise(n):
    '''
    n: input string of binary number (positive or negative)
    return: binary number (string format)
    '''
    head, tail = n.split('b')
    not_bin = head+'b'+tail.replace('0','a').replace('1','0').replace('a','1')
    return not_bin
Example:
In[266]: not_bitwise('0b0001101')
Out[266]: '0b1110010'
In[267]: int(not_bitwise('0b0001101'), 2)
Out[267]: 114
In[268]: not_bitwise('-0b1010101')
Out[268]: '-0b0101010'
In[269]: int(not_bitwise('-0b1010101'), 2)
Out[269]: -42
The general form given by John La Rooy can be simplified in this way (Python 2.7 and >= 3.1):
def bit_not(n):
    return (1 << n.bit_length()) - 1 - n

Convert hex-string to integer with python

Note that the problem is not hex to decimal but a string of hex values to integer.
Say I've got a string from a hexdump (e.g. '6c 02 00 00'), so I need to convert that into actual hex first, and then get the integer it represents... (this particular one would be 620 as an int16 and int32)
I tried a lot of things but confused myself more. Is there a quick way to do such a conversion in python (preferably 3.x)?
update From Python 3.7 on, bytes.fromhex will ignore whitespace, so the straightforward thing to do is parse the string into a bytes object, and then read that as an integer:
In [10]: int.from_bytes(bytes.fromhex("6c 02 00 00"), byteorder="little")
Out[10]: 620
original answer
Not only is that a string, but it is in little-endian order, meaning that just removing the spaces and calling int(xx, 16) will not work. Neither does it have the actual byte values as 4 arbitrary 0-255 numbers (in which case struct.unpack would work).
I think a nice approach is to swap the components back into "human readable" order, and use the int call - thus:
number = int("".join("6c 02 00 00".split()[::-1]), 16)
What happens there: the first part of the expression is the split - it breaks the string at the spaces, and provides a list with four strings, two digits in each. The [::-1] special slice goes next - it means roughly "provide me a subset of elements from the former sequence, starting at the end, and going back 1 element at a time" - which is a common Python idiom to reverse any sequence.
This reversed sequence is used in the call to "".join(...) - which uses the empty string as a separator between the elements of the sequence - the result of this call is "0000026c". With this value, we just call Python's int class, which accepts an optional second parameter denoting the base that should be used to interpret the number denoted in the first argument.
>>> int("".join("6c 02 00 00".split()[::-1]), 16)
620
Another option is to cumulatively add the conversion of each 2 digits, properly shifted to their weight according to their position - this can also be done in a single expression using reduce, though a 4-line Python for loop would be more readable:
>>> from functools import reduce #not needed in Python2.x
>>> reduce(lambda x, y: x + (int(y[1], 16)<<(8 * y[0]) ), enumerate("6c 02 00 00".split()), 0)
620
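For comparison, the same accumulation written as a plain loop might look like this (a sketch; the variable names are illustrative):

```python
hexdump = "6c 02 00 00"

number = 0
for position, pair in enumerate(hexdump.split()):
    # Little-endian: the byte at index `position` carries weight 256**position
    number += int(pair, 16) << (8 * position)

print(number)  # 620
```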
update The OP just said he does not actually have the "spaces" in the string - in that case, one can use just about the same methods, but taking each two digits instead of the split() call:
reduce(lambda x, y: x + (int(y[1], 16)<<(8 * y[0]//2) ), ((i, a[i:i+2]) for i in range(0, len(a), 2)) , 0)
(where a is the variable with your digits, of course) -
Or, convert it to an actual 4-byte number in memory, using the hex codec, and unpack the number with struct - this may be more semantically correct for your code:
import codecs
import struct
struct.unpack("<I", codecs.decode("6c020000", "hex") )[0]
So the approach here is to turn each 2 digits into an actual byte in memory, in the bytes object returned by the codecs.decode call, and then use struct to read the 4 bytes in the buffer as a single 32-bit integer.
You can use unhexlify() to convert the hex string to its binary form, and then use struct.unpack() to decode the little endian value into an int:
>>> from struct import unpack
>>> from binascii import unhexlify
>>> n = unpack('<i', unhexlify('6c 02 00 00'.replace(' ','')))[0]
>>> n
620
The format string '<i' means little endian signed integer. You can substitute with '<I' or '<L' for unsigned int or long (both 4 bytes).
If the data does not contain spaces this simplifies to
>>> n = unpack('<i', unhexlify('6c020000'))[0]
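Since the question mentions both int16 and int32, here is a brief sketch of reading the same buffer at either width with struct.unpack_from. Reading from offset 0 is an assumption about where the value sits in the dump.

```python
from binascii import unhexlify
from struct import unpack_from

data = unhexlify('6c 02 00 00'.replace(' ', ''))

as_int16 = unpack_from('<h', data, 0)[0]  # first 2 bytes, little-endian signed short
as_int32 = unpack_from('<i', data, 0)[0]  # all 4 bytes, little-endian signed int

print(as_int16, as_int32)  # 620 620
```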

How can I convert two bytes of an integer back into an integer in Python?

I am currently using an Arduino that's outputting some integers (int) through Serial (using pySerial) to a Python script that I'm writing for the Arduino to communicate with X-Plane, a flight simulation program.
I managed to separate the original into two bytes so that I could send it over to the script, but I'm having a little trouble reconstructing the original integer.
I tried using basic bitwise operators (<<, >> etc.) as I would have done in a C++-like program, but it does not seem to be working.
I suspect it has to do with data types. I may be using integers with bytes in the same operations, but I can't really tell which type each variable holds, since you don't really declare variables in Python, as far as I know (I'm very new to Python).
self.pot=self.myline[2]<<8
self.pot|=self.myline[3]
You can use the struct module to convert between integers and representation as bytes. In your case, to convert from a Python integer to two bytes and back, you'd use:
>>> import struct
>>> struct.pack('>H', 12345)
'09'
>>> struct.unpack('>H', '09')
(12345,)
The first argument to struct.pack and struct.unpack represents how you want your data to be formatted. Here, I ask for it to be in big-endian mode by using the > prefix (you can use < for little-endian, or = for native) and then I say there is a single unsigned short (16-bit integer) represented by the H.
Other possibilities are b for a signed byte, B for an unsigned byte, h for a signed short (16-bits), i for a signed 32-bits integer, I for an unsigned 32-bits integer. You can get the complete list by looking at the documentation of the struct module.
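The session above is Python 2, where packed data is a str. A minimal Python 3 sketch of the same round trip, where the result is a bytes object, shows how the byte-order prefix and the signed/unsigned code change the value that is read back:

```python
import struct

packed = struct.pack('>H', 12345)        # 12345 == 0x3039
print(packed)                            # b'09'

print(struct.unpack('>H', packed)[0])    # 12345 (same byte order as packing)
print(struct.unpack('<H', packed)[0])    # 14640 (bytes swapped: 0x3930)

# Signed vs unsigned interpretation of the same two bytes
print(struct.unpack('>H', b'\xff\xfe')[0])  # 65534
print(struct.unpack('>h', b'\xff\xfe')[0])  # -2
```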
For example, using Big Endian encoding:
int.from_bytes(my_bytes, byteorder='big')
What you have seems basically like it should work, assuming the data stored in myline has the high byte first:
myline = [0, 1, 2, 3]
pot = myline[2]<<8 | myline[3]
print 'pot: {:d}, 0x{:04x}'.format(pot, pot) # outputs "pot: 515, 0x0203"
Otherwise, if it's low-byte first you'd need to do the opposite way:
myline = [0, 1, 2, 3]
pot = myline[3]<<8 | myline[2]
print 'pot: {:d}, 0x{:04x}'.format(pot, pot) # outputs "pot: 770, 0x0302"
This totally works:
long = 500
first = long & 0xff #244
second = long >> 8 #1
result = (second << 8) + first #500
If you are not sure of types in 'myline' please check Stack Overflow question How to determine the variable type in Python?.
To convert a byte or char to the number it represents, use ord(). Here's a simple round trip from an int to bytes and back:
>>> number = 3**9
>>> hibyte = chr(number / 256)
>>> lobyte = chr(number % 256)
>>> hibyte, lobyte
('L', '\xe3')
>>> print number == (ord(hibyte) << 8) + ord(lobyte)
True
If your myline variable is string or bytestring, you can use the formula in the last line above. If it somehow is a list of integers, then of course you don't need ord.

Convert int to single byte in a string?

I'm implementing PKCS#7 padding right now in Python and need to pad chunks of my file in order to amount to a number divisible by sixteen. I've been recommended to use the following method to append these bytes:
input_chunk += '\x00'*(-len(input_chunk)%16)
What I need to do is the following:
input_chunk_remainder = len(input_chunk) % 16
input_chunk += input_chunk_remainder * input_chunk_remainder
Obviously, the second line above is wrong; I need to convert the first input_chunk_remainder to a single byte string. How can I do this in Python?
In Python 3, you can create bytes of a given numeric value with the bytes() type; you can pass in a list of integers (between 0 and 255):
>>> bytes([5])
b'\x05'
>>> bytes([5] * 5)
b'\x05\x05\x05\x05\x05'
An alternative method is to use an array.array() with the right number of integers:
>>> import array
>>> array.array('B', 5*[5]).tobytes()
b'\x05\x05\x05\x05\x05'
or use the struct.pack() function to pack your integers into bytes:
>>> import struct
>>> struct.pack('{}B'.format(5), *(5 * [5]))
b'\x05\x05\x05\x05\x05'
There may be more ways.. :-)
In Python 2 (ancient now), you can do the same by using the chr() function:
>>> chr(5)
'\x05'
>>> chr(5) * 5
'\x05\x05\x05\x05\x05'
In Python3, the bytes built-in accepts a sequence of integers. So for just one integer:
>>> bytes([5])
b'\x05'
Of course, that's bytes, not a string. But in the Python 3 world, the OP would probably use bytes for the app he described anyway.
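Putting the pieces together for the padding use case, a minimal Python 3 sketch could look like the following. Note that PKCS#7 appends N bytes each of value N, where N is the number of bytes needed to reach the next block boundary (a full block when the input is already aligned), which is not the same as len(input_chunk) % 16. The function name pkcs7_pad is an assumption for this example.

```python
def pkcs7_pad(data, block_size=16):
    """Append PKCS#7 padding to a bytes object (illustrative helper)."""
    # Number of padding bytes: 1..block_size, never 0
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len

padded = pkcs7_pad(b'hello')
print(len(padded))   # 16
print(padded[-1])    # 11
```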

Is there a way to pad to an even number of digits?

I'm trying to create a hex representation of some data that needs to be transmitted (specifically, in ASN.1 notation). At some points, I need to convert data to its hex representation. Since the data is transmitted as a byte sequence, the hex representation has to be padded with a 0 if the length is odd.
Example:
>>> hex2(3)
'03'
>>> hex2(45)
'2d'
>>> hex2(678)
'02a6'
The goal is to find a simple, elegant implementation for hex2.
Currently I'm using hex, stripping out the first two characters, then padding the string with a 0 if its length is odd. However, I'd like to find a better solution for future reference. I've looked in str.format without finding anything that pads to a multiple.
def hex2(n):
    x = '%x' % (n,)
    return ('0' * (len(x) % 2)) + x
To be totally honest, I am not sure what the issue is. A straightforward implementation of what you describe goes like this:
def hex2(v):
    s = hex(v)[2:]
    return s if len(s) % 2 == 0 else '0' + s
I would not necessarily call this "elegant" but I would certainly call it "simple."
Python's binascii module's b2a_hex is guaranteed to return an even-length string.
The trick then is to convert the integer into a bytestring. Python 3.2 and higher has that built in to int:
from binascii import b2a_hex
def hex2(integer):
    return b2a_hex(integer.to_bytes((integer.bit_length() + 7) // 8, 'big'))
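Two details of the to_bytes version are worth knowing: in Python 3 b2a_hex returns a bytes object rather than a str, and for 0 the computed length is 0, so the result is empty. As an alternative sketch that stays with str.format, the field width can be computed from bit_length() and passed as a nested format field. This assumes non-negative input and is only one possible way to do it.

```python
def hex2(n):
    """Hex string of a non-negative int, zero-padded to an even length."""
    digits = max(1, (n.bit_length() + 3) // 4)  # hex digits needed, at least 1
    width = digits + (digits % 2)               # round up to an even count
    return '{:0{}x}'.format(n, width)

print(hex2(3), hex2(45), hex2(678))  # 03 2d 02a6
```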
Might want to look at the struct module, which is designed for byte-oriented i/o.
>>> import struct
>>> struct.pack('>i',678)
'\x00\x00\x02\xa6'
>>> # Use h instead of i for shorts
>>> struct.pack('>h',1043)
'\x04\x13'
