Converting raw binary data to unsigned int in python [duplicate] - python

I'm currently working on an encryption/decryption program and I need to be able to convert bytes to an integer. I know that:
bytes([3]) = b'\x03'
Yet I cannot find out how to do the inverse. What am I doing terribly wrong?

Assuming you're on at least 3.2, there's a built in for this:
int.from_bytes( bytes, byteorder, *, signed=False )
...
The argument bytes must either be a bytes-like object or an iterable
producing bytes.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the
native byte order of the host system, use sys.byteorder as the byte
order value.
The signed argument indicates whether two’s complement is used to
represent the integer.
## Examples:
int.from_bytes(b'\x00\x01', "big") # 1
int.from_bytes(b'\x00\x01', "little") # 256
int.from_bytes(b'\x00\x10', byteorder='little') # 4096
int.from_bytes(b'\xfc\x00', byteorder='big', signed=True) #-1024

Lists of bytes are subscriptable (at least in Python 3.6). This way you can retrieve the decimal value of each byte individually.
>>> intlist = [64, 4, 26, 163, 255]
>>> bytelist = bytes(intlist) # b'#\x04\x1a\xa3\xff'
>>> for b in bytelist:
... print(b) # 64 4 26 163 255
>>> [b for b in bytelist] # [64, 4, 26, 163, 255]
>>> bytelist[2] # 26

list() can be used to convert bytes to int (works in Python 3.7):
list(b'\x03\x04\x05')
[3, 4, 5]

int.from_bytes( bytes, byteorder, *, signed=False )
doesn't work with me
I used function from this website, it works well
https://coderwall.com/p/x6xtxq/convert-bytes-to-int-or-int-to-bytes-in-python
def bytes_to_int(bytes):
result = 0
for b in bytes:
result = result * 256 + int(b)
return result
def int_to_bytes(value, length):
result = []
for i in range(0, length):
result.append(value >> (i * 8) & 0xff)
result.reverse()
return result

In case of working with buffered data I found this useful:
int.from_bytes([buf[0],buf[1],buf[2],buf[3]], "big")
Assuming that all elements in buf are 8-bit long.

An old question that I stumbled upon while looking for an existing solution. Rolled my own and thought I'd share because it allows you to create a 32-bit integer from a list of bytes, specifying an offset.
def bytes_to_int(bList, offset):
r = 0
for i in range(4):
d = 32 - ((i + 1) * 8)
r += bList[offset + i] << d
return r

#convert bytes to int
def bytes_to_int(value):
return int.from_bytes(bytearray(value), 'little')
bytes_to_int(b'\xa231')

Related

Simulating a C cast in Python [duplicate]

Let's say I have this number i = -6884376.
How do I refer to it as to an unsigned variable?
Something like (unsigned long)i in C.
Assuming:
You have 2's-complement representations in mind; and,
By (unsigned long) you mean unsigned 32-bit integer,
then you just need to add 2**32 (or 1 << 32) to the negative value.
For example, apply this to -1:
>>> -1
-1
>>> _ + 2**32
4294967295L
>>> bin(_)
'0b11111111111111111111111111111111'
Assumption #1 means you want -1 to be viewed as a solid string of 1 bits, and assumption #2 means you want 32 of them.
Nobody but you can say what your hidden assumptions are, though. If, for example, you have 1's-complement representations in mind, then you need to apply the ~ prefix operator instead. Python integers work hard to give the illusion of using an infinitely wide 2's complement representation (like regular 2's complement, but with an infinite number of "sign bits").
And to duplicate what the platform C compiler does, you can use the ctypes module:
>>> import ctypes
>>> ctypes.c_ulong(-1) # stuff Python's -1 into a C unsigned long
c_ulong(4294967295L)
>>> _.value
4294967295L
C's unsigned long happens to be 4 bytes on the box that ran this sample.
To get the value equivalent to your C cast, just bitwise and with the appropriate mask. e.g. if unsigned long is 32 bit:
>>> i = -6884376
>>> i & 0xffffffff
4288082920
or if it is 64 bit:
>>> i & 0xffffffffffffffff
18446744073702667240
Do be aware though that although that gives you the value you would have in C, it is still a signed value, so any subsequent calculations may give a negative result and you'll have to continue to apply the mask to simulate a 32 or 64 bit calculation.
This works because although Python looks like it stores all numbers as sign and magnitude, the bitwise operations are defined as working on two's complement values. C stores integers in twos complement but with a fixed number of bits. Python bitwise operators act on twos complement values but as though they had an infinite number of bits: for positive numbers they extend leftwards to infinity with zeros, but negative numbers extend left with ones. The & operator will change that leftward string of ones into zeros and leave you with just the bits that would have fit into the C value.
Displaying the values in hex may make this clearer (and I rewrote to string of f's as an expression to show we are interested in either 32 or 64 bits):
>>> hex(i)
'-0x690c18'
>>> hex (i & ((1 << 32) - 1))
'0xff96f3e8'
>>> hex (i & ((1 << 64) - 1)
'0xffffffffff96f3e8L'
For a 32 bit value in C, positive numbers go up to 2147483647 (0x7fffffff), and negative numbers have the top bit set going from -1 (0xffffffff) down to -2147483648 (0x80000000). For values that fit entirely in the mask, we can reverse the process in Python by using a smaller mask to remove the sign bit and then subtracting the sign bit:
>>> u = i & ((1 << 32) - 1)
>>> (u & ((1 << 31) - 1)) - (u & (1 << 31))
-6884376
Or for the 64 bit version:
>>> u = 18446744073702667240
>>> (u & ((1 << 63) - 1)) - (u & (1 << 63))
-6884376
This inverse process will leave the value unchanged if the sign bit is 0, but obviously it isn't a true inverse because if you started with a value that wouldn't fit within the mask size then those bits are gone.
Python doesn't have builtin unsigned types. You can use mathematical operations to compute a new int representing the value you would get in C, but there is no "unsigned value" of a Python int. The Python int is an abstraction of an integer value, not a direct access to a fixed-byte-size integer.
Since version 3.2 :
def unsignedToSigned(n, byte_count):
return int.from_bytes(n.to_bytes(byte_count, 'little', signed=False), 'little', signed=True)
def signedToUnsigned(n, byte_count):
return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
output :
In [3]: unsignedToSigned(5, 1)
Out[3]: 5
In [4]: signedToUnsigned(5, 1)
Out[4]: 5
In [5]: unsignedToSigned(0xFF, 1)
Out[5]: -1
In [6]: signedToUnsigned(0xFF, 1)
---------------------------------------------------------------------------
OverflowError Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 signedToUnsigned(0xFF, 1)
Input In [1], in signedToUnsigned(n, byte_count)
4 def signedToUnsigned(n, byte_count):
----> 5 return int.from_bytes(n.to_bytes(byte_count, 'little', signed=True), 'little', signed=False)
OverflowError: int too big to convert
In [7]: signedToUnsigned(-1, 1)
Out[7]: 255
Explanations : to/from_bytes convert to/from bytes, in 2's complement considering the number as one of size byte_count * 8 bits. In C/C++, chances are you should pass 4 or 8 as byte_count for respectively a 32 or 64 bit number (the int type).
I first pack the input number in the format it is supposed to be from (using the signed argument to control signed/unsigned), then unpack to the format we would like it to have been from. And you get the result.
Note the Exception when trying to use fewer bytes than required to represent the number (In [6]). 0xFF is 255 which can't be represented using a C's char type (-128 ≤ n ≤ 127). This is preferable to any other behavior.
You could use the struct Python built-in library:
Encode:
import struct
i = -6884376
print('{0:b}'.format(i))
packed = struct.pack('>l', i) # Packing a long number.
unpacked = struct.unpack('>L', packed)[0] # Unpacking a packed long number to unsigned long
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
-11010010000110000011000
4288082920
11111111100101101111001111101000
Decode:
dec_pack = struct.pack('>L', unpacked) # Packing an unsigned long number.
dec_unpack = struct.unpack('>l', dec_pack)[0] # Unpacking a packed unsigned long number to long (revert action).
print(dec_unpack)
Out:
-6884376
[NOTE]:
> is BigEndian operation.
l is long.
L is unsigned long.
In amd64 architecture int and long are 32bit, So you could use i and I instead of l and L respectively.
[UPDATE]
According to the #hl037_ comment, this approach works on int32 not int64 or int128 as I used long operation into struct.pack(). Nevertheless, in the case of int64, the written code would be changed simply using long long operand (q) in struct as follows:
Encode:
i = 9223372036854775807 # the largest int64 number
packed = struct.pack('>q', i) # Packing an int64 number
unpacked = struct.unpack('>Q', packed)[0] # Unpacking signed to unsigned
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
9223372036854775807
111111111111111111111111111111111111111111111111111111111111111
Next, follow the same way for the decoding stage. As well as this, keep in mind q is long long integer — 8byte and Q is unsigned long long
But in the case of int128, the situation is slightly different as there is no 16-byte operand for struct.pack(). Therefore, you should split your number into two int64.
Here's how it should be:
i = 10000000000000000000000000000000000000 # an int128 number
print(len('{0:b}'.format(i)))
max_int64 = 0xFFFFFFFFFFFFFFFF
packed = struct.pack('>qq', (i >> 64) & max_int64, i & max_int64)
a, b = struct.unpack('>QQ', packed)
unpacked = (a << 64) | b
print(unpacked)
print('{0:b}'.format(unpacked))
Out:
123
10000000000000000000000000000000000000
111100001011110111000010000110101011101101001000110110110010000000011110100001101101010000000000000000000000000000000000000
just use abs for converting unsigned to signed in python
a=-12
b=abs(a)
print(b)
Output:
12

How to split bytes into bits [duplicate]

I am working with Python3.2. I need to take a hex stream as an input and parse it at bit-level. So I used
bytes.fromhex(input_str)
to convert the string to actual bytes. Now how do I convert these bytes to bits?
Another way to do this is by using the bitstring module:
>>> from bitstring import BitArray
>>> input_str = '0xff'
>>> c = BitArray(hex=input_str)
>>> c.bin
'0b11111111'
And if you need to strip the leading 0b:
>>> c.bin[2:]
'11111111'
The bitstring module isn't a requirement, as jcollado's answer shows, but it has lots of performant methods for turning input into bits and manipulating them. You might find this handy (or not), for example:
>>> c.uint
255
>>> c.invert()
>>> c.bin[2:]
'00000000'
etc.
What about something like this?
>>> bin(int('ff', base=16))
'0b11111111'
This will convert the hexadecimal string you have to an integer and that integer to a string in which each byte is set to 0/1 depending on the bit-value of the integer.
As pointed out by a comment, if you need to get rid of the 0b prefix, you can do it this way:
>>> bin(int('ff', base=16))[2:]
'11111111'
... or, if you are using Python 3.9 or newer:
>>> bin(int('ff', base=16)).removepreffix('0b')
'11111111'
Note: using lstrip("0b") here will lead to 0 integer being converted to an empty string. This is almost always not what you want to do.
Operations are much faster when you work at the integer level. In particular, converting to a string as suggested here is really slow.
If you want bit 7 and 8 only, use e.g.
val = (byte >> 6) & 3
(this is: shift the byte 6 bits to the right - dropping them. Then keep only the last two bits 3 is the number with the first two bits set...)
These can easily be translated into simple CPU operations that are super fast.
using python format string syntax
>>> mybyte = bytes.fromhex("0F") # create my byte using a hex string
>>> binary_string = "{:08b}".format(int(mybyte.hex(),16))
>>> print(binary_string)
00001111
The second line is where the magic happens. All byte objects have a .hex() function, which returns a hex string. Using this hex string, we convert it to an integer, telling the int() function that it's a base 16 string (because hex is base 16). Then we apply formatting to that integer so it displays as a binary string. The {:08b} is where the real magic happens. It is using the Format Specification Mini-Language format_spec. Specifically it's using the width and the type parts of the format_spec syntax. The 8 sets width to 8, which is how we get the nice 0000 padding, and the b sets the type to binary.
I prefer this method over the bin() method because using a format string gives a lot more flexibility.
I think simplest would be use numpy here. For example you can read a file as bytes and then expand it to bits easily like this:
Bytes = numpy.fromfile(filename, dtype = "uint8")
Bits = numpy.unpackbits(Bytes)
input_str = "ABC"
[bin(byte) for byte in bytes(input_str, "utf-8")]
Will give:
['0b1000001', '0b1000010', '0b1000011']
Here how to do it using format()
print "bin_signedDate : ", ''.join(format(x, '08b') for x in bytevector)
It is important the 08b . That means it will be a maximum of 8 leading zeros be appended to complete a byte. If you don't specify this then the format will just have a variable bit length for each converted byte.
To binary:
bin(byte)[2:].zfill(8)
Use ord when reading reading bytes:
byte_binary = bin(ord(f.read(1))) # Add [2:] to remove the "0b" prefix
Or
Using str.format():
'{:08b}'.format(ord(f.read(1)))
The other answers here provide the bits in big-endian order ('\x01' becomes '00000001')
In case you're interested in little-endian order of bits, which is useful in many cases, like common representations of bignums etc -
here's a snippet for that:
def bits_little_endian_from_bytes(s):
return ''.join(bin(ord(x))[2:].rjust(8,'0')[::-1] for x in s)
And for the other direction:
def bytes_from_bits_little_endian(s):
return ''.join(chr(int(s[i:i+8][::-1], 2)) for i in range(0, len(s), 8))
One line function to convert bytes (not string) to bit list. There is no endnians issue when source is from a byte reader/writer to another byte reader/writer, only if source and target are bit reader and bit writers.
def byte2bin(b):
return [int(X) for X in "".join(["{:0>8}".format(bin(X)[2:])for X in b])]
I came across this answer when looking for a way to convert an integer into a list of bit positions where the bitstring is equal to one. This becomes very similar to this question if you first convert your hex string to an integer like int('0x453', 16).
Now, given an integer - a representation already well-encoded in the hardware, I was very surprised to find out that the string variants of the above solutions using things like bin turn out to be faster than numpy based solutions for a single number, and I thought I'd quickly write up the results.
I wrote three variants of the function. First using numpy:
import math
import numpy as np
def bit_positions_numpy(val):
"""
Given an integer value, return the positions of the on bits.
"""
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
arr = np.frombuffer(bytestr, dtype=np.uint8, count=length)
bit_arr = np.unpackbits(arr, bitorder='big')
bit_positions = np.where(bit_arr[::-1])[0].tolist()
return bit_positions
Then using string logic:
def bit_positions_str(val):
is_negative = val < 0
if is_negative:
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
neg_position = (length * 8) - 1
# special logic for negatives to get twos compliment repr
max_val = 1 << neg_position
val_ = max_val + val
else:
val_ = val
binary_string = '{:b}'.format(val_)[::-1]
bit_positions = [pos for pos, char in enumerate(binary_string)
if char == '1']
if is_negative:
bit_positions.append(neg_position)
return bit_positions
And finally, I added a third method where I precomputed a lookuptable of the positions for a single byte and expanded that given larger itemsizes.
BYTE_TO_POSITIONS = []
pos_masks = [(s, (1 << s)) for s in range(0, 8)]
for i in range(0, 256):
positions = [pos for pos, mask in pos_masks if (mask & i)]
BYTE_TO_POSITIONS.append(positions)
def bit_positions_lut(val):
bit_length = val.bit_length() + 1
length = math.ceil(bit_length / 8.0) # bytelength
bytestr = val.to_bytes(length, byteorder='big', signed=True)
bit_positions = []
for offset, b in enumerate(bytestr[::-1]):
pos = BYTE_TO_POSITIONS[b]
if offset == 0:
bit_positions.extend(pos)
else:
pos_offset = (8 * offset)
bit_positions.extend([p + pos_offset for p in pos])
return bit_positions
The benchmark code is as follows:
def benchmark_bit_conversions():
# for val in [-0, -1, -3, -4, -9999]:
test_values = [
# -1, -2, -3, -4, -8, -32, -290, -9999,
# 0, 1, 2, 3, 4, 8, 32, 290, 9999,
4324, 1028, 1024, 3000, -100000,
999999999999,
-999999999999,
2 ** 32,
2 ** 64,
2 ** 128,
2 ** 128,
]
for val in test_values:
r1 = bit_positions_str(val)
r2 = bit_positions_numpy(val)
r3 = bit_positions_lut(val)
print(f'val={val}')
print(f'r1={r1}')
print(f'r2={r2}')
print(f'r3={r3}')
print('---')
assert r1 == r2
import xdev
xdev.profile_now(bit_positions_numpy)(val)
xdev.profile_now(bit_positions_str)(val)
xdev.profile_now(bit_positions_lut)(val)
import timerit
ti = timerit.Timerit(10000, bestof=10, verbose=2)
for timer in ti.reset('str'):
for val in test_values:
bit_positions_str(val)
for timer in ti.reset('numpy'):
for val in test_values:
bit_positions_numpy(val)
for timer in ti.reset('lut'):
for val in test_values:
bit_positions_lut(val)
for timer in ti.reset('raw_bin'):
for val in test_values:
bin(val)
for timer in ti.reset('raw_bytes'):
for val in test_values:
val.to_bytes(val.bit_length(), 'big', signed=True)
And it clearly shows the str and lookup table implementations are ahead of numpy. I tested this on CPython 3.10 and 3.11.
Timed str for: 10000 loops, best of 10
time per loop: best=20.488 µs, mean=21.438 ± 0.4 µs
Timed numpy for: 10000 loops, best of 10
time per loop: best=25.754 µs, mean=28.509 ± 5.2 µs
Timed lut for: 10000 loops, best of 10
time per loop: best=19.420 µs, mean=21.305 ± 3.8 µs

Convert Bytes to int 0x Python [duplicate]

I'm currently working on an encryption/decryption program and I need to be able to convert bytes to an integer. I know that:
bytes([3]) = b'\x03'
Yet I cannot find out how to do the inverse. What am I doing terribly wrong?
Assuming you're on at least 3.2, there's a built in for this:
int.from_bytes( bytes, byteorder, *, signed=False )
...
The argument bytes must either be a bytes-like object or an iterable
producing bytes.
The byteorder argument determines the byte order used to represent the
integer. If byteorder is "big", the most significant byte is at the
beginning of the byte array. If byteorder is "little", the most
significant byte is at the end of the byte array. To request the
native byte order of the host system, use sys.byteorder as the byte
order value.
The signed argument indicates whether two’s complement is used to
represent the integer.
## Examples:
int.from_bytes(b'\x00\x01', "big") # 1
int.from_bytes(b'\x00\x01', "little") # 256
int.from_bytes(b'\x00\x10', byteorder='little') # 4096
int.from_bytes(b'\xfc\x00', byteorder='big', signed=True) #-1024
Lists of bytes are subscriptable (at least in Python 3.6). This way you can retrieve the decimal value of each byte individually.
>>> intlist = [64, 4, 26, 163, 255]
>>> bytelist = bytes(intlist) # b'#\x04\x1a\xa3\xff'
>>> for b in bytelist:
... print(b) # 64 4 26 163 255
>>> [b for b in bytelist] # [64, 4, 26, 163, 255]
>>> bytelist[2] # 26
list() can be used to convert bytes to int (works in Python 3.7):
list(b'\x03\x04\x05')
[3, 4, 5]
int.from_bytes( bytes, byteorder, *, signed=False )
doesn't work with me
I used function from this website, it works well
https://coderwall.com/p/x6xtxq/convert-bytes-to-int-or-int-to-bytes-in-python
def bytes_to_int(bytes):
result = 0
for b in bytes:
result = result * 256 + int(b)
return result
def int_to_bytes(value, length):
result = []
for i in range(0, length):
result.append(value >> (i * 8) & 0xff)
result.reverse()
return result
In case of working with buffered data I found this useful:
int.from_bytes([buf[0],buf[1],buf[2],buf[3]], "big")
Assuming that all elements in buf are 8-bit long.
An old question that I stumbled upon while looking for an existing solution. Rolled my own and thought I'd share because it allows you to create a 32-bit integer from a list of bytes, specifying an offset.
def bytes_to_int(bList, offset):
r = 0
for i in range(4):
d = 32 - ((i + 1) * 8)
r += bList[offset + i] << d
return r
#convert bytes to int
def bytes_to_int(value):
return int.from_bytes(bytearray(value), 'little')
bytes_to_int(b'\xa231')

Translating Python 2 byte checksum calculator to Python 3

I'm trying to translate this Python 2 code to Python 3.
def calculate_checksum(packet):
total = 0
for char in packet:
total += struct.unpack('B', char)[0]
return (256 - (total % 256)) & 0xff
In Python 3 it causes a TypeError:
total += struct.unpack('B', char)[0]
TypeError: a bytes-like object is required, not 'int'
I have been trying to research the changes in strings and bytes but it is a bit overwhelming.
The code basically translates individual characters in a bytestring to their integer equivalent; the character \x42 becomes 0x42 (or decimal 66), for example:
>>> # Python 2
...
>>> struct.unpack('B', '\x42')[0]
66
As an aside, you could do the same more simply with the ord() function:
>>> ord('\x42')
66
In Python 3, you already get integers when you iterate over a bytes object, which is why you get your error:
>>> # Python 3
...
>>> b'\x42'[0]
66
The whole struct.unpack() call can simply be dropped:
for char in packet:
total += char
or simply use sum() to calculate the total in one step:
total = sum(packet)
making the complete version:
def calculate_checksum_ord(packet):
total = sum(ord(c) for c in packet)
return (256 - (total % 256)) & 0xff
Note that the Python 2 code could also use sum(), with the ord() function rather than use struct: total = sum(ord(c) for c in packed).

How to map characters to integer range [-128, 127] in Python?

I would like to convert a list of characters (represented on a single byte ie. the range [0, 255]) to be represented with integers in the range [-128,127]. I've read that Python's modulo operator (%) always return a number having the same sign as the denominator.
What is the right way to do this conversion in Python?
EDIT
Characters that map to [128,255] with ord should be remapped to [-128,-1], with 128 mapped to -128 and 255 mapped to -1. (For the inverse of the conversion I use chr(my_int%256), but my_int can be a negative number.)
I've found out that I could do this conversion with "unpacking from byte" with the struct module:
# gotcha|pitfall: my original idea, but this generates a list of 1-tuples:
# x = [struct.unpack("b",a) for a in charlist]
fmt = "%ib"%len(charlist) # eg. "5b", if charlist's length is 5
x = struct.unpack(fmt,charlist) # tuple of ints
Not sure if I understood the question... You want to do something like that?
[i - 255 if i > 127 else i for i in [ord(l) for l in "azertyuiopqsdféhjklm3{"]]
def to_ints(input):
return [o if o <= 128 else 255 - o for o in [ord(char) in input]]
def to_str(input):
return "".join([chr(i%256) for i in input])
out = to_ints("This is a test")
print to_str(out)
I wonder what do you mean by "a list of characters", are they numbers? If so, I think x % 256 - 128 or x % -256 + 128 should work.
This is an old question, but for future readers the straightforward solution for single integers is (x + 128)%256 - 128. Replace x with ord(c) if dealing with ASCII character data.
The result of the % operator there is congruent (mod 256) to x + 128; and subtracting 128 from that makes the final result congruent to x and shifts the range to [128,127].
This will work in a generator expression, and save a step in Python 3 where you'd need to convert a string to a bytes object.

Categories