Interpreting 5bit subsets within Packed Binary Data Python - python

I have been having some real trouble with this for a while. I am receiving a string of binary data in python and I am having trouble unpacking and interpreting only a 5bit subset (not an entire byte) of the data. It seems like whatever method comes to mind just simply fails miserably.
Let's say I have two bytes of packed binary data, and I would like to interpret the first 10 bits of the 16. How could I convert this into two integers representing 5 bits each?

Use bitmasks and bitshifting:
>>> example = 0x1234 # Hexadecimal example; 2 bytes, 4660 decimal.
>>> bin(example) # Show as binary digits
'0b1001000110100'
>>> example & 31 # Grab the 5 least significant bits
20
>>> bin(example & 31) # Same, now represented as binary digits
'0b10100'
>>> (example >> 5) & 31 # Grab the next 5 bits (shift right 5 times first)
17
>>> bin(example >> 5 & 31)
'0b10001'
The trick here is to know that 31 is a 5-bit bitmask:
>>> bin(31)
'0b11111'
>>> 0b11111
31
>>> example & 0b11111
20
As you can see you could also just use the 0b binary number literal notation if you find that easier to work with.
See the Python Wiki on bit manipulation for more background info.
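The question asks for the first 10 bits of a 2-byte value. Assuming "first" means the most significant bits and the two bytes arrive in big-endian order, those are bits 15 down to 6. A minimal sketch of that reading follows; split_top_two_fields is a hypothetical helper name used only for illustration:

```python
def split_top_two_fields(data):
    """Return the two 5-bit fields held in the top 10 bits of a 2-byte value.

    Assumption: data is big-endian, so the "first" bits are the most significant.
    """
    value = int.from_bytes(data[:2], byteorder="big")
    first = (value >> 11) & 0b11111   # bits 15..11
    second = (value >> 6) & 0b11111   # bits 10..6
    return first, second


print(split_top_two_fields(b"\x12\x34"))  # (2, 8)
```

If the fields are instead counted from the least significant end, the shifts become 0 and 5, as in the session above.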

Related

Pack into c types and obtain the binary value back

I'm using the following code to pack an integer into an unsigned short as follows,
raw_data = 40
# Pack into little endian
data_packed = struct.pack('<H', raw_data)
Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.
def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res
print(get_bin_str(data_packed))
Unfortunately, it throws the following error,
result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'
How do I properly decode the little-endian bytes to binary data?
Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.
>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex() # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)
unpack returns a tuple, so index it to get the single value
>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40
To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and the 0b prefix has to be removed if it is not wanted.
>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
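Putting the two steps together, a corrected get_bin_str might look like the sketch below. It assumes the input is always a 2-byte little-endian unsigned short and that a fixed width of 16 digits is wanted; neither is stated in the question.

```python
import struct


def get_bin_str(data):
    """Unpack a little-endian unsigned short and return it as 16 binary digits."""
    (value,) = struct.unpack("<H", data)  # unpack returns a 1-tuple
    return format(value, "016b")          # zero-padded, no '0b' prefix


data_packed = struct.pack("<H", 40)
print(get_bin_str(data_packed))  # 0000000000101000
```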
You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)
Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object. To see what's in it, just print it:
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'
What's that? Well, the character (, which is decimal 40 in the ascii table, followed by a null byte. If you had used a number that does not map to a printable ascii character, you'd see something less surprising:
>>> struct.pack("<H", 11)
b'\x0b\x00'
Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: From low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
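To make the byte order visible, it can help to pack a value whose two bytes differ. A small sketch (the value 0x1234 is just an illustrative choice):

```python
import struct

value = 0x1234
little = struct.pack("<H", value)  # least significant byte first
big = struct.pack(">H", value)     # most significant byte first

print(little.hex())  # 3412
print(big.hex())     # 1234
```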
Anyway, you can also look at the bytes directly:
>>> print(data_packed[0])
40
Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:
>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'
The two high bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
What's wrong with your unpacking code?
Just for fun let's see what your sequence of transformations in get_bin_str was doing.
>>> binascii.hexlify(data_packed)
b'2800'
Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:
>>> b'2800'.decode("utf-16-le")
'㠲〰'
In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.
>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40

Convert an Integer into 32bit Binary Python

I am trying to make a program that converts a given integer (limited to the values a 32-bit int can hold) into a 32-bit binary number. For example, 1 should return 31 zeros followed by a 1. I have been searching the documentation but haven't been able to find a concrete way. I got it working as a string, but the number of bits depends on the size of the number. Can anybody suggest a more efficient way to go about this?
'{:032b}'.format(n) where n is an integer. If the binary representation is greater than 32 digits it will expand as necessary:
>>> '{:032b}'.format(100)
'00000000000000000000000001100100'
>>> '{:032b}'.format(8589934591)
'111111111111111111111111111111111'
>>> '{:032b}'.format(8589934591 + 1)
'1000000000000000000000000000000000' # N.B. this is 34 digits long
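If the result must never exceed 32 digits, one option is to mask the value to 32 bits before formatting. This is only a sketch: to_32bit_binary is a hypothetical helper name, and it assumes wrap-around is acceptable, so larger values are truncated and negative values come out in two's complement form.

```python
def to_32bit_binary(n):
    """Format n as exactly 32 binary digits, keeping only its low 32 bits."""
    return format(n & 0xFFFFFFFF, "032b")


print(len(to_32bit_binary(100)))           # 32
print(to_32bit_binary(-1) == "1" * 32)     # True
print(to_32bit_binary(2**33) == "0" * 32)  # True
```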
You can just left or right shift integer and convert it to string for display if you need.
>>> 1<<1
2
>>> "{:032b}".format(2)
'00000000000000000000000000000010'
or if you just need a binary you can consider bin
>>> bin(4)
'0b100'
Say for example the number you want to convert into 32 bit binary is 4. So, num=4.
Here is the code that does this: "s" is the empty string initially.
num = 4
s = ""
for i in range(31, -1, -1):
    cur = (num >> i) & 1  # shift bit i down to position 0 and isolate it
    s += str(cur)
print(s)  # s contains the 32-bit binary representation of 4
00000000000000000000000000000100
Let's say
a = 4
print(bin(a)) # 0b100
For the output, you can pad with 0s on the left (the most significant side) of 100 to get the 32-bit representation of the integer 4.
If you don't want the 0b prefix, you can slice it off:
print(bin(a)[2:]) # 100

Parsing out bit offsets from a hex number in Python

I have a 64-bit hex number, 0x0000040800000000, coming into my script. I want to take this number and extract bits 39:32.
How is this possible? I have been parsing individual parts of a string and have ended up in a mess.
I was initially converting it into binary and parsing out sections of the string from
command_register = "".join(["{0:04b}".format(int(c,16)) for c in str(command_register)])
You simply need to first convert your hex string into an integer and then use normal maths to extract the bits.
Bit numbering is usually done from the least significant bit, i.e. the furthest right when displayed in binary is bit 0. So to extract bits 39:32 (8 consecutive bits), you need a mask of 0xFF00000000. AND your number with the mask and shift the result 32 bits to the right.
Using your hex value and extracting bits 39 to 32 would give you a value of 0x08. The following script shows you how:
hex_string = "0x0000040800000000"
number = int(hex_string, 16) # Convert to an integer
mask_39_to_32 = 0xFF00000000 # Suitable mask to extract the bits with
print(f"As hex: 0x{number:X}")
print()
print("Bits 39-32: xxxxxxxx")
print(f" As binary: {bin(number)[2:]:0>64s}")
print(f" Mask: {bin(mask_39_to_32)[2:]:0>64s}")
print(f"AND result: {bin(number & mask_39_to_32)[2:]:0>64s}")
print(f" Shifted: {bin((number & mask_39_to_32) >> 32)[2:]:0>64s}")
print(f" As an int: {(number & mask_39_to_32) >> 32}")
Which displays the following output:
As hex: 0x40800000000
Bits 39-32: xxxxxxxx
As binary: 0000000000000000000001000000100000000000000000000000000000000000
Mask: 0000000000000000000000001111111100000000000000000000000000000000
AND result: 0000000000000000000000000000100000000000000000000000000000000000
Shifted: 0000000000000000000000000000000000000000000000000000000000001000
As an int: 8
The mask needed for 47 to 40 would be:
Bits 47-40: xxxxxxxx
As binary: 0000000000000000111111110000000000000000000000000000000000000000
As hex: 0xFF0000000000
The use of hexadecimal simply makes it less verbose, and clearer once you get used to it. Groups of 8 bits for masks always end up as 'FF'.
The Wikipedia article on bitwise operations should help you to understand the process.
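The same mask-and-shift idea can be wrapped in a small function so the mask does not have to be written out by hand. This is a sketch; extract_bits is a hypothetical helper name, and it assumes bits are numbered from the least significant bit with both ends inclusive.

```python
def extract_bits(value, high, low):
    """Return bits high..low (inclusive, bit 0 = least significant) of value."""
    width = high - low + 1
    mask = (1 << width) - 1      # e.g. width 8 gives 0xFF
    return (value >> low) & mask


number = int("0x0000040800000000", 16)
print(extract_bits(number, 39, 32))  # 8
print(extract_bits(number, 47, 40))  # 4
```

Shifting first and masking afterwards gives the same result as masking first and shifting afterwards, but the mask stays small and does not depend on the field's position.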

Representing a number using two bytes

What is the easiest way in Python to represent a number from 0 to 65535 using two bytes?
For example 300 in decimal is 0000000100101100 in binary and 012C in hexadecimal.
What I want to get as output when I get 300 is two bytes:
first is 00101100 (in binary representation)
second is 00000001 (in binary representation)
What is the easiest way to do it?
I'm sure there is something better than this, though:
from struct import pack, unpack
unpack('BB', pack('H',300))
# gives (44, 1), the two bytes you were asking for
See python docs to see what the available letter codes are, also be mindful of byte order.
You can get the low bits using & 255 (i.e. bitwise AND with 0b11111111):
>>> "{:08b}".format(300 & 255)
'00101100'
and the high bits by adding a bitwise shift:
>>> "{:08b}".format((300 >> 8) & 255)
'00000001'
For more information on the bitwise operators, see e.g. the Python wiki.
I think you're looking for struct.pack:
>>> import struct
>>> i = 300
>>> struct.pack("H",i)
b',\x01'
where the , is the ASCII character with value 44.
As noted in this SO answer, you could do the following :
>>> my_hexdata = hex(300)
>>> scale = 16 ## equals to hexadecimal
>>> num_of_bits = 16
>>> mybin = bin(int(my_hexdata, scale))[2:].zfill(num_of_bits)
>>> mybin
'0000000100101100'
>>> mybin[:8]
'00000001'
>>> mybin[8:16]
'00101100'
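In Python 3, int.to_bytes is another option, and it makes the byte order explicit instead of relying on the machine's native order. A short sketch, assuming the low byte should come first as in the question:

```python
n = 300
two_bytes = n.to_bytes(2, byteorder="little")  # low byte first

print(list(two_bytes))                        # [44, 1]
print([format(b, "08b") for b in two_bytes])  # ['00101100', '00000001']
```

Passing a value outside 0 to 65535 raises OverflowError here, which doubles as a range check.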

How to shift bits in a 2-5 byte long bytes object in python?

I am trying to extract data out of a byte object. For example:
From b'\x93\x4c\x00' my integer hides from bit 8 to 21.
I tried to do bytes >> 3 but that isn't possible with more than one byte.
I also tried to solve this with struct but the byte object must have a specific length.
How can I shift the bits to the right?
Don't use bytes to represent integer values; if you need bits, convert to an int:
value = int.from_bytes(your_bytes_value, byteorder='big')
bits_21_to_8 = (value & 0x1fffff) >> 8
where the 0x1fffff mask could also be calculated with:
mask = 2 ** 21 - 1
Demo:
>>> your_bytes_value = b'\x93\x4c\x00'
>>> value = int.from_bytes(your_bytes_value, byteorder='big')
>>> (value & 0x1fffff) >> 8
4940
You can then move back to bytes with the int.to_bytes() method:
>>> ((value & 0x1fffff) >> 8).to_bytes(2, byteorder='big')
b'\x13L'
As you have a bytes string and you want to strip the right-most eight bits (i.e. one byte), you can simply slice it off the bytes string:
>>> b'\x93\x4c\x00'[:-1]
b'\x93L'
If you want to convert that then to an integer, you can use Python’s struct to unpack it. As you correctly said, you need a fixed size to use structs, so you can just pad the bytes string to add as many zeros as you need:
>>> data = b'\x93\x4c\x00'
>>> data[:-1]
b'\x93L'
>>> data[:-1].rjust(4, b'\x00')
b'\x00\x00\x93L'
>>> struct.unpack('>L', data[:-1].rjust(4, b'\x00'))[0]
37708
Of course, you can also convert it first, and then shift off the 8 bits from the resulting integer:
>>> struct.unpack('>Q', data.rjust(8, b'\x00'))[0] >> 8
37708
If you want to make sure that you don’t actually interpret more than those 13 bits (bits 8 to 21), you have to apply the bit mask 0x1FFF of course:
>>> 37708 & 0x1FFF
4940
(If you need little-endianness instead, just use <L or <Q respectively.)
If you are really counting the bits from left to right (which would be unusual but okay), then you can use that padding technique too:
>>> struct.unpack('>Q', data.ljust(8, b'\x00'))[0] >> 43
1206656
Note that we’re adding the padding to the other side, and are shifting it by 43 bits (your 3 bits plus 5 bytes for the padded data we won’t need to look at)
Another approach that works for arbitrarily long byte sequences is to use the bitstring library which allows for bitwise operations on bitstrings e.g.
>>> import bitstring
>>> bitstring.BitArray(bytes=b'\x93\x4c\x00') >> 3
BitArray('0x126980')
You could also convert your bytes to an integer and then multiply or divide by powers of two to accomplish the shifting.
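For a bytes object of any length, the integer round trip can be done with the standard library alone. A minimal sketch, assuming big-endian byte order and that the result should keep the original length; shift_right is a hypothetical helper name:

```python
def shift_right(data, count):
    """Shift a big-endian bytes object right by count bits, keeping its length."""
    value = int.from_bytes(data, byteorder="big")
    return (value >> count).to_bytes(len(data), byteorder="big")


print(shift_right(b"\x93\x4c\x00", 3).hex())  # 126980
```

Integer division by 2 ** count gives the same result as the right shift for non-negative values.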
