The answers to this question make it seem like there are two ways to convert an integer to a bytes object in Python 3. They show
s = str(n).encode()
and
n = 5
bytes( [n] )
as being the same. However, testing shows that the values returned are different:
print(str(8).encode())
#Prints b'8'
but
print(bytes([8])) #prints b'\x08'
I know that the first method converts the int 8 into a string and encodes it (UTF-8, I believe), giving a byte with the decimal value 56 (hex 0x38), but what does the second one print? Is that just the hex value of 8 (the backspace character in UTF-8)?
Similarly, are both of these one byte in size? It seems like the second one has two characters, hence two bytes, but I could be wrong there.
b'8' is a bytes object which contains a single byte with value of the character '8' which is equal to 56.
b'\x08' is a bytes object which contains a single byte with value 8, which is the same as 0x8.
Those two examples are not equivalent. str(n).encode() takes whatever you give it, turns it into its string representation, and then encodes that string using a character codec such as UTF-8. bytes([..]) builds a bytes object whose byte values are the integers in the given list. The notation \xFF is the hexadecimal escape for a single byte value.
>>> str(8).encode()
b'8'
>>> b'8' == b'\x38'
True
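As a small sketch of the size question, the snippet below compares the two objects byte by byte. The variable names as_text and as_value are only illustrative.

```python
n = 8

as_text = str(n).encode()   # encodes the character '8'
as_value = bytes([n])       # one byte whose numeric value is 8

# Both objects hold exactly one byte; only the stored value differs.
print(len(as_text), list(as_text))    # 1 [56]
print(len(as_value), list(as_value))  # 1 [8]

# '\x08' in the repr is an escape sequence for one byte, not two characters.
print(as_value == b'\x08')  # True
```

Iterating over a bytes object yields integers, which is why list() is a convenient way to see the stored values.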
I'm using the following code to pack an integer into an unsigned short:
import binascii
import struct

raw_data = 40
# Pack into little endian
data_packed = struct.pack('<H', raw_data)
Now I'm trying to unpack the result as follows. I use utf-16-le since the data is encoded as little-endian.
def get_bin_str(data):
    bin_asc = binascii.hexlify(data)
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
    trimmed_res = result[2:]
    return trimmed_res

print(get_bin_str(data_packed))
Unfortunately, it throws the following error,
    result = bin(int(bin_asc.decode("utf-16-le"), 16))
ValueError: invalid literal for int() with base 16: '㠲〰'
How do I properly decode the little-endian bytes into binary data?
Use unpack to reverse what you packed. The data isn't UTF-encoded so there is no reason to use UTF encodings.
>>> import struct
>>> data_packed = struct.pack('<H', 40)
>>> data_packed.hex() # the two little-endian bytes are 0x28 (40) and 0x00 (0)
'2800'
>>> data = struct.unpack('<H',data_packed)
>>> data
(40,)
unpack returns a tuple, so index it to get the single value
>>> data = struct.unpack('<H',data_packed)[0]
>>> data
40
To print in binary, use string formatting. Either of these works well. bin() doesn't let you specify the number of binary digits to display, and the 0b prefix needs to be removed if it is not wanted.
>>> format(data,'016b')
'0000000000101000'
>>> f'{data:016b}'
'0000000000101000'
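The reason unpack returns a tuple is that a format string can describe several fields at once. A minimal sketch, assuming a second value of 11 chosen only for illustration:

```python
import struct

# Two unsigned shorts, little-endian: 4 bytes in total.
packed = struct.pack('<HH', 40, 11)
print(packed.hex())          # 28000b00

first, second = struct.unpack('<HH', packed)
print(first, second)         # 40 11

# With a single field, tuple unpacking avoids the trailing [0].
(only,) = struct.unpack('<H', packed[:2])
print(format(only, '016b'))  # 0000000000101000
```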
You have not said what you are trying to do, so let's assume your goal is to educate yourself. (If you are trying to pack data that will be passed to another program, the only reliable test is to check if the program reads your output correctly.)
Python does not have an "unsigned short" type, so the output of struct.pack() is a bytes object. To see what's in it, just print it:
>>> data_packed = struct.pack('<H', 40)
>>> print(data_packed)
b'(\x00'
What's that? Well, the character (, which is decimal 40 in the ascii table, followed by a null byte. If you had used a number that does not map to a printable ascii character, you'd see something less surprising:
>>> struct.pack("<H", 11)
b'\x0b\x00'
Where 0b is 11 in hex, of course. Wait, I specified "little-endian", so why is my number on the left? The answer is, it's not. Python prints the byte string left to right because that's how English is written, but that's irrelevant. If it helps, think of strings as growing upwards: From low memory locations to high memory. The least significant byte comes first, which makes this little-endian.
Anyway, you can also look at the bytes directly:
>>> print(data_packed[0])
40
Yup, it's still there. But what about the bits, you say? For this, use bin() on each of the bytes separately:
>>> bin(data_packed[0])
'0b101000'
>>> bin(data_packed[1])
'0b0'
The two high bits you see are worth 32 and 8. Your number was less than 256, so it fits entirely in the low byte of the short you constructed.
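Since bin() drops leading zeros, a fixed-width format can make the byte layout easier to read. The following is a sketch; the names bits_in_memory_order and bits_msb_first are made up for this example.

```python
import struct

data_packed = struct.pack('<H', 40)

# Format each byte with a fixed width of 8 bits, in storage order.
bits_in_memory_order = [format(byte, '08b') for byte in data_packed]
print(bits_in_memory_order)  # ['00101000', '00000000']

# Reverse the bytes to read the value most significant byte first.
bits_msb_first = ''.join(format(byte, '08b') for byte in reversed(data_packed))
print(bits_msb_first)        # 0000000000101000
```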
What's wrong with your unpacking code?
Just for fun let's see what your sequence of transformations in get_bin_str was doing.
>>> binascii.hexlify(data_packed)
b'2800'
Um, all right. Not sure why you converted to hex digits, but now you have 4 bytes, not two. (28 is the number 40 written in hex, the 00 is for the null byte.) In the next step, you call decode and tell it that these 4 bytes are actually UTF-16; there's just enough for two unicode characters, let's take a look:
>>> b'2800'.decode("utf-16-le")
'㠲〰'
In the next step Python finally notices that something is wrong, but by then it does not make much difference because you are pretty far away from the number 40 you started with.
To correctly read your data as a UTF-16 character, call decode directly on the byte string you packed.
>>> data_packed.decode("utf-16-le")
'('
>>> ord('(')
40
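If the goal of get_bin_str is simply the bit string of the packed number, one possible repair (an assumption about the intent, not the asker's confirmed requirement) is to skip hexlify and the text decoding entirely and read the bytes with int.from_bytes:

```python
import binascii
import struct

def get_bin_str(data):
    """Return the bits of a little-endian unsigned integer stored in data."""
    value = int.from_bytes(data, byteorder='little')
    return format(value, 'b')   # no '0b' prefix, no zero padding

data_packed = struct.pack('<H', 40)
print(get_bin_str(data_packed))  # 101000

# Reading the hex digits as one number treats the bytes as big-endian instead.
hex_text = binascii.hexlify(data_packed).decode('ascii')
print(hex_text, int(hex_text, 16))  # 2800 10240
```

The second half shows that even with the correct ASCII decoding, the hex route would give 10240 rather than 40, because the digits are read most significant byte first.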
Trying to convert a list of binary strings into signed 16-bit little-endian integers
input_data = [['1100110111111011','1101111011111111','0010101000000011'],['1100111111111011','1101100111111111','0010110100000011']]
Desired Output =[[-1074, -34, 810],[-1703, -39, 813]]
This is what I've got so far. It's been adapted from "Hex string to signed int in Python 3.2?" and "Conversion from HEX to SIGNED DEC in python":
results = []
for i in input_data:
    hex_convert = [hex(int(x,2)) for x in i]
    convert = [int(y[4:6] + y[2:4], 16) for y in hex_convert]
    results.append(convert)
print(results)
output: [[64461, 65502, 810], [64463, 65497, 813]]
This works fine, but the values above are unsigned integers. I need signed integers capable of handling negative values. I then tried a different approach:
results_2 = []
for i in input_data:
    hex_convert = [hex(int(x,2)) for x in i]
    to_bytes = [bytes(j, 'utf-8') for j in hex_convert]
    split_bits = [int(k, 16) for k in to_bytes]
    convert_2 = [int.from_bytes(b, byteorder = 'little', signed = True) for b in to_bytes]
    results_2.append(convert_2)
print(results_2)
Output: [[108191910426672, 112589973780528, 56282882144304], [108191943981104, 112589235583024, 56282932475952]]
This result is even more wild than the first. I know my approach is wrong, and it doesn't help that I've never been able to get my head around binary conversion, but I feel I'm on the right path with:
(b, byteorder = 'little', signed = True)
but can't work out where I'm wrong. Any help explaining this concept would be greatly appreciated.
This result is even more wild than the first. I know my approach is wrong... but can't work out where i'm wrong.
The problem is in the conversion to bytes. Let's look at it a step at a time:
int(x, 2)
Fine; we treat the string as a base-2 representation of the integer value, and get that integer. Only problem is it's a) unsigned and b) big-endian.
hex(int(x,2))
What this does is create a string representation of the integer, in base 16, with a 0x prefix. Notably, there are two text characters per byte that we want. This is already heading down the wrong path.
You might have thought of using hexadecimal because you've seen \xAB style escapes inside string representations. This is a completely different thing. The string '\xAB' contains one character. The string '0xAB' contains four.
From there, everything else is still nonsense. Converting to bytes with a text encoding just means that the text character 0 for example is replaced with the byte value 48 (since in UTF-8 it's encoded with a single byte with that value). For this data we get the same results with UTF-8 that we would by assuming plain ASCII (since UTF-8 is "ASCII transparent" and there are no non-ASCII characters in the text).
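The difference between the escape form and the spelled-out hex text can be checked directly. This is a short sketch; escaped and spelled are illustrative names.

```python
escaped = '\xAB'   # escape sequence: one character, code point 0xAB
spelled = '0xAB'   # four ordinary characters: '0', 'x', 'A', 'B'

print(len(escaped), len(spelled))       # 1 4

# Encoding the text stores the character codes of the digits, not the number.
print(list(spelled.encode('utf-8')))    # [48, 120, 65, 66]

# The number itself fits in a single byte.
print(list((0xAB).to_bytes(1, 'big')))  # [171]
```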
So how do we do it?
We want to convert the integer from the first step into the bytes used to represent it. Just as there is a .from_bytes class method allowing us to create an integer from underlying bytes, there is an instance method allowing us to get the bytes that would represent the integer.
So, we use .to_bytes, specifying the length, signedness and endianness that was assumed when we created the int from the binary string - that gives us bytes that correspond to that string. Then, we re-create the integer from those bytes, except now specifying the proper signedness and endianness. The reason that .to_bytes makes us specify a length is because the integer doesn't have a particular length - there are a minimum number of bytes required to represent it, but you could use as many more as you like. (This is especially important if you want to handle signed values, since it will do sign-extension automatically.)
Thus:
results_2 = []
for i in input_data:
    values = [int(x,2) for x in i]
    as_bytes = [x.to_bytes(2, byteorder='big', signed=False) for x in values]
    reinterpreted = [int.from_bytes(x, byteorder='little', signed=True) for x in as_bytes]
    results_2.append(reinterpreted)
But let's improve the organization of the code a bit. I will first make a function to handle a single integer value, and then we can use comprehensions to process the list. In fact, we can use nested comprehensions for the nested list.
def as_signed_little(binary_str):
    # This time, taking advantage of positional args and default values.
    as_bytes = int(binary_str, 2).to_bytes(2, 'big')
    return int.from_bytes(as_bytes, 'little', signed=True)

# And now we can do:
results_2 = [[as_signed_little(x) for x in i] for i in input_data]
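For comparison, the same reinterpretation can be sketched with the struct module. The function name as_signed_little_struct is hypothetical, and this is an alternative to the approach above rather than a replacement for it.

```python
import struct

input_data = [['1100110111111011', '1101111011111111', '0010101000000011'],
              ['1100111111111011', '1101100111111111', '0010110100000011']]

def as_signed_little_struct(binary_str):
    # '>H' writes the 16 bits in the order they appear in the string;
    # '<h' reads the same two bytes back as a signed little-endian short.
    raw = struct.pack('>H', int(binary_str, 2))
    return struct.unpack('<h', raw)[0]

results = [[as_signed_little_struct(x) for x in row] for row in input_data]
print(results)  # [[-1075, -34, 810], [-1073, -39, 813]]
```

For this input the first entries of each row come out as -1075 and -1073, which matches 64461 - 65536 and 64463 - 65536 from the unsigned output shown in the question.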
I have a hex string, but I need to convert it to actual hex.
For example, i have this hex string:
3f4800003f480000
One way I could achieve my goal is by using escape sequences:
print("\x3f\x48\x00\x00\x3f\x48\x00\x00")
However, I can't do it this way, because I want to build my hex value from multiple variables.
My program's purpose is to:
take in a number for instance 100
multiply it by 100: 100 * 100 = 10000
convert it to hex 2710
add 0000
add 2710 again
add 0000 once more
Result I'm expecting is 2710000027100000. Now I need to pass this hexadecimal number as argument to a function (as hexadecimal).
In Python, there is no separate 'hex' type. hex() returns the hexadecimal notation of a number as a str. You can check this yourself by calling type() on the result of hex():
# v convert integer to hex
>>> type(hex(123))
<class 'str'>
But in order to represent the value as a hexadecimal, Python prepends the 0x to the string which represents hexadecimal number. For example:
>>> hex(123)
'0x7b'
So, in your case in order to display your string as a hexadecimal equivalent, all you need is to prepend it with "0x" as:
>>> my_hex = "0x" + "3f4800003f480000"
This way, if you later want to convert it into some other representation, say an integer (which, given your problem statement, you will need), all you need to do is call int with base 16:
>>> int("0x3f4800003f480000", base=16)
4559894623774310400
In fact, int() with base=16 accepts the string with or without the "0x" prefix. For example:
>>> int("3f4800003f480000", base=16)
4559894623774310400
The "0x" prefix simply signals that the string is hexadecimal, so anyone reading or debugging your code later gets the idea. That's why it is preferred.
So my suggestion is to stick with Python's Hex styling, and don't convert it with escape characters as "\x3f\x48\x00\x00\x3f\x48\x00\x00"
From the Python documentation for hex():
Convert an integer number to a lowercase hexadecimal string prefixed with “0x”. If x is not a Python int object, it has to define an __index__() method that returns an integer.
Try binascii.unhexlify:
Return the binary data represented by the hexadecimal string hexstr.
example:
assert binascii.unhexlify('3f4800003f480000') == b"\x3f\x48\x00\x00\x3f\x48\x00\x00"
>>> hex(int('3f4800003f480000', 16))
'0x3f4800003f480000'
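Putting the steps from the question together, a minimal sketch might look like this. build_payload is a hypothetical helper, and padding each number to four hex digits is an assumption based on the 2710 example. bytes.fromhex is the built-in counterpart of binascii.unhexlify.

```python
def build_payload(number):
    # Hypothetical helper following the steps listed in the question.
    scaled = number * 100
    part = format(scaled, '04x') + '0000'   # e.g. '2710' + '0000'
    return part * 2

hex_string = build_payload(100)
print(hex_string)                 # 2710000027100000

raw = bytes.fromhex(hex_string)   # same bytes as the \x escape form
print(raw == b'\x27\x10\x00\x00\x27\x10\x00\x00')  # True

as_int = int(hex_string, 16)      # the same digits read as one integer
print(hex(as_int))                # 0x2710000027100000
```

Whether the receiving function wants the bytes object (raw) or the integer (as_int) depends on that function, which the question does not specify.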
I don't know why the hex() function returns a string like '0x41' instead of 0x41.
I need to convert an ASCII value into hex, but I want it as an int in 0x format, not as a '0x' string.
ascii = 360
hexstring = hex(ascii)
hexstring += 0x41  # I can't do this because hexstring is a string, not an int
How can I get an int hex?
thanks
There is no int hex object. There is only an alternative syntax to create integers:
>>> 0x41
65
You could have used 0o101 too, to get the same value. Or use 0b1000001 to specify it in binary; they are all the exact same numeric value to Python, just different ways of writing an integer in your code.
Simply keep ascii as an integer and sum your hex notation values with that:
>>> ascii = 360
>>> ascii += 0x41
>>> ascii
425
hex() produces a string that can be interpreted by a Python program in the same manner, and is usually used when debugging code or quick presentation output (but you should use format(number, 'x') if you want to produce end-user output without the 0x prefix). It is not needed to work with integers.
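A short sketch of the point that the literal forms are one and the same integer, and that hex formatting only matters at output time. The names values and total are illustrative.

```python
# Four spellings of the same integer.
values = [0x41, 0o101, 0b1000001, 65]
print(all(v == 65 for v in values))  # True

# Arithmetic stays in integers; choose a notation only when printing.
total = 360 + 0x41
print(total)                  # 425
print(hex(total))             # 0x1a9
print(format(total, 'x'))     # 1a9
print(format(total, '#06x'))  # 0x01a9
```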
I'm converting int to bytes using this command in python:
a = 5
b = bytes(a, 'utf-8')
but when I print b I get this value:
b'\x00\x00\x00\x00\x00'
what is wrong with this piece of code?
The bytes() function documentation points to the bytearray() documentation, which states:
The optional source parameter can be used to initialize the array in a few different ways:
[....]
If it is an integer, the array will have that size and will be initialized with null bytes.
You asked for a bytes() object of size 5, initialised to null bytes.
You probably want to turn a into a string first:
bytes(str(a), 'utf-8')
Demo:
>>> a = 5
>>> bytes(str(a), 'utf-8')
b'5'
If you wanted to have the byte value 5 (so the ENQ ASCII control code or whatever else you might want it to mean) you'll need to put a in a list:
bytes([a])
(no need to provide an encoding then):
>>> bytes([a])
b'\x05'
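The different ways of turning the integer 5 into bytes can be put side by side. This is a sketch; int.to_bytes is not mentioned in the answer above and is included here as one further option when a fixed width and byte order are needed.

```python
a = 5

zero_filled = bytes(a)            # five zero bytes
single_byte = bytes([a])          # one byte with value 5
as_digit = str(a).encode('utf-8') # the character '5', byte value 53
fixed_width = a.to_bytes(2, 'little')  # two bytes, least significant first

print(zero_filled)   # b'\x00\x00\x00\x00\x00'
print(single_byte)   # b'\x05'
print(as_digit)      # b'5'
print(fixed_width)   # b'\x05\x00'
```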
You are creating a byte array of length 5.
To get the binary representation of the number 5 you can use bin():
>>> bin(5)
'0b101'