Python: Converting HEX string to bytes

I'm trying to make a byte frame which I will send via UDP. I have a class Frame which has attributes sync, frameSize, data, checksum, etc. I'm using hex strings for value representation, like this:
testFrame = Frame("AA01","0034","44853600","D43F")
Now I need to concatenate these hex values together and convert them to a byte array, like this:
from base64 import b16decode

def convertToBits(self):
    stringMessage = self.sync + self.frameSize + self.data + self.chk
    return b16decode(stringMessage)
But when I print the converted value I don't get the same values, or I don't know how to read Python notation correctly:
This is sync: AA01
This is frame size: 0034
This is data: 44853600
This is checksum: D43F
b'\xaa\x01\x004D\x856\x00\xd4?'
So the first word is converted OK (AA01 -> \xaa\x01), but 0034 -> \x004D is not the same. I tried bytearray.fromhex, because it lets me use spaces between bytes, but I got the same result. How can I send the same hex words via UDP?

Python displays any byte that can represent a printable ASCII character as that character. 4 is the same as \x34, but Python opted to print the ASCII character in the representation.
So \x004 is really the same as \x00\x34, D\x856\x00 is the same as \x44\x85\x36\x00, and \xd4? is the same as \xd4\x3f, because:
>>> b'\x34'
b'4'
>>> b'\x44'
b'D'
>>> b'\x36'
b'6'
>>> b'\x3f'
b'?'
This is just the representation of the bytes value; the value is entirely correct and you don't need to do anything else.
If it helps, you can visualise the bytes values as hex again using binascii.hexlify():
>>> import binascii
>>> binascii.hexlify(b'\xaa\x01\x004D\x856\x00\xd4?')
b'aa01003444853600d43f'
and you'll see that 4, D, 6 and ? are once again represented by the correct hexadecimal characters.
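Here is a minimal Python 3 sketch of building the frame from the hex fields and sending it. It assumes the four field values shown in the question. The `build_frame` and `send_frame` names and the destination address are hypothetical. Because the frame is already correct, it can be handed straight to `sendto()`:

```python
import socket

def build_frame(sync: str, frame_size: str, data: str, checksum: str) -> bytes:
    """Concatenate hex-string fields and convert them to raw bytes (hypothetical helper)."""
    # bytes.fromhex accepts upper- or lowercase hex digits
    return bytes.fromhex(sync + frame_size + data + checksum)

def send_frame(frame: bytes, host: str, port: int) -> None:
    """Send one frame as a single UDP datagram (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(frame, (host, port))

frame = build_frame("AA01", "0034", "44853600", "D43F")
print(frame)           # b'\xaa\x01\x004D\x856\x00\xd4?'
print(frame.hex(" "))  # aa 01 00 34 44 85 36 00 d4 3f  (separator argument needs Python 3.8+)
print(list(frame))     # [170, 1, 0, 52, 68, 133, 54, 0, 212, 63]
```

Calling `send_frame(frame, "192.0.2.10", 5000)` would send the ten bytes. That address and port are placeholders only.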

Related

Convert hex to decimal/string in python

So I wrote this small socket program to send a UDP packet and receive the response:
sock.sendto(data, (MCAST_GRP, MCAST_PORT))
msgFromServer = sock.recvfrom(1024)
banner=msgFromServer[0]
print(msgFromServer[0])
#name = msgFromServer[0].decode('ascii', 'ignore')
#print(name)
Response is
b'\xff\xff\xff\xffI\x11server banner\x00map\x00game\x00Counter-Strike: Global Offensive\x00\xda\x02\x00\x10\x00dl\x01\x011.38.2.2\x00\xa1\x87iempty,secure\x00\xda\x02\x00\x00\x00\x00\x00\x00'
Now I want to convert all the hex values to decimal.
I tried decode(), but then I end up losing all the hex values.
How can I convert all the hex values to decimal in my case?
Example: \x13 = 19
EDIT: I guess a better way to phrase my question is:
How do I convert only the hex values to decimal in the given response?
There are two problems here:
handling the non-ASCII bytes
handling \xhh sequences which are legitimate characters in Python strings
We can address both with a mix of regular expressions and string methods.
First, decode the bytes to ASCII using the backslashreplace error handler to avoid losing the non-ASCII bytes.
>>> import re
>>>
>>> decoded = msgFromServer[0].decode('ascii', errors='backslashreplace')
>>> decoded
'\\xff\\xff\\xff\\xffI\x11server banner\x00map\x00game\x00Counter-Strike: Global Offensive\x00\\xda\x02\x00\x10\x00dl\x01\x011.38.2.2\x00\\xa1\\x87iempty,secure\x00\\xda\x02\x00\x00\x00\x00\x00\x00'
Next, use a regular expression to replace the non-ASCII '\\xhh' sequences with their numeric equivalents:
>>> temp = re.sub(r'\\x([a-fA-F0-9]{2})', lambda m: str(int(m.group(1), 16)), decoded)
>>> temp
'255255255255I\x11server banner\x00map\x00game\x00Counter-Strike: Global Offensive\x00218\x02\x00\x10\x00dl\x01\x011.38.2.2\x00161135iempty,secure\x00218\x02\x00\x00\x00\x00\x00\x00'
Finally, map the remaining control characters (code points 0 to 31, which Python displays as \xhh escapes) to their decimal values using str.translate:
>>> tt = str.maketrans({x: str(x) for x in range(32)})
>>> final = temp.translate(tt)
>>> final
'255255255255I17server banner0map0game0Counter-Strike: Global Offensive021820160dl111.38.2.20161135iempty,secure02182000000'
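The three steps above can be wrapped in one function. This is a sketch; `bytes_to_decimal_text` is a hypothetical name. As in the answer, the decimal values are not separated from each other. A literal backslash-x-digit-digit sequence that was already in the ASCII data would also be converted, because backslashreplace does not escape existing backslashes.

```python
import re

# Translation table: control characters U+0000..U+001F -> their decimal value as text
_CONTROL_TO_DECIMAL = str.maketrans({x: str(x) for x in range(32)})

def bytes_to_decimal_text(raw: bytes) -> str:
    """Replace non-ASCII and control bytes with their decimal values (hypothetical helper)."""
    # Step 1: non-ASCII bytes become literal '\xhh' text instead of raising an error
    decoded = raw.decode('ascii', errors='backslashreplace')
    # Step 2: turn each literal '\xhh' into the decimal value of hh
    decoded = re.sub(r'\\x([a-fA-F0-9]{2})',
                     lambda m: str(int(m.group(1), 16)), decoded)
    # Step 3: turn the remaining control characters into their decimal values
    return decoded.translate(_CONTROL_TO_DECIMAL)

print(bytes_to_decimal_text(b'\xff\xffI\x11server banner\x00'))  # 255255I17server banner0
```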
You can first convert the bytes representation to hex using the bytes.hex method and then cast it into an integer with the appropriate base with int(x, base)
>>> b'\x13'.hex()
'13'
>>> int(b'\x13'.hex(), 16)
19
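For more than one byte, `int(data.hex(), 16)` reads the whole sequence as a single big-endian number. The built-in `int.from_bytes` does the same without going through a hex string. A short sketch:

```python
data = b'\x13'
print(int(data.hex(), 16))             # 19
print(int.from_bytes(data, 'big'))     # 19

# Several bytes are combined into one number, not converted separately
pair = b'\x01\x00'
print(int(pair.hex(), 16))             # 256
print(int.from_bytes(pair, 'big'))     # 256
print(int.from_bytes(pair, 'little'))  # 1
```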
Assuming v contains the response, what you are asking for is:
[int(i) for i in v]
I suspect it's not what you want, but it is how I read the question.
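In Python 3, iterating over a bytes object already yields integers, so `list(v)` gives the same result as the comprehension. The sketch below uses a shortened slice of the response from the question:

```python
v = b'\xff\xff\xff\xffI\x11server'

decimals = list(v)                        # same as [int(i) for i in v]
print(decimals[:6])                       # [255, 255, 255, 255, 73, 17]
print(' '.join(str(n) for n in v[:6]))    # 255 255 255 255 73 17
```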

How to replace 'b' with 0 or 1 in the binary representation of a string

import binascii
a = []
a = input('enter the message')
def str2bin(message):
    binary = bin(int(binascii.hexlify(message.encode("ascii")), 16))
    return binary[1:]
print(str2bin(a))
Input string : hai
Output : b11010000110000101101001
How can I remove the 'b' from the output, or replace it with another binary digit?
Python strings cannot be changed after they have been created; they are immutable. You will have to create a new string, combining the digit and a substring of the original string, like this:
data = str2bin(a)
data0 = "0" + data[1:]
data1 = "1" + data[1:]
[1:] is a slice. In this case, it makes a copy of the string with the first character (at index 0) removed.
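The immutability point can be seen in a short sketch. The bit string here is the output from the question, hard-coded rather than produced by `str2bin`:

```python
data = 'b11010000110000101101001'

try:
    data[0] = '0'               # strings do not support item assignment
except TypeError as exc:
    print('TypeError:', exc)

data0 = '0' + data[1:]          # new string with the first character replaced
data1 = '1' + data[1:]
print(data0)                    # 011010000110000101101001
print(data1)                    # 111010000110000101101001
```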
The bin function isn't suitable for this task. Not only does it give you that unwanted 'b', it also removes leading zeros, so the encoded bit strings vary in length, making them difficult to decode correctly. Instead, you can use the format function or method, and specify the bit length so no leading zeros are lost.
In Python 3, binascii.hexlify isn't required, we can get the necessary integers directly from the bytes object. The code below ensures that the bit string for each byte has exactly 8 bits, padding with zeros on the left when necessary. It uses the default UTF-8 encoding, but you can change that to 'ascii' if you want. Both encodings give the same result if the input string is pure ASCII, but 'utf8' handles any Unicode. Of course, for characters outside the ASCII range a single character will be encoded as 2 or more bytes.
s = 'hai'
bits = ''.join([format(u, '08b') for u in s.encode()])
print(bits)
output
011010000110000101101001
If you have Python 3.6+, you can do this using the more compact (and faster) f-string syntax:
bits = ''.join([f'{u:08b}' for u in s.encode()])
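Because every byte gets exactly 8 bits, the bit string can be decoded again by splitting it into 8-bit groups. Here is a round-trip sketch; `str_to_bits` and `bits_to_str` are hypothetical names:

```python
def str_to_bits(s: str, encoding: str = 'utf-8') -> str:
    """Encode s and return exactly 8 bits per byte."""
    return ''.join(f'{byte:08b}' for byte in s.encode(encoding))

def bits_to_str(bits: str, encoding: str = 'utf-8') -> str:
    """Inverse of str_to_bits: split into 8-bit groups and decode."""
    if len(bits) % 8:
        raise ValueError('bit string length must be a multiple of 8')
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return raw.decode(encoding)

bits = str_to_bits('hai')
print(bits)               # 011010000110000101101001
print(bits_to_str(bits))  # hai
```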

Converting Byte to String and Back Properly in Python3?

Given a random byte string (i.e. not only numbers/characters!), I need to convert it to a string and then back to the initial bytes without losing information. This seems like a basic task, but I ran into the following problems:
Assuming:
rnd_bytes = b'w\x12\x96\xb8'
len(rnd_bytes)
prints: 4
Now, converting it to a string. Note: I need to set backslashreplace, as otherwise it raises a UnicodeDecodeError, or I would lose information by setting it to another value.
my_str = rnd_bytes.decode('utf-8' , 'backslashreplace')
Now, I have the string.
I want to convert it back to exactly the original byte (size 4!):
According to Python resources and this answer, there are different possibilities:
conv_bytes = bytes(my_str, 'utf-8')
conv_bytes = my_str.encode('utf-8')
But len(conv_bytes) returns 10.
I tried to analyse the outcome:
>>> repr(rnd_bytes)
"b'w\\x12\\x96\\xb8'"
>>> repr(my_str)
"'w\\x12\\\\x96\\\\xb8'"
>>> repr(conv_bytes)
"b'w\\x12\\\\x96\\\\xb8'"
It would make sense to replace '\\\\'. my_str.replace('\\\\','\\') doesn't change anything, probably because four backslashes represent only two. So my_str.replace('\\','\') would find the '\\\\', but leads to
SyntaxError: EOL while scanning string literal
due to the last argument '\'. This has been discussed here, where the following suggestion came up:
>>> my_str2=my_str.encode('utf_8').decode('unicode_escape')
>>> repr(my_str2)
"'w\\x12\\x96¸'"
This replaces the '\\\\' but seems to add / change some other characters:
>>> conv_bytes2 = my_str2.encode('utf8')
>>> len(conv_bytes2)
6
>>> repr(conv_bytes2)
"b'w\\x12\\xc2\\x96\\xc2\\xb8'"
There must be a proper way to convert an arbitrary byte string to a string and back. How can I achieve that?
Note: some of this code was found on the Internet.
You could try to convert it to hex format. Then it is easy to convert it back to byte format.
Sample code to convert bytes to string:
hex_str = rnd_bytes.hex()
Here is how 'hex_str' looks like:
'771296b8'
And code for converting it back to bytes:
new_rnd_bytes = bytes.fromhex(hex_str)
The result is:
b'w\x12\x96\xb8'
For processing you can use:
readable_str = ''.join(chr(int(hex_str[i:i+2], 16)) for i in range(0, len(hex_str), 2))
But never try to encode the readable string; here is how the readable string looks:
'w\x12\x96¸'
After processing the readable string, convert it back to hex format before converting it back to a bytes string, like this:
hex_str = ''.join(format(ord(i), '02x') for i in readable_str)
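Here is a sketch of the full round trip described in this answer. An extra byte `\x05` is appended to the question's sample (an assumption, for illustration). It shows why the hex string is rebuilt with `format(ord(c), '02x')`: slicing `hex()` output drops the leading zero for values below 0x10, because `hex(5)` is `'0x5'`.

```python
rnd_bytes = b'w\x12\x96\xb8\x05'

hex_str = rnd_bytes.hex()                       # '771296b805'
readable_str = ''.join(chr(int(hex_str[i:i + 2], 16))
                       for i in range(0, len(hex_str), 2))

# Rebuild the hex string with exactly two digits per character
hex_back = ''.join(format(ord(c), '02x') for c in readable_str)
assert hex_back == hex_str
assert bytes.fromhex(hex_back) == rnd_bytes

# Slicing hex() loses the leading zero of 0x05
print(hex(ord('\x05'))[2:4])       # 5
print(format(ord('\x05'), '02x'))  # 05
```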
Now, converting it to a string. Note: I need to set backslashreplace, as otherwise it raises a UnicodeDecodeError, or I would lose information by setting it to another value.
The UTF-8 encoding cannot interpret every possible sequence of bytes as a string. Using backslashreplace gives you a string that preserves the information for bytes that couldn't be converted:
>>> rnd_bytes = b'w\x12\x96\xb8'
>>> rnd_bytes.decode('utf-8', 'backslashreplace')
'w\x12\\x96\\xb8'
but that representation is not very useful for converting back.
Instead, use an encoding that does interpret every possible sequence of bytes as a string. The most straightforward of these is ISO-8859-1, which simply maps each byte one at a time to the first 256 Unicode code points respectively.
>>> rnd_bytes.decode('iso-8859-1')
'w\x12\x96¸'
>>> rnd_bytes.decode('iso-8859-1').encode('iso-8859-1') == rnd_bytes
True
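The sketch below checks that ISO-8859-1 round-trips every possible byte value. It also shows a different technique, swapped in here: UTF-8 with the `surrogateescape` error handler (PEP 383). That handler also round-trips arbitrary bytes while decoding valid UTF-8 normally.

```python
all_bytes = bytes(range(256))

# ISO-8859-1: one character per byte, code point == byte value
text = all_bytes.decode('iso-8859-1')
assert len(text) == len(all_bytes)
assert text.encode('iso-8859-1') == all_bytes

# UTF-8 + surrogateescape: undecodable bytes become lone surrogates U+DC80..U+DCFF
rnd_bytes = b'w\x12\x96\xb8'
s = rnd_bytes.decode('utf-8', 'surrogateescape')
print(repr(s))     # 'w\x12\udc96\udcb8'
assert s.encode('utf-8', 'surrogateescape') == rnd_bytes
```

A string produced with `surrogateescape` must be encoded with the same handler. A plain strict `encode('utf-8')` raises an error on the lone surrogates.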

encoding unicode using UTF-8

In Python, if I type
euro = u'\u20AC'
euroUTF8 = euro.encode('utf-8')
print(euroUTF8, type(euroUTF8), len(euroUTF8))
the output is
('\xe2\x82\xac', <type 'str'>, 3)
I have two questions:
1. It looks like euroUTF8 is encoded over 3 bytes, but how do I get its binary representation to see how many bits it contains?
2. What does 'x' in '\xe2\x82\xac' mean? I don't think 'x' is a hex number. And why are there three '\'?
In Python 2, print is a statement, not a function. You are printing a tuple here. Print the individual elements by removing the (..):
>>> euro = u'\u20AC'
>>> euroUTF8 = euro.encode('utf-8')
>>> print euroUTF8, type(euroUTF8), len(euroUTF8)
€ <type 'str'> 3
Now you get the 3 individual objects written as strings to stdout; my terminal just happens to be configured to interpret anything written to it as UTF-8, so the bytes correctly result in the € Euro symbol being displayed.
The \x<hh> sequences are Python string literal escape sequences (see the reference documentation); they are the default output of repr() for a string containing non-ASCII or non-printable bytes. You'll see the same thing when echoing the value in an interactive interpreter:
>>> euroUTF8
'\xe2\x82\xac'
>>> euroUTF8[0]
'\xe2'
>>> euroUTF8[1]
'\x82'
>>> euroUTF8[2]
'\xac'
They provide you with ASCII-safe debugging output. The contents of all Python standard library containers use this format; including lists, tuples and dictionaries.
If you want to see the bits that make up these values, convert each byte to an integer using the ord() function, then format the integer as binary:
>>> ' '.join([format(ord(b), '08b') for b in euroUTF8])
'11100010 10000010 10101100'
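The code above is Python 2. In Python 3, `encode()` returns a `bytes` object, and iterating over it yields integers, so `ord()` is not needed. A sketch of the same steps:

```python
euro = '\u20ac'
euro_utf8 = euro.encode('utf-8')
print(euro_utf8, type(euro_utf8), len(euro_utf8))    # b'\xe2\x82\xac' <class 'bytes'> 3

# Iterating bytes yields ints, so format() can be applied directly
print(' '.join(format(b, '08b') for b in euro_utf8))  # 11100010 10000010 10101100
print(len(euro_utf8) * 8)                             # 24
```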
Different encodings use different numbers of bits per character. UTF-8 is a variable-length encoding built from 8-bit bytes: each character takes 1 to 4 bytes, so the bit count is simply 8 times the byte count (3 bytes, or 24 bits, for the euro sign). (If you still want to see the bits, refer to Martijn's answer.)
\x starts an escape sequence: the two hexadecimal digits that follow give the value of a single byte. So the x is not a hex digit that you should convert or read; it only marks where the byte value starts. There are three backslashes because there are three bytes, each written as its own \xhh escape.
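This Python 3 sketch illustrates both points: UTF-8 byte counts for a few sample characters, and the fact that `\xe2` in a literal is one byte:

```python
# UTF-8 uses 1 to 4 bytes per character
for ch in ['A', '\u00e9', '\u20ac', '\U0001f600']:
    encoded = ch.encode('utf-8')
    print(f'U+{ord(ch):04X}', encoded, len(encoded), 'bytes =', len(encoded) * 8, 'bits')

# '\xe2' in a literal is one byte with value 0xe2 (226); the x is not data
assert b'\xe2' == bytes([0xe2])
assert len(b'\xe2\x82\xac') == 3
# With the backslash itself escaped, the same text is four separate characters
assert len('\\xe2') == 4
```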

python string to hex with escaped hex values

I have a string like "Some characters \x00\x80\x34 and then some other characters". How can I convert the regular characters to their hex equivalent, while converting \x00 to the actual 00 hex value?
binascii.hexlify() considers '\', 'x', '0', '0' as actual characters.
Later edit:
The string itself is produced by another function. When I print it, it actually prints "\x00".
As I understand it, you are trying to convert only the characters that are not hex values to hex. It would help if you gave a sample input string that you are trying to convert.
You can also convert to hex values using just the built-in encode and decode methods, which should take care of what you are trying to do. The following three lines, run in a terminal on my machine, gave the output you are expecting (this works in Python 2; in Python 3, str.encode() no longer accepts the "hex" codec). Hope it helps:
aStr = "Some characters \x00\x80\x34 and then some other characters"
aStr.encode("hex")
aStr.encode("hex").decode("hex")
It's unclear what you're asking, since binascii.hexlify should work:
>>> import binascii
>>> s = "\x00\x80\x34"
>>> binascii.hexlify(s)
'008034'
>>> s = "foobar \x00\x80\x34 foobar"
>>> binascii.hexlify(s)
'666f6f6261722000803420666f6f626172'
foobar = 666f6f626172, space = 20
Reference: https://docs.python.org/3/library/binascii.html
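The later edit suggests that the string may contain a literal backslash, x and two digits (four characters) rather than one character. If so, the escapes must be interpreted before hexlifying. Note that in Python 3, `binascii.hexlify` also requires a bytes object. The sketch below uses the `unicode_escape` codec and assumes the input is otherwise plain ASCII. That codec also interprets other escapes such as `\n`.

```python
import binascii

# Four characters per escape: backslash, 'x', and two hex digits
s = "foobar \\x00\\x80\\x34 foobar"
print(len(s))   # 26

# unicode_escape turns the literal escapes into real characters;
# ISO-8859-1 then maps those characters to bytes 0..255
unescaped = s.encode('ascii').decode('unicode_escape')
raw = unescaped.encode('iso-8859-1')
print(binascii.hexlify(raw))   # b'666f6f6261722000803420666f6f626172'
```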
