This question already has answers here:
How to convert hexadecimal string to bytes in Python?
(7 answers)
Closed 5 years ago.
I want to convert a hexadecimal string like 1030 to a byte array like b'\x10\x30'
I know we can use bytearray.fromhex("1030") or, in Python 2, "1030".decode("hex"). However, the output I get is '\x100'.
What am I missing here?
bytearray(b'\x100') is correct; you are just reading it the wrong way. It is the byte \x10 followed by the character 0, whose ASCII code is 0x30. Python's repr shows any byte in the printable ASCII range as its character rather than as a \x escape.
bytearray has a built-in class method that does what you intend.
bytearray.fromhex("de ad be ef 00")
It returns a bytearray, and it reads hex strings with or without space separators between the byte pairs.
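To convince yourself that nothing is lost, here is a minimal sketch that compares the parsed value against the literal you expected and prints it in forms that do not depend on the repr. The variable names are only illustrative.

```python
# Hex string from the question; spaces between byte pairs are optional.
data = bytearray.fromhex("1030")
spaced = bytearray.fromhex("10 30")

# repr() shows printable ASCII bytes as characters, so 0x30 appears as "0".
print(data)        # bytearray(b'\x100')

# The integer values and the hex dump show both bytes unambiguously.
print(list(data))  # [16, 48]
print(data.hex())  # 1030

# The parsed value equals the literal written with explicit escapes.
same = data == bytearray(b"\x10\x30")
print(same)        # True
```

If you need the \x10\x30 look for display purposes, print data.hex() or format the bytes yourself; the underlying data is already what you asked for.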
This question already has answers here:
Normalizing Unicode
(2 answers)
Closed 5 months ago.
s1='ফটিকছড়ি' #escape-unicode= %u09AB%u099F%u09BF%u0995%u099B%u09A1%u09BC%u09BF
s2='ফটিকছড়ি' #escape-unicode= %u09AB%u099F%u09BF%u0995%u099B%u09DC%u09BF
They look the same but are different. How can I treat them as the same string?
In Unicode, the character U+09DC is canonically equivalent to the sequence U+09A1 U+09BC. When you compare Unicode strings, you should always use Unicode normalization to fold together canonically equivalent sequences. So, convert both strings to Unicode normalization form C or Unicode normalization form D before comparing.
See UAX #15 Unicode Normalization Forms for details on Unicode normalization.
See this answer for how to normalize Unicode strings in Python.
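As a rough sketch of that comparison in Python, the standard unicodedata module can normalize both strings before comparing them. The helper name canonically_equal is made up for this example, and the strings are written with \u escapes so the difference between them stays visible.

```python
import unicodedata

# s1 uses U+09A1 followed by U+09BC; s2 uses the single character U+09DC.
s1 = "\u09ab\u099f\u09bf\u0995\u099b\u09a1\u09bc\u09bf"
s2 = "\u09ab\u099f\u09bf\u0995\u099b\u09dc\u09bf"


def canonically_equal(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same form.

    Hypothetical helper for illustration; "NFC" and "NFD" both work here.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(s1 == s2)                          # False: different code points
print(canonically_equal(s1, s2))         # True after NFC
print(canonically_equal(s1, s2, "NFD"))  # True after NFD
```

Whichever form you pick, apply the same one to both sides of the comparison.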
This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 1 year ago.
I have a string that contains UTF-8 bytes written as octal escapes:
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
I need to decode it, but so far I only know how to do it this way:
result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')
This is not safe, so is there a better way?
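Use ast.literal_eval() instead; it is safe because it only evaluates Python literals, never arbitrary expressions.
You also don't need to call bytes(), since literal_eval() accepts a str and, for a b'...' literal, returns a bytes object.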
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
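Spelled out with the import and the intermediate value, a minimal sketch of this approach looks like the following. It assumes s contains no single quotes, since the string is wrapped in b'...' before parsing; the name raw is only illustrative.

```python
import ast

# The string from the question: literal backslashes followed by octal digits.
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

# Build the source text of a bytes literal and let literal_eval parse it.
# Unlike eval(), literal_eval() refuses anything that is not a literal.
raw = ast.literal_eval(f"b'{s}'")
print(raw)     # b'\xe6\x9d\x8e\xe6\xb5\xb7\xe7\x8e\x89'

# The escaped bytes are UTF-8, so decode them to get the text.
result = raw.decode('utf-8')
print(result)  # 李海玉
```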
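The unicode-escape codec can also process the escapes, but it produces one character per escaped byte, so a Latin-1 round trip is needed to get from there to the UTF-8 text:
'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('utf8').decode('unicode-escape').encode('latin-1').decode('utf-8')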
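Note that decoded_string = s.decode("utf8") does not help here: in Python 3 a str has no decode() method, and in Python 2 it would return the backslash sequences unchanged.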
This question already has an answer here:
Python: Converting HEX string to bytes
(1 answer)
Closed 2 years ago.
I am using Python 3 and trying to convert a hex string to its byte representation. So I used the following command:
bytes.fromhex('97ad300414b64c')
I expected a result like b'\x97\xad\x30\x04\x14\xb6\x4c' but got b'\x97\xad0\x04\x14\xb6L'. I am not sure what I am doing wrong; maybe it has something to do with the encoding?
As pointed out by @user8651755 in the comments, this happens because some of the bytes correspond to printable ASCII characters (0x30 is "0" and 0x4c is "L"), and Python's repr shows those as characters. So the answer is: you are doing everything right.
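A quick way to check this yourself is to compare the result with the literal you expected. This is just a small sketch; the names got and expected are illustrative.

```python
got = bytes.fromhex('97ad300414b64c')
expected = b'\x97\xad\x30\x04\x14\xb6\x4c'

# Both spellings denote the same seven bytes.
print(got == expected)  # True
print(len(got))         # 7

# repr() uses characters for printable ASCII bytes (0x30 -> 0, 0x4c -> L).
print(got)              # b'\x97\xad0\x04\x14\xb6L'

# hex() gives a display form that never substitutes characters.
print(got.hex())        # 97ad300414b64c
```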
This question already has answers here:
What is the difference between UTF-8 and ISO-8859-1? [closed]
(8 answers)
Converting Byte to String and Back Properly in Python3?
(2 answers)
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
In Python 3, I have a string like the following:
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
This string was read from a file and it is the bytes representation of some text. To be clear, this is a unicode string, not a bytes object.
I need to transform mystr into a bytes object like the following:
mybytes = b"\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
Notice that the translation should be literal. I don't want to encode the string.
Running .encode('utf-8') will escape the \.
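If I manually copy and paste the content into a bytes literal, then everything works. What I couldn't find anywhere is how to convert it without copy and paste.
mystr.encode("latin-1") is what you want. Latin-1 maps every code point from U+0000 to U+00FF to the single byte with the same value, so the conversion is literal.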
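Here is a small sketch using the string from the question, with the UTF-8 result alongside for contrast. The name as_utf8 is only illustrative.

```python
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

# Latin-1 turns each character into the byte with the same numeric value.
mybytes = mystr.encode("latin-1")
print(mybytes == b"\x00\x00\x01\x01\x80\x02\xc0\x02\x00")  # True
print(len(mystr), len(mybytes))                            # 9 9

# UTF-8 needs two bytes for each character at or above U+0080,
# so the two such characters here make the result longer.
as_utf8 = mystr.encode("utf-8")
print(len(as_utf8))                                        # 11
```

This only works while every character is at or below U+00FF; anything higher makes encode("latin-1") raise UnicodeEncodeError.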
This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
I receive a string like this from a third-party service:
>>> s
'\\u0e4f\\u032f\\u0361\\u0e4f'
I know that this string actually contains sequences of a single backslash, lowercase u etc. How can I convert the string such that the '\\u0e4f' is replaced by '\u0e4f' (i.e. '๏'), etc.? The result for this example input should be '๏̯͡๏'.
In 2.x:
>>> u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
u'\u0e4f\u032f\u0361\u0e4f'
>>> print u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
๏̯͡๏
There's an interesting list of encodings supported by the .encode() and .decode() methods. The special ones in the second table include unicode_escape.
Python3:
bytes("\\u0e4f\\u032f\\u0361\\u0e4f", "ascii").decode("unicode-escape")
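For completeness, a minimal Python 3 sketch with the result checked against the expected text. It assumes the incoming string is pure ASCII, as in the example; the name decoded is only illustrative.

```python
# What the service sends: literal backslash-u sequences, not real characters.
s = "\\u0e4f\\u032f\\u0361\\u0e4f"

# str has no decode() in Python 3, so go through bytes first.
# encode("ascii") raises UnicodeEncodeError if s contains non-ASCII text.
decoded = s.encode("ascii").decode("unicode-escape")

print(len(s), len(decoded))                   # 24 4
print(decoded == "\u0e4f\u032f\u0361\u0e4f")  # True
print(decoded)
```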