Convert bytes data inside a string to a true bytes object [duplicate] - python

This question already has answers here:
What is the difference between UTF-8 and ISO-8859-1? [closed]
(8 answers)
Converting Byte to String and Back Properly in Python3?
(2 answers)
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
In Python 3, I have a string like the following:
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
This string was read from a file and it is the bytes representation of some text. To be clear, this is a unicode string, not a bytes object.
I need to transform mystr into a bytes object like the following:
mybytes = b"\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
Notice that the translation should be literal. I don't want to encode the string.
Running .encode('utf-8') will escape the \.
If I manually copy and paste the content into a bytes literal, everything works. What I couldn't find anywhere is how to do the conversion without copy and paste.

mystr.encode("latin-1") is what you want.
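To see why this works: Latin-1 maps code points 0 to 255 one-to-one onto byte values, so the conversion is literal. The following is a minimal sketch, assuming mystr really holds those code points (as in the question's literal) and not backslash characters followed by hex digits.

```python
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

# Latin-1 maps each code point 0..255 to the byte with the same value.
mybytes = mystr.encode("latin-1")
print(mybytes == b"\x00\x00\x01\x01\x80\x02\xc0\x02\x00")  # True

# UTF-8, by contrast, expands every code point >= 0x80 into two bytes.
as_utf8 = mystr.encode("utf-8")
print(len(mybytes), len(as_utf8))  # 9 11

# The conversion is reversible.
print(mybytes.decode("latin-1") == mystr)  # True
```

If the string contains any character above U+00FF, encode("latin-1") raises UnicodeEncodeError, which is usually a useful signal that the data was not byte-like to begin with.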


How Python decodes UTF8 Encoding in String Format [duplicate]

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 1 year ago.
I have a string that contains UTF-8 bytes written as octal escape sequences:
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
I need to decode it, but the only way I have found so far is this:
result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')
This is not safe, so is there a better way?
Use ast.literal_eval() instead. It only evaluates literals, never arbitrary expressions, so it is safe for this purpose.
Then you don't need to call bytes(), since it will return a byte string.
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
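A self-contained version of the ast.literal_eval() approach, with the intermediate bytes printed, might look like this. It is only a sketch: it assumes s contains no single quotes or line breaks, since either would break the bytes literal that is built around it.

```python
import ast

s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

# Build the source text of a bytes literal and let literal_eval parse it.
# Only literals are accepted, so no arbitrary code can run.
raw = ast.literal_eval(f"b'{s}'")
print(raw)     # b'\xe6\x9d\x8e\xe6\xb5\xb7\xe7\x8e\x89'

# The octal escapes were UTF-8 bytes, so decode them as UTF-8.
result = raw.decode('utf-8')
print(result)  # 李海玉
```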
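This might be what you are hoping to get. Note that unicode-escape on its own turns each escaped byte into a separate character, so the result still has to be encoded as Latin-1 and decoded as UTF-8:
'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('utf8').decode('unicode-escape').encode('latin-1').decode('utf-8')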
Calling s.decode("utf8") directly is not an option here: in Python 3 only bytes objects have .decode(), and s is a str.

Changing string to ascii in python [duplicate]

This question already has answers here:
Convert a Unicode string to a string in Python (containing extra symbols)
(12 answers)
Closed 3 years ago.
I need to convert the word
name = 'Łódź'
to ASCII characters
output: 'Lodz'
I can't import any library like unicodedata.
I need to do it in pure Python.
I've tried to encode and then decode, but nothing worked.
Well, a simple method would be to map and replace. This also does not require any special imports.
name = 'Łódź'
name = name.replace('Ł', 'L')
name = name.replace('ó', 'o')
name = name.replace('ź', 'z')
print(name)
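The same map-and-replace idea can be written with a single translation table, which avoids one replace() call per letter and still needs no imports. The table and the to_ascii helper below are hypothetical and hand-written for this example; they cover only the letters listed.

```python
# Hypothetical hand-written table; extend it with the letters you need.
TO_ASCII = str.maketrans({
    'Ł': 'L', 'ł': 'l',
    'Ó': 'O', 'ó': 'o',
    'Ź': 'Z', 'ź': 'z',
})

def to_ascii(text):
    """Replace every character found in TO_ASCII; leave the rest untouched."""
    return text.translate(TO_ASCII)

print(to_ascii('Łódź'))  # Lodz
```

Characters that are missing from the table pass through unchanged, so the output is only guaranteed to be ASCII for input the table fully covers.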

Decode part literal bytes in str [duplicate]

This question already has answers here:
Converting double slash utf-8 encoding
(6 answers)
Closed 4 years ago.
I have a str like "abc\\xe2\\x86\\x92", and what I want is "abc→" (b"abc\xe2\x86\x92".decode() equals "abc→").
The problem is that \\xe2\\x86\\x92 sits inside a str, where \\ is already a single backslash character, so replacing it with \ changes nothing.
I found a way to solve this:
In [107]: original.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
Out[107]: 'abc→'
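Broken into steps, the chain above does the following. This is a sketch with a hypothetical helper name, decode_escaped_utf8; it assumes every escape in the input stands for a single byte (\xNN or octal), because the Latin-1 step fails on \uXXXX escapes above U+00FF.

```python
def decode_escaped_utf8(text):
    """Turn literal \\xNN escapes holding UTF-8 bytes into real characters."""
    step1 = text.encode('utf-8')            # str -> bytes, backslashes still literal
    step2 = step1.decode('unicode-escape')  # interpret \xNN: one code point per byte
    step3 = step2.encode('latin-1')         # code points 0..255 -> same byte values
    return step3.decode('utf-8')            # finally read those bytes as UTF-8

original = 'abc\\xe2\\x86\\x92'
print(decode_escaped_utf8(original))  # abc→
```

The Latin-1 round trip is what turns the three separate characters produced by unicode-escape back into the three bytes of a single UTF-8 sequence.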

convert hexadecimal string to byte array [duplicate]

This question already has answers here:
How to convert hexadecimal string to bytes in Python?
(7 answers)
Closed 5 years ago.
I want to convert a hexadecimal string like 1030 to a byte array like b'\x10\x30'
I know we can use bytearray.fromhex("1030") or "1030".decode("hex"). However, I get output '\x100'.
What am I missing here?
bytearray(b'\x100') is correct; you are just reading it the wrong way. It is the byte \x10 followed by the character 0, whose ASCII code is 0x30.
There is a built-in function in bytearray that does what you intend.
bytearray.fromhex("de ad be ef 00")
It returns a bytearray and it reads hex strings with or without space separator.
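A short check makes the point about the repr concrete. The sketch uses bytes.fromhex, the immutable counterpart of bytearray.fromhex, and shows that the printed form b'\x100' and the literal b'\x10\x30' are the same two bytes.

```python
data = bytes.fromhex('1030')

print(data)                 # b'\x100'
print(data == b'\x10\x30')  # True
print(list(data))           # [16, 48]
print(data.hex())           # 1030

# Spaces between byte pairs are accepted as well.
print(bytearray.fromhex('de ad be ef 00'))  # bytearray(b'\xde\xad\xbe\xef\x00')
```

Printing list(data) or data.hex() is a convenient way to inspect the bytes without the repr substituting printable ASCII characters.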

Conversion of strings like \\uXXXX in python [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
I receive a string like this from a third-party service:
>>> s
'\\u0e4f\\u032f\\u0361\\u0e4f'
I know that this string actually contains sequences of a single backslash, lowercase u etc. How can I convert the string such that the '\\u0e4f' is replaced by '\u0e4f' (i.e. '๏'), etc.? The result for this example input should be '๏̯͡๏'.
In 2.x:
>>> u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
u'\u0e4f\u032f\u0361\u0e4f'
>>> print u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
๏̯͡๏
The Python documentation has a list of the encodings supported by the .encode() and .decode() methods. The special ones in the second table include unicode_escape.
Python3:
bytes("\\u0e4f\\u032f\\u0361\\u0e4f", "ascii").decode("unicode-escape")
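The Python 3 one-liner encodes to ASCII first, so it raises UnicodeEncodeError if the input also contains real non-ASCII characters. One possible variant, sketched here with a hypothetical helper named unescape, encodes with Latin-1 and the backslashreplace error handler so that such characters survive the round trip.

```python
def unescape(text):
    """Interpret backslash escapes such as \\uXXXX found in a str."""
    # Characters above U+00FF become \uXXXX escapes; the rest map to one byte.
    raw = text.encode('latin-1', 'backslashreplace')
    # unicode-escape reads plain bytes as Latin-1 and expands the escapes.
    return raw.decode('unicode-escape')

s = '\\u0e4f\\u032f\\u0361\\u0e4f'
result = unescape(s)
print(result == '\u0e4f\u032f\u0361\u0e4f')  # True
print(len(s), len(result))                   # 24 4
```

Every backslash sequence in the input is interpreted, so text that should keep a literal backslash needs that backslash doubled beforehand.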
