How Python decodes UTF8 Encoding in String Format [duplicate] - python

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 1 year ago.
I have a string containing UTF-8 bytes written as octal escape sequences:
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
I need to decode it, but the only way I have found is:
result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')
This relies on eval(), which is not safe. Is there a better way?

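Use ast.literal_eval() instead. It only evaluates Python literals, so it is not unsafe the way eval() is.
You also don't need to call bytes(), since evaluating a b'...' literal already returns a bytes object.
import ast
result = ast.literal_eval(f"b'{s}'").decode('utf-8')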
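A minimal sketch wrapping this in a reusable function. It assumes the input contains only backslash escapes and printable ASCII; a stray single quote in `s` would end the literal early and make `literal_eval` raise. The name `decode_escaped_utf8` is hypothetical:

```python
import ast


def decode_escaped_utf8(s: str) -> str:
    """Turn text like '\\346\\235\\216' into the UTF-8 string it encodes."""
    # Wrap the text in a bytes literal; literal_eval only accepts literals,
    # so unlike eval() it cannot execute arbitrary code.
    raw = ast.literal_eval(f"b'{s}'")
    return raw.decode("utf-8")


s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
result = decode_escaped_utf8(s)  # '李海玉'
```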

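Might be what you are hoping to get (the Latin-1 round trip turns the decoded escapes back into bytes so they can be read as UTF-8):
'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('utf8').decode('unicode-escape').encode('latin-1').decode('utf-8')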
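Why the Latin-1 step matters: the `unicode-escape` codec turns each octal escape into one code point in the range U+0000 to U+00FF, so on its own it yields mojibake rather than the intended text. Latin-1 maps those code points one-to-one back onto byte values, which can then be decoded as UTF-8. A step-by-step sketch (the variable names are illustrative):

```python
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

# Step 1: interpret the backslash escapes; each \ooo becomes one code point.
step1 = s.encode("ascii").decode("unicode-escape")
# step1 == '\xe6\x9d\x8e\xe6\xb5\xb7\xe7\x8e\x89' (mojibake, starts with 'æ')

# Step 2: map code points 0-255 back to the identical byte values.
step2 = step1.encode("latin-1")

# Step 3: decode those bytes as the UTF-8 they really are.
result = step2.decode("utf-8")  # '李海玉'
```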

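you can do decoded_string = s.decode("utf8") only if s is a bytes object holding the raw UTF-8 bytes. In Python 3 a str has no .decode() method, and the escaped text here would have to be turned into real bytes first.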

Related

Simple way of converting ASCII into hex with no change in value [duplicate]

This question already has answers here:
How to convert hexadecimal string to bytes in Python?
(7 answers)
Closed 3 years ago.
What is the easiest way of converting an ASCII string into hex bytes without changing the value?
What I have:
string = "00FA0086"
What I want:
hex_string = b'\x00\xFA\x00\x86'
Is there a simple way, or do I need to write a function?
You are looking for the binascii module from the Python standard library:
import binascii
string = "00FA0086"
print(repr(binascii.a2b_hex(string)))
gives:
b'\x00\xfa\x00\x86'
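For comparison, the built-in `bytes.fromhex()` (Python 3) does the same conversion without an import, and `bytes.hex()` (Python 3.5+) goes back the other way. A minimal sketch:

```python
string = "00FA0086"

# Parse each pair of hex digits into one byte.
hex_bytes = bytes.fromhex(string)  # b'\x00\xfa\x00\x86'

# Whitespace between byte pairs is also accepted.
spaced = bytes.fromhex("00 FA 00 86")

# Round trip back to a hex string (hex() returns lowercase).
back = hex_bytes.hex().upper()  # '00FA0086'
```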

Changing string to ascii in python [duplicate]

This question already has answers here:
Convert a Unicode string to a string in Python (containing extra symbols)
(12 answers)
Closed 3 years ago.
I need to convert the word
name = 'Łódź'
to ASCII characters
output: 'Lodz'
I can't import any library such as unicodedata.
I need to do it in pure Python.
I've tried encoding and then decoding, but nothing worked.
Well, a simple method would be to map and replace. This also does not require any special imports.
name = 'Łódź'
name=name.replace('Ł','L')
name=name.replace('ó','o')
name=name.replace('ź','z')
print(name)
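The same map-and-replace idea scales better with `str.translate()` and a table built by `str.maketrans()`; both are built-ins, so no imports are needed. The mapping below covers only the Polish letters and is an assumption; other accented characters would need their own entries:

```python
# Each character in the first string is replaced by the character at the
# same position in the second string (both must have the same length).
POLISH_TO_ASCII = str.maketrans(
    "ąćęłńóśźżĄĆĘŁŃÓŚŹŻ",
    "acelnoszzACELNOSZZ",
)


def to_ascii(text: str) -> str:
    # Characters not in the table are left unchanged.
    return text.translate(POLISH_TO_ASCII)


name = 'Łódź'
print(to_ascii(name))  # Lodz
```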

Convert bytes data inside a string to a true bytes object [duplicate]

This question already has answers here:
What is the difference between UTF-8 and ISO-8859-1? [closed]
(8 answers)
Converting Byte to String and Back Properly in Python3?
(2 answers)
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
In Python 3, I have a string like the following:
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
This string was read from a file and it is the bytes representation of some text. To be clear, this is a unicode string, not a bytes object.
I need to transform mystr into a bytes object like the following:
mybytes = b"\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
Notice that the translation should be literal. I don't want to encode the string.
Running .encode('utf-8') will escape the \.
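If I manually copy and paste the content into a bytes literal, everything works. What I couldn't find anywhere is how to convert it without copy and paste.
mystr.encode("latin-1") is what you want: Latin-1 maps each code point from U+0000 to U+00FF to the single byte with the same value, so the conversion is literal.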
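A sketch checking this, plus a second case that is an assumption about the input: if the file actually contains literal backslash sequences (the characters `\`, `x`, `8`, `0` and so on), the escapes have to be interpreted first, here with the `unicode_escape` codec:

```python
mystr = "\x00\x00\x01\x01\x80\x02\xc0\x02\x00"

# Case 1: the str holds the actual characters U+0000..U+00FF.
# Latin-1 turns each character into exactly one byte of the same value.
mybytes = mystr.encode("latin-1")

# UTF-8, by contrast, expands \x80 and \xc0 into two bytes each.
utf8_len = len(mystr.encode("utf-8"))  # 11, not 9

# Case 2 (assumed input): the text contains literal backslash escapes.
escaped = r"\x00\x00\x01\x01\x80\x02\xc0\x02\x00"
# Interpret the escapes, then map the resulting code points to bytes.
mybytes2 = escaped.encode("latin-1").decode("unicode_escape").encode("latin-1")
```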

Python read file - pass variable name [duplicate]

This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
Closed 7 years ago.
Very simple question for Python 2:
I am calling a specific library function, passing a filename with a read-only flag:
myfunction(r'/tmp/file.txt')
I wanted to replace it with variable:
filename = '/tmp/file.txt'
myfunction(r????)
How can I call that function?
That is not a read-only flag; the r prefix marks a raw string. You use it when you don't want escape sequences inside the string (like \n, \t, etc.) to be interpreted. See https://docs.python.org/2.0/ref/strings.html
Your string doesn't need it, since it contains no escape sequences, so you can just omit the r. If you do want it, put it on the literal that defines the variable:
filename = r'/tmp/file.txt'
myfunction(filename)
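You can also write
myfunction(r''+filename)
if you don't want the r prefix in your variable, but it changes nothing: r'' is just an empty string, so r''+filename is simply filename. Prefixes such as r'', b'' and u'' only affect how a literal is parsed, not a string already stored in a variable.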
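A short sketch (Python 3 syntax) showing that the `r` prefix only affects how a literal is parsed: it changes the length of `'\n'` versus `r'\n'`, but for a path with no backslashes both spellings are identical, and adding `r''` to a variable is a no-op:

```python
# In a normal literal, \n is one newline character; in a raw literal it is
# two characters: a backslash and the letter n.
normal = '\n'
raw = r'\n'
print(len(normal), len(raw))  # 1 2

filename = '/tmp/file.txt'
# No escape sequences here, so the raw and normal literals are equal...
print(filename == r'/tmp/file.txt')  # True
# ...and r'' is just an empty string, so concatenating it changes nothing.
print(r'' + filename == filename)  # True
```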

Conversion of strings like \\uXXXX in python [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
I receive a string like this from a third-party service:
>>> s
'\\u0e4f\\u032f\\u0361\\u0e4f'
I know that this string actually contains sequences of a single backslash, lowercase u etc. How can I convert the string such that the '\\u0e4f' is replaced by '\u0e4f' (i.e. '๏'), etc.? The result for this example input should be '๏̯͡๏'.
In 2.x:
>>> u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
u'\u0e4f\u032f\u0361\u0e4f'
>>> print u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
๏̯͡๏
The codecs documentation lists the encodings supported by the .encode() and .decode() methods; the special Python-specific encodings in the second table include unicode_escape.
Python3:
bytes("\\u0e4f\\u032f\\u0361\\u0e4f", "ascii").decode("unicode-escape")
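One caveat with `unicode-escape`: it treats every non-escaped byte as Latin-1, so non-ASCII characters already in the string get mangled (for example, `'café'.encode('utf-8').decode('unicode-escape')` comes out as `'cafÃ©'`). If the input may mix real non-ASCII text with `\uXXXX` escapes, a regex that replaces only those escapes is a safer sketch. It handles only 4-digit `\u` escapes (not `\U`, `\x`, or surrogate pairs), and `decode_u_escapes` is a hypothetical helper name:

```python
import re

# Matches a literal backslash, the letter u, and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_u_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


s = '\\u0e4f\\u032f\\u0361\\u0e4f'
result = decode_u_escapes(s)  # '๏̯͡๏'

# Existing non-ASCII text is left untouched.
mixed = decode_u_escapes('café \\u0e4f')  # 'café ๏'
```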
