Converting \x escaped string to UTF-8 [duplicate] - python

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 9 years ago.
How can I convert a string that looks like '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82' to something readable?

In Python 2.7:
>>> print '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
привет
>>> print '\\xd0\\xbf\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'.decode('string-escape')
привет
>>> print r'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'.decode('string-escape')
привет
In Python 3.x:
>>> br'\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'.decode('unicode-escape').encode('latin1').decode('utf-8')
'привет'

For reading a file in Python 2, you can use codecs.open() instead of open():
import codecs
with codecs.open('filename', 'r', 'string-escape') as f:
    data = f.read()
The escapes are decoded as the file is read. Note that the 'string-escape' codec exists only in Python 2.
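Since 'string-escape' is not available in Python 3, a file can instead be read in binary mode and passed through the same three-step chain shown above for Python 3. The following is a minimal sketch, assuming the file contains only ASCII text with \xNN escapes; the helper names decode_escaped_bytes and read_escaped_file are made up for illustration.

```python
def decode_escaped_bytes(raw: bytes) -> str:
    r"""Turn ASCII bytes holding literal \xNN escapes into the UTF-8 text they spell."""
    # Step 1: unicode_escape maps each \xNN escape to the code point U+00NN.
    as_code_points = raw.decode("unicode_escape")
    # Step 2: Latin-1 maps U+0000..U+00FF back to the single bytes 0x00..0xFF.
    real_bytes = as_code_points.encode("latin-1")
    # Step 3: those bytes are the actual UTF-8 encoding of the text.
    return real_bytes.decode("utf-8")


def read_escaped_file(path: str) -> str:
    """Read a file in binary mode and decode its escapes (hypothetical helper)."""
    with open(path, "rb") as f:
        return decode_escaped_bytes(f.read())


greeting = decode_escaped_bytes(rb"\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82")
```

Here greeting holds 'привет'. The Latin-1 step raises UnicodeEncodeError if the input also contains \uXXXX escapes above U+00FF, so this sketch only fits input made of \xNN byte escapes.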

Related

How Python decodes UTF8 Encoding in String Format [duplicate]

This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 1 year ago.
I have a string containing UTF-8 bytes written as octal escapes:
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
I need to decode it, but so far I only know how to do it this way:
result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')
This is not safe, so is there a better way?
Use ast.literal_eval() instead; it is safe because it only evaluates Python literals.
You also don't need to call bytes(), since literal_eval() accepts a string and returns a bytes object here.
import ast
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
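This might be what you are hoping to get. It avoids eval entirely by going through the unicode-escape codec and then re-encoding as Latin-1 to recover the raw bytes:
'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('ascii').decode('unicode-escape').encode('latin1').decode('utf-8')
Without the final .encode('latin1').decode('utf-8') step, the result is mojibake (a string starting with 'æ') rather than the intended text.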
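Note that decoded_string = s.decode("utf8") does not work here: in Python 3 a str has no .decode() method, and a plain UTF-8 decode would not interpret the backslash escapes anyway.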
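To compare the approaches discussed above, here is a small sketch that decodes the octal-escaped string both with ast.literal_eval() and with a codec chain, without calling eval(). The function name decode_octal_escapes is an illustrative assumption, and the sketch assumes the input is pure ASCII and contains no single quotes (which would break the generated bytes literal).

```python
import ast


def decode_octal_escapes(s: str) -> str:
    """Decode a str of literal octal escapes (e.g. \\346) that spell UTF-8 bytes."""
    # unicode_escape turns each \ooo escape into a code point below U+0100;
    # Latin-1 then converts those code points back into single raw bytes.
    raw = s.encode("ascii").decode("unicode_escape").encode("latin-1")
    return raw.decode("utf-8")


s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'

via_codec = decode_octal_escapes(s)
# literal_eval only parses literals, so no arbitrary code can run.
via_literal_eval = ast.literal_eval(f"b'{s}'").decode("utf-8")
```

Both variables end up holding the same three-character string, '李海玉'.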

simple way of converting ASCII into HEX with no change in value [duplicate]

This question already has answers here:
How to convert hexadecimal string to bytes in Python?
(7 answers)
Closed 3 years ago.
What is the easiest way of converting an ASCII string into hex without changing the value?
What I have:
string = "00FA0086"
What I want:
hex_string = b'\x00\xFA\x00\x86'
Is there a simple way, or do I need to write a function?
You are looking for the binascii module from the Python standard library:
import binascii
string = "00FA0086"
print(repr(binascii.a2b_hex(string)))
gives:
b'\x00\xfa\x00\x86'
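As an alternative that needs no import, Python 3 also provides the built-in bytes.fromhex() class method, and bytes.hex() for the reverse direction. A short sketch, using the same input string as above:

```python
string = "00FA0086"

# Parse pairs of hex digits into raw bytes.
raw = bytes.fromhex(string)
print(repr(raw))

# Going back: bytes.hex() yields lowercase digits, so upper() restores the input.
round_trip = raw.hex().upper()
print(round_trip)
```

Like binascii.a2b_hex(), bytes.fromhex() raises ValueError when the string has an odd number of hex digits or contains non-hex characters.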

Changing string to ascii in python [duplicate]

This question already has answers here:
Convert a Unicode string to a string in Python (containing extra symbols)
(12 answers)
Closed 3 years ago.
I need to convert the word
name = 'Łódź'
to ASCII characters:
output: 'Lodz'
I can't import any library like unicodedata.
I need to do it in pure Python.
I've tried to encode and then decode, and nothing worked.
Well, a simple method would be to map and replace. This also does not require any special imports.
name = 'Łódź'
name = name.replace('Ł', 'L')
name = name.replace('ó', 'o')
name = name.replace('ź', 'z')
print(name)
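The same map-and-replace idea can be written as a single pass with the built-in str.maketrans() and str.translate(), which also need no imports. This is only a sketch: the table below covers the Polish diacritics as an assumed example, and it expects precomposed characters (a letter followed by a separate combining accent would not match).

```python
# Hypothetical mapping table; extend it with whatever characters you need.
POLISH_TO_ASCII = str.maketrans({
    'Ą': 'A', 'Ć': 'C', 'Ę': 'E', 'Ł': 'L', 'Ń': 'N',
    'Ó': 'O', 'Ś': 'S', 'Ź': 'Z', 'Ż': 'Z',
    'ą': 'a', 'ć': 'c', 'ę': 'e', 'ł': 'l', 'ń': 'n',
    'ó': 'o', 'ś': 's', 'ź': 'z', 'ż': 'z',
})


def to_ascii(text: str) -> str:
    """Replace every character found in the table; leave all others untouched."""
    return text.translate(POLISH_TO_ASCII)


name = to_ascii('Łódź')
print(name)
```

Characters that are missing from the table pass through unchanged, so the output is only guaranteed to be ASCII for input the table fully covers.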

Convert str with percents (url) to usual str [duplicate]

This question already has answers here:
Decode escaped characters in URL
(5 answers)
Closed 8 years ago.
I have strings like C%2B%2B_name.zip which are supposed to be URL-encoded. How do I convert them to C++_name.zip?
I'm using Python 3.x.
For Python 3, you will need to use:
import urllib.parse
urllib.parse.unquote('C%2B%2B_name.zip')
See the documentation for urllib.parse.unquote.
In Python 2, all you need is the urllib module:
import urllib
print urllib.unquote('C%2B%2B_name.zip')
If your names contain non-ASCII characters, you can add .decode('utf8') to the result.
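For Python 3, a short sketch of the same operation follows. There, unquote() decodes percent-escapes as UTF-8 by default, so no separate .decode('utf8') step is needed for non-ASCII names. The second and third inputs are made-up examples.

```python
from urllib.parse import unquote, unquote_plus

# Percent-escapes are replaced by the characters they encode.
filename = unquote('C%2B%2B_name.zip')
print(filename)

# Non-ASCII names: the escaped bytes are interpreted as UTF-8 by default.
cyrillic = unquote('%D0%BF%D1%80%D0%B8%D0%B2%D0%B5%D1%82')

# unquote_plus() additionally turns '+' into a space, as used in query strings.
query_value = unquote_plus('my+file%2B1.zip')
print(query_value)
```

Plain unquote() leaves a literal '+' untouched, which is usually what you want for path components such as file names.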

Conversion of strings like \\uXXXX in python [duplicate]

This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
I receive a string like this from a third-party service:
>>> s
'\\u0e4f\\u032f\\u0361\\u0e4f'
I know that this string actually contains sequences of a single backslash, lowercase u etc. How can I convert the string such that the '\\u0e4f' is replaced by '\u0e4f' (i.e. '๏'), etc.? The result for this example input should be '๏̯͡๏'.
In Python 2.x:
>>> u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
u'\u0e4f\u032f\u0361\u0e4f'
>>> print u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
๏̯͡๏
The Python documentation has a list of the encodings supported by the .encode() and .decode() methods. The Python-specific ones in the second table include unicode_escape.
In Python 3:
bytes("\\u0e4f\\u032f\\u0361\\u0e4f", "ascii").decode("unicode-escape")
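The Python 3 one-liner above raises UnicodeEncodeError if the string also contains real non-ASCII characters next to the escapes, because of the "ascii" encoding step. The sketch below shows the codec approach alongside a regex-based alternative that tolerates mixed input. The function names are illustrative assumptions, and the regex version handles only \uXXXX escapes (no \n, \x, or \U escapes, and no joining of surrogate pairs).

```python
import re

_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_with_codec(s: str) -> str:
    """Decode all Python-style escapes; the input must be pure ASCII."""
    return s.encode("ascii").decode("unicode_escape")


def unescape_u_only(s: str) -> str:
    """Replace each literal \\uXXXX with its character; other text is untouched."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


s = '\\u0e4f\\u032f\\u0361\\u0e4f'
decoded = unescape_with_codec(s)

# Mixed input: real non-ASCII text plus an escape; only the regex version copes.
mixed = unescape_u_only('Łódź \\u0e4f')
```

For the example string from the question, both functions return the same four code points.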
