This question already has answers here:
Convert a Unicode string to a string in Python (containing extra symbols)
(12 answers)
Closed 3 years ago.
I need to convert the word
name = 'Łódź'
to ASCII characters:
output: 'Lodz'
I can't import any library such as unicodedata.
I need to do it in pure Python.
I've tried to encode and then decode, but nothing worked.
A simple method is to replace each accented character with its ASCII counterpart. This does not require any imports.
name = 'Łódź'
name = name.replace('Ł', 'L')
name = name.replace('ó', 'o')
name = name.replace('ź', 'z')
print(name)
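The same idea can also be written with a single translation table instead of chained replace() calls. The sketch below is one possible way to do it without any imports; the names POLISH_TO_ASCII and to_ascii are made up for illustration, and the table only covers Polish letters, so it would need extending for other alphabets.

```python
# Hypothetical mapping table: covers only the Polish diacritics.
POLISH_TO_ASCII = str.maketrans({
    "ą": "a", "ć": "c", "ę": "e", "ł": "l", "ń": "n",
    "ó": "o", "ś": "s", "ź": "z", "ż": "z",
    "Ą": "A", "Ć": "C", "Ę": "E", "Ł": "L", "Ń": "N",
    "Ó": "O", "Ś": "S", "Ź": "Z", "Ż": "Z",
})


def to_ascii(text):
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(POLISH_TO_ASCII)


print(to_ascii("Łódź"))  # Lodz
```

str.translate() makes one pass over the string, so adding more characters to the table does not add more passes the way extra replace() calls would.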
This question already has answers here:
Convert "\x" escaped string into readable string in python
(4 answers)
Closed 1 year ago.
Now there is a string of utf-8:
s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
I need to decode it, but now I only do it in this way:
result = eval(bytes(f"b'{s}'", encoding="utf8")).decode('utf-8')
This is not safe, so is there a better way?
Use ast.literal_eval(). Unlike eval(), it only evaluates literals, so it cannot execute arbitrary code.
You also don't need to call bytes(), since it already returns a bytes object.
import ast
result = ast.literal_eval(f"b'{s}'").decode('utf-8')
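This might be what you are hoping to get. Note that unicode-escape alone turns each octal escape into a Latin-1 character, so the result has to be re-encoded as latin-1 and decoded as UTF-8:
'\\346\\235\\216\\346\\265\\267\\347\\216\\211'.encode('ascii').decode('unicode-escape').encode('latin-1').decode('utf-8')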
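If s were a bytes object, you could do decoded_string = s.decode("utf8"). A Python 3 str has no .decode() method, though, and this call would not interpret the escaped octal sequences anyway.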
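For input that contains only backslash-octal escapes, another option is to convert the escapes to bytes explicitly, with no eval-style parsing at all. This is a minimal sketch under that assumption; the helper name decode_octal_escapes is hypothetical, and other escape forms such as \x41 or \n are deliberately left untouched.

```python
import re

# Matches a backslash followed by an octal value that fits in one byte (0-377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2}|[0-7]{1,2})")


def decode_octal_escapes(text, encoding="utf-8"):
    """Turn literal \\NNN octal escapes into bytes, then decode the result."""
    raw = _OCTAL_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 8)]),  # octal digits -> single byte
        text.encode(encoding),
    )
    return raw.decode(encoding)


s = '\\346\\235\\216\\346\\265\\267\\347\\216\\211'
print(decode_octal_escapes(s))  # 李海玉
```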
This question already has answers here:
What is the best way to remove accents (normalize) in a Python unicode string?
(13 answers)
Closed 2 years ago.
I am trying to write a script that will replace all non-English letters in file names with their English counterparts. Is this possible?
If you mean to "deburr" strings, there's a nice, fairly simple recipe for it (for many accented characters, anyway): apply the Unicode NFKD normalization form, then strip everything non-ASCII out of the result:
>>> import unicodedata
>>> unicodedata.normalize("NFKD", "törkylempijävongahdus").encode("ascii", "ignore").decode()
'torkylempijavongahdus'
For more complex use cases, maybe https://pypi.org/project/transliterate/ is your thing.
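The NFKD recipe above can be wrapped in a small helper. This is only a sketch, and the function name deburr is an assumption. It also illustrates the "many accented characters" caveat: letters with no Unicode decomposition, such as Ł, are dropped rather than converted.

```python
import unicodedata


def deburr(text):
    """Strip accents by decomposing characters and discarding non-ASCII."""
    # NFKD splits 'ö' into 'o' plus a combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" drops the combining marks
    # and any character that has no ASCII equivalent.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(deburr("törkylempijävongahdus"))  # torkylempijavongahdus
print(deburr("Łódź"))                   # odz  ('Ł' has no decomposition)
```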
This question already has answers here:
How do I escape curly-brace ({}) characters in a string while using .format (or an f-string)?
(23 answers)
Closed 2 years ago.
I have multiple files of the format myfilexyz-200407171758.tar.gz
(myfilexyz)-(200407171758).tar.gz
Group1 is a variable.
Group2 can be of 12 to 14 digits.
Using variable substitution, I can get this working
r = re.compile('(%s)-(\d){12,14}.tar.gz' % myvar)
But if I were to try the newer format method, I get into trouble
r = re.compile('({})-(\d){12,14}.tar.gz'.format(myvar))
key '12,14' has no corresponding arguments
Obviously the {12,14} is messing up format. Is there a way around this problem and still use the format method for substitution?
From the documentation:
If you need to include a bracing character in the literal text, it can be escaped by doubling:
{{ and }}.
Use
'({})-(\d){{12,14}}.tar.gz'.format(myvar)
Also, format is the older way of doing it. Use an f-string:
f'({myvar})-(\d){{12,14}}.tar.gz'
Why not concatenate directly? Without format there is nothing to escape, so the braces stay single:
'(' + myvar + ')-(\d){12,14}.tar.gz'
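Putting the brace-doubling advice together, a fuller pattern might look like the sketch below. It goes beyond the answers in a few ways that are assumptions on my part: it uses a raw f-string, passes the variable through re.escape(), escapes the literal dots, and moves the quantifier inside the second group so that the group captures the whole timestamp. The helper name build_archive_pattern is hypothetical.

```python
import re


def build_archive_pattern(prefix):
    """Compile a pattern for names like '<prefix>-<12 to 14 digits>.tar.gz'."""
    # {{12,14}} is the doubled-brace escape; it becomes {12,14} in the regex.
    return re.compile(rf"({re.escape(prefix)})-(\d{{12,14}})\.tar\.gz")


pattern = build_archive_pattern("myfilexyz")
match = pattern.fullmatch("myfilexyz-200407171758.tar.gz")
print(match.group(1), match.group(2))  # myfilexyz 200407171758
```

With the original (\d){12,14} form, the second group would hold only the last digit matched, which is why the quantifier is placed inside the parentheses here.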
This question already has answers here:
Decode escaped characters in URL
(5 answers)
Closed 8 years ago.
I have strings like C%2B%2B_name.zip which are supposed to be URL-encoded. How can I convert them to C++_name.zip?
Python 3.x.
For Python 3, you will need to use:
urllib.parse.unquote('C%2B%2B_name.zip')
See urllib.parse.unquote.
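A short, self-contained illustration of the call above, using only the standard library. The second and third inputs are made-up examples: one shows that percent-encoded UTF-8 is decoded by default, the other shows unquote_plus(), which additionally treats '+' as a space (as in form data).

```python
from urllib.parse import unquote, unquote_plus

# Percent-escapes are decoded; %2B is a literal plus sign.
print(unquote("C%2B%2B_name.zip"))     # C++_name.zip

# Multi-byte UTF-8 sequences are decoded as well (encoding defaults to utf-8).
print(unquote("%C5%81%C3%B3d%C5%BA"))  # Łódź

# unquote_plus also converts '+' to a space, which unquote does not.
print(unquote_plus("a+b%2Bc"))         # a b+c
```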
In Python 2, all you need is the urllib module (in Python 3 the function lives in urllib.parse instead):
import urllib
print urllib.unquote('C%2B%2B_name.zip')
If the names contain non-ASCII characters, you can add .decode('utf8') to the result.
This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 7 months ago.
I receive a string like this from a third-party service:
>>> s
'\\u0e4f\\u032f\\u0361\\u0e4f'
I know that this string actually contains sequences of a single backslash, lowercase u etc. How can I convert the string such that the '\\u0e4f' is replaced by '\u0e4f' (i.e. '๏'), etc.? The result for this example input should be '๏̯͡๏'.
In 2.x:
>>> u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
u'\u0e4f\u032f\u0361\u0e4f'
>>> print u'\\u0e4f\\u032f\\u0361\\u0e4f'.decode('unicode-escape')
๏̯͡๏
There's an interesting list of encodings supported by the .encode() and .decode() methods. The special ones in the second table include unicode_escape.
In 3.x:
bytes("\\u0e4f\\u032f\\u0361\\u0e4f", "ascii").decode("unicode-escape")
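The Python 3 one-liner can be checked with a small runnable sketch. The helper name decode_unicode_escapes is hypothetical. Because the string is first encoded as ASCII, this assumes the input contains only ASCII characters; anything else raises UnicodeEncodeError.

```python
def decode_unicode_escapes(text):
    """Interpret literal backslash-u escapes in an ASCII-only string."""
    return text.encode("ascii").decode("unicode_escape")


s = "\\u0e4f\\u032f\\u0361\\u0e4f"
decoded = decode_unicode_escapes(s)

# 24 literal characters collapse into 4 code points.
print(len(s), len(decoded))  # 24 4
print(decoded)
```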