This question already has answers here:
Python unicode codepoint to unicode character
(4 answers)
Closed 3 months ago.
I have a simple syntax-related question that I would be grateful if someone could answer. I currently have character labels in string format, for example '0941'.
To print out unicode characters in Python, I can just use the command:
print(u'\u0941')
Now, my question is how can I convert the label I have ('0941') into the unicode readable format (u'\u0941')?
Thank you so much!
>>> chr(int('0941',16)) == '\u0941'
True
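As a minimal sketch of the same idea, the conversion can be wrapped in a small helper. The name `label_to_char` and the sample labels are made up for illustration; the only assumption is that each label is a hexadecimal code point written without any prefix.

```python
def label_to_char(label):
    """Convert a hexadecimal code point label such as '0941' to its character."""
    # int(label, 16) parses the hex digits; chr() maps the code point to a str.
    return chr(int(label, 16))


# Hypothetical sample labels: a Devanagari vowel sign, a Greek delta, a Latin A.
labels = ['0941', '03B4', '0041']
chars = [label_to_char(label) for label in labels]
print(chars)
```

In Python 3, `chr` accepts any code point up to 0x10FFFF, so the same helper should also work for labels longer than four hex digits.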
One way to accomplish this without fussing with your numeric keypad is to simply print the character and then copy/paste it as a label.
>>> print("lower case delta: \u03B4")
lower case delta: δ
>>> δ = 42 # copy the lower case delta symbol and paste it to use it as a label
>>> δδ = δ ** 2 # paste it twice to define another label.
>>> δ # at this point, they are just normal labels...
42
>>> δδ
1764
>>> δabc = 737 # using paste, it's just another character in a label
>>> δ123 = 456
>>> δabc, δ123 # exactly like any other alpha character.
(737, 456)
This question already has answers here:
What is the best way to remove accents (normalize) in a Python unicode string?
(13 answers)
Closed 2 years ago.
Normally I use unicodedata to normalize other Latin-ish text. However, I've come across this and am not sure what to do:
s = 'Nguyễn Văn Trỗi'
>>> unicodedata.normalize('NFD', s)
'Nguyễn Văn Trỗi'
Is there another module that can normalize more accents than unicodedata ? The output I want is:
Nguyen Van Troi
normalize doesn't mean "remove accents". It is converting between composed and decomposed forms:
>>> import unicodedata as ud
>>> a = 'ă'
>>> print(ascii(ud.normalize('NFD',a))) # LATIN SMALL LETTER A + COMBINING BREVE
'a\u0306'
>>> print(ascii(ud.normalize('NFC',a))) # LATIN SMALL LETTER A WITH BREVE
'\u0103'
One way to remove them is to then encode the decomposed form as ASCII ignoring errors, which works because combining characters are not ASCII. Note, however, that not all international characters have decomposed forms, such as đ.
>>> s = 'Nguyễn Văn Trỗi'
>>> ud.normalize('NFD',s).encode('ascii',errors='ignore').decode('ascii')
'Nguyen Van Troi'
>>> s = 'Ngô Đình Diệm'
>>> ud.normalize('NFD',s).encode('ascii',errors='ignore').decode('ascii')
'Ngo inh Diem' # error
You can work around the exceptions with a translation table:
>>> table = {ord('Đ'):'D',ord('đ'):'d'}
>>> ud.normalize('NFD',s).translate(table).encode('ascii',errors='ignore').decode('ascii')
'Ngo Dinh Diem'
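A variation on the approach above, sketched here under some assumptions: instead of encoding to ASCII and ignoring errors (which silently drops every non-ASCII character), it removes only the combining marks reported by `unicodedata.combining` and then applies the translation table. The names `strip_accents` and `EXTRA` are hypothetical, and the table covers only the two letters mentioned above; other letters without a decomposition would need their own entries.

```python
import unicodedata

# Hypothetical table for letters that have no canonical decomposition.
# U+0110 is LATIN CAPITAL LETTER D WITH STROKE, U+0111 is the small form.
EXTRA = {ord('\u0110'): 'D', ord('\u0111'): 'd'}


def strip_accents(text):
    """Remove combining marks after NFD decomposition, then map leftovers."""
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns a non-zero class for combining marks.
    without_marks = ''.join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_marks.translate(EXTRA)


print(strip_accents('Nguyễn Văn Trỗi'))
print(strip_accents('Ngô Đình Diệm'))
```

Unlike the encode/decode version, this keeps non-Latin text (for example Greek or Cyrillic letters) in place rather than deleting it, which may or may not be what you want.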
This question already has answers here:
Why do backslashes appear twice?
(2 answers)
Closed 4 years ago.
How can I represent a string that contains a backslash followed by a double quote (\") inside it?
I tried several ways:
date = 'xpto\"xpto'
'xpto"xpto'
date = 'xpto\\"xpto'
'xpto\\"xpto'
data='xpto\\' + '"xpto'
'xpto\\"xpto'
data= r'xpto\"xpto'
'xpto\\"xpto'
I need the string to be exactly like this:
'xpto\"xpto'
If someone knows how, I would really appreciate the help.
The following line works.
print(r"'xpto\"xpto'")
Output:
'xpto\"xpto'
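We add the r prefix to mark the literal as a raw string, so the backslash is kept as-is instead of starting an escape sequence.
Alternatively:
print("'xpto\\\"xpto'"), where \\ is an escaped backslash (a single \) and \" is an escaped double quote (a single ").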
"'xpto\\\"xpto'" is correct. Part of the confusion is distinguishing the actual string from Python's textual representation of the string.
>>> date = "'xpto\\\"xpto'"
>>> date
'\'xpto\\"xpto\''
>>> print(date)
'xpto\"xpto'
A simpler solution (which came to mind after reading Elvir's answer) is to use a triple-quoted raw string:
date = r"""'xpto\"xpto'"""
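To make the difference between a string's contents and its representation concrete, here is a small sketch. The variable names `s` and `t` are arbitrary; the point is that the raw literal and the doubled-backslash literal produce the same ten-character string, and only `repr` shows the backslash doubled.

```python
s = r'xpto\"xpto'   # raw literal: the backslash is kept as-is
t = 'xpto\\"xpto'   # regular literal: \\ stands for one backslash

print(s == t)   # True: both literals describe the same string
print(len(s))   # 10: four letters, one backslash, one quote, four letters
print(repr(s))  # the representation, with the backslash shown escaped
print(s)        # xpto\"xpto: the actual characters
```

So the attempts in the question that echoed `'xpto\\"xpto'` at the interactive prompt had most likely already produced the desired string; the prompt was displaying its `repr`.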
This question already has answers here:
Python string.strip stripping too many characters [duplicate]
(3 answers)
Closed 6 years ago.
I have encountered a very odd behavior of built-in function lstrip.
I will explain with a few examples:
print 'BT_NAME_PREFIX=MUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=NUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=PUV'.lstrip('BT_NAME_PREFIX=') # UV
print 'BT_NAME_PREFIX=SUV'.lstrip('BT_NAME_PREFIX=') # SUV
print 'BT_NAME_PREFIX=mUV'.lstrip('BT_NAME_PREFIX=') # mUV
As you can see, the function trims one additional character sometimes.
I tried to model the problem, and noticed that it persisted if I:
Changed BT_NAME_PREFIX to BT_NAME_PREFIY
Changed BT_NAME_PREFIX to BT_NAME_PREFIZ
Changed BT_NAME_PREFIX to BT_NAME_PREF
Further attempts have made it even more weird:
print 'BT_NAME=MUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=NUV'.lstrip('BT_NAME=') # UV
print 'BT_NAME=PUV'.lstrip('BT_NAME=') # PUV - different than before!!!
print 'BT_NAME=SUV'.lstrip('BT_NAME=') # SUV
print 'BT_NAME=mUV'.lstrip('BT_NAME=') # mUV
Could someone please explain what on earth is going on here?
I know I might as well just use array-slicing, but I would still like to understand this.
Thanks
You're misunderstanding how lstrip works. It treats the characters you pass in as a bag and it strips characters that are in the bag until it finds a character that isn't in the bag.
Consider:
'abc'.lstrip('ba') # 'c'
It is not removing a substring from the start of the string. To do that, you need something like:
if s.startswith(prefix):
    s = s[len(prefix):]
e.g.:
>>> s = 'foobar'
>>> prefix = 'foo'
>>> if s.startswith(prefix):
...     s = s[len(prefix):]
...
>>> s
'bar'
Or, I suppose you could use a regular expression:
>>> s = 'foobar'
>>> import re
>>> re.sub('^foo', '', s)
'bar'
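Applied to the strings from the question, the startswith approach might look like the sketch below. The helper name `remove_prefix` is hypothetical. On Python 3.9 and later the built-in `str.removeprefix` does the same job.

```python
def remove_prefix(s, prefix):
    """Return s without the leading prefix, or s unchanged if it is absent."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


for value in ['MUV', 'NUV', 'PUV', 'SUV', 'mUV']:
    line = 'BT_NAME_PREFIX=' + value
    # Each value comes back intact, unlike with lstrip.
    print(remove_prefix(line, 'BT_NAME_PREFIX='))
```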
The argument given to lstrip is a set of characters to remove from the left of a string, on a character-by-character basis. The phrase as a whole is not considered, only the individual characters.
S.lstrip([chars]) -> string or unicode
Return a copy of the string S with leading whitespace removed. If
chars is given and not None, remove characters in chars instead. If
chars is unicode, S will be converted to unicode before stripping
You could solve this in a flexible way using regular expressions (the re module):
>>> import re
>>> re.sub('^BT_NAME_PREFIX=', '', 'BT_NAME_PREFIX=MUV')
'MUV'
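One caveat with the regular-expression route: if the prefix comes from a variable, it may contain characters that are special in a pattern (such as `.` or `+`). A hedged sketch that guards against this with `re.escape` follows; the helper name `strip_prefix_re` and the sample strings are made up for illustration.

```python
import re


def strip_prefix_re(s, prefix):
    """Remove a literal prefix using a regex anchored at the start of s."""
    # re.escape makes every character in prefix match literally.
    return re.sub('^' + re.escape(prefix), '', s)


print(strip_prefix_re('BT_NAME_PREFIX=MUV', 'BT_NAME_PREFIX='))
# With an unescaped pattern, '.' would match any character; escaped, it does not.
print(strip_prefix_re('axb=1', 'a.b='))
```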
This question already has answers here:
Python Trailing L Problem
(5 answers)
Closed 9 years ago.
I receive from a module a string that is a representation of a long int
>>> str = hex(5L)
>>> str
'0x5L'
What I now want is to convert the string str back to a number (integer)
int(str,16) does not work because of the L.
Is there a way to do this without stripping the last L out of the string? Because it is also possible that the string contains a hex without the L ?
Use str.rstrip; it works for both cases:
>>> int('0x5L'.rstrip('L'),16)
5
>>> int('0x5'.rstrip('L'),16)
5
Or generate the string this way:
>>> s = '{:#x}'.format(5L) # remove '#' if you don't want '0x'
>>> s
'0x5'
>>> int(s, 16)
5
You could even just use:
>>> str = hex(5L)
>>> long(str,16)
5L
>>> int(long(str,16))
5
>>>
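The transcripts above are Python 2. For code that runs on Python 3 but still has to read strings produced by Python 2's `hex()`, a small sketch of the rstrip approach is shown below. The helper name `parse_hex` is hypothetical, and it assumes the only unwanted suffix is a trailing `L`.

```python
def parse_hex(text):
    """Parse a hex string that may carry a Python 2 style trailing 'L'."""
    # rstrip('L') is a no-op when there is no trailing 'L';
    # int(..., 16) accepts an optional '0x' prefix.
    return int(text.rstrip('L'), 16)


print(parse_hex('0x5L'))
print(parse_hex('0x5'))
print(parse_hex('ff'))
```

Python 3 has a single `int` type and no `L` suffix, so `hex(5)` there returns `'0x5'` and the suffix only appears in data written by older code.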
This question already has answers here:
Process escape sequences in a string in Python
(8 answers)
Closed 3 years ago.
Given a variable which holds a string is there a quick way to cast that into another raw string variable?
The following code should illustrate what I'm after:
line1 = "hurr..\n..durr"
line2 = r"hurr..\n..durr"
print(line1 == line2) # outputs False
print(("%r"%line1)[1:-1] == line2) # outputs True
The closest I have found so far is the %r formatting flag which seems to return a raw string albeit within single quote marks. Is there any easier way to do this kind of thing?
Python 3:
"hurr..\n..durr".encode('unicode-escape').decode()
Python 2:
"hurr..\n..durr".encode('string-escape')
Yet another way:
>>> s = "hurr..\n..durr"
>>> print repr(s).strip("'")
hurr..\n..durr
Above it was shown how to encode.
'hurr..\n..durr'.encode('string-escape')
This way will decode.
r'hurr..\n..durr'.decode('string-escape')
Example:
In [12]: print 'hurr..\n..durr'.encode('string-escape')
hurr..\n..durr
In [13]: print r'hurr..\n..durr'.decode('string-escape')
hurr..
..durr
This allows one to "cast/transform raw strings" in both directions. A practical case is when the JSON contains a raw string and I want to print it nicely.
{
"Description": "Some lengthy description.\nParagraph 2.\nParagraph 3.",
...
}
I would do something like this.
print json.dumps(json_dict, indent=4).decode('string-escape')
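The 'string-escape' codec used above exists only in Python 2. A rough Python 3 sketch of the same round trip, using the 'unicode_escape' codec, is given below. The helper names `escape` and `unescape` are hypothetical, and `unescape` assumes ASCII-only input, since it encodes to ASCII before decoding the escapes.

```python
def escape(text):
    """Turn control characters such as a newline into backslash escapes."""
    return text.encode('unicode_escape').decode('ascii')


def unescape(text):
    """Interpret backslash escapes in text (assumes ASCII-only input)."""
    return text.encode('ascii').decode('unicode_escape')


line1 = "hurr..\n..durr"
line2 = r"hurr..\n..durr"

print(escape(line1) == line2)    # True
print(unescape(line2) == line1)  # True
```

For text that may contain non-ASCII characters, `unescape` as written raises `UnicodeEncodeError`, so a different strategy would be needed there.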