This question already has answers here:
Convert an int value to unicode
(4 answers)
Closed 5 years ago.
In Python I'm trying to print out the character corresponding to a given code like this:
a = 159
print unichr(a)
It prints out a strange symbol. Can anyone explain why?
#To get numerical value of string. This gives me 49.
ord("1")
#To get string from numerical value. This gives me "1".
chr(49)
It is possible that the numerical value you are trying to convert is the code of a special character, in which case Python may show it as its hex equivalent instead of a visible glyph. To see the hex value of an integer:
hex(ord("1"))
If that is not the case, it is possible that your terminal substituted some other symbol, since the character is (hypothetically) a special character with no glyph of its own.
The character at Unicode code point 159 (U+009F) is Application Program Command. It is a control character, not a graphic character, so it has no glyph of its own and what you see depends on how your terminal renders it.
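A minimal sketch of how to check this yourself, assuming Python 3 (where chr replaces the unichr used in the question). The variable names are only illustrative:

```python
import unicodedata

# chr() in Python 3 plays the role of unichr() in Python 2.
ch = chr(159)

# 'Cc' is the Unicode general category for control characters.
category = unicodedata.category(ch)

# repr() escapes non-printable characters instead of emitting them raw.
shown = repr(ch)

# Control characters are not considered printable.
printable = ch.isprintable()

print(category, shown, printable)
```

Printing repr(ch) rather than ch itself avoids sending the raw control character to the terminal.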
This question already has answers here:
Escaped Unicode to Emoji in Python
(1 answer)
How to encode Python 3 string using \u escape code?
(1 answer)
Closed 1 year ago.
I was looking at https://r12a.github.io/app-conversion/ and I see that they have a "JS/Java/C" section. I was wondering if anyone had the code for that in python. I can't seem to find it. Thanks!
Edit: code
b = '😀'
txt = b.encode('utf-8')
From How to work with surrogate pairs in Python? (linked from the duplicate, Escaped Unicode to Emoji in Python):
If you see a '\ud83d\ude4f' Python string (2 characters), then there is a bug upstream. Normally you shouldn't get such a string. If you do get one and you can't fix the upstream code that generates it, you can repair it using the surrogatepass error handler:
>>> "\uD83D\uDE00".encode('utf-16', 'surrogatepass').decode('utf-16')
'😀'
Original Answer
Perhaps you're looking for ord()?
Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, ord('a') returns the integer 97 and ord('€') (Euro sign) returns 8364. This is the inverse of chr().
>>> hex(ord("😀"))
'0x1f600'
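For the question's actual goal (JS/Java/C-style escapes), one possible approach is to encode the string as UTF-16 and format every 16-bit code unit as \uXXXX. This is only a sketch: to_utf16_escapes is a hypothetical helper name, not a standard library function, and it escapes every character, including plain ASCII.

```python
import struct

def to_utf16_escapes(text):
    """Return text as a sequence of \\uXXXX escapes, one per UTF-16 code unit."""
    # Big-endian UTF-16 without a BOM: two bytes per code unit.
    data = text.encode('utf-16-be')
    # Unpack the bytes into unsigned 16-bit integers.
    units = struct.unpack('>%dH' % (len(data) // 2), data)
    return ''.join('\\u%04X' % unit for unit in units)

escaped = to_utf16_escapes('😀')
print(escaped)
```

Characters outside the Basic Multilingual Plane, such as the emoji here, come out as a surrogate pair, which is the form the surrogatepass snippet above converts back.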
This question already has answers here:
How can I put an actual backslash in a string literal (not use it for an escape sequence)?
(4 answers)
Closed 2 years ago.
Say I assign variable
x = '\\dnassmb1\biloadfiles_dev\Workday'
print(x)
Output:
'\\dnassmb1\x08iloadfiles_dev\\Workday'
I would like to know why it's changing to "x08.." specifically and how to avoid that automatic change and use string as it is. Thank you!
You are doing it wrong. The backslash has a special meaning in Python string literals.
Backslashes are used to put special characters inside the string. For example, \b is the backspace character, which is displayed as \x08.
If you want to get the above string printed:
x = '\\\\dnassmb1\\biloadfiles_dev\\Workday'
print(x)
As you can see, I am using an extra backslash everywhere I want a backslash to be printed. This is because the first backslash indicates that whatever comes after it is just part of the string and has no special meaning.
Use raw strings:
x = r'\\dnassmb1\biloadfiles_dev\Workday'
This will prevent Python from treating your backslashes as the start of escape sequences. See string and byte literals in the Python documentation for a full treatment of string parsing.
It's important to pay close attention to the difference between representation and value here. Just because a string appears to have four backslashes in it, doesn't mean that those backslashes are in the value of the string. Consider:
>>> x = '\\dnassmb1\biloadfiles_dev\Workday' # regular string
>>> y = r'\\dnassmb1\biloadfiles_dev\Workday' # raw string
>>> print(x); print(y)
\dnassmbiloadfiles_dev\Workday
\\dnassmb1\biloadfiles_dev\Workday
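Here, x and y are both just strings once Python has parsed them. But even though the parts inside the quotes are the same, the characters stored in the strings are different. In x, the leading \\ collapsed to a single backslash and \b became a backspace character (\x08), which is why the printed output appears to have lost the 1. In y's case, you see exactly the number of backslashes you put in.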
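To make the difference between representation and value concrete, here is a small sketch with illustrative variable names. It compares a fully escaped literal, a raw literal, and a literal where \b is left unescaped:

```python
# Every backslash doubled: the value contains single backslashes.
escaped = '\\\\dnassmb1\\biloadfiles_dev\\Workday'

# Raw string: backslashes are taken literally.
raw = r'\\dnassmb1\biloadfiles_dev\Workday'

# Here \b is NOT escaped, so it becomes a backspace character (\x08).
mangled = '\\dnassmb1\biloadfiles_dev\\Workday'

print(escaped == raw)      # the two spellings produce the same value
print('\x08' in mangled)   # the backspace is really in the string
print(repr(mangled))       # repr shows it as \x08
```

The first two literals are different spellings of the same value; only the third one actually contains a backspace.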
This question already has answers here:
How to use digit separators for Python integer literals?
(4 answers)
Closed 2 years ago.
I seem to remember that there is a syntax in Python to directly input numbers with comma separators (1,000,000 instead of 1000000). Googling the issue only gives results on how to:
Print numbers with comma separators
import locale
locale.setlocale(locale.LC_ALL, 'en_US')
locale.format_string("%d", 1255000, grouping=True)
Remove commas from numbers to input into python
a = '1,000,000'
int(a.replace(',' , ''))
I don't want to do either and the plethora of results stops me from finding the information I need.
Instead of commas, Python allows the use of underscores.
See https://www.python.org/dev/peps/pep-0515/
# grouping decimal numbers by thousands
amount = 10_000_000.0
# grouping hexadecimal addresses by words
addr = 0xCAFE_F00D
# grouping bits into nibbles in a binary literal
flags = 0b_0011_1111_0100_1110
# same, for string conversions
flags = int('0b_1111_0000', 2)
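A short sketch, assuming Python 3.6 or later (where PEP 515 applies), showing that the underscores are purely visual and that the same separators are available when formatting. The variable names are illustrative:

```python
# Underscores in the literal do not change the value.
million = 1_000_000

# Format specifiers can add separators on output.
with_commas = format(million, ',')
with_underscores = format(million, '_')

# int() also accepts underscores in strings.
parsed = int('1_000_000')

print(million, with_commas, with_underscores, parsed)
```

Commas still cannot be used in the literal itself, since 1,000,000 is parsed as a tuple of three integers.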
This question already has answers here:
Removing u in list
(8 answers)
Closed 6 years ago.
I want to remove 'u' from every element in the list, can anybody help me?
[u'four', u'gag', u'prefix', u'woods']
The u prefix means these are Unicode strings (this is Python 2). To get plain byte strings, encode them.
Do this:
l = [u'four', u'gag', u'prefix', u'woods']
l2 = [i.encode('UTF-8') for i in l]
print l2
['four', 'gag', 'prefix', 'woods']
The u is a prefix that tells you what type of string it is; it is part of how the string is displayed, not part of its value. If it were a byte string, the prefix would be b. If you call type on these in Python 2, it returns unicode. Unicode is a superset of ASCII: the code points 0-127 are the same, but it can represent many more kinds of characters. Unicode text can be encoded as UTF-8, UTF-32 or other encodings, and non-ASCII characters generally take more than one byte.
A Unicode string should behave the same for 99% of the things that you want to do, but you can also change the encoding if you have a function that needs a very particular type of string.
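As a sketch of the point that the u is only part of the display, here is the same list on Python 3 (an assumption; the output in the question comes from Python 2). The u prefix is still accepted in literals there, but every str is Unicode and the prefix is never shown:

```python
words = [u'four', u'gag', u'prefix', u'woods']

# The prefix does not change the value of the string.
same_value = (words[0] == 'four')

# Printing the elements themselves, not the list's repr, never shows a prefix.
joined = ', '.join(words)

print(same_value)
print(joined)
print(words)  # on Python 3 the repr has no u prefix either
```

Joining or printing the individual elements also hides the prefix on Python 2, because the prefix only appears in the repr of each string.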
This question already has answers here:
What does a leading `\x` mean in a Python string `\xaa`
(2 answers)
Closed 7 years ago.
What is the difference between \x and \u escape sequences in Python? (Apart from the fact that \x uses the syntax \xXX and \u uses \uXXXX). print('\xa5') gives the output as '¥' in script mode and so does print('\u00a5'), so how is one different from the other, apart from the syntax used?
The most important difference is that \uXXXX takes four hexadecimal digits and can therefore express code points up to U+FFFF, including special characters that are not in ASCII or your current code page. For that reason it is only recognized in unicode strings:
u'\u0123'
The older \xXX can be used in both unicode strings and str strings, but only for code points up to 255:
u'\u0123\x20'
'\x20'
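A minimal sketch of the same comparison on Python 3 (an assumption; the answer above uses Python 2 terminology). In a Python 3 str, both escapes name a code point, so they are interchangeable up to 255. In a bytes literal, \x names a single byte instead:

```python
# In a str, \xa5 and \u00a5 both mean code point U+00A5 (the yen sign).
yen_x = '\xa5'
yen_u = '\u00a5'

# In a bytes literal, \xa5 is one raw byte, not a character.
raw_byte = b'\xa5'

# Encoding the character as UTF-8 takes two bytes.
utf8 = yen_u.encode('utf-8')

# Code points above 255 need \u (or \U); \x cannot reach them.
high = '\u0123'

print(yen_x == yen_u, raw_byte, utf8, hex(ord(high)))
```

So the two escapes differ only in range inside text strings, but in byte strings \x is the only one of the two that is available.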