How can I add a Unicode character to status text using Tweepy? - python

I want to update my status with the Chinese character 我, whose code point is U+6211. I tried the same approach I use for emoji in a status ("\U0006211"), but it didn't work. Is it possible to post text that is not English?
The error that I got:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-8: truncated \UXXXXXXXX escape

When I use print("\u6211") it prints the correct character, but it shows up as a "?", so I think it should work if you have the font.

The \U escape sequence [...] expects eight hex digits
https://docs.python.org/3/howto/unicode.html#unicode-literals-in-python-source-code
You're providing 7 instead of 8 digits. You can simply append an extra 0: "\U00006211"
Alternatively, you can use the \u escape sequence, which takes exactly four hex digits: "\u6211"
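As a quick check of the two fixes above, here is a minimal sketch showing that the eight-digit \U form, the four-digit \u form, and chr() all give the same one-character string. The name status_text is only an illustration of where the text would go; no Tweepy call is shown.

```python
# Three equivalent spellings of U+6211; each is the same one-character string.
long_form = "\U00006211"   # \U needs exactly eight hex digits
short_form = "\u6211"      # \u needs exactly four hex digits
from_int = chr(0x6211)     # build the character from its code point

status_text = long_form    # hypothetical variable holding the text to post
print(long_form == short_form == from_int)        # True
print(len(status_text), hex(ord(status_text)))    # 1 0x6211
```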

Related

Can't do ASCII with 'u' character in python

I'm trying to print an ASCII image in Python, but it gives me this error:
File "main.py", line 1
teste = print('''
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 375-376: truncated \UXXXXXXXX escape
I think it's because of the U character. Why does this happen, and is there any way to solve it?
You've got \U in your string, which is being interpreted as the beginning of a Unicode ordinal escape (it expects it to be followed by 8 hex characters representing a single Unicode ordinal).
You could double the escape, making it \\U, but that would make it harder to see the image in the code itself. The simplest approach is to make it a raw string that ignores all escapes save escapes applied to the quote character, by putting an r immediately before the literal:
teste = print(r'''
Note the r immediately after the (, before the '''.
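A minimal sketch of the raw-string fix, using a tiny made-up drawing as a stand-in because the original image is not shown in the question. It also shows that print() returns None, so assigning its result to a variable (as in teste = print(...)) stores nothing useful.

```python
# Hypothetical stand-in for the ASCII art; the real image is not shown in the question.
art = r'''
 \U/
 /_\
'''
result = print(art)   # print() writes the text and returns None
print(result)         # None
```

If the drawing is needed later, keeping it in a variable such as art and printing that variable is probably the more useful arrangement.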

DISCORD // 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

I'm trying to open discord with this script
import subprocess
subprocess.call(['C:\Users\xerxe\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Discord Inc\\Discord.exe'])
but I only get this error:
'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
The \ character is an escape character - \n means a newline character, \t is a tab character, etc. \U is used to denote the beginning of a Unicode escape sequence, like \U000145d3, where the 8 chars following \U are hex digits (0-9a-f). Since \Users\xer is not a valid Unicode escape sequence, you got an error. For Windows paths, you either need to escape the escape character:
subprocess.call(['C:\\Users\\xerxe\\AppData\\Roaming\\Microsoft\\Windows\\Start Menu\\Programs\\Discord Inc\\Discord.exe'])
use a raw string literal (note the r just before the opening '):
subprocess.call([r'C:\Users\xerxe\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Discord Inc\Discord.exe'])
or use / characters as path delimiters:
subprocess.call(['C:/Users/xerxe/AppData/Roaming/Microsoft/Windows/Start Menu/Programs/Discord Inc/Discord.exe'])
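To see that the three spellings describe the same path, here is a small sketch using a shortened version of the path from the question (the shortening is only for illustration). pathlib.PureWindowsPath is used for the comparison because it treats / and \ as the same separator and runs on any operating system.

```python
from pathlib import PureWindowsPath

escaped = 'C:\\Users\\xerxe\\AppData'   # each backslash doubled
raw = r'C:\Users\xerxe\AppData'         # raw string, backslashes kept as-is
forward = 'C:/Users/xerxe/AppData'      # forward slashes

print(escaped == raw)                                     # True
print(PureWindowsPath(forward) == PureWindowsPath(raw))   # True
```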

How to list Amharic (Unicode) code points in python 3.6

I want a list containing the Amharic alphabet as Unicode characters. The characters range from U+1200 to U+1399. I am using Windows 8. I encountered SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape.
I tried this:
[print(c) for c in u'U1399']
How can I list the characters?
To print the characters from U+1200 to U+1399, I would use a for loop with an int control variable. It's easy enough to convert numbers to characters using chr().
The integer value 0x1200 (i.e. 1200 in hexadecimal) can be converted to the Unicode code point U+1200 like so: chr(0x1200) == '\u1200'.
Similarly for 0x1201, 0x1202, ... 0x1399.
Note that we use .isprintable() to filter out some of the useless entries, such as code points in that range that have no character assigned.
print(' '.join(chr(x) for x in range(0x1200, 0x139A) if chr(x).isprintable()))
or
for x in range(0x1200, 0x139A):
    if chr(x).isprintable():
        print(hex(x), chr(x))
Note that the code samples require Python 3.
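If a list is wanted rather than printed output, the same loop can be wrapped in a function. This is only a sketch: printable_chars is a made-up helper name, and attaching the character names from the standard unicodedata module is an addition for readability, not part of the answer above.

```python
import unicodedata

def printable_chars(start, stop):
    """Return (code point, character, name) for printable code points in [start, stop)."""
    rows = []
    for cp in range(start, stop):
        ch = chr(cp)
        if ch.isprintable():
            # The second argument is a fallback for characters without a name.
            rows.append((cp, ch, unicodedata.name(ch, "<unnamed>")))
    return rows

rows = printable_chars(0x1200, 0x139A)
first_cp, first_char, first_name = rows[0]
print(f"U+{first_cp:04X}", first_name)
```

The characters alone are then available as [ch for _, ch, _ in rows].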
Your posted code doesn't produce any errors at all:
>>> [print(c) for c in u'U1399']
U
1
3
9
9
[None, None, None, None, None]
It also doesn't have any non-ASCII characters in it.
You probably wanted to use a Unicode backslash escape. And your problem is probably more like this:
>>> u'\U1399'
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: truncated \UXXXXXXXX escape
The reason is that—as the error message implies—a \U escape requires 8 hex digits, and you've only provided 4. So:
>>> u'\U00001399'
'᎙'
But there's a different escape sequence, \u (notice the lowercase u), which takes only 4 digits:
>>> u'\u1399'
'᎙'
If you're using Python 2.7, and possibly even with Python 3 on Windows, you may not see that nice output, but instead something with backslash escapes in it. But if you print that string, you will see the right character.
The full details for \U and \u escapes (and other escapes) are documented in String and Bytes literals (make sure to switch to the Python version you're actually using, because the details can be different, especially between 2.x and 3.x), but usually you don't need to know much more than explained above.

Python utf-8 character range

I work with a text file encoded in UTF-8 and read its contents with Python. After reading the content, I split the text into an array of characters.
import codecs
with codecs.open(fullpath,'r',encoding='utf8') as f:
    text = f.read()
# Split the 'text' to characters
Now I iterate over each character. First I convert it to its numeric value, then I run some code on it.
numerialValue = ord(char)
I have noticed that some of those characters are beyond the expected range.
Expected max value - FFFF.
Actual character value - 1D463.
I translated this code to Python. The original source code is C#, where the value '\u1D463' is an invalid character.
I'm confused.
It seems you escaped your Unicode code-point (U+1D463) with \u instead of \U. The former expects four hex digits, where the latter expects eight hex digits. According to Microsoft Visual Studio:
The condition was ch == '\u1D463'
When I used this literal in the Python interpreter, it didn't complain: it happily consumed the first four hex digits as the escape and printed the trailing 3 as an ordinary character when run in cmd:
>>> print('\u1D463')
ᵆ3
You got this exception: "Expected max value - FFFF. Actual character value - 1D463" because you're using the wrong Unicode escape; use \U0001D463 instead of \u1D463. The largest code point that \u can express is \uFFFF, and the largest that \U accepts is \U0010FFFF, the top of the Unicode range. Notice the leading zeros in \U0001D463: \U takes exactly eight hex digits and \u takes exactly four hex digits:
>>> '\U1D463'
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-6: truncated \UXXXXXXXX escape
>>> '\uFF'
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-3: truncated \uXXXX escape
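The difference between the two escapes can be checked directly. The sketch below compares the mistaken and corrected literals and adds a small helper, escape_for, which is a made-up name for illustration: it picks the four-digit or eight-digit form depending on the size of the code point.

```python
def escape_for(code_point):
    """Return the Python escape text for a code point: \\uXXXX or \\UXXXXXXXX."""
    if code_point <= 0xFFFF:
        return "\\u{:04X}".format(code_point)
    return "\\U{:08X}".format(code_point)

wrong = "\u1D463"       # parsed as U+1D46 followed by the digit "3"
right = "\U0001D463"    # a single character, U+1D463

print(len(wrong), len(right))    # 2 1
print(hex(ord(right)))           # 0x1d463
print(escape_for(ord(right)))    # \U0001D463
print(escape_for(0x1399))        # \u1399
```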

python join/format possible hex values for regex

I'd like to create a template string as possible values for an expression:
'\x1C,\x2C,\x3C,\x4C,\x5C,\x6C,\x7C,\x8C,\x9C,\xAC,\xBC,\xCC,\xDC,\xEC,\xFC'
in a manner like this:
from string import digits, ascii_uppercase
','.join(['\x'+i+'C' for i in digits+ascii_uppercase[:6]])
but unfortunately '\x' is not treated literally:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \xXX escape
Unlike, for example, double backslashes:
','.join(['\\x'+i+'C' for i in digits+ascii_uppercase[:6]])
'\\x0C,\\x1C,\\x2C,\\x3C,\\x4C,\\x5C,\\x6C,\\x7C,\\x8C,\\x9C,\\xAC,\\xBC,\\xCC,\\xDC,\\xEC,\\xFC'
Any ideas around this? Maybe another encoding?
Since you're dealing with characters, deal with characters.
','.join(chr(x) for x in range(0x1c, 0x100, 0x10))
\x starts an escape sequence, just like \n (newline), so Python tries to read the two hex digits that should follow it. To get a literal backslash you need to write \\, where the first \ escapes the second \.
However, the doubled \ only appears when you echo the string in the shell, which shows its repr. When you print the string, only one backslash is shown:
>>> text = '\\x0C,\\x1C,\\x2C,\\x3C,\\x4C,\\x5C,\\x6C,\\x7C,\\x8C,\\x9C,\\xAC,\\xBC,\\xCC,\\xDC,\\xEC,\\xFC'
>>> text
'\\x0C,\\x1C,\\x2C,\\x3C,\\x4C,\\x5C,\\x6C,\\x7C,\\x8C,\\x9C,\\xAC,\\xBC,\\xCC,\\xDC,\\xEC,\\xFC'
>>> print(text)
\x0C,\x1C,\x2C,\x3C,\x4C,\x5C,\x6C,\x7C,\x8C,\x9C,\xAC,\xBC,\xCC,\xDC,\xEC,\xFC
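To make the distinction concrete, the sketch below builds both forms: the text form (backslash, x, digit, C as four separate characters) and the character form (one character per value). Since the question mentions a regex, it also builds a character class from the characters with re.escape; that last step is an assumption about the intended use, not something the question states.

```python
import re
from string import digits, ascii_uppercase

hex_digits = digits + ascii_uppercase[:6]    # '0123456789ABCDEF'

# Text form: each item is the four characters backslash, x, digit, C.
as_text = ','.join('\\x' + d + 'C' for d in hex_digits)

# Character form: each item is one character whose code ends in hex C.
chars = [chr(int(d + 'C', 16)) for d in hex_digits]

print(as_text)                                   # \x0C,\x1C,...,\xFC
print(len(as_text.split(',')[0]), len(chars[0])) # 4 1

# Assumed use: a character class matching any one of those characters.
pattern = re.compile('[' + ''.join(re.escape(c) for c in chars) + ']')
print(bool(pattern.search('a|b')))               # True, because '|' is \x7C
```

Note that \x2C is the comma itself, so joining the real characters with ',' gives a string that cannot be split back on commas; keeping them in a list avoids that.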
