I've Been Trying To Create A System Which Turns A Table Of 1's And 0's To A Braille Character But It Keeps Giving Me This Error
File "brail.py", line 16
stringToWrite=u"\u"+brail([1,1,1,0,0,0,1,1])
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
My Current Code Is
def brail(brailList):
    if len(brailList) == 8:
        brailList.reverse()
        brailHelperList=[0x80,0x40,0x20,0x10,0x8,0x4,0x2,0x1]
        brailNum=0x0
        for num in range(len(brailList)):
            if brailList[num] == 1:
                brailNum+=brailHelperList[num]
        stringToReturn="28"+str(hex(brailNum))[2:len(str(hex(brailNum)))]
        return stringToReturn
    else:
        return "String Needs To Be 8 In Length"
fileWrite=open('Write.txt','w',encoding="utf-8")
stringToWrite=u"\u"+brail([1,1,1,0,0,0,1,1])
fileWrite.write(stringToWrite)
fileWrite.close()
It Works When I Do fileWrite.write(u"\u28c7") But When I Do A Function Which Should Return That Exact Same Thing It Errors.
\u is the unicode escape sequence for Python literal strings. A 4 hex digit unicode code point is expected to follow the escape sequence. It is a syntax error if the code point is missing or is too short.
>>> '\u28c7'
'⣇'
>>> '\u'
File "<stdin>", line 1
'\u'
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
If you are using Python 3 then the u string prefix is not required as strings are stored as unicode internally. The u prefix was maintained for compatibility with Python 2 code.
That's the cause of the exception. However, you don't need to construct the unicode code point like that. You can use the ord() and chr() functions:
from unicodedata import lookup
braille_start = ord(lookup('BRAILLE PATTERN BLANK'))
return chr(braille_start + brailNum)
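Putting that together, a minimal sketch of the whole conversion might look like the following. The names bits_to_braille and BRAILLE_BLANK are illustrative and do not come from the question. The bit order (first list element is the least significant bit) is inferred from the question's reverse-then-weight logic.

```python
from unicodedata import lookup

# Code point of the empty braille cell, U+2800.
BRAILLE_BLANK = ord(lookup('BRAILLE PATTERN BLANK'))


def bits_to_braille(bits):
    """Turn a list of eight 0/1 values into one braille character.

    Assumption: bits[0] is the least significant bit, which matches the
    question's code after its reverse() call.
    """
    if len(bits) != 8:
        raise ValueError("expected exactly 8 values")
    # Sum the weight 2**i for every position i that is set.
    offset = sum(1 << i for i, bit in enumerate(bits) if bit == 1)
    return chr(BRAILLE_BLANK + offset)


print(bits_to_braille([1, 1, 1, 0, 0, 0, 1, 1]))
```

With the list from the question this yields the same character as the literal '\u28c7', and no escape sequence has to be assembled at run time.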
You can rewrite
stringToWrite=u"\u"+brail([1,1,1,0,0,0,1,1])
as
stringToWrite="\\u{0}".format(brail([1, 1, 1, 0, 0, 0, 1, 1]))
All strings are unicode in Python 3, so you don't need the leading "u".
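One caveat, shown as a hedged illustration: the formatted string holds the six characters backslash, u, 2, 8, c, 7 rather than the braille character itself, so it still needs converting. The variable names below are made up for the example, and "28c7" stands in for the value the question's brail() call returns for that list.

```python
hex_digits = "28c7"  # stand-in for brail([1, 1, 1, 0, 0, 0, 1, 1])

# Formatting avoids the SyntaxError but produces plain text, not the character.
escaped = "\\u{0}".format(hex_digits)
print(len(escaped))  # 6

# Two ways to get the actual character from the hex digits.
via_chr = chr(int(hex_digits, 16))
via_codec = escaped.encode("ascii").decode("unicode_escape")
print(via_chr == via_codec == "\u28c7")  # True
```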
def braille(brailleString):
    brailleList = []
    brailleList[:0]=brailleString
    if len(brailleList) > 8:
        brailleList=brailleList[0:8]
    if len(brailleList) < 8:
        while len(brailleList) < 8:
            brailleList.append('0')
    brailleList1=[
        int(brailleList[0]),
        int(brailleList[1]),
        int(brailleList[2]),
        int(brailleList[4]),
        int(brailleList[5]),
        int(brailleList[6]),
        int(brailleList[3]),
        int(brailleList[7]),
    ]
    brailleList1.reverse()
    brailleHelperList=[128,64,32,16,8,4,2,1]
    brailleNum=0
    for num in range(len(brailleList1)):
        if brailleList1[num] == 1:
            brailleNum+=brailleHelperList[num]
    brailleStart = 10240
    return chr(brailleStart+brailleNum)
fileWrite=open('Write.txt','w',encoding="utf-16")
fileWrite.write(braille('11111111'))
fileWrite.close()
# Think Of The Braille Function's String Like It Has A Separator In The Middle And The 1s And 0s Are Going Vertically
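The same column-wise mapping can be sketched more compactly with a lookup table. This is only an illustration of the idea in the function above; the names COLUMN_ORDER_TO_BIT and braille_from_columns are hypothetical, and the input is assumed to contain only the characters '0' and '1'.

```python
# Position in the column-wise input string -> bit index in the Unicode offset.
# Left column (top to bottom) is positions 0-3, right column is positions 4-7.
# Unicode braille stores dots 1-3 in bits 0-2, dots 4-6 in bits 3-5,
# and the two bottom dots (7 and 8) in bits 6 and 7.
COLUMN_ORDER_TO_BIT = [0, 1, 2, 6, 3, 4, 5, 7]


def braille_from_columns(pattern):
    """Convert up to eight '0'/'1' characters, read column by column."""
    padded = pattern[:8].ljust(8, "0")  # truncate or pad with '0' to length 8
    offset = sum(
        1 << COLUMN_ORDER_TO_BIT[i]
        for i, ch in enumerate(padded)
        if ch == "1"
    )
    return chr(0x2800 + offset)  # 0x2800 == 10240, the blank braille cell


print(braille_from_columns("11111111"))
```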
Related
I am trying to create a random unicode generator and made a function that can create 16-bit unicode characters. This is my code:
import random
import string
def rand_unicode():
    list = []
    list.append(str(random.randint(0,1)))
    for i in range(0,3):
        if random.randint(0,1):
            list.append(string.ascii_letters[random.randint(0, \
                len(string.ascii_letters))-1].upper())
        else:
            list.append(str(random.randint(0,9)))
    return ''.join(list)
print(rand_unicode())
The problem is that whenever I try to add a '\u' in the print statement, Python gives me the following error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
I tried raw strings but that only gives me output like '\u0070' without turning it into a unicode character. How can I properly connect the strings to create a unicode character? Any help is appreciated.
From:
The problem is that whenever I try to add a '\u' in the print statement, Python gives me the following error:
it sounds like the problem may be in code you haven't included in your question:
print('\u' + rand_unicode())
This won't do what you expect, because the '\u' is interpreted before the strings are concatenated. See Process escape sequences in a string in Python and try:
print(bytes('\\u' + rand_unicode(), 'us-ascii').decode('unicode_escape'))
A unicode escape sequence such as \u0070 is a single character. It is not the concatenation of \u and the ordinal.
>>> '\u0070' == 'p'
True
>>> '\u0070' == (r'\u' + '0070')
False
To convert an ordinal to a unicode character, you can pass the numerical ordinal to the chr builtin function. Use int(literal, 16) to convert a hex-literal ordinal to a numerical one:
>>> ordinal = '0070'
>>> chr(int(ordinal, 16)) # convert literal to number to unicode
'p'
>>> chr(int(rand_unicode(), 16))
'ᚈ'
Note that creating a literal ordinal is not required. You can directly create the numerical ordinal:
>>> chr(112) # convert decimal number to unicode
'p'
>>> chr(0x0070) # convert hexadecimal number to unicode
'p'
>>> chr(random.randint(0, 0x10FFF))
'嚟'
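As a hedged sketch of the "create the numerical ordinal directly" approach for 16-bit characters: the function name random_bmp_char is made up for this example, and skipping the surrogate range U+D800 to U+DFFF is an added assumption, since those code points are not characters and usually cannot be printed or encoded.

```python
import random


def random_bmp_char():
    """Return one random character from U+0000..U+FFFF, skipping surrogates."""
    while True:
        code_point = random.randint(0, 0xFFFF)
        # Surrogates are reserved for UTF-16 pairs, so draw again.
        if not 0xD800 <= code_point <= 0xDFFF:
            return chr(code_point)


print(repr(random_bmp_char()))
```

Note that many code points in this range are unassigned or are control characters, so the result will not always be visible when printed.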
I know this type of question is asked a lot, but no answer was able to specifically help me with my setup.
I have a list of ONLY Unicode code points, in this form:
304E
304F
...
No U+XXXX, no '\XXXX' version.
Now I've tried to use string manipulation to recreate such strings
so I can simply print the corresponding Unicode character.
What I tried:
x = u'\\u' + listString
x = '\\u' + listString
x = '\u' + listString
The first two, when printed, just give me a '\uXXXX' string, and I have no idea
how to make it print the character instead of that string.
The last one gives me this error:
(unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
It's probably just something I don't get about Unicode and string manipulation, but I hope someone can help me out here.
Thanks in advance o/
You can use chr to get the character for a unicode code point:
>>> chr(0x304E)
'ぎ'
You can use int to convert a hexadecimal string to an integer:
>>> int('304E', 16)
12366
>>> chr(int('304E', 16))
'ぎ'
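Applied to a whole list of hex strings, a minimal sketch could look like this. The list contents and the variable names are illustrative; the real list would come from wherever the code points are stored.

```python
# Hex code points without any U+ or \u prefix, as in the question.
code_points = ['304E', '304F']

# Convert each hex string to an integer, then to its character.
chars = [chr(int(cp, 16)) for cp in code_points]
print(chars)           # ['ぎ', 'く']
print(''.join(chars))  # ぎく
```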
Given a list of hexadecimals that corresponds to the unicode, how to programmatically retrieve the unicode char?
E.g. Given the list:
>>> l = ['9359', '935A', '935B']
how to achieve this list:
>>> u = [u'\u9359', u'\u935A', u'\u935B']
>>> u
['鍙', '鍚', '鍛']
I've tried this but it throws a SyntaxError:
>>> u'\u' + l[0]
File "<stdin>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
\uhhhh escapes are only valid in string literals, you can't use those to turn arbitrary hex values into characters. In other words, they are part of a larger syntax, and can't be used stand-alone.
Decode the hex value to an integer and pass it to the chr() function (or, on Python 2, the unichr() function):
[chr(int(v, 16)) for v in l]
You could ask Python to interpret a string containing literal \uhhhh text as a Unicode string literal with the unicode_escape codec, but that feels like overkill for individual codepoints:
[(b'\\u' + v.encode('ascii')).decode('unicode_escape') for v in l]
Note the double backslash in the prefix added, and that we have to create byte strings for this to work at all.
Demo:
>>> l = ['9359', '935A', '935B']
>>> [chr(int(v, 16)) for v in l]
['鍙', '鍚', '鍛']
>>> [(b'\\u' + v.encode('ascii')).decode('unicode_escape') for v in l]
['鍙', '鍚', '鍛']
I am trying to generate random Unicode characters with two starting number+letter combinations.
I have tried the following below but I am getting an error.
def rand_unicode():
    b = ['03','20']
    l = ''.join([random.choice('ABCDEF0123456789') for x in xrange(2)])
    return unicode(u'\u'+random.choice(b)+l,'utf8')
The error I am getting:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: end of string in escape sequence
I use Python 2.6.
Yeah, uh, that's not how.
return unichr(random.choice((0x300, 0x2000)) + random.randint(0, 0xff))
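For readers on Python 3, where unichr() no longer exists and chr() covers the full Unicode range, a rough equivalent of that one-liner might be the following. The function name is reused from the question, and the two base values correspond to its '03' and '20' prefixes.

```python
import random


def rand_unicode():
    """Pick a random character from U+0300..U+03FF or U+2000..U+20FF."""
    base = random.choice((0x0300, 0x2000))       # high byte: '03' or '20'
    return chr(base + random.randint(0, 0xFF))   # low byte: 00..FF


print(repr(rand_unicode()))
```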
ok so my issue is i have the string '\222\222\223\225' which is stored as latin-1 in the db. What I get from django (by printing it) is the following string, 'ââââ¢' which I assume is the UTF conversion of it. Now I need to pass the string into a function that
does this operation:
strdecryptedPassword + chr(ord(c) - 3 - intCounter - 30)
I get this error:
chr() arg not in range(256)
If I try to encode the string as latin-1 first I get this error:
'latin-1' codec can't encode characters in position 0-3: ordinal not
in range(256)
I have read a bunch on how character encoding works, and there is something I am missing because I just don't get it!
Your first error, 'chr() arg not in range(256)', probably means the value has underflowed, because chr cannot take negative numbers. I don't know what the encryption algorithm is supposed to do when intCounter + 33 is greater than the character's ordinal value; you'll have to check what to do in that case.
About the second error: you must decode(), not encode(), a regular string object to get a proper representation of your data. encode() takes a unicode object (those starting with u') and generates a regular string to be output or written to a file. decode() takes a string object and generates a unicode object with the corresponding code points. This is what the unicode() call does when given a string object; you could also call a.decode('latin-1') instead.
>>> a = '\222\222\223\225'
>>> u = unicode(a,'latin-1')
>>> u
u'\x92\x92\x93\x95'
>>> print u.encode('utf-8')
ÂÂÂÂ
>>> print u.encode('utf-16')
ÿþ
>>> print u.encode('latin-1')
>>> for c in u:
... print chr(ord(c) - 3 - 0 -30)
...
q
q
r
t
>>> for c in u:
... print chr(ord(c) - 3 -200 -30)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
ValueError: chr() arg not in range(256)
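The session above is Python 2. As a hedged sketch, the same steps in Python 3 (where bytes and str are separate types) could look like this; the variable names are chosen for the example, and the counter is fixed at 0 as in the first loop above.

```python
raw = b"\x92\x92\x93\x95"      # the bytes written as '\222\222\223\225' in octal

# decode() turns bytes into str; Latin-1 maps every byte to the same code point.
text = raw.decode("latin-1")
print([hex(ord(c)) for c in text])    # ['0x92', '0x92', '0x93', '0x95']

# encode() goes the other way, from str to bytes.
print(text.encode("utf-8"))           # b'\xc2\x92\xc2\x92\xc2\x93\xc2\x95'
print(text.encode("latin-1") == raw)  # True

# The shift from the question with a counter of 0.
decoded = "".join(chr(ord(c) - 3 - 0 - 30) for c in text)
print(decoded)                        # qqrt
```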
As Vinko notes, Latin-1 or ISO 8859-1 doesn't have printable characters for the octal string you quote. According to my notes for 8859-1, "C1 Controls (0x80 - 0x9F) are from ISO/IEC 6429:1992. It does not define names for 80, 81, or 99". The code point names are as Vinko lists them:
\222 = 0x92 => PRIVATE USE TWO
\223 = 0x93 => SET TRANSMIT STATE
\225 = 0x95 => MESSAGE WAITING
The correct UTF-8 encoding of those is (Unicode, binary, hex):
U+0092 = %11000010 %10010010 = 0xC2 0x92
U+0093 = %11000010 %10010011 = 0xC2 0x93
U+0095 = %11000010 %10010101 = 0xC2 0x95
The LATIN SMALL LETTER A WITH CIRCUMFLEX is ISO 8859-1 code 0xE2 and hence Unicode U+00E2; in UTF-8, that is %11000011 %10100010 or 0xC3 0xA2.
The CENT SIGN is ISO 8859-1 code 0xA2 and hence Unicode U+00A2; in UTF-8, that is %11000010 %10100010 or 0xC2 0xA2.
So, whatever else you are seeing, you do not seem to be seeing a UTF-8 encoding of ISO 8859-1. All else apart, you are seeing but 5 bytes where you would have to see 8.
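These encodings are easy to check in Python 3. The helper name utf8_bits is invented for this sketch; it only formats the bytes that str.encode() returns.

```python
def utf8_bits(ch):
    """Return the UTF-8 bytes of ch as space-separated 8-bit binary strings."""
    return " ".join(format(byte, "08b") for byte in ch.encode("utf-8"))


for code_point in (0x92, 0x93, 0x95, 0xE2, 0xA2):
    ch = chr(code_point)
    print("U+%04X" % code_point, utf8_bits(ch), ch.encode("utf-8").hex())
```

For U+0092 this prints the bit pattern 11000010 10010010 and the hex string c292, in line with the table above.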
Added:
The previous part of the answer addresses the 'UTF-8 encoding' claim, but ignores the rest of the question, which says:
Now I need to pass the string into a function that does this operation:
strdecryptedPassword + chr(ord(c) - 3 - intCounter - 30)
I get this error: chr() arg not in range(256). If I try to encode the
string as Latin-1 first I get this error: 'latin-1' codec can't encode
characters in position 0-3: ordinal not in range(256).
You don't actually show us how intCounter is defined, but if it increments gently per character, sooner or later 'ord(c) - 3 - intCounter - 30' is going to be negative (and, by the way, why not combine the constants and use 'ord(c) - intCounter - 33'?), at which point, chr() is likely to complain. You would need to add 256 if the value is negative, or use a modulus operation to ensure you have a positive value between 0 and 255 to pass to chr(). Since we can't see how intCounter is incremented, we can't tell if it cycles from 0 to 255 or whether it increases monotonically. If the latter, then you need an expression such as:
chr(mod(ord(c) - mod(intCounter, 256) + 479, 256))
where 256 - 33 = 223, of course, and 479 = 256 + 223. This guarantees that the value passed to chr() is positive and in the range 0..255 for any input character c and any value of intCounter (and, because the mod() function never gets a negative argument, it also works regardless of how mod() behaves when its arguments are negative).
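In Python specifically, the % operator with a positive modulus never returns a negative result, so the extra offset is not needed there. The sketch below assumes the scheme is meant to wrap around modulo 256, which the question does not confirm; decrypt_char and int_counter are illustrative names.

```python
def decrypt_char(c, int_counter):
    """Shift one character back by int_counter + 33, wrapping within 0..255.

    Assumption: the original scheme wraps modulo 256.
    """
    return chr((ord(c) - int_counter - 33) % 256)


print(decrypt_char("\x92", 0))         # q
print(ord(decrypt_char("\x92", 200)))  # 169
```

The second call is the case that raised ValueError in the plain chr(ord(c) - 3 - 200 - 30) version: 146 - 233 is -87, which wraps to 169.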
Well, it's because it's been encrypted with some terrible scheme that just changes the ord() of the character by some request, so the string coming out of the database has been encrypted and this decrypts it. What you supplied above does not seem to work. In the database it is latin-1 and django converts it to unicode, but I cannot pass it to the function as unicode, and when I try to encode it to latin-1 I see that error.