How to know if a Unicode identifier is valid?


Here is a sample scenario:
>>> এক = 1
>>> ১ = 1
File "<stdin>", line 1
১ = 1
^
SyntaxError: invalid character in identifier
I'm running this from the default python3 interpreter. Why does the first Unicode string work as an identifier but not the second one?

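Valid identifiers are explained in the Python 3 documentation: Lexical analysis - Identifiers and keywords.
In short, এক consists of two Bengali letters, which may start an identifier, whereas ১ is BENGALI DIGIT ONE, a decimal digit (Unicode category Nd). Like the ASCII digit 1, it may appear inside an identifier but not at its start.
The exact details can be found in PEP 3131.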
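A quick way to check a candidate name at runtime is sketched below. It relies on str.isidentifier(), which applies the PEP 3131 rules, plus keyword.iskeyword(), because isidentifier() alone does not reject reserved words such as class. The helper name is_valid_identifier is hypothetical, chosen only for illustration.

```python
import keyword
import unicodedata


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` could be used as a Python 3 variable name."""
    # isidentifier() checks the XID_Start / XID_Continue rules from PEP 3131,
    # but it accepts keywords, so those are rejected separately.
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["এক", "১", "x১", "class"]:
    first = candidate[0]
    # Show the first character's Unicode name and general category
    # (Lo = other letter, Nd = decimal digit, Ll = lowercase letter).
    print(candidate, is_valid_identifier(candidate),
          unicodedata.name(first), unicodedata.category(first))
```

Under Python 3 this should report এক as valid and ১ as invalid, while x১ passes because digits are allowed after the first character.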

Related

Why does a long variable name cause 'SyntaxError: invalid syntax'?

The following line of code causes SyntaxError: invalid syntax:
#coding=utf-8
result_3_logspace_mean_proportion_сorrect_answers = Exception('3_logspace_mean_proportion_сorrect_answers').get_result(result_1_main)
while this one does not:
#coding=utf-8
result_3 = Exception('3_logspace_mean_proportion_сorrect_answers').get_result(result_1_main)
How can I fix this? I would really like to keep the first variable name.

The letter at the beginning of "correct" is not a c; it's a Cyrillic с. Python 2 only accepts ASCII letters, digits, and underscores in identifiers; the coding declaration affects string literals and comments, not names.

The first variable name contains a non-ASCII character: the first "c" in "correct" is a Cyrillic small letter es. You can see this if you encode the string as UTF-8:
#Python3
>>> 'result_3_logspace_mean_proportion_сorrect_answers'.encode()
b'result_3_logspace_mean_proportion_\xd1\x81orrect_answers'
#Python2
>>> u'result_3_logspace_mean_proportion_сorrect_answers'.encode('utf8')
'result_3_logspace_mean_proportion_\xd1\x81orrect_answers'
Replacing it with an ordinary Latin "c" fixes the issue.
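To spot look-alike characters like this without eyeballing byte dumps, a small Python 3 helper can list every non-ASCII character in a name together with its Unicode name. This is a sketch: non_ascii_chars is a hypothetical helper, and str.isascii() needs Python 3.7 or later.

```python
import unicodedata
from typing import List, Tuple


def non_ascii_chars(text: str) -> List[Tuple[int, str, str]]:
    """List (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "<no name>"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# The name from the question, with the Cyrillic letter written as an escape.
name = "result_3_logspace_mean_proportion_\u0441orrect_answers"
for index, ch, uname in non_ascii_chars(name):
    print(index, repr(ch), uname)  # 34 'с' CYRILLIC SMALL LETTER ES

# Replacing the Cyrillic es with a Latin c yields a pure-ASCII identifier.
fixed = name.replace("\u0441", "c")
print(fixed.isascii(), fixed.isidentifier())  # True True
```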

Strange behavior of python Interpreter while assigning integer value with prefix 0 [duplicate]

I just executed the following program on my python interpreter:
>>> def mylife(x):
...     if x>0:
...         print(x)
...     else:
...         print(-x)
...
>>> mylife(01)
File "<stdin>", line 1
mylife(01)
^
SyntaxError: invalid token
>>> mylife(1)
1
>>> mylife(-1)
1
>>> mylife(0)
0
Now, I have seen this, but as the link says, the leading 0 for octal no longer works in Python 3. But doesn't that mean numbers starting with 0 should then be interpreted properly, either in base 2 or in normal base-10 representation? Since that is not the case, why does Python behave like this? Is it an implementation issue or a semantic issue?

My guess is that since 012 is no longer an octal literal in Python 3.x, they disallowed the 012 syntax altogether to avoid subtle backward-compatibility bugs. Consider a Python 2.x script that uses octal literals:
a = 012 + 013
Had Python 3 simply reinterpreted 012 as decimal, the ported script would still run, but it would silently give you a = 25 (decimal 12 + 13) instead of the a = 21 (octal 012 + 013) you got before. Have fun tracking down that bug.
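The scenario above can be sketched in Python 3 as follows. The 0o prefix is the Python 3 spelling of an octal literal, and compile() is used only to show that the old spelling is rejected at compile time rather than evaluated.

```python
# Python 3 spelling of the Python 2 expression "a = 012 + 013":
# the explicit 0o prefix marks each literal as octal.
a = 0o12 + 0o13
print(a)  # 21

# Reading the same digits as decimal would silently give 25 instead.
print(12 + 13)  # 25

# A bare leading zero is rejected when the source is compiled, so the
# ported script fails loudly instead of computing the wrong value.
try:
    compile("a = 012 + 013", "<ported script>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)

# Literals made only of zeros remain legal, which is why mylife(0) works.
print(00)  # 0
```

The exact SyntaxError message varies between Python 3 versions, so it is printed rather than assumed.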

From the Python 3 release notes http://docs.python.org/3.0/whatsnew/3.0.html#integers
Octal literals are no longer of the form 0720; use 0o720 instead.
The 'leading zero' syntax for octal literals in Python 2.x was a common gotcha:
Python 2.7.3
>>> 010
8
In Python 3.x it's a syntax error, as you've discovered:
Python 3.3.0
>>> 010
File "<stdin>", line 1
010
^
SyntaxError: invalid token
You can still convert strings with leading zeros, just as before:
>>> int("010")
10
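int() also takes an optional base argument, which makes the difference between string parsing and source literals explicit. A short sketch using only the built-in int():

```python
# int() parses *strings*, where leading zeros are still fine.
print(int("010"))      # 10: base 10 is the default
print(int("010", 8))   # 8: explicitly parse as octal
print(int("0o10", 0))  # 8: base 0 reads the prefix like a source literal

# With base 0, the Python 3 literal rules apply, so a bare leading zero
# is rejected just as it is in source code.
try:
    int("010", 0)
except ValueError as exc:
    print("ValueError:", exc)
```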

How to iterate through arabic word in python? [duplicate]

In Python 2.7 at least, unicodedata.name() doesn't recognise certain characters.
>>> from unicodedata import name
>>> name(u'\n')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: no such name
>>> name(u'a')
'LATIN SMALL LETTER A'
Certainly Unicode contains the character \n, and it has a name, specifically "LINE FEED".
NB. unicodedata.lookup('LINE FEED') and unicodedata.lookup(u'LINE FEED') both give a KeyError: undefined character name.

The unicodedata.name() lookup relies on column 2 of the UnicodeData.txt database in the standard (Python 2.7 uses Unicode 5.2.0).
If that name starts with <, it is ignored. All control codes, including newlines, fall into that category; their Name column contains nothing but <control>:
000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;
Column 11 holds the old Unicode 1.0 name, which, according to the standard, should not be used. In other words, \n has no name other than the generic <control>, which Python's database ignores (as it is not unique).
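To see which field is which, the UnicodeData.txt record quoted above can be split on semicolons. Python counts from 0, so the Name field is fields[1] and the Unicode 1.0 name is fields[10]. This is only an illustration on the single quoted record, not a full parser for the file.

```python
# One record from UnicodeData.txt, copied from the answer above.
record = "000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;"
fields = record.split(";")

print(len(fields))  # 15 semicolon-separated fields
print(fields[1])    # <control>: the Name field, a placeholder for controls
print(fields[10])   # LINE FEED (LF): the obsolete Unicode 1.0 name
```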
Python 3.3 added support for NameAliases.txt, which lets you look up characters by alias, so lookup('LINE FEED'), lookup('new line'), lookup('eol'), etc. all resolve to \n. However, unicodedata.name() does not support aliases, nor could it (which one would it pick?):
Added support for Unicode name aliases and named sequences. Both unicodedata.lookup() and '\N{...}' now resolve name aliases, and unicodedata.lookup() resolves named sequences too.
TL;DR: LINE FEED is not the official name for \n, it is but an alias for it. Python 3.3 and up let you look up characters by alias.
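The difference between name() and alias lookup can be checked directly in Python 3.3 or later. This sketch uses only the standard unicodedata module; the optional second argument to name() is returned instead of raising ValueError when a character has no official name.

```python
import unicodedata

# name() reports only the official Name property. Control characters have
# none, so without a default it raises ValueError; with one, it returns it.
print(unicodedata.name("a"))         # LATIN SMALL LETTER A
print(unicodedata.name("\n", None))  # None

# Python 3.3+ resolves aliases from NameAliases.txt, both in lookup()
# and in "\N{...}" escapes inside string literals.
print(unicodedata.lookup("LINE FEED") == "\n")  # True
print("\N{LINE FEED}" == "\n")                  # True
```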
