Python cannot handle number literals starting with 0. Why?

I just executed the following program on my python interpreter:
>>> def mylife(x):
...     if x>0:
...         print(x)
...     else:
...         print(-x)
...
>>> mylife(01)
  File "<stdin>", line 1
    mylife(01)
            ^
SyntaxError: invalid token
>>> mylife(1)
1
>>> mylife(-1)
1
>>> mylife(0)
0
Now, I have seen this, but as the link says, the leading 0 for octal no longer works in Python (i.e. it does not work in Python 3). But does that not mean that numbers starting with 0 should be interpreted properly, either in base 2 or in normal base 10? Since that is not the case, why does Python behave like this? Is it an implementation issue? Or is it a semantic issue?

My guess is that since 012 is no longer an octal literal in Python 3.x, they disallowed the 012 syntax altogether to avoid strange backward-compatibility bugs. Consider a Python 2.x script which uses octal literals:
a = 012 + 013
Now suppose Python 3 accepted 012 as a decimal literal. You port the script and it still runs, but it gives you a = 25 (decimal 12 + 13) instead of the a = 21 (octal 012 + 013) you got before. Have fun tracking down that bug.
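The hazard can be sketched in Python 3 by parsing the same digit strings both ways. This is only an illustration: `int(text, 8)` stands in for Python 2's reading of a leading-zero literal, and `int(text, 10)` for a hypothetical Python 3 that accepted such literals as decimal. The variable names are made up for the example.

```python
# Digit strings as they would appear in the Python 2 source.
operands = ["012", "013"]

# Python 2 reading: a leading zero means octal.
python2_sum = sum(int(text, 8) for text in operands)

# Hypothetical reading if Python 3 had silently switched to decimal.
decimal_sum = sum(int(text, 10) for text in operands)

print(python2_sum)  # 21
print(decimal_sum)  # 25
```

Raising a SyntaxError instead forces the author to pick one meaning explicitly.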

From the Python 3 release notes http://docs.python.org/3.0/whatsnew/3.0.html#integers
Octal literals are no longer of the form 0720; use 0o720 instead.
The 'leading zero' syntax for octal literals in Python 2.x was a common gotcha:
Python 2.7.3
>>> 010
8
In Python 3.x it's a syntax error, as you've discovered:
Python 3.3.0
>>> 010
  File "<stdin>", line 1
    010
      ^
SyntaxError: invalid token
You can still convert from strings with leading zeros same as ever:
>>> int("010")
10
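A short sketch of the options in Python 3 for digits with leading zeros. It uses only the built-in `int` and `compile`; the variable names are illustrative.

```python
# Strings with leading zeros parse as decimal by default.
as_decimal = int("010")

# Pass an explicit base to read the same digits as octal.
as_octal = int("010", 8)

# The Python 3 spelling of the octal literal.
octal_literal = 0o10

# The old spelling is rejected when the source is compiled.
try:
    compile("010", "<demo>", "eval")
    old_literal_accepted = True
except SyntaxError:
    old_literal_accepted = False

print(as_decimal, as_octal, octal_literal, old_literal_accepted)
# 10 8 8 False
```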

Related

python version 3.4 does not support a 'ur' prefix

I have some Python code written in an older version of Python (2.x) and I'm struggling to make it work. I'm using Python 3.4.
_eng_word = ur"[a-zA-Z][a-zA-Z0-9'.]*"
(it's part of a tokenizer)
http://bugs.python.org/issue15096
Title: Drop support for the "ur" string prefix
When PEP 414 restored support for explicit Unicode literals in Python 3, the "ur" string prefix was deemed to be a synonym for the "r" prefix.
So, use 'r' instead of 'ur'
Indeed, Python 3.4 supports u'...' (for code that needs to run on both Python 2 and 3) and r'...', but not the two combined. That's because ur'...' works differently in Python 2 from how it would work in Python 3: in Python 2, \uhhhh and \Uhhhhhhhh escapes are still processed in a ur'...' string, whereas in a Python 3 r'...' string they are not.
Note that in this specific case there is no difference between the raw string literal and the regular one, because the pattern contains no backslashes. You can just use:
_eng_word = u"[a-zA-Z][a-zA-Z0-9'.]*"
and it'll work in both Python 2 and 3.
For cases where a raw string literal does matter, you could decode the raw string from raw_unicode_escape on Python 2, catching the AttributeError on Python 3:
_eng_word = r"[a-zA-Z][a-zA-Z0-9'.]*"
try:
    # Python 2
    _eng_word = _eng_word.decode('raw_unicode_escape')
except AttributeError:
    # Python 3
    pass
If you are writing Python 3 code only (so it doesn't have to run on Python 2 anymore), just drop the u entirely:
_eng_word = r"[a-zA-Z][a-zA-Z0-9'.]*"
This table compares (some of) the different string literal prefixes in Python 2(.7) and 3(.4+):
As you can see, in Python 3 there's no way to have a literal that doesn't process escapes, but does process unicode literals. To get such a string with code that works in both Python 2 and 3, use:
br"[a-zA-Z][a-zA-Z0-9'.]*".decode('raw_unicode_escape')
Actually, your example is not very good, since it doesn't have any unicode literals, or escape sequences. A better example would be:
br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
In python 2:
>>> br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
u"[\u03b1-\u03c9\u0391-\u03a9][\\t'.]*"
In Python 3:
>>> br"[\u03b1-\u03c9\u0391-\u03a9][\t'.]*".decode('raw_unicode_escape')
"[α-ωΑ-Ω][\\t'.]*"
Which is really the same thing.
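To see both halves of this on Python 3, the following sketch (Python 3 only; variable names are illustrative) checks that `ur"..."` no longer compiles, and that decoding a raw bytes literal with `raw_unicode_escape` processes `\u` escapes while leaving other backslashes alone.

```python
# ur"..." is not valid syntax in Python 3.
try:
    compile('ur"abc"', "<demo>", "eval")
    ur_accepted = True
except SyntaxError:
    ur_accepted = False

# A raw bytes literal keeps every backslash; decoding with
# raw_unicode_escape then processes only the \uXXXX sequences.
pattern = br"[\u03b1-\u03c9][\t'.]*".decode("raw_unicode_escape")

print(ur_accepted)  # False
print(pattern)      # [α-ω][\t'.]*
```

The `\t` in the output is still a backslash followed by `t`, not a tab character.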

How to know if a Unicode identifier is valid?

Here is a sample scenario -
>>> এক = 1
>>> ১ = 1
  File "<stdin>", line 1
    ১ = 1
    ^
SyntaxError: invalid character in identifier
I'm running this from the default python3 interpreter. Why does the first Unicode string work as an identifier but not the second one?
Valid identifiers are explained in the Python 3 documentation: Lexical analysis - Identifiers and keywords.
The exact details can be found in PEP 3131.
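As a rough check, Python 3 exposes the identifier rules through `str.isidentifier()`, and `unicodedata.category` hints at why the two names differ: an identifier may start with a letter or underscore, while decimal digits (category `Nd`) are only allowed after the first character. A small sketch, with illustrative variable names:

```python
import unicodedata

letters = "এক"   # two Bengali letters
digit = "১"      # BENGALI DIGIT ONE

print(letters.isidentifier())            # True
print(digit.isidentifier())              # False
print((letters + digit).isidentifier())  # True: a digit may follow a letter

print([unicodedata.category(ch) for ch in letters])  # ['Lo', 'Lo']
print(unicodedata.category(digit))                   # Nd
```

So ১ fails for the same reason that `1 = 1` does: it is a digit, not a letter.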

Wrong math with Python?

Just starting out with Python, so this is probably my mistake, but...
I'm trying out Python. I like to use it as a calculator, and I'm slowly working through some tutorials.
I ran into something weird today. I wanted to find out 2013*2013, but I wrote the wrong thing and wrote 2013*013, and got this:
>>> 2013*013
22143
I checked with my calculator, and 22143 is the wrong answer! 2013 * 13 is supposed to be 26169.
Why is Python giving me a wrong answer? My old Casio calculator doesn't do this...
Because of octal arithmetic, 013 is actually the integer 11.
>>> 013
11
With a leading zero, 013 is interpreted as a base-8 number, and 1*8**1 + 3*8**0 = 11.
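The positional arithmetic can be checked in Python 3, where the octal value has to be spelled out explicitly. A minimal sketch (the helper name `octal_value` is made up for illustration):

```python
def octal_value(digits):
    """Evaluate a string of octal digits by positional weights."""
    total = 0
    for digit in digits:
        # Shift the running total one octal place, then add the new digit.
        total = total * 8 + int(digit)
    return total

print(octal_value("13"))         # 11, i.e. 1*8**1 + 3*8**0
print(int("013", 8))             # 11, the built-in equivalent
print(2013 * octal_value("13"))  # 22143
```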
Note: this behaviour was changed in Python 3. Here is a particularly appropriate quote from PEP 3127:
The default octal representation of integers is silently confusing to
people unfamiliar with C-like languages. It is extremely easy to
inadvertently create an integer object with the wrong value, because
'013' means 'decimal 11', not 'decimal 13', to the Python language
itself, which is not the meaning that most humans would assign to this
literal.
013 is an octal integer literal (equivalent to the decimal integer literal 11), due to the leading 0.
>>> 2013*013
22143
>>> 2013*11
22143
>>> 2013*13
26169
It is very common (certainly in most of the languages I'm familiar with) to have octal integer literals start with 0 and hexadecimal integer literals start with 0x. Due to the exact confusion you experienced, Python 3 raises a SyntaxError:
>>> 2013*013
  File "<stdin>", line 1
    2013*013
           ^
SyntaxError: invalid token
and requires either 0o or 0O instead:
>>> 2013*0o13
22143
>>> 2013*0O13
22143
Python's 'leading zero' syntax for octal literals is a common gotcha:
Python 2.7.3
>>> 010
8
The syntax was changed in Python 3.x http://docs.python.org/3.0/whatsnew/3.0.html#integers
This is mostly just expanding on @Wim's answer a bit, but Python indicates the base of integer literals using certain prefixes. Without a prefix, integers are interpreted as being in base 10. With a "0x" prefix, the integer will be interpreted as a hexadecimal int. The full grammar specification is here, though it's a bit tricky to understand if you're not familiar with formal grammars: http://docs.python.org/2/reference/lexical_analysis.html#integers
The table essentially says that if you want a long value (i.e. one that exceeds the capacity of a normal int), write the number followed by the letter "L" or "l"; if you want your number to be interpreted in decimal, write the number normally (with no leading 0); if you want it interpreted in octal, prefix it with "0", "0o", or "0O"; if you want it in hex, prefix it with "0x"; and if you want it in binary, prefix it with "0b" or "0B".
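For comparison, here is how the prefixes look in Python 3 syntax, where the bare leading 0 and the "L" suffix are no longer accepted. The snippet uses only built-ins; the variable name is illustrative.

```python
# The same value, eleven, written in four bases.
values = [11, 0b1011, 0o13, 0xB]
print(values)  # [11, 11, 11, 11]

# Built-ins that format an int back into each notation.
print(bin(11), oct(11), hex(11))  # 0b1011 0o13 0xb

# With base 0, int() infers the base from the prefix in the string.
print(int("0o13", 0), int("0x0b", 0), int("0b1011", 0))  # 11 11 11
```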
