Invalid token and Invalid syntax

I have a question about Python's error messages. I tried the following:
>>> 0o08
SyntaxError: invalid syntax
>>> 0o8
SyntaxError: invalid token
I want to know:
Which is the invalid token, 0, o or 8?
Why is 0o08 invalid syntax?

An integer literal starting with 0o is interpreted as octal. Per the documentation:
octinteger ::= "0" ("o" | "O") octdigit+ # '0o' or '0O' followed by one or more...
...
octdigit ::= "0"..."7" # ...digits 0 to 7 inclusive
The digit 8 is not a valid octdigit, so it is not allowed in an octal literal, hence "invalid token".
The error messages differ because of where the problem is detected (this is related to Python's LL(1) parser, which only looks ahead one token at a time):
If the first character after 0o is not an octdigit, that is clearly an invalid token and parsing stops immediately; whereas
If a later character is invalid, this isn't detected at such an early stage: the valid octal digits read so far form a complete literal, and parsing continues until the whole line gets rejected as invalid syntax.
You can see this difference in the highlighting in IDLE (only 0o highlighted vs. whole line highlighted), and if you try some alternatives:
>>> 0ok # first character after 0o is invalid
SyntaxError: invalid token
>>> 0o18 # a later character is invalid
SyntaxError: invalid syntax
>>> 0o10 # all characters are valid
8
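The same cases can be reproduced from a script. This is a minimal sketch: classify_literal is a made-up helper name, and because the exact SyntaxError text differs between Python versions, it only reports whether the literal was accepted.

```python
def classify_literal(source):
    """Return the literal's value, or the string 'SyntaxError' if it is rejected.

    classify_literal is a hypothetical helper, not part of any library.
    """
    try:
        # compile() tokenizes and parses the text; nothing is executed yet.
        code = compile(source, "<literal>", "eval")
    except SyntaxError:
        return "SyntaxError"
    return eval(code)


for text in ("0o8", "0o08", "0ok", "0o18", "0o10"):
    print(text, "->", classify_literal(text))
```

Only 0o10 should come back as a value; the other four are rejected, although the message attached to the rejection depends on the interpreter version.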

Related

Why do overly long variable names cause 'SyntaxError: invalid syntax'?

The following line of code causes SyntaxError: invalid syntax:
#coding=utf-8
result_3_logspace_mean_proportion_сorrect_answers = Exception('3_logspace_mean_proportion_сorrect_answers').get_result(result_1_main)
while this second line does not:
#coding=utf-8
result_3 = Exception('3_logspace_mean_proportion_сorrect_answers').get_result(result_1_main)
How can I fix this? I would really like to keep the first variable name.

The letter at the beginning of сorrect is not a Latin c, it's a Cyrillic с (U+0441), and Python 2 only accepts ASCII characters in identifiers, even when the source file declares a UTF-8 encoding.

The first variable name contains a non-ASCII character: the first "c" in "correct" is a Cyrillic small letter es. You can see this if you encode the string as UTF-8:
# Python 3
>>> 'result_3_logspace_mean_proportion_сorrect_answers'.encode()
b'result_3_logspace_mean_proportion_\xd1\x81orrect_answers'
# Python 2
>>> u'result_3_logspace_mean_proportion_сorrect_answers'.encode('utf8')
'result_3_logspace_mean_proportion_\xd1\x81orrect_answers'
Replacing it with a normal ASCII "c" fixes the issue.
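Look-alike characters like this are hard to spot by eye, so it can help to let Python list them. The following is a rough sketch for Python 3; non_ascii_chars is a hypothetical helper name, and the Cyrillic letter is written as the escape \u0441 so that it cannot be confused with a Latin c.

```python
import unicodedata


def non_ascii_chars(name):
    """List (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(name)
        if ord(ch) > 127
    ]


# \u0441 is the Cyrillic small letter es that looks like a Latin "c".
suspect = "result_3_logspace_mean_proportion_\u0441orrect_answers"
for index, char, uni_name in non_ascii_chars(suspect):
    print(index, repr(char), uni_name)
```

An empty result means the name is pure ASCII and is safe to use as a Python 2 identifier.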

Getting error on if and elif

Does anyone know why I keep getting an error with this section of my code?
if db_orientation2 == "Z":
    a="/C=C\"
elif db_orientation2 == "E":
    a="\C=C\"
This is the error:
File "<ipython-input-7-25cda51c429e>", line 11
a="/C=C\"
^
SyntaxError: EOL while scanning string literal
The elif is highlighted in red, as if the operation were not allowed...

A string literal cannot end with a single backslash, because \" is read as an escaped quote rather than the end of the string. You'll have to double it:
a="/C=C\\"
# ^
The highlighting of your code also clearly shows the problem.
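The second literal, "\C=C\", has the same trailing-backslash problem, and recent Python versions also warn about its leading \C because it is not a recognized escape sequence. A sketch of both branches with every backslash doubled (double_bond is a hypothetical wrapper added here for illustration; the original code assigns to a directly):

```python
def double_bond(orientation):
    """Return the bond fragment for a "Z" or "E" orientation.

    double_bond is a hypothetical wrapper around the asker's if/elif.
    """
    if orientation == "Z":
        return "/C=C\\"  # each doubled backslash is one character in the string
    elif orientation == "E":
        return "\\C=C\\"
    raise ValueError("orientation must be 'Z' or 'E'")


print(double_bond("Z"))
print(double_bond("E"))
```

Both results are five characters long; the doubling exists only in the source code, not in the resulting string.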

Why is 3 behaving differently from int(3)? [duplicate]

This question already has answers here:
Why doesn't 2.__add__(3) work in Python?
(2 answers)
Closed 8 years ago.
I was playing with the Python interpreter (Python 3.2.3) and tried the following:
>>> dir(1)
This gave me all the attributes and methods of the int object. Next I tried:
>>> 1.__class__
However this threw an exception:
File "<stdin>", line 1
1.__class__
^
SyntaxError: invalid syntax
When I tried out the same with a float I got what I expected:
>>> 2.0.__class__
<class 'float'>
Why do int and float literals behave differently?

It's probably a consequence of the parsing algorithm used. A simple mental model is that the tokenizer attempts to match all the token patterns there are, and recognizes the longest match it finds. At a lower level, the tokenizer works character by character and makes each decision based only on the current state and input character; there shouldn't be any backtracking or re-reading of input.
After joining patterns with common prefixes – in this case, the pattern for int literals and the integral part of the pattern of float literals – what happens in the tokenizer is that it:
Reads the 1, and enters the state that indicates "reading either a float or an int literal"
Reads the ., and enters the state "reading a float literal"
Reads the _, which cannot be part of a float literal. The tokenizer emits 1. as a float literal token.
Carries on parsing starting with the _, and eventually emits __class__ as an identifier token.
Aside: This tokenizing approach is also the reason why common languages have the syntax restrictions they have. E.g. identifiers contain letters, digits, and underscores, but cannot start with a digit. If that was allowed, 123abc could be intended as either an identifier, or the integer 123 followed by the identifier abc. A lex-like tokenizer would recognize this as the former since it leads to the longest single token, but nobody likes having to keep details like this in their head when trying to read code. Or when trying to write and debug the tokenizer for that matter.
The parser then tries to process the token stream:
<FloatLiteral: '1.'> <Identifier: '__class__'>
In Python, a literal directly followed by an identifier, without an operator between the tokens, makes no sense, so the parser bails. This also means that the reason why Python would complain about 123abc being invalid syntax isn't the tokenizer error "the character a isn't valid in an integer literal", but the parser error "the identifier abc cannot directly follow the integer literal 123".
The reason why the tokenizer can't recognize the 1 as an int literal is that the character that makes it leave the float-or-int state determines what it just read. If it's ., it was the start of a float literal, which might continue afterwards. If it's something else, it was a complete int literal token.
It's not possible for the tokenizer to "go back" and re-read the previous input as something else. In fact, the tokenizer is at too low a level to care about what an "attribute access" is and handle such ambiguities.
Now, your second example is valid because the tokenizer knows a float literal can only have one . in it. More precisely: the first . makes it transition from the float-or-int state to the float state. In this state, it only expects digits (or an E for scientific/engineering notation, a j for complex numbers…) to continue the float literal. The first character that's not a digit etc. (i.e. the second .) is definitely no longer part of the float literal and the tokenizer can emit the finished token. The token stream for your second example will thus be:
<FloatLiteral: '2.0'> <Operator: '.'> <Identifier: '__class__'>
Which, of course, the parser then recognizes as valid Python. We now also know enough to see why the suggested workarounds help. In Python, separating tokens with whitespace is optional, unlike, say, in Lisp. Conversely, whitespace does separate tokens. (That is, no tokens except string literals may contain whitespace; it's merely skipped between tokens.) So the code:
1 .__class__
is always tokenized as
<IntLiteral: '1'> <Operator: '.'> <Identifier: '__class__'>
And since a closing parenthesis cannot appear in an int literal, this:
(1).__class__
gets read as this:
<Operator: '('> <IntLiteral: '1'> <Operator: ')'> <Operator: '.'> <Identifier: '__class__'>
The above implies that, amusingly, the following is also valid:
1..__class__ # => <type 'float'>
The decimal part of a float literal is optional, and the second . read will make the preceding input be recognized as one.
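These token streams can be checked with the standard library's tokenize module. This is a small sketch, not the interpreter's own code path: significant_tokens is a made-up helper, it drops the NEWLINE and ENDMARKER tokens for brevity, and it is only run on the valid spellings, because how the invalid 1.__class__ gets reported varies between Python versions.

```python
import io
import tokenize


def significant_tokens(source):
    """Return (token type name, text) pairs for numbers, operators and names."""
    keep = {tokenize.NUMBER, tokenize.OP, tokenize.NAME}
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in keep
    ]


for src in ("2.0.__class__", "1 .__class__", "(1).__class__", "1..__class__"):
    print(src, "->", significant_tokens(src))
```

The tokenize module uses a single NUMBER type for both int and float literals, so the distinction shows up in the token text: '1' in the whitespace and parenthesis variants, '1.' in the double-dot variant.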

It is a tokenization issue... the . is parsed as the beginning of the fractional part of a floating point number.
You can use
(1).__class__
to avoid the problem

Because if there's a . after a number, Python thinks you're creating a float. When it then encounters something that isn't a digit, it throws an error.
However, in a float, python doesn't expect another . to be a part of the value, hence the result! It works. :)
How do we get the attributes, then?
You can easily wrap it in parentheses. For example, see this console session:
>>> (1).__class__
<type 'int'>
Now, Python knows that you're not trying to make a float, but to refer to the int itself.
Bonus: putting a blank space after the number works as well.
>>> 1 .__class__
<type 'int'>
Also, if you only want to get the __class__, type(1) will do it for you.
Hope this helps!

Or you can even do this:
>>> getattr(1 , '__class__')
<type 'int'>

You need parentheses to surround the number:
>>> (1).__class__
<type 'int'>
Otherwise, Python sees the . after the number and it tries to interpret the whole thing as a float.

How to know if a Unicode identifier is valid?

Here is a sample scenario -
>>> এক = 1
>>> ১ = 1
File "<stdin>", line 1
১ = 1
^
SyntaxError: invalid character in identifier
I'm running this in the default python3 interpreter. Why does the first Unicode string work as an identifier but not the second one?

Valid identifiers are explained in the Python 3 documentation: Lexical analysis - Identifiers and keywords.
The exact details can be found in PEP 3131.
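In short, PEP 3131 lets an identifier start with a letter-like character (or an underscore), while decimal digits (Unicode category Nd) are only allowed after the first position. এক consists of two Bengali letters, whereas ১ is the Bengali digit one. A quick way to check this yourself is sketched below; describe is a hypothetical helper name, while str.isidentifier() and unicodedata.category() are standard library calls.

```python
import unicodedata


def describe(text):
    """Return (Unicode categories of each character, whether text is an identifier)."""
    categories = [unicodedata.category(ch) for ch in text]
    return categories, text.isidentifier()


for text in ("এক", "১", "a১"):
    print(repr(text), describe(text))
```

The last case shows that the digit is acceptable as long as it is not the first character of the name.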
