How to tell if a single line of python is syntactically valid?


It is very similar to this:
How to tell if a string contains valid Python code
The only difference is that instead of the entire program being given all at once, I am interested in checking a single line of code at a time.
Formally, we say a line of python is "syntactically valid" if there exists any syntactically valid python program that uses that particular line.
For instance, I would like to identify these as syntactically valid lines:
for i in range(10):
x = 1
Because one can use these lines in some syntactically valid python programs.
I would like to identify these lines as syntactically invalid lines:
for j in range(10 in range(10(
x =++-+ 1+-
Because no syntactically correct Python program could ever use these lines.
The check does not need to be too strict; it just needs to be good enough to filter out obviously bogus statements (like the ones shown above). The line is given as a string, of course.

This uses codeop.compile_command to attempt to compile the code. This is the same logic that the code module does to determine whether to ask for another line or immediately fail with a syntax error.
import codeop

def is_valid_code(line):
    try:
        codeop.compile_command(line)
    except SyntaxError:
        return False
    else:
        return True
It can be used as follows:
>>> is_valid_code('for i in range(10):')
True
>>> is_valid_code('')
True
>>> is_valid_code('x = 1')
True
>>> is_valid_code('for j in range(10 in range(10(')
True
>>> is_valid_code('x = ++-+ 1+-')
False
I'm sure at this point, you're saying "what gives? for j in range(10 in range(10( was supposed to be invalid!" The problem with this line is that 10() is technically syntactically valid, at least according to the Python interpreter. In the REPL, you get this:
>>> 10()
Traceback (most recent call last):
File "<pyshell#22>", line 1, in <module>
10()
TypeError: 'int' object is not callable
Notice how this is a TypeError, not a SyntaxError. ast.parse says it is valid as well, and just treats it as a call with the function being an ast.Num.
These kinds of things can't easily be caught until they actually run.
If some kind of monster managed to tamper with CPython's cached int object for 10 (which is technically possible), you might be able to call 10(). Either way, it's still allowed by the syntax.
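You can see the same thing with ast.parse. A minimal sketch, assuming Python 3.8 or later, where numeric literals parse to ast.Constant rather than the older ast.Num:

```python
import ast

# Parse (but never run) the expression "10()".
tree = ast.parse("10()")
call = tree.body[0].value          # the expression inside the Expr statement

print(type(call).__name__)         # Call
print(type(call.func).__name__)    # Constant
print(call.func.value)             # 10
```

Parsing succeeds; only executing the code would raise the TypeError.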
What about the unbalanced parentheses? This falls into the same category as for i in range(10):. That line is invalid on its own, but may be the first line of a multi-line statement. For example:
>>> is_valid_code('if x ==')
False
>>> is_valid_code('if (x ==')
True
The second call returns True because the statement could continue like this:
if (x ==
        3):
    print('x is 3!')
and the statement would be complete. In fact, codeop.compile_command distinguishes between these situations: it returns a code object if the line is a valid self-contained statement, returns None if more input is expected to complete it, and raises a SyntaxError if the line is invalid.
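To make that three-way distinction visible, here is a minimal sketch (Python 3) that wraps codeop.compile_command and returns a label instead of a boolean. The function name classify_line and its label strings are illustrative, not part of codeop; catching ValueError and OverflowError as well follows the codeop documentation, which says compile_command can raise them for invalid literals.

```python
import codeop

def classify_line(line):
    """Classify `line` the way the interactive interpreter would.

    "complete"   -> compile_command returned a code object
    "incomplete" -> it returned None (more input is expected)
    "invalid"    -> it raised an exception (syntax error or bad literal)
    """
    try:
        code = codeop.compile_command(line)
    except (SyntaxError, ValueError, OverflowError):
        return "invalid"
    return "incomplete" if code is None else "complete"

for src in ["x = 1", "if (x ==", "for i in range(10):", "x = = 1"]:
    print(repr(src), "->", classify_line(src))
```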
However, you can also get into a much more complicated problem than initially stated. For example, consider the line ")". If it's the start of the module, or the previous line is "{", then it's invalid. However, if the previous line is "(1,2,", it's completely valid.
The solution given here will work if you only work forward and append previous lines as context, which is what the code module does for an interactive session. Creating something that can always accurately identify whether a single line could possibly exist in a Python file without considering surrounding lines is going to be extremely difficult, as the Python grammar interacts with newlines in non-trivial ways. The function above reports whether a given line could appear at the beginning of a module, either as a complete statement or as the start of one that continues on the next line.
It would be better to identify what the purpose of recognizing single lines is and solve that problem in a different way than trying to solve this for every case.

This is just a suggestion and I'm not sure it will work, but maybe something with exec and try/except?
code_line += "\n" + ("\t" if code_line.endswith(":") else "") + "pass"
try:
    exec(code_line)
except SyntaxError:
    print("Oops! Wrong syntax...")
except Exception:
    print("Syntax all right")
else:
    print("Syntax all right")
For simple lines this should give the appropriate answer.
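Note that exec actually runs the line, so any side effects (assignments, function calls, I/O) happen too. Here is a minimal sketch of the same idea (Python 3) that only compiles the line with the built-in compile() and never executes it; the name looks_valid is hypothetical:

```python
def looks_valid(code_line):
    """Rough single-line syntax check; compiles the code but never runs it."""
    stripped = code_line.strip()  # ignore indentation of nested lines
    if stripped.endswith(":"):
        # Give a block header such as "for ...:" a minimal body.
        stripped += "\n    pass"
    try:
        compile(stripped, "<line>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True
```

Because compile() sees the whole input at once, this rejects for j in range(10 in range(10( (the parentheses are never closed), unlike the codeop answer. It also rejects lines that are only legal in context, such as else: or return x outside a function, so it errs towards false negatives.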

Related

Weird syntax with continue

My question concerns the code that was posted in this question: Questions about a tic-tac-toe program I am writing.
More precisely this line:
stop = int(0)# 0 = continue
First I didn't understand what he was trying to do and thought that it was a SyntaxError. But when I tried to execute this line, it didn't raise a SyntaxError, it just set stop to 0. Note this line is not inside a loop.
>>> stop = int(0)# 0 = continue
>>> stop
0
But this, as I expected, raises an error:
>>> int(0) = continue
File "<stdin>", line 1
int(0) = continue
^
SyntaxError: invalid syntax
Does anyone know why that line is valid? Thanks.

# introduces a comment. Everything after it is a comment and has no meaning to the Python interpreter. The comment is likely trying to say "zero means to continue".
PEP8 advises that "inline comments should be separated by at least two spaces from the statement", which would probably have removed some confusion here.
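The tokenize module shows what the interpreter actually sees. In this minimal sketch (Python 3; comment_tokens is a hypothetical helper name), everything from # to the end of the line comes back as a single COMMENT token, so "= continue" never reaches the parser:

```python
import io
import tokenize

def comment_tokens(source):
    """Return the text of every COMMENT token found in `source`."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.COMMENT]

print(comment_tokens("stop = int(0)# 0 = continue\n"))
```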

Syntax error with exec call in Python

Quick Python question about the exec statement. I have Python 2.7.6 and am trying to use exec to run some code stored in a .txt file. I've run into a syntax error and am not entirely sure what is causing it.
Traceback (most recent call last):
File "/Users/XYZ/Desktop/parser.py", line 46, in <module>
try_code(block)
File "<string>", line 1
x = 'Hello World!'
^
SyntaxError: invalid syntax
I initially thought it was complaining about carriage returns, but when I tried to .replace() them with ' ' I still received this error message. I've tried variations to see what the issue is, and it always reports the error at the first ' or " the program encounters when it runs exec.
Here is the try_code(block) function:
def try_code(block):
    exec block
And the main body of the program:
inputFile = open('/Users/XYZ/Desktop/test.txt', 'r+')
starter = False
finished = False
check = 1
block = ""
for val in inputFile:
    starter = lookForStart(val)
    finished = lookForEnd(val)
    if starter:  # test the result; the function object lookForStart is always truthy
        check = 1
    elif finished:
        try_code(block)
    if check == 1:
        check = 0
    elif finished == False:
        block = block + val
Basically I'm trying to read a file (test.txt) and look for some embedded code in it. To make it easier I surrounded the code with indicators, hence starter and finished. Then I concatenate all the hidden code into one string and call try_code on it. try_code then attempts to execute it (it does get there; I checked with print statements) and fails with the SyntaxError.
As a note it works fine if I have hidden something like...
x = 5
print x
so whatever the problem is appears to be dealing with strings for some reason.
EDIT
It would appear that TextEdit includes some extra characters that aren't normally displayed. I rewrote the test file in a different text editor (TextWrangler) and the characters seem to have disappeared. Thank you all very much for helping me solve my problem, I appreciate it.

It is a character encoding issue. Unless you've explicitly declared the character encoding of your Python source code at the top of the file, it is 'ascii' by default on Python 2.
The code that you're trying to execute contains non-ascii quotes:
>>> print b'\xe2\x80\x98Hello World\xe2\x80\x99'.decode('utf-8')
‘Hello World’
To fix it, use ordinary single quotes instead: 'Hello World'.
You can check that block doesn't contain non-ASCII characters with the decode method: block.decode('ascii') raises an exception if it does.
A gentle introduction to the topic: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky.
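If you are on Python 3 (where str is Unicode text), a small sketch like the one below can locate the offending characters and map the common "smart" quotes back to ASCII before calling exec. The helper name find_non_ascii and the SMART_QUOTES table are illustrative assumptions; extend the table for any other characters your editor inserts.

```python
def find_non_ascii(text):
    """Return (line, column, character) for every non-ASCII character (1-based)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits

# Map typographic single and double quotes to their ASCII equivalents.
SMART_QUOTES = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # left/right single quotation marks
    "\u201c": '"', "\u201d": '"',   # left/right double quotation marks
})

block = "x = \u2018Hello World!\u2019\n"
print(find_non_ascii(block))
fixed = block.translate(SMART_QUOTES)
compile(fixed, "<block>", "exec")   # compiles once the quotes are ASCII
print(fixed)
```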

Why not python implicit line continuation on period?

Is there any reason Python does not allow implicit line continuations after (or before) periods? That is
data.where(lambda d: d.name == 'Obama').
count()
data.where(lambda d: d.name == 'Obama')
.count()
Does this conflict with some feature of Python? With the rise of method chaining APIs this seems like a nice feature.

In both positions the line before the break can already be a valid, complete statement (3. is a complete float literal), so continuing on a period would complicate the parser:
print 3.
1415926
print 'Hello, world'
.lower()

Python allows implicit line continuation inside parentheses (), so you might try:
(data.where(lambda d: d.name == 'Obama').
 count())
I know that's not answering your question ("why?"), but maybe it's helpful.

Use a backslash (\) at the end of the line (it looks ugly, though):
data.where(lambda d: d.name == 'Obama').\
    count()

Not sure about after periods, but in your example the newline before a period leads to the first line being a valid statement on its own. Then Python would have to look ahead to the second line to know whether the first line was a statement or not.
One of the goals when defining the language syntax was to be able to parse it without having ambiguities that require looking ahead like that.
It'd get annoying in the interactive interpreter if you had to press enter twice after every single line just so Python knew you'd finished your statement and weren't going to put a .foo() after it.

In the cases where a period could be leading in to a method call, it will always(?) be a syntax error for it to just occur at the end of a line by itself. So it would be unambiguous to read it as starting a continuation.
However, Python generally speaking doesn't continue a line just because there's an incomplete binary operator there. For instance, the following is not valid:
2 +
4
In the second example, the first line is valid by itself and it would be really inconsistent for Python to look for a following line "just in case" there is one.
I would just break after the opening paren of the method call.
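The behaviour described in these answers can be checked directly with the built-in compile(), which parses without running anything. In this sketch data and f are placeholder names (f stands in for the lambda), which compile() never needs to resolve:

```python
def compiles(source):
    """True if `source` compiles as a module; nothing is executed."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True

forms = {
    "trailing period": "data.where(f).\ncount()\n",
    "leading period": "data.where(f)\n.count()\n",
    "inside parentheses": "(data.where(f).\n count())\n",
    "backslash": "data.where(f).\\\n    count()\n",
    "break after open paren": "data.where(\n    f).count()\n",
    "dangling operator": "2 +\n4\n",
}
for name, source in forms.items():
    print(name, compiles(source))
```

Only the forms where the break sits inside brackets or after a backslash compile; the others fail just as the answers above describe.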

{Because Python uses line breaks to end statements, rather than depending on braces or semicolons;}

Mixing files and loops

I'm writing a script that logs errors from another program and restarts the program where it left off when it encounters an error. For whatever reasons, the developers of this program didn't feel it necessary to put this functionality into their program by default.
Anyway, the program takes an input file, parses it, and creates an output file. The input file is in a specific format:
UI - 26474845
TI - the title (can be any number of lines)
AB - the abstract (can also be any number of lines)
When the program throws an error, it gives you the reference information you need to track the error: namely, the UI, which section (title or abstract), and the line number relative to the beginning of the title or abstract. I want to log the offending sentences from the input file with a function that takes the reference number and the file, finds the sentence, and logs it. The best way I could think of involves moving forward through the file a specific number of times (namely n times, where n is the line number relative to the beginning of the section). The way that seemed to make sense to do this is:
i = 1
while i <= lineNumber:
    print original.readline()
    i += 1
I don't see how this would make me lose data, but Python thinks it would, and says ValueError: Mixing iteration and read methods would lose data. Does anyone know how to do this properly?

You get the ValueError because your code probably has for line in original: in addition to original.readline(). An easy solution which fixes the problem without making your program slower or consume more memory is changing
for line in original:
    ...
to
while True:
    line = original.readline()
    if not line:
        break
    ...
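If you'd rather keep the for loop, another option (a sketch, Python 3) is to pull the following lines from the same file iterator with itertools.islice instead of calling readline(). The function name lines_after and its marker argument are hypothetical stand-ins for however you recognise the start of the title or abstract.

```python
import io
import itertools

def lines_after(fobj, marker, count):
    """Return up to `count` lines that follow the first line starting with `marker`."""
    for line in fobj:
        if line.startswith(marker):
            # islice draws from the same iterator the for loop is using,
            # so no separate read method is mixed in and no lines are skipped.
            return list(itertools.islice(fobj, count))
    return []

sample = io.StringIO("UI - 26474845\nTI - first line\nsecond line\nthird line\n")
print(lines_after(sample, "TI", 2))
```

If the file runs out early, islice simply returns fewer lines instead of raising.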

Use for and enumerate.
Example:
for line_num, line in enumerate(file):
    if line_num < cut_off:
        print line
NOTE: This assumes you are already cleaning up your file handles, etc.
Also, itertools.takewhile could prove useful if you prefer a more functional flavor.

Assuming you need only one line, this could help:
import itertools

def getline(fobj, line_no):
    "Return a (1-based) line from a file object"
    return next(itertools.islice(fobj, line_no - 1, line_no))  # 1-based!

>>> getline(open("/etc/passwd", "r"), 4)
'adm:x:3:4:adm:/var/adm:/bin/false\n'
You might want to catch StopIteration errors (if the file has fewer lines).

Here's a version without the ugly while True pattern and without other modules:
for line in iter(original.readline, ''):
    if …:  # to the beginning of the title or abstract
        for i in range(lineNumber):
            print original.readline(),
        break
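Putting the pieces together, here is a hedged sketch (Python 3) of the whole lookup in a single pass that only iterates over the file and never calls readline(). It assumes each field starts with a two-letter tag such as UI, TI or AB followed by " - ", and that any other non-blank line continues the previous field; adjust TAG_RE if your input differs. The function name find_line is hypothetical.

```python
import io
import re

# Assumption: a field starts with a two-letter uppercase tag, optional
# padding, then "- " (e.g. "UI - 26474845"); other non-blank lines
# continue the previous field.
TAG_RE = re.compile(r"^([A-Z]{2})\s*- (.*)$")

def find_line(lines, ui, section, line_number):
    """Return line `line_number` (1-based) of `section` ("TI" or "AB")
    in the record whose UI is `ui`, or None if there is no such line."""
    current_ui = None
    current_tag = None
    count = 0
    for raw in lines:  # a single pass using iteration only
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank separator lines
        match = TAG_RE.match(line)
        if match:
            current_tag, text = match.group(1), match.group(2)
            count = 0
            if current_tag == "UI":
                current_ui = text.strip()
        else:
            text = line.strip()
        if current_ui == ui and current_tag == section:
            count += 1
            if count == line_number:
                return text
    return None

sample = io.StringIO(
    "UI - 26474845\n"
    "TI - a title that\n"
    "spans two lines\n"
    "AB - line one\n"
    "line two\n"
)
print(find_line(sample, "26474845", "AB", 2))
```

With a real file you would pass the open file object in place of sample.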

Problem with Boolean Expression with a string value from a List

I have the following problem:
# line is a line from a file that contains ["baa","beee","0"]
line = TcsLine.split(",")
NumPFCs = eval(line[2])
if NumPFCs==0:
    print line
I want to print all the lines from the file if the second position of the list has a value == 0.
I print the lines, but after that the following happens:
['baaa', 'beee', '0', '\n']
But after that I get the next error:
Traceback (most recent call last):
  ilation.py", line 141, in ?
    getZeroPFcs()
  ilation.py", line 110, in getZeroPFcs
    NumPFCs = eval(line[2])
  File "<string>", line 0
Can you please help me? Thanks.

Let me explain a little what you do here.
If you write:
NumPFCs = eval(line[2])
the order of evaluation is:
take a single character of the string line (the one at index 2)
eval that single character as a Python expression, which is an error.
If you write it instead as:
NumPFCs = eval(line)[2]
then the order of evaluation is:
eval the line, producing a python list
take the element at index 2 of that list (its third element), which is the one-character string "0"
compare that string with a number: "0" == 0 is simply False, so the test never succeeds.
In your terms, you want to do the following:
NumPFCs = eval(eval(line)[2])
or, slightly better, compare NumPFCs to a string:
if NumPFCs == "0":
but the ways this could go wrong are almost innumerable. You should forget about eval and try to use other methods: string splitting, regular expressions etc. Others have already provided some suggestions, and I'm sure more will follow.

Your question is kind of hard to read, but using eval there is definitely not a good idea. Either just do a direct string comparison:
line = TcsLine.split(",")
if line[2] == "0":
    print line
or use int:
line = TcsLine.split(",")
if int(line[2]) == 0:
    print line
Either way, badly formatted lines in your data will still make this fail.
I'd also recommend reading PEP 8.

There are a few issues I see in your code segment:
you assume the list always has at least 3 elements
eval will raise an exception if the string it is given isn't valid Python
you say you want the second element, but you access the third element.
This is a safer way to do it:
line = TcsLine.split(",")
# note: rfind("0") != -1 is true for any field containing a 0, such as "10"
if len(line) >= 3 and line[2].rfind("0") != -1:
    print line

I'd recommend using a regular expression to capture all of the variants of how 0 can be specified: with double-quotes, without any quotes, with single quotes, with extra whitespace outside the quotes, with whitespace inside the quotes, how you want the square brackets handled, etc.

There are many ways of skinning a cat, as it were :)
Before we begin, though: don't use eval on strings that are not yours. If the string has ever left your program (i.e. it has been stored in a file or sent over a network), someone can slip in something nasty. And if someone can, you can be sure someone will.
And you might want to look over your data format. Putting strings like ["baa","beee","0", "\n"] in a file does not make much sense to me.
The first and simplest way would be to just strip away the stuff you don't need and do a string comparison. This works as long as the '0' string always looks the same and you're not really after the integer value 0, only the character pattern:
TcsLine = '["baa","beee","0"]'
line = TcsLine.strip('[]').split(",")
if line[2] == '"0"':
    print line
The second way is similar to the first, except that we cast the numeric string to an integer, yielding the integer value you were looking for (and printing line without all the quotation marks):
TcsLine = '["baa","beee","0"]'
line = [e.strip('"') for e in TcsLine.strip('[]').split(",")]
NumPFCs = int(line[2])
if NumPFCs == 0:
    print line
Could it be that the string is actually a json array? Then I would probably go get simplejson to parse it properly if I were running Python<2.6 or just import json on Python>=2.6. Then cast the resulting '0'-string to an integer as in the previous example.
TcsLine = '["baa","beee","0"]'
#import json # for >= Python2.6
import simplejson as json # for <Python2.6
line = json.loads(TcsLine)
NumPFCs = int(line[2])
if NumPFCs==0:
    print line
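If you do want to parse the bracketed list syntax directly, a safer stand-in for eval is the standard library's ast.literal_eval, which only accepts Python literals (strings, numbers, lists, ...) and so cannot call functions or run code. A sketch in Python 3; the function name zero_pfcs is hypothetical:

```python
import ast

def zero_pfcs(tcs_line):
    """Return the parsed list if its third field is the integer 0, else None."""
    try:
        fields = ast.literal_eval(tcs_line.strip())
        if int(fields[2]) == 0:
            return fields
    except (ValueError, SyntaxError, TypeError, IndexError):
        pass  # malformed line: not a literal, too short, or non-numeric field
    return None

print(zero_pfcs('["baa","beee","0"]\n'))
```

Anything that isn't a plain literal, such as a function call, makes literal_eval raise ValueError, which this sketch treats as "no match".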
