Pythonically check if a variable name is valid - python

tldr; see the final line; the rest is just preamble.
I am developing a test harness, which parses user scripts and generates a Python script which it then runs. The idea is for non-techie folks to be able to write high-level test scripts.
I have introduced the idea of variables, so a user can use the LET keyword in their script, e.g. LET X = 42, which I simply expand to X = 42. They can then use X later in their scripts: RELEASE CONNECTION X
But what if someone writes LET 2 = 3? That's going to generate invalid Python.
If I have that X in a variable variableName, then how can I check whether variableName is a valid Python variable?

In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.
>>> 'X'.isidentifier()
True
>>> 'X123'.isidentifier()
True
>>> '2'.isidentifier()
False
>>> 'while'.isidentifier()
True
The last example shows that you should also check whether the variable name clashes with a Python keyword:
>>> from keyword import iskeyword
>>> iskeyword('X')
False
>>> iskeyword('while')
True
So you could put that together in a function:
from keyword import iskeyword
def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)
Another option, which works in Python 2 and 3, is to use the ast module:
from ast import parse
def is_valid_variable_name(name):
    try:
        parse('{} = None'.format(name))
        return True
    except (SyntaxError, ValueError, TypeError):
        return False
>>> is_valid_variable_name('X')
True
>>> is_valid_variable_name('123')
False
>>> is_valid_variable_name('for')
False
>>> is_valid_variable_name('')
False
>>> is_valid_variable_name(42)
False
This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
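One caveat with the parse approach: any valid assignment target passes, so a string such as 'a.b' or 'a[0]' is accepted even though it is not a plain name. Below is a minimal sketch of a stricter variant, assuming you only want simple names. The helper name is_plain_name is made up for illustration; it inspects the parsed tree instead of only checking that parsing succeeds.

```python
import ast

def is_plain_name(name):
    """Return True only if `name = None` parses as one assignment to a bare name."""
    try:
        tree = ast.parse('{} = None'.format(name))
    except (SyntaxError, ValueError, TypeError):
        return False
    # Exactly one statement, and it must be an assignment.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    targets = tree.body[0].targets
    # Exactly one target, a bare Name whose id is the input itself.
    # This rejects 'a.b', 'a[0]', 'x = y', '(X)' and padded input such as 'X '.
    return (len(targets) == 1
            and isinstance(targets[0], ast.Name)
            and targets[0].id == name)
```

The final comparison against the input is a deliberately conservative choice: it also rejects names that Python would normalize to a different spelling.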

EDIT: this is wrong and implementation dependent - see comments.
Just have Python do its own check by making a dictionary with the variable holding the name as the key and splatting it as keyword arguments:
def _dummy_function(**kwargs):
    pass
def is_valid_variable_name(name):
    try:
        _dummy_function(**{name: None})
        return True
    except TypeError:
        return False
Notably, TypeError is consistently raised whenever a dict splats into keyword arguments but has a key which isn't a valid function argument, and whenever a dict literal is being constructed with an invalid key, so this will work correctly on anything you pass to it.

I don't think you need the exact same naming syntax as python itself.
I would rather go for a simple regexp like:
\w+
to make sure it's something alphanumeric, and then add a prefix to keep away from python's own syntax. So the non-techie user's declaration:
LET return = 12
should probably become after your parsing:
userspace_return = 12
or
userspace['return'] = 12
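A rough sketch of the prefixing idea, under stated assumptions: the LET line format, the LET_PATTERN regex and the expand_let helper are all hypothetical and not taken from the question's harness. It uses an explicit ASCII character class rather than \w so the generated name stays a valid identifier once the prefix is added.

```python
import re

# Hypothetical pattern for a line like "LET name = value".
LET_PATTERN = re.compile(r'LET\s+([A-Za-z0-9_]+)\s*=\s*(.+)$')

def expand_let(line, prefix='userspace_'):
    """Translate 'LET name = value' into a prefixed Python assignment."""
    match = LET_PATTERN.match(line.strip())
    if match is None:
        raise ValueError('not a valid LET statement: {!r}'.format(line))
    name, value = match.groups()
    # The prefix keeps user names away from keywords and leading digits.
    return '{}{} = {}'.format(prefix, name, value)
```

With this sketch, 'LET return = 12' becomes 'userspace_return = 12', and even 'LET 2 = 3' yields a legal assignment to userspace_2.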

In Python 3, as above, you can simply use str.isidentifier. But in Python 2, this does not exist.
The tokenize module has a regex for names (identifiers): tokenize.Name. But I couldn't find any documentation for it, so it may not be available everywhere. It is simply r'[a-zA-Z_]\w*'. A single $ after it will let you test strings with re.match.
The docs say that an identifier is defined by this grammar:
identifier ::= (letter|"_") (letter | digit | "_")*
letter ::= lowercase | uppercase
lowercase ::= "a"..."z"
uppercase ::= "A"..."Z"
digit ::= "0"..."9"
Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)
And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False).
So:
import keyword
if hasattr(str, 'isidentifier'):
    _isidentifier = str.isidentifier
else:
    import re
    _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
    try:
        import tokenize
    except ImportError:
        _isidentifier = re.compile(_fallback_pattern + '$').match
    else:
        _isidentifier = re.compile(
            getattr(tokenize, 'Name', _fallback_pattern) + '$'
        ).match
    del _fallback_pattern
def isname(s):
    return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'

You could try a test assignment and see if it raises a SyntaxError:
>>> 2fg = 5
File "<stdin>", line 1
2fg = 5
^
SyntaxError: invalid syntax

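You could use exception handling and catch NameError and SyntaxError. Because a SyntaxError is raised when the code is compiled, the generated code has to be compiled from a string (for example with compile() or exec) inside the try/except block. Then inform the user if the input is invalid.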
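As a minimal sketch of that idea, assuming the harness holds each generated line as a string: the helper name check_user_line and the '<user script>' label are invented for illustration. It only catches SyntaxError, because a NameError would not show up until the code is actually run.

```python
def check_user_line(source):
    """Return None if `source` compiles, else a short message for the user."""
    try:
        # Compile only; nothing is executed here.
        compile(source, '<user script>', 'exec')
    except SyntaxError as exc:
        return 'line {}: {}'.format(exc.lineno, exc.msg)
    return None
```

The exact wording of the message depends on the Python version, so it is better shown to the user than matched against.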

Related

Protect against null environment variables when using os.path.expandvars

How can I protect against Python's os.path.expandvars() treatment of null/unset environment variables?
From os.path:
Malformed variable names and references to non-existing variables are left unchanged.
>>> os.path.expandvars('$HOME/stuff')
'/home/dennis/stuff'
>>> os.path.expandvars('foo/$UNSET/bar')
'foo/$UNSET/bar'
I could perform this step separately from other path processing (expanduser(), realpath(), normpath(), etc.) instead of chaining them all together and check to see if the result is unchanged, but that is normal when there are no variables present - so I would also have to parse the string to see if it has any variables. I fear that may not be robust enough.
The issue comes into play when creating a file using the result. I end up with a file with the variable name as a literal part of the file's name. I want to instead reject the input with an exception.
You could use string.Template, which uses a similar dollar-sign syntax for interpolation of variables but will raise KeyError if something doesn't exist rather than leaving it in.
import os
from string import Template
print(Template('$HOME/stuff').substitute(os.environ))
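Since the goal is to reject the input with an exception, the KeyError can be turned into a clearer error. This is only a sketch: the function name expandvars_strict and the optional env parameter are made up for illustration. Note that string.Template is not identical to os.path.expandvars (for example, $$ is an escape for a literal dollar sign).

```python
import os
from string import Template

def expandvars_strict(path, env=None):
    """Expand $VAR and ${VAR}; raise ValueError if a variable is unset."""
    if env is None:
        env = os.environ
    try:
        return Template(path).substitute(env)
    except KeyError as exc:
        # exc.args[0] is the name of the missing variable.
        raise ValueError('unset variable in path: {}'.format(exc.args[0]))
```

Passing a plain dict as env makes the function easy to try out without touching the real environment.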
Extending Jason's answer:
import os
import re
import string
def expand_user_vars(s, variants='$%s ${%s} %%%s%%'):
    '''Return a string expanded for both leading "~/" or "~username/" and
    environment variables in the forms given by variants.
    >>> s = "~roland/.local/%XYZ%$XYZ${XYZ}"
    >>> expand_user_vars(s)
    '/home/roland/.local/'
    >>> s = "$HOME/.local/%XYZ%$XYZ${XYZ}"
    >>> expand_user_vars(s)
    '/home/roland/.local/'
    >>> s = "$EDITOR"
    >>> 'EDITOR' not in expand_user_vars(s)
    True
    '''
    s = os.path.expanduser(s)
    # python2 does not have KeyError in str(e)
    remx = re.compile(r"(?:KeyError:)?\s*'(\w+)'")
    while True:
        try:
            s = string.Template(s).substitute(os.environ)
            break
        except KeyError as e:
            # Strip every spelling of the unset variable, then retry.
            reme = str(e)
            remxo = remx.match(reme)
            if remxo:
                g1 = remxo.group(1)
                for v in variants.split():
                    s = s.replace(v % g1, '')
            continue
    return s
Alternatively, echo is available on Linux, macOS and Windows, so you can let the shell do the expansion.
In the example below, replace the command with 'echo ' + s:
import subprocess
netrc_file = subprocess.check_output('echo ${NETRC:-~/.netrc}',shell=True)

Why are multiline comments allowed in empty functions but not single comments?

I am using python 2.7
My ide will display indent expected if I write a function like this
def foo():
    #
but not if I write this
def foo():
    '''
    '''
Is there any reason why this happens?
A comment is something that's ignored by the compiler. When you put a comment on the line the compiler basically pretends it doesn't exist. But a multi-line string is a physical element of the code. Python recognizes its presence and makes no complaint.
If you want to write a function that doesn't do anything, at least for the moment, use pass.
def Foo():
    #Comment goes here
    pass
pass is a keyword that says 'something should go here, but I'm purposefully not putting anything here'.
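The difference can be checked without an IDE by compiling small snippets from strings. This is a sketch; the helper name compiles and the '<example>' label are invented for illustration.

```python
def compiles(source):
    """Return True if `source` is accepted by the Python compiler."""
    try:
        compile(source, '<example>', 'exec')
        return True
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False

comment_only = "def foo():\n    #\n"
docstring_only = "def foo():\n    '''\n    '''\n"
comment_and_pass = "def foo():\n    # comment\n    pass\n"

results = {
    'comment_only': compiles(comment_only),          # body is empty once the comment is dropped
    'docstring_only': compiles(docstring_only),      # the string is a real statement
    'comment_and_pass': compiles(comment_and_pass),  # pass provides the body
}
```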
''' is not actually a comment. It acts like one, but is, in fact, a string delimiter.
Try:
>>> s = '''
... '''
>>> print(s)
>>> repr(s)
"'\\n'"
>>> s = #
File "<stdin>", line 1
s = #
^
SyntaxError: invalid syntax
>>> s = '''foo'''
>>> print(s)
foo

Triple quotation in python

So I understand that if I do the following
print """ Anything I
type in here
works. Multiple LINES woohoo!"""
But what if following is my python script
""" This is my python Script. Just this much """
What does the above thing do? Is it taken as comment? Why is it not a syntax error?
Similarly, if I do
"This is my Python Script. Just this. Even with single quotes."
How are the above two scripts interpreted?
Thanks
The triple quotes ''' or """ are just different ways of representing strings. The advantage of triple quotes is that they can span multiple lines, and they sometimes serve as docstrings.
The reason:
"hadfasdfas"
doesn't raise any error is that Python simply creates the string and then doesn't assign it to anything. For the Python interpreter, it is perfectly fine to have a pointless statement in your code as long as there are no syntax or semantic errors.
Hope that helps.
The string is just evaluated, and the interpreter, noticing it wasn't assigned to anything, throws it away.
But in some special places, this string is actually assigned to the __doc__ property of the item:
def func(arg):
    """
    Does stuff. This string will be evaluated and assigned to func.__doc__.
    """
    pass
class Test:
    """
    Same for Test.__doc__
    """
    pass
At the top of module.py:
"""
module does stuff. this will be assigned to module.__doc__
"""
def func():
    ...
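A short sketch of the "special places" rule, with made-up function names: only a string that is the first statement of the body becomes __doc__, while a string anywhere else is evaluated and discarded.

```python
def documented():
    """Return 1."""
    return 1

def not_documented():
    x = 1
    """This string is not the first statement, so it is simply discarded."""
    return x

first = documented.__doc__       # the docstring text
second = not_documented.__doc__  # None, because there is no docstring
```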
In addition to @sshashank124's answer, I have to add that triple quoted strings are also used in testing: https://docs.python.org/2/library/doctest.html
So consider this code snippet:
def some_function(x, y):
    """This function should simply return sum of arguments.
    It should throw an error if you pass string as argument
    >>> some_function(5, 4)
    9
    >>> some_function(-5, 4)
    -1
    >>> some_function("abc", 4)
    Traceback (most recent call last):
        ...
    ValueError: arguments must be numbers
    """
    # isinstance() is the right check here; type() does not take two arguments
    if isinstance(x, str) or isinstance(y, str):
        raise ValueError("arguments must be numbers")
    else:
        return x + y
if __name__ == "__main__":
    import doctest
    doctest.testmod()
If you import this tiny module, you'll get the some_function function.
But if you invoke this script directly from the shell, the tests given in the triple quoted string will be run; failures are reported on the output (pass -v on the command line to see a full report).
So triple quoted strings can be treated as values of type string, as comments, as docstrings and as containers for doctests.

What is 'print' in Python?

I understand what print does, but of what "type" is that language element? I think it's a function, but why does this fail?
>>> print print
SyntaxError: invalid syntax
Isn't print a function? Shouldn't it print something like this?
>>> print print
<function print at ...>
In 2.7 and down, print is a statement. In python 3, print is a function. To use the print function in Python 2.6 or 2.7, you can do
>>> from __future__ import print_function
>>> print(print)
<built-in function print>
See this section from the Python Language Reference, as well as PEP 3105 for why it changed.
In Python 3, print() is a built-in function (object)
Before this, print was a statement. Demonstration...
Python 2.x:
% pydoc2.6 print
The ``print`` statement
***********************
print_stmt ::= "print" ([expression ("," expression)* [","]]
| ">>" expression [("," expression)+ [","]])
``print`` evaluates each expression in turn and writes the resulting
object to standard output (see below). If an object is not a string,
it is first converted to a string using the rules for string
conversions. The (resulting or original) string is then written. A
space is written before each object is (converted and) written, unless
the output system believes it is positioned at the beginning of a
line. This is the case (1) when no characters have yet been written
to standard output, (2) when the last character written to standard
output is a whitespace character except ``' '``, or (3) when the last
write operation on standard output was not a ``print`` statement. (In
some cases it may be functional to write an empty string to standard
output for this reason.)
-----8<-----
Python 3.x:
% pydoc3.1 print
Help on built-in function print in module builtins:
print(...)
print(value, ..., sep=' ', end='\n', file=sys.stdout)
Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file: a file-like object (stream); defaults to the current sys.stdout.
sep: string inserted between values, default a space.
end: string appended after the last value, default a newline.
print is a mistake that has been rectified in Python 3. In Python 3 it is a function. In Python 1.x and 2.x it is not a function, it is a special form like if or while, but unlike those two it is not a control structure.
So, I guess the most accurate thing to call it is a statement.
In Python all statements (except assignment) are expressed with reserved words, not addressable objects. That is why you cannot simply print print and you get a SyntaxError for trying. It's a reserved word, not an object.
Confusingly, you can have a variable named print. You can't address it in the normal way, but at module level you can do locals()['print'] = somevalue and then print locals()['print']. (setattr does not work here, because locals() returns a dict.)
Other reserved words that might be desirable as variable names but are nonetheless verboten:
class
import
return
raise
except
try
pass
lambda
In Python 2, print is a statement, which is a whole different kind of thing from a variable or function. Statements are not Python objects that can be passed to type(); they're just part of the language itself, even more so than built-in functions. For example, you could do sum = 5 (even though you shouldn't), but you can't do print = 5 or if = 7 because print and if are statements.
In Python 3, the print statement was replaced with the print() function. So if you do type(print), it'll return <class 'builtin_function_or_method'>.
BONUS:
In Python 2.6+, you can put from __future__ import print_function at the top of your script (as the first line of code), and the print statement will be replaced with the print() function.
>>> # Python 2
>>> from __future__ import print_function
>>> type(print)
<type 'builtin_function_or_method'>
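Once print is a function, it can be passed around like any other object. Here is a small sketch, assuming Python 3 (or Python 2.6+ with the __future__ import above as the first line of the module); the names emit_all and buffer are invented for illustration.

```python
import io

def emit_all(items, emit=print):
    """Call emit on each item; emit defaults to the built-in print function."""
    for item in items:
        emit(item)

# Redirect the output into an in-memory buffer instead of stdout.
buffer = io.StringIO()
emit_all(['a', 'b'], emit=lambda item: print(item, file=buffer))
captured = buffer.getvalue()
```

None of this is possible with the Python 2 print statement, since a statement cannot be used as a default argument or called inside a lambda.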

python's print function not exactly an ordinary function?

Environment: python 2.x
If print is a built-in function, why does it not behave like other functions ? What is so special about print ?
-----------start session--------------
>>> ord 'a'
Exception : invalid syntax
>>> ord('a')
97
>>> print 'a'
a
>>> print('a')
a
>>> ord
<built-in function ord>
>>> print
-----------finish session--------------
The short answer is that in Python 2, print is not a function but a statement.
In all versions of Python, almost everything is an object. All objects have a type. We can discover an object's type by applying the type function to the object.
Using the interpreter we can see that the builtin functions sum and ord are exactly that in Python's type system:
>>> type(sum)
<type 'builtin_function_or_method'>
>>> type(ord)
<type 'builtin_function_or_method'>
But the following expression is not even valid Python:
>>> type(print)
SyntaxError: invalid syntax
This is because the name print itself is a keyword, like if or return. Keywords are not objects.
The more complete answer is that print can be either a statement or a function depending on the context.
In Python 3, print is no longer a statement but a function.
In Python 2, you can replace the print statement in a module with the equivalent of Python 3's print function by including this statement at the top of the module:
from __future__ import print_function
This special import is available only in Python 2.6 and above.
Refer to the documentation links in my answer for a more complete explanation.
print in Python versions below 3, is not a function. There's a separate print statement which is part of the language grammar. print is not an identifier. It's a keyword.
The deal is that print is a built-in function only starting from the Python 3 branch. It looks like you are using Python 2.
Check out:
print "foo"; # Works in python2, not in python3
print("foo"); # Works in python3
print is treated more like a keyword than a function in Python 2. The parser "knows" the special syntax of print (no parentheses around the argument) and how to deal with it. I think the Python creator wanted to keep the syntax simple by doing so. As maverik already mentioned, in Python 3 print is called like any other function, and a syntax error is raised if you do it the old way.
