Protect against null environment variables when using os.path.expandvars


How can I protect against Python's os.path.expandvars() treatment of null/unset environment variables?
From os.path:
Malformed variable names and references to non-existing variables are left unchanged.
>>> os.path.expandvars('$HOME/stuff')
'/home/dennis/stuff'
>>> os.path.expandvars('foo/$UNSET/bar')
'foo/$UNSET/bar'
I could perform this step separately from the other path processing (expanduser(), realpath(), normpath(), etc.) instead of chaining them all together, and then check whether the result is unchanged. But an unchanged result is also normal when no variables are present, so I would additionally have to parse the string to see whether it contains any variables. I fear that may not be robust enough.
The issue comes into play when creating a file using the result. I end up with a file with the variable name as a literal part of the file's name. I want to instead reject the input with an exception.

You could use string.Template, which uses a similar dollar-sign syntax for interpolation of variables but will raise KeyError if something doesn't exist rather than leaving it in.
import os
from string import Template
print(Template('$HOME/stuff').substitute(os.environ))
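Since the question asks for the input to be rejected with an exception, the Template approach can be wrapped in a small helper. This is a minimal sketch rather than a drop-in replacement for os.path.expandvars: the name expand_strict and its env parameter are hypothetical, and it assumes only $NAME and ${NAME} references (with string.Template a literal dollar sign has to be written as $$).

```python
import os
from string import Template


def expand_strict(path, env=None):
    """Expand $NAME / ${NAME} in path, raising ValueError if a name is unset."""
    if env is None:
        env = os.environ
    try:
        # substitute() raises KeyError for a missing name and
        # ValueError for a malformed placeholder such as a lone "$".
        return Template(path).substitute(env)
    except KeyError as exc:
        raise ValueError("undefined environment variable: %s" % exc.args[0])
```

The result can then be passed on to expanduser(), realpath() and so on, since any unset variable has already been rejected at this point.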

Extending on Jason's:
import os
import re
import string
def expand_user_vars(s, variants='$%s ${%s} %%%s%%'):
    '''Return a string expanded for both leading "~/" or "~username/" and
    environment variables in the forms given by variants.
    >>> s = "~roland/.local/%XYZ%$XYZ${XYZ}"
    >>> expand_user_vars(s)
    '/home/roland/.local/'
    >>> s = "$HOME/.local/%XYZ%$XYZ${XYZ}"
    >>> expand_user_vars(s)
    '/home/roland/.local/'
    >>> s = "$EDITOR"
    >>> 'EDITOR' not in expand_user_vars(s)
    True
    '''
    s = os.path.expanduser(s)
    # Python 2 does not have KeyError in str(e)
    remx = re.compile(r"(?:KeyError:)?\s*'(\w+)'")
    while True:
        try:
            s = string.Template(s).substitute(os.environ)
            break
        except KeyError as e:
            remxo = remx.match(str(e))
            if not remxo:
                raise  # unexpected message: re-raise instead of looping forever
            g1 = remxo.group(1)
            # remove every spelling of the unset variable, then try again
            for v in variants.split():
                s = s.replace(v % g1, '')
    return s
Alternatively, let the shell do the expansion: echo is available on Linux, macOS and Windows.
In the example below, replace the command string with 'echo ' + s:
import subprocess
netrc_file = subprocess.check_output('echo ${NETRC:-~/.netrc}',shell=True)

Related

Storing the entire workspace output? [duplicate]

I want to save all the variables in my current python environment. It seems one option is to use the 'pickle' module. However, I don't want to do this for 2 reasons:
I have to call pickle.dump() for each variable
When I want to retrieve the variables, I must remember the order in which I saved the variables, and then do a pickle.load() to retrieve each variable.
I am looking for some command which would save the entire session, so that when I load this saved session, all my variables are restored. Is this possible?
Edit: I guess I don't mind calling pickle.dump() for each variable that I would like to save, but remembering the exact order in which the variables were saved seems like a big restriction. I want to avoid that.

If you use shelve, you do not have to remember the order in which the objects are pickled, since shelve gives you a dictionary-like object:
To shelve your work:
import shelve
T='Hiya'
val=[1,2,3]
filename='/tmp/shelve.out'
my_shelf = shelve.open(filename,'n') # 'n' for new
for key in dir():
    try:
        my_shelf[key] = globals()[key]
    except TypeError:
        #
        # __builtins__, my_shelf, and imported modules can not be shelved.
        #
        print('ERROR shelving: {0}'.format(key))
my_shelf.close()
To restore:
my_shelf = shelve.open(filename)
for key in my_shelf:
    globals()[key] = my_shelf[key]
my_shelf.close()
print(T)
# Hiya
print(val)
# [1, 2, 3]

Having sat here and failed to save the globals() as a dictionary, I discovered you can pickle a session using the dill library.
This can be done by using:
import dill #pip install dill --user
filename = 'globalsave.pkl'
dill.dump_session(filename)
# and to load the session again:
dill.load_session(filename)

One very easy way that might satisfy your needs. For me, it did pretty well:
Simply click on this icon in the Variable Explorer (right side of Spyder):

Here is a way to save the Spyder workspace variables using the spyderlib functions:
#%% Load data from .spydata file
from spyderlib.utils.iofuncs import load_dictionary
globals().update(load_dictionary(fpath)[0])
data = load_dictionary(fpath)
#%% Save data to .spydata file
from spyderlib.utils.iofuncs import save_dictionary
def variablesfilter(d):
    from spyderlib.widgets.dicteditorutils import globalsfilter
    from spyderlib.plugins.variableexplorer import VariableExplorer
    from spyderlib.baseconfig import get_conf_path, get_supported_types
    data = globals()
    settings = VariableExplorer.get_settings()
    get_supported_types()
    data = globalsfilter(data,
                         check_all=True,
                         filters=tuple(get_supported_types()['picklable']),
                         exclude_private=settings['exclude_private'],
                         exclude_uppercase=settings['exclude_uppercase'],
                         exclude_capitalized=settings['exclude_capitalized'],
                         exclude_unsupported=settings['exclude_unsupported'],
                         excluded_names=settings['excluded_names']+['settings','In'])
    return data
def saveglobals(filename):
    data = variablesfilter(globals())
    save_dictionary(data, filename)
#%%
savepath = 'test.spydata'
saveglobals(savepath)
Let me know if it works for you.
David B-H

What you're trying to do is hibernate your process. This has been discussed already. The conclusion is that several hard-to-solve problems arise when trying to do so, for example restoring open file descriptors.
It is better to think about a serialization/deserialization subsystem for your program. That is not trivial in many cases, but it is a far better solution in the long run.
That said, I may have exaggerated the problem. You can try to pickle your global variables dict. Use globals() to access the dictionary. Since it is indexed by variable name, you don't have to worry about the order.
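Pickling the globals dict directly tends to fail because it also holds modules and other unpicklable objects. A rough sketch of the idea, assuming it is acceptable to silently skip anything that cannot be pickled; snapshot_namespace and restore_namespace are hypothetical helper names:

```python
import pickle


def snapshot_namespace(namespace):
    """Return a pickle of every picklable, non-underscore entry in namespace."""
    saved = {}
    for name, value in list(namespace.items()):
        if name.startswith('_'):
            continue  # skip __builtins__, __name__ and other private names
        try:
            pickle.dumps(value)
        except Exception:
            continue  # modules, open files, generators, ... are left out
        saved[name] = value
    return pickle.dumps(saved)


def restore_namespace(blob, namespace):
    """Load a snapshot back into namespace, keyed by variable name."""
    namespace.update(pickle.loads(blob))
```

Calling snapshot_namespace(globals()) and later restore_namespace(blob, globals()) restores the variables by name, so the order in which they were saved does not matter. As with any pickle, only load data you trust.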

If you want the accepted answer abstracted to function you can use:
import shelve
def save_workspace(filename, names_of_spaces_to_save, dict_of_values_to_save):
    '''
    filename = location to save workspace.
    names_of_spaces_to_save = use dir() from parent to save all variables in previous scope.
        -dir() = return the list of names in the current local scope
    dict_of_values_to_save = use globals() or locals() to save all variables.
        -globals() = Return a dictionary representing the current global symbol table.
        This is always the dictionary of the current module (inside a function or method,
        this is the module where it is defined, not the module from which it is called).
        -locals() = Update and return a dictionary representing the current local symbol table.
        Free variables are returned by locals() when it is called in function blocks, but not in class blocks.
    Example of globals and dir():
        >>> x = 3 #note variable value and name below
        >>> globals()
        {'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 'x': 3, '__doc__': None, '__package__': None}
        >>> dir()
        ['__builtins__', '__doc__', '__name__', '__package__', 'x']
    '''
    my_shelf = shelve.open(filename, 'n')  # 'n' for new
    for key in names_of_spaces_to_save:
        try:
            my_shelf[key] = dict_of_values_to_save[key]
        except TypeError:
            #
            # __builtins__, my_shelf, and imported modules can not be shelved.
            #
            #print('ERROR shelving: {0}'.format(key))
            pass
    my_shelf.close()
def load_workspace(filename, parent_globals):
    '''
    filename = location to load workspace.
    parent_globals = use globals() to load the workspace saved in filename to current scope.
    '''
    my_shelf = shelve.open(filename)
    for key in my_shelf:
        parent_globals[key] = my_shelf[key]
    my_shelf.close()
an example script of using this:
import my_pkg as mp
x = 3
mp.save_workspace('a', dir(), globals())
to get/load the workspace:
import my_pkg as mp
x=1
mp.load_workspace('a', globals())
print x #print 3 for me
It worked when I ran it. I will admit I don't understand dir() and globals() 100%, so I am not sure whether there is some weird caveat, but so far it seems to work. Comments are welcome :)
After some more research: if you call save_workspace with globals() as I suggested, and the call is made from within a function, it won't save the variables in that function's local scope. For that, pass locals() instead. My guess is that this happens because globals() returns the globals of the module where the function is defined, not of the place it is called from.

You can save it as a text file or a CSV file. People use Spyder, for example, to save variables, but it has a known issue: for specific data types it fails to import them later on.

Executing an import statement string and using the import [duplicate]

How do I execute a string containing Python code in Python?
Do not ever use eval (or exec) on data that could possibly come from outside the program in any form. It is a critical security risk. You allow the author of the data to run arbitrary code on your computer. If you are here because you want to create multiple variables in your Python program following a pattern, you almost certainly have an XY problem. Do not create those variables at all - instead, use a list or dict appropriately.

For statements, use exec(string) (Python 2/3) or exec string (Python 2):
>>> my_code = 'print("hello world")'
>>> exec(my_code)
hello world
When you need the value of an expression, use eval(string):
>>> x = eval("2+2")
>>> x
4
However, the first step should be to ask yourself if you really need to. Executing code should generally be the position of last resort: It's slow, ugly and dangerous if it can contain user-entered code. You should always look at alternatives first, such as higher order functions, to see if these can better meet your needs.
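As an illustration of the higher-order-function alternative, a name received as a string can be looked up in a dict of functions instead of being evaluated as code. This is only a sketch; the OPERATIONS table and apply_operation are made-up names for the example:

```python
import operator

# Whitelist: only the callables listed here can ever be invoked.
OPERATIONS = {
    'add': operator.add,
    'sub': operator.sub,
    'mul': operator.mul,
}


def apply_operation(name, left, right):
    """Look up a function by name and call it, instead of eval()-ing a string."""
    try:
        func = OPERATIONS[name]
    except KeyError:
        raise ValueError('unsupported operation: %r' % (name,))
    return func(left, right)


result = apply_operation('add', 2, 2)  # same result as eval("2+2")
```

Unlike eval, a string that is not a key of the table cannot run anything; it only raises ValueError.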

In the example a string is executed as code using the exec function.
import sys
import StringIO
# create file-like string to capture output
codeOut = StringIO.StringIO()
codeErr = StringIO.StringIO()
code = """
def f(x):
    x = x + 1
    return x
print 'This is my output.'
"""
# capture output and errors
sys.stdout = codeOut
sys.stderr = codeErr
exec code
# restore stdout and stderr
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print f(4)
s = codeErr.getvalue()
print "error:\n%s\n" % s
s = codeOut.getvalue()
print "output:\n%s" % s
codeOut.close()
codeErr.close()

eval and exec are the correct solution, and they can be used in a safer manner.
As discussed in Python's reference manual and clearly explained in this tutorial, the eval and exec functions take two extra parameters that allow a user to specify what global and local functions and variables are available.
For example:
public_variable = 10
private_variable = 2
def public_function():
    return "public information"
def private_function():
    return "super sensitive information"
# make a list of safe functions
safe_list = ['public_variable', 'public_function']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# add any needed builtins back in
safe_dict['len'] = len
>>> eval("public_variable+2", {"__builtins__" : None }, safe_dict)
12
>>> eval("private_variable+2", {"__builtins__" : None }, safe_dict)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
NameError: name 'private_variable' is not defined
>>> exec("print \"'%s' has %i characters\" % (public_function(), len(public_function()))", {"__builtins__" : None}, safe_dict)
'public information' has 18 characters
>>> exec("print \"'%s' has %i characters\" % (private_function(), len(private_function()))", {"__builtins__" : None}, safe_dict)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
NameError: name 'private_function' is not defined
In essence you are defining the namespace in which the code will be executed.

Remember that from version 3 on, exec is a function,
so always use exec(mystring) instead of exec mystring.

Avoid exec and eval
Using exec and eval in Python is highly frowned upon.
There are better alternatives
From the top answer (emphasis mine):
For statements, use exec.
When you need the value of an expression, use eval.
However, the first step should be to ask yourself if you really need to. Executing code should generally be the position of last resort: It's slow, ugly and dangerous if it can contain user-entered code. You should always look at alternatives first, such as higher order functions, to see if these can better meet your needs.
From Alternatives to exec/eval?
set and get values of variables with the names in strings
[while eval] would work, it is generally not advised to use variable names bearing a meaning to the program itself.
Instead, better use a dict.
It is not idiomatic
From http://lucumr.pocoo.org/2011/2/1/exec-in-python/ (emphasis mine)
Python is not PHP
Don't try to circumvent Python idioms because some other language does it differently. Namespaces are in Python for a reason and just because it gives you the tool exec it does not mean you should use that tool.
It is dangerous
From http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html (emphasis mine)
So eval is not safe, even if you remove all the globals and the builtins!
The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there's just one item left off the list, you can attack the system.
So, can eval be made safe? Hard to say. At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...
It is hard to read and understand
From http://stupidpythonideas.blogspot.it/2013/05/why-evalexec-is-bad.html (emphasis mine):
First, exec makes it harder to human beings to read your code. In order to figure out what's happening, I don't just have to read your code, I have to read your code, figure out what string it's going to generate, then read that virtual code. So, if you're working on a team, or publishing open source software, or asking for help somewhere like StackOverflow, you're making it harder for other people to help you. And if there's any chance that you're going to be debugging or expanding on this code 6 months from now, you're making it harder for yourself directly.

eval() is just for expressions, while eval('x+1') works, eval('x=1') won't work for example. In that case, it's better to use exec, or even better: try to find a better solution :)
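The difference can be seen in a short Python 3 sketch. The explicit namespace dict is an assumption added for the example, so that the executed code does not touch the caller's own variables:

```python
namespace = {'x': 1}

# eval() returns the value of an expression.
value = eval('x + 1', namespace)

# An assignment is a statement, so eval() rejects it at compile time.
try:
    eval('x = 5', namespace)
    assignment_failed = False
except SyntaxError:
    assignment_failed = True

# exec() runs statements; the new binding lands in the namespace dict.
exec('x = 5', namespace)
```

After this runs, value is 2, assignment_failed is True and namespace['x'] is 5.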

It's worth mentioning that exec has a sibling called execfile (Python 2 only), for when you want to run a Python file. That is sometimes useful if you are working in a third-party package that ships with a terrible IDE and you want to code outside of that package.
Example:
execfile('/path/to/source.py')
or:
exec(open("/path/to/source.py").read())

You accomplish executing code using exec, as with the following IDLE session:
>>> kw = {}
>>> exec( "ret = 4" ) in kw
>>> kw['ret']
4

As the others mentioned, it's "exec".
But in case your code contains variables, you can use "global" to access them, which also prevents the following error from being raised:
NameError: name 'p_variable' is not defined
exec('p_variable = [1,2,3,4]')
global p_variable
print(p_variable)

I tried quite a few things, but the only thing that worked was the following:
temp_dict = {}
exec("temp_dict['val'] = 10")
print(temp_dict['val'])
output:
10

Use eval.

Check out eval:
x = 1
print eval('x+1')
->2

The most logical solution would be to use the built-in eval() function. Another solution is to write that string to a temporary Python file and execute it.

Ok .. I know this isn't exactly an answer, but possibly a note for people looking at this as I was. I wanted to execute specific code for different users/customers but also wanted to avoid the exec/eval. I initially looked to storing the code in a database for each user and doing the above.
I ended up creating the files on the file system within a 'customer_filters' folder and loading them with the 'imp' module. If no filter applies for that customer, the program just carries on:
import imp
def get_customer_module(customerName='default', name='filter'):
    lm = None
    try:
        module_name = customerName + "_" + name
        m = imp.find_module(module_name, ['customer_filters'])
        lm = imp.load_module(module_name, m[0], m[1], m[2])
    except ImportError:
        # ignore, if no module is found
        pass
    return lm
m = get_customer_module(customerName, "filter")
if m is not None:
    m.apply_address_filter(myobj)
so customerName = "jj"
would execute apply_address_filter from the customer_filters\jj_filter.py file
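The imp module was deprecated in Python 3.4 and removed in Python 3.12. On current Python versions the same idea could be sketched with importlib; load_customer_module and its folder parameter are hypothetical names, and the layout of one <customer>_<name>.py file per customer is assumed from the description above:

```python
import importlib.util
import os


def load_customer_module(customer_name='default', name='filter',
                         folder='customer_filters'):
    """Load <folder>/<customer_name>_<name>.py, or return None if it is absent."""
    module_name = customer_name + '_' + name
    path = os.path.join(folder, module_name + '.py')
    if not os.path.isfile(path):
        return None  # no filter for this customer: the caller just carries on
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module
```

Like the original, this executes whatever code is in the file, so the folder should only contain files you control.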

Pythonically check if a variable name is valid

tldr; see the final line; the rest is just preamble.
I am developing a test harness, which parses user scripts and generates a Python script which it then runs. The idea is for non-techie folks to be able to write high-level test scripts.
I have introduced the idea of variables, so a user can use the LET keyword in his script. E.g. LET X = 42, which I simply expand to X = 42. They can then use X later in their scripts - RELEASE CONNECTION X
But what if someone writes LET 2 = 3? That's going to generate invalid Python.
If I have that X in a variable variableName, then how can I check whether variableName is a valid Python variable?

In Python 3 you can use str.isidentifier() to test whether a given string is a valid Python identifier/name.
>>> 'X'.isidentifier()
True
>>> 'X123'.isidentifier()
True
>>> '2'.isidentifier()
False
>>> 'while'.isidentifier()
True
The last example shows that you should also check whether the variable name clashes with a Python keyword:
>>> from keyword import iskeyword
>>> iskeyword('X')
False
>>> iskeyword('while')
True
So you could put that together in a function:
from keyword import iskeyword
def is_valid_variable_name(name):
    return name.isidentifier() and not iskeyword(name)
Another option, which works in Python 2 and 3, is to use the ast module:
from ast import parse
def is_valid_variable_name(name):
    try:
        parse('{} = None'.format(name))
        return True
    except (SyntaxError, ValueError, TypeError):
        return False
>>> is_valid_variable_name('X')
True
>>> is_valid_variable_name('123')
False
>>> is_valid_variable_name('for')
False
>>> is_valid_variable_name('')
False
>>> is_valid_variable_name(42)
False
This will parse the assignment statement without actually executing it. It will pick up invalid identifiers as well as attempts to assign to a keyword. In the above code None is an arbitrary value to assign to the given name - it could be any valid expression for the RHS.
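One caveat: formatting the name into an assignment also accepts strings that are valid assignment targets without being plain names, such as 'a.b' or 'x = 1; y'. A stricter variant, sketched here on the assumption that only a single bare name should pass, inspects the parsed tree; is_plain_name is a hypothetical helper name:

```python
import ast


def is_plain_name(candidate):
    """True only if candidate is exactly one assignable identifier."""
    if not isinstance(candidate, str):
        return False
    try:
        tree = ast.parse('{} = None'.format(candidate))
    except (SyntaxError, ValueError):
        return False
    # Exactly one statement, and it must be an assignment.
    if len(tree.body) != 1 or not isinstance(tree.body[0], ast.Assign):
        return False
    targets = tree.body[0].targets
    # Exactly one target, a bare name spelled exactly like the input.
    return (len(targets) == 1
            and isinstance(targets[0], ast.Name)
            and targets[0].id == candidate)
```

Comparing the parsed name with the input also rejects inputs with stray whitespace or parentheses around an otherwise valid name.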

EDIT: this is wrong and implementation dependent - see comments.
Just have Python do its own check by making a dictionary with the variable holding the name as the key and splatting it as keyword arguments:
def _dummy_function(**kwargs):
    pass
def is_valid_variable_name(name):
    try:
        _dummy_function(**{name: None})
        return True
    except TypeError:
        return False
Notably, TypeError is consistently raised whenever a dict splats into keyword arguments but has a key which isn't a valid function argument, and whenever a dict literal is being constructed with an invalid key, so this will work correctly on anything you pass to it.

I don't think you need the exact same naming syntax as python itself.
Would rather go for a simple regexp like:
\w+
to make sure it's something alphanumeric, and then add a prefix to keep away from python's own syntax. So the non-techie user's declaration:
LET return = 12
should probably become after your parsing:
userspace_return = 12
or
userspace['return'] = 12
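The dictionary variant could look roughly like this. It is a sketch only: the let helper and the userspace dict are illustrative names, and \w+ is taken from the suggestion above as the rule for user-visible names:

```python
import re

# User-visible names only need to be "word" characters; because they are
# stored as dict keys, they never become Python identifiers.
_USER_NAME = re.compile(r'\w+\Z')


def let(userspace, name, value):
    """Handle `LET name = value` by storing the value under a string key."""
    if not _USER_NAME.match(name):
        raise ValueError('invalid variable name: %r' % (name,))
    userspace[name] = value


userspace = {}
let(userspace, 'return', 12)  # a Python keyword is harmless as a dict key
```

With this design the generated script refers to userspace['X'] wherever the user wrote X, so a clash with Python's own syntax cannot occur.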

In Python 3, as above, you can simply use str.isidentifier. But in Python 2, this does not exist.
The tokenize module has a regex for names (identifiers): tokenize.Name. But I couldn't find any documentation for it, so it may not be available everywhere. It is simply r'[a-zA-Z_]\w*'. A single $ after it will let you test strings with re.match.
The docs say that an identifier is defined by this grammar:
identifier ::= (letter|"_") (letter | digit | "_")*
letter ::= lowercase | uppercase
lowercase ::= "a"..."z"
uppercase ::= "A"..."Z"
digit ::= "0"..."9"
Which is equivalent to the regex above. But we should still import tokenize.Name in case this ever changes. (Which is very unlikely, but maybe in older versions of Python it was different?)
And to filter out keywords, like pass, def and return, use keyword.iskeyword. There is one caveat: None is not a keyword in Python 2, but still can't be assigned to. (keyword.iskeyword('None') in Python 2 is False).
So:
import keyword
if hasattr(str, 'isidentifier'):
    _isidentifier = str.isidentifier
else:
    import re
    _fallback_pattern = '[a-zA-Z_][a-zA-Z0-9_]*'
    try:
        import tokenize
    except ImportError:
        _isidentifier = re.compile(_fallback_pattern + '$').match
    else:
        _isidentifier = re.compile(
            getattr(tokenize, 'Name', _fallback_pattern) + '$'
        ).match
    del _fallback_pattern
def isname(s):
    return bool(_isidentifier(s)) and not keyword.iskeyword(s) and s != 'None'

You could try a test assignment and see if it raises a SyntaxError:
>>> 2fg = 5
File "<stdin>", line 1
2fg = 5
^
SyntaxError: invalid syntax

You could use exception handling and catch NameError and SyntaxError. Test the name inside a try/except block and inform the user if the input is invalid.

How to extract and then refer to variables defined in a python module?

I'm trying to build a simple environment check script for my firm's test environment. My goal is to be able to ping each of the hosts defined for a given test environment instance. The hosts are defined in a file like this:
#!/usr/bin/env python
host_ip = '192.168.100.10'
router_ip = '192.168.100.254'
fs_ip = '192.168.200.10'
How can I obtain all of these values in a way that is iterable (i.e. I need to loop through and ping each ip address)?
I've looked at locals() and vars(), but trying to do something like this:
for key, value in vars():
    print key, value
generates this error:
ValueError: too many values to unpack
I have been able to extract the names of all variables by checking dir(local_variables) for values that don't contain a '__' string, but then I have a list of strings, and I can't figure out how to get from the string to the value of the same-named variable.

First off, I strongly recommend not doing it that way. Instead, do:
hosts = {
"host_ip": '192.168.100.10',
"router_ip": '192.168.100.254',
"fs_ip": '192.168.200.10',
}
Then you can simply import the module and reference it normally--this gives an ordinary, standard way to access this data from any Python code:
import config
for host, ip in config.hosts.items():
    ...
If you do access variables directly, you're going to get a bunch of stuff you don't want: the builtins (__builtins__, __package__, etc); anything that was imported while setting up the other variables, etc.
You'll also want to make sure that the context you're running in is different from the one whose variables you're iterating over, or you'll be creating new variables in locals() (or vars(), or globals()) while you're iterating over it, and you'll get "RuntimeError: dictionary changed size during iteration".

You need to do vars().iteritems(). (or .items())
Looping over a dictionary like vars() will only extract the keys in the dictionary.
Example.
>>> for key in vars(): print key
...
__builtins__
__name__
__doc__
key
__package__
>>> for key, value in vars().items(): print key, value
...
__builtins__ <module '__builtin__' (built-in)>
value None
__package__ None
key __doc__
__name__ __main__
__doc__ None

Glenn Maynard makes a good point, using a dictionary is simpler and more standard. That being said, here are a couple of tricks I've used sometimes:
In the file hosts.py:
#!/usr/bin/env python
host_ip = '192.168.100.10'
router_ip = '192.168.100.254'
fs_ip = '192.168.200.10'
and in another file:
hosts_dict = {}
execfile('hosts.py', hosts_dict)
or
import hosts
hosts_dict = hosts.__dict__
But again, those are both rather hackish.
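execfile only exists in Python 2. On Python 3 a comparable effect can be sketched with the standard runpy module; load_module_variables is a hypothetical helper name, and filtering out underscore-prefixed names is an assumption about what counts as a setting:

```python
import runpy


def load_module_variables(path):
    """Run the file at path and return its public top-level variables."""
    namespace = runpy.run_path(path)  # executes the file, returns its globals
    return {name: value
            for name, value in namespace.items()
            if not name.startswith('_')}
```

For the hosts.py shown above this would give a dict with the three *_ip entries, ready to loop over. Like execfile, it executes the file, so it is only suitable for files you trust.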

If this really is all that your imported file contains, you could also read it as just a text file, and then parse the input lines using basic string methods.
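host_addr_map = {}
with open("iplist.py") as iplistfile:
    for line in iplistfile:
        line = line.strip()
        # skip blank lines and comments (including the shebang line)
        if not line or line[0] == '#':
            continue
        host, ipaddr = map(str.strip, line.split('=', 1))
        host_addr_map[host] = ipaddr.strip('\'"')  # drop the surrounding quotes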
Now you have a dict of your ip addresses, addressable by host name. To get them all, just use basic dict-style methods:
for hostname in host_addr_map:
    print hostname, host_addr_map[hostname]
print host_addr_map.keys()
This also has the advantage that it removes any temptation any misguided person might have to add more elaborate Python logic to what you thought was just a configuration file.

Here's a hack I use all the time. You got really close when you returned the list of string names, but you have to use the eval() function to return the actual object that bears the name represented by the string:
hosts = [eval('local_variables.' + x) for x in dir(local_variables) if '_ip' in x]
If I'm not mistaken, this method also doesn't have the drawbacks of locals() and vars() explained by Glenn Maynard.
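The same lookup from a name string to its value can be done without eval by using the built-in getattr. A small sketch, assuming the settings live in an imported module and that every wanted name ends in _ip; collect_ip_values is a made-up helper name:

```python
def collect_ip_values(module, suffix='_ip'):
    """Map each attribute name ending in suffix to its value on module."""
    return {name: getattr(module, name)
            for name in dir(module)
            if name.endswith(suffix)}
```

With the question's file imported as local_variables, collect_ip_values(local_variables).items() yields the (name, address) pairs to ping.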

dynamic if boolean with module lookup

I'm relatively new to python, and I'm trying to figure out an issue, where I'm trying to write a generalised function.
In this scenario, I have a settings_file.py file alongside my app.py program. The settings_file.py is optional.
Content of settings_file.py:
myvariable = "foo"
Earlier in my program I (optionally) import the settings file:
try:
    import settings_file
except ImportError:
    pass
Naive version:
if settings_file.myvariable:
    print "myvariable found from settings_file!"
    return settings_file.myvariable
What I'm trying to do:
element_name = "myvariable"
if settings_file.eval(element_name):
    print "{} found from settings_file!".format(element_name)
    return settings_file.element_name
I think what I'm struggling with is the if statement line, more specifically the settings_file.eval(element_name) part.
I'm quite new to this, and my understanding was that Python would resolve element_name to the string it holds for me. Maybe I have not written it correctly.
I can't get it to resolve the variable name.
Edit: Perhaps I've not been clear, or misleading. I'm trying to return the value in the settings file back from a generalised function that I'm writing this element within. My desired outcome is that "foo" will be returned, which is the value of variable "myvariable" in settings_file.py.
Edit2: I think I worked out something based on response from #9769953
if getattr(settings_file, element_name, None):
    print "{} found from settings_file!".format(element_name)
    return getattr(settings_file, element_name, None)

If you're evaluating variable names, or otherwise want flexible variable names, 9 out of 10 times, you should use dicts.
Here, there is (even) more going on: you should first check that settings_file actually exists before trying to get an attribute from it. Then, you could use getattr(settings_file, element_name):
try:
    import settings_file
except ImportError:
    settings_file = None
...
element_name = "myvariable"
return getattr(settings_file, element_name, None)
which returns None in case "myvariable" was not found, or the import failed. It doesn't explicitly test for settings_file not to be None, since getattr() will then just return None whatever the value of element_name.
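Wrapped into the kind of generalised function the question describes, this might look like the following sketch. The name get_setting and its module_name and default parameters are hypothetical; importlib.import_module is used so that the module name can itself be a string:

```python
import importlib


def get_setting(element_name, module_name='settings_file', default=None):
    """Return the named attribute of an optional settings module, else default."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return default  # the settings file is optional
    return getattr(module, element_name, default)
```

With the settings_file.py from the question on the import path, get_setting("myvariable") would return "foo"; without the file, or without that variable, it returns None.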
