KeyError when formatting locals

I'm facing a KeyError I can't explain or understand.
I have a notebook, in which I define a variable PREFIX in a cell:
PREFIX = "/home/mavax/Documents/info/notebook/log_study"
which is simply a path to a folder containing logs, so people using the notebook just need to change the path if they want to execute the code below.
Then, later (quite a bunch of cells beneath), I use it, without any problem:
for basename in ["log_converted_full.txt", "log_converted_trimmed.txt"]:
    entries = load_log_for_insertion("%(PREFIX)s/datasets/logs/%(basename)s" % locals())
    pprint(entries)
I then get the output I expect, meaning files are found and the (very long) output from the logs is being printed.
I have some more cells describing the structure I implement for this problem, and when the time comes to execute again the same piece of code, I get the KeyError:
Code bringing the error:
def demo_synthetic_dig_dag(data_size):
    for basename in ["alert_converted_trimmed.txt"]:
        ###
        entries = load_log_for_insertion("%(PREFIX)s/datasets/logs/%(basename)s" % locals())[:data_size]
        g = AugmentedDigDag()
        g.build(entries)
        html(
            """
            <table>
            <tr><td>%s</td></tr>
            </table>
            """ % (
                synthetic_graph_to_html(g, 2, 0.03)
            )
        )
and, in the next cell:
demo_synthetic_dig_dag(200)
Jupyter output:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-179-7c2a79d0afd6> in <module>()
----> 1 demo_synthetic_dig_dag_armen(200)
<ipython-input-178-d17f57de3c01> in demo_synthetic_dig_dag(data_size)
18 for basename in ["log_converted_trimmed.txt"]:
19 ###
---> 20 entries = load_log_for_insertion("%(PREFIX)s/datasets/logs/%(basename)s" % locals())[:data_size]
21 g = AugmentedDigDag()
22 g.build(entries)
KeyError: 'PREFIX'
I'm pretty sure the mistake is quite simple and plain stupid, but still, if someone could open my eyes, I'd be very thankful!

At module level (outside any function), locals() returns the same dictionary as globals(), so the lookup works.
Inside a function, though, locals() contains only the local names of that function. PREFIX is not among them, because it is stored in globals(). That's why the formatting fails: it tries to look up a key named PREFIX in the dictionary returned by locals().
Instead of formatting with %, why not just use .format:
"{}/datasets/logs/{}".format(PREFIX, basename)
Alternatively, you could bring PREFIX in the local scope with an additional parameter to your function:
def demo_synthetic_dig_dag(data_size, PREFIX=PREFIX):
but I don't see much of an upside to that. (Yes, there is a small performance boost for local look-up but I doubt it would play a role)
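To make the scoping point concrete, here is a minimal sketch. The PREFIX value and the helper names local_keys and build_path are made up for illustration. It shows that locals() inside a function holds only that function's names, and one possible way to keep the %(name)s style by merging both namespaces:

```python
PREFIX = "/tmp/logs"  # hypothetical path, for illustration only


def local_keys():
    basename = "a.txt"
    # Inside a function, locals() holds only the function's own names.
    return sorted(locals())


def build_path(basename):
    # Merge globals and locals so that both PREFIX and basename resolve.
    names = dict(globals(), **locals())
    return "%(PREFIX)s/datasets/logs/%(basename)s" % names


keys = local_keys()
path = build_path("a.txt")
```

Passing PREFIX and basename explicitly to .format is still simpler; merging namespaces is shown only to illustrate where each name lives.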

Related

Executing an import statement string and using the import [duplicate]

How do I execute a string containing Python code in Python?
Do not ever use eval (or exec) on data that could possibly come from outside the program in any form. It is a critical security risk. You allow the author of the data to run arbitrary code on your computer. If you are here because you want to create multiple variables in your Python program following a pattern, you almost certainly have an XY problem. Do not create those variables at all - instead, use a list or dict appropriately.

For statements, use exec(string) (Python 2/3) or exec string (Python 2):
>>> my_code = 'print("hello world")'
>>> exec(my_code)
hello world
When you need the value of an expression, use eval(string):
>>> x = eval("2+2")
>>> x
4
However, the first step should be to ask yourself if you really need to. Executing code should generally be the position of last resort: It's slow, ugly and dangerous if it can contain user-entered code. You should always look at alternatives first, such as higher order functions, to see if these can better meet your needs.

In the example below (Python 2 syntax), a string is executed as code using the exec statement, and its output is captured.
import sys
import StringIO
# create file-like string to capture output
codeOut = StringIO.StringIO()
codeErr = StringIO.StringIO()
code = """
def f(x):
    x = x + 1
    return x
print 'This is my output.'
"""
# capture output and errors
sys.stdout = codeOut
sys.stderr = codeErr
exec code
# restore stdout and stderr
sys.stdout = sys.__stdout__
sys.stderr = sys.__stderr__
print f(4)
s = codeErr.getvalue()
print "error:\n%s\n" % s
s = codeOut.getvalue()
print "output:\n%s" % s
codeOut.close()
codeErr.close()

eval and exec are the correct solution, and they can be used in a safer manner.
As discussed in Python's reference manual and clearly explained in this tutorial, the eval and exec functions take two extra parameters that allow a user to specify what global and local functions and variables are available.
For example:
public_variable = 10
private_variable = 2
def public_function():
    return "public information"
def private_function():
    return "super sensitive information"
# make a list of safe functions
safe_list = ['public_variable', 'public_function']
safe_dict = dict([ (k, locals().get(k, None)) for k in safe_list ])
# add any needed builtins back in
safe_dict['len'] = len
>>> eval("public_variable+2", {"__builtins__" : None }, safe_dict)
12
>>> eval("private_variable+2", {"__builtins__" : None }, safe_dict)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
NameError: name 'private_variable' is not defined
>>> exec("print \"'%s' has %i characters\" % (public_function(), len(public_function()))", {"__builtins__" : None}, safe_dict)
'public information' has 18 characters
>>> exec("print \"'%s' has %i characters\" % (private_function(), len(private_function()))", {"__builtins__" : None}, safe_dict)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
NameError: name 'private_function' is not defined
In essence you are defining the namespace in which the code will be executed.
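As a rough Python 3 sketch of the same idea (restricted_eval and safe_names are names invented here, not part of any library), the namespace can be passed explicitly so the expression sees only what is listed. This limits name lookup; it should not be treated as a security sandbox.

```python
public_variable = 10
private_variable = 2

# Only these names are made visible to the evaluated expression.
safe_names = {"public_variable": public_variable, "len": len}


def restricted_eval(expression, names):
    # An empty __builtins__ dict hides the usual built-ins from the expression.
    return eval(expression, {"__builtins__": {}}, dict(names))


result = restricted_eval("public_variable + 2", safe_names)

try:
    restricted_eval("private_variable + 2", safe_names)
    leaked = True
except NameError:
    # private_variable is not in the supplied namespace.
    leaked = False
```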

Remember that from Python 3 onward exec is a function,
so always use exec(mystring) instead of exec mystring.

Avoid exec and eval
Using exec and eval in Python is highly frowned upon.
There are better alternatives
From the top answer (emphasis mine):
For statements, use exec.
When you need the value of an expression, use eval.
However, the first step should be to ask yourself if you really need to. Executing code should generally be the position of last resort: It's slow, ugly and dangerous if it can contain user-entered code. You should always look at alternatives first, such as higher order functions, to see if these can better meet your needs.
From Alternatives to exec/eval?
set and get values of variables with the names in strings
[while eval] would work, it is generally not advised to use variable names bearing a meaning to the program itself.
Instead, better use a dict.
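A small sketch of that advice, with made-up keys: rather than creating variables whose names are built from strings, the values go into a dict and are looked up by key.

```python
# Values that might otherwise become item_0, item_1, item_2 via exec.
values = {}
for index in range(3):
    values["item_%d" % index] = index * 10

# The "variable name" is now just a key that can come from a string.
name = "item_2"
looked_up = values[name]
```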
It is not idiomatic
From http://lucumr.pocoo.org/2011/2/1/exec-in-python/ (emphasis mine)
Python is not PHP
Don't try to circumvent Python idioms because some other language does it differently. Namespaces are in Python for a reason and just because it gives you the tool exec it does not mean you should use that tool.
It is dangerous
From http://nedbatchelder.com/blog/201206/eval_really_is_dangerous.html (emphasis mine)
So eval is not safe, even if you remove all the globals and the builtins!
The problem with all of these attempts to protect eval() is that they are blacklists. They explicitly remove things that could be dangerous. That is a losing battle because if there's just one item left off the list, you can attack the system.
So, can eval be made safe? Hard to say. At this point, my best guess is that you can't do any harm if you can't use any double underscores, so maybe if you exclude any string with double underscores you are safe. Maybe...
It is hard to read and understand
From http://stupidpythonideas.blogspot.it/2013/05/why-evalexec-is-bad.html (emphasis mine):
First, exec makes it harder to human beings to read your code. In order to figure out what's happening, I don't just have to read your code, I have to read your code, figure out what string it's going to generate, then read that virtual code. So, if you're working on a team, or publishing open source software, or asking for help somewhere like StackOverflow, you're making it harder for other people to help you. And if there's any chance that you're going to be debugging or expanding on this code 6 months from now, you're making it harder for yourself directly.

eval() is just for expressions: eval('x+1') works, but eval('x=1') does not, for example. In that case, use exec, or even better, try to find a different solution :)
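The difference can be checked with a short sketch (the namespace dict is just an illustration): eval accepts an expression, rejects an assignment with a SyntaxError, and exec runs the assignment.

```python
namespace = {"x": 1}

# An expression evaluates to a value.
value = eval("x + 1", namespace)

# An assignment is a statement, so eval refuses to compile it.
try:
    eval("x = 5", namespace)
    assignment_allowed = True
except SyntaxError:
    assignment_allowed = False

# exec runs statements; the new binding lands in the supplied namespace.
exec("x = 5", namespace)
```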

It's worth mentioning that exec has a sibling called execfile (Python 2 only; it was removed in Python 3) for running a Python file. That is sometimes useful if you are working in a third-party package that bundles a terrible IDE and you want to write your code outside of that package.
Example:
execfile('/path/to/source.py')
or:
exec(open("/path/to/source.py").read())

You can execute code using exec and collect the names it defines in a dictionary, as in the following IDLE session:
>>> kw = {}
>>> exec("ret = 4", kw)
>>> kw['ret']
4

As the others mentioned, it's "exec".
But if the executed code defines variables, you can declare them with "global" to access them afterwards, which also prevents the following error:
NameError: name 'p_variable' is not defined
exec('p_variable = [1,2,3,4]')
global p_variable
print(p_variable)

I tried quite a few things, but the only thing that worked was the following:
temp_dict = {}
exec("temp_dict['val'] = 10")
print(temp_dict['val'])
output:
10

Use eval.

Check out eval:
x = 1
print eval('x+1')
->2

The most logical solution would be to use the built-in eval() function. Another solution is to write that string to a temporary Python file and execute it.

Ok .. I know this isn't exactly an answer, but possibly a note for people looking at this as I was. I wanted to execute specific code for different users/customers but also wanted to avoid the exec/eval. I initially looked to storing the code in a database for each user and doing the above.
I ended up creating the files on the file system within a 'customer_filters' folder and using the 'imp' module. If no filter applied for that customer, it just carried on:
import imp
def get_customer_module(customerName='default', name='filter'):
    lm = None
    try:
        module_name = customerName + "_" + name
        m = imp.find_module(module_name, ['customer_filters'])
        lm = imp.load_module(module_name, m[0], m[1], m[2])
    except ImportError:
        # ignore, if no module is found
        pass
    return lm
m = get_customer_module(customerName, "filter")
if m is not None:
    m.apply_address_filter(myobj)
so customerName = "jj"
would execute apply_address_filter from the customer_filters\jj_filter.py file
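The imp module is deprecated and was removed in Python 3.12. A comparable sketch with importlib might look like the following. The helper name load_optional_module, the temporary folder, and the contents of the jj_filter module are all invented for the demonstration.

```python
import importlib.util
import os
import tempfile


def load_optional_module(module_name, folder):
    """Return the module stored as <folder>/<module_name>.py, or None."""
    path = os.path.join(folder, module_name + ".py")
    if not os.path.isfile(path):
        return None
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Demonstration with a throwaway folder and a hypothetical "jj_filter" module.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "jj_filter.py"), "w") as handle:
        handle.write("def apply_address_filter(obj):\n    return obj.upper()\n")
    found = load_optional_module("jj_filter", folder)
    missing = load_optional_module("zz_filter", folder)
    filtered = found.apply_address_filter("main street")
```

A customer without a filter file simply yields None, which matches the "just carry on" behaviour described above.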

Get variable name from NameError object

If I catch a NameError exception using except:
try:
    print(unknownVar)
except NameError as ne:
    print(ne)
I get a string like:
name 'unknownVar' is not defined
I work in the context of eval'ed expressions, and it would be useful to me if I could obtain only the variable name (here "unknownVar" alone) and not the full string. I did not find an attribute in the NameError object that provides it (perhaps one exists, but I did not find it). Is there something better than parsing this string to get the information I need?
Best Regards
Mikhaël

You can extract it using regex:
import re
try:
    print(unknownVar)
except NameError as ne:
    var_name = re.findall(r"'([^']*)'", str(ne))[0]
    print(var_name) # output: unknownVar

Extract it from the string:
ne.args[0].split()[1].strip("'")
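On Python 3.10 and later, NameError instances carry a name attribute, which avoids parsing the message. The sketch below (missing_name is a made-up helper) uses it when present and falls back to the regex approach otherwise:

```python
import re


def missing_name(error):
    # NameError.name exists on Python 3.10+; older versions fall back to the message.
    name = getattr(error, "name", None)
    if name is None:
        name = re.findall(r"'([^']*)'", str(error))[0]
    return name


try:
    print(unknownVar)
except NameError as ne:
    var_name = missing_name(ne)
```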

Unfortunately, error messages are not exactly Python's strong suit. However, there is actually an alternative to parsing the string, but it is quite "hacky" and only works with CPython (i.e. this will fail with PyPy, Jython, etc.).
The idea is to extract the name of whatever you wanted to load from the underlying code object.
import sys
import opcode
def extract_name():
    tb = sys.exc_info()[2] # get the traceback
    while tb.tb_next is not None:
        tb = tb.tb_next
    instr_pos = tb.tb_lasti # the index of the "current" instruction
    frame = tb.tb_frame
    code = frame.f_code # the code object
    instruction = opcode.opname[code.co_code[instr_pos]]
    arg = code.co_code[instr_pos + 1]
    if instruction == 'LOAD_FAST':
        return code.co_varnames[arg]
    else:
        return code.co_names[arg]
def test(s):
    try:
        exec(s)
    except NameError:
        name = extract_name()
        print(name)
test("print(x + y)")
1. The Background of Code Object
Python compiles the original Python source code into bytecode and then executes that bytecode. The code is stored in "code objects", which are (partly) documented here. For our purpose, the following will suffice:
class CodeObject:
    co_code: bytes # the bytecode instructions
    co_varnames: tuple # names of local variables and parameters
    co_names: tuple # all other names
If some code produces a NameError, it failed to load a specific name. That name must be either in the co_names or co_varnames tuple. All we have to figure out is which one.
While the code objects describe the code statically, we also need a dynamic object that tells us the value of local variables and which instruction we are currently executing. This role is fulfilled by the "frame" (leaving out irrelevant details):
class Frame:
    f_code: CodeObject # the code object (see above)
    f_lasti: int # the instruction currently executed
You could think of the interpreter as basically doing the following:
def runCode(code):
    frame = create_new_frame(code)
    while True:
        i = frame.f_lasti
        opcode = frame.f_code.co_code[i]
        arg = frame.f_code.co_code[i+1]
        exec_opcode(opcode, arg)
        frame.f_lasti += 2
The code to load a name then has a form like this:
LOAD_NAME 3 (the actual name is co_names[3])
LOAD_GLOBAL 3 (the actual name is co_names[3])
LOAD_FAST 3 (the actual name is co_varnames[3])
You can see that we have to distinguish between LOAD_FAST (i.e. load a local variable) and all other LOAD_X opcodes.
2. Getting The Right Name
When an error occurs, we need to go through the stacktrace/traceback until we find the frame in which the error occurred. From the frame we then get the code object with the list of all names and instructions, extract the instruction and argument that led to the error and thus the name.
We retrieve the traceback with sys.exc_info()[2]. The actual frame and traceback we are interested in is the very last one (this is what you can read in the line Traceback (most recent call last): whenever a runtime error occurs):
tb = sys.exc_info()[2] # get the traceback
while tb.tb_next is not None:
    tb = tb.tb_next
This traceback object then contains two pieces of information that matter to us: the frame tb_frame and the instruction pointer tb_lasti where the error occurred. From the frame we then extract the code object:
instr_pos = tb.tb_lasti # the index of the "current" instruction
frame = tb.tb_frame
code = frame.f_code # the code object
Since the byte encoding the instruction can change with different Python versions, we want to get the human-readable form, which is more stable. We need that so that we can distinguish between local variables and all others:
instruction = opcode.opname[code.co_code[instr_pos]]
arg = code.co_code[instr_pos + 1]
if instruction == 'LOAD_FAST':
    return code.co_varnames[arg]
else:
    return code.co_names[arg]
3. Caveat
If the code object uses more than 255 names, a single byte will no longer be enough as index into the tuples with all names. In that case, the bytecode allows for an extension prefix, which is not taken into account here. But for most code objects, this should work just fine.
As mentioned in the beginning, this is a rather hacky method that is based on internals of Python that might change (although this is rather unlikely). Nonetheless, it is fun taking Python apart this way, isn't it ;-).

How to call variables from two different python modules(bi-directional)

I am stuck on a problem in Python. I have to pass a variable's value from one module (python_code1.py) to a different module (python_code2.py). Based on that value, I need to do some calculation in python_code2.py and then capture the output value back in the first module (python_code1.py) for further calculations.
Below is a sketch of my code logic:
python_code2.py
import python_code1
data = python_code1.json_data
'''
Lines of code
'''
output = "some variable attribues"
python_code1.py
import python_code2
json_data = {"val1": "abc3","val1": "abc3","val1": "abc3"}
input_data = python_code2.output
''''
Lines of code using input_data variable
'''''
When I execute python python_code1.py, it gives this error:
AttributeError: module 'python_code2' has no attribute 'output'
I feel like I am not doing it the right way, but considering my code's complexity and size, I have to use this two-module approach.

Putting your code at the top level is fine for quick throwaway scripts, but that's about all. The proper way to organize your code for anything non-trivial is to define functions, so you can pass values as arguments and get results as return values.
If there's only one script using those functions, you can keep them in the script itself.
If you have multiple scripts needing to use the same functions, move those functions to a module and import this module from your scripts.
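The error in the question most likely comes from the circular import: each module starts importing the other before either has finished running, so python_code2.output does not exist yet when it is read. A minimal single-file sketch of the function-based layout follows; compute_output and main are hypothetical names, and the upper-casing merely stands in for the real calculation.

```python
def compute_output(json_data):
    # Stand-in for the calculation that lived in python_code2.py.
    return {key: value.upper() for key, value in json_data.items()}


def main():
    json_data = {"val1": "abc1", "val2": "abc2", "val3": "abc3"}
    # The value goes in as an argument and comes back as a return value,
    # so neither side needs to import the other's top-level variables.
    input_data = compute_output(json_data)
    return input_data


result = main()
```

If compute_output later moves into its own module, only the calling script imports it; the module with the function never imports the script.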

executing python code from string loaded into a module

I found the following code snippet that I can't seem to make work for my scenario (or any scenario at all):
def load(code):
    # Delete all local variables
    globals()['code'] = code
    del locals()['code']
    # Run the code
    exec(globals()['code'])
    # Delete any global variables we've added
    del globals()['load']
    del globals()['code']
    # Copy k so we can use it
    if 'k' in locals():
        globals()['k'] = locals()['k']
        del locals()['k']
    # Copy the rest of the variables
    for k in locals().keys():
        globals()[k] = locals()[k]
I created a file called "dynamic_module" and put this code in it, which I then used to try to execute the following code which is a placeholder for some dynamically created string I would like to execute.
import random
import datetime
class MyClass(object):
    def main(self, a, b):
        r = random.Random(datetime.datetime.now().microsecond)
        a = r.randint(a, b)
        return a
Then I tried executing the following:
import dynamic_module
dynamic_module.load(code_string)
return_value = dynamic_module.MyClass().main(1,100)
When this runs it should return a random number between 1 and 100. However, I can't seem to get the initial snippet I found to work for even the simplest of code strings. I think part of my confusion in doing this is that I may misunderstand how globals and locals work and therefore how to properly fix the problems I'm encountering. I need the code string to use its own imports and variables and not have access to the ones where it is being run from, which is the reason I am going through this somewhat over-complicated method.

You should not be using the code you found. It has several big problems, not least that most of it doesn't actually do anything: locals() is a proxy, so deleting from it has no effect on the actual locals, and any code you execute is put in the same shared globals.
Use the accepted answer in that post instead; recast as a function that becomes:
import imp
def load_module_from_string(code, name='dynamic_module'):
    module = imp.new_module(name)
    exec(code, module.__dict__)
    return module
then just use that:
dynamic_module = load_module_from_string(code_string)
return_value = dynamic_module.MyClass().main(1, 100)
The function produces a new, clean module object.
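Since the imp module is deprecated and was removed in Python 3.12, the same idea can be sketched with types.ModuleType. The code_string below is a shortened stand-in for the dynamically generated source in the question.

```python
import types


def load_module_from_string(code, name="dynamic_module"):
    # Create an empty module object and run the code inside its namespace.
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module


code_string = """
import random

class MyClass(object):
    def main(self, a, b):
        return random.randint(a, b)
"""

dynamic_module = load_module_from_string(code_string)
return_value = dynamic_module.MyClass().main(1, 100)
```

The imports inside code_string end up in the new module's namespace, not in the caller's.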

In general, this is not how you should dynamically import and use external modules. You should be using __import__ within your function to do this. Here's a simple example that worked for me:
import numpy as np
plt = __import__('matplotlib.pyplot', fromlist = ['plt'])
plt.plot(np.arange(5), np.arange(5))
plt.show()
I imagine that for your specific application (loading from code string) it would be much easier to save the dynamically generated code string to a file (in a folder containing an __init__.py file) and then to call it using __import__. Then you could access all variables and functions of the code as parts of the imported module.
Unless I'm missing something?

Calling types via their name as a string in Python

I'm aware of using globals(), locals() and getattr to reference things in Python by string (as in this question), but unless I'm missing something obvious, I can't seem to use this for calling types.
e.g.:
In [12]: locals()['int']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
e:\downloads_to_access\<ipython console> in <module>()
KeyError: 'int'
In [13]: globals()['int']
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
e:\downloads_to_access\<ipython console> in <module>()
KeyError: 'int'
getattr(???, 'int')...
What's the best way of doing this?

There are locals, globals, and then builtins.
Perhaps you are looking for the builtin:
import __builtin__
getattr(__builtin__,'int')
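The snippet above is Python 2; in Python 3 the module is named builtins. A minimal equivalent, with the string "15" chosen only as sample input:

```python
import builtins

# Look up the built-in type by its name, then call it like any other type.
int_type = getattr(builtins, "int")
value = int_type("15")
```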

You've already gotten a solution using builtins, but another worthwhile technique to hold in your toolbag is a dispatch table. If your CSV is designed to be used by multiple applications written in multiple languages, it might look like this:
Integer,15
String,34
Float,1.0
Integer,8
In such a case you might want something like this, where csv is a list of tuples containing the data above:
mapping = {
    'Integer': int,
    'String': str,
    'Float': float,
    'Unicode': unicode
}
def convert_rows(csv):
    results = []
    for row in csv:
        datatype = row[0]
        val_string = row[1]
        results.append(mapping[datatype](val_string))
    return results
That gives you the flexibility of allowing arbitrary strings to map to useful types. You don't have to massage your data to give you the exact values python expects.

getattr(__builtins__,'int')

The issue here is that int is part of the __builtins__ module, not just part of the global namespace. You can get a built-in type, such as int, using the following bit of code:
int_gen = getattr(globals()["__builtins__"], "int")
i = int_gen(4)
# >>> i = 4
Similarly, you can access any other (imported) module by passing the module's name as a string index to globals(), and then using getattr to extract the desired attributes.

Comments suggest that you are unhappy with the idea of using eval to generate data. looking for a function in __builtins__ allows you to find eval.
the most basic solution given looks like this:
import __builtin__
def parseInput(typename, value):
    return getattr(__builtins__,typename)(value)
You would use it like so:
>>> parseInput("int", "123")
123
cool. works pretty ok. how about this one though?
>>> parseInput("eval", 'eval(compile("print \'Code injection?\'","","single"))')
Code injection?
does this do what you expect? Unless you explicitly want this, you need to do something to prevent untrustworthy inputs from poking about in your namespace. I'd strongly recommend a simple whitelist, gracefully raising some sort of exception in the case of invalid input, like so:
import __builtin__
def parseInput(typename, value):
    return {"int":int, "float":float, "str":str}[typename](value)
but if you just can't bear that, you can still add just a bit of armor by verifying that the requested function is actually a type:
import __builtin__
def parseInput(typename, value):
    typector = getattr(__builtins__,typename)
    if type(typector) is type:
        return typector(value)
    else:
        return None

If you have a string that is the name of a thing, and you want the thing, you can also use:
thing = 'int'
eval(thing)
Keep in mind though, that this is very powerful, and you need to understand what thing might contain, and where it came from. For example, if you accept user input as thing, a malicious user could do unlimited damage to your machine with this code.
