Load python source code into function object - python

I need to create a unit-test like framework for light-weight auto grading of single-function python assignments. I'd love to do something like the following:
test = """
def foo(x):
    return x * x
"""
compiled_module = magic_compile(test)
result = compiled_module.foo(3)
assert result == 9
Is there such a magic_compile function? The best I have so far is
exec(test)
result = getattr(sys.modules[__name__], 'foo')(3)
But this seems dangerous and unstable. This won't be run in production, so I'm not super-concerned about sandboxing and safety. Just wanted to make sure there wasn't a better solution I'm missing.

In Python 3.x:
module = {}
exec(test, module)
assert module['foo'](3) == 9
module isn't a real Python module; it's just a dictionary that serves as the namespace for the code's globals, so that your own module's namespace isn't polluted.
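If attribute access (compiled_module.foo) is preferred over dictionary lookup, the same idea can be wrapped in a real module object. A minimal sketch, assuming the name magic_compile from the question and a made-up module name "submission"; types.ModuleType is part of the standard library:

```python
import types

def magic_compile(source, name="submission"):
    """Execute `source` in a fresh module object and return that module."""
    module = types.ModuleType(name)   # empty module, not registered in sys.modules
    exec(source, module.__dict__)     # top-level definitions land in the module's namespace
    return module

test = """
def foo(x):
    return x * x
"""

compiled_module = magic_compile(test)
result = compiled_module.foo(3)
```

Each call produces a separate module, so several submissions can be graded without their definitions overwriting one another.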

Related

Programmatically register function as a test function in pytest

I would like to programmatically add or mark a function as a test-case in pytest, so instead of writing
def test_my_function():
    pass
I would like to do something like (pseudo-api, I know neither pytest.add_test nor pytest.testcase exist by that identifier).
def a_function_specification():
    pass
pytest.add_test(a_function_specification)
or
@pytest.testcase
def a_function_specification():
    pass
Basically, I would like to write a test-case-generating decorator that doesn't work quite like pytest.mark or parametrization, which is why I started digging into the internals, but I haven't found an obvious way to do this for Python code.
The YAML example in the pytest docs seems to use pytest.Item, but I have a hard time mapping this to something that would work within Python and not as part of a non-Python-file test collection.
Starting from version 2.6, pytest supports:
nose-style __test__ attribute on modules, classes and functions, including unittest-style Classes. If set to False, the test will not be collected.
So, you need to add this attribute.
One approach is:
def not_a_test1():
    assert 1 + 2 == 3
not_a_test1.__test__ = True
Another is:
def make_test(func):
    func.__test__ = True
    return func
@make_test
def not_a_test2():
    assert 1 + 2 == 3

Safely execute script coming from user input in Python

I'm looking for a safe mechanism in Python to execute potentially unsafe script code coming from a user.
Code example:
def public(v):
    print('Allowed to trigger this with '+v)
def secret():
    print('Not allowed to trigger this')
unsafe_user_code = '''
def user_function(a):
    if 'o' in a:
        public('parameter')
a = 'hello'
user_function(a)
'''
run_code(unsafe_user_code, allowed=['public'])
This could be easily achieved with exec() but as I understand there is no way of using exec() in a safe way in Python.
Here are my requirements:
The syntax of the script should ideally be similar to Python (but could also be something like JavaScript)
Standard mechanisms like string operations, if/else/while and definition of own variables and functions need to be available in the script
The script should only be able to execute certain functions (in the example, only public() would be allowed)
I would like to rely on Python-based implementations/libraries only as I don't want to have dependencies on Python-external software
It should not introduce a security risk
The only way I found so far is to use a parsing library, where I would have to define everything by myself (e.g. this one: https://github.com/lark-parser/lark ).
Is there a better way to achieve something like this?
Thanks!
Answer
Here's the full python file if you want to copy:
from copy import copy
def public(v):
    print('Allowed to trigger this with '+v)
def secret():
    print('Not allowed to trigger this')
unsafe_user_code = '''
def user_function(a):
    if 'o' in a:
        public('parameter')
        # secret()
        # If you uncomment it, it'll throw an error saying 'secret' does not exist
a = 'hello'
user_function(a)
'''
def run_code(code_to_run,allowed:list[str]):
    g = globals()
    allowed_dict = {f"{name}": g[name] for name in allowed}
    code = compile(code_to_run,'','exec')
    def useless():
        pass
    useless.__code__ = code
    g = copy(useless.__globals__)
    for x in g:
        del useless.__globals__[x]
    for x in allowed_dict:
        useless.__globals__[x] = allowed_dict[x]
    useless()
run_code(unsafe_user_code, allowed=['public'])
Explanation
I don't think you need a parsing library.
You can use one of Python's built-in functions, compile().
You can compile code like this:
text_to_compile = "print('hello world')"
code = compile(
    text_to_compile, # text to compile
    'file_name', # file name
    'exec' # compile mode
)
more about compile modes
and then you can run it simply by:
exec(code) # prints 'hello world' in the terminal
or, you can put that code inside of a function and run the function:
def useless():
    pass
useless.__code__ = code
useless() # prints 'hello world' in the terminal
Now, to change the scope of the function, we can access its __globals__ dictionary and remove everything from it.
Then we add the functions/variables that we want to it.
# from copy import copy
def run_code(code_to_run,allowed:list[str]):
    g = globals()
    allowed_dict = {f"{name}": g[name] for name in allowed}
    code = compile(code_to_run,'','exec')
    def useless():
        pass
    useless.__code__ = code
    g = copy(useless.__globals__)
    for x in g:
        del useless.__globals__[x]
    for x in allowed_dict:
        useless.__globals__[x] = allowed_dict[x]
    useless()
Let's walk through what's happening:
def run_code(code_to_run,allowed:list[str]):
The :list[str] is just a type hint for the parameter allowed.
g = globals()
Using the globals() function, we get all of the variables that the function run_code has access to. This includes the public and secret functions.
allowed_dict = {f"{name}": g[name] for name in allowed}
Then we take the names that were passed to run_code and look up the corresponding objects in those globals.
It's pretty much the same as saying:
allowed_dict = {}
for name in allowed:
    allowed_dict[name] = g[name]
Then we compile the code.
We also make a useless function and put the compiled code into it:
code = compile(code_to_run,'','exec')
def useless():
    pass
useless.__code__ = code
After that, we copy the globals of the useless function:
g = copy(useless.__globals__)
We copy the globals so that we can loop over the copy while deleting from the original.
We do this because we cannot delete entries from a dictionary while a for loop is iterating over it.
for x in g:
    del useless.__globals__[x]
This deletes everything in the useless function's globals.
The useless function is an empty function, but its globals are filled with all sorts of things, like the public and secret functions.
Then we put all the allowed variables into the globals of the function so that the code inside the function can use them.
for x in allowed_dict:
    useless.__globals__[x] = allowed_dict[x]
Then we can just run the useless function:
useless()
And that's pretty much all there is to it.
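A variation on the walkthrough above that leaves the calling module untouched: useless.__globals__ is the same dictionary as the module's own globals(), so emptying it also removes run_code, public and everything else from the module. Handing exec() a fresh dictionary avoids that. This is only a sketch; the lookup parameter is an assumption added here for illustration. It controls which names the script sees and nothing more. Builtins remain reachable, so it should not be treated as a security boundary.

```python
def public(v):
    return 'Allowed to trigger this with ' + v

def secret():
    return 'Not allowed to trigger this'

def run_code(code_to_run, allowed, lookup):
    """Run `code_to_run` with only the names in `allowed` visible as globals.

    `lookup` is the dict the allowed names are taken from (a hypothetical
    parameter; the original reads them from globals()).
    """
    namespace = {name: lookup[name] for name in allowed}
    exec(compile(code_to_run, '<user code>', 'exec'), namespace)
    return namespace

available = {'public': public, 'secret': secret}

ns = run_code("message = public('parameter')", ['public'], available)

try:
    run_code("secret()", ['public'], available)
    blocked = False
except NameError:
    blocked = True   # 'secret' was never placed in the namespace
```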

Execute REPL code in the context of a Python module and not __main__?

I have a Python instance, with a REPL open and several modules imported. Can I run code as if it was part of one of these modules?
Example: my.module includes code like
some_module_var = 123
def my_function():
    return 7
I want to be able to type
new_module_var = my_function(some_module_var)
in some form into the REPL and have it executed as if it was part of the module, instead of
my.module.new_module_var = my.module.my_function(my.module.some_module_var)
Is there a nice solution for this?
The one thing I already tried is
exec(compile("my_function(some_module_var)", "<fake_file>", "exec"),
my.module, {})
with the module as the global namespace, but, apparently, namespaces can't be modules.
Also, as a workaround, we could just copy every symbol from the module to the global namespace, run the eval, then copy back the changes... but that doesn't feel as elegant as, e.g., Common Lisp's "just switch the REPL package" solution.
Or is there a custom REPL that can do this?
(... my goal is to be able to send entire functions to a running Python instance & have them show up in the right module, not in __main__.)
As it turns out, exec() can do this. You pass in not the module but its dictionary:
d = my.module.__dict__
exec("some_module_var = 42", d, d)
exec("print(some_module_var)", d, d)
In fact, this is roughly what Python's interpreter does (as of 3.6), in pythonrun.c (see run_mod() and PyRun_InteractiveOneObjectEx()), with __main__ built in as the module name.
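A self-contained sketch of the same technique, using a throwaway module built with types.ModuleType in place of my.module (the name demo_module and its contents are made up for illustration):

```python
import types

# Stand-in for an already-imported module such as my.module.
mod = types.ModuleType("demo_module")
exec("some_module_var = 123\ndef my_function(x):\n    return x + 7\n", mod.__dict__)

# Run new code "inside" the module by using its __dict__ as the namespace.
d = mod.__dict__
exec("new_module_var = my_function(some_module_var)", d, d)
exec("def added_later():\n    return some_module_var * 2\n", d, d)
```

Functions defined this way pick up the module's name as their __module__, because that attribute is taken from the __name__ entry of the namespace they were defined in.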

executing python code from string loaded into a module

I found the following code snippet that I can't seem to make work for my scenario (or any scenario at all):
def load(code):
    # Delete all local variables
    globals()['code'] = code
    del locals()['code']
    # Run the code
    exec(globals()['code'])
    # Delete any global variables we've added
    del globals()['load']
    del globals()['code']
    # Copy k so we can use it
    if 'k' in locals():
        globals()['k'] = locals()['k']
        del locals()['k']
    # Copy the rest of the variables
    for k in locals().keys():
        globals()[k] = locals()[k]
I created a file called "dynamic_module" and put this code in it, which I then used to try to execute the following code which is a placeholder for some dynamically created string I would like to execute.
import random
import datetime
class MyClass(object):
    def main(self, a, b):
        r = random.Random(datetime.datetime.now().microsecond)
        a = r.randint(a, b)
        return a
Then I tried executing the following:
import dynamic_module
dynamic_module.load(code_string)
return_value = dynamic_module.MyClass().main(1,100)
When this runs it should return a random number between 1 and 100. However, I can't seem to get the initial snippet I found to work for even the simplest of code strings. I think part of my confusion in doing this is that I may misunderstand how globals and locals work and therefore how to properly fix the problems I'm encountering. I need the code string to use its own imports and variables and not have access to the ones where it is being run from, which is the reason I am going through this somewhat over-complicated method.
You should not be using the code you found. It has several big problems, not least that most of it doesn't actually do anything (locals() is a proxy, so deleting from it has no effect on the actual locals; it puts any code you execute in the same shared globals; etc.).
Use the accepted answer in that post instead; recast as a function that becomes:
import types
def load_module_from_string(code, name='dynamic_module'):
    # imp.new_module() was removed in Python 3.12; types.ModuleType is the equivalent
    module = types.ModuleType(name)
    exec(code, module.__dict__)
    return module
then just use that:
dynamic_module = load_module_from_string(code_string)
return_value = dynamic_module.MyClass().main(1, 100)
The function produces a new, clean module object.
In general, this is not how you should dynamically import and use external modules. You should be using __import__ within your function to do this. Here's a simple example that worked for me:
import numpy as np
plt = __import__('matplotlib.pyplot', fromlist=['plt'])
plt.plot(np.arange(5), np.arange(5))
plt.show()
I imagine that for your specific application (loading from code string) it would be much easier to save the dynamically generated code string to a file (in a folder containing an __init__.py file) and then to call it using __import__. Then you could access all variables and functions of the code as parts of the imported module.
Unless I'm missing something?
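The file-based route described above might look like this. A sketch, assuming a temporary directory and the hypothetical module name generated_module; it uses importlib rather than a bare __import__ call so that the file does not have to be on sys.path:

```python
import importlib.util
import os
import tempfile

code_string = """
import random

class MyClass(object):
    def main(self, a, b):
        return random.randint(a, b)
"""

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "generated_module.py")
    with open(path, "w") as f:
        f.write(code_string)

    # Build a module object from the file and execute it.
    spec = importlib.util.spec_from_file_location("generated_module", path)
    dynamic_module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(dynamic_module)

return_value = dynamic_module.MyClass().main(1, 100)
```

The imports made by the generated code (random here) end up as attributes of dynamic_module, not of the module that loaded it.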

Why does this not pickle functions as redefined through code? [duplicate]

I'm trying to transfer a function across a network connection (using asyncore). Is there an easy way to serialize a python function (one that, in this case at least, will have no side effects) for transfer like this?
I would ideally like to have a pair of functions similar to these:
def transmit(func):
    obj = pickle.dumps(func)
    [send obj across the network]
def receive():
    [receive obj from the network]
    func = pickle.loads(obj)
    func()
You could serialise the function bytecode and then reconstruct it on the caller. The marshal module can be used to serialise code objects, which can then be reassembled into a function, i.e.:
import marshal
def foo(x): return x*x
code_string = marshal.dumps(foo.__code__)
Then in the remote process (after transferring code_string):
import marshal, types
code = marshal.loads(code_string)
func = types.FunctionType(code, globals(), "some_func_name")
func(10) # gives 100
A few caveats:
marshal's format (and Python bytecode in general) may not be compatible between major Python versions.
It will only work with the CPython implementation.
If the function references globals (including imported modules, other functions etc) that you need to pick up, you'll need to serialise these too, or recreate them on the remote side. My example just gives it the remote process's global namespace.
You'll probably need to do a bit more to support more complex cases, like closures or generator functions.
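To illustrate the caveat about globals: the rebuilt function resolves global names in whatever dictionary is passed to types.FunctionType, so that dictionary has to supply everything the code refers to. A small sketch, run in a single process for brevity (the function hypot3 is made up for illustration):

```python
import marshal
import math
import types

def hypot3(x):
    # Refers to the global name `math`, which is not part of the code object.
    return math.sqrt(x * x + 9)

payload = marshal.dumps(hypot3.__code__)   # bytes suitable for sending

# "Remote" side: rebuild with an explicit globals dict.
code = marshal.loads(payload)
rebuilt = types.FunctionType(code, {"math": math}, "hypot3")
value = rebuilt(4)

# With an empty globals dict the name lookup fails at call time.
broken = types.FunctionType(code, {}, "hypot3")
try:
    broken(4)
    failed = False
except NameError:
    failed = True
```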
Check out Dill, which extends Python's pickle library to support a greater variety of types, including functions:
>>> import dill as pickle
>>> def f(x): return x + 1
...
>>> g = pickle.dumps(f)
>>> f(1)
2
>>> pickle.loads(g)(1)
2
It also supports references to objects in the function's closure:
>>> def plusTwo(x): return f(f(x))
...
>>> pickle.loads(pickle.dumps(plusTwo))(1)
3
Pyro is able to do this for you.
The simplest way is probably inspect.getsource(object) (see the inspect module), which returns a string with the source code for a function or a method.
It all depends on whether you generate the function at runtime or not:
If you do - inspect.getsource(object) won't work for dynamically generated functions as it gets object's source from .py file, so only functions defined before execution can be retrieved as source.
And if your functions are placed in files anyway, why not give receiver access to them and only pass around module and function names.
The only solution for dynamically created functions that I can think of is to construct function as a string before transmission, transmit source, and then eval() it on the receiver side.
Edit: the marshal solution also looks pretty smart; I didn't know you could serialize something other than built-ins.
In modern Python you can pickle functions, and many variants. Consider this
import pickle, time
def foobar(a,b):
    print("%r %r"%(a,b))
you can pickle it
p = pickle.dumps(foobar)
q = pickle.loads(p)
q(2,3)
you can pickle closures
import functools
foobar_closed = functools.partial(foobar,'locked')
p = pickle.dumps(foobar_closed)
q = pickle.loads(p)
q(2)
even if the closure uses a local variable
def closer():
    z = time.time()
    return functools.partial(foobar,z)
p = pickle.dumps(closer())
q = pickle.loads(p)
q(2)
but if you close it using an internal function, it will fail
def builder():
    z = 'internal'
    def mypartial(b):
        return foobar(z,b)
    return mypartial
p = pickle.dumps(builder())
q = pickle.loads(p)
q(2)
with error
pickle.PicklingError: Can't pickle <function mypartial at 0x7f3b6c885a50>: it's not found as __main__.mypartial
Tested with Python 2.7 and 3.6
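The failure follows from how pickle handles functions: it stores a reference (module name plus qualified name), not the code, and the receiver looks that name up again when loading. A quick way to see this, using json.dumps only as a convenient importable function:

```python
import json
import pickle

data = pickle.dumps(json.dumps)

# The payload names the function; it does not contain its bytecode.
has_module = b"json" in data
has_name = b"dumps" in data

# Unpickling resolves the name again, yielding the very same object.
same_object = pickle.loads(data) is json.dumps

def builder():
    def inner(b):
        return b
    return inner

# A nested function has no importable name, so pickling is refused.
# The exception type differs between Python versions, hence the tuple.
try:
    pickle.dumps(builder())
    refused = False
except (pickle.PicklingError, AttributeError):
    refused = True
```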
The cloud package (pip install cloud) can pickle arbitrary code, including dependencies. See https://stackoverflow.com/a/16891169/1264797.
code_string = '''
def foo(x):
    return x * 2
def bar(x):
    return x ** 2
'''
obj = pickle.dumps(code_string)
Now
exec(pickle.loads(obj))
foo(1)
> 2
bar(3)
> 9
Cloudpickle is probably what you are looking for.
Cloudpickle is described as follows:
cloudpickle is especially useful for cluster computing where Python
code is shipped over the network to execute on remote hosts, possibly
close to the data.
Usage example:
import pickle
import cloudpickle
def add_one(n):
    return n + 1
pickled_function = cloudpickle.dumps(add_one)
pickle.loads(pickled_function)(42)
You can do this:
def fn_generator():
    def fn(x, y):
        return x + y
    return fn
Now, transmit(fn_generator()) will send the actual definition of fn(x,y) instead of a reference to the module name.
You can use the same trick to send classes across the network.
The basic functions used for this module cover your query, plus you get the best compression over the wire; see the instructive source code:
y_serial.py module :: warehouse Python objects with SQLite
"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."
http://yserial.sourceforge.net
Here is a helper class you can use to wrap functions in order to make them picklable. Caveats already mentioned for marshal will apply but an effort is made to use pickle whenever possible. No effort is made to preserve globals or closures across serialization.
import marshal, pickle, types
class PicklableFunction:
    def __init__(self, fun):
        self._fun = fun
    def __call__(self, *args, **kwargs):
        return self._fun(*args, **kwargs)
    def __getstate__(self):
        try:
            return pickle.dumps(self._fun)
        except Exception:
            return marshal.dumps((self._fun.__code__, self._fun.__name__))
    def __setstate__(self, state):
        try:
            self._fun = pickle.loads(state)
        except Exception:
            code, name = marshal.loads(state)
            self._fun = types.FunctionType(code, {}, name)
