Use user provided python code during runtime

I'm developing a system that operates on (arbitrary) data from databases. The data may need some preprocessing before the system can work with it. To allow the user to specify possibly complex rules, I thought of giving the user the possibility to input Python code which is used to do this task. The system is pure Python.
My plan is to introduce the tables and columns as variables and let the user do anything Python can do (including access to the standard libs). Now to my problem:
How do I take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output? I think the easiest way would be to use the user-entered data as the body of a method and take the return value of that function as my new data.
Is this possible? If yes, how? It's unimportant that the user may enter malicious code, since the worst thing that could happen is that he screws up his own system, which is thankfully not my problem ;)

Python provides exec() (a statement in Python 2, a built-in function in Python 3), which should do what you want. Pass the variables that you want available as the second and/or third arguments (globals and locals respectively), since those control the environment that the code is run in.
For example:
env = {'somevar': 'somevalue'}
exec(code, env)
Alternatively, if the code that you want executed is stored in its own file, execfile() can be used in a similar way (Python 2 only; in Python 3, read the file and pass its contents to exec()).
If you only have a single expression that you want to evaluate, you can also use eval.
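A minimal sketch of both calls, assuming Python 3. The names `rows` and `result` are conventions invented for this example, not anything the original system defines:

```python
# Hypothetical user-entered text; `rows` and `result` are names chosen for this sketch.
user_code = """
total = sum(row['amount'] for row in rows)
result = {'total': total, 'count': len(rows)}
"""

# The environment exposes the input data to the user's code.
env = {'rows': [{'amount': 3}, {'amount': 4}]}
exec(user_code, env)

# Anything the code bound at top level is now in env.
output = env['result']

# For a single expression, eval returns the value directly.
doubled = eval('[row["amount"] * 2 for row in rows]', env)
```

After the call, env also contains a `__builtins__` entry that Python adds automatically, so read out only the names you expect.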

Is this possible?
If it doesn't involve time travel, anti-gravity or perpetual motion the answer to this question is always "YES". You don't need to ask that.
The right way to proceed is as follows.
You build a framework with some handy libraries and packages.
You build a few sample applications that implement this requirement: "The data may need some preprocessing before the system can work with it."
You write documentation about how that application imports and uses modules from your framework.
You turn the framework, the sample applications and the documentation over to users to let them build these applications.
Don't waste time on "take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output".
The user should write applications like this.
from your_framework import the_file_loop
def their_function(one_line_as_dict):
    one_line_as_dict['field'] = ...  # some stuff computed by the user
the_file_loop(their_function)
That can actually be the entire program.
You'll have to write the_file_loop, which will look something like this.
def the_file_loop(some_function):
    with open('input') as source:
        with open('output', 'w') as target:  # open for writing, not the default read mode
            for some_line in source:
                the_data = make_a_dictionary(some_line)
                some_function(the_data)
                target.write(make_a_line(the_data))
By creating a framework, and allowing users to write their own programs, you'll be a lot happier with the results. Less magic.

2 choices:
You take his input and put it in a file, then you execute it.
You use exec()

If you just want to set some local values and then provide a python shell, check out the code module.
You can start an instance of a shell that is similar to the python shell, and initialize it with whatever local variables you want. This assumes that whatever functionality you need for working with the resulting values is built into the objects you pass in as locals.
Example:
import code
shell = code.InteractiveConsole({'foo': myVar1, 'bar': myVar2})
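As a rough sketch of how the console shares state with the host program, the snippet below feeds it one line with push() instead of starting a prompt. The values bound to `foo` and `bar` are placeholders for this example:

```python
import code

# Hypothetical data the host program wants to expose to the user.
namespace = {'foo': 10, 'bar': 32}
shell = code.InteractiveConsole(namespace)

# push() feeds one line of source, as if the user had typed it.
# It returns True when more input is needed to finish a statement.
needs_more = shell.push('answer = foo + bar')

# shell.interact() would instead hand control to a real user at a prompt.
```

Whatever the user binds in the console ends up in the same dictionary, so the host program can read `namespace['answer']` afterwards.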

What you actually want is exec, since eval is limited to taking an expression and returning a value. With exec, you can have code blocks (statements) and work on arbitrarily complex data, passed in as the globals and locals of the code.
The result is then returned by the code via some convention (like binding it to result).
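If you would rather have the user's text act as a function body with a return value, as the question suggests, one possible sketch is to wrap it in a def before calling exec. The helper `make_function` and the argument name `customers` are made up for illustration:

```python
import textwrap

def make_function(body, arg_names, name='user_function'):
    """Wrap user-entered text as the body of a function and return that function."""
    source = 'def {}({}):\n{}'.format(
        name, ', '.join(arg_names), textwrap.indent(body, '    '))
    namespace = {}
    # compile() first so syntax errors are reported against a recognisable filename.
    exec(compile(source, '<user code>', 'exec'), namespace)
    return namespace[name]

# Hypothetical user input: the column arrives as a variable, the result is returned.
user_text = "cleaned = [c.strip().lower() for c in customers]\nreturn cleaned"
preprocess = make_function(user_text, ['customers'])
new_data = preprocess(['  Alice ', 'BOB'])
```

This keeps the user's variables local to the generated function, so nothing leaks between runs.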

Well, you're describing compile().
But... I think I'd still implement this using regular python source files. Add a special location to the path, say '~/.myapp/plugins', and just __import__ everything there. Probably you'll want to provide some convenient base classes that expose the interface you're trying to offer, so that your users can inherit from them.
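A sketch of that plugin idea using importlib rather than a bare __import__, assuming Python 3. The function `load_plugins`, the file name `shorten.py` and its `preprocess` function are all invented for the example, and a temporary directory stands in for '~/.myapp/plugins':

```python
import importlib.util
import pathlib
import tempfile

def load_plugins(directory):
    """Import every .py file in `directory` and return the modules by name."""
    plugins = {}
    for path in sorted(pathlib.Path(directory).glob('*.py')):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        plugins[path.stem] = module
    return plugins

# Demonstration with a throwaway directory standing in for ~/.myapp/plugins.
with tempfile.TemporaryDirectory() as plugin_dir:
    pathlib.Path(plugin_dir, 'shorten.py').write_text(
        'def preprocess(value):\n    return value[:4]\n')
    plugins = load_plugins(plugin_dir)
    shortened = plugins['shorten'].preprocess('CUSTOMER-0001')
```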

Related

Emulating user input in python for testing

I have written a simple python program that takes user input, with the use of input(). There are some different commands available.
I want to make sure that all available commands function as intended and that the program catches invalid commands. Since there are quite a few different commands, this is very time-consuming to do manually (i.e., start the program, and enter all commands, one by one). (I have separate test functions for the actual execution of all commands, but I'm struggling to find a nice way to test this functionality together with the input() loop.)
How can I automate the process of giving (predetermined) user inputs, without messing up the rest of the code? In addition to using this to test the program, it would also serve as an example to the user, in order to see the possible usage of the program.
My current solution is that I have two versions of the main() function, which is basically just an infinite loop that takes inputs until an exit command is given. The first version, "main()", is the version intended for use and takes inputs from input(), until the user decides to quit. The second version, "main_test()" is only used for testing, and takes the inputs from a predetermined list, specified in the code. This does the job, but I do not want the main_test() code in the final version. I also do not want to "pollute" main() by adding things only used for testing.
def main():
    while True:
        user_input = input()
        ...
def main_test():
    test_input = [...]
    test_iter = 0
    while True:
        user_input = test_input[test_iter]
        test_iter += 1
        ...
I have not been able to find a nice way to do this in python, although I'm sure there must be a smart way. I'd prefer a way that does not need any additional imports. But if there is a nice way to do it with additional imports, I'm all ears.
Anyway, striking out with python, my next thought was to specify the commands in a Makefile, where I would start the program and feed the program text input, emulating the user. The main benefit of this is that I would only need the "main()" function, and I would not have to change anything in the python code. The disadvantage is that the example/test is specified outside of the *.py files, which may confuse the user.
If you are looking to confirm that X input results in Y output, then what you are looking for is called unit testing; this methodology allows you to define functions that assess the result of your function against your expected results.
Module options:
- pytest
- unittest
First of all, I would like to point out that there are many methods for testing your code and some are better practice than others, while each one can be preferred by the programmer for different reasons. You should study the existing ways to test your code; which one fits best depends on the structure of your code, the patterns that can be exploited, and the implementation principles you follow (dependency injection, MVC, MVVM, etc.).
As for your specific case, I would recommend unit testing the part of the code that handles the user input, and defining a collection of cases with predefined outputs that you can assert. Then rerun the same tests (maybe adding cases) whenever you edit this part, to be sure that your program continues to work correctly.
Using Python's unittest package you can implement a nice script that checks your code consistently and incorporate it into your project. See also the Wikipedia page about unit testing to learn more about the principles you have to follow. You can also check TensorFlow's testing page for directions on how to incorporate this kind of testing in a bigger project, and use these ideas in yours.
Good luck!
If you don't mind importing unittest, the answer to "python mocking raw input in unittests" should work.
The main_test() part can then be removed from your project and taken care of in the testing script. The "return_values" then correspond to your mock inputs.
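A minimal sketch of that mocking approach on Python 3, where input() lives in the builtins module. The `main` below is only a stand-in for the real command loop, and `run_with_inputs` is a helper name invented here:

```python
from unittest import mock

def main():
    """Hypothetical stand-in for the program's command loop."""
    handled = []
    while True:
        user_input = input()
        if user_input == 'exit':
            return handled
        handled.append(user_input.upper())

def run_with_inputs(commands):
    # side_effect makes each call to input() return the next item in the list.
    with mock.patch('builtins.input', side_effect=commands):
        return main()

result = run_with_inputs(['add', 'list', 'exit'])
```

The real main() stays untouched; only the test script knows about the predetermined commands.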

Use of eval in Python, MATLAB, etc [duplicate]

I do know that one shouldn't use eval. For all the obvious reasons (performance, maintainability, etc.). My question is more on the side – is there a legitimate use for it? Where one should use it rather than implement the code in another way.
Since it is implemented in several languages and can lead to bad programming style, I assume there is a reason why it's still available.
First, here is MathWorks' list of alternatives to eval.
You could also be clever and use eval() in a compiled application to build your mCode interpreter, but the Matlab compiler doesn't allow that for obvious reasons.
One place where I have found a reasonable use of eval is in obtaining small predicates of code that consumers of my software need to be able to supply as part of a parameter file.
For example, there might be an item called "Data" that has a location for reading and writing the data, but also requires some predicate applied to it upon load. In a Yaml file, this might look like:
Data:
    Name: CustomerID
    ReadLoc: some_server.some_table
    WriteLoc: write_server.write_table
    Predicate: "lambda x: x[:4]"
Upon loading and parsing the objects from Yaml, I can use eval to turn the predicate string into a callable lambda function. In this case, it implies that CustomerID is a long string and only the first 4 characters are needed in this particular instance.
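Roughly, the loading step could look like the sketch below. A plain dictionary stands in for the parsed parameter file so the example needs no YAML library, and `load_predicate` is a name made up for this illustration:

```python
# Stand-in for the dictionary a YAML parser would return for the file above.
config = {
    'Data': {
        'Name': 'CustomerID',
        'Predicate': 'lambda x: x[:4]',
    }
}

def load_predicate(text):
    """Turn a predicate string into a callable, failing early if it is not one."""
    predicate = eval(text)
    if not callable(predicate):
        raise TypeError('predicate must evaluate to a callable: %r' % text)
    return predicate

predicate = load_predicate(config['Data']['Predicate'])
short_id = predicate('ABCD-12345')
```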
Yaml offers some clunky ways to magically invoke object constructors (e.g. using something like !Data in my code above, and then having defined a class for Data in the code that appropriately uses Yaml hooks into the constructor). In fact, one of the biggest criticisms I have of the Yaml magic object construction is that it is effectively like making your whole parameter file into one giant eval statement. And this is very problematic if you need to validate things and if you need flexibility in the way multiple parts of the code absorb multiple parts of the parameter file. It also doesn't lend itself easily to templating with Mako, whereas my approach above makes that easy.
I think this simpler design, which can be easily parsed with any YAML tools, is better, and using eval lets me allow the user to pass in whatever arbitrary callable they want.
A couple of notes on why this works in my case:
The users of the code are not Python programmers. They don't have the ability to write their own functions and then just pass a module location, function name, and argument signature (although, putting all that in a parameter file is another way to solve this that wouldn't rely on eval if the consumers can be trusted to write code.)
The users are responsible for their bad lambda functions. I can do some validation that eval works on the passed predicate, and maybe even create some tests on the fly or have a nice failure mode, but at the end of the day I am allowed to tell them that it's their job to supply valid predicates and to ensure the data can be manipulated with simple predicates. If this constraint weren't in place, I'd have to shuck this for a different system.
The users of these parameter files compose a small group mostly willing to conform to conventions. If that weren't true, there would be a risk that folks would hijack the predicate field to do many inappropriate things, and this would be hard to guard against. On big projects, it would not be a great idea.
I don't know if my points apply very generally, but I would say that using eval to add flexibility to a parameter file is good if you can guarantee your users are a small group of convention-upholders (a rare feat, I know).
In MATLAB the eval function is useful when functions make use of the name of the input argument via the inputname function. For example, eval is required to overload the builtin display function, which is sensitive to the name of the input argument. To call the builtin display from an overloaded display you would do
function display(X)
    eval([inputname(1), ' = X;']);
    eval(['builtin(''display'', ', inputname(1), ');']);
end
In MATLAB there is also evalc. From the documentation:
T = evalc(S) is the same as EVAL(S) except that anything that would
normally be written to the command window, except for error messages,
is captured and returned in the character array T (lines in T are
separated by '\n' characters).
If you still consider this eval, then it is very powerful when dealing with closed source code that displays useful information in the command window and you need to capture and parse that output.

Using Python's basic I/O to manipulate or create Python Files?

Would the most efficient way to manipulate a Python (.py) file, to add/subtract/append code, be to use the basic file I/O included in Python? I know it's not very efficient, but I honestly can't find any better way.
For an example:
obj = open('Codemanipulationtest.py', 'w+')
obj.write("print 'This shows you can do basic I/O?'")
obj.close()
This will manipulate a file I have, named "codemanipulationtest.py", and add a print statement to it. Is this something that can be worked upon, or are there easier, safer or more efficient methods for manipulating/creating new python code?
I've read over this: Parse a .py file, read the AST, modify it, then write back the modified source code
And honestly it seems like the I/O method is easier. I am kind of newbish to Python so I may just be acting stupid... thanks in advance for any responses.
Edit
The point of it all was simply to play around with the effects of playing around with the code. I was thinking of hooking up whatever I end up using to some sort of learning algorithm and seeing how well it could generate little bits of code at a time, and seeing where it could go from there...
To generate the code, I would break it out into various classes: an IF class, a FOR class, and so on. Give each class a to_str() method, then call it on each statement in turn to produce the output.
statements = [ ... ]
obj = open( "some.py", "w+" )
for s in statements:
    obj.write( s.to_str() )
obj.close()
This way you can extend your project easily and it will be more understandable and flexible. And, it keeps with the use of the simple write method that you wanted.
Depending on the learning algorithm, this breakout into classes can lead quite naturally to a sort of pseudo genetic algorithm for code. You can encode the genome as a sequence of statements, and then you just have to find a way to pass parameters to each statement that requires them.
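To make the to_str() idea concrete, here is one possible shape for such classes. The class names `Assign`, `Print` and `For` and the `indent` parameter are assumptions of this sketch, not something from the question:

```python
class Assign:
    def __init__(self, name, value):
        self.name, self.value = name, value

    def to_str(self, indent=0):
        return '    ' * indent + '{} = {!r}\n'.format(self.name, self.value)

class Print:
    def __init__(self, name):
        self.name = name

    def to_str(self, indent=0):
        return '    ' * indent + 'print({})\n'.format(self.name)

class For:
    def __init__(self, var, count, body):
        self.var, self.count, self.body = var, count, body

    def to_str(self, indent=0):
        header = '    ' * indent + 'for {} in range({}):\n'.format(self.var, self.count)
        # Nested statements are rendered one indentation level deeper.
        return header + ''.join(s.to_str(indent + 1) for s in self.body)

statements = [Assign('greeting', 'hi'), For('i', 2, [Print('greeting')])]
source = ''.join(s.to_str() for s in statements)
```

The resulting `source` string is what would be passed to obj.write() in the loop above.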
It depends on what you'll be doing with the code you're generating. You have a few options, each more advanced than the last.
Create a file and import it
Create a string and exec it
Write code to create classes (or modules) on the fly directly rather than as text, inserting whatever functions you need into them
Generate Python bytecode directly and execute that!
If you are writing code that will be used and modified by other programmers, then the first approach is probably best. Otherwise I recommend the third for most use cases. The last is only for masochists and former assembly language programmers.
If you want to modify existing Python source code, you can sometimes get away with doing simple modifications with basic search-and-replace, especially if you know something about the source file you're working with, but a better approach is the ast module. This gives you an abstract representation of the Python source that you can modify and then compile directly into Python objects.
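As a small illustration of the ast route, the sketch below parses a one-line source string, rewrites every addition into a multiplication, and executes the result. The transformer `AddToMul` and the sample source are invented for the example:

```python
import ast

source = "price = 2 + 3\n"

class AddToMul(ast.NodeTransformer):
    """Rewrite every `a + b` into `a * b`."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # transform nested expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Mult()
        return node

tree = AddToMul().visit(ast.parse(source))
ast.fix_missing_locations(tree)

# The modified tree compiles straight to a code object; no text file is involved.
namespace = {}
exec(compile(tree, '<modified>', 'exec'), namespace)
price = namespace['price']
```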

Python cmd interpreter adding if statements

For one of my projects I have a python program built around the python cmd class. This allowed me to craft a mini language around sql statements that I was sending to a database. Besides making it far easier to connect with python, I could do things that sql can't do. This was very important for several projects. However, I now need to add in if blocks for greater control flow.
My current thinking is that I will just add two new commands to the language, IF and END. These set a variable which determines whether or not to skip a line. I would like to know if anyone else has done this with the cmd module, and if so, is there a standard method I'm missing? Google doesn't seem to reveal anything, and the cmd docs don't reveal anything either.
For an idea that's similar to what I'm doing, go here. Questions and comments welcome. :)
Hmm, a little more complicated than what I was thinking, though having python syntax would be nice. I debated building a mini language for quite some time before I finally did it. The problem primarily comes in from the external limitations. I have a bunch of "data", which is being generous, to turn into sql. This is based on other "data" that won't pass through. It's also unique to each specific "version" of the problem. Doing straight data to sql would have been my first inclination, but was not practical.
For the curious, I spent a great deal of time going over the mini languages chapter in the art of unix programming, found here.
If I had built the thing in pure python, I wouldn't have had the flexibility I absolutely needed for the problem set.
The limitations of making a "mini language" have become apparent.
Proper languages have a tree-like structure and more complex syntax than cmd can handle easily.
Sometimes it's actually easier to use Python directly than it is to invent your own DSL.
Currently, your DSL probably reads a script-like file of commands.
Because of the way cmd works, your little commands get a string argument, which must be parsed. Then the command gets executed. And, further, each command is a method of the cmd.Cmd subclass.
Here's what you can do.
Each do_foo( self, args ) method becomes a stand-alone callable object. It will follow the Command design pattern. It will do exactly what the method function does now. No less. Exactly the same.
class Foo( object ):
    def __init__( self, global_context ):
        self.context= global_context
    def __call__( self, args ):
        ...  # The rest of do_foo
Additionally, your existing cmd.Cmd subclass probably maintains some internal state.
All of the self.this and self.that instance variables must be changed to reference an explicit context object.
class Context( object ): pass
Change self.this or self.that to self.context.this or self.context.that
Now, you can create your context and your various commands.
ctx = Context()
foo= Foo(ctx)
Your script changes syntax slightly. From:
foo argstring
bar argstring
to:
from mylanguage import foo, bar
foo( "argstring" )
bar( "argstring" )
This does Exactly what the CLI does now. No more. No less. Exactly the same. Slightly different syntax.
Now your script is no longer in a DSL that's hard to expand. It's in Python.
Having done that, you can now use Python syntax if statements.
You have the Exact functionality currently implemented in cmd with better syntax.
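Put together, the pieces above might look like the following sketch. The bodies of `Foo` and `Bar` and the `log` attribute are placeholders standing in for whatever do_foo and do_bar really do:

```python
class Context(object):
    """Shared state that used to live on the cmd.Cmd subclass."""
    def __init__(self):
        self.log = []

class Foo(object):
    def __init__(self, global_context):
        self.context = global_context

    def __call__(self, args):
        # Stand-in for the body of do_foo: record the call, return a result.
        self.context.log.append('foo ' + args)
        return len(args)

class Bar(Foo):
    def __call__(self, args):
        self.context.log.append('bar ' + args)

ctx = Context()
foo, bar = Foo(ctx), Bar(ctx)

# The "script" is now ordinary Python, so control flow comes for free.
if foo('select 1') > 5:
    bar('only when foo returned more than 5')
```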
After examining the problem set some more, I've come to the conclusion that I can leave the minilanguage alone. It has all the features I need, and I don't have the time to rebuild the project from the ground up. This has been an interesting problem and I'm no longer sure I would build another minilanguage if I encountered the same situation. OTOH, it works very well here, and I am loath to give up the advantages it has conferred.

What's the best way to record the type of every variable assignment in a Python program?

Python is so dynamic that it's not always clear what's going on in a large program, and looking at a tiny bit of source code does not always help. To make matters worse, editors tend to have poor support for navigating to the definitions of tokens or import statements in a Python file.
One way to compensate might be to write a special profiler that, instead of timing the program, would record the runtime types and paths of objects of the program and expose this data to the editor.
This might be implemented with sys.settrace(), which sets a callback for each line of code and is how pdb is implemented, or by using the ast module and an import hook to instrument the code. Or is there a better strategy? How would you write something like this without making it impossibly slow, and without running afoul of extreme dynamism, e.g. side effects on property access?
I don't think you can help making it slow, but it should be possible to detect the address of each variable when you encounter a STORE_FAST, STORE_NAME or other STORE_* opcode.
Whether or not this has been done before, I do not know.
If you need debugging, look at PDB, this will allow you to step through your code and access any variables.
import pdb
def test():
    print(1)
    pdb.set_trace()  # you will enter an interpreter here
    print(2)
What if you monkey-patched object's class or another prototypical object?
This might not be the easiest if you're not using new-style classes.
You might want to check out PyChecker's code - it does (I think) what you are looking to do.
Pythoscope does something very similar to what you describe, and it uses a combination of static information in the form of an AST and dynamic information through sys.settrace.
BTW, if you have problems refactoring your project, give Pythoscope a try.
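For a feel of the sys.settrace approach mentioned in the question, here is a rough sketch that records the type names seen for each local variable while one function runs. The helper `record_types` and the `sample` function are invented for this example; it looks only at locals and will be slow on real programs:

```python
import sys
from collections import defaultdict

def record_types(func, *args, **kwargs):
    """Run func and record the type names seen for each local variable."""
    seen = defaultdict(set)

    def tracer(frame, event, arg):
        # 'line' fires before each line runs; 'return' catches the final state.
        if event in ('line', 'return'):
            for name, value in frame.f_locals.items():
                seen[(frame.f_code.co_name, name)].add(type(value).__name__)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)
    return dict(seen)

def sample(n):
    value = n
    value = str(value)
    return value

types = record_types(sample, 3)
```

Reading frame.f_locals only inspects values that already exist, so it does not trigger property access on the objects themselves.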
