How dangerous is using eval in an in-house desktop application? I understand the problem in a web app, but is it really a problem in a desktop thick-client application?
We have a scenario where we allow users to create queries using an in-house DSL, which we dynamically compile into Python code using eval.
As the comment said, it depends on what you mean by "safe". From a security standpoint eval is the end of all hope; once you have it, there is no going back, the user can do anything he wants.
Consider for example
eval('(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__() if c.__name__ == n][0]): fc("function")(fc("code")(0,0,0,0,"KABOOM",(), (),(),"","",0,""),{})())()')
which will segfault CPython 2 (see? no hands!). It could just as well have overwritten your OS with cat pictures, solved P vs. NP, or turned your PC into a black hole. The point is that once user-supplied input can reach eval(), you are in danger. Don't even bother trying to escape user-supplied input correctly.
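A minimal illustration in Python 3 of why stripping builtins does not help: with an empty `__builtins__`, an expression can still walk from a tuple literal to every class loaded in the interpreter, which is the same starting point the one-liner above uses.

```python
# Empty __builtins__ removes open(), __import__ and friends, but attribute
# access on literals still reaches every class loaded in the interpreter.
restricted_globals = {"__builtins__": {}}

payload = "().__class__.__bases__[0].__subclasses__()"
classes = eval(payload, restricted_globals)

# `classes` is the live list of object's direct subclasses, from which
# function and code objects can be rebuilt, as in the one-liner above.
print(len(classes), "classes reachable without any builtins")
print(int in classes)
```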
I was just toying around with an idea, and I couldn't think of a way to resolve this in the backend without daunting security issues.
Say I want to give users the opportunity to create simple algorithms via a web service, test them over small lists, e.g. range(0, 5), and then report the results back via another web service, a template, or an email. That part doesn't really matter; it's the evaluation that bothers me.
Using Python:
class Algorithm(whatever):
    function = whatever.CharField(max_length=75)
A user might enter something like:
'f(x)=x+(x/5)**0.75'
Of course I could use eval, stripping out any built-ins, strings other than "x", etc., but this would likely still be bad practice.
The only thing I could come up with is to move any evaluation functionality to a JavaScript front end.
eval() is evil, and doing this on the backend is very dangerous. It could be done safely using a Python sandbox, but if the sandbox were to fail, you would get owned, which is not a very "defense in depth" approach.
A better approach would be to evaluate the payload on the client side. However, this is Cross-Site Scripting (XSS). One way to prevent an attacker from exploiting it is to have an event handler on the client side that evaluates the form only when a button is pressed. If the attacker can build a GET or POST request that executes JavaScript, then he can exploit the XSS vulnerability. Also make sure to set X-Frame-Options: DENY to prevent clickjacking.
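A third option keeps evaluation on the server without calling eval() at all: parse the formula with the ast module and interpret only whitelisted arithmetic nodes. A minimal sketch, assuming Python 3.8+ and that the "f(x)=" prefix is stripped before parsing; `evaluate` is a hypothetical helper name. It does not cap exponent sizes, so input like 9**9**9 can still tie up the CPU.

```python
import ast
import operator

# Whitelisted binary and unary operators; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate(expression, x):
    """Evaluate an arithmetic expression in the single variable x without eval()."""
    tree = ast.parse(expression, mode="eval")

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "x":
            return x
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        # Calls, attributes, names other than x, strings, etc. all end up here.
        raise ValueError("disallowed syntax: " + type(node).__name__)

    return walk(tree)


# The user's 'f(x)=x+(x/5)**0.75' with the 'f(x)=' prefix already removed.
print([evaluate("x+(x/5)**0.75", x) for x in range(0, 5)])
```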
I was looking over a developer's code, and he did something I have never seen before in a Python application. His background is in PHP and he is just learning Python, so I don't know if this is a holdover from the different system architectures he is used to working with.
He told me that the purpose of this code is to prevent the user from attacking the application via code insertion. I'm pretty sure this is unnecessary for our use case since we are never evaluating the data as code, but I just wanted to make sure and ask the community.
# Import library
from cgi import escape  # removed in Python 3.8; html.escape is the replacement
# Get information that the client submitted
fname = GET_request.get('fname', [''])[0]
# Make sure client did not submit malicious code <- IS THIS NECESSARY?
if fname:
    fname = escape(fname)
Is this typically necessary in a Python application?
In what situations is it necessary?
In what situations is it not necessary?
If user input is going into a database, or anywhere else it might be executed, then code injection could be a problem.
This question asks about ways to prevent code injection in php, but the principle is the same - SQL queries containing malicious code get executed, potentially doing things like deleting all your data.
The escape function converts <, > and & characters into html-safe sequences.
From those two links it doesn't look like escape() is enough on its own, but something does need to be done to stop malicious code. Of course, this may already be taken care of elsewhere in your code.
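To make the distinction concrete: the right defense depends on where the data ends up. A minimal sketch using Python 3's html.escape (the successor to cgi.escape) for HTML output and sqlite3 parameter binding for the database; the table and the sample value are made up.

```python
import html
import sqlite3

fname = "<script>alert('x')</script>; DROP TABLE users; --"

# Output context: escape when inserting text into HTML.
safe_html = "<p>Hello, " + html.escape(fname) + "</p>"
print(safe_html)

# Database context: let the driver bind the value instead of splicing strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (fname TEXT)")
conn.execute("INSERT INTO users (fname) VALUES (?)", (fname,))
stored = conn.execute("SELECT fname FROM users").fetchone()[0]
print(stored == fname)  # stored verbatim; none of it was executed as SQL
conn.close()
```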
I'm developing a system that operates on (arbitrary) data from databases. The data may need some preprocessing before the system can work with it. To allow the user to specify possibly complex rules, I thought of letting the user input Python code that performs this task. The system is pure Python.
My plan is to expose the tables and columns as variables and let the user do anything Python can do (including access to the standard library). Now to my problem:
How do I take a string (entered by the user), compile it to Python (after adding code to provide the input data) and get the output? I think the easiest way would be to use the user-entered code as the body of a function and take that function's return value as my new data.
Is this possible? If yes, how? It doesn't matter that the user may enter malicious code, since the worst that could happen is that he screws up his own system, which is thankfully not my problem ;)
Python provides exec (a statement in Python 2, a built-in function in Python 3), which should do what you want. Pass in the variables you want available as the second and/or third arguments (globals and locals, respectively), since those control the environment that the code is run in.
For example:
env = {'somevar': 'somevalue'}
exec(code, env)
Alternatively, in Python 2, execfile() can be used in a similar way if the code that you want executed is stored in its own file (in Python 3, read the file and pass its contents to exec()).
If you only have a single expression that you want to execute, you can also use eval.
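A minimal sketch of both forms in Python 3; `user_code` and the variable names are made up. This passes the full builtins through, which only makes sense when, as in the question, the user is allowed to do anything anyway.

```python
user_code = """
total = sum(values)
mean = total / len(values)
"""

# Globals the user code may see; exec() adds the names the code assigns.
env = {"values": [2, 4, 6]}
exec(user_code, env)
print(env["mean"])  # 4.0

# A single expression: eval() returns its value directly.
print(eval("max(values) - min(values)", env))  # 4
```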
Is this possible?
If it doesn't involve time travel, anti-gravity or perpetual motion the answer to this question is always "YES". You don't need to ask that.
The right way to proceed is as follows.
You build a framework with some handy libraries and packages.
You build a few sample applications that implement this requirement: "The data may need some preprocessing before the system can work with it."
You write documentation about how that application imports and uses modules from your framework.
You turn the framework, the sample applications and the documentation over to users to let them build these applications.
Don't waste time on "take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output".
The user should write applications like this.
from your_framework import the_file_loop

def their_function(one_line_as_dict):
    one_line_as_dict['field'] = some_stuff  # whatever the user's rule computes

the_file_loop(their_function)
That can actually be the entire program.
You'll have to write the_file_loop, which will look something like this.
def the_file_loop(some_function):
    # make_a_dictionary() and make_a_line() are framework helpers you write
    with open('input') as source, open('output', 'w') as target:
        for some_line in source:
            the_data = make_a_dictionary(some_line)
            some_function(the_data)
            target.write(make_a_line(the_data))
By creating a framework, and allowing users to write their own programs, you'll be a lot happier with the results. Less magic.
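A runnable variant of the_file_loop, assuming CSV input so that make_a_dictionary and make_a_line can be replaced by csv.DictReader and csv.DictWriter. It takes file-like objects instead of hard-coded file names so it can be exercised with io.StringIO; the column names are made up.

```python
import csv
import io


def the_file_loop(some_function, source, target):
    """Call some_function on each row (as a dict) and write the modified rows out."""
    reader = csv.DictReader(source)
    writer = None
    for row in reader:
        some_function(row)  # the user's code mutates the row in place
        if writer is None:
            # Build the header from the first processed row, including new fields.
            writer = csv.DictWriter(target, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)


# What a user's program might look like.
def their_function(one_line_as_dict):
    one_line_as_dict["total"] = str(int(one_line_as_dict["a"]) + int(one_line_as_dict["b"]))


source = io.StringIO("a,b\n1,2\n3,4\n")
target = io.StringIO()
the_file_loop(their_function, source, target)
print(target.getvalue())
```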
2 choices:
You take his input and put it in a file, then you execute it.
You use exec()
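A minimal sketch of the first choice using the standard runpy module; the temporary file and the convention of binding the output to `result` are assumptions for illustration.

```python
import os
import runpy
import tempfile

user_input = "result = [n * n for n in range(5)]\n"

# Write the user's input to a file, then run that file as a script.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as handle:
    handle.write(user_input)
    path = handle.name

try:
    namespace = runpy.run_path(path)  # returns the script's globals as a dict
finally:
    os.remove(path)

print(namespace["result"])  # [0, 1, 4, 9, 16]
```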
If you just want to set some local values and then provide a python shell, check out the code module.
You can start a shell similar to the Python shell and initialize it with whatever local variables you want. This assumes that whatever functionality needs the resulting values is built into the classes you pass in as locals.
Example:
import code
code.InteractiveConsole({'foo': myVar1, 'bar': myVar2}).interact()
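Instead of handing the user an interactive prompt with interact(), the same console can be driven from code with push(), which makes it easy to check what a pre-seeded session sees. A minimal sketch; the seed values 20 and 22 stand in for myVar1 and myVar2.

```python
import code

# Seed the console's namespace, as in the example above.
console = code.InteractiveConsole(locals={"foo": 20, "bar": 22})

# push() feeds one line at a time, as if typed at the prompt;
# it returns True while a multi-line statement is still incomplete.
console.push("total = foo + bar")
console.push("if total > 40:")
console.push("    flag = 'big'")
console.push("")  # a blank line ends the block, which then runs

print(console.locals["total"], console.locals["flag"])  # 42 big
```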
What you actually want is exec, since eval is limited to taking an expression and returning a value. With exec, you can have code blocks (statements) and work on arbitrarily complex data, passed in as the globals and locals of the code.
The result is then returned by the code via some convention (like binding it to result).
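A minimal sketch of that convention, compiling the source once with the built-in compile() so syntax errors surface before anything runs; `run_user_block` and the name `result` are illustrative, not a standard API.

```python
def run_user_block(source, inputs):
    """Run a block of user code and return whatever it binds to `result`."""
    # compile() first: syntax errors surface here, and the filename
    # argument labels tracebacks that point into user code.
    code_object = compile(source, "<user rules>", "exec")
    namespace = dict(inputs)
    exec(code_object, namespace)
    if "result" not in namespace:
        raise ValueError("user code must assign a variable named 'result'")
    return namespace["result"]


rules = """
cleaned = [row.strip().lower() for row in column]
result = [value for value in cleaned if value]
"""
print(run_user_block(rules, {"column": ["  Foo", "", "BAR "]}))  # ['foo', 'bar']
```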
Well, you're describing compile().
But... I think I'd still implement this using regular python source files. Add a special location to the path, say '~/.myapp/plugins', and just __import__ everything there. Probably you'll want to provide some convenient base classes that expose the interface you're trying to offer, so that your users can inherit from them.
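A minimal sketch of the plugin-directory idea using importlib instead of sys.path plus __import__; `load_plugins` and the `transform` hook are hypothetical names, and a temporary directory stands in for ~/.myapp/plugins.

```python
import importlib.util
import pathlib
import tempfile


def load_plugins(plugin_dir):
    """Import every .py file in plugin_dir and return the modules by name."""
    modules = {}
    for path in sorted(pathlib.Path(plugin_dir).glob("*.py")):
        spec = importlib.util.spec_from_file_location(path.stem, str(path))
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        modules[path.stem] = module
    return modules


# Demo with a throwaway directory standing in for ~/.myapp/plugins.
with tempfile.TemporaryDirectory() as plugin_dir:
    pathlib.Path(plugin_dir, "double.py").write_text("def transform(x):\n    return 2 * x\n")
    plugins = load_plugins(plugin_dir)
    print(plugins["double"].transform(21))  # 42
```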
I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals, etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice, but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year or so ago in which people broke out using everything from catching exceptions and poking at internal state to bytecode manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
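A minimal sketch of approach 2 for exactly those requirements (variables, basic conditionals, calls to exposed functions), assuming Python 3.8+. The node whitelist, `compile_game_script` and `give_gold` are illustrative choices. It rejects attribute access, which closes the `__subclasses__` route, but it does not bound memory use (e.g. multiplying a string by a huge number).

```python
import ast

# Allowed: assignments, names, literals, comparisons, boolean logic,
# if-statements and calls. No def, import, loops, attributes or subscripts.
ALLOWED_NODES = (
    ast.Module, ast.Expr, ast.Assign, ast.If, ast.IfExp,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult,
    ast.Call,
)


def compile_game_script(source, allowed_functions):
    """Reject anything outside the whitelist, then compile the rest."""
    tree = ast.parse(source, mode="exec")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError("not allowed: " + type(node).__name__)
        if isinstance(node, ast.Name) and node.id.startswith("_"):
            raise ValueError("names starting with '_' are reserved")
        # Calls must target a plain name from the exposed function table.
        if isinstance(node, ast.Call) and not (
            isinstance(node.func, ast.Name) and node.func.id in allowed_functions
        ):
            raise ValueError("call to a function that is not exposed")
    return compile(tree, "<game script>", "exec")


ledger = []


def give_gold(amount):
    ledger.append(amount)


api = {"give_gold": give_gold}
script = "level = 5\nif level > 3:\n    give_gold(level * 10)\n"
exec(compile_game_script(script, api), {"__builtins__": {}, **api})
print(ledger)  # [50]
```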
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and ((event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from code supplied to exec(). Python modules like numpy or pickle eventually go through Python's own file access, so they are banned from writing to disk, too. External program calls have been explicitly disabled as well.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt to remove the lock from within exec() seems to be futile, since the audit hook uses a different copy of locals that is not accessible to the code run by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within CPython has proven extremely hard, and many previous attempts have failed. This approach is also not entirely secure, e.g. for public web access:
Hypothetical compiled modules that make direct OS calls may not be audited by CPython, so whitelisting safe pure-Python modules is recommended.
There is definitely still the possibility of crashing or overloading the CPython interpreter.
Maybe some loopholes for writing files to the hard drive remain, too, but I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of the Python ecosystem reduces to a rather narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and ((event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
    try:
        raise Exception()
    except:
        del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
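A minimal sketch of the process-separation half, assuming the child would additionally be confined by OS-level means (a separate low-privilege user, a container, or similar), which this code does not set up. `run_isolated` is a hypothetical helper: a segfault or infinite loop in the child no longer takes down the host, but nothing here stops the child from writing files.

```python
import subprocess
import sys


def run_isolated(source, timeout=2.0):
    """Run untrusted source in a separate interpreter and return its stdout.

    This isolates crashes and hangs from the host process only; the child
    still has the same OS permissions as the parent.
    """
    completed = subprocess.run(
        [sys.executable, "-I", "-c", source],  # -I: isolated mode, ignores env vars and user site
        capture_output=True,
        text=True,
        timeout=timeout,  # kills the child and raises subprocess.TimeoutExpired
    )
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip())
    return completed.stdout


print(run_isolated("print(sum(range(10)))"))  # 45
```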
AFAIK it is possible to run code in a completely isolated environment:
exec(somePythonCode, {'__builtins__': {}}, {})
But in such an environment you can do almost nothing :) (you cannot even import a module), though a malicious user can still cause infinite recursion or make the process run out of memory. You would probably want to add some modules that act as the interface to your game engine.
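A minimal sketch in Python 3 of exposing a game-engine interface through that globals dict; `heal` is a made-up hook. Attribute tricks such as `().__class__.__bases__[0].__subclasses__()` still work in this environment, so it limits convenience rather than providing security.

```python
ledger = []


def heal(amount):
    """Hypothetical game-engine hook exposed to scripts."""
    ledger.append(("heal", amount))


# Only the names in this dict are visible to the script.
script_globals = {"__builtins__": {}, "heal": heal}
exec("heal(5)", script_globals, {})

# With no builtins there is no __import__, so import statements fail.
try:
    exec("import os", {"__builtins__": {}}, {})
    import_blocked = False
except ImportError:
    import_blocked = True
print("import blocked:", import_blocked)
```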
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure exactly how Python Scripts are implemented, but the feature has been around since about 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two-server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example, if you allowed the end user to create items, you would have a lookup that calls the server with the code to execute and the set of parameters.
Here's an abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
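A minimal sketch of the privileged side, assuming the script server replies with JSON shaped roughly like the abstract example above (the `name` key, the hp limit and the player structure are made up). The untrusted server only proposes effects; the privileged server checks them against its own whitelist before changing any state.

```python
import json

# Effects the privileged server is willing to apply.
ALLOWED_ACTIONS = {"destroy_inventory_item"}
MAX_HP_CHANGE = 100


def apply_response(raw_response, player):
    """Validate a reply from the untrusted script server, then apply it."""
    reply = json.loads(raw_response)

    # Validate everything first so a bad reply changes nothing.
    hp_change = int(reply.get("hp", "0"))  # "+5" -> 5; non-numeric text raises ValueError
    if abs(hp_change) > MAX_HP_CHANGE:
        raise ValueError("hp change out of range")
    action = reply.get("action")
    if action is not None and action.get("name") not in ALLOWED_ACTIONS:
        raise ValueError("action not allowed")

    player["hp"] += hp_change
    if action is not None:
        player["inventory"].discard(action.get("inventory_id"))
    return player


player = {"hp": 10, "inventory": {"1234", "5678"}}
response = '{"hp": "+5", "action": {"name": "destroy_inventory_item", "inventory_id": "1234"}}'
print(apply_response(response, player))
```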
Hmm. This is a thought experiment; I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers (variables, method names, etc., and also hasattr/getattr/setattr invocations and so on) with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself is not invoked, along with other blacklisted things such as opening files. You then emit the Python code for this and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this parallels how forum software and the like try to whitelist 'safe' JavaScript or HTML, etc., and they historically have a bad record of catching all the escapes. But you might have more luck with Python :)
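In Python 3 the ast module can play the role of the compiler package here: ast.NodeTransformer rewrites the tree, and compile() accepts the resulting AST directly. A minimal sketch of the renaming step only; `PrefixNames` and the `user_` prefix are made up, and only plain variable names are handled (function and argument names, attributes and getattr-style access would each need their own handling, which is where this approach tends to leak).

```python
import ast

PREFIX = "user_"


class PrefixNames(ast.NodeTransformer):
    """Rename every plain variable so user code cannot refer to host names."""

    def visit_Name(self, node):
        node.id = PREFIX + node.id
        return node


source = "x = 2\ny = x * 21\n"
tree = PrefixNames().visit(ast.parse(source))

namespace = {"__builtins__": {}}
exec(compile(tree, "<script>", "exec"), namespace)
print(namespace["user_y"])  # 42
```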
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.
I'm setting up a web application to use IronPython for scripting various user actions, and I'll be exposing various business objects for the script to access. I want to make it impossible for the user to import the CLR or other assemblies, in order to keep the script's capabilities simple and restricted to the functionality I expose in my business objects.
How do I prevent the CLR and other assemblies/modules from being imported?
This would prevent imports of both python modules and .Net objects so may not be what you want. (I'm relatively new to Python so I might be missing some things as well):
Set up the environment.
Import anything you need the user to have access to.
Either prepend to their script or execute:
__builtins__.__import__ = None  # Stops imports working
reload = None  # Stops reloading working (specifically stops them reloading builtins,
               # giving back an unbroken __import__!)
then execute their script.
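Rather than setting `__import__` to None globally (which also breaks imports for the host application), an alternative is to hand the script its own builtins dictionary with a whitelisting `__import__`. A minimal sketch written for CPython 3; `ALLOWED_MODULES` and `restricted_import` are hypothetical, it has not been tried under IronPython (where the module is called `__builtin__`), and it only restricts imports, leaving open() and introspection tricks untouched.

```python
import builtins

ALLOWED_MODULES = {"math"}


def restricted_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Stand-in for __import__ that only admits whitelisted top-level modules."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError("import of %r is not allowed" % name)
    return builtins.__import__(name, globals, locals, fromlist, level)


# A per-script builtins dict, so the host's own __import__ is untouched.
script_builtins = dict(vars(builtins), __import__=restricted_import)

namespace = {"__builtins__": script_builtins}
exec("import math\nresult = math.sqrt(16)", namespace)
print(namespace["result"])  # 4.0
```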
You'll have to search the script for the imports you don't want them to use, and reject the script in toto if it contains any of them.
Basically, just reject the script if it contains Assembly.Load, import or AddReference.
You might want to implement the protection using Microsoft's Code Access Security. I'm not fully aware of its workings myself (or how to make it work with IPy), but it's something I feel you should consider.
There's a discussion thread on the IPy mailing list which you might want to look at. The question asked is similar to yours.
If you'd like to disable certain built-in modules I'd suggest filing a feature request over at ironpython.codeplex.com. This should be an easy enough thing to implement.
Otherwise, you could modify Importer.cs to disallow the import there, or simply delete ClrModule.cs from IronPython and rebuild (and potentially remove any references to it).
In case anyone still comes across this thread from Google (like I did):
I managed to disable 'import clr' in Python scripts by commenting out the line
//[assembly: PythonModule("clr", typeof(IronPython.Runtime.ClrModule))]
in ClrModule.cs, but I'm not convinced this is a full solution to preventing unwanted access, since you will still need to override things like the file builtin.