Was looking over a developer's code. He did something that I have never seen before in a Python application. His background is in PHP and is just learning python, so I don't know if this is perhaps a holdover from the different system architectures that he is used to working with.
He told me that the purpose of this code is to prevent the user from attacking the application via code insertion. I'm pretty sure this is unnecessary for our use case since we are never evaluating the data as code, but I just wanted to make sure and ask the community.
# Import library
from cgi import escape
# Get information that the client submitted
fname = GET_request.get('fname', [''] )[0]
# Make sure client did not submit malicious code <- IS THIS NECESSARY?
if fname:
fname = escape(fname)
Is this typically necessary in a Python application?
In what situations is it necessary?
In what situations is it not necessary?
If user input is going into a database, or anywhere else it might be executed, then code injection could be a problem.
This question asks about ways to prevent code injection in php, but the principle is the same - SQL queries containing malicious code get executed, potentially doing things like deleting all your data.
The escape function converts <, > and & characters into html-safe sequences.
From those two links it doesn't look like escape() is enough on it's own, but something does need to be done to stop malicious code. Of course this may well be being taken care of elsewhere in your code.
Related
How dangerous is using eval in an in-house desktop application. I understand the problem in a web app. is it really a problem in a desktop thick client application.
We have a scenario we we allow users to create queries using an in-house DSL and dynamically compiling into python code using eval
As the comment said, it depends on what you mean by "safe". From a security standpoint eval is the end of all hope; once you have it, there is no going back, the user can do anything he wants.
Consider for example
eval('(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__() if c.__name__ == n][0]): fc("function")(fc("code")(0,0,0,0,"KABOOM",(), (),(),"","",0,""),{})())()')
which will segfault CPython2 (see? no hands!). It could also have overwritten your OS with cat-pictures or solve NP vs. P and turn your PC into a black hole. The point being that once you may allow user-supplied input to get into eval(), you are in danger. Don't even bother trying to correctly escape user-supplied input.
I was just toying around with an idea, and I couldn't think of a way to resolve this in the backend without daunting security issues.
Say I want to give users the opportunity to create simple algorithms via a webservice and test these over small lists, e.g. range(0, 5) then report back the results back via another webservice, a template, or an email, doesn't really matter, it's the evaluation that bothers me.
Using python:
class Algorithm(whatever):
function = whatever.CharField(max_length=75)
A user might enter something like:
'f(x)=x+(x/5)**0.75'
Of course I could use eval, stripping any built-ins, strings other than "x" etc., but this would still likely be unfortunate practice.
The only thing I could come up with is to move any evaluation functionality to a JavaScript front end.
eval() is evil, and doing this on the backend is very dangerous. However it could be done safely using a python sandbox. But if the sandbox where to fail, you would get owned. Which is not a very "defense in depth" approach.
A better approach would be to evaluate the payload on the client side... However this is Cross-Site Scripting (XSS). One way to prevent an attacker from being able to exploit this issue, is have an event handler on the client side that evaluates the form when a button is pressed. If the attacker can build a GET or POST request that executes JavaScript then he can exploit the XSS vulnerability. Also make sure to set x-frame-options: deny to prevent clickjacking.
I'm developing a system that operates on (arbitrary) data from databases. The data may need some preprocessing before the system can work with it. To allow the user the specify possibly complex rules I though of giving the user the possibility to input Python code which is used to do this task. The system is pure Python.
My plan is to introduce the tables and columns as variables and let the user to anything Python can do (including access to the standard libs). Now to my problem:
How do I take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output. I think the easiest way would be to use the user-entered data a the body of a method and take the return value of that function a my new data.
Is this possible? If yes, how? It's unimportant that the user may enter malicious code since the worst thing that could happen is, that he screws up his own system, which is thankfully not my problem ;)
Python provides an exec() statement which should do what you want. You will want to pass in the variables that you want available as the second and/or third arguments to the function (globals and locals respectively) as those control the environment that the exec is run in.
For example:
env = {'somevar': 'somevalue'}
exec(code, env)
Alternatively, execfile() can be used in a similar way, if the code that you want executed is stored in its own file.
If you only have a single expression that you want to execute, you can also use eval.
Is this possible?
If it doesn't involve time travel, anti-gravity or perpetual motion the answer to this question is always "YES". You don't need to ask that.
The right way to proceed is as follows.
You build a framework with some handy libraries and packages.
You build a few sample applications that implement this requirement: "The data may need some preprocessing before the system can work with it."
You write documentation about how that application imports and uses modules from your framework.
You turn the framework, the sample applications and the documentation over to users to let them build these applications.
Don't waste time on "take a string (the user entered), compile it to Python (after adding code to provide the input data) and get the output".
The user should write applications like this.
from your_framework import the_file_loop
def their_function( one_line_as_dict ):
one_line_as_dict['field']= some stuff
the_file_loop( their_function )
That can actually be the entire program.
You'll have to write the_file_loop, which will look something like this.
def the_file_loop( some_function ):
with open('input') as source:
with open('output') as target:
for some_line in source:
the_data = make_a_dictionary( some_line )
some_function( the_data )
target.write( make_a_line( the_data ) )
By creating a framework, and allowing users to write their own programs, you'll be a lot happier with the results. Less magic.
2 choices:
You take his input and put it in a file, then you execute it.
You use exec()
If you just want to set some local values and then provide a python shell, check out the code module.
You can start an instance of a shell that is similar to the python shell, as well as initialize it with whatever local variables you want. This would assume that whatever functionality you want to use the resulting values is built into the classes you are passing in as locals.
Example:
shell = code.InteractiveConsole({'foo': myVar1, 'bar': myVar2})
What you actually want is exec, since eval is limited to taking an expression and returning a value. With exec, you can have code blocks (statements) and work on arbitrarily complex data, passed in as the globals and locals of the code.
The result is then returned by the code via some convention (like binding it to result).
well, you're describing compile()
But... I think I'd still implement this using regular python source files. Add a special location to the path, say '~/.myapp/plugins', and just __import__ everything there. Probably you'll want to provide some convenient base classes that expose the interface you're trying to offer, so that your users can inherit from them.
I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.
One of my favorite features about python is that you can write configuration files in python that are very simple to read and understand. If you put a few boundaries on yourself, you can be pretty confident that non-pythonistas will know exactly what you mean and will be perfectly capable of reconfiguring your program.
My question is, what exactly are those boundaries? My own personal heuristic was
Avoid flow control. No functions, loops, or conditionals. Those wouldn't be in a text config file and people aren't expecting to have understand them. In general, it probably shouldn't matter the order in which your statements execute.
Stick to literal assignments. Methods and functions called on objects are harder to think through. Anything implicit is going to be a mess. If there's something complicated that has to happen with your parameters, change how they're interpreted.
Language keywords and error handling are right out.
I guess I ask this because I came across a situation with my Django config file where it seems to be useful to break these rules. I happen to like it, but I feel a little guilty. Basically, my project is deployed through svn checkouts to a couple different servers that won't all be configured the same (some will share a database, some won't, for example). So, I throw a hook at the end:
try:
from settings_overrides import *
LOCALIZED = True
except ImportError:
LOCALIZED = False
where settings_overrides is on the python path but outside the working copy. What do you think, either about this example, or about python config boundaries in general?
There is a Django wiki page, which addresses exactly the thing you're asking.
http://code.djangoproject.com/wiki/SplitSettings
Do not reinvent the wheel. Use configparser and INI files. Python files are to easy to break by someone, who doesn't know Python.
Your heuristics are good. Rules are made so that boundaries are set and only broken when it's obviously a vastly better solution than the alternate.
Still, I can't help but wonder that the site checking code should be in the parser, and an additional configuration item added that selects which option should be taken.
I don't think that in this case the alternative is so bad that breaking the rules makes sense...
-Adam
I think it's a pain vs pleasure argument.
It's not wrong to put code in a Python config file because it's all valid Python, but it does mean you could confuse a user who comes in to reconfigure an app. If you're that worried about it, rope it off with comments explaining roughly what it does and that the user shouldn't edit it, rather edit the settings_overrides.py file.
As for your example, that's nigh on essential for developers to test then deploy their apps. Definitely more pleasure than pain. But you should really do this instead:
LOCALIZED = False
try:
from settings_overrides import *
except ImportError:
pass
And in your settings_overrides.py file:
LOCALIZED = True
... If nothing but to make it clear what that file does.. What you're doing there splits overrides into two places.
As a general practice, see the other answers on the page; it all depends. Specifically for Django, however, I see nothing fundamentally wrong with writing code in the settings.py file... after all, the settings file IS code :-)
The Django docs on settings themselves say:
A settings file is just a Python module with module-level variables.
And give the example:
assign settings dynamically using normal Python syntax. For example:
MY_SETTING = [str(i) for i in range(30)]
Settings as code is also a security risk. You import your "config", but in reality you are executing whatever code is in that file. Put config in files that you parse first and you can reject nonsensical or malicious values, even if it is more work for you. I blogged about this in December 2008.