I've seen this multiple times in multiple places, but have never found a satisfying explanation as to why this should be the case.
So, hopefully, one will be presented here. Why should we (at least, generally) not use exec() and eval()?
EDIT: I see that people are assuming that this question pertains to web servers – it doesn't. I can see why an unsanitized string being passed to exec could be bad. Is it bad in non-web-applications?
There are often clearer, more direct ways to get the same effect. If you build a complex string and pass it to exec, the code is difficult to follow, and difficult to test.
Example: I wrote code that read in string keys and values and set corresponding fields in an object. It looked like this:
for key, val in values:
    fieldName = valueToFieldName[key]
    fieldType = fieldNameToType[fieldName]
    if fieldType is int:
        # Interpolate the value (not the type) into the generated assignment
        s = 'object.%s = int(%s)' % (fieldName, val)
    # Many clauses like this...
    exec(s)
That code isn't too terrible for simple cases, but as new types cropped up it got more and more complex. When there were bugs they always triggered on the call to exec, so stack traces didn't help me find them. Eventually I switched to a slightly longer, less clever version that set each field explicitly.
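For comparison, here is a minimal sketch of one way to get the same effect without exec, using setattr and a table of converter callables. This is not necessarily the version the author ended up with; Record, FIELD_FOR_KEY, CONVERTER_FOR_FIELD and apply_values are names invented for illustration.

```python
class Record:
    """Hypothetical target object whose fields get filled in."""
    def __init__(self):
        self.age = None
        self.name = None

# Hypothetical lookup tables, standing in for valueToFieldName and
# fieldNameToType in the snippet above.
FIELD_FOR_KEY = {"a": "age", "n": "name"}
CONVERTER_FOR_FIELD = {"age": int, "name": str}

def apply_values(obj, values):
    """Set each field with setattr; no source code is generated."""
    for key, val in values:
        field_name = FIELD_FOR_KEY[key]
        convert = CONVERTER_FOR_FIELD[field_name]
        # A bad value raises on this line, so the traceback points here
        # rather than at an opaque exec() call.
        setattr(obj, field_name, convert(val))
    return obj

record = apply_values(Record(), [("a", "42"), ("n", "Ada")])
```

Supporting a new type then means adding one entry to the converter table instead of another string-building clause.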
The first rule of code clarity is that each line of your code should be easy to understand by looking only at the lines near it. This is why goto and global variables are discouraged. exec and eval make it easy to break this rule badly.
When you need exec and eval, yeah, you really do need them.
But, the majority of the in-the-wild usage of these functions (and the similar constructs in other scripting languages) is totally inappropriate and could be replaced with other simpler constructs that are faster, more secure and have fewer bugs.
You can, with proper escaping and filtering, use exec and eval safely. But the kind of coder who goes straight for exec/eval to solve a problem (because they don't understand the other facilities the language makes available) isn't the kind of coder that's going to be able to get that processing right; it's going to be someone who doesn't understand string processing and just blindly concatenates substrings, resulting in fragile insecure code.
It's the Lure Of Strings. Throwing string segments around looks easy and fools naïve coders into thinking they understand what they're doing. But experience shows the results are almost always wrong in some corner (or not-so-corner) case, often with potential security implications. This is why we say eval is evil. This is why we say regex-for-HTML is evil. This is why we push SQL parameterisation. Yes, you can get all these things right with manual string processing... but unless you already understand why we say those things, chances are you won't.
eval() and exec() can promote lazy programming. More importantly, their use indicates that the code being executed may not have been written at design time and therefore has not been tested. In other words, how do you test dynamically generated code? Especially across browsers.
Security aside, eval and exec are often marked as undesirable because of the complexity they induce. When you see an eval call you often don't know what's really going on behind it, because it acts on data that's usually in a variable. This makes code harder to read.
Invoking the full power of the interpreter is a heavy weapon that should be only reserved for very tricky cases. In most cases, however, it's best avoided and simpler tools should be employed.
That said, like all generalizations, be wary of this one. In some cases, exec and eval can be valuable. But you must have a very good reason to use them. See this post for one acceptable use.
In contrast to what most answers are saying here, exec is actually part of the recipe for building super-complete decorators in Python, as you can duplicate everything about the decorated function exactly, producing the same signature for the purposes of documentation and such. It's key to the functionality of the widely used decorator module (http://pypi.python.org/pypi/decorator/). Other cases where exec/eval are essential include constructing any kind of "interpreted Python" type of application, such as a Python-parsed template language (like Mako or Jinja).
So it's not as if the presence of these functions is an immediate sign of an "insecure" application or library. Using them in the naive javascripty way to evaluate incoming JSON or something, yes, that's very insecure. But as always, it's all in the way you use them, and these are very essential functions.
I have used eval() in the past (and still do from time-to-time) for massaging data during quick and dirty operations. It is part of the toolkit that can be used for getting a job done, but should NEVER be used for anything you plan to use in production such as any command-line tools or scripts, because of all the reasons mentioned in the other answers.
You cannot trust your users--ever--to do the right thing. In most cases they will, but you have to expect them to do all of the things you never thought of and find all of the bugs you never expected. This is precisely where eval() goes from being a tool to a liability.
A perfect example of this is constructing a QuerySet in Django. A query accepts keyword arguments that look something like this:
results = Foo.objects.filter(whatever__contains='pizza')
If you're programmatically assigning arguments, you might think to do something like this:
results = eval("Foo.objects.filter(%s__%s=%s)" % (field, matcher, value))
But there is always a better way that doesn't use eval(), which is unpacking a dictionary into keyword arguments:
results = Foo.objects.filter(**{'%s__%s' % (field, matcher): value})
By doing it this way, it's not only faster performance-wise, but also safer and more Pythonic.
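To see why the dictionary form is safer, here is a small sketch that runs without Django. filter_stub is a made-up stand-in for a QuerySet.filter-style call and simply returns the keyword arguments it received; build_lookup is likewise a hypothetical helper.

```python
def filter_stub(**lookups):
    """Stand-in for a filter()-style call; just returns its kwargs."""
    return lookups

def build_lookup(field, matcher, value):
    # Only the keyword *name* is assembled from strings. The value is
    # passed through as a Python object and is never parsed as code.
    return {"%s__%s" % (field, matcher): value}

lookups = filter_stub(**build_lookup("whatever", "contains", "pizza"))

# A hostile-looking value stays an inert string.
hostile = filter_stub(**build_lookup("name", "contains", "__import__('os')"))
```

With the eval() version, that same hostile value would have been spliced into source code and executed.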
Moral of the story?
Use of eval() is ok for small tasks, tests, and truly temporary things, but bad for permanent usage because there is almost certainly always a better way to do it!
Allowing these functions in a context where they might run user input is a security issue, and sanitizers that actually work are hard to write.
Same reason you shouldn't login as root: it's too easy to shoot yourself in the foot.
Don't try to do the following on your computer:
# eval() accepts only an expression, so the import goes through __import__
s = "__import__('shutil').rmtree('/nonexisting')"
eval(s)
Now assume somebody can control s from a web application, for example.
Reason #1: One security flaw (i.e. a programming error... and we can't claim those can be avoided) and you've just given the user access to the shell of the server.
Try this in the interactive interpreter and see what happens:
>>> import sys
>>> eval('{"name" : %s}' % ("sys.exit(1)"))
Of course, this is a corner case, but it can be tricky to prevent things like this.
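When the input is only supposed to be a literal (a dict, list, number or string), the standard library's ast.literal_eval is one way to avoid this. The sketch below wraps it in a hypothetical helper, parse_literal, that returns None for anything that is not a plain literal.

```python
import ast

def parse_literal(text):
    """Parse a Python literal; return None if text is anything else."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Function calls, attribute access, names and so on are rejected
        # instead of being executed.
        return None

good = parse_literal('{"name": "pizza"}')
bad = parse_literal('{"name": sys.exit(1)}')
```

Here the sys.exit(1) call is never run; the string is rejected while it is still just a syntax tree.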
How dangerous is using eval in an in-house desktop application? I understand the problem in a web app. Is it really a problem in a desktop thick-client application?
We have a scenario where we allow users to create queries using an in-house DSL and dynamically compile them into Python code using eval.
As the comment said, it depends on what you mean by "safe". From a security standpoint eval is the end of all hope; once you have it, there is no going back, the user can do anything he wants.
Consider for example
eval('(lambda fc=(lambda n: [c for c in ().__class__.__bases__[0].__subclasses__() if c.__name__ == n][0]): fc("function")(fc("code")(0,0,0,0,"KABOOM",(), (),(),"","",0,""),{})())()')
which will segfault CPython 2 (see? no hands!). It could also have overwritten your OS with cat pictures, or solved P vs. NP and turned your PC into a black hole. The point is that once you allow user-supplied input to reach eval(), you are in danger. Don't even bother trying to correctly escape user-supplied input.
So, first up, a Gist (in case the text below is not clear) - https://gist.github.com/chozabu/86b60caa0ce211f232da
Basically, it seems fairly simple to let any client pass a few dicts to my server (for filters, excludes and sorts) and have loads of my API done with a tiny amount of code that also supports future complex queries I have not thought of!
A client can ask for posts that have tag X but not tag Y within a date range and a rating greater than Z, or nearly anything else (except statistical aggregation).
A query dict can look like:
{
    'filters': [{'post__stats__score__gte': 0.3}],
    'sort_by': 'post__author__created_at'
}
My concern is that a client could abuse this, to filter for only people with a certain email, pw-hash or, again, something I have not thought of.
Do you think it is practical to make something like this secure, with careful use of black/white-listing? And perhaps altering the query server-side to exclude any data a client should not be viewing?
FWIW, My current plan is to build a system with django-rest-framework, adding something like this as an extra option only if needed, and I can find a way to make it secure.
At the bottom of DRF's filtering page: http://www.django-rest-framework.org/api-guide/filtering/
There are some 3rd party extensions like
https://github.com/miki725/django-url-filter
https://github.com/philipn/django-rest-framework-filters
which seem to be extensions that meet this exact need (I'm amazed neither is in core DRF!)
edit: Alasdair commented above, about similar functionality being removed from django-admin in version 1.2.4, and use of a whitelist to ensure security
So, I guess the answer to my original question is really "Yes, as long as you know what you are doing, and are careful - probably using a whitelist rather than a blacklist"
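A rough sketch of what such a whitelist check could look like, before the dicts ever reach the ORM. The allowed lookup names and the validate_query helper are assumptions made up for this example; a real list would depend on your models.

```python
# Hypothetical whitelist: the only lookups a client may filter or sort on.
ALLOWED_FILTERS = {"post__stats__score__gte", "post__tags__name"}
ALLOWED_SORTS = {"post__author__created_at", "post__stats__score"}

def validate_query(query):
    """Return a cleaned copy of the client's query dict, or raise ValueError."""
    cleaned = {"filters": [], "sort_by": None}
    for clause in query.get("filters", []):
        for lookup in clause:
            if lookup not in ALLOWED_FILTERS:
                raise ValueError("filter not allowed: %r" % lookup)
        cleaned["filters"].append(dict(clause))
    sort_by = query.get("sort_by")
    if sort_by is not None:
        # A single leading "-" means descending order in Django's order_by.
        field = sort_by[1:] if sort_by.startswith("-") else sort_by
        if field not in ALLOWED_SORTS:
            raise ValueError("sort not allowed: %r" % sort_by)
        cleaned["sort_by"] = sort_by
    return cleaned

ok = validate_query({
    "filters": [{"post__stats__score__gte": 0.3}],
    "sort_by": "-post__author__created_at",
})
```

Anything not named in the whitelist, such as a lookup that walks to a password hash, is rejected by default. That is the advantage over a blacklist, which only blocks what you thought of in advance.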
I was looking over a developer's code. He did something that I have never seen before in a Python application. His background is in PHP and he is just learning Python, so I don't know if this is perhaps a holdover from the different system architectures that he is used to working with.
He told me that the purpose of this code is to prevent the user from attacking the application via code insertion. I'm pretty sure this is unnecessary for our use case since we are never evaluating the data as code, but I just wanted to make sure and ask the community.
# Import library
from cgi import escape
# Get information that the client submitted
fname = GET_request.get('fname', [''] )[0]
# Make sure client did not submit malicious code <- IS THIS NECESSARY?
if fname:
    fname = escape(fname)
Is this typically necessary in a Python application?
In what situations is it necessary?
In what situations is it not necessary?
If user input is going into a database, or anywhere else it might be executed, then code injection could be a problem.
This question asks about ways to prevent code injection in PHP, but the principle is the same - SQL queries containing malicious code get executed, potentially doing things like deleting all your data.
The escape function converts <, > and & characters into html-safe sequences.
From those two links it doesn't look like escape() is enough on its own, but something does need to be done to stop malicious code. Of course this may well be being taken care of elsewhere in your code.
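As a hedged illustration of the two separate concerns: SQL injection is usually handled with parameterised queries, while HTML escaping belongs at output time. The sketch below uses an in-memory sqlite3 database and html.escape (cgi.escape was removed in Python 3.8, and html.escape is the standard-library replacement). The table name and sample values are invented for the example.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (fname TEXT)")

fname = "Robert'); DROP TABLE people;--"

# The ? placeholder sends fname as data; it is never spliced into the SQL text.
conn.execute("INSERT INTO people (fname) VALUES (?)", (fname,))
stored = conn.execute("SELECT fname FROM people").fetchone()[0]

# Escaping is applied when the value is placed into HTML, not when it is read.
page_fragment = "<p>Hello, %s</p>" % html.escape("<b>Bob</b> & co")
```

The hostile string is stored verbatim and the table survives, because the database driver never interprets it as SQL.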
I was wondering if there's a pythonic equivalent of the RequestDispatcher.forward(request, response) that I'm used to from Java servlet programming? It's a common enough technique in Java, and enables you to do, say, a little preprocessing of a particular type of request, and then hand over to another url handler. This all happens inside the server, which is an enormous time saver.
The nearest thing I can see in the GAE/Python documentation is RequestHandler.redirect(), but that's hopeless. For one thing, there's an extra round trip to the browser. For another, there's no guarantee the redirect will actually be followed once it's out of my hands, which makes me a little twitchy. (Semantically it's just wrong too, since a redirect implies that the original resource may be unavailable or have moved, which ain't the case.)
There's something that sounds tantalisingly close in webapp2, described at http://webapp-improved.appspot.com/api/webapp2.html#webapp2.RedirectHandler; but from what I can guess from the rather sketchy documentation, it's just for doing a kind of url-rewriting, which is not at all what I want.
I don't know about GAE, but in Django you can do something like this:
def view2(request):
    do_some_stuff()

def view1(request):
    do_some_stuff()
    return view2(request)  # <-- note this line
In GAE maybe you can do this:
return View.action.__func__(self, ...)
(though this, depending on the function, might screw things up, if, for example, View.action uses self.something_which_should_be_here_but_isnt_because_self_is_some_other_thing)
The easiest way to do this is simply to refactor the relevant code into a function that you call from both handlers, or to put the functionality in a base class both handlers extend. Which is more suitable depends on your app.
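A minimal sketch of the shared-function approach, using plain classes rather than real webapp2 handlers so that it runs on its own. render_report, ReportHandler and LegacyReportHandler are hypothetical names.

```python
def render_report(user_id):
    """Shared logic that both handlers need (hypothetical)."""
    return "report for user %d" % user_id

class ReportHandler:
    def get(self, user_id):
        return render_report(user_id)

class LegacyReportHandler:
    def get(self, raw_id):
        # Preprocess the request, then hand over in-process: no redirect
        # and no extra round trip to the browser.
        user_id = int(raw_id.strip())
        return render_report(user_id)

direct = ReportHandler().get(7)
forwarded = LegacyReportHandler().get(" 7 ")
```

Both handlers produce the same response, and the "forward" is just an ordinary function call inside the server.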