Why is exec() dangerous? [duplicate] - python

I've seen this multiple times in multiple places, but never have found a satisfying explanation as to why this should be the case.
So, hopefully, one will be presented here. Why should we (at least, generally) not use exec() and eval()?
EDIT: I see that people are assuming that this question pertains to web servers – it doesn't. I can see why an unsanitized string being passed to exec could be bad. Is it bad in non-web-applications?

There are often clearer, more direct ways to get the same effect. If you build a complex string and pass it to exec, the code is difficult to follow, and difficult to test.
Example: I wrote code that read in string keys and values and set corresponding fields in an object. It looked like this:
for key, val in values:
    fieldName = valueToFieldName[key]
    fieldType = fieldNameToType[fieldName]
    if fieldType is int:
        # Interpolate the value, not the type, into the generated statement.
        s = 'object.%s = int(%s)' % (fieldName, val)
    #Many clauses like this...
    exec(s)
That code isn't too terrible for simple cases, but as new types cropped up it got more and more complex. When there were bugs they always triggered on the call to exec, so stack traces didn't help me find them. Eventually I switched to a slightly longer, less clever version that set each field explicitly.
The first rule of code clarity is that each line of your code should be easy to understand by looking only at the lines near it. This is why goto and global variables are discouraged. exec and eval make it easy to break this rule badly.
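For comparison, here is a minimal sketch of the same job without exec, using setattr. This is not necessarily the version the author ended up with; the Record class and the contents of the two lookup tables are made up for illustration.

```python
# Hypothetical lookup tables standing in for valueToFieldName and
# fieldNameToType; the Record class is also made up for illustration.
VALUE_TO_FIELD_NAME = {"n": "count", "t": "title", "r": "ratio"}
FIELD_NAME_TO_TYPE = {"count": int, "title": str, "ratio": float}


class Record:
    pass


def set_fields(obj, values):
    """Convert each raw string and store it on obj without building code strings."""
    for key, val in values:
        field_name = VALUE_TO_FIELD_NAME[key]
        field_type = FIELD_NAME_TO_TYPE[field_name]
        # Calling the type object directly replaces the per-type exec clauses.
        setattr(obj, field_name, field_type(val))
    return obj


record = set_fields(Record(), [("n", "42"), ("t", "hello"), ("r", "0.5")])
print(record.count, record.title, record.ratio)
```

A bad value now fails on the line that converts it, so the stack trace points at the real problem instead of at a call to exec.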

When you need exec and eval, yeah, you really do need them.
But, the majority of the in-the-wild usage of these functions (and the similar constructs in other scripting languages) is totally inappropriate and could be replaced with other simpler constructs that are faster, more secure and have fewer bugs.
You can, with proper escaping and filtering, use exec and eval safely. But the kind of coder who goes straight for exec/eval to solve a problem (because they don't understand the other facilities the language makes available) isn't the kind of coder that's going to be able to get that processing right; it's going to be someone who doesn't understand string processing and just blindly concatenates substrings, resulting in fragile insecure code.
It's the Lure Of Strings. Throwing string segments around looks easy and fools naïve coders into thinking they understand what they're doing. But experience shows the results are almost always wrong in some corner (or not-so-corner) case, often with potential security implications. This is why we say eval is evil. This is why we say regex-for-HTML is evil. This is why we push SQL parameterisation. Yes, you can get all these things right with manual string processing... but unless you already understand why we say those things, chances are you won't.
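To make the SQL parameterisation point concrete, here is a small sketch using the standard library's sqlite3 module. The table, the row and the hostile input are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Invented input that tries to change the meaning of the query.
hostile = "nobody' OR '1'='1"

# String concatenation: the input becomes part of the SQL text.
concatenated = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Parameterised: the input is passed as a value and is never parsed as SQL.
parameterised = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(concatenated)   # every row matches
print(parameterised)  # no row has that literal name
```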

eval() and exec() can promote lazy programming. More importantly, their use indicates that the code being executed may not have been written at design time and therefore has not been tested. In other words, how do you test dynamically generated code? Especially across browsers.

Security aside, eval and exec are often marked as undesirable because of the complexity they induce. When you see an eval call you often don't know what's really going on behind it, because it acts on data that's usually in a variable. This makes code harder to read.
Invoking the full power of the interpreter is a heavy weapon that should be only reserved for very tricky cases. In most cases, however, it's best avoided and simpler tools should be employed.
That said, like all generalizations, be wary of this one. In some cases, exec and eval can be valuable. But you must have a very good reason to use them. See this post for one acceptable use.

In contrast to what most answers are saying here, exec is actually part of the recipe for building super-complete decorators in Python, as you can duplicate everything about the decorated function exactly, producing the same signature for the purposes of documentation and such. It's key to the functionality of the widely used decorator module (http://pypi.python.org/pypi/decorator/). Another case where exec/eval are essential is when constructing any kind of "interpreted Python" type of application, such as a Python-parsed template language (like Mako or Jinja).
So it's not like the presence of these functions is an immediate sign of an "insecure" application or library. Using them in the naive javascripty way to evaluate incoming JSON or something, yes, that's very insecure. But as always, it's all in the way you use them, and these are very essential functions.
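For the incoming-JSON case specifically, a parser does the job without running anything. A minimal sketch with the standard json module; both input strings are invented for the example.

```python
import json

# Well-formed input is parsed into plain Python data.
incoming = '{"name": "pizza", "qty": 2}'
data = json.loads(incoming)

# Input that smuggles in a function call is not valid JSON, so the parser
# refuses it instead of executing it (eval would have run the call).
hostile = '{"name": __import__("os").getcwd()}'
try:
    json.loads(hostile)
    rejected = False
except json.JSONDecodeError:
    rejected = True

print(data, rejected)
```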

I have used eval() in the past (and still do from time-to-time) for massaging data during quick and dirty operations. It is part of the toolkit that can be used for getting a job done, but should NEVER be used for anything you plan to use in production such as any command-line tools or scripts, because of all the reasons mentioned in the other answers.
You cannot trust your users--ever--to do the right thing. In most cases they will, but you have to expect them to do all of the things you never thought of and find all of the bugs you never expected. This is precisely where eval() goes from being a tool to a liability.
A perfect example of this would be constructing a QuerySet in Django. A query accepts keyword arguments that look something like this:
results = Foo.objects.filter(whatever__contains='pizza')
If you're programmatically assigning arguments, you might think to do something like this:
results = eval("Foo.objects.filter(%s__%s=%s)" % (field, matcher, value))
But there is always a better way that doesn't use eval(), which is unpacking a dictionary into keyword arguments with **:
results = Foo.objects.filter( **{'%s__%s' % (field, matcher): value} )
By doing it this way, it's not only faster performance-wise, but also safer and more Pythonic.
Moral of the story?
Use of eval() is ok for small tasks, tests, and truly temporary things, but bad for permanent usage because there is almost certainly always a better way to do it!
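The dictionary-unpacking trick is not specific to Django. Here is a rough sketch that runs without it; fake_filter is a hypothetical stand-in for Foo.objects.filter that only echoes its arguments, and build_filter is a made-up helper name.

```python
def build_filter(field, matcher, value):
    """Return keyword arguments such as {'title__contains': 'pizza'}."""
    return {"%s__%s" % (field, matcher): value}


def fake_filter(**kwargs):
    # Hypothetical stand-in for Foo.objects.filter; it only echoes its arguments.
    return kwargs


# The value is passed along as data, so quotes or code inside it are never evaluated.
result = fake_filter(**build_filter("title", "contains", "pizza') or exit("))
print(result)
```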

Allowing these functions in a context where they might run user input is a security issue, and sanitizers that actually work are hard to write.

Same reason you shouldn't login as root: it's too easy to shoot yourself in the foot.

Don't try to do the following on your computer:
s = "import shutil; shutil.rmtree('/nonexisting')"
exec(s)  # eval(s) would raise SyntaxError here, because eval only accepts expressions
Now assume somebody can control s from a web application, for example.

Reason #1: One security flaw (i.e. a programming error, and we can't claim those can be avoided) and you've just given the user access to the shell of the server.

Try this in the interactive interpreter and see what happens:
>>> import sys
>>> eval('{"name" : %s}' % ("sys.exit(1)"))
Of course, this is a corner case, but it can be tricky to prevent things like this.
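If all you need is to turn a string into Python literals, ast.literal_eval from the standard library is one way to avoid this. A small sketch, with the template reused from above and the exit call caught so the script keeps running:

```python
import ast

template = '{"name" : %s}'
payload = "__import__('sys').exit(1)"

# eval runs the call embedded in the string; sys.exit raises SystemExit.
try:
    eval(template % payload)
    outcome = "returned"
except SystemExit:
    outcome = "tried to exit"

# ast.literal_eval accepts only literals, so the call is rejected unexecuted.
try:
    ast.literal_eval(template % payload)
    literal_outcome = "returned"
except ValueError:
    literal_outcome = "rejected"

# Plain literals still come through as ordinary data.
safe = ast.literal_eval(template % "'pizza'")
print(outcome, literal_outcome, safe)
```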

Related

Pass by object reference good practices

I come from C++, and I am struggling to get a sense of safety when programming in Python (for instance misspelling can create extremely hard to find bugs, but that is not the point here).
Here I would like to understand how I can avoid doing horrible things by adhering to good practices.
The simple function below is perfectly fine in c++ but creates what I can only call a monstrosity in Python.
def fun(x):
    x += 1
    x = x + 1
    return x
When I call it
import numpy as np

var1 = 1
print(fun(var1), var1)
var2 = np.array([1])
print(fun(var2), var2)
I get
3 1
[3] [2]
Apart from the lack of homogeneous behaviour (which is already terrible), the second case is particularly hideous. The external variable is modified only by some of the instructions!
I know in details why it happens. So that is not my question. The point is that when constructing a complex program, I do not want to have to be extra careful with all these context-dependent and highly implicit technicalities.
There must be some good practice I can strictly adhere to that will prevent me from inadvertently producing the code above. I can think of ways, but they seem to overcomplicate the code, making C++ look like a more high level language.
What good practice should I follow to avoid that monstrosity?
Thanks!
[EDIT] Some clarification: What I struggle with is the fact that Python makes a type-dependent and context-dependent choice of creating a temporary. Again, I know the rules. However in C++ the choice is done by the programmer and clear throughout the whole function, while that is not the case in Python. Python requires the programmer to know quite some technicalities of the operations done on the argument in order to figure out if at that point Python is working on a temporary or on the original.
Notice that I constructed a function which both returns a value and has a side effect just to show my point.
The point is that a programmer might want to write that function to simply have side effects (no return statement), and midway through the function Python decides to build a temporary, so some side effects are not applied.
On the other hand the programmer might not want side effects, and instead get some (and hard to predict ones).
In C++ the above is simply and clearly handled. In Python it is rather technical and requires knowing what triggers the generation of temporaries and what not. As I need to explain this to my students, I would like to give them a simple rule that will prevent them from falling into those traps.
Good practices to avoid such pitfalls:
Functions which modify inputs should not return anything (e.g. list.sort)
Functions which do not modify the input should return the modified value (e.g. sorted)
Your fun does both, which goes against the conventions followed by most standard library code and popular third-party Python libraries. Breaking this "unwritten rule" is the cause of the particularly hideous result there.
Generally speaking, it's best if functions are kept "pure" when possible. It's easier to reason about a pure and stateless function, and they're easier to test.
A "sense of safety" when programming in Python comes from having a good test suite. As an interpreted and dynamic programming language, almost everything in Python happens at runtime. There is very little to protect you at compile time - pretty much only the syntax errors will be found. This is great for flexibility, e.g. virtually anything can be monkeypatched at runtime. With great power comes great responsibility. It is not unusual for a Python project to have twice as much test code as there is library code.
The one good practice that jumps to mind is command-query separation:
A function or method should only ever either compute and return something, or do something, at least when it comes to outside-observable behavior.
There are very few acceptable exceptions (think e.g. of the pop method of a Stack data structure: it returns something, and does something), but those tend to be in places where it's so idiomatic that you wouldn't expect it any other way.
And when a function does something to its input values, that should be that function's sole purpose. That way, there's no nasty surprises.
Now for the inconsistent behavior between a "primitive" type and a more complex type, it's easiest to code defensively and assume that it's a reference anyway.
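A minimal sketch of that separation, using plain lists instead of NumPy arrays; both function names are made up for the example.

```python
def increment_in_place(values):
    """Command: mutates the caller's list and returns nothing."""
    for i in range(len(values)):
        values[i] += 1


def incremented(values):
    """Query: leaves the input alone and returns a new list."""
    return [v + 1 for v in values]


data = [1]
result = increment_in_place(data)  # data is changed, result is None
fresh = incremented(data)          # data is untouched, fresh is a new list
print(data, result, fresh)
```

Because neither function does both jobs, the caller never has to guess whether an argument was modified.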

Is it a good idea to dynamically create variables?

I recently found out how to dynamically create variables in python through this method:
vars()['my_variable'] = 'Some Value'
Thus creating the variable my_variable.
My question is, is this a good idea? Or should I always declare the variables ahead of time?
I think it's preferable to use a dictionary if it's possible:
vars_dict = {}
vars_dict["my_variable"] = 'Some Value'
vars_dict["my_variable2"] = 'Some Value'
I think it's more pythonic.
This is a bad idea, since it gets much harder to analyze the code, both for a human being looking at the source, and for tools like pylint or pychecker. You'll be a lot more likely to introduce bugs if you use tricks like that. If you think you need that feature at some time, think really hard if you can solve your problem in a simpler and more conventional way. I've used Python for almost 20 years, and never felt a need to do that.
If you have more dynamic needs, just use a normal dictionary, or possibly something like json.
One of the great things about Python, with its dynamic nature and good standard collection types, is that you can avoid putting logic in text strings. The Python interpreter, the syntax highlighting in your IDE, intellisense and code analysis tools all look at your source code, provide helpful suggestions and find bugs and weaknesses. This doesn't work if your data structure or logic has been hidden in text strings.
More stupid and rigid languages, such as C++ and Java, often make developers resort to string-based data structures such as XML or JSON, since they don't have convenient collections like Python lists or dicts. This means that you hide business logic from the compiler and other safety checks built into the language or tools, and have to do a lot of checks that your development tools would otherwise do for you. In Python you don't have to do that ... so don't!
There is no guarantee that vars()['myvariable'] = 'Some value' and myvariable = 'Some value' have the same effect. From the documentation:
Without an argument, vars() acts like locals(). Note, the locals
dictionary is only useful for reads since updates to the locals
dictionary are ignored.
This code is simply wrong.
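A short sketch of what that means inside a function, next to the dictionary alternative; the function names are made up for the example.

```python
def with_vars():
    # Writing to the mapping returned by vars() inside a function
    # does not create a local variable.
    vars()["my_variable"] = "Some Value"
    try:
        return my_variable  # compiled as a global lookup, and no such global exists
    except NameError:
        return "not defined"


def with_dict():
    # An ordinary dictionary gives the same dynamic naming, reliably.
    settings = {}
    settings["my_variable"] = "Some Value"
    return settings["my_variable"]


print(with_vars(), with_dict())
```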
Pros:
adds another level of indirection, makes the environment more dynamic
in particular, allows to avoid more code duplication
Cons:
not applicable for function namespaces (due to optimization)
adds another level of indirection, makes the environment more dynamic
"lexical references" are much harder to track and maintain
if created names are arbitrary, conflicts are waiting to happen
it's hard to find the ins and outs in the code base and predict its behaviour
that's why these tricks may upset code checking tools like pylint
if variables are processed in a similar way, they probably belong together separately from others (in a dedicated dict) rather than reusing a namespace dict, making it a mess in the process
In brief, at the abstraction level Python's language and runtime features are designed for, it's only good in small, well-defined amounts.
I don't see what would be the advantage of it, also would make your code harder to understand.
So no I don't think it is a good idea.

Enums as constants or string comparisons

My Python project has the ability to perform operations on two different destinations, let's call them SF and LA. Which is the better way to accomplish this?
Option A:
destinations.py
LA = 1
SF = 2
example_operation.py
import destinations
run_operation(destination=destinations.LA)
def run_operation(destination):
    assert destination in [destinations.LA, destinations.SF]
    ...
OR
Option B:
example_operation.py
run_operation(destination='LA')
def run_operation(destination):
    assert destination in ['LA', 'SF']
    ...
I realize I can also use a dictionary or many other methods to accomplish this. I'd like to know which is the best practice for declaring and validating these.
Since it’s very subjective, I’ll avoid commenting on which way would be better. You could try to argue from a performance point (integers are faster than strings, but variable lookups are slower than constants?), or from a code completion view (editors could auto-complete the variables), or extensibility (you can easily use a new string for something new), but in the end, it doesn’t really matter much: It’s mostly personal preference.
What I want to try to comment on however is the validation question: How to validate such arguments? My usual answer to that is: Don’t.
Python is usually used without many fail-safes. For example, Python doesn’t have true private members, and large parts of the stdlib even go without hiding their internals completely. If you want, you can use those internals, mess with all the things—but if something breaks, it’s your problem. So often, you would just expect users to use your code correctly: If they pass in a parameter your function doesn’t expect, then, well, something will fail. Of course it is not bad to have some kind of validation but you usually don’t need to put asserts everywhere.
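If you do want some validation, one possible middle ground (not something either option above uses) is the standard library's enum module. A minimal sketch, with the return value invented for illustration:

```python
from enum import Enum


class Destination(Enum):
    LA = 1
    SF = 2


def run_operation(destination):
    # Destination(...) raises ValueError for anything that is not a member
    # or a member value, so no separate assert is needed.
    destination = Destination(destination)
    return "running against %s" % destination.name


print(run_operation(Destination.LA))
```

Callers get editor completion on Destination.LA, and a typo such as Destination.LAX fails immediately with an AttributeError.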

Exaggerated use of functions in Python?

I am currently reading this book on Python and one thing I've noticed is that it takes functional programming rather seriously. I mean, if you take a look for example, at this chapter's source code, looking at lines 14-16 you see that the writer used a function just to get some input, instead of having it somewhere around line 53.
I just don't understand what's the point of abusing functions so much, and I wanted to know what does Python's ideology say about this matter, about functional programming.
It's generally considered good style to write small functions which do a single thing.
This has four main advantages:
it helps write 'self-documenting code' (e.g. message = get_message() is incredibly clear)
it makes your code significantly easier to debug and to test
it promotes code reuse (what if you want to get a message from multiple places in your code?)
it allows you to later add to or change the functionality of small snippets of code easily (eg, what if you later want to get a message over the network?)
As sepp2k points out in a comment, this is definitely not 'functional programming'; it's simply good style in a procedural language.
I agree that this can sometimes look contrived in simple examples, like the ones you linked to, but it's a very good practice to get into. Failing to break things up in large programs can make your code really hard to maintain.
(As an aside, it's a good idea to follow the PEP8 style guide, which suggests that function names should_use_underscores()).
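A tiny sketch of the testing point, assuming a get_message along the lines of the one mentioned above; the read parameter and the shout function are made up for the example.

```python
def get_message(read=input):
    """Return one line of user input with surrounding whitespace removed."""
    return read("Message: ").strip()


def shout(message):
    return message.upper() + "!"


# In a test, the reader can be swapped out without touching the rest of the code.
print(shout(get_message(read=lambda prompt: "  hello ")))
```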

Is there a way to create a function that does not need parentheses?

Well, probably a strange question, I know. But searching Google for python and braces gives only one type of answer.
What I want to ask is something low-level and, probably, not very pythonic. Is there a clear way to write a function working with:
>>>my_function arg1, arg2
instead of
>>>my_function(arg1, arg2)
?
I am looking for a way to make a function work like the old print (in Python <3.0), where you don't need to use parentheses. If that's not so simple, is there a way to see the code for "print"?
You can do that sort of thing in Ruby, but you can't in Python. Python values clean language and explicit and obvious structure.
>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
The requirement for parentheses lies in the Python interpreter and not in the code for the print method (or any other method) itself.
(And as eph points out in the comments, print is a statement not a method.)
As you've already been told, the print in Python 2.x was not a function, but a statement, like if or for. It was a "1st class" citizen, with its own special syntax. You are not allowed to create any statement, and all functions must use parentheses (both in Python 2.x and in 3.x).
No. A function call needs parentheses, and two directly consecutive identifiers (that excludes reserved words) are a syntax error. That's set in stone in the grammar and won't change. The only way you could support this is by making your own language implementation, or at least the frontend of one. That's likely more trouble than it's worth, and would require some significant learning on your side (unless of course you are already a parsing expert). See the numerous Stack Overflow questions on compiler construction for material if you want to try it nevertheless.
What are you trying to do? If you are trying to embed this code into another program (non-python) or invoke from the interpreter somehow, can you use sys.argv as an alternative instead? Here is an example of how sys.argv works.
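A minimal sketch of the sys.argv route, assuming the script is saved under some name and run as `python script.py arg1 arg2`; my_function's body is invented for illustration.

```python
import sys


def my_function(arg1, arg2):
    return "%s and %s" % (arg1, arg2)


def main(argv):
    # argv[0] is the script name; the rest are the space-separated arguments.
    if len(argv) != 3:
        return "usage: %s arg1 arg2" % argv[0]
    return my_function(argv[1], argv[2])


if __name__ == "__main__":
    print(main(sys.argv))
```

The shell splits the arguments on whitespace, so on the command line the call really does look like `my_function arg1 arg2`, without parentheses or commas.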
