I need to store source code for a basic function in a database and allow it to be modified through an admin interface. This code will take several numbers and strings as parameters, and return a number or None. I know that eval is evil, so I need to implement a safe way to execute a very basic subset of Python, or something syntactically similar at least, from within a Python-based web app.
The obvious answer is to implement a DSL (Domain Specific Language); however, I have no experience with that, nor do I have any idea where to begin, and a lot of the resources available seem to go a little over my head. I'm hoping that there is something already out there which will allow me to essentially generate a secure Python-callable function from a string in a database. The language really only needs to support assignment, basic math, if/else, and case-insensitive string comparisons. Any other features are a bonus, but I think most things can be done with just that; there is no need for complex data structures, classes, functions, etc.
If no such thing currently exists, I'm willing to look into the possibility of creating one, but as I said, I have no idea how to go about that, and any advice in that regard would be appreciated as well.
Restricted Python environments are hard to make really safe.
Maybe something like Lua is a better fit for you.
PySandbox might help. I haven't tested it, just found it linked elsewhere.
You could use Pyparsing to implement your DSL, provided the expressions involved won't be too complex (you don't give full details on that but you imply the requirements are pretty simple). See the examples page including specifically fourFn.py or simpleCalc.py.
You could implement a subset of Python by using the ast module to parse Python code into an abstract syntax tree, then walk the tree checking that it only uses the subset of Python that you allow. The ast module is part of the standard library in both Python 2.6+ and Python 3.
However, even using this method it will be hard to create something that is 100% secure, since even an innocuous-looking subset could let the user write something that blows up your application, e.g. by allocating more memory than you have available or by putting the program into an infinite loop that uses all the CPU.
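The tree-walking check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the ALLOWED_NODES whitelist and the find_disallowed helper are made-up names, the node list is one guess at "assignment, basic math, if/else and comparisons", and it assumes Python 3.8+ (where literals parse as ast.Constant). It only validates the source; it does not execute it and does nothing about CPU or memory exhaustion.

```python
import ast

# Hypothetical whitelist: node types needed for assignment, basic math,
# if/else and comparisons. Anything else (calls, attributes, imports,
# loops, ...) is rejected.
ALLOWED_NODES = (
    ast.Module, ast.Assign, ast.Expr, ast.If,
    ast.Name, ast.Load, ast.Store, ast.Constant,
    ast.BinOp, ast.Add, ast.Sub, ast.Mult, ast.Div,
    ast.UnaryOp, ast.USub,
    ast.BoolOp, ast.And, ast.Or,
    ast.Compare, ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
)


def find_disallowed(source):
    """Return the sorted names of node types in `source` that are not whitelisted."""
    tree = ast.parse(source)
    return sorted({
        type(node).__name__
        for node in ast.walk(tree)
        if not isinstance(node, ALLOWED_NODES)
    })


ok_source = "x = a + 1\nif x > 3:\n    y = 2\nelse:\n    y = 0"
bad_source = "__import__('os').system('ls')"

print(find_disallowed(ok_source))   # []
print(find_disallowed(bad_source))  # ['Attribute', 'Call']
```

Since calls are rejected here, case-insensitive string comparison would have to be provided some other way, for example by lower-casing string inputs before they reach the user's code.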
I want to create a SQL autocompleter for use with rlwrap: https://github.com/hanslub42/rlwrap
This could then be used with sqlite3 and osquery, for example (I know they already have some autocompletion facility, but it's not good enough, especially under rlwrap).
In fact, more generally I would like to know the best approach for building autocompleters based on BNF grammar descriptions; I may want to produce autocompleters for other rlwrapped REPLs at some point in the future.
I have no experience with parsers, but I have read some stuff online about the different types of parsers and how they work, and this Pyleri tutorial: https://tomassetti.me/pyleri-tutorial/
Pyleri looks fairly straightforward, and has the expecting property, which makes it easy to create an autocompleter, but AFAIK it would involve translating the sqlite BNF (and any other BNFs that I might want to use in the future) into Python code, which is a drag.
ANTLR has lots of predefined grammar files for many different languages, and the ability to output python code, but I'm not sure how easy it is to produce an autocompleter, and I don't want to read through all the documentation only to find out I've wasted my time.
So can anyone advise me? What's the best approach?
See the title. For a small tool I am writing I wanted to introduce a simple boolean filter language and decided to do that "properly" and use a parser generator. After playing around with grako a bit I found I like it, and I got the filter language done fairly quickly (which is also nice :))
The problem is now, if I want to use the tool on other computers or give it to other people I first have to somehow make grako available there, which is a bit bothersome, because everything else is standard python3 stuff.
I guess it is possible by co-packaging the necessary grako classes, but that seems a bit messy (the license would be mentioned in any case). Maybe I have overlooked some built-in method.
The short answer is No.
Grako-generated parsers do require the grako library.
For example:
    with self._group():
        with self._choice():
            with self._option():
                self._token('nameguard')
            with self._option():
                self._token('ignorecase')
            with self._option():
                self._token('left_recursion')
            self._error('expecting one of: ignorecase left_recursion nameguard')
All the self._xyz() come from either grako.contexts.ParseContext or grako.parsing.Parser. The backtracking, caching, and the book-keeping required are all hidden behind context managers and decorators.
Having generated parsers depend on grako was a design choice aimed at making the parsers smaller and easier to understand, which was one of the primary objectives of the project (as there are many otherwise-great parser generators that produce obfuscated code).
The other option was to copy the code that the generated parsers could depend on onto each parser, but that seemed a bit unpythonic.
I have read that Julia has access to the AST of the code it runs. What exactly does this mean? Is it that the runtime can access it, that code itself can access it, or both?
Building on this:
Is this a key difference of Julia with respect to other dynamic languages, specifically Python?
What are the practical benefits of being able to access the AST?
What would be a good example of something that you can't easily do in Python, but that you can do in Julia, because of this?
What distinguishes Julia from languages like Python is that Julia allows you to intercept code before it is evaluated. Macros are just functions, written in Julia, which let you access that code and manipulate it before it runs. Furthermore, rather than treating code as a string (like "f(x)"), it's provided as a Julian object (like Expr(:call, :f, :x)).
There are plenty of things this allows which just aren't possible in Python. The main ones are:
You can do more work at compile time, increasing performance
Two good examples of this are regexes and printf. Both of these take a format specification of some kind and interpret it in some way. Now, these can fairly straightforwardly be implemented as functions, which might look like this:
match(Regex(".*"), str)
printf("%d", num)
The problem with this is that these specifications must be re-interpreted every time the statement is run. Every time the interpreter goes over this block, the regex must be re-compiled into a state machine, and the format must be run through a mini-interpreter. On the other hand, if we implement these as macros:
match(r".*", str)
@printf("%d", num)
Then the r and @printf macros will intercept the code at compile time, and run their respective interpreters then. The regex turns into a fast state machine, and the @printf statement turns into a simple println(num). At run time the minimum of work is done, so the code is blazing fast. Now, other languages are able to provide fast regexes, for example, by providing special syntax for them, but the fact that they're not special-cased in Julia means that developers can use the same techniques in their own code.
You can make mini-compilers for, well, pretty much anything
Languages with macros tend to have more capable embedded DSLs, because you can change the semantics of the language at will. One example is the algebraic modelling language JuMP.jl. Clojure has some neat examples of this too, like its embedded logic programming language. Mathematica.jl even embeds Mathematica's semantics in Julia, so that you can write really natural symbolic expressions like @Integrate(log(x), {x,0,2}). You can fake this to a point in Python (SymPy does a good job), but not as cleanly or as efficiently.
If that doesn't convince you, consider that someone managed to implement an interactive Julia debugger in pure Julia using macros. Try that in Python.
Edit: Another great example of something that's difficult in other languages is Cartesian.jl, which lets you write generic algorithms across arrays of any number of dimensions.
I am not familiar with Julia and only first heard of it with your question, but this sounded an awful lot like Lisp and its powerful macros (and indeed Julia seems to be a new grandchild/dialect of Lisp from what I'm reading). The ability to access the AST at run/compile time brings a whole new dimension to the programmer's code: metaprogramming.
See http://docs.julialang.org/en/latest/manual/metaprogramming/ and especially http://docs.julialang.org/en/latest/manual/metaprogramming/#macros for some of the practical uses. Basically you can 'inject/modify' code in places where it would be impossible for python/R to do the same.
Example: loop unrolling without any copy & paste, which takes a compile time argument to easily vary how much you want to unroll the loop.
Here's an excellent resource on Julia metaprogramming: https://en.wikibooks.org/wiki/Introducing_Julia/Metaprogramming
I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?
There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, Zope's RestrictedPython looked like the best solution, as it works with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.
Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't: it has known vulnerabilities), but because it might be a good starting point for your own restricted-execution framework (or for getting your webserver compromised).
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.
You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
Yeah, you have to whitelist. There are so many ways to hide the bad commands.
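To illustrate how easily a blacklist is bypassed, here is a small sketch. The BLACKLIST tuple reuses the names from the question, and naive_scan is a made-up helper that does a plain substring search; neither is anyone's actual implementation.

```python
# Names the question proposes to block.
BLACKLIST = ("__import__", "open", "file", "input", "raw_input")


def naive_scan(source):
    """Return the blacklisted names found in `source` by substring search."""
    return [name for name in BLACKLIST if name in source]


plain = "__import__('os')"
# Same trick as the eval example above: the name is only assembled at run time.
obfuscated = '"__impor" + "t__"'

print(naive_scan(plain))       # ['__import__']
print(naive_scan(obfuscated))  # []
print("__impor" + "t__" == "__import__")  # True
```

The scan never sees the forbidden name in the second case, even though the string it builds is exactly that name.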
This is NOT the worst case scenario: "the worst case scenario is that someone gets into the database".
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.
If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.
You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
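As a rough illustration of that complication (assuming CPython 3; the variable names are made up), the expression below uses no builtin names at all, yet it still reaches every class currently loaded in the interpreter by walking attributes of an empty tuple. The string is fixed and trusted here; it is evaluated only to show what an attacker's code could reach.

```python
# An expression with no names in it: it starts from a tuple literal.
expr = "().__class__.__bases__[0].__subclasses__()"

# Evaluate it with every builtin removed from the namespace.
no_builtins = {"__builtins__": {}}
classes = eval(expr, no_builtins)

# The result is the list of all subclasses of `object`.
print(isinstance(classes, list))  # True
print(len(classes) > 0)           # True
```

From such a list an attacker can look for a class whose methods lead back to the filesystem or to the import machinery, which is why removing names from the namespace is not enough on its own.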
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.
I am writing an application where one of the features is to allow the user to write an email template using Markdown syntax.
Besides formatting, the user must be able to use placeholders for a couple of variables that would get replaced at runtime.
The way this is currently working is very simple: the templates have the Pythonic %(var)s placeholders and I replace those with a dictionary before applying Markdown2 formatting.
Turns out that the end user of this system will be a tech-savvy user and I wouldn't like to make it obvious to everyone that it's written in Python.
It's not that I don't like Python... I actually think Python is the perfect tool for the job, I just don't want to expose that to the user (would like the same even if it were written in Java, Perl, Ruby or anything else).
So I'd like to ask for insights on what would be, in your opinion, the best way to expose placeholders for the users:
What do you think is the best placeholder format (things like ${var}, $(var) or #{var})?
What would be the best way to replace those placeholders?
I thought of using a regular expression to change, for instance, ${var} into %(var)s and then applying the regular Python templating substitution, but I am not sure that's the best approach.
If you go that way, it would be very nice if you could also give me a draft of that regular expression.
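A possible draft of that regular-expression approach, as a sketch only: the PLACEHOLDER pattern and the to_percent_style helper are made-up names, and the pattern assumes variable names consist of letters, digits and underscores. Literal percent signs are doubled first so that they survive the later %-substitution.

```python
import re

# Matches ${name}, where name is made of letters, digits and underscores.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def to_percent_style(template):
    """Convert ${var} placeholders into Python's %(var)s form."""
    escaped = template.replace("%", "%%")  # protect literal percent signs
    return PLACEHOLDER.sub(r"%(\1)s", escaped)


converted = to_percent_style("Hi ${name}, enjoy 100% off")
print(converted)                    # Hi %(name)s, enjoy 100%% off
print(converted % {"name": "Ann"})  # Hi Ann, enjoy 100% off
```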
Thanks!
Update: A user suggested using a full-blown templating system, but I think that may not be worth it, since all I need is placeholder substitution: I won't have loops or anything like that.
Final Update: I have chosen not to use any template engine at this time. I went with the simpler string.Template approach (as pointed out in a comment by hyperboreean). The truth is that I don't like to pick a solution just because there may be a need for it sometime in the future. I will keep all those suggestions up my sleeve, and if during the lifespan of the application there is a clear need for one or more of the features they offer, I'll revisit the idea. Right now, I really think it's overkill. Having full-blown templates that the end user can edit as he wants is, at least from my point of view, more trouble than benefit. Nevertheless, it feels much nicer to be aware of the reasons I did not go down that path than to have chosen without researching anything.
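For reference, a minimal sketch of what the string.Template approach looks like; the template text and the variable names are invented for illustration.

```python
from string import Template

template = Template("Hello ${name}, your order ${order_id} has shipped.")
values = {"name": "Ann", "order_id": "A-17"}

# substitute() raises KeyError if a placeholder has no value.
print(template.substitute(values))
# Hello Ann, your order A-17 has shipped.

# safe_substitute() leaves unknown placeholders untouched instead.
partial = Template("Hi ${name}, ${unknown}").safe_substitute(name="Ann")
print(partial)
# Hi Ann, ${unknown}
```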
Thanks a lot for all the input.
Use a real template tool: mako or jinja. Don't roll your own. Not worth it.
Have a light templating system ... I am not sure if you can use some of the ones TurboGears provides (Kid or Genshi)
I would recommend jinja2.
It shouldn't create a runtime performance issue, since it compiles templates to Python. It would offer much greater flexibility. As for maintainability, it depends much on the coder, but theoretically (and admittedly superficially) I can't see why it should be harder to maintain.
With a little more work you can give your tech-savvy user the ability to define the substitution style herself. Or choose one from defaults you supply.
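One way this could look, sketched with the standard library's string.Template rather than any particular template engine: Template subclasses may override the delimiter class attribute, so a small factory (make_template_class is a made-up helper name) can build a class per delimiter the user picks.

```python
from string import Template


def make_template_class(delim):
    """Build a Template subclass that uses `delim` instead of '$'."""
    class CustomTemplate(Template):
        delimiter = delim
    return CustomTemplate


HashTemplate = make_template_class("#")
AtTemplate = make_template_class("@")

print(HashTemplate("Dear #{name}").substitute(name="Ann"))  # Dear Ann
print(AtTemplate("Dear @{name}").substitute(name="Ann"))    # Dear Ann
```

This only changes the leading character; the braces and the rules for variable names stay as string.Template defines them.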
You could try Python 2.6/3.0's new str.format() method: http://docs.python.org/library/string.html#formatstrings
This looks a bit different to %-formatting and might not be as instantly recognisable as Python:
'fish{chips}'.format(chips='x')