Canonical way to build a plug-in system in Python


I'll try to keep this question objective. What is the canonical way to build a plugin-system for a Python desktop application?
Is it possible to have an easy to use (for the developer) system which achieves the following:
Users input their code using an in-app editor (the editor could end up dumping their plugins into a subdirectory of the app, if necessary)
Keep the source of the application "closed" (Yes, there are pitfalls to obfuscation)
Thanks

I am guessing you need some sort of educational system where the user can submit code, presumably to check that the code does what an exercise requires.
My immediate thought would be to use a web interface. That way, the code of the evaluating system is entirely hidden (unless the student hacks your webserver, but this is an entirely different topic).
However, for this to work you must be aware of the pitfalls of allowing others to submit code to your service that is then executed. The code must be rigorously checked for the obvious things, but the issues here are open-ended. You must also protect your service at the back end by providing a safe environment for execution (e.g. a sandbox).

Related

pyCharm: safe refactoring information for application on depending code

If I refactor a library, PyCharm updates all dependent applications that are known to the currently running PyCharm instance.
But code that is not known to that instance does not get updated.
Is there a way to store the refactoring information in version control, so that dependent applications can be updated when they move to the new version of the library?
Use Case:
    class Server:
        pass
gets renamed to
    class ServerConnection:
        pass
If a teammate updates to the new version of my library, his usage of Server needs to be changed to ServerConnection.
It would be very nice if PyCharm (or another tool) could help my teammate update his code automatically.

As far as I can tell, this is possible neither with vanilla PyCharm, nor with a plugin, nor with a third-party tool.
It is not mentioned in the official documentation.
There is no such plugin in the JetBrains Plugin Repositories.
If PyCharm writes refactoring information to its internal logs, you could build this yourself (but would you really want to?).
I am also not aware of any Python-specific refactoring tool that does that. You can check for yourself: there is another SO question covering the most popular refactoring tools.
But ...
I am sure there are reasons why your situation is the way it is. There always are good reasons (and most of the time the terms 'historic' and 'grown' turn up in explanations of these reasons). Still, I feel obligated to point out what qarma already mentioned in his comment: the fact that you want to replay a refactoring on a different code base points towards a problem that should be solved in a different way.
Alternative 1: introduce an API
If you have different pieces of software that depend on each other on such a deep level, it might be a good idea to define an API that decouples the code bases from each other's internals. With an API it is clear which parts have to be stable. If changes have to be made at the API level, they must be communicated and coordinated with the teams involved.
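One common way to keep an API stable across a rename like the Server to ServerConnection example from the question is to leave the old name in place as a deprecated alias for a release or two. The following is only a sketch under that assumption; the warning text and the choice of a warning subclass are illustrative, and nothing here is generated by PyCharm.

```python
import warnings


class ServerConnection:
    """The new, canonical name after the refactoring."""


class Server(ServerConnection):
    """Deprecated alias kept so that existing callers keep working."""

    def __init__(self, *args, **kwargs):
        # stacklevel=2 makes the warning point at the caller, not at this line.
        warnings.warn(
            "Server is deprecated; use ServerConnection instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(*args, **kwargs)
```

Dependent code keeps running and each team can rename at its own pace, guided by the warnings.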
Alternative 2: Make it what it actually is: one code base
If A1 for whatever reason is not possible I would conclude that you actually have one system distributed over different code bases and then those should be merged into one code base. Different teams can still work on the same code base (hopefully using a DVCS) but global refactorings can be done with tooling help and they reach all parts of the system.
Alternative 3: Make these refactorings in PyCharm over all involved code bases
Even if you can't merge them into one code base, you can easily combine them in PyCharm by loading the different projects into the same window. I do this without problems with two git projects that have to be in different repositories but still share certain aspects. PyCharm handles commits to these repositories transparently: if you make changes in several repositories and commit them, you write one commit message and the commits are made to all repositories.

How to prevent decompilation or inspecting python code?

Let us assume that there is a big commercial project (a.k.a. Project), which uses Python under the hood to manage plugins for configuring new control surfaces that can be attached to and used by Project.
There was a small information leak: some part of Project's Python API became public, and people were able to write Python scripts which were called by the underlying Python implementation as part of Project's plugin loading mechanism.
Further on, using the inspect module and raw __dict__ reads, people were able to find out a major part of Project's underlying Python implementation.
Is there a way to keep the Python secret codes secret?
A quick look at Python's documentation revealed a way to suppress an import of the inspect module:
    import sys
    sys.modules['inspect'] = None
Does it solve the problem completely?

No, this does not solve the problem. Someone could just rename the inspect module to something else and import it.
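To illustrate why blocking inspect is not enough, here is a small sketch. The class `HiddenApi` and the helper `import_is_blocked` are made-up names for this example; the point is only that the import really does fail, yet the same information stays reachable through plain attribute access.

```python
import sys


class HiddenApi:
    """Hypothetical stand-in for a class the host application wants to hide."""
    _unlock_code = "s3cr3t"

    def check(self, value):
        return value == self._unlock_code


def import_is_blocked(name):
    """Apply the sys.modules[name] = None trick and report whether import fails."""
    missing = object()
    saved = sys.modules.get(name, missing)
    sys.modules[name] = None
    try:
        __import__(name)
        return False
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return True
    finally:
        # Restore the previous state so the rest of the program is unaffected.
        if saved is missing:
            del sys.modules[name]
        else:
            sys.modules[name] = saved


blocked = import_is_blocked("inspect")

# No inspect module needed: vars() and code objects expose the internals anyway.
class_attrs = {k: v for k, v in vars(HiddenApi).items() if not k.startswith("__")}
names_used = HiddenApi.check.__code__.co_names
```

Here `blocked` is true, but `class_attrs` still contains the hidden value and the method object, and `names_used` shows which attribute the method reads.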
What you're trying to do is not possible. The python interpreter must be able to take your bytecode and execute it. Someone will always be able to decompile the bytecode. They will always be able to produce an AST and view the flow of the code with variable and class names.
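As a rough illustration of how much a compiled code object gives away, the standard dis module can list its instructions, and the code object itself carries argument names and constants. The function `licence_ok` and its key string are invented for this sketch.

```python
import dis

# Hypothetical application source; in practice only the compiled form would ship.
source = "def licence_ok(key):\n    return key == 'ABC-123'\n"
module_code = compile(source, "<shipped>", "exec")

# The function body is stored as a nested code object among the module constants.
func_code = next(c for c in module_code.co_consts if hasattr(c, "co_code"))

argument_names = func_code.co_varnames   # names of arguments and locals
constants = func_code.co_consts          # literal values used in the body
opnames = [ins.opname for ins in dis.get_instructions(func_code)]
```

The argument name and the literal key are both readable directly from the code object, without any third-party decompiler.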
Note that this process can also be done with compiled language code; the difference there is that you will get assembly. Some tools can infer C structure from the assembly, but I don't have enough experience with that to comment on the details.
What specific piece of information are you trying to hide? Could you keep the algorithm server side and make your software into a client that touches your web service? Keeping the code on a machine you control is the only way to really keep control over the code. You can't hand someone a locked box, the keys to the box, and prevent them from opening the box when they have to open it in order to run it. This is the same reason DRM does not work.
All that being said, it's still possible to make it hard to reverse engineer, but it will never be impossible when the client has the executable.

There is no way to keep your application code an absolute secret.
Frankly, if a group of dedicated and determined hackers (in the good sense, not in the pejorative sense) can crack the PlayStation's code signing security model, then your app doesn't stand a chance. Once you put your app into the hands of someone outside your company, it can be reverse-engineered.
Now, if you want to put some effort into making it harder, you can compile your own embedded python executable, strip out unnecessary modules, obfuscate the compiled python bytecode and wrap it up in some malware rootkit that refuses to start your app if a debugger is running.
But you should really think about your business model. If you see the people who are passionate about your product as a threat, if you see those who are willing to put time and effort into customizing your product to personalize their experience as a danger, perhaps you need to re-think your approach to security. Assuming you're not in the DRM business, or have a similar model that involves squeezing money from reluctant consumers, consider developing an approach that involves sharing information with your users, and allowing them to collaboratively improve your product.

Is there a way to keep the Python secret codes secret?
No there is not.
Python is particularly easy to reverse engineer, but other languages, even compiled ones, are easy enough to reverse.

You cannot fully prevent reverse engineering of software - if it comes down to it, one can always analyze the assembler instructions your program consists of.
You can, however, significantly complicate the process, for example by messing with Python internals. However, before jumping to how to do it, I'd suggest you evaluate whether to do it at all. It's usually harder to "steal" your code (one needs to fully understand it to be able to extend it, after all) than to write it oneself. A pure, unobfuscated Python plugin interface, however, can be vital in creating a whole ecosystem around your program, far outweighing the possible downsides of having someone peek at your maybe not perfectly designed internals.

What builtin functions shouldn't be run by untrusted users?

I'm creating a corewars type application that runs on django and allows a user to upload some python code that will control their character. Now, I know the real answer to this is that as long as I'm taking code input from untrusted users I'll have security vulnerabilities. I'm just trying to minimize the risk as much as possible. Here are some that spring to mind:
__import__ (I'll probably also do some ast scanning to make sure there aren't any import statements)
open
file
input
raw_input
Are there any others I'm missing?

There are lots of answers on what to do in general about restricting Python at http://wiki.python.org/moin/SandboxedPython. When I looked at it some time ago, Zope's RestrictedPython looked like the best solution, working with a whitelist system. You'll still need to take care in your own code so that you don't expose any security vulnerabilities, but that seems to be the best system out there.

Since you sound determined to do this, I'll link you to the standard rexec module, not because I think you should use it (don't - it has known vulnerabilities), but because it might be a good starting point for getting your webserver compromised... I mean, for your own restricted-execution framework.
In particular, under the heading "Defining restricted environments" several modules and functions are listed that were considered reasonably safe by the rexec designer; these might be usable as an initial whitelist of sorts. I'd also suggest examining its code for other gotchas you might not have thought of.

You will really need to avoid eval.
Imagine code such as:
eval("__impor" + "t__('whatever').destroy_your_server")
This is probably the most important one.
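A small sketch of why scanning the AST for import statements, as the question proposes, misses this kind of string trick. The helper names are made up, and the snippets are only parsed here, never executed.

```python
import ast


def has_import_statement(source):
    """Return True if the source contains a literal import statement."""
    tree = ast.parse(source)
    return any(isinstance(node, (ast.Import, ast.ImportFrom)) for node in ast.walk(tree))


def calls_builtin(source, names=("eval", "exec", "compile", "__import__")):
    """Return True if the source calls one of the named builtins directly by name."""
    tree = ast.parse(source)
    return any(
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in names
        for node in ast.walk(tree)
    )


plain = "import os"
hidden = """eval("__impor" + "t__('os')")"""

plain_flagged = has_import_statement(plain)
hidden_flagged = has_import_statement(hidden)
hidden_uses_eval = calls_builtin(hidden)
```

The import scan flags `plain` but not `hidden`. Checking for direct calls to eval catches this particular case, but an alias such as `e = eval` would slip past that check too, which is why the other answers argue for whitelisting.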

Yeah, you have to whitelist. There are so many ways to hide the bad commands.
This is NOT the worst case scenario:
"the worst case scenario is that someone gets into the database"
The worst case scenario is getting the entire machine rooted and you not noticing as it probes your other machines and keylogs your passwords. Isolate this machine and consider it hostile (DMZ, block it from being able to launch attacks internally and externally, etc). Run tripwire or AIDE on non-writeable media and log everything to a second host.
Finally, as plash shows, there are a lot of dangerous system calls that need to be protected against.

If you're not committed to using Python as the language inside the game, one possibility would be to embed Lua using LunaticPython (I suggest the bugfixes branch at https://code.launchpad.net/~dne/lunatic-python/bugfixes).
It's much easier to sandbox Lua than Python, and it's much easier to embed Lua than to create your own programming language.

You should use a whitelist, rather than a blacklist. If you use a blacklist, you will always miss something. Even if you don't, Python will add a function to the standard library, and you won't update your blacklist in time.
Things you're currently allowing but probably should not include:
compile
eval
reload (if they do access the filesystem somehow, this is basically import)
I agree that this would be very tricky to do correctly. One complication (among many) could be a user accessing one of these functions through a field in another class.
I would consider using another isolation mechanism, such as a virtual machine, instead or in addition to this. You might look at how codepad does it.
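For what the whitelist idea looks like at its simplest, here is a sketch that runs code with an explicit `__builtins__` dictionary. The contents of `SAFE_BUILTINS` and the helper name are assumptions made for the example, and this is not a sandbox on its own: the field-access complication mentioned above still applies.

```python
# Hypothetical whitelist; what belongs in it depends entirely on the application.
SAFE_BUILTINS = {"abs": abs, "len": len, "max": max, "min": min, "range": range, "sum": sum}


def run_with_whitelist(source):
    """Execute source with only SAFE_BUILTINS visible; return the names it defined."""
    scope = {"__builtins__": dict(SAFE_BUILTINS)}
    exec(compile(source, "<user code>", "exec"), scope)
    scope.pop("__builtins__", None)
    return scope


result = run_with_whitelist("total = sum(range(4))")

try:
    run_with_whitelist("handle = open('/etc/passwd')")
    open_was_reachable = True
except NameError:
    # open is not in the whitelist, so the name lookup fails.
    open_was_reachable = False
```

Anything not listed is simply undefined for the user code, so a new builtin added in a later Python release is denied by default instead of allowed by default.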

Python, safe, sandbox [duplicate]

I'd like to make a website where people could upload their Python scripts. Of course I'd like to execute those scripts. Those scripts should do some interesting work. The problem is that people could upload scripts that could harm my server and I'd like to prevent that. What is the option to run arbitrary scripts without harming my system - actually without seeing my system at all? Thank you

"Can't be done."
Running arbitrary (untrusted) scripts and staying safe is a contradiction. You would have to go as far as using custom kernels, jails, VMs and the like.
You can look at how http://codepad.org/about does it, it's a lot of work.

I don't know about earlier versions, but in Python 3 you can create functions with access to a custom scope through types.FunctionType.
    from types import FunctionType

    def f():
        return __builtins__

    f()  # this works because f has access to __builtins__
    scope = {}
    sandboxed = FunctionType(f.__code__, scope)
    sandboxed()  # raises NameError: __builtins__ is not defined
The returned function only has access to whatever you supplied in the scope dictionary. I wonder if there are still hacks around this.
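On the question of hacks: one well-known route needs no global names and no builtins at all, only attribute access on a literal. The sketch below passes an explicitly empty `__builtins__`, because, as far as I know, since Python 3.10 a function created without that key falls back to the interpreter's real builtins. The function name `probe` is made up.

```python
from types import FunctionType


def probe():
    # Uses no global names and no builtins, only attribute access on a literal.
    return ().__class__.__base__.__subclasses__()


# Explicitly empty builtins, so nothing is inherited from the interpreter.
sandboxed = FunctionType(probe.__code__, {"__builtins__": {}})
reachable_classes = sandboxed()
```

`reachable_classes` is the list of every class deriving directly from object, which gives the "sandboxed" code a foothold well beyond the scope dictionary it was handed.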

There are quite a lot of web servers running untrusted Python code nowadays:
http://codepad.org/ (probably the most notorious pastebin for Python code)
http://codingbat.com/ (previously Javabat, name changed to reflect the Python addition)
http://appengine.google.com/ (host Python code on Google's infrastructure)
http://www.spoj.pl/ (the infamous Sphere Online Judge coding challenge)
You may want to look at how they approached their problems.
Or you may want to look at a different approach:
http://pyjs.org/ - pyjamas - Python-to-JavaScript compiler (running client-side, shifting the security problem to the user's side)

"Can't be done," is too harsh. JavaScript engines live in your web browser and they accept and run untrusted scripts safely. There's always the possibility of exploits, but in correct engine operation they are innocuous. There are even "slow script" checks that prevent infinite loops from denial-of-service attacking your browser, making those little alert dialogs.
Google App Engine runs a sandboxed version of the Python VM which effectively removes all the naughty native bits that let you get at the underlying system. To do this yourself in a safe manner would take some Python VM expertise.
For sanity, you could start off by removing all builtins and whitelisting the ones you want to allow users once you certify they don't touch the underlying system.
It feels like something somebody must have already done, but I don't know of any existing project that does it. :-/
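The "slow script" check mentioned above has a rough server-side analogue: run the submitted code in a separate interpreter process and kill it after a timeout. This is only a sketch with a made-up helper name; it guards against runaway loops and nothing else, so the child process can still touch the filesystem and the network.

```python
import subprocess
import sys


def run_untrusted(source, timeout_seconds=5.0):
    """Run source in a separate interpreter; return (finished, stdout)."""
    try:
        completed = subprocess.run(
            # -I is isolated mode: ignores PYTHON* environment variables
            # and the user site directory.
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, ""
    return True, completed.stdout
```

Called with `"print(2 + 2)"` it returns the output; called with `"while True: pass"` and a short timeout it reports that the script did not finish.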

I think the way to do this is to run those scripts in normal Python shell, but on a virtual machine. I might be biased, because my "job" is currently to play around with VMs (universities are great!).
A new VM instance can be created and started in seconds. If you keep a few around and replace only those that get broken, you have good service, absolute security and almost no effort.
But there is one thing: Virtually all web hosts today are virtual machines and they don't support another virtual machine inside. You need a real, physical server to do this.

Brett Cannon has a tentative design for doing this, last I knew, but it has not been developed. So unless you're looking to put a LOT of effort into making this happen, there currently isn't a solution publicly available.
Brett's blog is at: http://sayspy.blogspot.com/ if you want to try to read up on it, I couldn't find a direct link to his discussions about the new security design. I can't recall if I read his blog talking about it, or if it was in person where he mentioned it, sorry.
There used to be some restricted execution abilities, but they were dropped because they just didn't work.
It's not impossible to do, but it's not something that Python is able to do right now. It's something people would like, but it's not really a high priority from what I've seen.

trypython.org (the source is BSD licensed) does a safe, browser-oriented version of such a sandbox in IronPython (via Silverlight/Moonlight). You may be able to mash together a headless version of this for use on a server -- but you could definitely let users distribute scripts between each other, or you could distribute these scripts to be executed within the plugin environment.

If you use Linux, seccomp may be the solution; its mode 2 (filter mode) is even nicer. With seccomp you can create a new process in which almost every syscall fails and which can only read from and write to file descriptors that are already open.
Using namespaces and cgroups as well may also help; this can be done with ctypes.

You could try Ideone API - it allows Python 2 and Python 3

Secure plugin system for python application

I have an application written in python. I created a plugin system for the application that uses egg files. Egg files contain compiled python files and can be easily decompiled and used to hack the application. Is there a way to secure this system? I'd like to use digital signature for this - sign these egg files and check the signature before loading such egg file. Is there a way to do this programmatically from python? Maybe using winapi?

Is there a way to secure this system?
The answer is "that depends".
The two questions you should ask are "what are people supposed to be able to do" and "what are people able to do (for a given implementation)". If there exists an implementation where the latter is a subset of the former, the system can be secured.
One of my friends is working on a programming competition judge: a program which runs a user-submitted program on some test data and compares its output to a reference output. That's damn hard to secure: you want to run other people's code, but you don't want to let them run arbitrary code. Is your scenario somewhat similar to this? Then the answer is "it's difficult".
Do you want users to download untrustworthy code from the web and run it with some assurance that it won't hose their machine? Then look at various web languages. One solution is not offering access to system calls (JavaScript); another is offering limited access to certain potentially dangerous calls (Java's SecurityManager). Neither can be done in Python as far as I'm aware, but you can always hack the interpreter and disallow the loading of external modules that are not on some whitelist. This is probably error-prone.
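Short of patching the interpreter itself, a similar effect can be approximated from Python by handing the plugin a replacement `__import__`. This is a sketch of that substitute technique, not of an interpreter hack; `ALLOWED_MODULES` and the helper names are assumptions, and, as said above, the approach is error-prone (an allowed module may itself expose a route to other modules).

```python
import builtins

# Hypothetical whitelist of top-level modules a plugin may import.
ALLOWED_MODULES = {"math", "random"}


def guarded_import(name, *args, **kwargs):
    """Delegate to the real __import__ only for whitelisted top-level modules."""
    if name.split(".")[0] not in ALLOWED_MODULES:
        raise ImportError(f"module {name!r} is not on the whitelist")
    return builtins.__import__(name, *args, **kwargs)


def run_with_import_whitelist(source):
    """Execute source with the import statement routed through guarded_import."""
    plugin_builtins = dict(vars(builtins))
    plugin_builtins["__import__"] = guarded_import
    scope = {"__builtins__": plugin_builtins}
    exec(source, scope)
    return scope
```

With this in place, `import math` inside the plugin works while `import os` raises ImportError.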
Do you want users to write plugins, and not be able to tinker with what the main body of code in your application does? Consider that users can decompile .pyc files and modify them. Assume that those running your code can always modify it, and consider the gold-farming bots for WoW.
One Linux-only solution, similar to the sandboxed web-ish model, is to use AppArmor, which limits which files your app can access and which system calls it can make. This might be a feasible solution, but I don't know much about it so I can't give you advice other than "investigate".
If all you worry about is evil people modifying code while it's in transit in the intertubes, standard cryptographic solutions exist (SSL). If you want to only load signed plugins (because you want to control what the users do?), signing code sounds like the right solution (but beware of crafty users or evil people who edit the .pyc files and disable the is-it-signed check).
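A sketch of the check-before-load idea using only the standard library. Note the substitution: the standard library has no public-key signatures, so this uses an HMAC, which is a symmetric integrity tag and not a digital signature. The key would have to ship inside the application, so anyone who can read the application could forge tags; a real signature scheme needs a third-party crypto library. All names and the demo key are made up.

```python
import hashlib
import hmac


def sign_plugin(plugin_bytes, key):
    """Compute an HMAC-SHA256 tag over the raw plugin file contents."""
    return hmac.new(key, plugin_bytes, hashlib.sha256).hexdigest()


def is_untampered(plugin_bytes, key, expected_tag):
    """Check the tag in constant time before the plugin is loaded."""
    actual = sign_plugin(plugin_bytes, key)
    return hmac.compare_digest(actual, expected_tag)


# Demo values only; a real key must not be hard-coded like this.
key = b"demo-key-not-for-production"
plugin = b"def run():\n    return 'hello from plugin'\n"
tag = sign_plugin(plugin, key)

accepted = is_untampered(plugin, key, tag)
rejected = not is_untampered(plugin + b"# patched", key, tag)
```

The loader would call `is_untampered` on the file bytes and refuse to import the plugin when it returns False, with the caveat from above that a user who edits the application can remove the check itself.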

Maybe some crypto library like this http://chandlerproject.org/Projects/MeTooCrypto helps to build an ad-hoc solution. Example usage: http://tdilshod.livejournal.com/38040.html
