Disable built-in module import in embedded Python

I'm embedding Python 3.6 in my application, and I want to disable the import statement in user scripts to prevent users from importing any Python built-in libraries. I'd like to use only the language itself and my own C++-defined modules.
Py_SetProgramName (L"Example");
Py_Initialize ();
PyObject* mainModule = PyImport_AddModule ("__main__");
PyObject* globals = PyModule_GetDict (mainModule);
// This should work
std::string script1 = "print ('example')";
PyRun_String (script1.c_str (), Py_file_input, globals, nullptr);
// This should not work
std::string script2 = "import random\n"
                      "print (random.randint (1, 10))\n";
PyRun_String (script2.c_str (), Py_file_input, globals, nullptr);
Py_Finalize ();
Do you know any way to achieve this?

Python has a long history of resisting secure sandboxing (see How can I sandbox Python in pure Python? as a starting point, then dive into an old python-dev discussion if you feel like it). Here are what I consider to be your two best options.
Pre-scan the code
Before executing anything, scan the code. You could do this in Python with the ast module and walk the tree, or you can likely get far enough with simpler text searches. This likely works in your scenario because you have restricted use cases - it doesn't generalize to truly arbitrary code.
What you are looking for in your case will be any import statements (easy), and any top-level variables (e.g., in a.b.c you care about a and likely a.b for a given a) that are not "approved". This will enable you to fail on any code that isn't clean before running it.
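As a concrete illustration, here is a minimal pre-scan sketch using the stdlib ast module; the approved-names whitelist and the exact rejection rules are assumptions you would tailor to your own C++ API:

import ast

APPROVED_NAMES = {'print', 'mymodule'}  # hypothetical whitelist for your own API

def check_script(source):
    """Raise ValueError if the script imports anything or touches unapproved names."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Any form of import statement fails the scan outright.
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError('import statements are not allowed')
        # Reads of top-level names must be approved (a fuller scanner would
        # also accept names assigned earlier in the script itself).
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            if node.id not in APPROVED_NAMES:
                raise ValueError(f'use of name {node.id!r} is not allowed')
        # Dunder attribute access is a common escape hatch, so reject it too.
        if isinstance(node, ast.Attribute) and node.attr.startswith('__'):
            raise ValueError(f'access to {node.attr!r} is not allowed')

check_script("print('example')")    # passes
# check_script("import random")     # would raise ValueError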
The challenge here is that even trivially obfuscated code will bypass your checks. For example, here are some ways to import modules, given other modules or globals, that a basic text search for import won't find. You would likely want to restrict direct access to __builtins__, globals, some/most/all names with __double_underscores__, and members of certain types. In an AST, these will unavoidably show up as top-level variable reads or attribute accesses.
getattr(__builtins__, '__imp'+'ort__')('other_module')
globals()['__imp'+'ort__']('other_module')
module.__loader__.__class__(
    "other_module",
    module.__loader__.path + '/../other_module.py'
).load_module()
(I hope it goes somewhat without saying: this is an impossible challenge, which is why this approach to sandboxing has never fully succeeded. But it may be good enough, depending on your specific threat model.)
Runtime auditing
If you are in a position to compile your own Python runtime, you might consider using the (currently draft) PEP 551 hooks. (Disclaimer: I am the author of this PEP.) There are draft implementations against the latest 3.7 and 3.6 releases.
In essence, this would let you add hooks for a range of events within Python and determine how to respond. For example, you can listen to all import events and determine whether to allow or fail them at runtime based on exactly which module is being imported, or listen to compile events to manage all runtime compilation. You can do this from Python code (with sys.addaudithook) or C code (with PySys_AddAuditHook).
The Programs/spython.c file in the repo is a fairly thorough example of auditing from C, while doing it from Python looks more like this (taken from my talk about this PEP):
import sys

def prevent_bitly(event, args):
    if event == 'urllib.Request' and '://bit.ly/' in args[0]:
        print(f'WARNING: urlopen({args[0]}) blocked')
        raise RuntimeError('access to bit.ly is not allowed')

sys.addaudithook(prevent_bitly)
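For the import-restriction case in the original question, the same mechanism can gate imports against a whitelist. A minimal sketch against the audit API as it shipped in Python 3.8, where the 'import' event's first argument is the module name; the allowed-module set here is a placeholder:

import sys

ALLOWED_MODULES = {'math', 'mymodule'}  # hypothetical whitelist

def block_imports(event, args):
    # The 'import' event fires for every import; args[0] is the module name.
    if event == 'import' and args[0] not in ALLOWED_MODULES:
        raise ImportError(f'import of {args[0]!r} is blocked')

sys.addaudithook(block_imports)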
The downside of this approach is that you need to build and distribute your own version of Python rather than relying on a system install. However, in general this is a good idea if your application depends on embedding, as it means you won't have to force users into a specific system configuration.

Related

Adding custom dllmain to cythonized module

Consider the following scenario:
A python tool should be deployed for external users in a way that (a) the source is protected and (b) it is ensured that only users with valid license can use it. Furthermore, for internal users, the code should be available as pure python, so debugging with a python debugger is possible and no compiling is necessary.
A license check is available and written in C.
(a) is solved by using Cython.
Thoughts on (b):
The license check could be applied to a "central" part of the program (i.e., a class that contains enough business logic that no one could simply replace it). Of course I could use Cython to link the license check into this class's *.pyd and call the C function, but then it would be impossible for internal users to use and debug the class. On the other hand, applying the license check to a class that is "unimportant" while debugging would mean that it could (presumably) be exchanged rather easily.
Creating a separate function licenseIsValid() -> bool and calling it from a central position in the code is not feasible, as it can be replaced easily.
In fact, (almost) everything written in Python can be replaced and is therefore unsafe.
So I came up with the idea of performing the license check before anything Python-related executes: inside the DllMain function. I could call the license-check function when the DLL/PYD is loaded and stop the application before any Cython code runs.
Is there a way to add a custom DllMain to a cythonized module? Is it even safe to add a DllMain (or does Cython require a "specialized" DllMain)?
One last remark: I am aware that Python is the wrong choice of language for creating closed-source code with a license check. But the advantages of using Python far outweigh this...
Thank you in advance,
Jan

How do you statically find dynamically loaded modules?

How does one find the location of the modules that a Python script loads dynamically?
From my understanding, Python can dynamically (at run time) load modules, be it using __import__(module_name), exec-ing "from x import y", or using imp.find_module("module_name") followed by imp.load_module(param1, param2, param3, param4).
Knowing that, I want to get all the dependencies of a Python file. This would include getting (or at least trying to get) the dynamically loaded modules, both those loaded using hard-coded string literals and those whose names are returned by a function or method.
For a normal import module_name and from x import y you can either scan the code manually or use modulefinder.
So if I want to copy one python script and all its dependencies (including the custom dynamically loaded modules) how should I do that ?
You can't; the very nature of programming (in any language) means that you cannot predict what code will be executed without actually executing it. So you have no way of telling which modules could be included.
This is further confused by user-input, consider: __import__(sys.argv[1]).
There's a lot of theoretical information about the first problem, which is normally described as the Halting problem; the second obviously can't be done.
From a theoretical perspective, you can never know exactly which modules are imported, or where. From a practical perspective, if you simply want to know where the modules are, check the module.__file__ attribute or run the script under python -v to see files printed as modules are loaded. This won't give you every module that could possibly be loaded, but it will catch most modules in mostly sane code.
See also: How do I find the location of Python module sources?
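As a concrete starting point for the static side, the stdlib modulefinder module mentioned in the question performs exactly this kind of scan. A minimal sketch; the script path is a placeholder for your own file:

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script('yourapp.py')  # hypothetical path to the script to analyze

# finder.modules maps module names to Module objects carrying a __file__ location.
for name, mod in sorted(finder.modules.items()):
    print(name, getattr(mod, '__file__', None) or '(built-in)')

# Imports that could not be resolved statically are collected separately.
print('unresolved:', list(finder.badmodules))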
This is not possible to do 100% accurately. I answered a similar question here: Dependency Testing with Python
Just an idea, and I'm not sure that it will work:
You could write a module that contains a wrapper for __builtin__.__import__. This wrapper would save a reference to the old __import__ and then assign a function to __builtin__.__import__ that does the following:
- Whenever called, get the current stack trace and work out the calling function. Maybe the information in the globals parameter to __import__ is enough.
- Get the module of that calling function, and store the name of this module together with what gets imported.
- Redirect the call to the real __import__.
After you have done this, you can run your application with python -m magic_module yourapp.py. The magic module must store the information somewhere you can retrieve it later.
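A minimal sketch of that wrapper idea on Python 3 (where __builtin__ became builtins); the stack inspection is simplified to reading the caller's __name__ from the globals argument, as suggested above:

# magic_module.py - record who imports what, then defer to the real import
import builtins

_real_import = builtins.__import__
import_log = []  # (importing module, imported name) pairs

def _tracing_import(name, globals=None, locals=None, fromlist=(), level=0):
    caller = (globals or {}).get('__name__', '<unknown>')
    import_log.append((caller, name))
    return _real_import(name, globals, locals, fromlist, level)

builtins.__import__ = _tracing_import

As the other answers stress, this records only the imports that actually execute on a given run, not every import that could possibly happen.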
That's quite a question.
Static analysis is about predicting all possible run-time execution paths, and even whether the program halts at all for a specific input.
That is equivalent to the Halting problem, and unfortunately there is no general solution.
The only way to resolve dynamic dependencies is to run the code.

How can I sandbox Python in pure Python?

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).
This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice, but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on python-dev a year or so ago in which people broke out using everything from catching exceptions and poking at internal state to byte-code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls, etc.) and then compile the rest. This is the way to go if you want to use Python as a config language, etc.
Another way (which might not work for you since you're using GAE) is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (support for variables, basic conditionals, and function calls, but not definitions), you might want to evaluate approach 2 and kick everything else out of the code. It's a little tricky but doable.
Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and (
            (event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use Python's file access, so they are banned from disk writes, too. External program calls have been explicitly disabled as well.
WRITE_LOCK = True

# Each of the following attempts now raises IOError('file write forbidden'):
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt to remove the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible to the code run by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within CPython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure, e.g. for public web access:
- Perhaps hypothetical compiled modules that use direct OS calls cannot be audited by CPython - whitelisting the safe pure-Python modules is recommended.
- There is definitely still the possibility of crashing or overloading the CPython interpreter.
- Maybe there remain some loopholes to write files to the hard drive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of the Python ecosystem reduces to a rather narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to @Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook

def block_mischief(event, arg):
    if 'WRITE_LOCK' in globals() and (
            (event == 'open' and arg[1] != 'r')
            or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
        raise IOError('file write forbidden')

addaudithook(block_mischief)
WRITE_LOCK = True

exec("""
import sys

def r(a, b):
    try:
        raise Exception()
    except:
        # Climb from this frame to the audit hook's frame and delete
        # WRITE_LOCK from its globals (the top-level module globals).
        del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']

w = type('evil', (object,), {'__ne__': r})()
sys.audit('open', None, w)   # arg[1] != 'r' invokes w.__ne__, i.e. r()
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py
AFAIK it is possible to run code in a completely isolated environment (shown here in Python 3 syntax):
exec(some_python_code, {'__builtins__': {}}, {})
But in such an environment you can do almost nothing :) (you cannot even import a module; but a malicious user can still run an infinite recursion or exhaust memory). Probably you would want to add some modules that will be the interface to your game engine.
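A minimal sketch of that idea, exposing only a hand-picked API dict to the untrusted code; the game-engine function here is hypothetical, and as the surrounding answers stress, this blocks casual imports but is not a real security boundary:

# Only names placed in this dict are visible to the untrusted script.
safe_globals = {
    '__builtins__': {'len': len, 'range': range, 'min': min, 'max': max},
    'say': lambda msg: print(f'[game] {msg}'),  # hypothetical engine API
}

exec("say('hello'); say(max(1, 2, 3))", safe_globals)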
I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that: restricted Python executed in a sandbox, without any access to the filesystem, with access to other Zope objects controlled by Zope security machinery, and with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature has been around since about the year 2000.
And here's the magic behind Python Scripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so it can be used standalone.
Note that this is not for safely running arbitrary Python code (most random scripts will fail on their first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?
I would look into a two-server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example, if you allowed the end user to create items, you would have a lookup that called the server with the code to execute and the set of parameters.
Here's an abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}
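A minimal sketch of the untrusted side as a tiny JSON-over-HTTP service (stdlib only; the dispatch logic and field names mirror the abstract example above and are placeholders, with the real sandboxed execution left out):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_function(request):
    # Placeholder dispatch; real code would run the sandboxed script here.
    if request.get('function_id') == 'healing potion':
        return {'hp': '+5',
                'action': {'destroy_inventory_item': request['inventory_id']}}
    return {'error': 'unknown function'}

class RPCHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers['Content-Length']))
        response = json.dumps(run_function(json.loads(body))).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(response)

HTTPServer(('127.0.0.1', 8000), RPCHandler).serve_forever()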
Hmm. This is a thought experiment; I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names, etc. (also getattr/setattr/hasattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps blacklist other things such as opening files. You then emit the Python code for this and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but do not mention what the 3.0 alternative is (its role is taken over by the ast module).
In general, this is parallel to how forum software and the like try to whitelist 'safe' JavaScript or HTML, and they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)
I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.

Python Auto Importing [duplicate]

Possible Duplicate:
Perl's AUTOLOAD in Python (getattr on a module)
I'm coming from a PHP background and attempting to learn Python, and I want to be sure to do things the "Python way" instead of how I've developed before.
My question comes from the fact that in PHP 5 you can set up your code so that if you attempt to call a class that doesn't exist in the namespace, a function will run first that loads the class in and allows you to continue on as if it were already loaded. The advantages of this are that classes aren't loaded unless they are called, and you don't have to worry about loading classes before using them.
In Python, there's a lot of emphasis on the import statement; is it bad practice to attempt an auto-importing trick with Python to alleviate the need for import statements? I've found this module that offers auto importing, but I don't know if that's the best way of doing it, or if auto importing of modules is something that is recommended. Thoughts?
Imports serve at least two other important purposes besides making the modules or contents of the modules available:
They serve as a sort of declaration of intent -- "this module uses services from this other module" or "this module uses services belonging to a certain class" -- e.g. if you are doing a security review for socket-handling code, you can begin by only looking at modules that import socket (or other networking-related modules)
Imports serve as a proxy for the complexity of a module. If you find yourself with dozens of lines of imports, it may be time to reconsider your separation of concerns within the module, or within your application as a whole. This is also a good reason to avoid "from foo import *"-type imports.
In Python, people usually avoid auto imports, simply because they are not worth the effort. You may slightly reduce startup costs, but otherwise there is no (or should be no) significant effect. If you have modules that are expensive to import and do a lot of work that doesn't need to be done, rewrite the module rather than delay importing it.
That said, there is nothing inherently wrong with auto imports. Because imports act as a proxy for a module's dependencies, there may be some pitfalls (e.g. when looking at a name that has not been imported yet). Several auto-importing libraries are floating around.
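For what such a trick can look like in modern Python, module-level __getattr__ (PEP 562, Python 3.7+) provides a clean lazy-import hook. A minimal sketch; the alias table is a placeholder:

# mypackage/__init__.py
import importlib

_lazy = {'np': 'numpy', 'plt': 'matplotlib.pyplot'}  # hypothetical aliases

def __getattr__(name):
    # Invoked only when normal attribute lookup on this module fails;
    # the target module is imported on first access and then cached.
    if name in _lazy:
        module = importlib.import_module(_lazy[name])
        globals()[name] = module
        return module
    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')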
If you are learning Python and want to do things the Python way, then just import the modules. It's very unusual to find autoimports in Python code.
You could auto-import the modules, but the most I have ever needed to import was about 10, and that was after I tacked features on top of the original program. You won't be importing a lot, and the names are very easy to remember.

Ctypes pro and con

I have heard that ctypes can cause crashes (or stop errors) in Python on Windows. Should I stay away from it? Where did I hear this? It was back when I tried to control various aspects of Windows: automation, that sort of thing.
I hear of SWIG, but I see ctypes more often than not. Any danger here? If so, what should I watch out for?
I did search for "ctype pro con python".
In terms of robustness, I still think SWIG is somewhat superior to ctypes, because it's possible to have a C compiler check things more thoroughly for you; however, this is pretty moot by now (while it loomed larger in earlier ctypes versions), thanks to the argtypes feature @Mark already mentioned. However, there is no doubt that the runtime overhead IS much more significant for ctypes than for SWIG (and sip and Boost.Python and other "wrapping" approaches): so, I think of ctypes as a convenient way to reach a few functions within a DLL when the calls happen outside of a key bottleneck, not as a way to make large C libraries available to Python in performance-critical situations.
For a nice middle way between the runtime performance of SWIG (&c) and the convenience of ctypes, with the added bonus of being able to add more code that can use a subset of Python syntax yet run at just about C-code speeds, consider Cython - a Python-like language that compiles down to C and is specialized for writing Python-callable extensions and wrapping C libraries (including ones that may be available only as static libraries, not DLLs: ctypes wouldn't let you play with those ;-).
ctypes is a safe module to use, if you use it right.
Some libraries provide lower-level access to things; some modules simply allow you to shoot yourself in the foot. Naturally, some modules are more dangerous than others. This doesn't mean you should not use them, though!
You probably heard someone referring to something like this:
# Crash the Python interpreter by writing past the end of a ctypes buffer
from ctypes import c_char, pointer

def crashme():
    c = c_char(b'x')
    p = pointer(c)
    i = 0
    while True:
        p[i] = b'x'  # out-of-bounds write: undefined behaviour, soon a crash
        i += 1
The Python interpreter crashing is different from the Python code itself raising a runtime error. For example, infinite recursion with the default recursion limit set would raise a RuntimeError, but the Python interpreter would still be alive afterwards.
Another good example of this is the sys module. You wouldn't stop using the sys module, though, just because it can be used to crash the Python interpreter.
import sys
sys.setrecursionlimit(2**30)

def f(x):
    f(x + 1)

# This exhausts the C stack and crashes the Python interpreter
f(1)
There are many libraries as well that provide lower-level access. For example, the gc module can be manipulated to give access to partially constructed objects, and accessing the fields of those can cause crashes.
Reference and ideas taken from: Crashing Python
ctypes can indeed cause crashes, if the C library you're using can already cause crashes.
If anything, ctypes can help reduce crashes, because you can enforce runtime type safety with the argtypes property on C functions using ctypes.
But if your C library is already stable and tested, there is absolutely no reason not to use ctypes if it does what you need in terms of bringing C and Python together.
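A minimal sketch of that argtypes safety net (assuming a Unix-like libc; the library lookup differs on Windows):

import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library('c'))

# Declaring the prototype lets ctypes validate every call instead of
# blindly trusting the caller.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b'hello'))  # 5
# libc.strlen(42) now raises ctypes.ArgumentError instead of corrupting memory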
I highly suggest you look into reading this book:
Gray Hat Python: Python Programming for Hackers and Reverse Engineers
The book functions as an in-depth tutorial for the ctypes library and shows you how to run incredibly low-level code.
