Adding custom dllmain to cythonized module

Adding custom dllmain to cythonized module - python

Consider the following scenario:
A python tool should be deployed for external users in a way that (a) the source is protected and (b) it is ensured that only users with valid license can use it. Furthermore, for internal users, the code should be available as pure python, so debugging with a python debugger is possible and no compiling is necessary.
A license check is available and written in C.
(a) is solved by using cython.
Thoughts on (b):
The license check could be applied to a "central" part of the program (i.e., "a class that contains enough business logic so that no one could replace it"). Of course I could use cython to link the license check to this class's *.pyd and use the c function, but then it would be impossible for the internal user to use / debug the class. On the other hand, applying the license check to a class that is "unimportant" while debugging would mean that it could (presumably) be exchanged rather easily.
Creating a separate function "licenseIsValid() -> bool" and calling it from a central position in the code is not feasible as it can be replaced easily.
As a matter of fact, (almost) everything written in python can be replaced and therefore is unsafe.
So I came up with the idea of implementing the license check before anything python-related is executed: inside of the dllmain function. I could call the license check function on startup of the dll / pyd and stop the application before executing any cython code.
Is there a way to add a custom dllmain to a cythonized module? Is it even safe to add a dllmain (or does cython require a "specialized" dllmain)?
One last remark: I am aware that python is the wrong choice of language for creating closed source code with license check. But the advantages of using python far outweigh this...
Thank you in advance,
Jan

Related

Trigger wrapped C/C++ rebuild when instantiating a Julia "object" (that calls the wrapped C/C++ functions)

I found the following paradigm quite useful and I would love to be able to reproduce it somehow in Julia to take advantage of Julia's speed and C wrapping capabilities.
I normally maintain a set of objects in Python/Matlab that represent the blocks (algorithms) of a pipeline. Set up unit-tests etc.
Then develop the equivalent C/C++ code by having equivalent python/Matlab objects (same API) that wrap C/C++ to implement the same functionality and have to pass the same tests (by this I mean the exact same tests written in python/Matlab where either I generate synthetic data or I load recorded data).
I will maintain the full-python and python/C++ objects in parallel enforcing parity with extensive test suites. The python only and python/C++ versions are fully interchangeable.
Every time I need to modify the behavior of the pipeline, or debug an issue, I first use the fully pythonic version of the specific object/block I need to modify, typically in conjunction with other blocks running in python/C++ mode for speed, then update the tests to match the behavior of the modified python block and finally update the C++ version until it reaches parity and passes the updated tests.
Evey time I instantiate the Python/C++ version on the block, in the constructor I run a "make" that rebuilds the C++ code if there was any modification. To make sure I always test the latest version of the C++.
Is there any elegant way to reproduce the same paradigm with the Julia/C++ combination? Maintaining julia/C++ versions in parallel via automatic testing.
I.e. how do I check/rebuild the C++ only once when I instantiate the object and not per function call (it would be way too slow).
I guess I could call the "make" once at the test-suite level before I run all the tests of the different blocks. But then I will have to manually call it if I'm writing a quick python script for a debugging session.
Let's pick the example of a little filter object with a configure method that changes the filter parameters and a filter method the filters the incoming data.
We will have something like:
f1 = filter('python');
f2 = filter('C++'); % rebuild C++ as needed
f1.configure(0.5);
f2.configure(0.5);
x1 = data;
x2 = data;
xf1 = f1.filter(x1);
xf2 = f2.filter(x2);
assert( xf1 == xf2 )
In general there will be a bunch of tests that instantiate the objects in both python-only mode or python/C++ mode and test them.
I guess what I'm trying to say is that since in Julia the paradigm is to have a filter type, and then "external" methods that modify/use the filter type there is no centralized way to check/rebuild all its methods that wrap C code. Unless the type contains a list of variable that keep track of the relevant methods. Seems awkward.
I would appreciate comments / ideas.

Is there a reason why you can't wrap your functions in a struct like this?
struct Filter
FilterStuff::String
Param::Int
function Filter(stuff::String, param::Int)
# Make the C++ code here
# Return the created object here
obj = new(stuff, param)
end
end

Disable built-in module import in embedded Python

I'm embedding Python 3.6 in my application, and I want to disable import command in the scripts to prevent users to import any python built-in libraries. I'd like to use only the language itself and my own C++ defined modules.
Py_SetProgramName (L"Example");
Py_Initialize ();
PyObject* mainModule = PyImport_AddModule ("__main__");
PyObject* globals = PyModule_GetDict (mainModule);
// This should work
std::string script1 = "print ('example')";
PyRun_String (script1.c_str (), Py_file_input, globals, nullptr);
// This should not work
std::string script2 = "import random\n"
"print (random.randint (1, 10))\n";
PyRun_String (script2.c_str (), Py_file_input, globals, nullptr);
Py_Finalize ();
Do you know any way to achieve this?

Python has a long history of being impossible to create a secure sandbox (see How can I sandbox Python in pure Python? as a starting point, then dive into an old python-dev discussion if you feel like it). Here are what I consider to be your best two options.
Pre-scan the code
Before executing anything, scan the code. You could do this in Python with the AST module and then walk the tree, or can likely get far enough with simpler text searches. This likely works in your scenario because you have restricted use cases - it doesn't generalize to truly arbitrary code.
What you are looking for in your case will be any import statements (easy), and any top-level variables (e.g., in a.b.c you care about a and likely a.b for a given a) that are not "approved". This will enable you to fail on any code that isn't clean before running it.
The challenge here is that even trivally obfuscated code will bypass your checks. For example, here are some ways to import modules given other modules or globals that a basic scan for import won't find. You would likely want to restrict direct access to __builtins__, globals, some/most/all names with __double_underscores__ and members of certain types. In an AST, these will unavoidably show up as top-level variable reads or attribute accesses.
getattr(__builtins__, '__imp'+'ort__')('other_module')
globals()['__imp'+'ort__']('other_module')
module.__loader__.__class__(
"other_module",
module.__loader__.path + '/../other_module.py'
).load_module()
(I hope it goes somewhat without saying, this is an impossible challenge, and why this approach to sandboxing has never fully succeeded. But it may be good enough, depending on your specific threat model.)
Runtime auditing
If you are in a position to compile your own Python runtime, you might consider using the (currently draft) PEP 551 hooks. (Disclaimer: I am the author of this PEP.) There are draft implementations against the latest 3.7 and 3.6 releases.
In essence, this would let you add hooks for a range of events within Python and determine how to respond. For example, you can listen to all import events and determine whether to allow or fail them at runtime based on exactly which module is being imported, or listen to compile events to manage all runtime compilation. You can do this from Python code (with sys.addaudithook) or C code (with PySys_AddAuditHook).
The Programs/spython.c file in the repo is a fairly thorough example of auditing from C, while doing it from Python looks more like this (taken from my talk about this PEP):
import sys
def prevent_bitly(event, args):
if event == 'urllib.Request' and '://bit.ly/' in args[0]:
print(f'WARNING: urlopen({args[0]}) blocked')
raise RuntimeError('access to bit.ly is not allowed')
sys.addaudithook(prevent_bitly)
The downside of this approach is you need to build and distribute your own version of Python, rather than relying on a system install. However, in general this is a good idea if your application is dependent on embedding as it means you won't have to force users into a specific system configuration.

Security and defensive coding perspective on Python Capsules in the Python/C API

I am developing a project that uses embedded Python. This project requires access to a local variable from a C function called by the Python interpreter. I was using global variables, but I read the answer to this question which states:
Capsules are basically python-opaque void pointers that you can pass
around or associate with modules. They are "the way" to solve your
problem.
My question is how is this not a huge security vulnerability? From what I understand, Python has no interpreter checks on accessing private variables. If you are passing around a pointer that is accessible by user-defined Python scripts, couldn't the user theoretically cause a segmentation fault or run arbitrary code by simply accessing the capsule, setting it to another value, and then running the C function from Python that operates on the pointer in the capsule?
EDIT
Title has been updated to reflect follow up question:
So I now see that there are more pressing concerns if someone has access to a Python script being run by a trusted interpreter than capsules. My follow up question is how this is not considered a Really Bad Idea™ from a software development standpoint? I would prefer not to even give my users the ability to interface with my C code in a way that can cause a segmentation fault (even if they would have to modify private variables to do so). This does not sound like defensive coding to me. Is this encompassed by the "Python culture" argument or is there a way to use capsules in which you can assure that you can recover from potential segmentation faults or even protect against them?

Python code already runs with full user privileges, and it already has plenty of ways of executing external applications (os.system(), subprocess, etc) or call arbitrary functions (e.g, using ctypes or cffi) without getting capsules involved. In short, Python code is allowed to do pretty much whatever it wants to, so there's no "security vulnerability" in that it's able to do this.
If you don't trust your users, don't allow them to load scripts into your Python interpreter.

OOP programming in python

I was reading Dietel's C++ programming book. In this book they mention how a programmer should release only the interface part of his code and not the implementation.
So carrying this over to python:
I have 2 files:
1) the implementation file = accountClass.py and
2) the interface file = useAccountClass.py
I have compiled the implementation file and have obtained the .pyc file. So when I provide my code to someone else, I would provide him with the .pyc file and the interface file, right?
Also, if I provide someone else with ONLY the .pyc file, can I expect him to write the interface on his own? I'm going to say no. But there's this one nagging doubt that I have:
The creators of numpy and scipy did not share the implementation with us end users. And I don't think they shared any interfaces either. But we can still search for the different classes and their methods inside both numpy and scipy. So, using this example of numpy and scipy, I guess what I'm trying to ask is:
Is it possible for someone else to create an interface to my code if I provide him/ her with only the compiled implementation file (in this case accountClass.pyc)? How will that person know what classes and methods I have defined in my implementation? I mean, will they use the
if __name__ = "__main__" :
blah blah
or is there some other way??

You got that entirely wrong. Or perhaps it's a horrible book whose author got something seriously wrong. Code using other code should indeed, barring significant counterarguments, adhere to an interface and not care about the details of the implementation. However, even in the world of static compilation to machine code (e.g. C++), this does not mean you should lock away the source code of the implementation.
Whether someone has access to the implementation, and whether they make use of that knowledge while writing a specific piece of code, are completely different issues. Heck, even the author of the implementation can/should still program to an interface when working on other code (e.g. other modules). Likewise, even if you lock the implementation away from someone, they may very well rely on implementation quirks which are not part of the interface. If anyone in the world of static compilation to machine code provides only headers and object files, and not the source code, it's because the projects are closed source, not to encourage good programming practices among clients.
In Python, your question makes no sense - there are no "interface" and "implementation" files, there's just code which is run and defines functions, classes, and other values. There is no such thing as an interface file you'd provide. You provide an implementation - and (hopefully) documentation which details both interface and possibly implementation details. And once a module is imported, the class objects, function objects, and other objects, contain plenty of information (including, in many cases, the text from which large parts of the documentation was generated). This is also true for extension modules like numpy. And note that their implementation is accessible, it's just not included in all distributions because it's of little use. With Python code, you practically have to distribute the source code because anything else is platform-specific.
On a side note, .pyc files are pretty high level, and easily understood when disassembled (which is as easy as importing the module and running the stdlib module dis on any function inside). I consider this a minor technicality as it's already the wrong question to ask.

Deitel's advice to C++ programmers doesn't apply to Python, for a number of reasons:
Python isn't compiled to machine code, so no matter what form you provide the program in, it will be relatively easy for someone to read the code.
Python doesn't have .h and .c files, all you can provide is the .py or .pyc files.
Treating code as a secret is kind of silly anyway. What is in your code that you need to keep hidden from others?
Numpy and Scipy are largely implemented in C, which is why you don't have the source, for your own convenience. You can get the source if you like. The "interface" to that code is the module that you can import and then call.

You should not confuse "user interface" with "class interface". If you have a useAccountClass file, that file probably performs some task using the classes and methods defined in the accountClass file, if I understood right.
If you send the file to other person, they are not supposed to "guess" what your compiled class does. That's what DOCUMENTATION is for: a description of the functions contained in the module (compiled or not), which parameters they take, which values they return, and what they are expected to do, the "meaning" of the task they perform.
As an abstract example, let's suppose you have an image processing class. If that class has the function findCircles(image), the documentation should explain that it takes an image, possibly containing circles, and returns a list or array of coordinates of the centers of circles contained in the image. HOW the circles are detected is not important, you don't need to know that to use the function. Now if the function was called like findCircles(image, gaussian_threshold=10), the caller would have to know the function uses some "gaussian_threshold" parameter, that is, the caller would NEED to know about the function's entrails, and in OOP this is Not Good. If you decided to use another algorithm in the future, every code using that function would have to be rewritten, because the gaussian_threshold most probably wouldn't make sense anymore.
So, the interface, in OOP, is the abstraction used to communicate to the object only the canonical parameters or inputs it needs to know to perform a task in the language of the problem, not in the language of the implementation (that can change anytime).
The documentation, in this sense, is a contract that assures to the user (in this case, another developer) that the function will perform as expected if sane inputs are given to it.
Now the FINAL USER, a non-technical person wanting to use your program, would need the WHOLE working program (controls and views), not only the class definitions (the model).
Hope this helps, and I must recommend the books "Code Complete 2nd ed." and "Pragmatic Programmer - From Journeyman to Master" as VERY enlightening readings on the broad topic.

How can I sandbox Python in pure Python?

I'm developing a web game in pure Python, and want some simple scripting available to allow for more dynamic game content. Game content can be added live by privileged users.
It would be nice if the scripting language could be Python. However, it can't run with access to the environment the game runs on since a malicious user could wreak havoc which would be bad. Is it possible to run sandboxed Python in pure Python?
Update: In fact, since true Python support would be way overkill, a simple scripting language with Pythonic syntax would be perfect.
If there aren't any Pythonic script interpreters, are there any other open source script interpreters written in pure Python that I could use? The requirements are support for variables, basic conditionals and function calls (not definitions).

This is really non-trivial.
There are two ways to sandbox Python. One is to create a restricted environment (i.e., very few globals etc.) and exec your code inside this environment. This is what Messa is suggesting. It's nice but there are lots of ways to break out of the sandbox and create trouble. There was a thread about this on Python-dev a year ago or so in which people did things from catching exceptions and poking at internal state to break out to byte code manipulation. This is the way to go if you want a complete language.
The other way is to parse the code and then use the ast module to kick out constructs you don't want (e.g. import statements, function calls etc.) and then to compile the rest. This is the way to go if you want to use Python as a config language etc.
Another way (which might not work for you since you're using GAE), is the PyPy sandbox. While I haven't used it myself, word on the intertubes is that it's the only real sandboxed Python out there.
Based on your description of the requirements (The requirements are support for variables, basic conditionals and function calls (not definitions)) , you might want to evaluate approach 2 and kick out everything else from the code. It's a little tricky but doable.

Roughly ten years after the original question, Python 3.8.0 comes with auditing. Can it help? Let's limit the discussion to hard-drive writing for simplicity - and see:
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r')
or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']): raise IOError('file write forbidden')
addaudithook(block_mischief)
So far exec could easily write to disk:
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
But we can forbid it at will, so that no wicked user can access the disk from the code supplied to exec(). Pythonic modules like numpy or pickle eventually use the Python's file access, so they are banned from disk write, too. External program calls have been explicitly disabled, too.
WRITE_LOCK = True
exec("open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("open('/tmp/FILE','a').write('pwned by l33t h4xx0rz')", dict(locals()))
exec("numpy.savetxt('/tmp/FILE', numpy.eye(3))", dict(locals()))
exec("import subprocess; subprocess.call('echo PWNED >> /tmp/FILE', shell=True)", dict(locals()))
An attempt of removing the lock from within exec() seems to be futile, since the auditing hook uses a different copy of locals that is not accessible for the code ran by exec. Please prove me wrong.
exec("print('muhehehe'); del WRITE_LOCK; open('/tmp/FILE','w')", dict(locals()))
...
OSError: file write forbidden
Of course, the top-level code can enable file I/O again.
del WRITE_LOCK
exec("open('/tmp/FILE','w')", dict(locals()))
Sandboxing within Cpython has proven extremely hard and many previous attempts have failed. This approach is also not entirely secure e.g. for public web access:
perhaps hypothetical compiled modules that use direct OS calls cannot be audited by Cpython - whitelisting the safe pure pythonic modules is recommended.
Definitely there is still the possibility of crashing or overloading the Cpython interpreter.
Maybe there remain even some loopholes to write the files on the harddrive, too. But I could not use any of the usual sandbox-evasion tricks to write a single byte. We can say the "attack surface" of Python ecosystem reduces to rather a narrow list of events to be (dis)allowed: https://docs.python.org/3/library/audit_events.html
I would be thankful to anybody pointing me to the flaws of this approach.
EDIT: So this is not safe either! I am very thankful to #Emu for his clever hack using exception catching and introspection:
#!/usr/bin/python3.8
from sys import addaudithook
def block_mischief(event,arg):
if 'WRITE_LOCK' in globals() and ((event=='open' and arg[1]!='r') or event.split('.')[0] in ['subprocess', 'os', 'shutil', 'winreg']):
raise IOError('file write forbidden')
addaudithook(block_mischief)
WRITE_LOCK = True
exec("""
import sys
def r(a, b):
try:
raise Exception()
except:
del sys.exc_info()[2].tb_frame.f_back.f_globals['WRITE_LOCK']
import sys
w = type('evil',(object,),{'__ne__':r})()
sys.audit('open', None, w)
open('/tmp/FILE','w').write('pwned by l33t h4xx0rz')""", dict(locals()))
I guess that auditing+subprocessing is the way to go, but do not use it on production machines:
https://bitbucket.org/fdominec/experimental_sandbox_in_cpython38/src/master/sandbox_experiment.py

AFAIK it is possible to run a code in a completely isolated environment:
exec somePythonCode in {'__builtins__': {}}, {}
But in such environment you can do almost nothing :) (you can not even import a module; but still a malicious user can run an infinite recursion or cause running out of memory.) Probably you would want to add some modules that will be the interface to you game engine.

I'm not sure why nobody mentions this, but Zope 2 has a thing called Python Script, which is exactly that - restricted Python executed in a sandbox, without any access to filesystem, with access to other Zope objects controlled by Zope security machinery, with imports limited to a safe subset.
Zope in general is pretty safe, so I would imagine there are no known or obvious ways to break out of the sandbox.
I'm not sure how exactly Python Scripts are implemented, but the feature was around since like year 2000.
And here's the magic behind PythonScripts, with detailed documentation: http://pypi.python.org/pypi/RestrictedPython - it even looks like it doesn't have any dependencies on Zope, so can be used standalone.
Note that this is not for safely running arbitrary python code (most of the random scripts will fail on first import or file access), but rather for using Python for limited scripting within a Python application.
This answer is from my comment to a question closed as a duplicate of this one: Python from Python: restricting functionality?

I would look into a two server approach. The first server is the privileged web server where your code lives. The second server is a very tightly controlled server that only provides a web service or RPC service and runs the untrusted code. You provide your content creator with your custom interface. For example you if you allowed the end user to create items, you would have a look up that called the server with the code to execute and the set of parameters.
Here's and abstract example for a healing potion.
{function_id='healing potion', action='use', target='self', inventory_id='1234'}
The response might be something like
{hp='+5' action={destroy_inventory_item, inventory_id='1234'}}

Hmm. This is a thought experiment, I don't know of it being done:
You could use the compiler package to parse the script. You can then walk this tree, prefixing all identifiers - variables, method names e.t.c. (also has|get|setattr invocations and so on) - with a unique preamble so that they cannot possibly refer to your variables. You could also ensure that the compiler package itself was not invoked, and perhaps other blacklisted things such as opening files. You then emit the python code for this, and compiler.compile it.
The docs note that the compiler package is not in Python 3.0, but does not mention what the 3.0 alternative is.
In general, this is parallel to how forum software and such try to whitelist 'safe' Javascript or HTML e.t.c. And they historically have a bad record of stomping all the escapes. But you might have more luck with Python :)

I think your best bet is going to be a combination of the replies thus far.
You'll want to parse and sanitise the input - removing any import statements for example.
You can then use Messa's exec sample (or something similar) to allow the code execution against only the builtin variables of your choosing - most likely some sort of API defined by yourself that provides the programmer access to the functionality you deem relevant.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.