My question is going to use examples from Python, but it seems like this could be a general question.
I've been using load-time dynamic linking, but for various reasons (it's recommended in the link below) I'd like to dynamically load the Python library:
HINSTANCE hModPython = LoadLibrary(_T("Python27.dll"));
I'm able to load Py_Initialize and other functions from the DLL, but it's a dirty process:
void (*pPy_Initialize)(void);
pPy_Initialize = (void (*)(void))GetProcAddress(hModPython, "Py_Initialize");
pPy_Initialize();
In this conversation it's said that:
Macros can make using these pointers transparent to any C code that calls routines in Python’s C API.
My question is really how to do what this writer suggests when I'm going to be importing a wide variety of functions, with various signatures. It would be nice to use the signatures that are already in Python.h (including that header somehow).
I would do what the system linker does: construct a symbol table containing all the function names, then initialise the pointers in that table. The function names can either be fixed string constants, or they might be read from the DLL itself (e.g. using the Win32 API to enumerate the DLL's exported functions).
A significant drawback of the table approach, though, is that it cannot be used with existing code that calls the functions by name (pPy_Initialize();) -- you would have to call through the pointer in the table, perhaps indexed via an enum (pPy[Initialize]();).
Different signatures can be handled by using different tables (one table per signature). Signatures could also be stored alongside the names in some symbolic form, wrapped in accessor magic that parses and checks them -- but that could quickly become too complex, like inventing yet another programming language.
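A rough sketch of that table-per-signature idea (the enum and table names are mine; both Py_Initialize and Py_Finalize happen to take no arguments and return void):

#include <windows.h>

typedef void (*VoidFn)(void);

enum VoidFnId { Initialize, Finalize, VoidFnCount };

static const char *voidFnNames[VoidFnCount] = { "Py_Initialize", "Py_Finalize" };
static VoidFn pPy[VoidFnCount];

static bool LoadVoidFns(HINSTANCE hMod)
{
    for (int i = 0; i < VoidFnCount; ++i) {
        pPy[i] = (VoidFn)GetProcAddress(hMod, voidFnNames[i]);
        if (!pPy[i]) return false;
    }
    return true;
}

// Usage: pPy[Initialize]();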
IMHO, the only significant advantage of all that weird machinery over macros is that you might be able to load arbitrary DLLs with it. Other than that, I wouldn't go that route.
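For reference, a minimal sketch of the macro trick the question quotes. The typedefs here are written out by hand; deriving them automatically from the declarations in Python.h would take heavier preprocessor machinery:

#include <windows.h>

typedef void (*Py_Initialize_t)(void);
typedef int  (*PyRun_SimpleString_t)(const char *);

static Py_Initialize_t      pPy_Initialize;
static PyRun_SimpleString_t pPyRun_SimpleString;

// Calls written as Py_Initialize() are routed through the pointers.
// Note: don't include Python.h after these #defines; keep them in their own translation unit.
#define Py_Initialize      (*pPy_Initialize)
#define PyRun_SimpleString (*pPyRun_SimpleString)

// The # and ## operators keep the loading down to one line per import.
#define LOAD(h, name) (p##name = (name##_t)GetProcAddress((h), #name))

static bool LoadPython(HINSTANCE hModPython)
{
    return LOAD(hModPython, Py_Initialize)
        && LOAD(hModPython, PyRun_SimpleString);
}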
Related
How can I define my own types of variables/objects so that they work with the operations Python already provides?
Can I do this without having to create my own object classes and operators as a separate and distinct entity? I don't mind having to create my own object class (that's a given), but I want it to integrate flawlessly with the already existing constructs. So the emphasis is on my wish to avoid "separate and distinct".
I'm trying to create a quaternion object class. I want to define a 1i and 1k that are distinct from 1j.
And yes, a package might already exist; this is purely academic and for my own programming practice and understanding. I'm trying to extend what is already there, not build something distinct and separate.
I already write class objects, but unfortunately they require a redefinition of the basic operations before I can make use of them, and even then I have to "declare" these objects before I can use them, quite unlike '1j'.
I hope I am clear with my intent. The quaternion itself is not the end goal; it is the kinds of methods, objects and generalizations I'm trying to figure out how to build, to extend and make use of what is already built into Python.
It seems to me that whoever built numpy and cmath has already achieved this.
Thanks to the commentary below, I now have the vocabulary to express my intent better.
I'm trying to add new features to Python's SYNTAX.
If anyone can offer resources on how to do this, I'd appreciate it.
I see two options for you here:
Change the Python syntax (fork CPython); there's a surprising number of articles about how to do that.
Build some kind of preprocessor like Mypy.
Either way it seems like too much trouble just to have a new literal value.
Python does not support custom operators nor custom literals.
A language that supports custom literals is C++ (since C++11 I believe), but it does not support custom operators.
A language that supports custom operators is, for example, Haskell.
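For contrast, a custom literal in C++ looks roughly like this (the Quaternion type and the _k suffix are invented for illustration):

#include <iostream>

struct Quaternion { double w, x, y, z; };

// User-defined literal: 3.0_k builds a pure-k quaternion.
// (User-provided suffixes must begin with an underscore.)
constexpr Quaternion operator"" _k(long double v)
{
    return Quaternion{0.0, 0.0, 0.0, static_cast<double>(v)};
}

int main()
{
    Quaternion q = 3.0_k;
    std::cout << q.z << "\n";  // prints 3
}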
If you want to add this feature to Python you'll have to take the Python sources and modify the grammar, the lexer/parser and, more importantly, the compiler.
However, at that point you have really created a new language, which means you've broken compatibility with Python.
The easiest solution would simply be to write a simple preprocessor that replaces some simple syntax with an expanded equivalent. For example:
sed -i -E 's/([0-9]+)\+([0-9]+)i/MyComplex(\1, \2)/g' my_file.py
Then you can execute the preprocessor in the build step of your library/application.
This has the advantage of letting you write the code you want; when you ship it or use it, it is translated into normal Python, keeping 100% compatibility with existing installations.
I believe that with import hooks it would be possible to avoid having to ship a preprocessed version of your library: the preprocessor could be run as part of the import step and applied on the fly. This would avoid having to deal with temporary preprocessed files.
The only requirement is that people who want to use your library have to install the import hook somehow.
I know Cython's purpose is to create Python extensions modules, but can the compiled libraries made with Cython be loaded by non-python programs? If not, why?
I doubt that you can load them directly on a non-python program; looking at the C code generated by the simplest Cython script it's apparent that you need all the Python scaffolding to make it work. That said, you can do it indirectly for example from C++. In C++ I use boost.python to embed the Python interpreter and load some of my modules and scripts. This may seem convoluted, but allows you to quickly use whatever extensions you already have written in Python from C++, provided that you build the appropriate gluing code (see the boost.python wiki).
The disadvantage of this approach is that you are really loading a full Python interpreter just to be able to use some extension. This wasn't an issue for me since I already had the python extension and was embedding Python in my application to provide basic scripting ability, but I would not use this approach to write new libraries.
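A skeleton of that indirect route, for concreteness (the module and function names are made up, and error handling is kept minimal):

#include <boost/python.hpp>

int main()
{
    Py_Initialize();  // boost.python's docs advise against calling Py_Finalize()
    try {
        boost::python::object mod = boost::python::import("my_extension");  // hypothetical extension module
        mod.attr("do_work")();  // call a function defined in that module
    } catch (const boost::python::error_already_set &) {
        PyErr_Print();
        return 1;
    }
    return 0;
}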
There are two mechanisms by which you can make cdef’ed Cython structures available externally:
Use a cdef public declaration, and/or
Use the api keyword.
Here’s an example that uses both mechanisms, from the aforelinked pages:
cdef public struct Vehicle:
    int speed
    float power

cdef api void activate(Vehicle *v):
    if v.speed >= 88 and v.power >= 1.21:
        print "Time travel achieved"
Each of these methods will instruct the Cython compiler to generate a header (“.h”) file that you can then integrate with your orthogonal C/C++ project.
A cdef public declaration yields a file named modulename.h; using the structures in this file will require you to link with the compiled Cython extension module.
An api declaration (which may be used simultaneously with cdef public, if you so desire) yields a modulename_api.h file; code consuming an api-based header will not need to link to the extension module – but it will need to call the cdef’d function import_modulename() before any of the API code can be used (a tactic the Cython-inclined users of NumPy will find familiar).
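To make that concrete, here is a sketch of what consuming the api header might look like from plain C/C++, assuming the module above is compiled as modulename (so Cython emits modulename_api.h):

#include <Python.h>
#include "modulename_api.h"

int main(void)
{
    Py_Initialize();
    if (import_modulename() != 0) {  /* loads the module; failure is reported as a Python error */
        PyErr_Print();
        return 1;
    }
    struct Vehicle v = { 88, 1.21f };
    activate(&v);  /* runs the Cython-compiled function; prints "Time travel achieved" */
    Py_Finalize();
    return 0;
}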
In my personal experience, it takes little effort to expose and subsequently employ encythoned structures as an external API in this fashion, provided that the struct layouts dovetail well with the consuming code, and that you’re willing to contend with manually managing the GIL in your C/C++ code in order to make things work.
I read through the following two threads on wrapping a C library and a C++ library, but I am not sure I get it yet. The C++ library I am working with does use classes and templates, but not in any over-sophisticated way. What are the issues or caveats of wrapping it with ctypes (besides the point that you can do so in pure Python, etc.)?
PyCXX, Cython and boost::python are three other choices people mentioned; is there any consensus on which one is more suitable for C++?
Thanks
Oliver
In defence of boost::python, given Alexander's answer on ctypes:
Boost.Python provides a very "C++" interface between C++ and Python code – even things like allowing Python subclasses of C++ classes to override virtual methods are relatively straightforward. Here's a potted list of good features:
Allow virtual methods of C++ classes to be overridden by python subclasses.
Bridge between std::vector<>, std::map<> instances and python lists and dictionaries (using vector_indexing_suite and map_indexing_suite)
Automatic sharing of reference counts in smart pointers (boost::shared_ptr, etc) with python reference counts (and you can extend this to any smart pointer).
Fine grained control of ownership when passing arguments and returning values from functions.
Basically, if you have a design where you want to expose a c++ interface in a way faithful to the language, then boost::python is probably the best way to do it.
The only downsides are increased compile time (boost::python makes extensive use of templates), and sometimes opaque error messages if you don't get things quite right.
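For flavor, a minimal boost::python module exposing one of those C++ classes might look like this (the Greeter class and the module name are invented):

#include <boost/python.hpp>
#include <string>

struct Greeter {
    std::string greet() const { return "hello from C++"; }
};

BOOST_PYTHON_MODULE(example)  // importable from Python as: import example
{
    boost::python::class_<Greeter>("Greeter")
        .def("greet", &Greeter::greet);
}

Python code can then simply do g = example.Greeter() and call g.greet().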
For a C++ library to be accessible from Python via ctypes, it must use C export names, which basically means that a function named foo will be accessible from ctypes as foo.
This can be achieved only by enclosing the public interface in extern "C" {}, which in turn disallows function overloading and templates therein (only the public interface of the library to be wrapped is relevant; the inner workings may use any C++ features they like).
The reason for this is that C++ compilers use a mechanism called name mangling to generate unique names for overloaded or templated symbols. While ctypes would still find a function provided you knew its mangled name, the mangling scheme depends on the compiler/linker being used and is nothing you can rely on. In short: do not use ctypes to wrap libraries that use C++ features in their public interface.
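To illustrate, a sketch of a C-linkage facade over templated internals (the names are mine; on Windows you would additionally need __declspec(dllexport) or a .def file):

#include <cstddef>

template <typename T>
static T sum_impl(const T *data, std::size_t n)
{
    T total = T();
    for (std::size_t i = 0; i < n; ++i)
        total += data[i];
    return total;
}

extern "C" {
    // Unmangled names; ctypes can load these as plain sum_double / sum_int.
    double sum_double(const double *data, std::size_t n) { return sum_impl(data, n); }
    int sum_int(const int *data, std::size_t n) { return sum_impl(data, n); }
}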
Cython takes a different approach. It aids you in building a C extension module that does the interfacing with the original library. Therefore, linking to the C++ library is done by the regular C++ linkage mechanism, thus avoiding the aforementioned problem. The trouble with Cython is that C extension modules need to be recompiled for every platform, but then, this applies to the C++ library to be wrapped as well.
Personally, I'd say that in most cases the time to fire up Cython is time well spent and will eventually pay off in comparison to ctypes (with an exception for really simple C-ish interfaces).
I don't have any experience with boost.python, so I can't comment on it (however, I don't have the impression that it is very popular either).
This may seem like a weird question, but I would like to know how I can run a function in a .dll from a memory 'signature'. I don't understand much about how it actually works, but I need it badly. It's a way of running unexported functions from within a .dll, if you know the memory signature and address of it.
For example, I have these:
respawn_f "_ZN9CCSPlayer12RoundRespawnEv"
respawn_sig "568BF18B06FF90B80400008B86E80D00"
respawn_mask "xxxxx?xxx??xxxx?"
And with some pretty nifty C++ code you can use these to run functions from within a .dll.
Here is a well explained article on it:
http://wiki.alliedmods.net/Signature_Scanning
So, is it possible to do this from Python, using ctypes or any other way?
If you can already run them using C++, then you can try using SWIG to generate Python wrappers for the C++ code you've written, making it callable from Python.
http://www.swig.org/
Some caveats that I've found using SWIG:
SWIG looks up types based on a string value. For example, an integer type in Python (int) will be checked against the C++ type "int"; otherwise SWIG will complain about a type mismatch. There is no automatic conversion.
SWIG copies source code verbatim, therefore even objects in the same namespace will need to be fully qualified so that the generated .cxx file compiles properly.
Hope that helps.
You said you were trying to call a function that was not exported; as far as I know, that's not possible from Python. However, your problem seems to be merely that the name is mangled.
You can invoke an arbitrary export using ctypes. Since the mangled name isn't a valid Python identifier, you can't use ordinary attribute access; use getattr() instead.
Another approach, if you have the right information, is to find the export by ordinal, which you'd have to do if no name was exported at all. One way to get the ordinal is dumpbin.exe, shipped with many Windows compilers. It's actually a front-end to the linker, so if you have MS link.exe you can also use that with the appropriate command-line switches.
To get the function reference (which is a "function-pointer" object bound to the address of it), you can use something like:
import ctypes
func = getattr(ctypes.windll.msvcrt, "##myfunc")
retval = func(None)
Naturally, you'd replace the 'msvcrt' with the dll you specifically want to call.
What I don't show here is how to unmangle the name to derive the calling signature, and thus the arguments necessary. Doing that would require a demangler, and those are very specific to the brand AND VERSION of C++ compiler used to create the DLL.
There is a certain amount of error checking if the function is stdcall, so you can sometimes fiddle with things till you get them right. But if the function is cdecl, then there's no way to automatically check. Likewise you have to remember to include the extra this parameter if appropriate.
I have been mulling over writing a peak-fitting library for a while. I know Python fairly well and plan on implementing everything in Python to begin with but envisage that I may have to re-implement some core routines in a compiled language eventually.
IIRC, one of Python's original remits was as a prototyping language; however, Python is pretty liberal in allowing functions, functors and objects to be passed to functions and methods, whereas I suspect the same is not true of, say, C or Fortran.
What should I know about designing functions/classes which I envisage will have to interface into the compiled language? And how much of these potential problems are dealt with by libraries such as cTypes, bgen, SWIG, Boost.Python, Cython or Python SIP?
For this particular use case (a fitting library), I imagine allowing users to define mathematical functions (Gaussian, Lorentzian, etc.) as Python functions which can then be passed to and interpreted by the compiled fitting library. Passing and returning arrays is also essential.
Finally, a question to which I can really give a valuable answer :).
I have investigated f2py, boost.python, swig, cython and pyrex for my work (PhD in optical measurement techniques). I used swig extensively, boost.python some and pyrex and cython a lot. I also used ctypes. This is my breakdown:
Disclaimer: This is my personal experience. I am not involved with any of these projects.
swig:
does not play well with C++. It should, but name-mangling problems in the linking step were a major headache for me on Linux & Mac OS X. If you have C code and want it interfaced to Python, it is a good solution. I wrapped GTS for my needs and basically had to write a C shared library which I could connect to. I would not recommend it.
Ctypes:
I wrote a libdc1394 (IEEE camera library) wrapper using ctypes and it was a very straightforward experience. You can find the code at https://launchpad.net/pydc1394. It is a lot of work to convert headers to Python code, but then everything works reliably. This is a good way if you want to interface an external library. ctypes is also in the stdlib of Python, so everyone can use your code right away. It is also a good way to play around with a new lib in Python quickly. I can recommend it for interfacing external libs.
Boost.Python: Very enjoyable. If you already have C++ code of your own that you want to use in python, go for this. It is very easy to translate c++ class structures into python class structures this way. I recommend it if you have c++ code that you need in python.
Pyrex/Cython: Use Cython, not Pyrex. Period. Cython is more advanced and more enjoyable to use. Nowadays, I do everything with Cython that I used to do with SWIG or ctypes. It is also the best way if you have Python code that runs too slow. The process is absolutely fantastic: you convert your Python modules into Cython modules, build them, and keep profiling and optimizing as if it were still Python (no change of tools needed). You can then mix in as much (or as little) C code with your Python code. This is by far faster than having to rewrite whole parts of your application in C; you only rewrite the inner loop.
Timings: ctypes has the highest call overhead (~700 ns), followed by boost.python (322 ns), then swig (290 ns). Cython has the lowest call overhead (124 ns) and the best feedback on where time is spent (cProfile support!). The numbers are from my box calling a trivial function that returns an integer from an interactive shell; module import overhead is therefore not timed, only function call overhead is. It is therefore easiest and most productive to get Python code fast by profiling and using Cython.
Summary: For your problem, use Cython ;). I hope this rundown will be useful for some people. I'll gladly answer any remaining question.
Edit: I forgot to mention: for numerical purposes (that is, connecting to NumPy) use Cython; it has support for this (because they basically develop Cython for that purpose). So this should be another +1 for your decision.
I haven't used SWIG or SIP, but I find writing Python wrappers with boost.python to be very powerful and relatively easy to use.
I'm not clear on what your requirements are for passing types between C/C++ and python, but you can do that easily by either exposing a C++ type to python, or by using a generic boost::python::object argument to your C++ API. You can also register converters to automatically convert python types to C++ types and vice versa.
If you plan use boost.python, the tutorial is a good place to start.
I have implemented something somewhat similar to what you need. I have a C++ function that accepts a Python function and an image as arguments, and applies the Python function to each pixel in the image.
Image* unary(boost::python::object op, Image& im)
{
    Image* out = new Image(im.width(), im.height(), im.channels());
    for (unsigned int i = 0; i < im.size(); i++)
    {
        (*out)[i] = boost::python::extract<float>(op(im[i]));
    }
    return out;
}
In this case, Image is a C++ object exposed to Python (an image with float pixels), and op is a Python-defined function (or really any Python object with a __call__ attribute). You can then use this function as follows (assuming unary is exposed in a compiled image module that also contains Image and a load function):
import image
im = image.load('somefile.tiff')
double_im = image.unary(lambda x: 2.0*x, im)
As for using arrays with boost, I personally haven't done this, but I know the functionality to expose arrays to python using boost is available - this might be helpful.
The best way to plan for an eventual transition to compiled code is to write the performance sensitive portions as a module of simple functions in a functional style (stateless and without side effects), which accept and return basic data types.
This will provide a one-to-one mapping from your Python prototype code to the eventual compiled code, and will let you use ctypes easily and avoid a whole bunch of headaches.
For peak fitting, you'll almost certainly need to use arrays, which will complicate things a little, but is still very doable with ctypes.
If you really want to use more complicated data structures, or modify the passed arguments, SWIG or Python's standard C-extension interface will let you do what you want, but with some amount of hassle.
For what you're doing, you may also want to check out NumPy, which might do some of the work you would want to push to C, as well as offering some additional help in moving data back and forth between Python and C.
f2py (part of numpy) is a simpler alternative to SWIG and boost.python for wrapping C/Fortran number-crunching code.
In my experience, there are two easy ways to call into C code from Python code. There are other approaches, all of which are more annoying and/or verbose.
The first and easiest is to compile a bunch of C code as a separate shared library and then call functions in that library using ctypes. Unfortunately, passing anything other than basic data types is non-trivial.
The second easiest way is to write a Python module in C and then call functions in that module. You can pass anything you want to these C functions without having to jump through any hoops. And it's easy to call Python functions or methods from these C functions, as described here: https://docs.python.org/extending/extending.html#calling-python-functions-from-c
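As a taste of that second approach, here is a minimal extension module (Python 3 API; the module and function names are invented):

#include <Python.h>

static PyObject *example_square(PyObject *self, PyObject *args)
{
    double x;
    if (!PyArg_ParseTuple(args, "d", &x))
        return NULL;
    return PyFloat_FromDouble(x * x);
}

static PyMethodDef ExampleMethods[] = {
    {"square", example_square, METH_VARARGS, "Square a number."},
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef examplemodule = {
    PyModuleDef_HEAD_INIT, "example", NULL, -1, ExampleMethods
};

PyMODINIT_FUNC PyInit_example(void)
{
    return PyModule_Create(&examplemodule);
}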
I don't have enough experience with SWIG to offer intelligent commentary. And while it is possible to do things like pass custom Python objects to C functions through ctypes, or to define new Python classes in C, these things are annoying and verbose and I recommend taking one of the two approaches described above.
Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
In C you cannot pass a function as an argument to a function, but you can pass a function pointer, which is just as good as a function.
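A minimal illustration (compiles as C or C++):

#include <stdio.h>

typedef double (*unary_fn)(double);  /* a callback type: y = f(x) */

static double apply_at_three(unary_fn f) { return f(3.0); }

static double square(double x) { return x * x; }

int main(void)
{
    printf("%g\n", apply_at_three(square));  /* prints 9 */
    return 0;
}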
I don't know how much that would help when you are trying to integrate C and Python code but I just wanted to clear up one misconception.
In addition to the tools above, I can recommend using Pyrex (for creating Python extension modules) or Psyco (as a JIT compiler for Python).