Possible to autogenerate Cython bindings around a large, existing C library?


In other words: *.h/*.c --[??POSSIBLE??]--> *.pxd/*.pyx
OK. I’ve done (I hope) enough digging around the Internet - but I think this is a good question so I’ll ask it straight.
There are a few related questions (e.g. “Generate python bindings, what methods/programs to use” or “Wrapping a C library in Python: C, Cython or ctypes?”), but they don't quite cover the situation I’m asking about, which calls for a more “high-level” approach (specifically for an existing library, not generating new C from Python).
I’ve got a little bit of experience of this myself having wrapped a wee bit of code before using Cython. Cython gets the thumbs up for speed and maintainability. That’s OK in my book for small/single bits of code - but, this time I’ve got a bit more on my plate…
And following the first of the three great virtues of a programmer (laziness), I want to do this with as little effort as possible.
So the real question here is how I can automate the creation of the .pxd, and possibly .pyx, files (i.e. to save time and avoid mistyping something).
This page seems to be the only real pointer on how to do this, but most of the projects listed there are defunct, old, or hosted on SourceForge. Many only seem to work for C++ (this is C I'm doing here).
Does anyone still use them? Recently? Does anyone have a workflow or best practice for doing this? Or am I simply better off doing it by hand?
My library is well defined by a set of header files. One containing defs of all the C struct/types and another containing prototypes for all the functions. But it's loooonnnggg...
Thanks for any tips.
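To make the goal concrete, here is a deliberately naive sketch of the *.h --> *.pxd step in Python. It assumes simple function prototypes whose return type and name sit on one line (no macros, function pointers, structs, or storage qualifiers such as extern); a real generator would need a proper C parser such as pycparser or the libclang Python bindings. The function name header_to_pxd is hypothetical.

```python
import re

# Matches simple prototypes such as "int add(int a, int b);".
# Assumption: return type and name share a line; arguments may wrap.
_PROTO_RE = re.compile(
    r"^[ \t]*(?P<ret>[A-Za-z_][\w \t*]*?[ \t*])"  # return type, ends in space or '*'
    r"(?P<name>[A-Za-z_]\w*)[ \t]*"               # function name
    r"\((?P<args>[^)]*)\)[ \t]*;",                # argument list and ';'
    re.MULTILINE,
)


def header_to_pxd(header_text: str, header_name: str) -> str:
    """Emit a Cython 'cdef extern from' block for simple C prototypes."""
    lines = [f'cdef extern from "{header_name}":']
    for m in _PROTO_RE.finditer(header_text):
        ret = " ".join(m.group("ret").split())    # collapse whitespace
        args = " ".join(m.group("args").split())
        if args == "void":                        # "f(void)" -> "f()"
            args = ""
        sep = "" if ret.endswith("*") else " "
        lines.append(f"    {ret}{sep}{m.group('name')}({args})")
    if len(lines) == 1:
        lines.append("    pass")                  # an empty block is not valid
    return "\n".join(lines) + "\n"
```

For example, header_to_pxd(open("mylib.h").read(), "mylib.h") would turn "char *name(void);" into the .pxd line "char *name()". Anything more complex (structs, enums, forward declarations) is exactly where this approach breaks down.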
UPDATE (25th August, 2015):
Right, so over the last few months when I had a spare moment, I tried:
CFFI (thanks to @David for pointing that out) has a noble aim: “to call C code from Python without learning a 3rd language: existing alternatives require users to learn domain specific language (Cython, SWIG) or API (ctypes)”. But it didn’t quite fit the bill, as it involved a fair degree of C code embedded in the actual Python files (or loaded into them). This would be a pretty manual process for a large library. Maybe I missed something…
SWIG is the granddaddy of Python bindings and is pretty solid. Fundamentally, though, it is not “hands off” as I understand it, i.e. you need a separate specification file. For example, you have to edit all your C header files to indicate building a Python module with a #define SWIG_FILE_WITH_INIT or use other annotations. SIP has the same issue here. You don’t auto-generate from the headers; you modify them to include your own directives and annotations and create a complete specification file.
cwrap - I’m on a Mac, so I used this version for clang (https://github.com/geggo/cwrap). Really poor docs, but using the source I finally got it to run, and it generated… an empty .pyx file from a pretty simple header of structs. Not so good.
xdress - This showed promise. The website is down, so the docs are seemingly here instead. An impressive amount of work has gone into it, and it looks straightforward to use. But it needed all the LLVM headers (and a correctly linked version of clang); I had to use brew install llvm --with-clang. There is a xdressclang-3.5 branch, but it doesn’t seem to have enough fixes done. I tried tapping homebrew/versions for an earlier version of clang (install llvm33 / llvm34) and that got it built. Anyway, I digress… it worked great for a simple example, but the resulting ctypes files for the full library were pretty garbled and refused to build. Something in the C->Python AST translation is a bit awry...
ctypesgen wasn’t one I had encountered in the original search. The documentation is pretty sparse, or you might call it concise. It seemingly hasn’t had much work done on it in the last 4 years either (and people are asking on the issue tracker whether the developers are ever going to take the project further). I’ve tried running it, but sadly it seems to fall over with what I suspect are issues with the Clang compiler’s cdefs.h use of __attribute__. I’ve tried things like -std=c11 but to no avail.
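If the failures really do come from __attribute__ annotations pulled in from system headers, one possible workaround (an assumption, not a documented ctypesgen feature) is to run the header through the C preprocessor yourself, so that cdefs.h and friends are expanded inline, strip the annotations from that output, and feed the cleaned file to the generator. With a GCC-compatible preprocessor the same effect is often obtained by defining the keyword away with -D'__attribute__(x)='. The helper below is a hypothetical plain-Python version of the stripping step; dropping the annotations is only safe when the bindings do not depend on them (packing or alignment attributes, for instance, change struct layout).

```python
def strip_gnu_attributes(source: str) -> str:
    """Remove GCC/Clang '__attribute__((...))' annotations from C source text."""
    keyword = "__attribute__"
    out = []
    i = 0
    while True:
        j = source.find(keyword, i)
        if j == -1:
            out.append(source[i:])
            break
        out.append(source[i:j])
        k = j + len(keyword)
        # Skip whitespace between the keyword and its opening parenthesis.
        while k < len(source) and source[k].isspace():
            k += 1
        if k >= len(source) or source[k] != "(":
            # Not followed by '(': keep the keyword untouched and move on.
            out.append(keyword)
            i = j + len(keyword)
            continue
        # Walk forward until the nested parentheses balance again.
        depth = 0
        while k < len(source):
            if source[k] == "(":
                depth += 1
            elif source[k] == ")":
                depth -= 1
                if depth == 0:
                    k += 1
                    break
            k += 1
        i = k
    return "".join(out)
```

For example, "void die(void) __attribute__((noreturn));" becomes "void die(void) ;", which a simple C parser can handle.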
In conclusion, out of all the ones I’ve looked at I think xdress came the closest to the fully automated generation of python bindings. It worked fine for the simple examples given, but couldn’t handle the more complex existing library headers, with all the complexities of forward declarations, enumerated types, void pointers… It seems a well designed and (for a while) well maintained project, so there is possibly some way to circumvent these issues if someone were to take it on again.
Still, the question remains, does anyone have a robust toolchain for generating python wrappers from C headers automatically? I think the reality is there always has to be a bit of manual work, and for that CFFI looks the most “modern” approach (one of the best overviews/comparisons I encountered is here) - yet it always involves a specially edited cdef() version of any header files (e.g. Using Python's CFFI and excluding system headers).

I find ctypesgen great for autogeneration. I'm only using it with one or two python modules that I hope to open source, and I've been happy so far. Here's a quick example using it with zlib, but I also just tried it successfully with a few other libraries:
(Edit: I know you mentioned ctypesgen has problems on a Mac, so maybe it needs someone to tweak it to work on OS X. I don't have OS X at home or I'd try it.)
Get ctypesgen:
git clone https://github.com/davidjamesca/ctypesgen.git
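Run a short script to call ctypesgen (replace the zlib info with another library):
import os
import subprocess

ZLIB_INC_DIR = "/usr/include"
ZLIB_LIB_DIR = "/usr/lib/x86_64-linux-gnu"
ZLIB_LIB = "libz.so"
ZLIB_HEADERS = "/usr/include/zlib.h"
# Set location of ctypesgen.py (inside the cloned repository)
ctypesgen_path = "ctypesgen/ctypesgen.py"
wrapper_filename = "zlib.py"
# Pass the arguments as a list rather than a shell string,
# so paths containing spaces need no quoting.
cmd = [ctypesgen_path, "-I", ZLIB_INC_DIR, "-L", ZLIB_LIB_DIR,
       "-l", ZLIB_LIB, ZLIB_HEADERS, "-o", wrapper_filename]
# Set LD_LIBRARY_PATH for the ctypesgen process only.
env = dict(os.environ, LD_LIBRARY_PATH=ZLIB_LIB_DIR)
print(" ".join(cmd))
subprocess.run(cmd, env=env, check=True)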
Usage example:
python
>>> import zlib
>>> zlib.compress("asdfasdfasdfasdfasdf")
'x\x9cK,NIKD\xc3\x00T\xfb\x08\x17'
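Under the hood, a module generated by ctypesgen consists mostly of ctypes declarations of this kind. For comparison, a hand-written sketch of the same idea for a single zlib function, compress(), might look like the following. It assumes a shared libz is installed (libz.so.1 is a Linux-specific fallback name), and the function name c_compress is hypothetical. Note that the C compress() fills a caller-supplied output buffer and updates a length pointer rather than returning a string.

```python
import ctypes
import ctypes.util

# Locate the system zlib; fall back to the usual Linux soname (assumption).
libz = ctypes.CDLL(ctypes.util.find_library("z") or "libz.so.1")

# Declare prototypes by hand, mirroring zlib.h:
#   uLong compressBound(uLong sourceLen);
#   int compress(Bytef *dest, uLongf *destLen, const Bytef *source, uLong sourceLen);
libz.compressBound.argtypes = [ctypes.c_ulong]
libz.compressBound.restype = ctypes.c_ulong
libz.compress.argtypes = [ctypes.c_char_p, ctypes.POINTER(ctypes.c_ulong),
                          ctypes.c_char_p, ctypes.c_ulong]
libz.compress.restype = ctypes.c_int

Z_OK = 0


def c_compress(data: bytes) -> bytes:
    """Compress data by calling zlib's C compress() through ctypes."""
    # compressBound gives the worst-case output size for this input length.
    dest_len = ctypes.c_ulong(libz.compressBound(len(data)))
    dest = ctypes.create_string_buffer(dest_len.value)
    rc = libz.compress(dest, ctypes.byref(dest_len), data, len(data))
    if rc != Z_OK:
        raise RuntimeError(f"zlib compress() failed with code {rc}")
    # compress() stored the actual compressed size back into dest_len.
    return dest.raw[:dest_len.value]
```

Also note that a generated file named zlib.py shares its name with the standard library's zlib module, so which one "import zlib" picks up depends on how Python was built and on sys.path order.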

Related

Why is the GHC test suite written in Python, not Haskell?

I noticed that GHC (a widely-used Haskell compiler) has a test suite written in Python, not in Haskell (as I would naively expect). What is the history of this? Are there particular advantages to writing the test suite in a different language?
edit: Per a suggestion in the comments, I asked this in /r/haskell. It has now generated three answers, which I've quoted below:
tathougies said:
The test suite driver seems to be written in Python. Python is a good high-level scripting language.
It's like asking 'why does GHC use Make instead of Haskell'? Probably because Make is better at running shell programs with external dependency resolution built-in.
The tests themselves seem to be written in Haskell, verifying certain properties of the compiler and catching regressions. If they fail, it looks like the python driver is informed, and then would report the error to the user.
phadej added:
FWIW GHC's build system is being rewritten to use Shake: the Haskell library.
eacameron said:
I don't know. But GHC doesn't have the luxury of using Haskell the same way you and I do. It has to bootstrap using a previous version of itself and it wants to avoid dependencies. Python is a pretty light-weight requirement since most systems (except Windows) come with it built in.

The commit message introducing Python explains a lot of it:
Revamp the testsuite framework. The previous framework was an
experiment that got a little out of control - a whole new language
with an interpreter written in Haskell was rather heavyweight and left
us with a maintenance problem.
So the new test driver is written in Python. The downside is that you
need Python to run the testsuite, but we don't think that's too big a
problem since it only affects developers and Python installs pretty
easily onto everything these days.
Highlights:
- 790 lines of Python, vs. 5300 lines of Haskell + 720 lines of <strange made-up language>.
- the framework supports running tests in various "ways", which should
  catch more bugs. By default, each test is run in three ways:
  normal, -O, and -O -fasm. Additionally, if profiling libraries
  have been built, another way (-O -prof -auto-all) is added. I plan
  to also add a 'GHCi' way.
  Running tests multiple ways has already shown up some new bugs!
- documentation is in the README file and is somewhat improved.
- the framework is rather less GHC-specific, and could without much
  difficulty be coaxed into using other compilers. Most of the
  GHC-specificness is in a separate configuration file (config/ghc).
Things may need a while to settle down. Expect some unexpected
failures.
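The "ways" idea in the commit message can be sketched as a small planning function: each way is a set of extra compiler flags, every test is expanded into one run per way, and the profiling way is added only when profiling libraries are available. This is an illustrative sketch, not GHC's actual driver; the way names (normal, optc, optasm, profasm) and the function plan_runs are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical table of ways, modelled on the defaults in the commit message.
BASE_WAYS: Dict[str, List[str]] = {
    "normal": [],
    "optc": ["-O"],
    "optasm": ["-O", "-fasm"],
}
PROF_WAY = ("profasm", ["-O", "-prof", "-auto-all"])


def plan_runs(tests: List[str], have_profiling: bool,
              compiler: str = "ghc") -> List[Tuple[str, str, List[str]]]:
    """Return (test, way, argv) triples: every test is run once per way."""
    ways = dict(BASE_WAYS)
    if have_profiling:
        # Only add the profiling way when profiling libraries were built.
        name, flags = PROF_WAY
        ways[name] = flags
    runs = []
    for test in tests:
        for way, flags in ways.items():
            runs.append((test, way, [compiler, *flags, test]))
    return runs
```

Running the same test under several flag sets is what lets one test file catch bugs that only appear with optimisation or profiling turned on.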

Create a Python package from C++ library

I've been researching the topic of using C++ code in Python, but haven't found a generic, clean, flexible way to wrap a C++ library in a Python package.
The question is whether it's possible to use an existing, complex C++ library to create a regular Python library that can be called exactly like native Python libraries such as NumPy or SciPy. If yes, any references would be much appreciated. If there are examples/tutorials available, that would be even more useful.
Thanks

There are many, many ways. Boost Python, http://www.boost.org/doc/libs/1_57_0/libs/python/doc/ , is very C++-specific and exploits C++ templates to the hilt (like all of Boost!-). More general (less C++-specific) approaches include manual C coding of Python extensions, per https://docs.python.org/3/extending/extending.html ; SWIG, per http://www.swig.org/Doc1.3/SWIGPlus.html ; Cython, per http://docs.cython.org/src/userguide/wrapping_CPlusPlus.html ; ... and no doubt others I haven't come across yet.
The very existence of so many strong, actively maintained alternatives, hints that there's no "one size fits all" here! If you're a template wizard I bet you'll swear by Boost; if you're not, I guess you're more likely to swear at it -- and so on, and so forth.
Personally, I tend to end up using Cython (or even just ctypes!-) for experimenting, manual extension coding when I decide I want to do a lot of Python work using a certain C++ library (and performance is crucial) -- and SWIG at work, because that's the standard there. Haven't seriously used Boost in far too long -- a refresh on it goes on my not-so-tiny todo list for when my spare time gets more copious...:-).

Python interface for R Programming Language [duplicate]

This question already has answers here:
How do Rpy2, pyrserve and PypeR compare?
(4 answers)
Closed 7 years ago.
I am quite new to R, and pretty much used to Python. I am not so comfortable writing R code. I am looking for a Python interface to R that lets me use R packages in a Pythonic way.
I have searched Google and found a few packages that can do that:
Rpy2
PypeR
pyRserve
But I'm not sure which one is better. Which has more contributors and is more actively used?
Please note my main requirement is pythonic way for accessing R packages.

As pointed out by @lgautier, there is already another answer on this subject. I leave my answer here as it adds the experience of approaching R as a novice, knowing Python first.
I use both Python and R and sympathise with your need as a newcomer to R.
Since any answer you get will be subjective, I summarise a few points from my experience:
I use rpy2 as my interface and find it is 'Pythonic', stable, predictable, and effective enough for my needs. I have not used the other packages so this is not a comment on them, rather on the merits of rpy2 itself.
BUT do not expect that there will be an easy way of using R in Python without learning both. I find that adding an interface between the two languages allows ease of coding when you know both, but a nightmare of debugging for someone who is deficient in one of the languages.
My advice:
For most applications, Python has packages that allow you to do most of the things that you want to do in R, from data wrangling to plotting. Check out SciPy, NumPy, pandas, BioPython, matplotlib and other scientific packages, or even the full Anaconda or Enthought python distributions. This allows you to stay within the Python environment and provides you most of the power that you need.
At the same time, you will want R's vast range of specialised packages, so spend some time learning it in an interactive environment. I found it almost impossible to master even basic R on the command line, but RStudio and the tutorials at Quick-R and Learn-R got me going very fast.
Once you know both, then you will do magic with rpy2 without the horrors of cross-language debugging.
New Resources
Update on 29 Jan 2015
This answer has proved popular and so I thought it would be useful to point out two more recent resources:
Ralph Heinkel gave a great talk on this subject at EuroPython 2014. The video on Combining the powerful worlds of Python and R is available on the EuroPython YouTube channel. Quoting him:
The triplet R, Rserve, and pyRserve allows the building up of a network bridge from Python to R: Now R-functions can be called from Python as if they were implemented in Python, and even complete R scripts can be executed through this connection.
It is now possible to combine R and Python using rmagic in IPython/Jupyter, greatly easing the work of producing reproducible research and notebooks that combine both languages.

A question about comparing rpy2, pyrserve, and pyper with each other was answered on the site earlier.
Regarding the number of contributors, I'd say that all 3 have a relatively small number. A site like Ohloh can give a more detailed answer.
How actively a package is used is tricky to determine. One indication might be the number of downloads; another might be the number of posts on mailing lists or the number of questions on a site like Stack Overflow, the number of other packages using or citing it, or the number of CVs or job openings mentioning the package. As much as I believe that I could give a fair evaluation, I might also be seen as having a conflict of interest. ;-)
All three have their pros and cons. I'd suggest you base your choice on that.

My personal experience has been with Rpy, not Rpy2. I used it for a while, but dropped it in favor of using system commands. A typical case for me was running a FORTRAN model using Python scripts, and post-processing with R. In my experience the easiest solution was to create a command line tool using R, which is quite straightforward (at least under Linux). The command line tool could be executed in the root of the model run, and the script would produce a set of R objects and plots in an Routput directory. The advantage of disconnecting R and Python in this way was that I could easily debug the R code separate from the Python code.
I think Rpy really shines when a lot of back and forth communication between R and Python is needed. But if the functionality is nicely separable, and the overhead of disk i/o is not too bad, I would stick to system calls. See ?system for more information regarding system calls, and Rscript for running R scripts as a command line tool.
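Driving the decoupled R step from Python only needs the standard subprocess module. A minimal sketch, assuming Rscript is on PATH and that a hypothetical R script reads the model output from its working directory (the root of the model run) and writes its objects and plots to Routput; the helper names are also hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def rscript_command(script: Path, *args: str) -> List[str]:
    """Build the argv for running an R script as a command-line tool."""
    return ["Rscript", str(script), *args]


def run_postprocessing(script: Path, run_dir: Path) -> None:
    """Run the R post-processing script from the root of a model run.

    R and Python only meet through files on disk, so the R side can be
    debugged on its own by running the same command in a terminal.
    """
    if shutil.which("Rscript") is None:
        raise FileNotFoundError("Rscript is not on PATH")
    # Resolve the script path first, because cwd changes below;
    # check=True turns a non-zero R exit status into CalledProcessError.
    subprocess.run(rscript_command(script.resolve()), cwd=run_dir, check=True)
```

The trade-off is the one described above: each call pays process start-up and disk I/O, in exchange for keeping the two languages' bugs apart.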
Regarding your wish to write R code in a Python way, this is not possible, as all the solutions require you to write R code in R syntax. For Rpy this means R syntax, but a little different (no . in names, for example). I agree with @gauden that there is no shortcut in using R through Rpy.

bpython-like autocomplete and parameter description in Emacs Python Mode?

I've been using bpython for a while now for all of my Python interpreting needs. It's delightful, particularly when you're using unfamiliar new libraries, or libraries with a multitude of functions. In any case, it's nice to have a bpython interpreter running alongside what I'm doing, but it'd be even better if I had both the autocomplete-like feature, and the parameter description in the manner that bpython does while I'm editing code in Emacs. Am I completely crazy? Does anyone have an idea on how to do this?
Thanks,
Bradley Powers

You're not completely crazy.
python-mode can integrate with eldoc-mode to display the arg spec of the function you're calling at point. Just do M-x eldoc-mode while you're in a python file to turn it on and it should start working. It talks to an inferior python buffer to inspect the functions directly, so it should always be decently accurate. You can turn it on automatically for all new python-mode buffers with (add-hook 'python-mode-hook '(lambda () (eldoc-mode 1)) t) in your emacs startup file. Now, at this point I have to say that I don't do any regular python programming, and that when I tried it just now it didn't work. I spent a few minutes poking around in the source code and everything seems to be in place, but the code that it runs in the inferior process is just returning an empty string. Perhaps it's just my setup, or perhaps I'm reading the wrong source files; it's hard to say.
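Roughly, the inferior-process side of an eldoc integration only has to look the name up in the running interpreter and format its signature. A sketch of that idea using the standard inspect module; argspec_for is a hypothetical name, and python-mode's real implementation differs.

```python
import inspect
from typing import Any, Dict


def argspec_for(dotted_name: str, namespace: Dict[str, Any]) -> str:
    """Return 'name(signature)' for a callable, or '' if it cannot be found."""
    head, *rest = dotted_name.split(".")
    try:
        obj = namespace[head]
        for attr in rest:
            obj = getattr(obj, attr)
        return f"{dotted_name}{inspect.signature(obj)}"
    except (KeyError, AttributeError, TypeError, ValueError):
        # Unknown name, a non-callable, or an object with no introspectable
        # signature: an empty string, like the one described above.
        return ""
```

Because the lookup happens in the live process, the result reflects whatever is actually imported there, which is why this approach is usually accurate.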
Emacs provides several different types of expansion/autocompletion. By default you have access to dabbrev-expand by hitting M-/. This is a fairly simple form of completion; it's just meant to work on any old file you happen to edit. More sophisticated is hippie-expand, but even that doesn't do anything Python-specific. The python-mode documentation says that it can integrate with hippie-expand for exact completions, but this might be a lie; I couldn't figure out how it works. A little poking around shows several related solutions for this, all of which seem to rely on pymacs. If I were going to do a lot of Python programming and didn't already have a fairly complicated Emacs setup, I'd probably start by installing emacs-for-python. It looks to be a pretty complete setup, and even claims to have live warning/error detection.
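For the completion side, Python's standard library ships rlcompleter (one of the features listed in python-mode's own mode summary), which completes names against a live namespace. A minimal sketch that collects every candidate for a prefix; the helper name completions is hypothetical. Callables come back with a trailing "(".

```python
import rlcompleter
from typing import Any, Dict, List


def completions(text: str, namespace: Dict[str, Any]) -> List[str]:
    """Collect every completion rlcompleter offers for text in namespace."""
    completer = rlcompleter.Completer(namespace)
    results = []
    state = 0
    while True:
        # complete() returns the state-th match, or None when exhausted.
        match = completer.complete(text, state)
        if match is None:
            break
        results.append(match)
        state += 1
    return results
```

For example, completions("data.app", {"data": []}) offers the list's append method.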
In the spirit of helping others to help themselves, I'd like to point out how I came upon all of this information. My first step was to open a file in python-mode. I didn't actually have any Python code available, so I just went to my scratch buffer and made it a Python buffer (M-x python-mode). Then I asked for help about this strange new mode (C-h m) to see what it could do. Its author has kindly written a brief summary of what the mode can do, which mentions eldoc-mode, Imenu, outline-mode, hippie-expand, rlcompleter, abbrev tables, and a bunch of other things. From there I started looking at the source code. For instance, to integrate with eldoc-mode, it defines a function called python-eldoc-function and gives that to the eldoc module for use in Python buffers. Reading that code shows me how it interacts with the inferior buffer, etc.
I hope some of this helps.

What are the downsides of using Python instead of Objective-C? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I know some Python and I'm really impressed by the language's ease of use. From what I've seen of Objective-C it looks a lot less pretty, but it seems to be the lingua franca for Mac OS X development (which means it has better documentation).
I'm thinking about starting Mac development - will using PyObjC+Python make me a second class citizen?

Yes.
For one thing, as you note, all the documentation is written for Objective-C, which is a very different language.
One difference is method names. In Objective-C, when you send a message to (Python would say “call a method of”) an object, the method name (selector) and arguments are mixed:
NSURL *URL = /*…*/;
NSError *error = nil;
QTMovie *movie = [QTMovie movieWithURL:URL
error:&error];
This isn't possible in Python. Python's keyword arguments don't count as part of the method name, so if you did this:
movie = QTMovie.movieWithURL(URL, error = ???)
you would get an exception, because the QTMovie class has no method named movieWithURL; the message in the Objective-C example uses the selector movieWithURL:error:. movieWithURL: and movieWithURL would be two other selectors.
There's no way they can change this, because Python's keyword arguments aren't ordered. Suppose you have a hypothetical three-argument method:
foo = Foo.foo(fred, bar=bar, baz=baz)
Now, this calls foo:bar:baz:, right?
Not so fast. Foo may also have a method named foo:baz:bar:. Because Python's keyword arguments aren't ordered, you may actually be calling that method. Likewise, if you tried to call foo:baz:bar:, you may actually end up calling foo:bar:baz:. In reality, this case is unlikely, but if it ever happens, you would be unable to reliably call either method.
So, in PyObjC, you would need to call the method like this:
movie = QTMovie.movieWithURL_error_(URL, ???)
You may be wondering about the ???. C doesn't allow multiple return values, so, in Objective-C, the error: argument takes a pointer to a pointer variable, and the method will store an object in that variable (this is called return-by-reference). Python doesn't have pointers, so the way the bridge handles arguments like this is that you pass None, and the method will (appear to) return a tuple. So the correct example is:
movie, error = QTMovie.movieWithURL_error_(URL, None)
You can see how even a simple example deviates from what documentation might show you in Objective-C.
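The naming rule used above (movieWithURL:error: becomes movieWithURL_error_) is mechanical: each colon in the selector becomes an underscore, so the number of underscores in the Python name matches the number of arguments. A small sketch of that mapping; the helper names are hypothetical, and it does not attempt to handle selectors that themselves contain underscores.

```python
def selector_to_python(selector: str) -> str:
    """Map an Objective-C selector to its PyObjC-style method name."""
    return selector.replace(":", "_")


def selector_arity(selector: str) -> int:
    """Number of arguments a selector takes: one per colon."""
    return selector.count(":")
```

Because the keyword parts stay in the name, foo:bar:baz: and foo:baz:bar: map to different Python names, which avoids the ambiguity that keyword arguments would introduce.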
There are other issues, such as the GIL. Cocoa apps are only going to get more concurrent, and you're going to want in on this, especially with tempting classes like NSOperation lying around. And the GIL is a serious liability, especially on multi-core machines. I say this as a Python guy myself (when not writing for Cocoa). As David Beazley demonstrates in that video, it's a cold, hard fact; there's no denying it.
So, if I were going to switch away from Objective-C for my apps, I would take up MacRuby. Unlike with PyObjC and RubyCocoa, messages to Cocoa objects don't cross the language bridge; it's a from-the-ground-up Ruby implementation in Cocoa, with language extensions to better support writing Cocoa code in it.
But that's too far ahead of you. You're just getting started. Start with Objective-C. Better to avoid all impedance mismatches between the language you're using and the one the documentation is written for by keeping them the same language.
Plus, you'll find some bugs (such as messages to deceased objects) harder to diagnose without knowledge of how Objective-C works. You will write these bugs as a new Cocoa programmer, regardless of which language you're writing the code in.
So, learn C, then learn Objective-C. A working knowledge of both shouldn't take more than a few weeks, and at the end of it, you'll be better prepared for everything else.
I won't go into how I learned C; suffice to say that I do not recommend the way I did it. I've heard that this book is good, but I've never owned nor read it. I do have this book, and can confirm that it's good, but it's also not Mac-specific; skip the chapter on how to compile the code, and use Xcode instead.
As for Objective-C: The Hillegass book is the most popular, but I didn't use it. (I have skimmed it, and it looks good.) I read Apple's document on the language, then jumped right in to writing small Cocoa apps. I read some of the guides, with mixed results. There is a Currency Converter tutorial, but it didn't help me at all, and doesn't quite reflect a modern Cocoa app. (Modern apps still use outlets and actions, but also Bindings, and a realistic Currency Converter would be almost entirely a couple of Bindings.)

This really says it all:
As the maintainer of PyObjC for nearly 15 years, I will say it bluntly. Use Objective-C. You will need to know Objective-C to really understand Cocoa anyway and PyObjC is just going to add a layer of bugs & issues that are alien to 99% of Cocoa programmers.
a comment in an answer to this question. This question is also interesting.

DO NOT ATTEMPT to avoid learning objective-C if you're going to write apps for the Mac. The purpose of PyObjC and the other language bindings is to let you re-use existing libraries in your apps, not to let you avoid learning the native tools.

Second class citizen seems a bit strong. The Objective-C APIs are available from Python as well, should you need them, and that's mostly if you want to make Cocoa apps. But then they are restricted to OS X anyway. Personally, I have no interest in building apps that aren't cross-platform, but that's me. That also means I haven't actually done this, so I don't know how tricky it is, but there was an article in the Python Magazine not long ago, and it didn't look that horrible.
The major drawback of Python is execution time, and that mainly comes from it being a dynamic language. This can be solved with Cython and C extensions, etc., but then you get a mix of Python + Objective-C APIs + Cython, which can be daunting.
So it depends a lot on what kinds of applications you are going to make. Something uniquely OS X-ish that makes no sense anywhere else? Objective-C is probably the ticket. Cross-platform servers? Well, then Python rocks! Something else? Then it depends.

This is something I've been wondering myself, and although I hope someone comes by with more experience, from what I know you will not be seriously constrained by Python itself. Along with Java and GCC, Python is an excellent way to write native cross-platform applications. Once you get the hang of it you should be able to map example code in Objective C to your Python code.
Since you have access to all libraries and events, everything that you can do in Objective C will be there in Python. Of course, the more OS X-only calls and functions you use, the less easy it will be to port to another platform, but that's beside the point. Usually graphics programming and working with device drivers is somewhat of a limiting factor - but in both cases I'm finding evidence of good support and community libraries (search for Python and Quartz, Lightblue, libhid, PyUSB, for some examples).
The decisive factor for me would be: what is the level of tooling and IDE support that is needed. Apple provides some great software for building new software, but then again with something like Pydev you've got a great place to write Python code too! http://pydev.org/
So give it a try. I'm sure you won't regret it, and there will be a supportive community to draw on for help and inspiration.

You're going to need Objective-C: that's what all the tutorials, documentation, sample code, and everything else are written in. In addition, a wider variety of people will be able to help you.
So learn ObjC first. If, on your second or third project, or a year down the road, you start a project that needs a Python module (like, say, Twisted, or SQLAlchemy. But a SERIOUS need like foundation of your app need, where the extra boost your app gets makes everything worth it), then you can write a PyObjC app and get a lot of the speed benefits of that language, with your background in Cocoa.

Just as an extra option, consider that wxPython can produce some pretty good applications on Mac as well as on Linux and Windows. For the most part you can get native appearance but maintain portability with little or no attention to platform-specific issues.
In other words, PyObjC + Python is not the only way to do Mac development with Python.

No, you don't need to know Objective-C, you don't need to use PyObjC, and you won't be a second-class citizen.
Unless you want to do something extremely specific to the Mac platform, coding in Objective-C or using PyObjC is a really bad idea.
The reason is obvious: once you go the ObjC route, you say a big "goodbye" to other platforms. It's that simple.
Apple does not want you to code for other platforms, the same way Microsoft does not want you to code for other platforms. And that is why more and more developers are turning to open source languages like Python, Java, Ruby, etc.: you don't care what Apple and Microsoft want, you only care about an app that is the most useful and the easiest to develop. Making your app available only for the Mac will make it less useful, and developing in Objective-C is obviously way more difficult.
Python has more than enough libraries to accommodate you, hundreds of them, readily available for the Mac platform. For instance, I am developing a new application in pygame (no, it's not a game). If I had done the same thing in ObjC or PyObjC, I would have had to rewrite the code for Windows and Linux, while with pygame my code works exactly the same on Windows and Linux, even though my main platform is Mac OS.
That's the appeal of most Python libraries: they are cross-platform. wxPython is another example. Someone mentioned that "it does not exactly look native"; do you want this to stop you from making your application available for Windows and Linux? Why limit yourself to the Mac platform? Do you think the average user will care how native your app looks? Even Mac OS apps do not look native; many of them introduce their own "eye candy" GUI. Not that you can't make wxPython look 100% native; the way you code is always important.
ObjC makes sense when you intend to develop for iPhone OS, as Apple thought it a great idea to exclude Python (and not only Python), even though they were forced to include JavaScript (or else web surfing would have been a nightmare on iPhone OS). Pyjamas can make Python available for iPhone OS as well (with no hacks or jailbroken phones), but with the obvious limitations, since it translates Python code to JavaScript. Still, it's a valid solution until Apple decides that excluding Python from iPhone OS is a really bad idea.
There is no harm done in studying Objective C though. You can always use the native libraries via pyobjc.
But to be absolutely sincere with you, if my app reaches a dead end with the Python libraries (a very unlikely scenario), I would rather wrap existing cross-platform C/C++ libraries with Cython than go the Objective-C/PyObjC route and destroy the cross-platform ability of my app. The last thing I would use is anything platform-specific.
Now if you dont care about other platforms at all, then I guess Objective C can be a valid choice. It certainly looks ugly as hell, but I have heard that it gets much better the more you use it and there are many people that prefer it over C/C++.
