How do I connect a Python and a C program?

I have a Python-based program that reads serial data off a port connected to an RS-232 cable. I want to pass the data I get there to a C program that will handle the computation-intensive side of things. I have been searching the net, and all I've found are Linux-based solutions.

My suggestion would be the inline function from the instant module, though that only works if you can do everything you need to in a single C function. You just pass it a C function as a string and it compiles a C extension at runtime.
from instant import inline

sieve_code = """
PyObject* prime_list(int max) {
    PyObject *list = PyList_New(0);
    int *numbers, *end, *n;
    numbers = (int *) calloc(max, sizeof(int));  /* zero-initialised */
    end = numbers + max;
    numbers[2] = 2;
    for (int i = 3; i < max; i += 2) { numbers[i] = i; }
    for (int i = 3; i * i < max; i++) {  /* i * i < max avoids needing math.h for sqrt */
        if (numbers[i] != 0) {
            for (int j = i + i; j < max; j += i) { numbers[j] = 0; }
        }
    }
    for (n = numbers; n < end; n++) {
        if (*n != 0) { PyList_Append(list, PyInt_FromLong(*n)); }
    }
    free(numbers);
    return list;
}
"""
sieve = inline(sieve_code)
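If the compile succeeds, calling it is then just primes = sieve(100). Note the snippet assumes Python 2: PyInt_FromLong no longer exists in Python 3, where PyLong_FromLong would be used instead.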

There are a number of ways to do this.
The rawest, simplest way is to use the Python C API and write a wrapper for your C library which can be called from Python. This ties your module to CPython.
The second way is to use ctypes, which is an FFI for Python that allows you to load and call functions in C libraries directly. In theory, this should work across Python implementations.
A third way is to use Pyrex or its next-generation version, Cython, which allows you to annotate your Python code with type information that the compiler can convert into compiled code. It can be used to write wrappers too. AFAIK, it's tied to CPython.
Yet another way is to use SWIG, a tool that generates glue code to help you wrap C libraries for use from Python. It's basically the first approach with a helper tool.
Another way is to use the Boost.Python API, which is an object-oriented wrapper over the raw Python C API.
All of the above let you do your work in the same process.
If that's not a constraint, then as Digital Ross suggested, you can simply spawn a subprocess, hand over arguments (either on the command line or via its standard input), and have an external process do the work for you.

Use a pipe and popen
The easiest way to deal with this is probably to just use popen(3). The popen function is available in both Python and C and will connect a program of either language with the other using a pipe.
>>> import subprocess
>>> print args
['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"]
>>> p = subprocess.Popen(args)
Once you have the pipe, you should probably send YAML or JSON through it, though I've never tried to read either in C. If it's really a simple stream, just parse it yourself. If you like XML, I suppose that's available as well.
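To make that concrete, here is a minimal sketch of the pipe approach, assuming a hypothetical C program ./cruncher that reads one JSON document per line on stdin and prints one result per line on stdout:

import json
import subprocess

# Start the (hypothetical) C worker with text-mode pipes on stdin/stdout.
proc = subprocess.Popen(["./cruncher"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)

# Hand the serial readings over as one JSON document per line.
proc.stdin.write(json.dumps({"samples": [1.5, 2.25, 3.0]}) + "\n")
proc.stdin.close()

for line in proc.stdout:
    print(line.rstrip())   # whatever the C side computed
proc.wait()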

How many bits per second are you getting across this RS-232 cable? Do you have test results showing that Python won't do the crunchy bits fast enough? If the C program is yet to be written, consider the possibility of writing the computation-intensive side of things in Python, with an easy fallback to Cython in the event that Python isn't fast enough.

Indeed this question does not have much to do with C++.
Having said that, you can try SWIG - it's multi-platform and allows function calls from Python to C/C++.

I would use a standard form of IPC like a socket.
A good start would be Beej's Guide.
Also, don't tag the question with c++ if you are specifically using C. C and C++ are different languages.
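To illustrate, the Python side of such a link might look like this (a sketch; the port number is arbitrary, and the C program is assumed to be listening on it and answering one newline-terminated request at a time):

import socket

# Connect to the (assumed) C server listening on localhost:5555.
sock = socket.create_connection(("127.0.0.1", 5555))
sock.sendall(b"3.14 2.71\n")    # ship the raw readings
result = sock.recv(4096)        # read back the computed answer
print(result.decode().strip())
sock.close()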

I'd use ctypes: http://python.net/crew/theller/ctypes/tutorial.html
It allows you to call C (and C++) code from Python.
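A minimal sketch of what that looks like, assuming you've compiled your C code into a shared library libcompute.so that exports double crunch(double x) (both names are hypothetical):

import ctypes

# Load the (hypothetical) shared library and declare the function signature.
lib = ctypes.CDLL("./libcompute.so")
lib.crunch.argtypes = [ctypes.c_double]
lib.crunch.restype = ctypes.c_double

print(lib.crunch(42.0))   # calls straight into the C code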

Related

Is there a built-in way to use inline C code in Python?

Even though numba and cython (and especially cython.inline) exist, in some cases it would be interesting to have inline C code in Python.
Is there a built-in way (in Python standard library) to have inline C code?
PS: scipy.weave used to provide this, but it's Python 2 only.
Directly in the Python standard library, probably not. But it's possible to have something very close to inline C in Python with the cffi module (pip install cffi).
Here is an example, inspired by this article and this question, showing how to implement a factorial function in Python + "inline" C:
from cffi import FFI

ffi = FFI()
ffi.set_source("_test", """
long factorial(int n) {
    long r = n;
    while(n > 1) {
        n -= 1;
        r *= n;
    }
    return r;
}
""")
ffi.cdef("""long factorial(int);""")
ffi.compile()

from _test import lib    # import the compiled library
print(lib.factorial(10)) # 3628800
Notes:
ffi.set_source(...) defines the actual C source code
ffi.cdef(...) is the equivalent of the .h header file
you can of course add some cleanup code afterwards if you don't need the compiled library at the end (however, cython.inline does the same and the compiled .pyd files are not cleaned by default, see here)
this quick inline use is particularly useful during a prototyping / development phase. Once everything is ready, you can separate the build (which you do only once) from the rest of the code, which imports the pre-compiled library (a sketch of this split follows below)
It seems too good to be true, but it seems to work!
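To illustrate that last note, the split might look something like this (a sketch reusing the names from the example above):

# build_test.py -- run once to build the "_test" extension module
from cffi import FFI

ffi = FFI()
ffi.set_source("_test", """
long factorial(int n) {
    long r = n;
    while(n > 1) { n -= 1; r *= n; }
    return r;
}
""")
ffi.cdef("""long factorial(int);""")
ffi.compile()

# main.py -- every later run just imports the pre-compiled library
from _test import lib
print(lib.factorial(10))   # 3628800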

Python C API: WindowsError after creating some number of PyObjects

I've been having an issue getting the Python C API to not give me errors.
Background: I've been using ctypes to run native code (C++) for a while, but until now I had never actually done anything specific with the Python C API. I had mostly just been passing in structs from Python and filling them from C++. The way I was using structs was becoming cumbersome, so I decided I would try creating Python objects directly in C++ and just pass them back to my Python script.
Code:
I have a DLL (Foo.dll) with only one function:
#include <iostream>
#include <Python.h>

#define N 223

// extern "C" keeps the exported name unmangled so ctypes can find "Bar"
// (assumed here; the original build may have used a .def file instead).
extern "C" __declspec(dllexport) void Bar(void)
{
    std::cout << "Bar" << std::endl;
    for (int i = 0; i < N; ++i)
    {
        auto list = PyList_New(0);
        std::cout << "Created: " << i << std::endl;
        //Py_DECREF(list);
    }
}
And then I have the Python script I'm running:
import ctypes as C
dll = r"C:\path\to\dll\Foo.dll"
Foo = C.CDLL(dll)
# lists = [[] for _ in range(0)]
Foo.Bar()
print "Done."
What happens: If I define N in the above DLL to be 222 or below, the code works fine (except for the memory leak, but that isn't the problem).
If I uncomment the //Py_DECREF(list) line, the code works fine.
However, with the above code, I get this:
Bar
Created: 0
Created: 1
Created: 2
...(omitted for your sake)
Created: 219
Created: 220
Created: 221
Traceback (most recent call last):
File "C:\path_to_script\script.py", line 9, in <module>
Foo.Bar()
WindowsError: exception: access violation reading 0x00000028
In fact, I get this same result with dictionaries, lists, tuples and so on. I get the same result if I create a list and then append empty sublists to that list.
What's weirder, every list that I make from within the actual Python script decreases the number of lists the DLL can make before hitting this WindowsError.
Weirder still, if I make more than 222 lists in my python script, then the DLL won't encounter this error until it's created something like 720 more lists.
Other details:
Windows 10
Using the Anaconda2 32-bit Python 2.7 distribution (with Python.h and python27.lib from that distribution)
python.exe --version: 2.7.13 :: Anaconda custom (32-bit)
Creating the DLL with Visual Studio 2017
As long as I don't create many PyObjects from my C++ code, everything seems to work fine. I can pass PyObjects to and from the Python code and it works fine.. until I've created "too many" of the objects from within my C++ code.
What is going on?
From the documentation for CDLL:
The Python global interpreter lock is released before calling any function exported by these libraries, and reacquired afterwards.
This makes it unsafe to use Python C API code. Exactly how it fails is unpredictable, as you are finding. I'd guess it has to do with whether the allocation triggers a run of the garbage collector, but I don't think it's worth spending too much time trying to work out the exact cause.
There are (at least) two solutions to choose from:
Use ctypes.PyDLL (which the documentation notes is like CDLL except that it does not release the GIL) - a minimal sketch of this option follows the snippet below
Reacquire the GIL within your C++ code - an easy way to do this is:
auto state = PyGILState_Ensure();
// C++ code requiring GIL - probably your entire loop
PyGILState_Release(state);
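A minimal sketch of the PyDLL option, reusing the DLL path from the question:

import ctypes as C

# PyDLL keeps the GIL held while the exported function runs, so the
# C++ side can safely call the Python C API.
dll = C.PyDLL(r"C:\path\to\dll\Foo.dll")
dll.Bar()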

Pypy (python) optimization

I'm looking into replacing some C code with Python code and using PyPy as the interpreter. The code does a lot of list/dictionary operations, so to get a vague idea of the performance of PyPy vs C I am writing sorting algorithms. To test all my read functions I wrote a bubble sort, both in Python and C++. CPython of course came in slowest at 6.468 s; PyPy took 0.366 s and C++ 0.229 s. Then I remembered that I had forgotten -O3 on the C++ code, and the time went to 0.042 s. For a 32768-element dataset, C++ with -O3 takes only 2.588 s while PyPy takes 19.65 s. Is there anything I can do to speed up my Python code (besides using a better sort algorithm, of course), or something in how I use PyPy (some flag or other)?
Python code (read_nums module omitted since its time is trivial: 0.036 s on the 32768 dataset):
import read_nums
import sys

nums = read_nums.read_nums(sys.argv[1])
done = False
while not done:
    done = True
    for i in range(len(nums)-1):
        if nums[i] > nums[i+1]:
            nums[i], nums[i+1] = nums[i+1], nums[i]
            done = False
$ time pypy-c2.0 bubble_sort.py test_32768_1.nums
real 0m20.199s
user 0m20.189s
sys 0m0.009s
C++ code (read_nums function again omitted since it takes little time: 0.017 s):
#include <iostream>
#include <vector>   // needed for std::vector
#include "read_nums.h"

int main(int argc, char** argv)
{
    std::vector<int> nums;
    int count, i, tmp;
    bool done;
    if(argc < 2)
    {
        std::cout << "Usage: " << argv[0] << " filename" << std::endl;
        return 1;
    }
    count = read_nums(argv[1], nums);
    done = false;
    while(!done)
    {
        done = true;
        for(i=0; i<count-1; ++i)
        {
            if(nums[i] > nums[i+1])
            {
                tmp = nums[i];
                nums[i] = nums[i+1];
                nums[i+1] = tmp;
                done = false;
            }
        }
    }
    for(i=0; i<count; ++i)
    {
        std::cout << nums[i] << ", ";
    }
    return 0;
}
$ time ./bubble_sort test_32768_1.nums > /dev/null
real 0m2.587s
user 0m2.586s
sys 0m0.001s
P.S. Some of the numbers given in the first paragraph are a little different than the numbers from time because they're the numbers I got the first time.
Further improvements:
Just tried xrange instead of range and the run time went to 16.370 s.
Moved the code from the first done = False through the last done = False into a function (see the sketch below); speed is now 8.771-8.834 s.
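For reference, the function version mentioned in the last point might look like this (a sketch; locals are faster to look up than globals in both CPython and PyPy, and xrange avoids materialising the index list on Python 2):

import read_nums
import sys

def bubble_sort(nums):
    done = False
    while not done:
        done = True
        for i in xrange(len(nums) - 1):
            if nums[i] > nums[i+1]:
                nums[i], nums[i+1] = nums[i+1], nums[i]
                done = False

nums = read_nums.read_nums(sys.argv[1])
bubble_sort(nums)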
The most relevant way to answer this question is to note that the speeds of C, CPython and PyPy do not differ by a constant factor: they depend most importantly on what is done and the way it is written. For example, if your C code is doing naive things like walking arrays where the "equivalent" Python code would naturally use dictionaries, then any implementation of Python is faster than C, provided the arrays are long enough. Of course, this is not the case in most real-life examples, but the same argument still applies to a smaller extent. There is no one-size-fits-all way to predict the relative speed of a program written in C, or rewritten in Python and running on CPython or PyPy.
Obviously there are guidelines about these relative speeds: on small algorithmical examples you could expect the speed of PyPy to be approaching that of "gcc -O0". In your example it is "only" 1.6x slower. We might help you optimize it, or even find optimizations missing in PyPy, in order to gain 10% or 30% more speed. But this is a tiny example that has nothing to do with your real program. For the reasons above the speed we get here is only vaguely related to the speed you'll get in the end.
I can only say that rewriting code from C to Python for reasons of clarity, notably when the C has become too tangled up for further developments, is clearly a win in the long run --- even in the case where at the end you need to rewrite some parts of it in C again. And PyPy's goal here is to reduce the need for that. While it would be nice to say that no-one ever needs C any more, it's just not true :-)

Calling PARI/GP from Python

I would like to call PARI/GP from Python only to calculate the function nextprime(n) for different n that I define. Unfortunately I can't get pari-python to install, so I thought I would just call it using a command line via os.system in Python. I can't see in the man page how to get PARI/GP to run in non-interactive mode, however. Is there a way to achieve this?
You can pipe input into gp's stdin like so, using the -q flag to quash verbosity:
senderle:~ $ echo "print(isprime(5))" | gp -q
1
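The same thing can be done from Python rather than a shell (a sketch using subprocess; it assumes gp is on your PATH):

import subprocess

# Pipe one GP expression into gp's stdin and capture the answer.
proc = subprocess.Popen(["gp", "-q"],
                        stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE,
                        universal_newlines=True)
out, _ = proc.communicate("print(nextprime(5280))\n")
print(out.strip())   # -> 5281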
However, it's not much harder to create a simple Python extension that allows you to pass strings to PARI's internal parser and get results back (as strings). Here's a bare-bones version that I wrote some time ago so that I could call PARI's implementation of the APRT test from Python. You could extend this further to do appropriate conversions and so on.
//pariparse.c
#include <Python.h>
#include <pari/pari.h>

static PyObject * pariparse_run(PyObject *self, PyObject *args) {
    const char *pari_code;
    char *outstr;
    if (!PyArg_ParseTuple(args, "s", &pari_code)) { return NULL; }
    pari_init(40000000, 2);   /* init after arg parsing, so a bad call doesn't leave pari open */
    outstr = GENtostr(gp_read_str(pari_code));
    pari_close();
    return Py_BuildValue("s", outstr);
}

static PyMethodDef PariparseMethods[] = {
    {"run", pariparse_run, METH_VARARGS, "Run a pari command."},
    {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initpariparse(void) {
    (void) Py_InitModule("pariparse", PariparseMethods);
}
And the setup file:
#setup.py
from distutils.core import setup, Extension

module1 = Extension('pariparse',
                    include_dirs = ['/usr/include', '/usr/local/include'],
                    libraries = ['pari'],
                    library_dirs = ['/usr/lib', '/usr/local/lib'],
                    sources = ['pariparse.c'])

setup(name = 'pariparse',
      version = '0.01a',
      description = 'A super tiny python-pari interface',
      ext_modules = [module1])
Then just type python setup.py build to build the extension. You can then call it like this:
>>> import pariparse
>>> pariparse.run('nextprime(5280)')
'5281'
I tested this just now and it compiled for me with the latest version of pari available via homebrew (on OS X). YMMV!
You might want to try using the Sage math tool. Sage uses Python to glue together all sorts of math libraries, including PARI. Some of the math libraries are nicely integrated; others use hacks (passing strings into the library and then parsing out the string results), but in all cases someone else did the integration work for you and you can just use it.
You can set up your own Sage system, or you can get a free account and try Sage on the University of Washington servers.
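For the nextprime case specifically, inside a Sage session the PARI interface is already loaded, so a sketch would be just:

# Inside a Sage session; `pari` is Sage's preloaded interface to libpari.
p = pari('nextprime(5280)')
print(p)   # 5281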
I don't think it is a good idea to call os.system except as a quick and dirty workaround when you have a reliable C library behind it. It is very easy to call C functions from Python; here are two functions for calling nextprime. One uses C long integers (which, despite the name, means you are limited to small numbers here); the other uses the string type (for longer integers).
First check that you have libpari installed. The solution below is for Linux and assumes that your library is called libpari.so. Under Windows it will probably have a .dll suffix instead. You may have to give the full path of the DLL file if it isn't found on the first attempt:
import ctypes

# load the library
pari = ctypes.cdll.LoadLibrary("libpari.so")

# set the right return type of the functions
pari.stoi.restype = ctypes.POINTER(ctypes.c_long)
pari.nextprime.restype = ctypes.POINTER(ctypes.c_long)
pari.strtoGENstr.restype = ctypes.POINTER(ctypes.c_long)
pari.geval.restype = ctypes.POINTER(ctypes.c_long)
pari.itostr.restype = ctypes.c_char_p

# initialize the library
pari.pari_init(2**19, 0)

def nextprime(v):
    g = pari.nextprime(pari.stoi(ctypes.c_long(v)))
    return pari.itos(g)

def nextprime2(v):
    g = pari.nextprime(pari.geval(pari.strtoGENstr(str(v))))
    return int(pari.itostr(g))

print( nextprime(456) )
print( nextprime2(456) )

DLR & Performance

I'm intending to create a web service which performs a large number of manually-specified calculations as fast as possible, and have been exploring the use of DLR.
Sorry if this is long but feel free to skim over and get the general gist.
I've been using the IronPython library as it makes the calculations very easy to specify. My work laptop gives a performance of about 400,000 calculations per second doing the following:
ScriptEngine py = Python.CreateEngine();
ScriptScope pys = py.CreateScope();
ScriptSource src = py.CreateScriptSourceFromString(@"
def result():
    res = [None]*1000000
    for i in range(0, 1000000):
        res[i] = b.GetValue() + 1
    return res
result()
");
CompiledCode compiled = src.Compile();
pys.SetVariable("b", new DynamicValue());
long start = DateTime.Now.Ticks;
var res = compiled.Execute(pys);
long end = DateTime.Now.Ticks;
Console.WriteLine("...Finished. Sample data:");
for (int i = 0; i < 10; i++)
{
    Console.WriteLine(res[i]);
}
Console.WriteLine("Took " + (end - start) / 10000 + "ms to run 1000000 times.");
Where DynamicValue is a class that returns random numbers from a pre-built array (seeded and built at run time).
When I create a DLR class to do the same thing, I get much higher performance (~10,000,000 calculations per second). The class is as follows:
class DynamicCalc : IDynamicMetaObjectProvider
{
    DynamicMetaObject IDynamicMetaObjectProvider.GetMetaObject(Expression parameter)
    {
        return new DynamicCalcMetaObject(parameter, this);
    }

    private class DynamicCalcMetaObject : DynamicMetaObject
    {
        internal DynamicCalcMetaObject(Expression parameter, DynamicCalc value) : base(parameter, BindingRestrictions.Empty, value) { }

        public override DynamicMetaObject BindInvokeMember(InvokeMemberBinder binder, DynamicMetaObject[] args)
        {
            Expression Add = Expression.Convert(Expression.Add(args[0].Expression, args[1].Expression), typeof(System.Object));
            DynamicMetaObject methodInfo = new DynamicMetaObject(Expression.Block(Add), BindingRestrictions.GetTypeRestriction(Expression, LimitType));
            return methodInfo;
        }
    }
}
and is called/tested in the same way by doing the following:
dynamic obj = new DynamicCalc();
long t1 = DateTime.Now.Ticks;
for (int i = 0; i < 10000000; i++)
{
    results[i] = obj.Add(ar1[i], ar2[i]);
}
long t2 = DateTime.Now.Ticks;
Where ar1 and ar2 are pre-built, runtime seeded arrays of random numbers.
The speed is great this way, but it's not easy to specify the calculation. I'd basically be looking at creating my own lexer & parser, whereas IronPython has everything I need already there.
I'd have thought I could get much better performance from IronPython since it is implemented on top of the DLR, and I could do with better than what I'm getting.
Is my example making best use of the IronPython engine? Is it possible to get significantly better performance out of it?
(Edit) Same as first example but with the loop in C#, setting variables and calling the python function:
ScriptSource src = py.CreateScriptSourceFromString(@"b + 1");
CompiledCode compiled = src.Compile();
double[] res = new double[1000000];
for(int i=0; i<1000000; i++)
{
    pys.SetVariable("b", args1[i]);
    res[i] = compiled.Execute(pys);
}
where pys is a ScriptScope from py, and args1 is a pre-built array of random doubles. This example executes slower than running the loop in the Python code and passing in the entire arrays.
delnan's comment points to some of the problems here, but I'll get specific about what the differences are. In the C# version you've cut out a significant number of the dynamic calls that you have in the Python version. For starters your loop is typed to int, and it sounds like ar1 and ar2 are strongly typed arrays. So in the C# version the only dynamic operations you have are the call to obj.Add (which is one operation in C#) and potentially the assignment to results, if it's not typed to object, which seems unlikely. Also note all of this code is lock-free.
In the Python version you first have the allocation of the list - this also appears to fall inside your timed region, whereas in C# it doesn't look like it does. Then you have the dynamic call to range; luckily that only happens once, but it creates a gigantic list in memory - delnan's suggestion of xrange is an improvement here. Then you have the loop counter i, which is getting boxed to an object on every iteration of the loop. Then you have the call to b.GetValue(), which is actually two dynamic invocations - first a get-member to fetch the "GetValue" method, and then an invoke on that bound method object. This again creates one new object for every iteration of the loop. Then you have the result of b.GetValue(), which may be yet another value that's boxed on every iteration. Then you add 1 to that result: another boxing operation on every iteration. Finally you store this into your list, which is yet another dynamic operation - I think this final operation needs to lock to ensure the list remains consistent (again, delnan's suggestion of using a list comprehension improves this).
So in summary during the loop we have:
                     C#    IronPython
Dynamic Operations    1        4
Allocations           1        4
Locks Acquired        0        1
So basically Python's dynamic behavior does come at a cost vs C#. If you want the best of both worlds you can try and balance what you do in C# vs what you do in Python. For example you could write the loop in C# and have it call a delegate which is a Python function (you can do scope.GetVariable<Func<object, object>>(...) to get a function out of the scope as a delegate). You could also consider allocating a .NET array for the results if you really need to get every last bit of performance, as it may reduce working set and GC copying by not keeping around a bunch of boxed values.
To do the delegate you could have the user write:
def computeValue(value):
    return value + 1
Then in the C# code you'd do:
CompiledCode compiled = src.Compile();
compiled.Execute(pys);
var computer = pys.GetVariable<Func<object,object>>("computeValue");
Now you can do:
for (int i = 0; i < 10000000; i++)
{
    results[i] = computer(i);
}
If you are concerned about computation speed, it may be better to look at a lower-level way of specifying the computation. Python and C# are high-level languages, and their runtimes can spend a lot of time on work under the covers.
Have a look at this LLVM wrapper library: http://www.llvmpy.org
Install it using: pip install llvmpy ply
or on Debian Linux: apt install python-llvmpy python-ply
You still need to write a tiny compiler (you can use the PLY library) and bind it to the LLVM JIT calls (see the LLVM Execution Engine), but this approach can be more effective (the generated code is much closer to real CPU code) and is multi-platform, compared to being confined to .NET.
LLVM has ready-to-use optimizing compiler infrastructure, including a lot of optimizer stage modules, and a big user and developer community.
Also look here: http://gmarkall.github.io/tutorials/llvm-cauldron-2016
PS: If you are interested in it, I can help you with a compiler, contributing to my project's manual in parallel. But it won't be a jumpstart; this topic is new to me too.
