Using CPU instruction directly from Numba - python

I would like to use my CPU's built-in instructions from within Numba-compiled functions, but I am having trouble figuring out how to reference them. For example, take the popcnt instruction from the SSE4 instruction set: I can confirm that my CPU has it using
llvmlite.binding.get_host_cpu_features(), but I have no way of calling the instruction itself.
I need to be able to call these functions (instructions) from within other nopython-compiled functions.
Ideally this would be done as close to Python as possible, but in this case speed is more important than readability.

You can use Cython to call SSE intrinsics, but you cannot use Numba to do it. Code doing what you want via Cython is here: https://gist.github.com/aldro61/f604a3fa79b3dec5436a and here: https://gist.github.com/craffel/e470421958cad33df550

You can make a small assembly-language DLL and call it through ctypes, which in my experience has no overhead whatsoever when used from Numba nopython code. Alternatively, you can use instruction codes directly, as in this blog post on JIT in Python. The Piston JavaScript assembler might be used to obtain machine codes for a small asm routine. Numba allows making small functions in LLVM IR, as described in this thread. Of course, llvmlite might be used too.
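As a rough sketch of the LLVM IR route, the snippet below wraps LLVM's ctpop intrinsic in a Numba @intrinsic so it can be called from nopython code. The names _ctpop and popcount are made up for this illustration. Whether LLVM actually emits a popcnt instruction depends on the host CPU and the Numba/LLVM version, so treat that as an assumption to check in the generated assembly. The pure-Python fallback is only there so the snippet still runs where Numba is not installed.

```python
try:
    from numba import njit, types
    from numba.core.errors import TypingError
    from numba.extending import intrinsic
except ImportError:  # Numba not installed: use the pure-Python fallback below
    njit = None

if njit is not None:
    @intrinsic
    def _ctpop(typingctx, x):
        # Typing phase: accept integers only and return the same integer type.
        if not isinstance(x, types.Integer):
            raise TypingError("popcount expects an integer argument")

        def codegen(context, builder, signature, args):
            # Emit a call to llvm.ctpop; LLVM chooses the machine instruction.
            return builder.ctpop(args[0])

        return x(x), codegen

    @njit
    def popcount(x):
        # Callable from other nopython functions as well as from Python.
        return _ctpop(x)
else:
    def popcount(x):
        # Fallback for non-negative integers when Numba is unavailable.
        return bin(x).count("1")

print(popcount(0b1011))
```

With Numba installed, popcount.inspect_asm() can be used to see which instruction was generated for a given signature.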

Related

Fastest way to interface simple C++ function with Python

I have a C++ function that takes in std::vector<std::vector<double> > X and does some operations on X and outputs std::vector<std::vector<double> > X_mod.
I want to be able to quickly make an interface such that I can pass a Python numpy array into this C++ function, and then have the C++ function return X_mod into Python.
I have briefly looked at Boost, and it seems too complicated for this simple purpose?
Any other suggestions on how to write a quick interface for this?
As suggested in the comments, pybind11 can be used to write C++ bindings for Python (see the pybind11 documentation, the pybind11 GitHub repository, and an example of how to use it).
The reason why everyone is proposing pybind11 over Boost can be found in its README:
The main issue with Boost.Python—and the reason for creating such a similar project—is Boost. Boost is an enormously large and complex suite of utility libraries that works with almost every C++ compiler in existence. This compatibility has its cost: arcane template tricks and workarounds are necessary to support the oldest and buggiest of compiler specimens. Now that C++11-compatible compilers are widely available, this heavy machinery has become an excessively large and unnecessary dependency.

What exactly is PyOpenGL-accelerate?

The title is the main question here. I had some PyOpenGL code I was running on my computer, which was running somewhat slow. I realized I hadn't installed PyOpenGL-accelerate. This didn't change the speed at all, but most tutorials with the Python OpenGL bindings suggest that PyOpenGL-accelerate should be installed.
What exactly does this module do?
First of all, note that PyOpenGL-accelerate isn't a silver bullet. If your application is already poorly optimized, PyOpenGL-accelerate won't gain you much additional performance, if any.
That being said, PyOpenGL-accelerate consists of Cython accelerator modules which attempt to speed up various aspects of PyOpenGL 3.x. It speeds up the Python-side wrapper code, not the OpenGL calls themselves, so if you're drawing with glBegin() and glEnd(), you won't gain any performance from it.
So what are Cython accelerator modules?
These modules are completely self-contained, and are created solely to run faster than the equivalent pure Python code runs in CPython. Ideally, accelerator modules will always have a pure Python equivalent to use as a fallback if the accelerated version isn’t available on a given system. The CPython standard library makes extensive use of accelerator modules.
– Python – Binary Extensions
In layman's terms, Cython is a bit of a mix between Python and C, so to speak, with the goal being optimization and execution speed.
In relation to PyOpenGL-accelerate, this means that the various helper classes PyOpenGL offers are instead implemented in a manner that offers more performance.
From the documentation:
This set of C (Cython) extensions provides acceleration of common operations for slow points in PyOpenGL 3.x. For code which uses large arrays extensively speed-up is around 10% compared to unaccelerated code.
You could dig through the code if you want to know precisely which optimizations are defined, but OpenGL is usually built around surprisingly coarse optimizations to account for different hardware. I suppose that extends to running off of an interpreter as well.
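The accelerator-module pattern described above can be sketched roughly as follows. The check uses the OpenGL_accelerate package name that PyOpenGL-accelerate installs. The _fast_flatten module and the flatten function are hypothetical names invented only to show the import-with-fallback idiom; they are not part of PyOpenGL.

```python
import importlib.util


def accelerate_available():
    """Return True if the OpenGL_accelerate package can be found."""
    return importlib.util.find_spec("OpenGL_accelerate") is not None


try:
    # Hypothetical compiled accelerator; the module name is made up.
    from _fast_flatten import flatten
except ImportError:
    def flatten(rows):
        """Pure-Python fallback: flatten a list of lists into one list."""
        return [value for row in rows for value in row]


print("PyOpenGL-accelerate installed:", accelerate_available())
print(flatten([[1, 2], [3]]))
```

Callers use flatten the same way whichever implementation was imported, which is why installing the accelerator changes speed but not behaviour.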

Are all the algorithms of Tensorflow written in C++ and Python only serve to be easy-to-use APIs?

I know that TensorFlow is written with a C++ engine, but I haven't found any C++ source code in my installation directory (I installed via pip). When I inspect the Python code, I get the sense that the Python level is just a wrapper in which the essence of the algorithm is not present. For example, in tensorflow/python/ops/gradients.py, the gradients() function calls python_grad_func() to compute the gradients, which is a class method of DeFun.
My question is: are all the essential parts of TensorFlow written in C++, with Python only serving as an API?
This is mostly correct, though there's a lot of sophisticated stuff implemented in Python. Instead of saying "algorithms" in C++, what I'd say is that the core dataflow execution engine and most of the ops (e.g., matmul, etc.) are in C++. A lot of the plumbing, as well as some functionality like defining gradients of functions, is in Python.
For more information and discussion about why it's this way, see this StackOverflow answer

Calling a C++ CUDA device function from a Python kernel

I'm working on a project that involves creating CUDA kernels in Python. Numba works quite well (what these guys have accomplished is quite incredible), and so does PyCUDA.
My problem is that I want to call a C device function from my Python generated kernel. I couldn't find a way to accomplish this. Numba can call CFFI modules but only in CPU code. In PyCUDA I can add my C device functions to the SourceModule, but I couldn't figure out how to include functions that already exist in another library.
Is there a way to accomplish this?
As far as I am aware, this isn't possible with either package. Neither exposes the necessary toolchain controls for separate compilation or APIs to do runtime linking of device code.

Creating a shared library in MATLAB

A researcher has created a small simulation in MATLAB and we want to make it accessible to others. My plan is to take the simulation, clean up a few things and turn it into a set of functions. Then I plan to compile it into a C library and use SWIG to create a Python wrapper. At that point, I should be able to call the simulation from a small Django application. At least I hope so.
Do I have the right plan? Are there any serious pitfalls that I'm not aware of at the moment?
One thing to remember is that the MATLAB compiler does not actually compile the MATLAB code into native machine instructions. It simply wraps it into a stand-alone executable or a library with its own runtime engine that runs it. You would be able to run your code without MATLAB installed, and you would be able to interface it with other languages, but it will still be interpreted MATLAB code, so there would be no speedup.
MATLAB Coder, on the other hand, is the tool that can generate C code from MATLAB. There are some limitations, though. Not all MATLAB functions are supported for code generation, and there are things you cannot do, like change the type of a variable on the fly.
I remember that I was able to wrap a MATLAB simulation into a DLL file and then call it from a Delphi application. It worked really well.
I'd also try ctypes first.
1. Use the MATLAB compiler to compile the code into C.
2. Compile the C code into a DLL.
3. Use ctypes to load and call code from this DLL.
The hardest step is probably 1, but if you already know MATLAB and have used the MATLAB compiler, you should not have serious problems with it.
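The ctypes step might look roughly like the sketch below. It assumes a Unix-like system and uses the C math library as a stand-in for the MATLAB-generated DLL, since the real library's file name and exported functions depend on how it was built. load_math_library and c_cos are names invented for this example.

```python
import ctypes
import ctypes.util


def load_math_library():
    """Load the C math library as a stand-in for the simulation DLL."""
    path = ctypes.util.find_library("m")  # e.g. "libm.so.6" on Linux
    # If no path is found, CDLL(None) exposes the symbols already loaded
    # into the current process (Unix-like systems only).
    return ctypes.CDLL(path)


libm = load_math_library()

# Declare the C signature explicitly: double cos(double).
# Without this, ctypes would assume int arguments and an int return value.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the C function through ctypes."""
    return libm.cos(x)


print(c_cos(0.0))
```

For the real simulation library, the same two steps apply: load the DLL by its path, then set argtypes and restype for every exported function before calling it.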
Perhaps try ctypes instead of SWIG. If it has been included as a part of Python 2.5, then it must be good :-)
