This is an assignment I am working on.
I have been asked to write a sample parallel processing program using native Python features. I can write the code, but the problem is that even after searching I cannot find a native parallel programming feature in Python.
Since we have to import the "multiprocessing" module, it does not seem native to me. I just cannot find which feature is available to use.
I have already checked the following threads, but they all use multiprocessing:
Parallel programming in python
Python multiprocessing for parallel processes
How to do parallel programming in Python
I think your definition of "native" is too narrow, or your understanding of the term "import" is mistaken.
The multiprocessing module is part of Python's standard library. Every Python implementation should have it. It is a native feature of Python.
The term "import" should be understood as "make this module available in this program", not as "add this non-native feature to the language". Importing a module does not change the language.
Edit:
In Python 3 you can write concurrent programs with async def (and, earlier, with generator-based coroutines using yield). But that should not be considered real parallel processing. You might call it cooperative "multitasking", but really it is task switching: a single thread hands control back and forth between tasks at well-defined points.
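A tiny sketch of that task switching (the names are illustrative):

import asyncio

async def worker(name):
    for i in range(3):
        print(f'{name}: step {i}')
        # Control returns to the event loop here; this await
        # is exactly where the task switch happens
        await asyncio.sleep(0)

async def main():
    # Both workers run in one thread, interleaved, never in parallel
    await asyncio.gather(worker('a'), worker('b'))

asyncio.run(main())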
Related
I am starting to read up on possible ways to parallelise Python code.
DISCLAIMER. This is NOT a question about Multiprocessing vs Multithreading.
At this link https://ipyparallel.readthedocs.io/en/latest/demos.html one finds references to several concurrency packages for Python that avoid the GIL (see also https://scipy.github.io/old-wiki/pages/ParallelProgramming):
- IPython1
- mpi4py
- parallel python
- Numba
There is also a multiprocessing package:
https://docs.python.org/3/library/multiprocessing.html
And another one called processing:
https://pypi.org/project/processing/
First of all, the difference between the latter two is not at all clear to me: what is the difference between using the multiprocessing module and using the processing module?
In general, I fail to understand the differences between all of these -- and there must be differences, given that some developers made the effort to create mpi4py, a Python version of the MPI used in C++. I guess this is not just about the dualism between "threading" and "multiprocessing" approaches, where in one case memory is shared while in the other each process has its own memory and interpreter; something more must distinguish all of those different packages.
Thanks to all of those who will dedicate time to answer this!
The difference is that the last version of processing was released in April 2008, while multiprocessing was added to the standard library in Python 2.6 in October 2008.
processing was a library that was used before multiprocessing was distributed with Python.
As for the specific differences between the other modules designed for multiprocessing: the scipy page you linked says that "This is a subject for graduate courses in computer science, and I'm not going to address it here....there are some python tools you can use to implement the things you learn in that graduate course." While they admit that may be a bit of an exaggeration, some independent study of multiprocessing in general will be required to discern the differences between these libraries. You should probably just stick to the built-in multiprocessing module for your initial experiments while you learn how it works. Once you're more comfortable with multiprocessing, you might want to check out the pathos framework.
But here are the basics for the packages you mention:
Numba adds decorators that automatically compile functions so they run faster; it is not really a multiprocessing tool so much as a JIT-compilation tool (see the sketch after this list).
Parallel Python overcomes the GIL to utilize multiple cores or multiple computers; it is designed to be easy to use and to handle all the complex stuff behind the scenes.
MPI for Python is like Parallel Python with less emphasis on simplicity.
IPython is a toolkit with many features, including a shell and a Jupyter kernel; it is also not really a multiprocessing tool.
Keep in mind that plenty of libraries/modules do the same thing; there doesn't need to be a reason more than one exists. Use whatever works for you.
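To give a flavor of the Numba point above, here is a rough sketch (it assumes the third-party numba and numpy packages are installed):

from numba import njit
import numpy as np

@njit  # compiled to machine code the first time it is called
def total(a):
    s = 0.0
    for x in a:
        s += x
    return s

# The compiled loop runs at roughly C speed, no extra processes involved
print(total(np.arange(1_000_000, dtype=np.float64)))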
How to compile a whole Python library, along with its dependencies, so that it can be used from C (without invoking Python's runtime)? That is, the compiled code has the Python interpreter embedded, and Python does not need to be installed on the system.
From my understanding, when Python code is compiled using Cython it:
- Does not invoke the Python runtime if the --embed argument is used
- Compiles files individually
- Allows for different modules to be called (from the Python runtime / other compiled Cython files)
The questions which are still unclear are:
How to use these module files from C? Can the compiled Python files call other compiled Python files when used in C?
Does only the library entry point need to be declared, or do all functions need to be declared?
How to manage the Python dependencies? How to compile them too (so that the Python runtime is not needed)?
A simplified example for a Python library called module, where __init__.py is an empty file:
module/
├── run.py
├── http/
│ ├── __init__.py
│ ├── http_request.py
http_request.py contains:
import requests

def get_ip():
    r = requests.get('https://ipinfo.io/ip')
    print(r.text)
and run.py contains the following:
from http import http_request

if __name__ == '__main__':
    http_request.get_ip()
How can the function get_ip be called from C without using the Python runtime (i.e. without needing to have Python installed when running the application)?
The above example is very simple. The actual use case is collecting/processing robotics data in C at a high sampling rate. Whilst C is great for basic data processing, there are excellent Python libraries which allow for much more comprehensive analysis. The objective would be to call the Python libraries on data which has been partially processed in C. This would allow us to get a much more detailed understanding of the data (and to process it in "real time"). The data frameworks are way too large for our team to rewrite in C.
How to compile a whole Python library, along with its dependencies, so that it can be used from C (without invoking Python's runtime)?
This is impossible in general. Python code is practically expected to run on a Python interpreter.
Sometimes, when only a small subset of Python is used (even indirectly, by everything your Python code is using), you might use Cython (which is actually a superset of a small subset of Python: a lot of genuine Python features cannot be used from Cython, or require the Python interpreter). But not every Python code can be cythonized, since Python and C have very different (and incompatible) semantics (and memory management).
Otherwise (and most often), the C code using your Python stuff should embed the Python interpreter.
A wiser and more robust approach, if your goal is to make a self-sufficient C library usable from many C programs (on systems without Python), is to rewrite your code in C.
You could also consider starting (from your C library) some Python process (server-like, doing your Python stuff) and using inter-process communication facilities, which would be operating-system specific. Of course, Python needs to be installed on the system of the application using your library. For example, on Linux, you might fork some Python process in your library and use pipe(7) or unix(7) sockets to communicate from the C library to that process (perhaps using something like JSONRPC).
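For illustration, the Python side of such a helper process could look roughly like this (a minimal sketch assuming a line-delimited JSON protocol over a Unix socket; the socket path and the method name are invented for the example):

import json
import os
import socket

SOCK_PATH = '/tmp/analysis.sock'  # hypothetical path agreed on with the C side

def handle(request):
    # Dispatch on a made-up 'method' field; replace with real analysis calls
    if request.get('method') == 'get_ip':
        import requests
        return {'result': requests.get('https://ipinfo.io/ip').text}
    return {'error': 'unknown method'}

if os.path.exists(SOCK_PATH):
    os.unlink(SOCK_PATH)

server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
server.bind(SOCK_PATH)
server.listen(1)

while True:
    conn, _ = server.accept()
    with conn, conn.makefile('rwb') as f:
        for line in f:  # one JSON request per line
            f.write(json.dumps(handle(json.loads(line))).encode() + b'\n')
            f.flush()

The C side would connect to the same socket, write one JSON line per request, and read one JSON line back.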
Your edit (still not an MCVE) shows some HTTP interaction done in Python. You could consider doing that in C, with the help of HTTP client libraries in C like libcurl, or (if so needed) of HTTP server libraries like libonion.
So consider rewriting your stuff in C but using several existing C libraries (how and what to choose is a very different question, probably off-topic on StackOverflow). Otherwise, accept the dependencies on Python.
The actual use case is collecting/processing robotics data in C at a high sampling rate. Whilst C is great for basic data processing, there are excellent Python libraries which allow for much more comprehensive analysis.
You could keep high-level things in Python (see this) but recode low-level things in C to accelerate them (much software does that, e.g. TensorFlow), perhaps as C extensions for Python, or in some other process. Of course, that means some development effort. I don't think that avoiding Python entirely is reasonable (or pragmatic) if you use a lot of code in Python. BTW, you might perhaps consider embedding some other language in your C application (e.g. Lua, Guile, OCaml - all of them rumored to be faster than Python) and keep Python for the higher level, running in some other process.
You need to put more effort into the architectural design of your system. I'm not sure that avoiding Python entirely is a wise thing to do; mixing Python and C (perhaps by having several processes cooperate) could be wiser. Of course, you'll have operating-system-specific stuff (notably on the C side, for inter-process communication). If on Linux, read something about Linux system programming in C, e.g. ALP or something newer.
I am trying to develop a plugin interface in Python for a popular 3D program called Blender 3D. The purpose of the plugin interface is to extend some functionality of the program through Python.
Now my problem is that I am trying to assess the performance impact. I will replace existing functionality written in C code with something written in Python.
I am worried that this might slow down the application, because the functionality I am replacing is executed in real time and has to be really fast. It consists of a plain C function that just splits some polygons into triangles.
The operations I am executing work on pieces of data that usually do not have more than 30 or 40 input points, and at most the operations I am executing on them have a complexity of log(n) * n^2.
But I will be creating plenty of Python objects each second, so I am already prepared to implement pooling to recycle the objects.
Now I am mostly worried that the Python code will run 100 times slower than the C code and slow down the application. Should I be worried?
At most I will be doing 8500 computations in a single Python function call. This function will be called every time the application interface is rendered.
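To put a number on it before committing, I plan to time a worst-case call with something like this (a rough sketch; triangulate_mock is just a placeholder that mimics the workload, not my real function):

import timeit

def triangulate_mock(points):
    # Placeholder for the real polygon-splitting work
    total = 0
    n = len(points)
    for i in range(n):
        for j in range(n):
            total += (points[i] - points[j]) ** 2
    return total

points = list(range(40))  # worst case: ~40 input points
# Average time per call must stay well under the per-frame budget
print(timeit.timeit(lambda: triangulate_mock(points), number=1000) / 1000)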
The question of using C or Python will depend on the intended use of your work. Is this a function that the Blender developers will accept into Blender development? Do you expect many Blender users will want to use it? A Python addon lets you develop your work outside of the main Blender development and give many users access to it, while a patch to the C code that requires users to compile their own version will reduce your audience.
You could also look at compiling your C code to a binary library that is included with the Python addon and loaded as a Python module. See two addons by Pyroevil created using Cython - molecular and cubesurfer; some pre-built binaries are available on his main website. I'm not sure whether using Cython makes the Python module creation easier or not; you could also use Cython only as glue between Python and your library.
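Even without Cython, the standard library's ctypes can load such a binary directly (a minimal sketch; the library name and the triangulate signature are invented for the example):

import ctypes

# Hypothetical shared library shipped alongside the addon
lib = ctypes.CDLL('./libtriangulate.so')

# Declare the invented C signature: int triangulate(double *points, int n)
lib.triangulate.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
lib.triangulate.restype = ctypes.c_int

points = (ctypes.c_double * 8)(0, 0, 1, 0, 1, 1, 0, 1)  # four x/y pairs
print(lib.triangulate(points, 4))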
AFAIK, Python (using import thread), like C#, doesn't do "real" multithreading, meaning all threads run on one CPU core.
But in C, using pthreads on Linux, you get real multithreading.
Is this true?
Assuming it is true, is there any difference between them when you have only one CPU core (as I have in a VM)?
Python uses something called the Global Interpreter Lock (GIL), which means that although Python threads are native OS threads, only one of them can execute Python bytecode at a time.
There is more documentation on the Python wiki: https://wiki.python.org/moin/GlobalInterpreterLock
There shouldn't be a real performance difference on single-core systems. On multicore systems the difference will vary based on what you do (I/O is, for the most part, not affected by the GIL).
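A quick way to see this yourself (a rough sketch; exact timings will vary by machine):

import time
import threading

def burn():
    # Pure-Python CPU work; the thread holds the GIL while running
    n = 0
    for _ in range(10_000_000):
        n += 1

# Sequential baseline
start = time.perf_counter()
burn()
burn()
print('sequential:', time.perf_counter() - start)

# Two threads: on CPython this is NOT faster, because of the GIL
start = time.perf_counter()
threads = [threading.Thread(target=burn) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('2 threads: ', time.perf_counter() - start)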
I'm not aware of how C# works internally, but for CPython (the "official" Python interpreter) it is true: threads are not really parallel, due to the GIL.
Other implementations of the Python interpreter do not suffer from this problem and provide real parallelism (as C's pthreads library does).
However, if you only have one CPU you won't notice any difference.
As a side note: if you need real parallelism in CPython you can use the multiprocessing module, which uses processes instead of threads.
EDIT:
Also, the thread module is deprecated; you should consider using threading instead.
Does Enthought Canopy support parallel code execution on the CPU (using, perhaps, OpenMPI) or on the GPU (using OpenCV or CUDA)?
I am looking into switching from C++ to Python, as I want to make a GUI for my parallel code.
Is this a good idea? Does Python support parallel computation?
Yes, Python does support this. There are three layers to processes with Python:
subprocess: which simply runs an external command as a separate process
threading: which starts a new thread and leaves the old one alone. There are frequent reports that this does not necessarily lead to better performance (CPU-bound threads are limited by the GIL).
multiprocessing: which is what you are after
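A minimal taste of each layer (a sketch; it assumes a POSIX system for the echo command, and work is a stand-in function):

import subprocess
import threading
import multiprocessing

def work(x):
    return x * x

if __name__ == '__main__':
    # subprocess: run an external command as a separate process
    subprocess.run(['echo', 'hello from a subprocess'])

    # threading: concurrency within one process (CPU work still serialized by the GIL)
    t = threading.Thread(target=print, args=('hello from a thread',))
    t.start()
    t.join()

    # multiprocessing: true parallelism across CPU cores
    with multiprocessing.Pool() as pool:
        print(pool.map(work, range(5)))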
Here is an intro to parallel processing on Python.
The official docs for the multiprocessing module are here.
The ever so useful discussions on the Python Module of the Week are also worth a look.
Edit:
The Python libraries mentioned (HT #jonathan) are likely to be:
Cuda:
http://mathema.tician.de/software/pycuda
OpenCV:
http://code.google.com/p/pyopencv/
There is a nice tutorial for this here.
And Message Passing Interface:
http://mpi4py.scipy.org/docs/usrman/intro.html