How to use a custom data structure with multiprocessing in Python

I'm using python 2.7 and numpy on a linux machine.
I am running a program that involves a time-consuming function computeGP(level, grid), which takes as input a numpy array level and an object grid that is not modified by this function.
My goal is to parallelize computeGP (locally, so on different cores) for different levels but the same grid. Since grid stays invariant, this can be done without synchronization hassle using shared memory. I've read a bit about threading in Python and the GIL, and it seems to me that I should go with the multiprocessing module rather than threading. This answer and this one recommend using multiprocessing.Array for efficient sharing, while noting that on Unix machines it is the default behaviour that the object is not copied.
My problem is that the object grid is not a numpy array.
It is a list of numpy arrays, because the way my data structure works is that I need to access array (list element) N and then access its row K.
Basically the list just fakes pointers to the arrays.
So my questions are:
1. My understanding is that on Unix machines I can share the object grid without any further use of the multiprocessing datatypes Array (or Value). Is that correct?
2. Is there a better way to implement this pointer-to-array data structure so that it can use the more efficient multiprocessing.Array?
I don't want to assemble one large array containing the smaller ones from the list, because the smaller ones are not really small either...
Any thoughts welcome!

This SO question is very similar to yours: Share Large, Read-Only Numpy Array Between Multiprocessing Processes
There are a few answers in there, but the simplest, if you are only using Linux, is to just make the data structure a global variable. Linux will fork() the process, which gives all worker processes copy-on-write access to the main process's memory (globals).
In this case you don't need to use any special multiprocessing classes or pass any data to the worker processes except level.
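A minimal sketch of that approach, assuming the Linux fork start method; computeGP here is just a placeholder for the real function. Only level is pickled and sent to each worker, while grid is inherited copy-on-write as a module-level global:

    import numpy as np
    from multiprocessing import Pool

    # grid lives at module level, so forked workers inherit it via
    # copy-on-write instead of pickling it or using multiprocessing.Array.
    grid = [np.random.rand(500, 500) for _ in range(10)]

    def computeGP(level, grid):
        # Placeholder for the real, expensive, read-only computation.
        return sum(g.sum() for g in grid) * level.sum()

    def worker(level):
        return computeGP(level, grid)   # grid is read from the inherited globals

    if __name__ == '__main__':
        levels = [np.arange(8) + i for i in range(16)]
        pool = Pool(processes=4)
        results = pool.map(worker, levels)   # only `level` is sent to each worker
        pool.close()
        pool.join()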

Related

How to correctly use MPI.Allgather() in python to gather different sized arrays from different processes

I'm currently using mpi4py, and I have a bunch of processes that are generating lists of tuples (or numpy arrays of tuples). At the end of each process, I want to call an allgather so that each process gets every list (ideally concatenated).
Unfortunately, the documentation on the mpi4py page doesn't cover allgather(), so I was wondering if anyone could help me. I'm not sure whether I need allgather(), allgatherv(), or Allgather(), and essentially how this would work, since I'm not scattering any data beforehand.
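For what it's worth, a small sketch of the pickle-based, lowercase comm.allgather, which handles differently sized per-rank lists without any prior scatter or manual counts (the uppercase Allgather/Allgatherv variants work on buffer-like data and need matching counts/displacements):

    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    # Each rank builds a list of tuples; lengths differ across ranks.
    local = [(rank, i) for i in range(rank + 1)]

    # Lowercase allgather pickles the objects: every rank receives a list
    # containing one entry per rank.
    per_rank = comm.allgather(local)

    # Concatenate into one flat list, identical on all ranks.
    everything = [item for sublist in per_rank for item in sublist]

Run it with something like mpiexec -n 4 python script.py; every rank ends up with the same concatenated list.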

Python: How to synchronize access to a writable array of large numpy arrays (multiprocessing)

I am implementing a specific cache data structure in a machine learning application. The core consists of a list of (large) numpy arrays. The numpy arrays need to be replaced as often and as quickly as possible, given the IO limitations. Therefore, a few workers constantly read and prepare data. My current solution is to push a result produced by a worker into a shared queue. A separate process then receives the result from the queue and replaces one of the numpy arrays in the list (which is owned by that central process).
I am now wondering whether that is the most elegant solution or whether there are faster solutions. In particular:
1.) As far as I understand the docs, going through a queue is equivalent to a serialization and de-serialization process, which could be slower than using shared memory.
2.) There is some memory overhead if the workers have several objects in the queue (which could have been replaced directly in the list).
I have thought about using a multiprocessing Array or the numpy-sharedmem module, but neither really addresses my scenario. First, my list does not contain only ctypes objects. Second, each numpy array has a different size and they are all independent. Third, I do not need write access to the numpy arrays, only to the 'wrapper' list organizing them.
Also, it should be noted that I am using multiprocessing rather than threading, as the workers make heavy use of numpy, which should keep the global interpreter lock held essentially all the time.
Questions:
- Is there a way to have a list of numpy arrays in shared memory?
- Is there a 'better' solution compared to the one described above?
Many thanks...
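For illustration only, a rough sketch of the queue-based design described in the question; the reader logic is a stand-in for the real IO workers, and all names are placeholders:

    import numpy as np
    from multiprocessing import Process, Queue

    SHAPE = (512, 512)

    def reader(queue, slot):
        # Stand-in for the real IO-bound loading/preprocessing work.
        while True:
            data = np.random.rand(*SHAPE)
            queue.put((slot, data))          # pickled and sent to the owner

    def run_cache_owner(queue, n_slots, n_updates=50):
        # The wrapper list lives only in this process, so no locking is needed.
        cache = [np.zeros(SHAPE) for _ in range(n_slots)]
        for _ in range(n_updates):
            slot, data = queue.get()
            cache[slot] = data               # replace one array in the list
            # ... the training code would read from `cache` here ...

    if __name__ == '__main__':
        q = Queue(maxsize=4)                 # bounds the queue's memory overhead
        workers = [Process(target=reader, args=(q, i)) for i in range(3)]
        for w in workers:
            w.daemon = True                  # readers die with the main process
            w.start()
        run_cache_owner(q, n_slots=3)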

faster numpy array copy; multi-threaded memcpy?

Suppose we have two large numpy arrays of the same data type and shape, with sizes on the order of gigabytes. What is the fastest way to copy all the values from one into the other?
When I do this using normal slice notation, e.g. A[:] = B, I see exactly one core on the computer doing the copy at maximum effort for several seconds, while the others are idle. When I instead launch multiple workers using multiprocessing and have each copy a distinct slice into the destination array, so that all the data is copied, using multiple workers is faster. This is true regardless of whether the destination array is a shared-memory array or one that becomes local to the worker. I can get a 5-10x speedup in some tests on a machine with many cores. As I add more workers, the speed eventually levels off and even slows down, so I think the copy becomes memory-bandwidth bound.
I'm not suggesting using multiprocessing for this problem; it was merely to demonstrate the possibility of better hardware utilization.
Does there exist a python interface to some multi-threaded C/C++ memcpy tool?
Update (03 May 2017)
When it is possible, using multiple python processes to move data can give major speedup. I have a scenario in which I already have several small shared memory buffers getting written to by worker processes. Whenever one fills up, the master process collects this data and copies it into a master buffer. But it is much faster to have the master only select the location in the master buffer, and assign a recording worker to actually do the copying (from a large set of recording processes standing by). On my particular computer, several GB can be moved in a small fraction of a second by concurrent workers, as opposed to several seconds by a single process.
Still, this sort of setup is not always (or even usually?) possible, so it would be great to have a single python process able to drop into a multi-threaded memcpy routine...
If you are certain that the types and memory layout of both arrays are identical, this might give you a speedup: memoryview(A)[:] = memoryview(B). This should use memcpy directly and skip any checks for numpy broadcasting or type conversion rules.
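A quick sketch of that suggestion, assuming 1-D C-contiguous arrays with identical dtype; whether it actually beats plain assignment is worth timing on your own machine:

    import numpy as np

    n = 10**8                                # ~800 MB of float64
    A = np.empty(n, dtype=np.float64)
    B = np.random.rand(n)

    A[:] = B                                 # ordinary numpy assignment
    memoryview(A)[:] = memoryview(B)         # raw buffer copy; skips numpy's
                                             # broadcasting/casting checks
    assert np.array_equal(A, B)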

python multiprocessing module, shared multidimensional array

I have code which, given two parameters (k, m), returns a 4-D numpy array. I need to calculate this array for all values of (k, m) with k, m < N and add them up. This is slow in serial, so I am trying to learn the multiprocessing module in Python to do it.
https://docs.python.org/2/library/multiprocessing.html
Essentially I want to use my 8 cores to compute these 4-D arrays in parallel and add them all up. Now the question is how to design this. Each array can be around 100 MB and N is around 20, so storing 20**2 * 100 MB in a queue is not possible. The solution would be to have a shared-memory result array into which each process keeps adding its results.
multiprocessing has two means of doing this: shared memory or a server process. Neither of them seems to support multidimensional arrays. Can anyone suggest a way to implement my program? Thanks in advance.
One approach would be to create memory-mapped arrays in the parent process and pass them to the children to fill. Additionally, you should probably have a multiprocessing.Event for every mapped array, so the child process can signal to the parent that an array is done.
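One way to realize that idea (passing file paths rather than the maps themselves, with compute_block standing in for the real per-(k, m) calculation): the parent creates one memory-mapped partial-sum array per worker, each child fills its own map and sets an Event, and the parent adds the partial sums together.

    import os
    import tempfile
    import numpy as np
    from multiprocessing import Process, Event

    SHAPE = (4, 4, 50, 50)                  # stand-in for the real 4-D shape

    def compute_block(k, m):
        # Placeholder for the real per-(k, m) calculation.
        return np.full(SHAPE, float(k * m))

    def fill(path, pairs, done):
        partial = np.memmap(path, dtype=np.float64, mode='r+', shape=SHAPE)
        for k, m in pairs:
            partial += compute_block(k, m)
        partial.flush()
        done.set()                           # tell the parent this map is ready

    if __name__ == '__main__':
        N, n_workers = 8, 4
        pairs = [(k, m) for k in range(N) for m in range(N)]
        chunks = [pairs[i::n_workers] for i in range(n_workers)]

        jobs = []
        for chunk in chunks:
            path = os.path.join(tempfile.mkdtemp(), 'partial.dat')
            np.memmap(path, dtype=np.float64, mode='w+', shape=SHAPE)  # create file
            done = Event()
            p = Process(target=fill, args=(path, chunk, done))
            p.start()
            jobs.append((p, path, done))

        total = np.zeros(SHAPE)
        for p, path, done in jobs:
            done.wait()                      # wait until this partial sum is filled
            total += np.memmap(path, dtype=np.float64, mode='r', shape=SHAPE)
            p.join()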

Python: Perform an operation on each pixel of a 2-d array simultaneously

I want to apply a 3x3 or larger image filter (gaussian or median) on a 2-d array.
Though there are several ways of doing that, such as scipy.ndimage.gaussian_filter or applying a loop, I want to know if there is a way to apply a 3x3 or larger filter to each pixel of an m x n array simultaneously, because it would save a lot of time by bypassing loops. Can functional programming be used for this purpose?
There is a function called scipy.ndimage.filters.convolve; please tell me whether it is able to perform simultaneous operations.
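For reference, these scipy.ndimage calls process the whole array without an explicit Python loop over pixels; the looping happens in compiled code rather than literally simultaneously:

    import numpy as np
    from scipy import ndimage

    img = np.random.rand(512, 512)

    blurred = ndimage.gaussian_filter(img, sigma=1.0)   # Gaussian smoothing
    medianed = ndimage.median_filter(img, size=3)       # 3x3 median filter

    # convolve with an explicit 3x3 kernel (here a simple mean/box filter):
    kernel = np.ones((3, 3)) / 9.0
    boxed = ndimage.convolve(img, kernel)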
You may want to learn about parallel processing in Python:
http://wiki.python.org/moin/ParallelProcessing
or the multiprocessing package in particular:
http://docs.python.org/library/multiprocessing.html
Check out using the Python Imaging Library (PIL) on multiprocessors.
Using multiprocessing with the PIL
and similar questions.
You could create four workers, divide your image into four, and assign each quadrant to a worker. You will likely lose time to the overhead, however. If, on the other hand, you have several images to process, then this approach may work (letting each worker open its own image).
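A sketch of that idea for a single image, splitting it into horizontal strips with a one-pixel halo so a 3x3 filter produces the same result at strip boundaries as the serial call; whether the overhead pays off depends on the image size:

    import numpy as np
    from multiprocessing import Pool
    from scipy.ndimage import median_filter

    def filter_strip(args):
        strip, top, bottom = args
        out = median_filter(strip, size=3)
        # Drop the halo rows that only provided context for the filter.
        return out[top:strip.shape[0] - bottom]

    if __name__ == '__main__':
        img = np.random.rand(2000, 2000)
        n_workers = 4
        bounds = np.linspace(0, img.shape[0], n_workers + 1).astype(int)

        tasks = []
        for lo, hi in zip(bounds[:-1], bounds[1:]):
            top = 1 if lo > 0 else 0
            bottom = 1 if hi < img.shape[0] else 0
            tasks.append((img[lo - top:hi + bottom], top, bottom))

        pool = Pool(n_workers)
        strips = pool.map(filter_strip, tasks)
        pool.close()
        pool.join()
        result = np.vstack(strips)           # matches median_filter(img, size=3)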
Even if Python did provide functionality to apply an operation to an NxM array without looping over it, the operation would still not be executed simultaneously in the background, since the number of instructions a CPU can handle per cycle is limited and thus no time could be saved. For your use case this might even be counterproductive, since the fields in your arrays probably have dependencies, and if you don't know in what order they are accessed, this will most likely end up in a mess.
Hugues provided some useful links about parallel processing in Python, but be careful when accessing the same data structure such as an array with multiple threads at the same time. If you don't synchronize the threads they might access the same part of the array at the same time and mess things up.
And be aware that the number of threads that can effectively run in parallel is limited by the number of processor cores.
