Python parallel programming issue

I need to do some intensive numerical computations, and fortunately Python offers very simple ways to implement parallelisation. However, the results I got were totally weird, and after some trial and error I stumbled upon the problem.
The following code simply calculates the mean of a random sample of numbers but illustrates my problem:
import multiprocessing
import numpy as np
from numpy.random import random

# Define function to generate random numbers
def get_random(seed):
    dummy = random(1000) * seed
    return np.mean(dummy)

# Input data
input_data = [100, 100, 100, 100]

pool = multiprocessing.Pool(processes=4)
result = pool.map(get_random, input_data)
print(result)

for i in input_data:
    print(get_random(i))
Now the output looks like this:
[51.003368466729405, 51.003368466729405, 51.003368466729405, 51.003368466729405]
for the parallelised version, which is always the same,
and like this for the normal, non-parallelised loop:
50.8581749381
49.2887091049
50.83585841
49.3067281055
As you can see, the parallelised version just returns the same result four times, even though it should have calculated different means, just as the loop does. Sometimes I get only three equal numbers, with one differing from the other three.
I suspect that the same memory is handed to all the sub-processes...
I would love some hints on what is going on here and what a fix would look like. :)
thanks

When you use multiprocessing, you're dealing with distinct processes. Distinct processes mean distinct Python interpreters, and distinct interpreters mean distinct random states, each a copy of the parent's. If you aren't seeding the random number generator uniquely in each process, every worker starts from the same random state.

The answer was to put a new random seed into each process. Changing the function to
def get_random(seed):
    np.random.seed()
    dummy = random(1000) * seed
    return np.mean(dummy)
gives the wanted results. 😊
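Note that np.random.seed() with no argument seeds each worker from the OS entropy source, so the runs are no longer reproducible. If reproducibility matters, a variation on the fix (a sketch, not from the original answer) is to pass an explicit, distinct seed to each task:

import multiprocessing
import numpy as np
from numpy.random import random

def get_random(args):
    seed, scale = args              # distinct seed per task, plus the original multiplier
    np.random.seed(seed)
    dummy = random(1000) * scale
    return np.mean(dummy)

if __name__ == "__main__":
    input_data = [100, 100, 100, 100]
    tasks = list(enumerate(input_data))    # (0, 100), (1, 100), ...
    pool = multiprocessing.Pool(processes=4)
    print(pool.map(get_random, tasks))     # four different means, identical on every run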

Related

Mismatch between parallelized and linear nested for loops

I want to parallelize a piece of code that resembles the following:
Ngal=10
sampind=[7,16,22,31,45]
samples=0.3*np.ones((60,Ngal))
zt=[2.15,7.16,1.23,3.05,4.1,2.09,1.324,3.112,0.032,0.2356]
toavg=[]
for j in range(Ngal):
    gal=[]
    for m in sampind:
        gal.append(samples[m][j]-zt[j])
    toavg.append(np.mean(gal))
accuracy=np.mean(toavg)
so I followed the advice here and I rewrote it as follows:
toavg=[]
gal=[]
p = mp.Pool()

def deltaz(params):
    j=params[0] # index of the galaxy
    m=params[1] # indices for which we have sampled redshifts
    gal.append(samples[m][j]-zt[j])
    return np.mean(gal)

j=(np.linspace(0,Ngal-1,Ngal).astype(int))
m=sampind
grid=[j,m]
input=itertools.product(*grid)
results = p.map(deltaz,input)
accuracy=np.mean(results)
p.close()
p.join()
but the results are not the same. In fact, sometimes they are and sometimes they're not; it doesn't seem very deterministic. Is my approach correct? If not, what should I fix? The modules you will need to reproduce the above examples are:
import numpy as np
import multiprocess as mp
import itertools
Thank you!
The first issue I see is that you are creating a global variable gal which is accessed by the function deltaz. Globals are, however, not shared between the pool processes but instantiated separately for each process. You would have to use shared memory if you want the processes to share this structure. This is probably why you see non-deterministic behaviour.
The next issue is that the two variants are not actually performing the same task. In the first one you take an average of each set of averages (gal). The parallel one takes an average of whichever elements happen to have ended up in that list so far. This is non-deterministic because items are assigned to processes as they become available, and that order is not necessarily predictable.
I would suggest parallelizing over the outer loop (one task per galaxy) and keeping the inner loop inside each task. To do this, you need zt and samples to both be in shared memory because they are accessed by all of the processes. This can get dangerous if you are modifying data, but since you appear to only be reading, it should be fine.
import numpy as np
import multiprocessing as mp
import itertools
import ctypes

# Non-parallel code
Ngal = 10
sampind = [7, 16, 22, 31, 45]
samples = 0.3 * np.ones((60, Ngal))
zt = [2.15, 7.16, 1.23, 3.05, 4.1, 2.09, 1.324, 3.112, 0.032, 0.2356]

# Non-parallel version
toavg = []
for j in range(Ngal):
    gal = []
    for m in sampind:
        gal.append(samples[m][j] - zt[j])
    toavg.append(np.mean(gal))
accuracy = np.mean(toavg)
print(toavg)

# Parallel function: one task per galaxy j, reading from the shared arrays
def deltaz(j):
    sampind = [7, 16, 22, 31, 45]
    gal = []
    for m in sampind:
        gal.append(sampArr[m][j] - ztArr[j])
    return np.mean(gal)

# Shared array for zt
zt_base = mp.Array(ctypes.c_double, int(len(zt)), lock=False)
ztArr = np.ctypeslib.as_array(zt_base)

# Shared array for samples
sample_base = mp.Array(ctypes.c_double, int(np.prod(samples.shape)), lock=False)
sampArr = np.ctypeslib.as_array(sample_base)
sampArr = sampArr.reshape(samples.shape)

# Copy the data into the shared arrays
sampArr[:, :] = samples[:, :]
ztArr[:] = zt[:]

with mp.Pool() as p:
    result = p.map(deltaz, np.linspace(0, Ngal - 1, Ngal).astype(int))
print(result)
Here is an example that produces the same results. You can add more complexity as you see fit, but I would read about multiprocessing in general and about memory types/scopes to get an idea of what will and won't work. You have to take more care once you get into the multiprocessing world. Let me know if this doesn't help and I will try to update it so that it does.
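As a quick sanity check (my addition, not part of the original answer), the serial and parallel results computed above can be compared directly; pool.map preserves input order, so the two lists line up element by element:

# toavg comes from the serial loop above, result from p.map;
# the two should agree to floating-point precision
assert np.allclose(toavg, result)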

Refresh python process multiple times within a test in pytest

I have an interesting use case, where I have values that are inserted into an array in a random order. I then generate the hash of this array (using hashlib). Now, I want to make sure that this hash is always the same between different runs, so I am sorting this array before I hash it, and that solves the problem perfectly. (Note: this is an abstraction of the problem. It's not exactly like that, so replacing arrays by sets, or similar solutions are not what I'm looking for here).
Now, here is where my actual problem is: To prevent regressions, I want to add a test that reruns the process multiple times and ensures that the hash is the same between all those runs. The problem is, there is no way to rerun that process multiple times within the unit test. If I rerun it, the random order will always be the same (maybe because a random seed is being reused or something), negating the value of the test in the first place.
Thus, is there a way to run the test multiple times with a totally fresh Python process, in order to compare the outputs of the runs of the test, and make sure they are equal?
We were finally able to solve the problem with this:
import multiprocessing
from multiprocessing import Queue

def _generate_hash(queue: Queue):
    # execute the hash generation process and put the result on the queue
    ...

def test_hash_is_consistent_across_multiple_runs():
    multiprocessing.set_start_method("spawn")
    q = multiprocessing.Queue()
    for i in range(5):
        p = multiprocessing.Process(target=_generate_hash, args=(q,))
        p.start()
        p.join()
    previous_hash = None
    for i in range(5):
        current_hash = q.get()
        if previous_hash:
            assert current_hash == previous_hash
        previous_hash = current_hash
The trick is using spawn, which creates a fresh Python interpreter for each child process.
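For illustration only (the actual hash-generation step is elided above), a hypothetical _generate_hash could look something like this; the only requirement is that whatever it computes ends up on the queue so the test can compare the runs:

import hashlib
from multiprocessing import Queue

def _generate_hash(queue: Queue):
    # hypothetical stand-in for the real process: values collected in an
    # arbitrary order are sorted before hashing, so the digest is stable
    values = ["b", "a", "c"]
    digest = hashlib.sha256("".join(sorted(values)).encode()).hexdigest()
    queue.put(digest)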

How to share numpy random state of a parent process with child processes?

I set numpy random seed at the beginning of my program. During the program execution I run a function multiple times using multiprocessing.Process. The function uses numpy random functions to draw random numbers. The problem is that Process gets a copy of the current environment. Therefore, each process is running independently and they all start with the same random seed as the parent environment.
So my question is how can I share the random state of numpy in the parent environment with the child process environment? Just note that I want to use Process for my work and need to use a separate class and do import numpy in that class separately. I tried using multiprocessing.Manager to share the random state but it seems that things do not work as expected and I always get the same results. Also, it does not matter if I move the for loop inside drawNumpySamples or leave it in main.py; I still cannot get different numbers and the random state is always the same. Here's a simplified version of my code:
# randomClass.py
import numpy as np

class myClass:
    def __init__(self, randomSt):
        print('setup the object')
        np.random.set_state(randomSt)

    def drawNumpySamples(self, idx=None):
        return np.random.uniform()
And in the main file:
# main.py
import numpy as np
from multiprocessing import Process, Manager
from randomClass import myClass

np.random.seed(1) # set random seed

mng = Manager()
randomState = mng.list(np.random.get_state())

myC = myClass(randomSt=randomState)
for i in range(10):
    myC.drawNumpySamples() # this will always return the same results
Note: I use Python 3.5. I also posted an issue on Numpy's GitHub page. Just sending the issue link here for future reference.
Even if you manage to get this working, I don’t think it will do what you want. As soon as you have multiple processes pulling from the same random state in parallel, it’s no longer deterministic in which order they each access that state, meaning your runs still won’t actually be repeatable. There are probably ways around that, but it seems like a nontrivial problem.
Meanwhile, there is a solution that should solve both the problem you want and the nondeterminism problem:
Before spawning a child process, ask the RNG for a random number, and pass it to the child. The child can then seed with that number. Each child will then have a different random sequence from other children, but the same random sequence that the same child got if you rerun the entire app with a fixed seed.
If your main process does any other RNG work that could depend non-deterministically on the execution of the children, you'll need to pre-generate the seeds for all of your child processes, in order, before pulling any other random numbers.
As senderle pointed out in a comment: If you don't need multiple distinct runs, but just one fixed run, you don't even really need to pull a seed from your seeded RNG; just use a counter starting at 1 and increment it for each new process, and use that as a seed. I don't know if that's acceptable, but if it is, it's hard to get simpler than that.
As Amir pointed out in a comment: a better way is to draw a random integer every time you spawn a new process and pass that random integer to the new process to set the numpy's random seed with that integer. This integer can indeed come from np.random.randint().
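A minimal sketch of that advice (my own illustration, using Process as in the question): the parent pre-generates one seed per child, and each child seeds NumPy with the value it receives.

import numpy as np
from multiprocessing import Process, Queue

def worker(seed, out):
    np.random.seed(seed)                   # each child gets its own pre-generated seed
    out.put(np.random.uniform(size=3))

if __name__ == "__main__":
    np.random.seed(1)                      # fixed master seed => reproducible runs
    n_children = 4
    # draw all child seeds up front, before any other use of the parent RNG
    seeds = np.random.randint(0, 2**32 - 1, size=n_children)
    out = Queue()
    procs = [Process(target=worker, args=(int(s), out)) for s in seeds]
    for p in procs:
        p.start()
    results = [out.get() for _ in procs]   # one result per child
    for p in procs:
        p.join()
    print(results)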
You need to update the state of the Manager each time you get a random number:
import numpy as np
from multiprocessing import Manager, Pool, Lock

np.random.seed(1)

lock = Lock()
mng = Manager()
state = mng.list(np.random.get_state())

def get_random(_):
    with lock:
        np.random.set_state(tuple(state))
        result = np.random.uniform()
        state[:] = np.random.get_state()
    return result

result1 = Pool(10).map(get_random, range(10))

# Compare with non-parallel version
np.random.seed(1)
result2 = [np.random.uniform() for _ in range(10)]

# result of Pool.map may be in a different order
assert sorted(result1) == sorted(result2)
Fortunately, according to the documentation, you can access the complete state of the numpy random number generator using get_state and set it again using set_state. The generator itself uses the Mersenne Twister algorithm (see the RandomState part of the documentation).
This means you can do anything you want, though whether it will be good and efficient is a different question entirely. As abarnert points out, no matter how you share the parent's state—this could use Alex Hall's method, which looks correct—your sequencing within each child will depend on the order in which each child draws random numbers from the MT state machine.
It would perhaps be better to build a large pool of pseudo-random numbers for each child, saving the start state of the entire generator once at the start. Then each child can draw a PRNG value until its particular pool runs out, after which you have the child coordinate with the parent for the next pool. The parent enumerates which children got which "pool'th" number. The code would look something like this (note that it would make sense to turn this into an infinite generator with a next method):
class PrngPool(object):
    def __init__(self, child_id, shared_state):
        self._child_id = child_id
        self._shared_state = shared_state
        self._numbers = []

    def next_number(self):
        if not self._numbers:
            self._refill()
        return self._numbers.pop(0)  # XXX inefficient

    def _refill(self):
        # ... something like Alex Hall's lock/gen/unlock,
        # but fill up self._numbers with the next 1000 (or
        # however many) numbers after adding our ID and
        # the index "n" of which n-through-n+999 numbers
        # we took here. Any other child also doing a
        # _refill will wait for the lock and get an updated
        # index n -- e.g., if we got numbers 3000 to 3999,
        # the next child will get numbers 4000 to 4999.
        pass
Edit to add: the way to think about this is as follows. The MT is not actually random; it is periodic with a very long period, and when you use any such RNG, your seed is simply a starting point within that period. To get repeatability you must use non-random numbers, such as a set from a book. There is a (virtual) book containing every number that comes out of the MT generator. We're going to write down which page(s) of this book we used for each group of computations, so that we can re-open the book to those pages later and re-do the same computations.
You can use np.random.SeedSequence. See https://numpy.org/doc/stable/reference/random/parallel.html:
from numpy.random import SeedSequence, default_rng
ss = SeedSequence(12345)
# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]
This way, each of your threads/processes will get a statistically independent random generator.
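A sketch of how those independent streams can be handed to pool workers (my illustration; each task receives its own child SeedSequence and builds a Generator from it):

from numpy.random import SeedSequence, default_rng
from multiprocessing import Pool

def draw(child_seed):
    rng = default_rng(child_seed)   # independent stream for this task
    return rng.uniform(size=3)

if __name__ == "__main__":
    ss = SeedSequence(12345)
    child_seeds = ss.spawn(10)      # one SeedSequence per task
    with Pool(4) as p:
        results = p.map(draw, child_seeds)
    print(results)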

Parallelize Image Processing Using Numpy

I'm trying to speed up a section of my code using parallel processing in python, but I'm having trouble getting it to work right, or even find examples that are relevant to me.
The code produces a low-polygon version of an image using Delaunay triangulation, and the part that's slowing me down is finding the mean values of each triangle.
I've been able to get a good speed increase by vectorizing my code, but hope to get more using parallelization:
The code I'm having trouble with is an extremely simple for loop:
for tri in tris:
    lopo[tridex==tri,:] = np.mean(hipo[tridex==tri,:],axis=0)
The variables referenced are as follows.
tris - a unique python list of all the indices of the triangles
lopo - a Numpy array of the final low-polygon version of the image
hipo - a Numpy array of the original image
tridex - a Numpy array the same size as the image. Each element represents a pixel and stores the triangle that the pixel lies within
I can't seem to find a good example that uses multiple numpy arrays as input, with one of them shared.
I've tried multiprocessing (with the above snippet wrapped in a function called colorImage):
p = Process(target=colorImage, args=(hipo,lopo,tridex,ppTris))
p.start()
p.join()
But I get a broken pipe error immediately.
So the way that Python's multiprocessing works (for the most part) is that you have to designate the individual threads that you want to run. I made a brief introductory tutorial here: http://will-farmer.com/parallel-python.html
In your case, what I would recommend is splitting tris into a number of equally sized parts, one per "worker". You can split the list with numpy.split() (documentation here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.split.html).
Then, for each of those lists, we use the threading and queue modules to run 8 workers.
import queue
import threading
import numpy as np

# split the triangle indices into 8 equally sized lists, one per worker
# (np.split requires len(tris) to be divisible by 8)
tri_lists = np.split(tris, 8)

# Queues are threadsafe
return_values = queue.Queue()
threads = []

def color_image(q, tris, hipo, tridex):
    """ This is the function we're parallelizing """
    for tri in tris:
        # keep the triangle index with its mean so results can be
        # matched up afterwards, whatever order they arrive in
        q.put((tri, np.mean(hipo[tridex == tri, :], axis=0)))

# Now we run the jobs
for i in range(8):
    t = threading.Thread(target=color_image,
                         args=(return_values, tri_lists[i], hipo, tridex))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

# Now we have to clean up our results: drain the queue and
# write each triangle's mean value into lopo
while not return_values.empty():
    tri, mean_value = return_values.get()
    lopo[tridex == tri, :] = mean_value
This isn't the cleanest approach, and I'm not sure it works since I can't test it, but it's a decent way to do it. The parallelized part is now the np.mean() calls, while setting the values in lopo is not parallelized.
If you want to also parallelize the setting of the values, you'll have to have a shared variable, either using the Queue, or with a global variable.
See this post for a shared global variable: Python Global Variable with thread
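For completeness, a rough sketch of that idea (my addition, assuming the workers are threads): because threads share the process's memory, each worker can write its triangles' means straight into lopo, with a lock guarding the writes.

import threading
import numpy as np

lopo_lock = threading.Lock()

def color_image_inplace(tris_chunk, hipo, lopo, tridex):
    for tri in tris_chunk:
        mask = (tridex == tri)
        mean_value = np.mean(hipo[mask, :], axis=0)
        with lopo_lock:              # serialise writes to the shared array
            lopo[mask, :] = mean_value

# usage: start one thread per chunk from np.split(tris, 8) and join them,
# exactly as in the snippet above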

Python: Splitting up a sum with threads

I have a costly calculation to do for fitting some experimental data. The fitting function is a sum over eigenmodes, each of them containing a specific surface integral. As it is rather slow if you do it the classical way, I thought about threading it. I'm using Python, btw.
The function I want to calculate is something like
def fit_func(params, Mmin, Mmax):
    values = np.zeros(1000)
    for m in range(Mmin, Mmax):
        # Fancy calculation for each mode
        # some calculation with all modes, adding them up into 'values'
        ...
    return values
How can I split this up? I did something like
data1 = thread.start_new_thread(fit_func, (params,0,13))
data2 = thread.start_new_thread(fit_func, (params,13,25))
but then the sum of data1 and data2 is not the same as fit_func(params, 0, 25)...
Try out multiprocessing. This will effectively create separate Python processes using a thread-like interface. However, make sure that you profile your computation and confirm that it really is the bottleneck, not something else like IO. Starting processes is slow, so keep them around for a while if you are planning to reuse them.
You can also use numpy for those functions. They're written in C code, so they're stupid fast. Check them both out and see what fits best. I would go for the numpy solution myself...
Use a multiprocessing Pool:
import multiprocessing as mp
p = mp.Pool(10)
res = p.map(your_function, range(Mmin, Mmax))
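To recover the original fit_func total from that map, each task can compute a single mode's contribution and the partial results are summed afterwards (a sketch assuming the per-mode calculation can be factored out; single_mode below is a hypothetical stand-in for it):

import numpy as np
import multiprocessing as mp
from functools import partial

def single_mode(params, m):
    # hypothetical per-mode calculation: returns this mode's contribution
    # as an array of 1000 values (the real surface integral goes here)
    return np.zeros(1000)

def fit_func_parallel(params, Mmin, Mmax):
    with mp.Pool(10) as p:
        per_mode = p.map(partial(single_mode, params), range(Mmin, Mmax))
    # summing the per-mode arrays reproduces the serial fit_func result
    return np.sum(per_mode, axis=0)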
