Make the random module thread-safe in Python - python

I have an application requiring the same results given the same random seed. But I find random.randint not threadsafe. I have tried mutex but this does not work. Here is my experiment code (long but simple):
import threading
import random
def child(n, a):
g_mutex = threading.Lock()
g_mutex.acquire()
random.seed(n)
for i in xrange(100):
a.append(random.randint(0, 1000))
g_mutex.release()
def main():
a = []
b = []
c1 = threading.Thread(target = child, args = (10, a))
c2 = threading.Thread(target = child, args = (20, b))
c1.start()
c2.start()
c1.join()
c2.join()
c = []
d = []
c1 = threading.Thread(target = child, args = (10, c))
c2 = threading.Thread(target = child, args = (20, d))
c1.start()
c1.join()
c2.start()
c2.join()
print a == c, b == d
if __name__ == "__main__":
main()
I want to code to print true, true, but it stands a chance to give false, false. How can I make threadsafe randint?

You can create separate instances of random.Random for each thread
>>> import random
>>> local_random = random.Random()
>>> local_random.seed(1234)
>>> local_random.randint(1,1000)
967

From the documentation for random:
The functions supplied by this module are actually bound methods of a hidden instance of the random.Random class. You can instantiate your own instances of Random to get generators that don’t share state. This is especially useful for multi-threaded programs, creating a different instance of Random for each thread, and using the jumpahead() method to make it likely that the generated sequences seen by each thread don’t overlap.
The documentation doesn't say exactly what this class is, but it does show class random.SystemRandom([seed]), and random.Random([seed]) seems to be the same.
Example:
local_random = random.Random(n)
for i in xrange(100):
a.append(local_random.randint(0, 1000))

Others have pointed out the proper way to use random in a thread safe way. But I feel it's important to point out that the code you wrote would not be thread-safe for anything.
def child(n, a):
g_mutex = threading.Lock()
g_mutex.acquire()
random.seed(n)
for i in xrange(100):
a.append(random.randint(0, 1000))
g_mutex.release()
Each thread is running this method independently. That means that each thread is making their own lock instance, acquiring it, doing work, and then releasing it. Unless every thread is attempting to acquire the same lock, the there is nothing to ensure non-parallel execution. You need to assign a single value to g_mutex outside of the context of your run method.
Edit:
I just want to add that simply switching to a global lock is not guaranteed to do exactly what you said. The lock will ensure that only one thread is generating numbers at a time, but it does not guarantee which thread will start first.

Related

Python 3: RuntimeError: Synchronized objects should only be shared between processes through inheritance [duplicate]

I have a very large (read only) array of data that I want to be processed by multiple processes in parallel.
I like the Pool.map function and would like to use it to calculate functions on that data in parallel.
I saw that one can use the Value or Array class to use shared memory data between processes. But when I try to use this I get a RuntimeError: 'SynchronizedString objects should only be shared between processes through inheritance when using the Pool.map function:
Here is a simplified example of what I am trying to do:
from sys import stdin
from multiprocessing import Pool, Array
def count_it( arr, key ):
count = 0
for c in arr:
if c == key:
count += 1
return count
if __name__ == '__main__':
testData = "abcabcs bsdfsdf gdfg dffdgdfg sdfsdfsd sdfdsfsdf"
# want to share it using shared memory
toShare = Array('c', testData)
# this works
print count_it( toShare, "a" )
pool = Pool()
# RuntimeError here
print pool.map( count_it, [(toShare,key) for key in ["a", "b", "s", "d"]] )
Can anyone tell me what I am doing wrong here?
So what I would like to do is pass info about a newly created shared memory allocated array to the processes after they have been created in the process pool.
Trying again as I just saw the bounty ;)
Basically I think the error message means what it said - multiprocessing shared memory Arrays can't be passed as arguments (by pickling). It doesn't make sense to serialise the data - the point is the data is shared memory. So you have to make the shared array global. I think it's neater to put it as the attribute of a module, as in my first answer, but just leaving it as a global variable in your example also works well. Taking on board your point of not wanting to set the data before the fork, here is a modified example. If you wanted to have more than one possible shared array (and that's why you wanted to pass toShare as an argument) you could similarly make a global list of shared arrays, and just pass the index to count_it (which would become for c in toShare[i]:).
from sys import stdin
from multiprocessing import Pool, Array, Process
def count_it( key ):
count = 0
for c in toShare:
if c == key:
count += 1
return count
if __name__ == '__main__':
# allocate shared array - want lock=False in this case since we
# aren't writing to it and want to allow multiple processes to access
# at the same time - I think with lock=True there would be little or
# no speedup
maxLength = 50
toShare = Array('c', maxLength, lock=False)
# fork
pool = Pool()
# can set data after fork
testData = "abcabcs bsdfsdf gdfg dffdgdfg sdfsdfsd sdfdsfsdf"
if len(testData) > maxLength:
raise ValueError, "Shared array too small to hold data"
toShare[:len(testData)] = testData
print pool.map( count_it, ["a", "b", "s", "d"] )
[EDIT: The above doesn't work on windows because of not using fork. However, the below does work on Windows, still using Pool, so I think this is the closest to what you want:
from sys import stdin
from multiprocessing import Pool, Array, Process
import mymodule
def count_it( key ):
count = 0
for c in mymodule.toShare:
if c == key:
count += 1
return count
def initProcess(share):
mymodule.toShare = share
if __name__ == '__main__':
# allocate shared array - want lock=False in this case since we
# aren't writing to it and want to allow multiple processes to access
# at the same time - I think with lock=True there would be little or
# no speedup
maxLength = 50
toShare = Array('c', maxLength, lock=False)
# fork
pool = Pool(initializer=initProcess,initargs=(toShare,))
# can set data after fork
testData = "abcabcs bsdfsdf gdfg dffdgdfg sdfsdfsd sdfdsfsdf"
if len(testData) > maxLength:
raise ValueError, "Shared array too small to hold data"
toShare[:len(testData)] = testData
print pool.map( count_it, ["a", "b", "s", "d"] )
Not sure why map won't Pickle the array but Process and Pool will - I think perhaps it has be transferred at the point of the subprocess initialization on windows. Note that the data is still set after the fork though.
If you're seeing:
RuntimeError: Synchronized objects should only be shared between processes through inheritance
Consider using multiprocessing.Manager as it doesn't have this limitation. The manager works considering it presumably runs in a separate process altogether.
import ctypes
import multiprocessing
# Put this in a method or function, otherwise it will run on import from each module:
manager = multiprocessing.Manager()
counter = manager.Value(ctypes.c_ulonglong, 0)
counter_lock = manager.Lock() # pylint: disable=no-member
with counter_lock:
counter.value = count = counter.value + 1
If the data is read only just make it a variable in a module before the fork from Pool. Then all the child processes should be able to access it, and it won't be copied provided you don't write to it.
import myglobals # anything (empty .py file)
myglobals.data = []
def count_it( key ):
count = 0
for c in myglobals.data:
if c == key:
count += 1
return count
if __name__ == '__main__':
myglobals.data = "abcabcs bsdfsdf gdfg dffdgdfg sdfsdfsd sdfdsfsdf"
pool = Pool()
print pool.map( count_it, ["a", "b", "s", "d"] )
If you do want to try to use Array though you could try with the lock=False keyword argument (it is true by default).
The problem I see is that Pool doesn't support pickling shared data through its argument list. That's what the error message means by "objects should only be shared between processes through inheritance". The shared data needs to be inherited, i.e., global if you want to share it using the Pool class.
If you need to pass them explicitly, you may have to use multiprocessing.Process. Here is your reworked example:
from multiprocessing import Process, Array, Queue
def count_it( q, arr, key ):
count = 0
for c in arr:
if c == key:
count += 1
q.put((key, count))
if __name__ == '__main__':
testData = "abcabcs bsdfsdf gdfg dffdgdfg sdfsdfsd sdfdsfsdf"
# want to share it using shared memory
toShare = Array('c', testData)
q = Queue()
keys = ['a', 'b', 's', 'd']
workers = [Process(target=count_it, args = (q, toShare, key))
for key in keys]
for p in workers:
p.start()
for p in workers:
p.join()
while not q.empty():
print q.get(),
Output: ('s', 9) ('a', 2) ('b', 3)
('d', 12)
The ordering of elements of the queue may vary.
To make this more generic and similar to Pool, you could create a fixed N number of Processes, split the list of keys into N pieces, and then use a wrapper function as the Process target, which will call count_it for each key in the list it is passed, like:
def wrapper( q, arr, keys ):
for k in keys:
count_it(q, arr, k)

Sharing a counter with multiprocessing.Pool

I'd like to use multiprocessing.Value + multiprocessing.Lock to share a counter between separate processes. For example:
import itertools as it
import multiprocessing
def func(x, val, lock):
for i in range(x):
i ** 2
with lock:
val.value += 1
print('counter incremented to:', val.value)
if __name__ == '__main__':
v = multiprocessing.Value('i', 0)
lock = multiprocessing.Lock()
with multiprocessing.Pool() as pool:
pool.starmap(func, ((i, v, lock) for i in range(25)))
print(counter.value())
This will throw the following exception:
RuntimeError: Synchronized objects should only be shared between
processes through inheritance
What I am most confused by is that a related (albeit not completely analogous) pattern works with multiprocessing.Process():
if __name__ == '__main__':
v = multiprocessing.Value('i', 0)
lock = multiprocessing.Lock()
procs = [multiprocessing.Process(target=func, args=(i, v, lock))
for i in range(25)]
for p in procs: p.start()
for p in procs: p.join()
Now, I recognize that these are two different markedly things:
the first example uses a number of worker processes equal to cpu_count(), and splits an iterable range(25) between them
the second example creates 25 worker processes and tasks each with one input
That said: how can I share an instance with pool.starmap() (or pool.map()) in this manner?
I've seen similar questions here, here, and here, but those approaches doesn't seem to be suited to .map()/.starmap(), regarldess of whether Value uses ctypes.c_int.
I realize that this approach technically works:
def func(x):
for i in range(x):
i ** 2
with lock:
v.value += 1
print('counter incremented to:', v.value)
v = None
lock = None
def set_global_counter_and_lock():
"""Egh ... """
global v, lock
if not any((v, lock)):
v = multiprocessing.Value('i', 0)
lock = multiprocessing.Lock()
if __name__ == '__main__':
# Each worker process will call `initializer()` when it starts.
with multiprocessing.Pool(initializer=set_global_counter_and_lock) as pool:
pool.map(func, range(25))
Is this really the best-practices way of going about this?
The RuntimeError you get when using Pool is because arguments for pool-methods are pickled before being send over a (pool-internal) queue to the worker processes.
Which pool-method you are trying to use is irrelevant here. This doesn't happen when you just use Process because there is no queue involved. You can reproduce the error just with pickle.dumps(multiprocessing.Value('i', 0)).
Your last code snippet doesn't work how you think it works. You are not sharing a Value, you are recreating independent counters for every child process.
In case you were on Unix and used the default start-method "fork", you would be done with just not passing the shared objects as arguments into the pool-methods.
Your child-processes would inherit the globals through forking. With process-start-methods "spawn" (default Windows and macOS with Python 3.8+) or "forkserver", you'll have to use the initializer during Pool
instantiation, to let the child-processes inherit the shared objects.
Note, you don't need an extra multiprocessing.Lock here, because multiprocessing.Value comes by default with an internal one you can use.
import os
from multiprocessing import Pool, Value #, set_start_method
def func(x):
for i in range(x):
assert i == i
with cnt.get_lock():
cnt.value += 1
print(f'{os.getpid()} | counter incremented to: {cnt.value}\n')
def init_globals(counter):
global cnt
cnt = counter
if __name__ == '__main__':
# set_start_method('spawn')
cnt = Value('i', 0)
iterable = [10000 for _ in range(10)]
with Pool(initializer=init_globals, initargs=(cnt,)) as pool:
pool.map(func, iterable)
assert cnt.value == 100000
Probably worth noting as well is that you don't need the counter to be shared in all cases.
If you just need to keep track of how often something has happened in total, an option would be to keep separate worker-local counters during computation which you sum up at the end.
This could result in a significant performance improvement for frequent counter updates for which you don't need synchronization during the parallel computation itself.

Eliminating overhead in multiprocessing with pool

I am currently in a situation where I have parallelized code called repeatedly and try to reduce the overhead associated with the multiprocessing. So, consider the following example, which deliberately contains no "expensive" computations:
import multiprocessing as mp
def f(x):
# toy function
return x*x
if __name__ == '__main__':
for x in range(500):
pool = mp.Pool(processes=2)
print(pool.map(f, range(x, x + 50)))
pool.close()
pool.join() # necessary?
This code takes 53 seconds compared to 0.04 seconds for the sequential approach.
First question: do I really need to call pool.join() in this case when only pool.map() is ever used? I cannot find any negative effects from omitting it and the runtime would drop to 4.8 seconds. (I understand that omitting pool.close() is not possible, as we would be leaking threads then.)
Now, while this would be a nice improvement, as a first answer I would probably get "well, don't create the pool in the loop in the first place". Ok, no problem, but the parallelized code actually lives in an instance method, so I would use:
class MyObject:
def __init__(self):
self.pool = mp.Pool(processes=2)
def function(self, x):
print(self.pool.map(f, range(x, x + 50)))
if __name__ == '__main__':
my_object = MyObject()
for x in range(500):
my_object.function(x)
This would be my favorite solution as it runs in excellent 0.4 seconds.
Second question: should I call pool.close()/pool.join() somewhere explicitly (e.g. in the destructor of MyObject) or is the current code sufficient? (If it matters: it is ok to assume there are only a few long-lived instances of MyObject in my project.)
Of course it takes a long time: you keep allocating a new pool and destroying it for every x.
It will run much faster if instead you do:
if __name__ == '__main__':
pool = mp.Pool(processes=2) # allocate the pool only once
for x in range(500):
print(pool.map(f, range(x, x + 50)))
pool.close() # close it only after all the requests are submitted
pool.join() # wait for the last worker to finish
Try that and you'll see it now works much faster.
Here are links to the docs for join and close:
Once close is called you can't submit more tasks to the pool, and join waits till the last worker finished its job. They should be called in that order (first close then join).
Well, actually you could pass already allocated pool as argument to your object:
class MyObject:
def __init__(self, pool):
self.pool = pool
def function(self, x):
print(self.pool.map(f, range(x, x + 50)))
if __name__ == '__main__':
with mp.Pool(2) as pool:
my_object = MyObject(pool)
my_second_object = MyObject(pool)
for x in range(500):
my_object.function(x)
my_second_object.function(x)
pool.close()
I can not find a reason why it might be necessary to use different pools in different objects

Dictionary multiprocessing

I want to parallelize the processing of a dictionary using the multiprocessing library.
My problem can be reduced to this code:
from multiprocessing import Manager,Pool
def modify_dictionary(dictionary):
if((3,3) not in dictionary):
dictionary[(3,3)]=0.
for i in range(100):
dictionary[(3,3)] = dictionary[(3,3)]+1
return 0
if __name__ == "__main__":
manager = Manager()
dictionary = manager.dict(lock=True)
jobargs = [(dictionary) for i in range(5)]
p = Pool(5)
t = p.map(modify_dictionary,jobargs)
p.close()
p.join()
print dictionary[(3,3)]
I create a pool of 5 workers, and each worker should increment dictionary[(3,3)] 100 times. So, if the locking process works correctly, I expect dictionary[(3,3)] to be 500 at the end of the script.
However; something in my code must be wrong, because this is not what I get: the locking process does not seem to be "activated" and dictionary[(3,3)] always have a valuer <500 at the end of the script.
Could you help me?
The problem is with this line:
dictionary[(3,3)] = dictionary[(3,3)]+1
Three things happen on that line:
Read the value of the dictionary key (3,3)
Increment the value by 1
Write the value back again
But the increment part is happening outside of any locking.
The whole sequence must be atomic, and must be synchronized across all processes. Otherwise the processes will interleave giving you a lower than expected total.
Holding a lock whist incrementing the value ensures that you get the total of 500 you expect:
from multiprocessing import Manager,Pool,Lock
lock = Lock()
def modify_array(dictionary):
if((3,3) not in dictionary):
dictionary[(3,3)]=0.
for i in range(100):
with lock:
dictionary[(3,3)] = dictionary[(3,3)]+1
return 0
if __name__ == "__main__":
manager = Manager()
dictionary = manager.dict(lock=True)
jobargs = [(dictionary) for i in range(5)]
p = Pool(5)
t = p.map(modify_array,jobargs)
p.close()
p.join()
print dictionary[(3,3)]
I ve managed many times to find here the correct solution to a programming difficulty I had. So I would like to contribute a little bit. Above code still has the problem of not updating right the dictionary. To have the right result you have to pass lock and correct jobargs to f. In above code you make a new dictionary in every proccess. The code I found to work fine:
from multiprocessing import Process, Manager, Pool, Lock
from functools import partial
def f(dictionary, l, k):
with l:
for i in range(100):
dictionary[3] += 1
if __name__ == "__main__":
manager = Manager()
dictionary = manager.dict()
lock = manager.Lock()
dictionary[3] = 0
jobargs = list(range(5))
pool = Pool()
func = partial(f, dictionary, lock)
t = pool.map(func, jobargs)
pool.close()
pool.join()
print(dictionary)
In the OP's code, it is locking the entire iteration. In general, you should only apply locks for the shortest time, as long as it is effective. The following code is much more efficient. You acquire the lock only to make the code atomic
def f(dictionary, l, k):
for i in range(100):
with l:
dictionary[3] += 1
Note that dictionary[3] += 1 is not atomic, so it must be locked.

Python Multiprocessing with a single function

I have a simulation that is currently running, but the ETA is about 40 hours -- I'm trying to speed it up with multi-processing.
It essentially iterates over 3 values of one variable (L), and over 99 values of of a second variable (a). Using these values, it essentially runs a complex simulation and returns 9 different standard deviations. Thus (even though I haven't coded it that way yet) it is essentially a function that takes two values as inputs (L,a) and returns 9 values.
Here is the essence of the code I have:
STD_1 = []
STD_2 = []
# etc.
for L in range(0,6,2):
for a in range(1,100):
### simulation code ###
STD_1.append(value_1)
STD_2.append(value_2)
# etc.
Here is what I can modify it to:
master_list = []
def simulate(a,L):
### simulation code ###
return (a,L,STD_1, STD_2 etc.)
for L in range(0,6,2):
for a in range(1,100):
master_list.append(simulate(a,L))
Since each of the simulations are independent, it seems like an ideal place to implement some sort of multi-threading/processing.
How exactly would I go about coding this?
EDIT: Also, will everything be returned to the master list in order, or could it possibly be out of order if multiple processes are working?
EDIT 2: This is my code -- but it doesn't run correctly. It asks if I want to kill the program right after I run it.
import multiprocessing
data = []
for L in range(0,6,2):
for a in range(1,100):
data.append((L,a))
print (data)
def simulation(arg):
# unpack the tuple
a = arg[1]
L = arg[0]
STD_1 = a**2
STD_2 = a**3
STD_3 = a**4
# simulation code #
return((STD_1,STD_2,STD_3))
print("1")
p = multiprocessing.Pool()
print ("2")
results = p.map(simulation, data)
EDIT 3: Also what are the limitations of multiprocessing. I've heard that it doesn't work on OS X. Is this correct?
Wrap the data for each iteration up into a tuple.
Make a list data of those tuples
Write a function f to process one tuple and return one result
Create p = multiprocessing.Pool() object.
Call results = p.map(f, data)
This will run as many instances of f as your machine has cores in separate processes.
Edit1: Example:
from multiprocessing import Pool
data = [('bla', 1, 3, 7), ('spam', 12, 4, 8), ('eggs', 17, 1, 3)]
def f(t):
name, a, b, c = t
return (name, a + b + c)
p = Pool()
results = p.map(f, data)
print results
Edit2:
Multiprocessing should work fine on UNIX-like platforms such as OSX. Only platforms that lack os.fork (mainly MS Windows) need special attention. But even there it still works. See the multiprocessing documentation.
Here is one way to run it in parallel threads:
import threading
L_a = []
for L in range(0,6,2):
for a in range(1,100):
L_a.append((L,a))
# Add the rest of your objects here
def RunParallelThreads():
# Create an index list
indexes = range(0,len(L_a))
# Create the output list
output = [None for i in indexes]
# Create all the parallel threads
threads = [threading.Thread(target=simulate,args=(output,i)) for i in indexes]
# Start all the parallel threads
for thread in threads: thread.start()
# Wait for all the parallel threads to complete
for thread in threads: thread.join()
# Return the output list
return output
def simulate(list,index):
(L,a) = L_a[index]
list[index] = (a,L) # Add the rest of your objects here
master_list = RunParallelThreads()
Use Pool().imap_unordered if ordering is not important. It will return results in a non-blocking fashion.

Categories