I am trying to use multiprocessing for the first time and having some fairly basic issues. I have a toy example below, where two processes are adding data to a list:
from multiprocessing import Process

def add_process(all_nums_class, numbers_to_add):
    for number in numbers_to_add:
        all_nums_class.all_nums_list.append(number)

class AllNumsClass:
    def __init__(self):
        self.all_nums_list = []

all_nums_class = AllNumsClass()

p1 = Process(target=add_process, args=(all_nums_class, [1,3,5]))
p1.start()
p2 = Process(target=add_process, args=(all_nums_class, [2,4,6]))
p2.start()

all_nums_class.all_nums_list
I'd like to have the all_nums_class shared between these processes so that they can both add to its all_nums_list - so the result should be
[1,2,3,4,5,6]
instead of what I'm currently getting which is just good old
[]
Could anybody please advise? I have played around with namespace a bit but I haven't yet made it work here.
I feel I'd better mention (in case it makes a difference) that I'm doing this on Jupyter notebook.
You can either use a multiprocessing Queue or a Pipe to share data between processes. Queues are both thread and process safe. You will have to be more careful when using a Pipe as the data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
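For illustration, here is a minimal Pipe sketch (the worker function and data are made up for this example); the child writes to one end while the parent reads from the other, so no single end is shared by two readers or two writers:

from multiprocessing import Process, Pipe

def pipe_worker(conn, numbers):
    # The child only ever writes to its own end of the pipe.
    for n in numbers:
        conn.send(n)
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=pipe_worker, args=(child_conn, [1, 3, 5]))
    p.start()
    received = [parent_conn.recv() for _ in range(3)]
    p.join()
    print(received)  # [1, 3, 5]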
Currently, your implementation spawns two separate processes, each with its own copy of all_nums_class and therefore its own self.all_nums_list. So you essentially end up with three AllNumsClass objects: one in your main program, one in p1, and one in p2. Since processes are independent and don't share the same memory space, each process appends correctly, but only to its own self.all_nums_list. That's why, when you print all_nums_class.all_nums_list in your main program, you're printing the main process's self.all_nums_list, which is still an empty list. To share the data and have the processes append to the same list, I would recommend using a Queue.
Example using Queue and Process
import multiprocessing as mp

def add_process(queue, numbers_to_add):
    for number in numbers_to_add:
        queue.put(number)

class AllNumsClass:
    def __init__(self):
        self.queue = mp.Queue()
    def get_queue(self):
        return self.queue

if __name__ == '__main__':
    all_nums_class = AllNumsClass()

    processes = []
    p1 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [1,3,5]))
    p2 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [2,4,6]))

    processes.append(p1)
    processes.append(p2)
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    output = []
    while all_nums_class.get_queue().qsize() > 0:
        output.append(all_nums_class.get_queue().get())

    print(output)
This implementation is asynchronous, so the numbers do not arrive in a fixed sequential order; every time you run it, you may get a different output.
Example outputs
[1, 2, 3, 5, 4, 6]
[1, 3, 5, 2, 4, 6]
[2, 4, 6, 1, 3, 5]
[2, 1, 4, 3, 5, 6]
A simpler way to maintain an ordered or unordered list of results is to use the mp.Pool class. Specifically, the Pool.apply and the Pool.apply_async functions. Pool.apply will lock the main program until all processes are finished, which is quite useful if we want to obtain results in a particular order for certain applications. In contrast, Pool.apply_async will submit all processes at once and retrieve the results as soon as they are finished. An additional difference is that we need to use the get method after the Pool.apply_async call in order to obtain the return values of the finished processes.
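For instance, a minimal sketch using Pool.apply_async (the square function is just a placeholder); because get() is called in submission order, the collected results stay in order:

import multiprocessing as mp

def square(x):
    return x * x

if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        async_results = [pool.apply_async(square, (n,)) for n in [1, 2, 3, 4, 5, 6]]
        # get() blocks until that particular result is ready
        ordered = [r.get() for r in async_results]
    print(ordered)  # [1, 4, 9, 16, 25, 36]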
Related
I am looking for some good example code of multiprocessing in Python that takes a large array (broken into different sections of the same main array) to speed up the processing of the subsequent output file. I notice there are other things like Lock() functions to make sure results come back in a certain order, but no good example of how to get the resulting arrays back out once the jobs have run, so that I can output a single CSV file in the correct time-series order.
Below is what I have been working with so far with the queue. How can one assign the results of q1.get() or the others so they can be recombined later? It just spins when I try assigning it with temp = q1.get()... Good examples of splitting an array, sending it to multiple processes, and then recombining the results of the called function(s) would be appreciated. I am using Python 3.7 and Windows 10.
import time
import multiprocessing
from multiprocessing import Process, Queue

def f1(q, testArray):
    testArray2 = [[41, None, 'help'], [42, None, 'help'], [43, None, 'help']]
    testArray = testArray + testArray2
    q.put(testArray)

def f2(q, testArray):
    #testArray.append([43, None, 'goodbye'])
    testArray = testArray + ([44, None, 'goodbye'])
    q.put(testArray)
    return testArray

if __name__ == '__main__':
    print("Number of cpu : ", multiprocessing.cpu_count())

    testArray1 = [1]
    testArray2 = [2]

    q1 = Queue()
    q2 = Queue()

    p1 = multiprocessing.Process(target=f1, args=(q1, testArray1,))
    p2 = multiprocessing.Process(target=f2, args=(q2, testArray2,))

    p1.start()
    p2.start()

    print(q1.get())  # prints whatever you set in function above
    print(q2.get())  # prints whatever you set in function above
    print(testArray1)
    print(testArray2)

    p1.join()
    p2.join()
I believe you only need one queue for all of your processes. The queue is designed for inter-process communication.
For the ordering you can pass in a process id and sort based on that after the results are joined. Or you can try and use multiprocessing pool as furas suggests.
That sounds like the better approach. Worker pools generally allocate a pool of workers up front and then run a set of jobs on that pool. This is more efficient because the processes/threads are set up once and reused across jobs, whereas in your current implementation a process is created per job/function, which can be costly depending on how much data you're crunching. A sketch of the single-queue idea follows.
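In this sketch the chunking and the doubling stand in for your real work: each worker tags its output with a chunk index so the pieces can be sorted back into time-series order before writing the CSV:

from multiprocessing import Process, Queue

def process_chunk(chunk_id, chunk, q):
    # Stand-in for the real per-chunk work; tag the result with its chunk index.
    result = [row * 2 for row in chunk]
    q.put((chunk_id, result))

if __name__ == '__main__':
    data = list(range(12))
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]

    q = Queue()
    procs = [Process(target=process_chunk, args=(i, c, q)) for i, c in enumerate(chunks)]
    for p in procs:
        p.start()

    results = [q.get() for _ in procs]   # drain the queue before joining
    for p in procs:
        p.join()

    recombined = []
    for _, part in sorted(results):      # restore the original chunk order
        recombined.extend(part)
    print(recombined)                    # [0, 2, 4, ..., 22]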
I understand this is a slightly vague and open-ended question, but I need some help in this area as a quick Google/Stack Overflow search hasn't yielded useful information.
The basic idea is to use multiple processes to speed up an expensive computation that currently gets executed sequentially in a loop. The caveat being that I have 2 significant data structures that are accessed by the expensive function:
one data structure will be read by all processes but is not ever modified by a process (so could be copied to each process, assuming memory size isn't an issue, which, in this case, it isn't)
the other data structure will spend most of the time being read by processes, but will occasionally be written to, and this update needs to be propagated to all processes from that point onwards
Currently the program works very basically like so:
def do_all_the_things(self):
    read_only_obj = {...}
    read_write_obj = {...}
    output = []
    for i in range(4):
        for j in range(4):
            output.append(do_expensive_operation(read_only_obj, read_write_obj))
    return output
In a uniprocessor world, this is fine as any changes made to read_write_obj are accessed sequentially.
What I am looking to do is to run each instance of do_expensive_operation in a separate process so that a multiprocessor can be fully utilised.
The two things I am looking to understand are:
How does the whole multiprocessing thing work? I have seen Queues and Pools and don't understand which I should be using in this situation.
I have a feeling sharing memory (read_only_obj and read_write_obj) is going to be complicated. Is this possible? Advisable? And how do I go about it?
Thank you for your time!
Disclaimer: I will help you and provide you with a working example, but I am not an expert in this topic.
Point 1 has been answered here to some extent.
Point 2 has been answered here to some extent.
I have used different options in the past for CPU-bound tasks in Python, and here is one toy example for you to follow:
from multiprocessing import Process, Queue
import time, random

def do_something(n_order, x, queue):
    time.sleep(5)
    queue.put((n_order, x))

def main():
    data = [1, 2, 3, 4, 5]
    queue = Queue()
    processes = [Process(target=do_something, args=(n, x, queue)) for n, x in enumerate(data)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    unsorted_result = [queue.get() for _ in processes]
    result = [i[1] for i in sorted(unsorted_result)]
    print(result)

if __name__ == '__main__':
    main()
You can write the same thing as a plain loop instead of using processes and a queue, compare the time consumed (in this silly case it is just the sleep, for testing purposes), and you will realize that you shorten the run time by approximately the number of processes you run, as expected.
In fact, these are the results on my computer for the exact script above (the first run is the multiprocess version, the second the plain loop):
[1, 2, 3, 4, 5]
real 0m5.240s
user 0m0.397s
sys 0m0.260s
[1, 4, 9, 16, 25]
real 0m25.104s
user 0m0.051s
sys 0m0.030s
With respect to the read-only and read-write objects, I would need more information to help further. What type of objects are they? Are they indexed?
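If, for example, both objects happen to be dictionaries (purely an assumption on my part), a minimal sketch of point 2 with multiprocessing.Manager could look like this; the managed dict is a proxy, so writes from any worker are visible to the others, at the cost of some IPC overhead:

from multiprocessing import Manager, Pool

def do_expensive_operation(args):
    (i, j), read_only_obj, read_write_obj = args
    # read_only_obj is a plain per-worker copy; read_write_obj is a Manager proxy,
    # so writes made here are visible to every worker.
    value = read_only_obj.get('factor', 1) * (i + j)
    if value > read_write_obj.get('max_seen', 0):
        # Note: this check-then-set is not atomic; use a manager.Lock() if that matters.
        read_write_obj['max_seen'] = value
    return value

if __name__ == '__main__':
    read_only_obj = {'factor': 2}              # plain dict, simply copied to each worker
    with Manager() as manager:
        read_write_obj = manager.dict()        # shared proxy, updates propagate to all workers
        jobs = [((i, j), read_only_obj, read_write_obj)
                for i in range(4) for j in range(4)]
        with Pool(4) as pool:
            output = pool.map(do_expensive_operation, jobs)
        print(output)
        print(dict(read_write_obj))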
I'm using Python 2.7's multiprocessing.Pool to manage a pool of 3 workers. Each worker is fairly complicated and there's a resource leak (presumably) in some third-party code that causes problems after 6-8 hours of continuous runtime. So I'd like to use maxtasksperchild to have workers refreshed periodically.
I'd also like each worker to write to its own separate log file. Without maxtasksperchild I use a shared multiprocessing.Value to assign an integer (0, 1, or 2) to each worker, then use the integer to name the log file.
With maxtasksperchild I'd like to reuse log files once a worker is done. So if this whole thing runs for a month, I only want three log files, not one log file for each worker that was spawned.
If I could pass a callback (e.g. a finalizer to go along with the initializer currently supported), this would be straightforward. Without that, I can't see a robust and simple way to do it.
That's AFAIK undocumented, but multiprocessing has a Finalize class (in multiprocessing.util), "which supports object finalization using weakrefs". You could use it to register a finalizer within your initializer.
I don't see multiprocessing.Value as a helpful synchronization choice in this case, though. Multiple workers could exit simultaneously, and signaling which file-integers are free is more than a (locked) counter can provide.
I would suggest using multiple bare multiprocessing.Lock objects, one for each file, instead:
from multiprocessing import Pool, Lock, current_process
from multiprocessing.util import Finalize

def f(n):
    global fileno
    for _ in range(int(n)):  # xrange for Python 2
        pass
    return fileno

def init_fileno(file_locks):
    for i, lock in enumerate(file_locks):
        if lock.acquire(False):  # non-blocking attempt
            globals()['fileno'] = i
            print("{} using fileno: {}".format(current_process().name, i))
            Finalize(lock, lock.release, exitpriority=15)
            break

if __name__ == '__main__':
    n_proc = 3
    file_locks = [Lock() for _ in range(n_proc)]
    pool = Pool(
        n_proc, initializer=init_fileno, initargs=(file_locks,),
        maxtasksperchild=2
    )
    print(pool.map(func=f, iterable=[50e6] * 18))
    pool.close()
    pool.join()
    # all locks should be available if all finalizers did run
    assert all(lock.acquire(False) for lock in file_locks)
Output:
ForkPoolWorker-1 using fileno: 0
ForkPoolWorker-2 using fileno: 1
ForkPoolWorker-3 using fileno: 2
ForkPoolWorker-4 using fileno: 0
ForkPoolWorker-5 using fileno: 1
ForkPoolWorker-6 using fileno: 2
[0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2]
Process finished with exit code 0
Note that with Python 3 you can't use Pool's context-manager reliably instead of the old way of doing it shown above. Pool's context-manager (unfortunately) calls terminate(), which might kill worker-processes before the finalizer had a chance to run.
I ended up going with the following. It assumes that PIDs aren't recycled very quickly (true on Ubuntu for me, but not in general on Unix). I don't think it makes any other assumptions, but I'm really just interested in Ubuntu so I didn't look at other platforms such as Windows carefully.
The code uses an array to keep track of which PIDs have claimed which index. When a new worker starts, it looks for a PID that is no longer in use. If it finds one, it assumes this is because that worker has completed its work (or been terminated for another reason). If it doesn't find one, then we're out of luck! So this isn't perfect, but I think it's simpler than anything I've seen or considered so far.
import os
from multiprocessing import Pool, Array

def run_pool():
    child_pids = Array('i', 3)
    pool = Pool(3, initializer=init_worker, initargs=(child_pids,), maxtasksperchild=1000)

def init_worker(child_pids):
    with child_pids.get_lock():
        available_index = None
        for index, pid in enumerate(child_pids):
            # PID 0 means unallocated (this happens when our pool is started), we reclaim PIDs
            # which are no longer in use. We also reclaim the lucky case where a PID was recycled
            # but assigned to one of our workers again, so we know we can take it over
            if not pid or not _is_pid_in_use(pid) or pid == os.getpid():
                available_index = index
                break

        if available_index is not None:
            child_pids[available_index] = os.getpid()
        else:
            # This is unexpected - it means all of the PIDs are in use so we have a logical error
            # or a PID was recycled before we could notice and reclaim its index
            pass

def _is_pid_in_use(pid):
    try:
        os.kill(pid, 0)
        return True
    except OSError:
        return False
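For completeness, here is a hypothetical sketch (the handle_task function, the worker_index global, and the log-file naming are my own additions; _is_pid_in_use is reused from above) of how a worker could remember the index it claimed and keep appending to one of only three log files:

import os
from multiprocessing import Pool, Array

worker_index = None  # slot claimed by this worker process

def init_worker_with_index(child_pids):
    # Same PID-reclaiming idea as init_worker above, but the claimed slot is also
    # stored in a module-level global so tasks can reuse "their" log file.
    global worker_index
    with child_pids.get_lock():
        for index, pid in enumerate(child_pids):
            if not pid or not _is_pid_in_use(pid) or pid == os.getpid():
                child_pids[index] = os.getpid()
                worker_index = index
                break

def handle_task(task):
    # Only three log files ever exist, reused as workers are recycled.
    with open('worker_{}.log'.format(worker_index), 'a') as log:
        log.write('handled {}\n'.format(task))

def run_pool(tasks):
    child_pids = Array('i', 3)
    pool = Pool(3, initializer=init_worker_with_index, initargs=(child_pids,),
                maxtasksperchild=1000)
    pool.map(handle_task, tasks)
    pool.close()
    pool.join()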
I have a lot of tasks (independent of each other, represented by some code in Python) that need to be executed. Their execution time varies. I also have limited resources so at most N tasks can be running at the same time. The goal is to finish executing the whole stack of tasks as fast as possible.
It seems that I am looking for some kind of manager that starts new tasks when the resource gets available and collects finished tasks.
Are there any already-made solutions or should I code it myself?
Are there any caveats that I should keep in mind?
As far as I can tell, your main would just become:
def main():
    tasks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with multiprocessing.Pool(POOL_SIZE) as pool:
        pool.map(sleep, tasks)
i.e. you've just reimplemented a pool, but inefficiently (Pool reuses processes where possible) and less safely: Pool goes to a lot of effort to clean up after exceptions.
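If you also want to collect results as soon as individual tasks finish (rather than all at once from map), a sketch with imap_unordered, where the sleep function stands in for a real task and its return value for a real result:

import multiprocessing
import time

POOL_SIZE = 4

def sleep(seconds):
    time.sleep(seconds)
    return seconds  # stand-in for a real task result

if __name__ == '__main__':
    tasks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with multiprocessing.Pool(POOL_SIZE) as pool:
        # imap_unordered yields each result as soon as its task finishes
        for finished in pool.imap_unordered(sleep, tasks):
            print('task slept for', finished, 'seconds')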
Here is a simple code snippet that should fit the requirements:
import multiprocessing
import time

POOL_SIZE = 4
STEP = 1

def sleep(seconds: int):
    time.sleep(seconds)

def main():
    tasks = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    pool = [None] * POOL_SIZE

    while tasks or [item for item in pool if item is not None]:
        for i in range(len(pool)):
            if pool[i] is not None and not pool[i].is_alive():
                # Finished task. Clear the resource.
                pool[i] = None
            if pool[i] is None:
                # Free resource. Start new task if any are left.
                if tasks:
                    task = tasks.pop(0)
                    pool[i] = multiprocessing.Process(target=sleep, args=(task,))
                    pool[i].start()
        time.sleep(STEP)

if __name__ == '__main__':
    main()
The manager has a tasks list of arbitrary length; here, for simplicity, the tasks are represented by integers that are passed as arguments to a sleep function. It also has a pool list, initially filled with empty (None) slots, representing the available resource.
The manager periodically visits all currently running processes and checks whether they have finished, and it starts new processes whenever a slot becomes available. The whole cycle repeats until there are no tasks and no running processes left. The STEP value is there to save computing power - you generally don't need to check the running processes every millisecond.
As for the caveats, there are some guidelines that should be kept in mind when using multiprocessing.
I have code that makes unique combinations of elements. There are 6 types, and there are about 100 of each. So there are 100^6 combinations. Each combination has to be calculated, checked for relevance and then either be discarded or saved.
The relevant bit of the code looks like this:
def modconffactory():
    for transmitter in totaltransmitterdict.values():
        for reciever in totalrecieverdict.values():
            for processor in totalprocessordict.values():
                for holoarray in totalholoarraydict.values():
                    for databus in totaldatabusdict.values():
                        for multiplexer in totalmultiplexerdict.values():
                            newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                            data_I_need = dosomethingwith(newconfiguration)
                            saveforlateruse_if_useful(data_I_need)
This takes a long time and that is fine, but I now realize this process (making the configurations and then the calculations for later use) is only using 1 of my 8 processor cores at a time.
I've been reading up about multithreading and multiprocessing, but I only see examples of different processes, not how to multithread one process. In my code I call two functions: 'dosomethingwith()' and 'saveforlateruse_if_useful()'. I could make those into separate processes and have those run concurrently to the for-loops, right?
But what about the for-loops themselves? Can I speed up that one process? Because that is where the time consumption is. (<-- This is my main question)
Is there a cheat? For instance, compiling to C so that the OS multithreads automatically?
I only see examples of different processes, not how to multithread one process
There is multithreading in Python, but it is very ineffective for CPU-bound work because of the GIL (Global Interpreter Lock). So if you want to use all of your processor cores and you want concurrency, you have no other choice than to use multiple processes, which can be done with the multiprocessing module (well, you could also use another language without such problems).
Approximate example of multiprocessing usage for your case:
import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(generator, step, offset, conn):
    """
    Function to be invoked by every worker process.

    generator: iterable object, the very top one of all you are iterating over,
    in your case, totaltransmitterdict.values()

    We are passing a whole iterable object to every worker; they all will iterate
    over it. To ensure they will not waste time by doing the same things
    concurrently, we will assume this: each worker will process only each stepTH
    item, starting with the offsetTH one. step must be equal to WORKERS_NUMBER,
    and offset must be a unique number for each worker, varying from 0 to
    WORKERS_NUMBER - 1.

    conn: a multiprocessing.Connection object, allowing the worker to communicate
    with the main process
    """
    for i, transmitter in enumerate(generator):
        if i % step == offset:
            for reciever in totalrecieverdict.values():
                for processor in totalprocessordict.values():
                    for holoarray in totalholoarraydict.values():
                        for databus in totaldatabusdict.values():
                            for multiplexer in totalmultiplexerdict.values():
                                newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                                data_I_need = dosomethingwith(newconfiguration)
                                saveforlateruse_if_useful(data_I_need)
    conn.send('done')

def modconffactory():
    """
    Function to launch all the worker processes and wait until they all complete
    their tasks
    """
    processes = []
    generator = totaltransmitterdict.values()
    for i in range(WORKERS_NUMBER):
        conn, childConn = multiprocessing.Pipe()
        process = multiprocessing.Process(target=modconffactoryProcess, args=(generator, WORKERS_NUMBER, i, childConn))
        process.start()
        processes.append((process, conn))
    # Here we have created, started and saved to a list all the worker processes
    working = True
    finishedProcessesNumber = 0
    try:
        while working:
            for process, conn in processes:
                if conn.poll():  # Check if any messages have arrived from a worker
                    message = conn.recv()
                    if message == 'done':
                        finishedProcessesNumber += 1
                        if finishedProcessesNumber == WORKERS_NUMBER:
                            working = False
    except KeyboardInterrupt:
        print('Aborted')
You can adjust WORKERS_NUMBER to your needs.
Same with multiprocessing.Pool:
import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(transmitter):
    for reciever in totalrecieverdict.values():
        for processor in totalprocessordict.values():
            for holoarray in totalholoarraydict.values():
                for databus in totaldatabusdict.values():
                    for multiplexer in totalmultiplexerdict.values():
                        newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                        data_I_need = dosomethingwith(newconfiguration)
                        saveforlateruse_if_useful(data_I_need)

def modconffactory():
    pool = multiprocessing.Pool(WORKERS_NUMBER)
    pool.map(modconffactoryProcess, totaltransmitterdict.values())
You probably would like to use .map_async instead of .map
Both snippets do the same thing, but I would say the first one gives you more control over the program.
I suppose the second one is the easiest, though :)
But the first one should give you an idea of what is happening in the second one.
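Regarding .map_async mentioned above, a minimal sketch (the work function is just a placeholder); map_async returns immediately with an AsyncResult, so the main process can do other things before calling get():

import multiprocessing

def work(item):
    return item * item  # placeholder for the real per-item computation

if __name__ == '__main__':
    pool = multiprocessing.Pool(8)
    async_result = pool.map_async(work, range(10))
    # ... the main process is free to do other work here while the pool runs ...
    pool.close()
    pool.join()
    print(async_result.get())  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]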
multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html
You can run your function this way:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers