I am looking for good example code of multiprocessing in Python that would take in a large array (broken into sections of the same main array) to speed up the processing of the subsequent output file. I notice there are things like Lock() to make sure results come back in a certain order, but not a good example of how to get the resulting arrays back out once the jobs have run, so that I can write a single CSV file in the correct time-series order.
Below is what I have been working with so far using a queue. How can one assign the result of q1.get() (or the others) to a variable to recombine later? It just spins when I try assigning it with temp = q1.get()... Good examples of splitting an array, sending it to multiple processes, and then recombining the results of the called function(s) would also be appreciated. I am using Python 3.7 and Windows 10.
import time
import multiprocessing
from multiprocessing import Process, Queue

def f1(q, testArray):
    testArray2 = [[41, None, 'help'], [42, None, 'help'], [43, None, 'help']]
    testArray = testArray + testArray2
    q.put(testArray)

def f2(q, testArray):
    #testArray.append([43, None, 'goodbye'])
    testArray = testArray + ([44, None, 'goodbye'])
    q.put(testArray)
    return testArray

if __name__ == '__main__':
    print("Number of cpu : ", multiprocessing.cpu_count())
    testArray1 = [1]
    testArray2 = [2]
    q1 = Queue()
    q2 = Queue()
    p1 = multiprocessing.Process(target=f1, args=(q1, testArray1,))
    p2 = multiprocessing.Process(target=f2, args=(q2, testArray2,))
    p1.start()
    p2.start()
    print(q1.get())  # prints whatever you set in function above
    print(q2.get())  # prints whatever you set in function above
    print(testArray1)
    print(testArray2)
    p1.join()
    p2.join()
I believe you only need one queue for all of your processes; the queue is designed for inter-process communication.
For the ordering, you can pass in a process id and sort on it after the results are collected, or you can use a multiprocessing pool as furas suggests.
The pool sounds like the better approach. Worker pools allocate a pool of workers up front and then run a set of jobs on that pool. This is more efficient because the processes/threads are set up once and reused across jobs, whereas in your implementation a process is created per job/function, which is costly depending on how much data you're crunching.
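To tie this back to the original question (splitting one array, tagging each chunk with its index, and reassembling in time-series order before writing a single CSV), here is a minimal sketch; process_chunk and the row layout are placeholders for your real work:

import csv
import multiprocessing
from multiprocessing import Process, Queue

def process_chunk(chunk):
    # stand-in for the real per-section work
    return [[value, None, 'processed'] for value in chunk]

def worker(index, chunk, q):
    # tag the result with the chunk index so the parent can restore the order
    q.put((index, process_chunk(chunk)))

if __name__ == '__main__':
    main_array = list(range(100))
    n_chunks = multiprocessing.cpu_count()
    chunk_size = len(main_array) // n_chunks
    chunks = [main_array[i * chunk_size:(i + 1) * chunk_size] for i in range(n_chunks - 1)]
    chunks.append(main_array[(n_chunks - 1) * chunk_size:])  # last chunk takes the remainder

    q = Queue()
    procs = [Process(target=worker, args=(i, chunk, q)) for i, chunk in enumerate(chunks)]
    for p in procs:
        p.start()

    # drain the queue before joining; results arrive in completion order, not submission order
    tagged = [q.get() for _ in procs]
    for p in procs:
        p.join()

    # sort by chunk index to restore the original time-series order, then flatten
    tagged.sort(key=lambda pair: pair[0])
    combined = [row for _, rows in tagged for row in rows]

    with open('output.csv', 'w', newline='') as fh:
        csv.writer(fh).writerows(combined)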
So I have two webscrapers that collect data from two different sources. I am running them both simultaneously to collect a specific piece of data (e.g. covid numbers).
When one of the functions finds data I want to use that data without waiting for the other one to finish.
So far I have tried the multiprocessing Pool module, returning the results with get(), but by definition I have to wait for both get() calls to finish before I can continue with my code. My goal is to keep the code as simple and as short as possible.
My webscraper functions can be run with arguments and return a result if found. It is also possible to modify them.
This is the code I have so far, which waits for both get() calls to finish:
from multiprocessing import Pool
from scraper1 import main_1
from scraper2 import main_2
from twitter import post_tweet

if __name__ == '__main__':
    with Pool(processes=2) as pool:
        r1 = pool.apply_async(main_1, ('www.website1.com', 'June'))
        r2 = pool.apply_async(main_2, ())
        data = r1.get()
        data2 = r2.get()
        post_tweet("New data is {}".format(data))
        post_tweet("New data is {}".format(data2))
From here I have seen that threading might be a better option, since webscraping involves a lot of waiting and only a little parsing, but I am not sure how I would implement this.
I think the solution is fairly easy but I have been searching and trying different things all day without much success so I think I will just ask here. (I only started programming 2 months ago)
As always, there are many ways to accomplish this task.
You have already mentioned using a Queue:
from multiprocessing import Process, Queue
from scraper1 import main_1
from scraper2 import main_2

def simple_worker(target, args, ret_q):
    ret_q.put(target(*args))  # mp.Queue has its own mutex, so we don't need to worry about concurrent read/write

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=simple_worker, args=(main_1, ('www.website1.com', 'June'), q))
    p2 = Process(target=simple_worker, args=(main_2, ('www.website2.com', 'July'), q))
    p1.start()
    p2.start()
    first_result = q.get()
    do_stuff(first_result)
    # don't forget to get() the second result before you quit; it's not a good idea to
    # leave things in a Queue and just assume it will be properly cleaned up at exit
    second_result = q.get()
    p1.join()
    p2.join()
You could also still use a Pool by using imap_unordered and just taking the first result:
from multiprocessing import Pool
from scraper1 import main_1
from scraper2 import main_2

def simple_worker2(args):
    target, arglist = args  # unpack args
    return target(*arglist)

if __name__ == "__main__":
    tasks = ((main_1, ('www.website1.com', 'June')),
             (main_2, ('www.website2.com', 'July')))
    with Pool() as p:  # the Pool context manager handles worker cleanup (your target function may, however, be interrupted at any point if the pool exits before a task is complete)
        for result in p.imap_unordered(simple_worker2, tasks, chunksize=1):
            do_stuff(result)
            break  # don't bother with further results
I've seen people use queues in such cases: create one and pass it to both parsers so that they put their results in queue instead of returning them. Then do a blocking pop on the queue to retrieve the first available result.
I have seen that threading might be a better option
Almost true, but not quite. I'd say that asyncio and async-based libraries are much better than both threading and multiprocessing when we're talking about code with a lot of blocking I/O. If it's applicable in your case, I'd recommend rewriting both of your parsers in async.
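If a full async rewrite isn't immediately feasible, the "react to whichever scraper finishes first" behaviour can still be sketched with asyncio by running the existing blocking scrapers in threads via asyncio.to_thread (Python 3.9+); replacing those calls with native coroutines later would make it truly async. This is only a sketch built on the imports from the question:

import asyncio

from scraper1 import main_1
from scraper2 import main_2
from twitter import post_tweet

async def main():
    # run the existing (blocking) scrapers in worker threads
    tasks = [
        asyncio.create_task(asyncio.to_thread(main_1, 'www.website1.com', 'June')),
        asyncio.create_task(asyncio.to_thread(main_2)),
    ]
    # handle each result as soon as it arrives instead of waiting for both
    for finished in asyncio.as_completed(tasks):
        data = await finished
        post_tweet("New data is {}".format(data))

if __name__ == '__main__':
    asyncio.run(main())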
I am trying to use multiprocessing for the first time and having some fairly basic issues. I have a toy example below, where two processes are adding data to a list:
from multiprocessing import Process

def add_process(all_nums_class, numbers_to_add):
    for number in numbers_to_add:
        all_nums_class.all_nums_list.append(number)

class AllNumsClass:
    def __init__(self):
        self.all_nums_list = []

all_nums_class = AllNumsClass()

p1 = Process(target=add_process, args=(all_nums_class, [1,3,5]))
p1.start()
p2 = Process(target=add_process, args=(all_nums_class, [2,4,6]))
p2.start()

all_nums_class.all_nums_list
I'd like to have the all_nums_class shared between these processes so that they can both add to its all_nums_list - so the result should be
[1,2,3,4,5,6]
instead of what I'm currently getting which is just good old
[]
Could anybody please advise? I have played around with namespace a bit but I haven't yet made it work here.
I feel I'd better mention (in case it makes a difference) that I'm doing this on Jupyter notebook.
You can either use a multiprocessing Queue or a Pipe to share data between processes. Queues are both thread and process safe. You will have to be more careful when using a Pipe as the data in a pipe may become corrupted if two processes (or threads) try to read from or write to the same end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time.
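For the Pipe route, a minimal sketch in which the child only writes to its end and the parent only reads from the other (the worker function here is just illustrative):

from multiprocessing import Process, Pipe

def worker(conn, numbers):
    # the child writes only to its own end of the pipe
    conn.send([n * n for n in numbers])
    conn.close()

if __name__ == '__main__':
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn, [1, 2, 3]))
    p.start()
    print(parent_conn.recv())  # [1, 4, 9]
    p.join()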
Currently, your implementation spawns two separate processes, each with its own self.all_nums_list. So you're essentially creating three AllNumsClass objects: one in your main program, one in p1, and one in p2. Since processes are independent and don't share the same memory space, each one appends correctly, but it appends to its own self.all_nums_list. That's why when you print all_nums_class.all_nums_list in your main program, you're printing the main process's self.all_nums_list, which is an empty list. To share the data and have the processes append to the same list, I would recommend using a Queue.
Example using Queue and Process
import multiprocessing as mp

def add_process(queue, numbers_to_add):
    for number in numbers_to_add:
        queue.put(number)

class AllNumsClass:
    def __init__(self):
        self.queue = mp.Queue()
    def get_queue(self):
        return self.queue

if __name__ == '__main__':
    all_nums_class = AllNumsClass()
    processes = []
    p1 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [1,3,5]))
    p2 = mp.Process(target=add_process, args=(all_nums_class.get_queue(), [2,4,6]))
    processes.append(p1)
    processes.append(p2)
    for p in processes:
        p.start()
    for p in processes:
        p.join()

    output = []
    while all_nums_class.get_queue().qsize() > 0:
        output.append(all_nums_class.get_queue().get())
    print(output)
This implementation is asynchronous, as the items are not added in a fixed sequential order; every time you run it, you may get a different output.
Example outputs
[1, 2, 3, 5, 4, 6]
[1, 3, 5, 2, 4, 6]
[2, 4, 6, 1, 3, 5]
[2, 1, 4, 3, 5, 6]
A simpler way to maintain an ordered or unordered list of results is to use the mp.Pool class. Specifically, the Pool.apply and the Pool.apply_async functions. Pool.apply will lock the main program until all processes are finished, which is quite useful if we want to obtain results in a particular order for certain applications. In contrast, Pool.apply_async will submit all processes at once and retrieve the results as soon as they are finished. An additional difference is that we need to use the get method after the Pool.apply_async call in order to obtain the return values of the finished processes.
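As a minimal sketch of that Pool.apply_async pattern applied to the same toy chunks (add_process here simply returns its chunk), the results come back in submission order because we call get() on the handles in that order:

import multiprocessing as mp

def add_process(numbers_to_add):
    # stand-in for the per-chunk work; just return the processed chunk
    return [n for n in numbers_to_add]

if __name__ == '__main__':
    chunks = [[1, 3, 5], [2, 4, 6]]
    with mp.Pool(processes=2) as pool:
        # keep the AsyncResult handles in submission order...
        handles = [pool.apply_async(add_process, (chunk,)) for chunk in chunks]
        # ...so that get() returns the results in that same order,
        # regardless of which worker finishes first
        results = [h.get() for h in handles]
    print(results)           # [[1, 3, 5], [2, 4, 6]]
    print(sum(results, []))  # [1, 3, 5, 2, 4, 6]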
I am trying to implement an online recursive parallel algorithm, which is highly parallelizable. My problem is that my Python implementation does not work the way I want. I have two 2D matrices, and I want to recursively update every column of each matrix every time a new observation arrives at time-step t.
My parallel code looks like this:
def apply_async(t):
    worker = mp.Pool(processes=4)
    for i in range(4):
        X[:,i,np.newaxis], b[:,i,np.newaxis] = worker.apply_async(OULtraining, args=(train[t,i], X[:,i,np.newaxis], b[:,i,np.newaxis])).get()
    worker.close()
    worker.join()

for t in range(p, T):
    count = 0
    for l in range(p):
        for k in range(4):
            gn[count] = train[t-l-1, k]
            count += 1
    G = G*v + gn # gn.T
    Gt = (1/(t-p+1))*G
    if __name__ == '__main__':
        apply_async(t)
The two matrices are X and b. I want the updates written directly into the master process's memory, since each worker recursively updates only one specific column of the matrices.
Why is this implementation slower than the sequential one?
Is there any way to reuse the processes at every time-step rather than killing them and creating them again? Could this be the reason it is slower?
The reason is that your program is in practice sequential. This is an example code snippet that is, from a parallelism standpoint, identical to yours:
from multiprocessing import Pool
from time import sleep

def gwork(qq):
    print(qq)
    sleep(1)
    return 42

p = Pool(processes=4)
for q in range(1, 10):
    p.apply_async(gwork, args=(q,)).get()
p.close()
p.join()
Run this and you will notice the numbers 1-9 appearing exactly one per second. Why is this? The reason is your .get(): every call to apply_async will in practice block in get() until a result is available. It submits one task, waits a second of emulated processing delay, then returns the result, after which the next task is submitted to your pool. This means there is no parallel execution going on at all.
Try replacing the pool management part with this:
results = []
for q in range(1, 10):
    res = p.apply_async(gwork, args=(q,))
    results.append(res)
p.close()
p.join()
for r in results:
    print(r.get())
You can now see parallelism at work, as four of your tasks are now processed simultaneously. Your loop does not block in get, as get is moved out of the loop and results are received only when they are ready.
NB: If the arguments to your workers or the return values from them are large data structures, you will lose some performance. In practice Python implements these transfers as queues, and transmitting a lot of data via a queue is slow in relative terms compared to the in-memory copy of a data structure that a subprocess gets when it is forked.
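For large matrices like X and b, one way around that copying cost is shared memory. Below is a minimal sketch using multiprocessing.Array viewed through NumPy, with each worker owning one disjoint column; the doubling update is only a stand-in for OULtraining:

import multiprocessing as mp
import numpy as np

def update_column(shared_arr, n_rows, col):
    # reinterpret the shared buffer as a 2D array; no data is copied
    X = np.frombuffer(shared_arr.get_obj()).reshape(n_rows, -1)
    X[:, col] = X[:, col] * 2 + col  # stand-in for the real recursive update

if __name__ == '__main__':
    n_rows, n_cols = 5, 4
    # a shared double-precision buffer backing the matrix X
    shared_arr = mp.Array('d', n_rows * n_cols)
    X = np.frombuffer(shared_arr.get_obj()).reshape(n_rows, n_cols)
    X[:] = 1.0

    # each worker updates a different column, so the writes never overlap
    procs = [mp.Process(target=update_column, args=(shared_arr, n_rows, c))
             for c in range(n_cols)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print(X)  # every column was updated in place by a different process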
I have code that makes unique combinations of elements. There are 6 types, and there are about 100 of each. So there are 100^6 combinations. Each combination has to be calculated, checked for relevance and then either be discarded or saved.
The relevant bit of the code looks like this:
def modconffactory():
    for transmitter in totaltransmitterdict.values():
        for reciever in totalrecieverdict.values():
            for processor in totalprocessordict.values():
                for holoarray in totalholoarraydict.values():
                    for databus in totaldatabusdict.values():
                        for multiplexer in totalmultiplexerdict.values():
                            newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                            data_I_need = dosomethingwith(newconfiguration)
                            saveforlateruse_if_useful(data_I_need)
Now this takes a long time and that is fine, but now I realize this process (making the configurations and then calculations for later use) is only using 1 of my 8 processor cores at a time.
I've been reading up about multithreading and multiprocessing, but I only see examples of different processes, not how to multithread one process. In my code I call two functions: 'dosomethingwith()' and 'saveforlateruse_if_useful()'. I could make those into separate processes and have them run concurrently with the for-loops, right?
But what about the for-loops themselves? Can I speed up that one process? Because that is where the time consumption is. (<-- This is my main question)
Is there a cheat? For instance, compiling to C so that the OS multithreads it automatically?
I only see examples of different processes, not how to multithread one process
There is multithreading in Python, but it is very ineffective for CPU-bound work because of the GIL (Global Interpreter Lock). So if you want to use all of your processor cores and you want concurrency, you have no other choice than to use multiple processes, which can be done with the multiprocessing module (well, you could also use another language without such problems).
Approximate example of multiprocessing usage for your case:
import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(generator, step, offset, conn):
    """
    Function to be invoked by every worker process.

    generator: iterable object, the very top one of all you are iterating over,
    in your case, totaltransmitterdict.values()

    We are passing a whole iterable object to every worker; they all will iterate
    over it. To ensure they will not waste time by doing the same things
    concurrently, we will assume this: each worker will process only each stepTH
    item, starting with the offsetTH one. step must be equal to WORKERS_NUMBER,
    and offset must be a unique number for each worker, varying from 0 to
    WORKERS_NUMBER - 1

    conn: a multiprocessing.Connection object, allowing the worker to communicate
    with the main process
    """
    for i, transmitter in enumerate(generator):
        if i % step == offset:
            for reciever in totalrecieverdict.values():
                for processor in totalprocessordict.values():
                    for holoarray in totalholoarraydict.values():
                        for databus in totaldatabusdict.values():
                            for multiplexer in totalmultiplexerdict.values():
                                newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                                data_I_need = dosomethingwith(newconfiguration)
                                saveforlateruse_if_useful(data_I_need)
    conn.send('done')

def modconffactory():
    """
    Function to launch all the worker processes and wait until they all complete
    their tasks
    """
    processes = []
    generator = totaltransmitterdict.values()
    for i in range(WORKERS_NUMBER):
        conn, childConn = multiprocessing.Pipe()
        process = multiprocessing.Process(target=modconffactoryProcess, args=(generator, WORKERS_NUMBER, i, childConn))
        process.start()
        processes.append((process, conn))
    # Here we have created, started and saved to a list all the worker processes
    working = True
    finishedProcessesNumber = 0
    try:
        while working:
            for process, conn in processes:
                if conn.poll():  # Check if any messages have arrived from a worker
                    message = conn.recv()
                    if message == 'done':
                        finishedProcessesNumber += 1
                        if finishedProcessesNumber == WORKERS_NUMBER:
                            working = False
    except KeyboardInterrupt:
        print('Aborted')
You can adjust WORKERS_NUMBER to your needs.
Same with multiprocessing.Pool:
import multiprocessing

WORKERS_NUMBER = 8

def modconffactoryProcess(transmitter):
    for reciever in totalrecieverdict.values():
        for processor in totalprocessordict.values():
            for holoarray in totalholoarraydict.values():
                for databus in totaldatabusdict.values():
                    for multiplexer in totalmultiplexerdict.values():
                        newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
                        data_I_need = dosomethingwith(newconfiguration)
                        saveforlateruse_if_useful(data_I_need)

def modconffactory():
    pool = multiprocessing.Pool(WORKERS_NUMBER)
    pool.map(modconffactoryProcess, totaltransmitterdict.values())
You probably would like to use .map_async instead of .map
Both snippets do the same, but I would say in the first one you have more control over the program.
I suppose the second one is the easiest, though :)
But the first one should give you the idea of what is happening in the second one
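If you do switch to .map_async as mentioned above, a minimal sketch reusing the same names (WORKERS_NUMBER, modconffactoryProcess, totaltransmitterdict) could look like this; map_async returns immediately, so the main process is free to do other work before waiting:

import multiprocessing

def modconffactory():
    with multiprocessing.Pool(WORKERS_NUMBER) as pool:
        # map_async returns an AsyncResult right away instead of blocking like map
        async_result = pool.map_async(modconffactoryProcess, totaltransmitterdict.values())
        # ... the main process can do other work here ...
        async_result.wait()  # block only once everything really needs to be finished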
multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html
You can run your function this way:
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)
    print(p.map(f, [1, 2, 3]))
https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers
I'm trying to run a function with multiprocessing. This is the code:
import multiprocessing as mu

output = []

def f(x):
    output.append(x*x)

jobs = []
np = mu.cpu_count()
for n in range(np*500):
    p = mu.Process(target=f, args=(n,))
    jobs.append(p)

running = []
for i in range(np):
    p = jobs.pop()
    running.append(p)
    p.start()

while jobs != []:
    for r in running:
        if r.exitcode == 0:
            try:
                running.remove(r)
                p = jobs.pop()
                p.start()
                running.append(p)
            except IndexError:
                break

print "Done:"
print output
The output is [], while it should be [1, 4, 9, ...]. Does anyone see where I'm making a mistake?
You are using multiprocessing, not threading. So your output list is not shared between the processes.
There are several possible solutions:
Retain most of your program but use a multiprocessing.Queue instead of a list. Let the workers put their results in the queue, and read it from the main program. It will copy data from process to process, so for big chunks of data this will have significant overhead.
You could use shared memory in the form of multiprocessing.Array; see the sketch after this list. This might be the best solution if the processed data is large.
Use a Pool. This takes care of all the process management for you. Just like with a queue, it copies data from process to process. It is probably the easiest to use. IMO this is the best option if the data sent to/from each worker is small.
Use threading so that the output list is shared between threads. Threading in CPython has the restriction that only one thread at a time can be executing Python bytecode, so you might not get as much performance benefit as you'd expect. And unlike the multiprocessing solutions it will not take advantage of multiple cores.
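Picking up the multiprocessing.Array option from the list above, here is a minimal sketch for this exact example in which each worker fills its own slice of a shared result array; the even split of the index range is just one possible choice:

import multiprocessing as mu

def f(arr, start, stop):
    # each worker fills only its own slice, so no locking is required
    for n in range(start, stop):
        arr[n] = n * n

if __name__ == '__main__':
    workers = mu.cpu_count()
    N = workers * 500
    # 'q' = signed 64-bit integers; lock=False because the slices never overlap
    output = mu.Array('q', N, lock=False)
    chunk = N // workers
    bounds = [(i * chunk, N if i == workers - 1 else (i + 1) * chunk) for i in range(workers)]
    procs = [mu.Process(target=f, args=(output, start, stop)) for start, stop in bounds]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(list(output))  # [0, 1, 4, 9, ...]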
Edit:
Thanks to @Roland Smith for pointing this out.
The main problem is the function f(x). When the child processes call it, they are unable to find the output variable (since it is not shared).
Edit:
Just as @cdarke said, in multiprocessing you have to carefully control the shared objects that child processes can access (perhaps with a lock), and that is pretty complicated and hard to debug.
Personally, I suggest using the Pool.map method for this.
For instance, assuming that you run this code directly rather than importing it as a module, your code would be:
import multiprocessing as mu

def f(x):
    return x*x

if __name__ == '__main__':
    np = mu.cpu_count()
    args = [n for n in range(np*500)]

    pool = mu.Pool(processes=np)
    result = pool.map(f, args)
    pool.close()
    pool.join()

    print result
But there is something you must know:
If you run this file directly rather than importing it as a module, the if __name__ == '__main__': guard is important, because Python will load this file as a module for the other processes. If you don't place the function 'f' outside the if __name__ == '__main__': block, the child processes will not be able to find your function 'f'.
Edit: thanks to @Roland Smith for pointing out that we could use a tuple.
If you have more than one argument for the function f, then you might need a tuple to do so, for instance:
def f((x, y)):
    return x*y

args = [(n, 1) for n in range(np*500)]
result = pool.map(f, args)
Or check here for a more detailed discussion.