torch.multiprocessing.Queue yields no speedup - python

My training system consists of a bunch of processes that exchange data in the form of tensors, or lists/dictionaries of tensors. Sharing memory via the torch.multiprocessing module is a well-known technique to speed up such workflows, yet for some reason it does not help my app.
Here's a test script that emulates the system: we create a process and send tensors to it via a queue:
import sys
import time

import torch
from torch.multiprocessing import Process as TorchProcess
from torch.multiprocessing import Queue as TorchQueue

q = TorchQueue()


def torch_shared_mem_process():
    counter = 0
    while True:
        data = q.get()
        counter += 1
        if data is None:
            return
        print('Received data:', len(data), data, counter)


def test_mem_share(share_memory):
    p = TorchProcess(target=torch_shared_mem_process)
    p.start()

    def sample_data():
        return torch.rand([1000, 128, 72, 3], dtype=torch.float)

    start = time.time()
    n = 50
    for i in range(n):
        data = sample_data()
        for data_item in data:
            if share_memory:
                data_item.share_memory_()
        q.put(data)
        print(f'Progress {i}/{n}')

    q.put(None)
    p.join()
    print(f'Finished sending {n} tensor lists!')

    took_seconds = time.time() - start
    return took_seconds


def main():
    no_shared_memory = test_mem_share(share_memory=False)
    with_shared_memory = test_mem_share(share_memory=True)
    print(f'Took {no_shared_memory:.1f} s without shared memory.')
    print(f'Took {with_shared_memory:.1f} s with shared memory.')


if __name__ == '__main__':
    sys.exit(main())
Since I am using torch.multiprocessing, I expected the share_memory=True version to be faster, but in reality it is marginally slower:
Took 10.2 s without shared memory.
Took 11.7 s with shared memory.
Did I misunderstand the way torch.multiprocessing.Queue works?

I believe torch.multiprocessing.Queue already moves tensors to shared memory when transporting them, so data_item.share_memory_() shouldn't speed things up any further.
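A quick way to check that claim, as a minimal sketch (the tensor size is arbitrary), is to look at Tensor.is_shared() on both sides of the queue:

import torch
from torch.multiprocessing import Process, Queue


def consumer(q):
    t = q.get()
    # If the answer above is right, the tensor received here should already
    # report is_shared() == True, even though the sender never called share_memory_().
    print('receiver: is_shared =', t.is_shared())


if __name__ == '__main__':
    q = Queue()
    p = Process(target=consumer, args=(q,))
    p.start()

    t = torch.rand(4, 4)
    print('sender before put: is_shared =', t.is_shared())  # expected: False
    q.put(t)  # the queue moves the underlying storage to shared memory in transit
    p.join()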

Have you solved this issue? I modified your code a little, and there is no change in transfer time when switching between torch.multiprocessing and plain multiprocessing:
import sys
import time

import torch
from torch.multiprocessing import Process
from torch.multiprocessing import Queue
# from multiprocessing import Process
# from multiprocessing import Queue


def torch_shared_mem_process(q):
    while True:
        data = q.get()
        if data is None:
            return
        print('Received data:', len(data), data)


def test_mem_share():
    q = Queue()
    p = Process(target=torch_shared_mem_process, args=(q,))
    p.start()

    def sample_data():
        return torch.zeros([100, 3, 1080, 1920], dtype=torch.float)

    data = sample_data()

    start = time.time()
    q.put(data)
    q.put(None)
    p.join()
    print(f'Finished sending tensor!')

    took_seconds = time.time() - start
    return took_seconds


def main():
    shared_memory = test_mem_share()
    print(f'Took {shared_memory:.1f} s with shared memory.')


if __name__ == '__main__':
    sys.exit(main())
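Since the queue already does the shared-memory transfer, the explicit share_memory_() call only adds work. If the goal is to avoid per-message transport costs altogether, one common pattern is to allocate a shared buffer once, pass it to the worker at startup, and then send only small slot indices through the queue while writing the data in place. A rough sketch, assuming a made-up buffer shape and a single producer/consumer pair (real code would also need to coordinate slot reuse):

import torch
from torch.multiprocessing import Process, Queue


def consumer(buffer, q):
    while True:
        slot = q.get()            # only a small integer travels through the queue
        if slot is None:
            return
        # Both processes see the same shared storage, so this reads the data
        # written by the producer without any copy.
        print('slot', slot, 'sum =', buffer[slot].sum().item())


if __name__ == '__main__':
    # Hypothetical pre-allocated buffer with 4 slots, shared before the worker starts.
    buffer = torch.zeros([4, 128, 72, 3], dtype=torch.float)
    buffer.share_memory_()

    q = Queue()
    p = Process(target=consumer, args=(buffer, q))
    p.start()

    for slot in range(4):                             # each slot is written only once here
        buffer[slot].copy_(torch.rand(128, 72, 3))    # write directly into shared memory
        q.put(slot)

    q.put(None)
    p.join()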

Related

Is this the most I can get from Python multiprocess?

I have data in a text file. Each line is a computation to do. This file has around 100,000,000 lines.
First I load everything into RAM, then I have a method that performs the computation and gives the following results:
def process(data_line):
    # do computation
    return result
Then I call it like this with packets of 2000 lines, and save the result to disk:
POOL_SIZE = 15  # nbcore - 1
PACKET_SIZE = 2000

pool = Pool(processes=POOL_SIZE)
data_lines = util.load_data_lines(to_be_computed_filename)
number_of_packets = int(number_of_lines / PACKET_SIZE)
for i in range(number_of_packets):
    lines_packet = data_lines[:PACKET_SIZE]
    data_lines = data_lines[PACKET_SIZE:]
    results = pool.map(process, lines_packet)
    save_computed_data_to_disk(to_be_computed_filename, results)

# process the last packet, which is smaller
results.extend(pool.map(process, data_lines))
save_computed_data_to_disk(to_be_computed_filename, results)
print("Done")
The problem is that while I am writing to disk, my CPU is computing nothing, even though it has 8 cores. Looking at the task manager, it seems that quite a lot of CPU time is lost.
I have to write to disk after completing my computation because the results are 1000 times larger than the input.
In any case, I would have to write to disk at some point. If time is not lost here, it will be lost later.
What could I do to allow one core to write to disk while still computing with the others? Switch to C?
At this rate I can process 100 million lines in 75 h, but I have 12 billion lines to process, so any improvement is welcome.
example of timings:
Processing packet 2/15 953 of C:/processing/drop_zone\to_be_processed_txt_files\t_to_compute_303620.txt
Launching task and waiting for it to finish...
Task completed, Continuing
Packet was processed in 11.534576654434204 seconds
We are currently going at a rate of 0.002306915330886841 sec/words
Which is 433.47928145051293 words per seconds
Saving in temporary file
Printing writing 5000 computed line to disk took 0.04400920867919922 seconds
saving word to resume from : 06 20 25 00 00
Estimated time for processing the remaining packets is : 51:19:25
Note: SharedMemory is only available on Python >= 3.8, since it first appeared in that version.
Start 3 kinds of processes: Reader, Processor(s), Writer.
Have Reader process read the file incrementally, sharing the result via shared_memory and Queue.
Have the Processor(s) consume the Queue, consume the shared_memory, and return the result(s) via another Queue. Again, as shared_memory.
Have the Writer process consume the second Queue, writing to the destination file.
Have them all communicate, through, say, some Events or a DictProxy, with the MainProcess, which acts as the orchestrator.
Example:
import time
import random
import hashlib
import multiprocessing as MP
from queue import Queue, Empty

# noinspection PyCompatibility
from multiprocessing.shared_memory import SharedMemory
from typing import Dict, List


def readerfunc(
        shm_arr: List[SharedMemory], q_out: Queue, procr_ready: Dict[str, bool]
):
    numshm = len(shm_arr)
    for batch in range(1, 6):
        print(f"Reading batch #{batch}")
        for shm in shm_arr:
            #### Simulated Reading ####
            for j in range(0, shm.size):
                shm.buf[j] = random.randint(0, 255)
            #### ####
            q_out.put((batch, shm))
        # Need to sync here because we're reusing the same SharedMemory,
        # so gotta wait until all processors are done before sending the
        # next batch
        while not q_out.empty() or not all(procr_ready.values()):
            time.sleep(1.0)


def processorfunc(
        q_in: Queue, q_out: Queue, suicide: type(MP.Event()), procr_ready: Dict[str, bool]
):
    pname = MP.current_process().name
    procr_ready[pname] = False
    while True:
        time.sleep(1.0)
        procr_ready[pname] = True
        if q_in.empty() and suicide.is_set():
            break
        try:
            batch, shm = q_in.get_nowait()
        except Empty:
            continue
        print(pname, "got batch", batch)
        procr_ready[pname] = False
        #### Simulated Processing ####
        h = hashlib.blake2b(shm.buf, digest_size=4, person=b"processor")
        time.sleep(random.uniform(5.0, 7.0))
        #### ####
        q_out.put((pname, h.hexdigest()))


def writerfunc(q_in: Queue, suicide: type(MP.Event())):
    while True:
        time.sleep(1.0)
        if q_in.empty() and suicide.is_set():
            break
        try:
            pname, digest = q_in.get_nowait()
        except Empty:
            continue
        print("Writing", pname, digest)
        #### Simulated Writing ####
        time.sleep(random.uniform(3.0, 6.0))
        #### ####
        print("Writing", pname, digest, "done")


def main():
    shm_arr = [
        SharedMemory(create=True, size=1024)
        for _ in range(0, 5)
    ]
    q_read = MP.Queue()
    q_write = MP.Queue()
    procr_ready = MP.Manager().dict()
    poison = MP.Event()
    poison.clear()

    reader = MP.Process(target=readerfunc, args=(shm_arr, q_read, procr_ready))
    procrs = []
    for n in range(0, 3):
        p = MP.Process(
            target=processorfunc, name=f"Proc{n}", args=(q_read, q_write, poison, procr_ready)
        )
        procrs.append(p)
    writer = MP.Process(target=writerfunc, args=(q_write, poison))

    reader.start()
    [p.start() for p in procrs]
    writer.start()

    reader.join()
    print("Reader has ended")

    while not all(procr_ready.values()):
        time.sleep(5.0)
    poison.set()
    [p.join() for p in procrs]
    print("Processors have ended")

    writer.join()
    print("Writer has ended")

    [shm.close() for shm in shm_arr]
    [shm.unlink() for shm in shm_arr]


if __name__ == '__main__':
    main()
You say you have 8 cores, yet you have:
POOL_SIZE = 15  # nbcore - 1
Assuming you want to leave one processor free (presumably for the main process?), why wouldn't this number be 7? But why do you even want to leave a processor free? You are making successive calls to map, and while the main process is waiting for those calls to return it requires no CPU. This is why, if you do not specify a pool size when you instantiate your pool, it defaults to the number of CPUs you have and not that number minus one. I will have more to say about this below.
Since you have a very large, in-memory list, is it possible that you are wasting cycles in your loop by rewriting this list on each iteration? Instead, you can just take a slice of the list and pass that as the iterable argument to map:
POOL_SIZE = 15  # ????
PACKET_SIZE = 2000

data_lines = util.load_data_lines(to_be_computed_filename)
number_of_packets, remainder = divmod(number_of_lines, PACKET_SIZE)
with Pool(processes=POOL_SIZE) as pool:
    offset = 0
    for i in range(number_of_packets):
        results = pool.map(process, data_lines[offset:offset+PACKET_SIZE])
        offset += PACKET_SIZE
        save_computed_data_to_disk(to_be_computed_filename, results)
    if remainder:
        results = pool.map(process, data_lines[offset:offset+remainder])
        save_computed_data_to_disk(to_be_computed_filename, results)
print("Done")
Between each call to map, the main process is writing out the results to to_be_computed_filename. In the meantime, every process in your pool is sitting idle. The writing should be handed off to another worker (in the code below, a thread running under the main process):
from multiprocessing import Pool
import queue
import threading

POOL_SIZE = 15  # ????
PACKET_SIZE = 2000

data_lines = util.load_data_lines(to_be_computed_filename)
number_of_packets, remainder = divmod(number_of_lines, PACKET_SIZE)


def save_data(q):
    while True:
        results = q.get()
        if results is None:
            return  # signal to terminate
        save_computed_data_to_disk(to_be_computed_filename, results)


q = queue.Queue()
t = threading.Thread(target=save_data, args=(q,))
t.start()
with Pool(processes=POOL_SIZE) as pool:
    offset = 0
    for i in range(number_of_packets):
        results = pool.map(process, data_lines[offset:offset+PACKET_SIZE])
        offset += PACKET_SIZE
        q.put(results)
    if remainder:
        results = pool.map(process, data_lines[offset:offset+remainder])
        q.put(results)
q.put(None)
t.join()  # wait for thread to terminate
print("Done")
I've chosen to run save_data in a thread of the main process. This could also be another process, in which case you would need to use a multiprocessing.Queue instance. But I figured the main process thread is mostly waiting for map to complete, so there would not be much competition for the GIL. Now, if you do not leave a processor free for the threading job, save_data may end up doing most of the saving only after all of the results have been created. You would need to experiment a bit with this.
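For completeness, a rough sketch of the separate-process variant (it assumes the question's process, util.load_data_lines and save_computed_data_to_disk are importable by the writer, and that one core is deliberately left free for it):

from multiprocessing import Pool, Process, Queue

POOL_SIZE = 7        # assumption: leave one core for the writer process
PACKET_SIZE = 2000


def save_data(q):
    while True:
        results = q.get()
        if results is None:     # signal to terminate
            return
        save_computed_data_to_disk(to_be_computed_filename, results)


if __name__ == '__main__':
    data_lines = util.load_data_lines(to_be_computed_filename)
    number_of_packets, remainder = divmod(len(data_lines), PACKET_SIZE)

    q = Queue()                                   # multiprocessing.Queue, not queue.Queue
    writer = Process(target=save_data, args=(q,))
    writer.start()

    with Pool(processes=POOL_SIZE) as pool:
        offset = 0
        for i in range(number_of_packets):
            q.put(pool.map(process, data_lines[offset:offset + PACKET_SIZE]))
            offset += PACKET_SIZE
        if remainder:
            q.put(pool.map(process, data_lines[offset:]))

    q.put(None)
    writer.join()
    print("Done")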
Ideally, I would also modify the reading of the input file so as not to have to read it all into memory first, but rather read it line by line, yielding 2000-line chunks and submitting those as jobs for map to process:
from multiprocessing import Pool
import queue
import threading

POOL_SIZE = 15  # ????
PACKET_SIZE = 2000


def save_data(q):
    while True:
        results = q.get()
        if results is None:
            return  # signal to terminate
        save_computed_data_to_disk(to_be_computed_filename, results)


def read_data():
    """
    yield lists of PACKET_SIZE lines
    """
    lines = []
    with open(some_file, 'r') as f:
        for line in iter(f.readline, ''):
            lines.append(line)
            if len(lines) == PACKET_SIZE:
                yield lines
                lines = []
    if lines:
        yield lines


q = queue.Queue()
t = threading.Thread(target=save_data, args=(q,))
t.start()
with Pool(processes=POOL_SIZE) as pool:
    for l in read_data():
        results = pool.map(process, l)
        q.put(results)
q.put(None)
t.join()  # wait for thread to terminate
print("Done")
I made two assumptions: the writing is I/O-bound rather than CPU-bound, meaning that throwing more cores at the writing would not improve performance; and the process function contains some heavy computation.
I would approach it differently:
Split the large list into a list of lists
Feed those to the worker processes
Store the total result
Here is the example code:
import multiprocessing as mp

data_lines = [0] * 10000  # read it from file
size = 2000

# Split the list into a list of lists (with chunk size `size`)
work = [data_lines[i:i + size] for i in range(0, len(data_lines), size)]


def process(data):
    result = len(data)  # do something fancy
    return result


with mp.Pool() as p:
    result = p.map(process, work)

save_computed_data_to_disk(file_name, result)
As a side note: you may also want to look into numpy or pandas (depending on the data), because it sounds like you want to go in that direction.
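For example, if the lines were CSV-like, pandas could stream the file in fixed-size chunks and hand each chunk to a pool; a small sketch (the file name and the process_frame function are hypothetical):

import multiprocessing as mp
import pandas as pd


def process_frame(df):
    return len(df)  # hypothetical per-chunk computation


if __name__ == '__main__':
    # read_csv with chunksize yields DataFrames of 2000 rows at a time,
    # so the whole file never has to sit in memory at once.
    chunks = pd.read_csv('to_be_computed.txt', chunksize=2000)
    with mp.Pool() as pool:
        results = pool.map(process_frame, chunks)
    print('processed', len(results), 'chunks')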
The first thing that comes to mind is to run the saving function in a thread, so that waiting on disk writes no longer blocks the computation. Like so:
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor, ALL_COMPLETED

executor = ThreadPoolExecutor(max_workers=2)
saving_futures = []

future = executor.submit(save_computed_data_to_disk, to_be_computed_filename, results)
saving_futures.append(future)
...
concurrent.futures.wait(saving_futures, return_when=ALL_COMPLETED)  # wait for everything to be saved to disk after processing
print("Done")

Process pool results without waiting for all tasks to finish

from multiprocessing import Pool
from functools import partial
from time import sleep
import random
import string
import uuid
import os
import glob


def task_a(param1, param2, mydata):
    thread_id = str(uuid.uuid4().hex)  # this may not be robust enough to guarantee no collisions, address
    output_filename = ''.join([str(thread_id), '.txt'])

    # part 1 - create output file for task_b to use
    with open(output_filename, 'w') as outfile:
        for line in mydata:
            outfile.write(line)

    # part 2 - do some extra stuff (whilst task_b is running)
    sleep(5)
    print('Task A finished')

    return output_filename  # not interested in return val


def task_b(expected_num_files):
    processed_files = 0
    while processed_files < expected_num_files:
        print('I am task_b, waiting for {} files ({} so far)'.format(expected_num_files, processed_files))
        path_to_search = ''
        for filename in glob.iglob(path_to_search + '*.txt', recursive=True):
            print('Got file : {}'.format(filename))
            # would do something complicated here
            os.rename(filename, filename + '.done')
            processed_files += 1
        sleep(10)


if __name__ == '__main__':
    param1 = ''  # dummy variable, need to support in solution
    param2 = ''  # dummy variable, need to support in solution
    num_workers = 2
    full_data = [[random.choice(string.ascii_lowercase) for _ in range(5)] for _ in range(100)]
    print(full_data)

    for i in range(0, len(full_data), num_workers):
        print('Going to process {}'.format(full_data[i:i+num_workers]))
        p = Pool(num_workers)
        task_a_func = partial(task_a, param1, param2)
        results = p.map(task_a_func, full_data[i:i+num_workers])
        p.close()
        p.join()
        task_b(expected_num_files=num_workers)  # want this running sooner
        print('Iteration {} complete'.format(i))
    # want to wait for task_a's and task_b to finish
I'm having trouble scheduling these tasks to run concurrently.
task_a is a multiprocessing pool that produces an output file partway through its execution.
task_b MUST process the output files sequentially (they can be in any order, as soon as they are available), WHILST task_a continues to run (it will no longer change the output file).
The next iteration must only start when all the task_a's have completed AND task_b has completed.
The toy code I have posted obviously waits for the task_a's to fully complete before task_b is started (which is not what I want).
I have looked at multiprocessing / subprocess etc. but cannot find a way to launch both the pool and the single task_b process concurrently AND wait for BOTH to finish.
task_b is written as if it could be changed to an external script, but I am still stuck on how to manage the execution.
Should I effectively merge the code from task_b into task_a and somehow pass a flag to ensure one worker per pool 'runs the task_b code' via an if/else - at least then I would just be waiting on the pool to complete?
You can use an interprocess queue to communicate the filenames between task a and task b.
Also, initializing the pool repeatedly inside the loop is harmful and unnecessarily slow.
It's better to initialize the pool once at the beginning.
from multiprocessing import Pool, Manager, Event
from functools import partial
from time import sleep
import random
import string
import uuid
import os
import glob


def task_a(param1, param2, queue, mydata):
    thread_id = str(uuid.uuid4().hex)
    output_filename = ''.join([str(thread_id), '.txt'])
    output_filename = 'data/' + output_filename

    with open(output_filename, 'w') as outfile:
        for line in mydata:
            outfile.write(line)
    print(f'{thread_id}: Task A file write complete for data {mydata}')
    queue.put(output_filename)
    print('Task A finished')


def task_b(queue, num_workers, data_size, event_task_b_done):
    print('Task b started!')
    processed_files = 0
    while True:
        filename = queue.get()
        if filename == 'QUIT':
            # Whenever you want task_b to quit, just push 'QUIT' to the queue
            print('Task b quitting')
            break
        print('Got file : {}'.format(filename))
        os.rename(filename, filename + '.done')
        processed_files += 1
        print(f'Have processed {processed_files} so far!')

        if (processed_files % num_workers == 0) or (processed_files == data_size):
            event_task_b_done.set()


if __name__ == '__main__':
    param1 = ''  # dummy variable, need to support in solution
    param2 = ''  # dummy variable, need to support in solution
    num_workers = 2
    data_size = 100
    full_data = [[random.choice(string.ascii_lowercase) for _ in range(5)] for _ in range(data_size)]

    mgr = Manager()
    queue = mgr.Queue()
    event_task_b_done = mgr.Event()

    # One extra worker for task b
    p = Pool(num_workers + 1)
    p.apply_async(task_b, args=(queue, num_workers, data_size, event_task_b_done))

    task_a_func = partial(task_a, param1, param2, queue)
    for i in range(0, len(full_data), num_workers):
        data = full_data[i:i+num_workers]
        print('Going to process {}'.format(data))
        p.map_async(task_a_func, full_data[i:i+num_workers])
        print(f'Waiting for task b to process all {num_workers} files...')
        event_task_b_done.wait()
        event_task_b_done.clear()
        print('Iteration {} complete'.format(i))

    queue.put('QUIT')
    p.close()
    p.join()
    exit(0)

Threading queue hangs in Python

I am trying to make a parser multi-threaded via a Queue. It seems to work, but my Queue is hanging. I'd appreciate it if someone could tell me how to fix this, since I have rarely written multi-threaded code.
This code reads from the Q:
from silk import *
import json
import datetime
import pandas
import Queue
from threading import Thread

l = []
q = Queue.Queue()


def parse_record():
    d = {}
    while not q.empty():
        rec = q.get()
        d['timestamp'] = rec.stime.strftime("%Y-%m-%d %H:%M:%S")
        # ... many ops like this
        d['dport'] = rec.dport
        l.append(d)  # l is global
And this fills the Q:
def parse_records():
    ffile = '/tmp/query.rwf'
    flows = SilkFile(ffile, READ)
    numthreads = 2

    # fill queue
    for rec in flows:
        q.put(rec)

    # work on Queue
    for i in range(numthreads):
        t = Thread(target=parse_record)
        t.daemon = True
        t.start()

    # blocking
    q.join()

    # never reached
    data_df = pandas.DataFrame.from_records(l)
    return data_df
I only call parse_records() in my main. It never terminates.
The Queue.empty doc says:
...if empty() returns False it doesn’t guarantee that a subsequent call to get() will not block.
As a minimum you should use get_nowait or risk data loss. But more importantly, the join will only release when all of the queued items have been marked complete with a Queue.task_done call:
If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).
As a side note, l.append(d) is not atomic and should be protected with a lock.
from silk import *
import json
import datetime
import pandas
import Queue
from threading import Thread, Lock

l = []
l_lock = Lock()
q = Queue.Queue()


def parse_record():
    d = {}
    while 1:
        try:
            rec = q.get_nowait()
            d['timestamp'] = rec.stime.strftime("%Y-%m-%d %H:%M:%S")
            # ... many ops like this
            d['dport'] = rec.dport
            with l_lock:
                l.append(d)  # l is global
            q.task_done()
        except Queue.Empty:
            return
You could shorten your code considerably by using a thread pool from the standard libs.
from silk import *
import json
import datetime
import pandas
import multiprocessing.pool


def parse_record(rec):
    d = {}
    d['timestamp'] = rec.stime.strftime("%Y-%m-%d %H:%M:%S")
    # ... many ops like this
    d['dport'] = rec.dport
    return d


def parse_records():
    ffile = '/tmp/query.rwf'
    flows = SilkFile(ffile, READ)
    pool = multiprocessing.pool.ThreadPool(2)
    data_df = pandas.DataFrame.from_records(pool.map(parse_record, flows))
    pool.close()
    return data_df

Producer consumer in python locks on get

I'm struggling to make a producer-consumer queue in Python 3. I can't get my consumer to wake up:
from multiprocessing import Process, Queue
import time

def consumer(q):
    while(True):
        data=q.get()
        if (data[0]==False):
            print("Killing")
            return
        print((data[1]))
        time.sleep(1)

maxitems=3
q = Queue(maxitems)
p = Process(target=consumer, args=(q,))
p.start()
for idx in range(0,10):
    q.put((True,idx))
    #Where idx would normally be a chunk of data
p.put((False,False))
p.join()
Output:
0
then it locks...
How do I get the consumer thread to wake up when I push data to it?
Launch:
python3.3 tryit.py
Built with:
[ebuild R ] dev-lang/python-3.3.5-r1:3.3::gentoo USE="gdbm ipv6 ncurses readline ssl threads xml -build -doc -examples -hardened -sqlite -tk -wininst" 0 KiB
p.put((False,False)) is wrong (it should be q.put), and there is some non-idiomatic Python, but otherwise it's fine:
from multiprocessing import Process, Queue
import time

def consumer(q):
    while True:
        data=q.get()
        if data[0]==False:
            print("Killing")
            break
        print(data[1])
        time.sleep(1)

maxitems=3
q = Queue(maxitems)
p = Process(target=consumer, args=(q,))
p.start()
for idx in range(0,10):
    q.put((True,idx))
    #Where idx would normally be a chunk of data
q.put((False,False))
p.join()
This needs to run from under a main guard: on platforms where multiprocessing spawns a fresh interpreter (e.g. Windows), the module is re-imported in the child, so the process-creating code must be protected by if __name__ == '__main__':
from multiprocessing import Process, Queue
import time

def consumer(q):
    while(True):
        data=q.get()
        if (data[0]==False):
            print("Killing")
            return
        print((data[1]))
        time.sleep(1)

if __name__ == '__main__':
    maxitems=3
    q = Queue(maxitems)
    p = Process(target=consumer, args=(q,))
    p.start()
    for idx in range(0,10):
        q.put((True,idx))
        #Where idx would normally be a chunk of data
    q.put((False,False))
    p.join()

Write data to hdf file using multiprocessing

This seems like a simple issue but I can't get my head around it.
I have a simulation which runs in a double for loop and writes the results to an HDF file. A simple version of this program is shown below:
import tables as pt

a = range(10)
b = range(5)

def Simulation():
    hdf = pt.openFile('simulation.h5', mode='w')
    for ii in a:
        print(ii)
        hdf.createGroup('/', 'A%s' % ii)
        for i in b:
            hdf.createArray('/A%s' % ii, 'B%s' % i, [ii, i])
    hdf.close()
    return

Simulation()
This code does exactly what I want but since the process can take quite a while to run I tried to use the multiprocessing module and use the following code:
import multiprocessing
import tables as pt

a = range(10)
b = range(5)

def Simulation(ii):
    hdf = pt.openFile('simulation.h5', mode='w')
    print(ii)
    hdf.createGroup('/', 'A%s' % ii)
    for i in b:
        hdf.createArray('/A%s' % ii, 'B%s' % i, [ii, i])
    hdf.close()
    return

if __name__ == '__main__':
    jobs = []
    for ii in a:
        p = multiprocessing.Process(target=Simulation, args=(ii,))
        jobs.append(p)
        p.start()
This, however, only writes the last simulation to the HDF file; somehow it overwrites all the other groups.
Each time you open a file in write (w) mode, a new file is created -- so the contents of the file are lost if it already exists. Only the last file handle can successfully write to the file. Even if you changed that to append mode, you should not try to write to the same file from multiple processes -- the output will get garbled if two processes try to write at the same time.
Instead, have all the worker processes put output in a queue, and have a single dedicated process (either a subprocess or the main process) handle the output from the queue and write to the file:
import multiprocessing as mp
import tables as pt

num_arrays = 100
num_processes = mp.cpu_count()
num_simulations = 1000
sentinel = None


def Simulation(inqueue, output):
    for ii in iter(inqueue.get, sentinel):
        output.put(('createGroup', ('/', 'A%s' % ii)))
        for i in range(num_arrays):
            output.put(('createArray', ('/A%s' % ii, 'B%s' % i, [ii, i])))


def handle_output(output):
    hdf = pt.openFile('simulation.h5', mode='w')
    while True:
        args = output.get()
        if args:
            method, args = args
            getattr(hdf, method)(*args)
        else:
            break
    hdf.close()


if __name__ == '__main__':
    output = mp.Queue()
    inqueue = mp.Queue()
    jobs = []
    proc = mp.Process(target=handle_output, args=(output, ))
    proc.start()
    for i in range(num_processes):
        p = mp.Process(target=Simulation, args=(inqueue, output))
        jobs.append(p)
        p.start()
    for i in range(num_simulations):
        inqueue.put(i)
    for i in range(num_processes):
        # Send the sentinel to tell Simulation to end
        inqueue.put(sentinel)
    for p in jobs:
        p.join()
    output.put(None)
    proc.join()
For comparison, here is a version which uses mp.Pool:
import multiprocessing as mp
import tables as pt

num_arrays = 100
num_processes = mp.cpu_count()
num_simulations = 1000


def Simulation(ii):
    result = []
    result.append(('createGroup', ('/', 'A%s' % ii)))
    for i in range(num_arrays):
        result.append(('createArray', ('/A%s' % ii, 'B%s' % i, [ii, i])))
    return result


def handle_output(result):
    hdf = pt.openFile('simulation.h5', mode='a')
    for args in result:
        method, args = args
        getattr(hdf, method)(*args)
    hdf.close()


if __name__ == '__main__':
    # clear the file
    hdf = pt.openFile('simulation.h5', mode='w')
    hdf.close()

    pool = mp.Pool(num_processes)
    for i in range(num_simulations):
        pool.apply_async(Simulation, (i, ), callback=handle_output)
    pool.close()
    pool.join()
It looks simpler, doesn't it? However, there is one significant difference. The original code used output.put to send args to handle_output, which was running in its own subprocess; handle_output would take args from the output queue and handle them immediately. With the Pool code above, Simulation accumulates a whole bunch of args in result, and result is not sent to handle_output until after Simulation returns.
If Simulation takes a long time, there will be a long waiting period while nothing is being written to simulation.h5.
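A middle ground, as a rough sketch, is to let the pool stream results back as they complete (for example with imap_unordered) and have the main process write each one immediately, so writing overlaps with the remaining simulations while keeping a single writer:

import multiprocessing as mp
import tables as pt

num_arrays = 100
num_simulations = 1000


def Simulation(ii):
    result = [('createGroup', ('/', 'A%s' % ii))]
    for i in range(num_arrays):
        result.append(('createArray', ('/A%s' % ii, 'B%s' % i, [ii, i])))
    return result


if __name__ == '__main__':
    hdf = pt.openFile('simulation.h5', mode='w')
    with mp.Pool(mp.cpu_count()) as pool:
        # imap_unordered yields each simulation's result as soon as it is ready,
        # so the single writer (the main process) starts writing long before
        # the last simulation has finished.
        for result in pool.imap_unordered(Simulation, range(num_simulations)):
            for method, args in result:
                getattr(hdf, method)(*args)
    hdf.close()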
