For example:
I have a controller script.
I have a worker script.
I have 50 Python objects that have to be passed to the worker script.
I want them to run in parallel.
The worker script has its own parallelisation of some database fetches, which I achieve with:
p = Pool(processes=NUM_PROCS)
results = p.starmap(db_fetch, db_fetch_arguments)
p.close()
p.join()
What is the most Pythonic way to pass my 50 arguments (Python objects, not string arguments) into my worker, run it in parallel, and avoid any issues when the worker tries to spawn more child processes?
Thank you in advance.
Edit 1:
from multiprocessing import Pool
import os

def worker(num: int):
    num_list = list(range(num))
    # print('worker start')
    with Pool() as p:
        p.map(printer, num_list)

def printer(num: int):
    # print('printer')
    print(f"Printing num {num} - child: {os.getpid()} - parent: {os.getppid()}")

if __name__ == '__main__':
    with Pool(4) as controller_pool:
        controller_pool.map(worker, [1, 2, 3])
        print('here')
Here I am getting the error: AssertionError: daemonic processes are not allowed to have children
I used ProcessPoolExecutor from concurrent.futures as my outer controller pool, and a normal multiprocessing.Pool inside the workers.
Thanks.
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor
import os

def worker(num: int):
    num_list = list(range(num))
    # print('worker start')
    with Pool() as p:
        p.map(printer, num_list)

def printer(num: int):
    # print('printer')
    print(f"Printing num {num} - child: {os.getpid()} - parent: {os.getppid()}")

if __name__ == '__main__':
    with ProcessPoolExecutor(4) as controller_pool:
        controller_pool.map(worker, [1, 2, 3])
        print('here')
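A small follow-up on the original question of passing 50 Python objects rather than ints: nothing extra is needed, since map pickles its arguments, so any picklable object can be handed to the workers directly. A minimal sketch of the same pattern, where Job and db_fetch are hypothetical names of mine, not from the question:

from concurrent.futures import ProcessPoolExecutor
from dataclasses import dataclass
from multiprocessing import Pool

@dataclass
class Job:                   # hypothetical stand-in for the 50 objects
    name: str
    queries: list

def db_fetch(query):         # placeholder for the real database fetch
    return f"{query}: done"

def worker(job: Job):
    # Inner pool for the worker's own parallel fetches.
    with Pool() as p:
        return job.name, p.map(db_fetch, job.queries)

if __name__ == '__main__':
    jobs = [Job(f"job{i}", [f"q{i}a", f"q{i}b"]) for i in range(50)]
    with ProcessPoolExecutor(max_workers=4) as controller_pool:
        for name, rows in controller_pool.map(worker, jobs):
            print(name, rows)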
Related
I'd like to run multiple processes concurrently, but with Process I cannot limit the number of processes running at a time, so my computer becomes unusable for anything else.
In my problem I have to run main_function for every item in my_dataset. Here is a short sample of my code; is it possible to limit the number of processes running at a time?
from multiprocessing import Process

def my_function(my_dataset):
    processes = []
    for data in my_dataset:
        transformed_data = transform(data)
        p = Process(target=main_function, args=(data, transformed_data))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
You can use multiprocessing's Pool:
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
from multiprocessing import Pool

names = ["Joe", "James", "Jimmy"] * 10

def print_name(name):
    print(f"Got Name: {name}")

def runner():
    p = Pool(4)
    p.map(print_name, names)

if __name__ == "__main__":
    runner()
The Python docs say that starmap blocks until the result is ready.
Does this mean that we can safely update a variable in the main process with the results of the child processes, like this?
from multiprocessing import Pool, cpu_count
from multiprocessing import Process, Manager

all_files = list(range(100))

def create_one_training_row(num):
    return num

def process():
    all_result = []
    with Pool(processes=cpu_count()) as pool:
        for item in pool.starmap(create_one_training_row, zip(all_files)):
            all_result.append(item)
    return all_result

if __name__ == '__main__':
    ans = process()
    print(ans)
    print(sum(ans))
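A note on the question above (my reading, not an answer given in this thread): starmap runs in the parent process and blocks until every result is ready, so the loop is safe; the children never touch all_result, and it is the parent that appends the returned values. A minimal sketch of the equivalent, more direct form:

from multiprocessing import Pool, cpu_count

all_files = list(range(100))

def create_one_training_row(num):
    return num

def process():
    # starmap blocks until all results are ready and returns them
    # as an ordinary list in the parent process.
    with Pool(processes=cpu_count()) as pool:
        return pool.starmap(create_one_training_row, zip(all_files))

if __name__ == '__main__':
    ans = process()
    print(sum(ans))  # 0 + 1 + ... + 99 == 4950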
I know the basics of multiprocessing pools, and I use the apply_async() function to avoid blocking. My problem code looks like this:
from multiprocessing import Pool, Queue
import time

q = Queue(maxsize=20)
script = "my_path/my_exec_file"

def initQueue():
    ...

def test_func(queue):
    print 'Coming'
    while True:
        do_sth
        ...

if __name__ == '__main__':
    initQueue()
    pool = Pool(processes=3)
    for i in xrange(11, 20):
        result = pool.apply_async(test_func, (q,))
    pool.close()
    while True:
        if q.empty():
            print 'Queue is empty, quit'
            break
        print 'Main Process Listening'
        time.sleep(2)
The output is always 'Main Process Listening'; I never see the word 'Coming'.
The code above has no syntax errors and raises no exceptions.
Can anyone help? Thanks!
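One observation of mine, not stated in the question: a plain multiprocessing.Queue cannot be pickled and passed to Pool workers as an argument, and because apply_async does not raise in the caller, this kind of failure is easy to miss; calling .get() on each AsyncResult is the usual way to surface worker errors. A minimal Python 3 sketch of the same idea using a Manager().Queue(), which can be passed as an ordinary argument:

from multiprocessing import Pool, Manager
from queue import Empty

def test_func(q):
    # Runs in a pool worker; drains whatever is currently in the queue.
    print('Coming')
    while True:
        try:
            item = q.get_nowait()
        except Empty:
            break
        print('Got', item)

if __name__ == '__main__':
    manager = Manager()
    q = manager.Queue(maxsize=20)
    for i in range(11, 20):
        q.put(i)
    with Pool(processes=3) as pool:
        results = [pool.apply_async(test_func, (q,)) for _ in range(3)]
        for r in results:
            r.get()  # re-raises any exception raised inside the worker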
In the program below I have posted 5 jobs to the queue, but have created only 3 threads. When I run the program, only 3 jobs are completed. How am I supposed to complete all 5 jobs with only 3 threads? Is there a way to make a thread that has completed its job take the next job?
import time
import Queue
import threading

class worker(threading.Thread):
    def __init__(self, qu):
        threading.Thread.__init__(self)
        self.que = qu
    def run(self):
        print "Going to sleep.."
        time.sleep(self.que.get())
        print "Slept .."
        self.que.task_done()

q = Queue.Queue()
for j in range(3):
    work = worker(q)
    work.setDaemon(True)
    work.start()
for i in range(5):
    q.put(1)
q.join()
print "done!!"
You need to have your worker threads run in a loop. You can use a sentinel value (like None or a custom class) to tell the workers to shut down after you've put all your actual work items in the queue:
import time
import Queue
import threading

class worker(threading.Thread):
    def __init__(self, qu):
        threading.Thread.__init__(self)
        self.que = qu
    def run(self):
        for item in iter(self.que.get, None):  # Calls self.que.get() until None is returned, at which point the loop breaks.
            print "Going to sleep.."
            time.sleep(item)
            print "Slept .."
            self.que.task_done()
        self.que.task_done()

q = Queue.Queue()
for j in range(3):
    work = worker(q)
    work.setDaemon(True)
    work.start()
for i in range(5):
    q.put(1)
for i in range(3):  # Shut down all the workers
    q.put(None)
q.join()
print "done!!"
Another option would be to use a multiprocessing.dummy.Pool, which is a thread pool that Python manages for you:
import time
from multiprocessing.dummy import Pool

def run(i):
    print "Going to sleep..."
    time.sleep(i)
    print "Slept .."

p = Pool(3)  # 3 threads in the pool
p.map(run, range(5))  # Calls run(i) for each element i in range(5)
p.close()
p.join()
print "done!!"
I am trying to emulate a scenario where a child spawned by a Python multiprocessing pool gets killed. The subprocess never returns, but I would like the parent to be notified in such a scenario. The test code I am using is:
import multiprocessing as mp
import time
import os

result_map = {}

def foo_pool(x):
    print x, ' : ', os.getpid()
    pid = os.getpid()
    if x == 1:
        os.kill(pid, 9)
    return x

result_list = []

def log_result(result):
    print 'callback', result

def apply_async_with_callback():
    print os.getpid()
    pool = mp.Pool()
    for i in range(2):
        result_map[i] = pool.apply_async(foo_pool, args=(i,), callback=log_result)
    pool.close()
    pool.join()
    for k, v in result_map.iteritems():
        print k, ' : ', v.successful()
    print(result_list)

if __name__ == '__main__':
    apply_async_with_callback()
multiprocessing.Pool does not expose any mechanism to notify the parent process about the termination of a child process.
Projects like billiard or pebble might do what you're looking for.
Keep in mind that there's no way to intercept a SIGKILL signal, so using signal handlers is pointless.
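For completeness, a minimal sketch of my own (not from the answer above, standard library only): concurrent.futures.ProcessPoolExecutor does notice an abrupt worker death, marking the pool as broken and raising BrokenProcessPool from the affected futures:

import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool

def foo_pool(x):
    if x == 1:
        os.kill(os.getpid(), 9)  # simulate the child being killed (SIGKILL, POSIX only)
    return x

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(foo_pool, i) for i in range(2)]
        for i, fut in enumerate(futures):
            try:
                print(i, ':', fut.result())
            except BrokenProcessPool:
                print(i, ': a worker died before this result was ready')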