The Python docs say that starmap blocks until the result is ready.
Does this mean that we can safely update a variable in the main process with the results of the child processes, like this?
from multiprocessing import Pool, cpu_count
from multiprocessing import Process, Manager

all_files = list(range(100))

def create_one_training_row(num):
    return num

def process():
    all_result = []
    with Pool(processes=cpu_count()) as pool:
        for item in pool.starmap(create_one_training_row, zip(all_files)):
            all_result.append(item)
    return all_result

if __name__ == '__main__':
    ans = process()
    print(ans)
    print(sum(ans))
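As a side note, a minimal sketch of the same idea (not the original code): starmap runs in the parent process and returns the complete list of results, so the collection loop can be written more directly:

from multiprocessing import Pool, cpu_count

all_files = list(range(100))

def create_one_training_row(num):
    return num

def process():
    with Pool(processes=cpu_count()) as pool:
        # starmap blocks in the parent until every result is ready,
        # so assigning its return value here is safe
        return pool.starmap(create_one_training_row, zip(all_files))

if __name__ == '__main__':
    print(sum(process()))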
Related
e.g.
I have a controller script.
I have a worker script.
I have 50 Python objects that have to be passed to the worker script.
I want them to run in parallel.
The worker script has its own parallelisation of some database fetches.
This I achieve by:
p = Pool(processes=NUM_PROCS)
results = p.starmap(db_fetch, db_fetch_arguments)
p.close()
p.join()
What's the most Pythonic way I can pass my 50 arguments (Python objects, not string arguments) into my worker and make it run in parallel, without running into issues when the worker tries to spawn more child processes?
Thank you in advance.
Edit 1:
from multiprocessing import Pool
import os

def worker(num: int):
    num_list = list(range(num))
    # print('worker start')
    with Pool() as p:
        p.map(printer, num_list)

def printer(num: int):
    # print('printer')
    print(f"Printing num {num} - child: {os.getpid()} - parent: {os.getppid()}")

if __name__ == '__main__':
    with Pool(4) as controller_pool:
        controller_pool.map(worker, [1, 2, 3])
    print('here')
Here I am getting the error: AssertionError: daemonic processes are not allowed to have children
I used ProcessPoolExecutor from concurrent.futures as my outer controller pool. Inside it, I used a normal multiprocessing.Pool.
Thanks.
from multiprocessing import Pool
from concurrent.futures import ProcessPoolExecutor
import os

def worker(num: int):
    num_list = list(range(num))
    # print('worker start')
    with Pool() as p:
        p.map(printer, num_list)

def printer(num: int):
    # print('printer')
    print(f"Printing num {num} - child: {os.getpid()} - parent: {os.getppid()}")

if __name__ == '__main__':
    with ProcessPoolExecutor(4) as controller_pool:
        controller_pool.map(worker, [1, 2, 3])
    print('here')
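For background, a small sketch of my own (not from the question) that only prints the daemon flag of each kind of worker; on current CPython, multiprocessing.Pool workers are daemonic while ProcessPoolExecutor workers are not, which is why only the latter are allowed to start child processes:

from multiprocessing import Pool, current_process
from concurrent.futures import ProcessPoolExecutor

def show_daemon_flag(_):
    # report whether the worker process running this call is daemonic
    return current_process().daemon

if __name__ == '__main__':
    with Pool(2) as p:
        print(p.map(show_daemon_flag, [0]))         # expected: [True]
    with ProcessPoolExecutor(2) as ex:
        print(list(ex.map(show_daemon_flag, [0])))  # expected: [False]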
This is my Python code. I am trying to get the returned value (aa1) from print_cube().
Is there a way to get the value of aa1 inside main()? I have to use multiprocessing to call other functions as well.
import multiprocessing

def print_cube(num):
    aa1 = num * num * num
    return aa1

def main():
    # creating processes
    p1 = multiprocessing.Process(target=print_cube, args=(10, ))
    p1.start()

main()
Use multiprocessing.Pool when you want to retrieve return values.
from multiprocessing import Pool

def print_cube(num):
    aa1 = num * num * num
    return aa1

def main():
    with Pool(5) as p:
        results = p.map(print_cube, range(10, 15))
    print(results)

if __name__ == "__main__":
    main()
You can use Queue from multiprocessing, then pass it to print_cube() as shown below:
from multiprocessing import Process, Queue

def print_cube(num, q):
    aa1 = num * num * num
    q.put(aa1)

def main():
    queue = Queue()
    p1 = Process(target=print_cube, args=(10, queue))
    p1.start()
    print(queue.get())  # 1000
    p1.join()

if __name__ == '__main__':
    main()
This is the result:
1000
Be careful: if you use the standard-library queue module with a process, as below, the program doesn't work properly:
import queue
queue = queue.Queue()
So just use Queue from the multiprocessing module with processes, as I did in the code above:
from multiprocessing import Queue
queue = Queue()
I want to use multiprocessing to do the following:
class myClass:
    def proc(self):
        # processing random numbers
        return a

    def gen_data(self):
        with Pool(cpu_count()) as q:
            data = q.map(self.proc, [_ for i in range(cpu_count())])  # What is the correct approach?
        return data
Try this:
def proc(self, i):
    # processing random numbers
    return a

def gen_data(self):
    with Pool(cpu_count()) as q:
        data = q.map(self.proc, [i for i in range(cpu_count())])  # What is the correct approach?
    return data
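For completeness, a sketch of how this fix could be wired up end to end; the class name MyClass, the randint body, and the __main__ guard are assumptions of mine, not part of the original snippet:

from multiprocessing import Pool, cpu_count
from random import randint

class MyClass:
    def proc(self, i):
        # processing random numbers; 'i' exists only so map has an argument to pass
        return randint(1, 10)

    def gen_data(self):
        with Pool(cpu_count()) as q:
            data = q.map(self.proc, [i for i in range(cpu_count())])
        return data

if __name__ == '__main__':
    print(MyClass().gen_data())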
Since you don't have to pass an argument to the processes, there's no reason to use map; just call apply_async() as many times as needed.
Here's what I'm saying:
from multiprocessing import cpu_count
from multiprocessing.pool import Pool
from random import randint

class MyClass:
    def proc(self):
        # processing random numbers
        return randint(1, 10)

    def gen_data(self, num_procs):
        with Pool() as pool:  # The default pool size will be the number of cpus.
            results = [pool.apply_async(self.proc) for _ in range(num_procs)]
            pool.close()
            pool.join()  # Wait until all worker processes exit.
            return [result.get() for result in results]  # Gather results.

if __name__ == '__main__':
    obj = MyClass()
    print(obj.gen_data(8))
I know the basic usage of multiprocessing pools, and I use apply_async() to avoid blocking. My problem code looks like this:
from multiprocessing import Pool, Queue
import time

q = Queue(maxsize=20)
script = "my_path/my_exec_file"

def initQueue():
    ...

def test_func(queue):
    print 'Coming'
    while True:
        do_sth
        ...

if __name__ == '__main__':
    initQueue()
    pool = Pool(processes=3)
    for i in xrange(11, 20):
        result = pool.apply_async(test_func, (q,))
    pool.close()
    while True:
        if q.empty():
            print 'Queue is emty,quit'
            break
        print 'Main Process Lintening'
        time.sleep(2)
The output is always 'Main Process Lintening'; I never see the word 'Coming'.
The code above has no syntax errors and raises no exceptions.
Can anyone help? Thanks!
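A general debugging note about apply_async (an observation about the Pool API, not a diagnosis of the script above): an exception raised inside a worker is stored in the AsyncResult and only re-raised when get() is called, so a task that appears to do nothing may simply have crashed silently. A minimal sketch, with a hypothetical will_fail function:

from multiprocessing import Pool

def will_fail(x):
    # any exception raised here is captured by the pool and stays hidden
    # until the corresponding AsyncResult is inspected
    raise ValueError("boom %s" % x)

if __name__ == '__main__':
    pool = Pool(processes=2)
    result = pool.apply_async(will_fail, (1,))
    pool.close()
    pool.join()
    try:
        result.get()  # re-raises the worker's exception here
    except ValueError as e:
        print('worker raised: %s' % e)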
I am trying to understand how to use the multiprocessing module in Python. The code below spawns four processes and outputs the results as they become available. It seems to me that there must be a better way to obtain the results from the Queue; some method that does not rely on counting how many items the Queue contains, but just returns items as they become available and then gracefully exits once the queue is empty. The docs say that the Queue.empty() method is not reliable. Is there a better alternative for consuming the results from the queue?
import multiprocessing as mp
import time

def multby4_wq(x, queue):
    print "Starting!"
    time.sleep(5.0/x)
    a = x*4
    queue.put(a)

if __name__ == '__main__':
    queue1 = mp.Queue()
    for i in range(1, 5):
        p = mp.Process(target=multby4_wq, args=(i, queue1))
        p.start()
    for i in range(1, 5):  # This is what I am referring to as counting again
        print queue1.get()
Instead of using a queue, how about using a Pool?
For example,
import multiprocessing as mp
import time

def multby4_wq(x):
    print "Starting!"
    time.sleep(5.0/x)
    a = x*4
    return a

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(multby4_wq, range(1, 5)):
        print result
Pass multiple arguments
Assume you have a function that accepts multiple parameters (add in this example). Make a wrapper function that passes the arguments through to add (add_wrapper).
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

def add_wrapper(args):
    return add(*args)

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(add_wrapper, [(1,2), (3,4), (5,6), (7,8)]):
        print result
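As a side note, on Python 3 the wrapper can often be dropped: Pool.starmap (mentioned earlier on this page) unpacks each argument tuple for you. A minimal sketch using the same add function:

import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # starmap unpacks each tuple into add(x, y), so no wrapper is needed
        for result in pool.starmap(add, [(1, 2), (3, 4), (5, 6), (7, 8)]):
            print(result)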