Python Multiprocessing: Running each iteration of multiple for loops in parallel

I have two loops in Python. Here is some pseudocode. I would like to run both of these functions, and each iteration of each function, at the same time. So in this example there would be 8 processes going on at once. I know you can use "Process", but I just don't know how to incorporate an iterable. Please let me know, thanks!
import multiprocessing
from multiprocessing import freeze_support

def example1(iteration):
    print('stuff')

def example2(iteration):
    print('stuff')

if __name__ == '__main__':
    freeze_support()
    pool = multiprocessing.Pool(4)
    iteration = [1, 2, 3, 4]
    pool.map(example1, iteration)

Assuming they don't need to be kicked off at exactly the same time, I think map_async is what you want.
In the example below we can print the result from example2 before example1 has finished, even though example1 was kicked off first.
import multiprocessing
import time

def example1(iteration):
    time.sleep(1)
    return 1

def example2(iteration):
    return 2

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    iteration = [1, 2, 3, 4]
    result1 = pool.map_async(example1, iteration)
    result2 = pool.map_async(example2, iteration)
    print(result2.get())
    print(result1.get())
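Since example1 sleeps for a second, result2.get() usually returns first, so the output would typically be:
[2, 2, 2, 2]
[1, 1, 1, 1]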

Related

Python multiprocessing is not giving expected results

I am new to multiprocessing with Python. I was following a course and I find that the code is not working as they say in the tutorials. For example, this code:
import multiprocessing

# empty list with global scope
result = []

def square_list(mylist):
    """
    function to square a given list
    """
    global result
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1, 2, 3, 4]
    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist,))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()
    # print global result list
    print("Result(in main program): {}".format(result))
should print this result as they say in the tutorial:
Result(in process p1): [1, 4, 9, 16]
Result(in main program): []
but when I run it, it prints
Result(in main program): []
I think the process did not even start.
I am using Python 3.7.9 from Anaconda.
How do I fix this?
Do not use global variables that you access from several processes at the same time. Global variables are usually a bad idea and should be used very carefully.
The easiest way is to use p.map (you don't have to start/join the processes yourself):
from multiprocessing import Pool

with Pool(5) as p:
    result = p.map(square_list, mylist)
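For context, here is a minimal, runnable sketch of that map-based approach; it assumes square_list is rewritten (here as square) to take a single number and return its square instead of appending to a global list:
from multiprocessing import Pool

def square(num):
    # operates on a single element and returns the result
    # instead of appending to a global list
    return num * num

if __name__ == "__main__":
    mylist = [1, 2, 3, 4]
    with Pool(5) as p:
        result = p.map(square, mylist)
    print("Result(in main program): {}".format(result))  # [1, 4, 9, 16]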
If you do not want to use p.map, you can also use a multiprocessing.Queue: q.put() in the worker to return the value and q.get() in the main process to read it.
You can also find examples of getting results out of multiprocessed functions here:
https://docs.python.org/3/library/multiprocessing.html
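To make that concrete, a minimal sketch of the Queue-based approach, assuming square_list is adapted to take the queue as a second argument:
import multiprocessing

def square_list(mylist, q):
    # compute the squares in the child process and send them back on the queue
    q.put([num * num for num in mylist])

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=square_list, args=([1, 2, 3, 4], q))
    p.start()
    result = q.get()  # blocks until the child puts its result
    p.join()
    print("Result(in main program): {}".format(result))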

Python multiprocessing pool doesn't take an iterable as an argument

I've read many posts here about multiprocessing.pool, but I still don't understand where the problem in my code is.
I want to parallelize a function using a multiprocessing pool in Python. The function takes one argument and returns two values. I want this argument to be an integer and want to iterate over it. I've tried the examples I've seen here, but they don't work for me (apparently I'm doing something wrong, but what?)
My code:
import multiprocessing
from multiprocessing import Pool

def function(num):
    res1 = num ** 2   # calculate something
    res2 = num + num  # calculate something
    return res1, res2

if __name__ == '__main__':
    num = 10
    pool = multiprocessing.Pool(processes=4)
    # next line works, but with [something,something,...] as an argument
    result = pool.map(function, [1, 100, 10000])
    # next line doesn't work and I have no idea why!
    result2 = pool.map(function, range(num))
    pool.close()
    pool.join()
    print(result2)
I get TypeError: 'float' object is not subscriptable when I calculate result2.
Would be grateful for help!

Eliminating overhead in multiprocessing with pool

I am currently in a situation where I have parallelized code that is called repeatedly, and I am trying to reduce the overhead associated with the multiprocessing. So, consider the following example, which deliberately contains no "expensive" computations:
import multiprocessing as mp

def f(x):
    # toy function
    return x * x

if __name__ == '__main__':
    for x in range(500):
        pool = mp.Pool(processes=2)
        print(pool.map(f, range(x, x + 50)))
        pool.close()
        pool.join()  # necessary?
This code takes 53 seconds compared to 0.04 seconds for the sequential approach.
First question: do I really need to call pool.join() in this case when only pool.map() is ever used? I cannot find any negative effects from omitting it and the runtime would drop to 4.8 seconds. (I understand that omitting pool.close() is not possible, as we would be leaking threads then.)
Now, while this would be a nice improvement, as a first answer I would probably get "well, don't create the pool in the loop in the first place". Ok, no problem, but the parallelized code actually lives in an instance method, so I would use:
class MyObject:
    def __init__(self):
        self.pool = mp.Pool(processes=2)

    def function(self, x):
        print(self.pool.map(f, range(x, x + 50)))

if __name__ == '__main__':
    my_object = MyObject()
    for x in range(500):
        my_object.function(x)
This would be my favorite solution, as it runs in an excellent 0.4 seconds.
Second question: should I call pool.close()/pool.join() somewhere explicitly (e.g. in the destructor of MyObject) or is the current code sufficient? (If it matters: it is ok to assume there are only a few long-lived instances of MyObject in my project.)
Of course it takes a long time: you keep allocating a new pool and destroying it for every x.
It will run much faster if instead you do:
if __name__ == '__main__':
    pool = mp.Pool(processes=2)  # allocate the pool only once
    for x in range(500):
        print(pool.map(f, range(x, x + 50)))
    pool.close()  # close it only after all the requests are submitted
    pool.join()   # wait for the last worker to finish
Try that and you'll see it now runs much faster.
From the docs for close and join: once close is called you can't submit more tasks to the pool, and join waits until the last worker has finished its job. They should be called in that order (first close, then join).
Well, actually, you could pass an already allocated pool as an argument to your object:
class MyObject:
    def __init__(self, pool):
        self.pool = pool

    def function(self, x):
        print(self.pool.map(f, range(x, x + 50)))

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        my_object = MyObject(pool)
        my_second_object = MyObject(pool)
        for x in range(500):
            my_object.function(x)
            my_second_object.function(x)
        pool.close()
I cannot find a reason why it would be necessary to use different pools in different objects.
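If MyObject does own its pool (as in the question's second variant), one hedged option is an explicit cleanup method called by the caller, rather than relying on a destructor; a minimal sketch:
import multiprocessing as mp

def f(x):
    # toy function
    return x * x

class MyObject:
    def __init__(self):
        self.pool = mp.Pool(processes=2)

    def function(self, x):
        return self.pool.map(f, range(x, x + 50))

    def close(self):
        # shut the pool down explicitly instead of relying on __del__
        self.pool.close()
        self.pool.join()

if __name__ == '__main__':
    my_object = MyObject()
    try:
        for x in range(500):
            my_object.function(x)
    finally:
        my_object.close()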

Dictionary multiprocessing

I want to parallelize the processing of a dictionary using the multiprocessing library.
My problem can be reduced to this code:
from multiprocessing import Manager,Pool

def modify_dictionary(dictionary):
    if((3,3) not in dictionary):
        dictionary[(3,3)]=0.
    for i in range(100):
        dictionary[(3,3)] = dictionary[(3,3)]+1
    return 0

if __name__ == "__main__":
    manager = Manager()
    dictionary = manager.dict(lock=True)
    jobargs = [(dictionary) for i in range(5)]
    p = Pool(5)
    t = p.map(modify_dictionary,jobargs)
    p.close()
    p.join()
    print dictionary[(3,3)]
I create a pool of 5 workers, and each worker should increment dictionary[(3,3)] 100 times. So, if the locking process works correctly, I expect dictionary[(3,3)] to be 500 at the end of the script.
However, something in my code must be wrong, because this is not what I get: the locking does not seem to be "activated" and dictionary[(3,3)] always has a value < 500 at the end of the script.
Could you help me?
The problem is with this line:
dictionary[(3,3)] = dictionary[(3,3)]+1
Three things happen on that line:
Read the value of the dictionary key (3,3)
Increment the value by 1
Write the value back again
But the increment part is happening outside of any locking.
The whole sequence must be atomic, and must be synchronized across all processes. Otherwise the processes will interleave giving you a lower than expected total.
Holding a lock whilst incrementing the value ensures that you get the total of 500 you expect:
from multiprocessing import Manager,Pool,Lock

lock = Lock()

def modify_array(dictionary):
    if((3,3) not in dictionary):
        dictionary[(3,3)]=0.
    for i in range(100):
        with lock:
            dictionary[(3,3)] = dictionary[(3,3)]+1
    return 0

if __name__ == "__main__":
    manager = Manager()
    dictionary = manager.dict(lock=True)
    jobargs = [(dictionary) for i in range(5)]
    p = Pool(5)
    t = p.map(modify_array,jobargs)
    p.close()
    p.join()
    print dictionary[(3,3)]
I've managed many times to find the correct solution to a programming difficulty here, so I would like to contribute a little bit. The above code still has the problem of not updating the dictionary correctly: to get the right result you have to pass the lock and correct jobargs to f. In the above code you make a new dictionary in every process. The code I found to work fine:
from multiprocessing import Process, Manager, Pool, Lock
from functools import partial

def f(dictionary, l, k):
    with l:
        for i in range(100):
            dictionary[3] += 1

if __name__ == "__main__":
    manager = Manager()
    dictionary = manager.dict()
    lock = manager.Lock()
    dictionary[3] = 0
    jobargs = list(range(5))
    pool = Pool()
    func = partial(f, dictionary, lock)
    t = pool.map(func, jobargs)
    pool.close()
    pool.join()
    print(dictionary)
In the code above, the lock is held for the entire iteration. In general, you should hold a lock for the shortest time that is still effective. The following code is much more efficient: the lock is acquired only around the operation that must be atomic.
def f(dictionary, l, k):
    for i in range(100):
        with l:
            dictionary[3] += 1
Note that dictionary[3] += 1 is not atomic, so it must be locked.
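As an aside (not from the answers above), if the shared dictionary is not actually required, a lock-free alternative is to have each worker count locally and sum the results in the parent; a rough sketch:
from multiprocessing import Pool

def count(_):
    # each worker does its 100 increments on a purely local variable
    total = 0
    for i in range(100):
        total += 1
    return total

if __name__ == "__main__":
    with Pool(5) as pool:
        print(sum(pool.map(count, range(5))))  # 500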

How to run three functions at the same time (and return values from each)?

I have three functions, each returning a list. The problem is that running each function takes around 20-30 seconds. So running the entire script ends up taking about 2 min.
I want to use multiprocessing or multithreading (whichever is easier to implement) to have all three functions running at the same time.
The other hurdle I ran into was that I'm not sure how to return the list from each of the functions.
def main():
    masterlist = get_crs_in_snow()
    noop_crs = get_noops_in_snow()
    made_crs = get_crs_in_git()
    # take the prod master list in SNOW, subtract what's been made or is in the noop list
    create_me = [obj for obj in masterlist if obj not in made_crs and obj not in noop_crs]
    print "There are {0} crs in Service Now not in Ansible".format(len(create_me))
    for cr in create_me:
        print str(cr[0]),

if __name__ == '__main__':
    main()
I figure I can get some significant improvements in run time just by multithreading or multiprocessing the following lines:
masterlist = get_crs_in_snow()
noop_crs = get_noops_in_snow()
made_crs = get_crs_in_git()
How do I have these three functions run at the same time?
This is completely untested since I don't have the rest of your code, but it may give you an idea of what can be done. I have adapted your code into the multiprocessing pattern:
from multiprocessing import Pool

def dispatcher(n):
    if n == 0:
        return get_crs_in_snow()
    if n == 1:
        return get_noops_in_snow()
    if n == 2:
        return get_crs_in_git()

def main():
    pool = Pool(processes=3)
    v = pool.map(dispatcher, range(3))
    masterlist = v[0]
    noop_crs = v[1]
    made_crs = v[2]
    # take the prod master list in SNOW, subtract what's been made or is in the noop list
    create_me = [obj for obj in masterlist if obj not in made_crs and obj not in noop_crs]
    print "There are {0} crs in Service Now not in Ansible".format(len(create_me))
    for cr in create_me:
        print str(cr[0]),

if __name__ == '__main__':
    main()
Try the threading library.
import threading
threading.Thread(target=get_crs_in_snow).start()
threading.Thread(target=get_noops_in_snow).start()
threading.Thread(target=get_crs_in_git).start()
As far as getting their return values, you could wrap the calls in some class methods and have them save the result to a member variable. Or, you could wrap the calls in some local functions, pass a mutable object (list or dictionary) into each one, and have the function modify that mutable object, as in the sketch below.
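For example, a rough sketch of the mutable-object idea, assuming get_crs_in_snow, get_noops_in_snow and get_crs_in_git are the functions from the question:
import threading

def run_and_store(results, key, func):
    # call the function and record its return value under the given key
    results[key] = func()

results = {}
threads = [
    threading.Thread(target=run_and_store, args=(results, 'master', get_crs_in_snow)),
    threading.Thread(target=run_and_store, args=(results, 'noop', get_noops_in_snow)),
    threading.Thread(target=run_and_store, args=(results, 'made', get_crs_in_git)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

masterlist, noop_crs, made_crs = results['master'], results['noop'], results['made']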
Or, as others have stated, multiprocessing may be a good way to do what you want.
