I'm a beginner with the multiprocessing module in Python and I want to use concurrent execution ONLY for my def func. Moreover, I'm using some constants in my code and I have a problem with them.
The code is (Python 3.6.8):
from multiprocessing import Pool
FIRST_COUNT=10
print("Enter your path")
PATH=input()
some_list=[]
for i in range(10000):
    some_list.append(i)

def func(some_list):
    .....

if __name__ == "__main__":
    chunks = [some_list[i::4] for i in range(4)]
    pool = Pool(processes=4)
    pool.map(func, chunks)
When I try to start this program, I see the message Enter your path 5 times, and 5 times I need to input my path, i.e. this code executes 1 + 4 (once for each process) times.
I want to use FIRST_COUNT, PATH and some_list as constants, and use multiprocessing only for func. How can I do this? Please help me.
You should put the code inside if __name__ == "__main__": so it executes only once:
if __name__ == "__main__":
FIRST_COUNT = 10
PATH = input("Enter your path: ")
some_list = list(range(10000))
#some_list = []
#for i in range(10000):
# some_list.append(i)
chunks = [some_list[i::4] for i in range(4)]
pool = Pool(processes=4)
results = pool.map(func, chunks)
print(results)
If you want to use FIRST_COUNT and PATH, then it is better to send them to func as arguments.
You will have to create tuples with FIRST_COUNT and PATH in chunks:
chunks = [(FIRST_COUNT, PATH, some_list[i::4]) for i in range(4)]
and the function will have to receive them as a tuple and unpack it:
def func(args):
    first_count, path, some_list = args
Working example
from multiprocessing import Pool
def func(args):
    first_count, path, some_list = args
    result = sum(some_list)
    print(first_count, path, result)
    return result

if __name__ == "__main__":
    FIRST_COUNT = 10
    PATH = input("Enter your path: ")

    some_list = list(range(10000))
    #some_list = []
    #for i in range(10000):
    #    some_list.append(i)

    chunks = [(FIRST_COUNT, PATH, some_list[i::4]) for i in range(4)]

    pool = Pool(processes=4)
    all_results = pool.map(func, chunks)
    print('all results:', all_results)
EDIT: You can also use starmap() instead of map()
all_results = pool.starmap(func, chunks)
and then the function can take the parameters directly (without unpacking arguments):
def func(first_count, path, some_list):
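For reference, here is a minimal self-contained sketch of the starmap() variant; the input() prompt is replaced with a placeholder path only so the sketch runs without interaction:

from multiprocessing import Pool

def func(first_count, path, some_list):
    # each worker receives the constants and its own chunk as separate arguments
    result = sum(some_list)
    print(first_count, path, result)
    return result

if __name__ == "__main__":
    FIRST_COUNT = 10
    PATH = "/tmp/example"  # placeholder instead of input(), just to keep the sketch non-interactive
    some_list = list(range(10000))

    chunks = [(FIRST_COUNT, PATH, some_list[i::4]) for i in range(4)]

    with Pool(processes=4) as pool:
        all_results = pool.starmap(func, chunks)
    print('all results:', all_results)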
Related
I'm trying to speed up the running time of my script with multiprocessing. When I tried the same multiprocessing code with simpler functions, like resizing images in different directories, it worked well, but when I tried it with the code below, it runs without giving any output or raising any errors, and I was wondering what the reason for this could be.
I was also wondering how I could use multiprocessing with this code; maybe inheritance is the problem?
class Skeleton:
    def __init__(self, path, **kwargs):
        if type(path) is str:
            self.path = path
            self.inputStack = loadStack(self.path).astype(bool)
        if kwargs != {}:
            aspectRatio = kwargs["aspectRatio"]
            self.inputStack = ndimage.interpolation.zoom(self.inputStack, zoom=aspectRatio, order=2,
                                                         prefilter=False)

    def setThinningOutput(self, mode="reflect"):
        # Thinning output
        self.skeletonStack = get_thinned(self.inputStack, mode)

    def setNetworkGraph(self, findSkeleton=False):
        # Network graph of the crowded region removed output
        self.skeletonStack = self.inputStack
        self.graph = get_networkx_graph_from_array(self.skeletonStack)

    def setPrunedSkeletonOutput(self):
        # Prune unnecessary segments in crowded regions removed skeleton
        self.setNetworkGraph(findSkeleton=True)
        self.outputStack = pr.getPrunedSkeleton(self.skeletonStack, self.graph)
        saveStack(self.outputStack, self.path + "pruned/")


class Trabeculae(Skeleton):
    pass


def TrabeculaeY(path):
    path_mrb01_square = Trabeculae(path)
    path_mrb01_square.setPrunedSkeletonOutput()


if __name__ == '__main__':
    path1 = (r' ')
    path2 = (r' ')
    path3 = (r' ')

    the_list = []
    the_list.append(path1)
    the_list.append(path2)
    the_list.append(path3)

    for i in range(0, len(the_list)):
        p1 = multiprocessing.Process(target=TrabeculaeY, args=(the_list[i],))
        p1.start()
        p1.join()
Inheritance is not a problem for multiprocessing.
You must not join() the processes inside the loop. Joining there makes the loop wait until p1 has finished its work before it continues with the next one, so the processes run one after another instead of in parallel.
Instead, start all processes in a loop, then wait for all processes in a second loop like this:
if __name__=='__main__':
    path1 = (r' ')
    path2 = (r' ')
    path3 = (r' ')

    the_list = []
    the_list.append(path1)
    the_list.append(path2)
    the_list.append(path3)

    started_processes = []
    for i in range(0, len(the_list)):
        p1 = multiprocessing.Process(target=TrabeculaeY, args=(the_list[i],))
        p1.start()
        started_processes.append(p1)

    for p in started_processes:
        p.join()
Full code I used for testing:
import multiprocessing


class Skeleton:
    def __init__(self, path, **kwargs):
        self.path = path

    def setThinningOutput(self, mode="reflect"):
        pass

    def setNetworkGraph(self, findSkeleton=False):
        pass

    def setPrunedSkeletonOutput(self):
        print(self.path)


class Trabeculae(Skeleton):
    pass


def TrabeculaeY(path: str):
    path_mrb01_square = Trabeculae(path)
    path_mrb01_square.setPrunedSkeletonOutput()


if __name__ == '__main__':
    the_list = [r'1', r'2', r'3']

    started_processes = []
    for path in the_list:
        # args must be a tuple, otherwise a multi-character path would be split into single characters
        process = multiprocessing.Process(target=TrabeculaeY, args=(path,))
        process.start()
        started_processes.append(process)

    for process in started_processes:
        process.join()
I want to run a "main"-function for n times. This function starts other functions when it is running.
The "main"-function is called "repeat" and when it is running it first starts the function "copula_sim" and from there I get an output which is called "total_summe_liste". This list will be added to "mega_summe_list" which safes all outputs from the n runs. The sorted "total_summe_liste" will be safed as " RM_list" which is the input for the functions "VaR_func", "CVaR_func" and "power_func" which all generate an output which is sorted in the specific list "RM_VaR_list", "RM_CVaR_list" or "RM_PSRM_list". After that "RM_list" and "total_summe_liste" will be cleared before the next run begins.
In the end I got "mega_summe_list", "RM_VaR_list", "RM_CVaR_list" and "RM_PSRM_list" which will be used to generate an plot and a dataframe.
Now I want to run the "repeat"-function parallel. For example when I want to run this function n=10 times I want to run it on 10 cpu cores at the same time. The reason is that "copula_sim" is a monte-carlo-simulation which take a while when I make a big simulation.
What I have is this:
total_summe_liste = []
RM_VaR_list = []
RM_CVaR_list = []
RM_PSRM_list = []
mega_summe_list = []
def repeat():
    global RM_list
    global total_summe_liste
    global RM_VaR_list
    global RM_CVaR_list
    global RM_PSRM_list
    global mega_summe_list

    copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    mega_summe_list += total_summe_liste
    RM_list = sorted(total_summe_liste)

    VaR_func(alpha)
    RM_VaR_list.append(VaR)
    CVaR_func(alpha)
    RM_CVaR_list.append(CVaR)
    power_func(gamma)
    RM_PSRM_list.append(risk)

    RM_list = []
    total_summe_liste = []

n = 10
for i in range(0, n):
    repeat()
This is working so far.
I tried:
if __name__ == '__main__':
    jobs = []
    for i in range(0, 10):
        p = mp.Process(target=repeat)
        jobs.append(p)
        p.start()
But when I run this, "mega_summe_list" is empty. When I add print(VaR) to repeat, it shows me all 10 VaR values when it's done, so the parallel tasks are working.
What is the problem?
The reason for this issue is that the list mega_summe_list is not shared between the processes.
When you invoke parallel processing in Python, all the functions and variables are imported and run independently in the different processes.
So, for instance, when you start 5 processes, 5 different copies of these variables are created and modified independently. When you then access mega_summe_list in the main process, it is still empty, because it is empty in that process.
To enable synchronization between processes, you can use a list proxy from the multiprocessing package.
A multiprocessing Manager maintains an independent server process in which these Python objects are held.
Below is the code used to create a multiprocessing Manager list:
from multiprocessing import Manager
mega_summe_list = Manager().list()
The code above can be used instead of mega_summe_list = [] when using multiprocessing.
Below is an example:
from multiprocessing.pool import Pool
from multiprocessing import Manager

def repeat_test(_):
    global b, mp_list
    a = [1, 2, 3]
    b += a        # ordinary list: each worker only changes its own copy
    mp_list += a  # Manager list proxy: changes are visible to the main process
    a = []

if __name__ == "__main__":
    b = []
    mp_list = Manager().list()

    p = Pool(5)
    p.map(repeat_test, range(5))

    print("b: {0}, \n mp_list: {1}".format(b, mp_list))
Output:
b: [],
mp_list: [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]
Hope this solves your problem.
You should use a multiprocessing Pool; then you can do something like:
p = Pool(10)
p.map(repeat, range(10))
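Note that this only works if repeat accepts the value that map() passes in (even if it ignores it) and, ideally, returns its results instead of mutating globals. A minimal sketch of that idea, with a hypothetical stand-in for the real copula_sim/VaR/CVaR computations:

from multiprocessing import Pool
from random import random

def repeat(i):
    # hypothetical stand-in for one simulation run (the real code would call copula_sim, VaR_func, ...)
    total_summe = sorted(random() for _ in range(1000))
    var = total_summe[int(0.95 * len(total_summe))]
    return total_summe, var

if __name__ == "__main__":
    with Pool(10) as pool:
        results = pool.map(repeat, range(10))

    # collect per-run results in the main process instead of relying on globals
    mega_summe_list = [x for total, _ in results for x in total]
    RM_VaR_list = [var for _, var in results]
    print(len(mega_summe_list), RM_VaR_list)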
I solved the problem this way:
This is the function I want to repeat n times in a parallel way:
from multiprocessing import Process
from multiprocessing import Manager
from multiprocessing.pool import Pool
def repeat(shared_list, VaR_list, CVaR_list, PSRM_list, i):
    global RM_list
    global total_summe_liste

    copula_sim(runs_sim, rand_x, rand_y, mu, full_log=False)
    shared_list += total_summe_liste
    RM_list = sorted(total_summe_liste)

    VaR_func(alpha)
    VaR_list.append(VaR)
    CVaR_func(alpha)
    CVaR_list.append(CVaR)
    power_func(gamma)
    PSRM_list.append(risk)

    RM_list = []
    total_summe_liste = []
This part manages the shared lists and does the parallelization. Thanks @noufel13!
RM_VaR_list = []
RM_CVaR_list = []
RM_PSRM_list = []
mega_summe_list = []
if __name__ == "__main__":
with Manager() as manager:
shared_list = manager.list()
VaR_list = manager.list()
CVaR_list = manager.list()
PSRM_list = manager.list()
processes = []
for i in range(12):
p = Process(target=repeat, args=(shared_list, VaR_list, CVaR_list, PSRM_list, i)) # Passing the list
p.start()
processes.append(p)
for p in processes:
p.join()
RM_VaR_list += VaR_list
RM_CVaR_list += CVaR_list
RM_PSRM_list += PSRM_list
mega_summe_list += shared_list
RM_frame_func()
plotty_func()
Thank you!
The only question left is how to handle big arrays. Is there a way to do this more efficiently? One of the 12 shared lists can have more than 100,000,000 items, so in total mega_summe_list has about 1,200,000,000 items...
I want to use multiprocessing to do the following:
class myClass:
    def proc(self):
        #processing random numbers
        return a

    def gen_data(self):
        with Pool(cpu_count()) as q:
            data = q.map(self.proc, [_ for i in range(cpu_count())])  #What is the correct approach?
        return data
Try this:
def proc(self, i):
    #processing random numbers
    return a

def gen_data(self):
    with Pool(cpu_count()) as q:
        data = q.map(self.proc, [i for i in range(cpu_count())])  #What is the correct approach?
    return data
Since you don't have to pass an argument to the processes, there's no reason to use map(); just call apply_async() as many times as needed.
Here's what I'm saying:
from multiprocessing import cpu_count
from multiprocessing.pool import Pool
from random import randint

class MyClass:
    def proc(self):
        #processing random numbers
        return randint(1, 10)

    def gen_data(self, num_procs):
        with Pool() as pool:  # The default pool size will be the number of cpus.
            results = [pool.apply_async(self.proc) for _ in range(num_procs)]
            pool.close()
            pool.join()  # Wait until all worker processes exit.
        return [result.get() for result in results]  # Gather results.

if __name__ == '__main__':
    obj = MyClass()
    print(obj.gen_data(8))
I am trying to understand how to use the multiprocessing module in Python. The code below spawns four processes and outputs the results as they become available. It seems to me that there must be a better way of obtaining the results from the Queue: some method that does not rely on counting how many items the Queue contains, but simply returns items as they become available and then gracefully exits once the queue is empty. The docs say that the Queue.empty() method is not reliable. Is there a better alternative for consuming the results from the queue?
import multiprocessing as mp
import time

def multby4_wq(x, queue):
    print "Starting!"
    time.sleep(5.0/x)
    a = x*4
    queue.put(a)

if __name__ == '__main__':
    queue1 = mp.Queue()
    for i in range(1, 5):
        p = mp.Process(target=multby4_wq, args=(i, queue1))
        p.start()
    for i in range(1, 5):  # This is what I am referring to as counting again
        print queue1.get()
Instead of using a queue, how about using a Pool?
For example,
import multiprocessing as mp
import time

def multby4_wq(x):
    print "Starting!"
    time.sleep(5.0/x)
    a = x*4
    return a

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(multby4_wq, range(1, 5)):
        print result
Pass multiple arguments
Assume you have a function that accepts multiple parameters (add in this example). Make a wrapper function that passes the arguments to add (add_wrapper).
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

def add_wrapper(args):
    return add(*args)

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(add_wrapper, [(1,2), (3,4), (5,6), (7,8)]):
        print result
I want a long-running process to return its progress over a Queue (or something similar) which I will feed to a progress bar dialog. I also need the result when the process is completed. A test example here fails with a RuntimeError: Queue objects should only be shared between processes through inheritance.
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
I've been able to get this to work using individual Process objects (where I am allowed to pass a Queue reference), but then I don't have a pool to manage the many processes I want to launch. Any advice on a better pattern for this?
The following code seems to work:
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    manager = multiprocessing.Manager()
    q = manager.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
Note that the Queue is obtained from manager.Queue() rather than multiprocessing.Queue(). Thanks Alex for pointing me in this direction.
Making q global works...:
import multiprocessing, time

q = multiprocessing.Queue()

def task(count):
    for i in xrange(count):
        q.put("%d mississippi" % i)
    return "Done"

def main():
    pool = multiprocessing.Pool()
    result = pool.map_async(task, range(10))
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
If you need multiple queues, e.g. to avoid mixing up the progress of the various pool processes, a global list of queues should work (of course, each process will then need to know what index in the list to use, but that's OK to pass as an argument;-).
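For illustration, a minimal sketch of that multiple-queue idea (hypothetical names; like the single global queue above, it relies on the worker processes inheriting the module-level queues, so it assumes a fork-based start method):

import multiprocessing, time

# one queue per task; each task is told which index to use via its arguments
queues = [multiprocessing.Queue() for _ in range(4)]

def task(args):
    index, count = args
    for i in range(count):
        queues[index].put("task %d: %d mississippi" % (index, i))
    return "Done %d" % index

def main():
    pool = multiprocessing.Pool(4)
    result = pool.map_async(task, [(i, 5) for i in range(4)])
    time.sleep(1)
    for q in queues:
        while not q.empty():
            print(q.get())
    print(result.get())

if __name__ == "__main__":
    main()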