python multiprocessing, manager initiates process spawn loop - python

I have a simple python multiprocessing script that sets up a pool of workers that attempt to append work-output to a Manager list. The script has 3 call stacks: - main calls f1 that spawns several worker processes that call another function g1. When one attempts to debug the script (incidentally on Windows 7/64 bit/VS 2010/PyTools) the script runs into a nested process creation loop, spawning an endless number of processes. Can anyone determine why? I'm sure I am missing something very simple. Here's the problematic code: -
import multiprocessing
import logging
manager = multiprocessing.Manager()
results = manager.list()
def g1(x):
y = x*x
print "processing: y = %s" % y
results.append(y)
def f1():
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)
pool = multiprocessing.Pool(processes=4)
for (i) in range(0,15):
pool.apply_async(g1, [i])
pool.close()
pool.join()
def main():
f1()
if __name__ == "__main__":
main()
PS: tried adding multiprocessing.freeze_support() to main to no avail.

Basically, what sr2222 mentions in his comment is correct. From the multiprocessing manager docs, it says that the ____main____ module must be importable by the children. Each manager " object corresponds to a spawned child process", so each child is basically re-importing your module (you can see by adding a print statement at module scope to my fixed version!)...which leads to infinite recursion.
One solution would be to move your manager code into f1():
import multiprocessing
import logging
def g1(results, x):
y = x*x
print "processing: y = %s" % y
results.append(y)
def f1():
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)
manager = multiprocessing.Manager()
results = manager.list()
pool = multiprocessing.Pool(processes=4)
for (i) in range(0,15):
pool.apply_async(g1, [results, i])
pool.close()
pool.join()
def main():
f1()
if __name__ == "__main__":
main()

Related

Sharing a Pool object across processes in python multiprocessing

I am trying to share a pool object between multiple processes using the following code
from multiprocessing import Process, Pool
import time
pool = Pool(5)
def print_hello():
time.sleep(1)
return "hello"
def pipeline():
print("In pipeline")
msg = pool.apply_async(print_hello()).get(timeout=1.5)
print("In pipeline")
print(msg)
def post():
p = Process(target = pipeline)
p.start()
return
if __name__ == '__main__':
post()
print("Returned from post")
However the code exists with the timout since get() doesnot return. I believe this has to do with pool being a globally accessible variable because it works just fine when I move pool to being local to pipeline function. Can anyone give me suggestions if there exists a workaround for this problem ?
Edit: finally got working with thread instead of process for running pipeline function.

Using Multiprocessing with Modules

I am writing a module such that in one function I want to use the Pool function from the multiprocessing library in Python 3.6. I have done some research on the problem and the it seems that you cannot use if __name__=="__main__" as the code is not being run from main. I have also noticed that the python pool processes get initialized in my task manager but essentially are stuck.
So for example:
class myClass()
...
lots of different functions here
...
def multiprocessFunc()
do stuff in here
def funcThatCallsMultiprocessFunc()
array=[array of filenames to be called]
if __name__=="__main__":
p = Pool(processes=20)
p.map_async(multiprocessFunc,array)
I tried to remove the if __name__=="__main__" part but still no dice. any help would appreciated.
It seems to me that your have just missed out a self. from your code. I should think this will work:
class myClass():
...
# lots of different functions here
...
def multiprocessFunc(self, file):
# do stuff in here
def funcThatCallsMultiprocessFunc(self):
array = [array of filenames to be called]
p = Pool(processes=20)
p.map_async(self.multiprocessFunc, array) #added self. here
Now having done some experiments, I see that map_async could take quite some time to start up (I think because multiprocessing creates processes) and any test code might call funcThatCallsMultiprocessFunc and then quit before the Pool has got started.
In my tests I had to wait for over 10 seconds after funcThatCallsMultiprocessFunc before calls to multiprocessFunc started. But once started, they seemed to run just fine.
This is the actual code I've used:
MyClass.py
from multiprocessing import Pool
import time
import string
class myClass():
def __init__(self):
self.result = None
def multiprocessFunc(self, f):
time.sleep(1)
print(f)
return f
def funcThatCallsMultiprocessFunc(self):
array = [c for c in string.ascii_lowercase]
print(array)
p = Pool(processes=20)
p.map_async(self.multiprocessFunc, array, callback=self.done)
p.close()
def done(self, arg):
self.result = 'Done'
print('done', arg)
Run.py
from MyClass import myClass
import time
def main():
c = myClass()
c.funcThatCallsMultiprocessFunc()
for i in range(30):
print(i, c.result)
time.sleep(1)
if __name__=="__main__":
main()
The if __name__=='__main__' construct is an import protection. You want to use it, to stop multiprocessing from running your setup on import.
In your case, you can leave out this protection in the class setup. Be sure to protect the execution points of the class in the calling file like this:
def apply_async_with_callback():
pool = mp.Pool(processes=30)
for i in range(z):
pool.apply_async(parallel_function, args = (i,x,y, ), callback = callback_function)
pool.close()
pool.join()
print "Multiprocessing done!"
if __name__ == '__main__':
apply_async_with_callback()

Basic multiprocessing with infinity loop and queue

import random
import queue as Queue
import _thread as Thread
a = Queue.Queue()
def af():
while True:
a.put(random.randint(0,1000))
def bf():
while True:
if (not a.empty()): print (a.get())
def main():
Thread.start_new_thread(af, ())
Thread.start_new_thread(bf, ())
return
if __name__ == "__main__":
main()
the above code works fine with extreme high CPU usage, i tried to use multiprocessing with no avail. i have tried
def main():
multiprocessing.Process(target=af).run()
multiprocessing.Process(target=bf).run()
and
def main():
manager = multiprocessing.Manager()
a = manager.Queue()
pool = multiprocessing.Pool()
pool.apply_async(af)
pool.apply_async(bf)
both not working, can anyone please help me? thanks a bunch ^_^
def main():
multiprocessing.Process(target=af).run() # will not return
multiprocessing.Process(target=bf).run()
The above code does not work because af does not return; no chance to call bf. You need to separate run call to start/join so that both can run in parallel. (+ to make them share manage.Queue)
To make the second code work, you need to pass a (manager.Queue object) to functions. Otherwise they will use Queue.Queue global object which is not shared between processes; need to modify af, bf to accepts a, and main to pass a.
def af(a):
while True:
a.put(random.randint(0, 1000))
def bf(a):
while True:
print(a.get())
def main():
manager = multiprocessing.Manager()
a = manager.Queue()
pool = multiprocessing.Pool()
proc1 = pool.apply_async(af, [a])
proc2 = pool.apply_async(bf, [a])
# Wait until process ends. Uncomment following line if there's no waiting code.
# proc1.get()
# proc2.get()
In the first alternative main you use Process, but the method you should call to start the activity is not run(), as one would think, but rather start(). You will want to follow that up with appropriate join() statements. Following the information in multiprocessing (available here: https://docs.python.org/2/library/multiprocessing.html), here is a working sample:
import random
from multiprocessing import Process, Queue
def af(q):
while True:
q.put(random.randint(0,1000))
def bf(q):
while True:
if not q.empty():
print (q.get())
def main():
a = Queue()
p = Process(target=af, args=(a,))
c = Process(target=bf, args=(a,))
p.start()
c.start()
p.join()
c.join()
if __name__ == "__main__":
main()
To add to the accepted answer, in the original code:
while True:
if not q.empty():
print (q.get())
q.empty() is being called every time which is unnecessary since q.get() if the queue is empty will wait until something is available here documentation.
Similar answer here
I assume that this could affect the performance since calling the .empty() every iteration should consume more resources (it should be more noticeable if Thread was used instead of Process because Python Global Interpreter Lock (GIL))
I know it's an old question but hope it helps!

asynchronous subprocess with Popen

Using Windows 7 + python 2.6, I am trying to run a simulation model in parallel. I can launch multiple instances of the executable by double-clicking on them in my file browser. However, asynchronous calls with Popen result in each successive instance interrupting the previous one. For what it's worth, the executable returns text to the console, but I don't need to collect results interactively.
Here's where I am so far:
import multiprocessing, subprocess
def run(c):
exe = os.path.join("<location>","folder",str(c),"program.exe")
run = os.path.join("<location>","folder",str(c),"run.dat")
subprocess.Popen([exe,run],creationflags = subprocess.CREATE_NEW_CONSOLE)
def main():
pool = multiprocessing.Pool(3)
for c in range(10):
pool.apply_async(run,(str(c),))
pool.close()
pool.join()
if __name__ == '__main__':
main()
After scouring SO for a solution, I've learned that using multiprocessing may be redundant, but I need some way to limit the number of cores working.
Enabled by #J.F. Sebastian's comment regarding the cwd argument.
import multiprocessing, subprocess
def run(c):
exe = os.path.join("<location>","folder",str(c),"program.exe")
run = os.path.join("<location>","folder",str(c),"run.dat")
subprocess.check_call([exe,run],cwd=os.path.join("<location>","folder"),creationflags = subprocess.CREATE_NEW_CONSOLE)
def main():
pool = multiprocessing.Pool(3)
for c in range(10):
pool.apply_async(run,(str(c),))
pool.close()
pool.join()
if __name__ == '__main__':
main()

python3 multiprocessing example crashed my pc :(

I am new to multiprocessing
I have run example code for two 'highly recommended' multiprocessing examples given in response to other stackoverflow multiprocessing questions. Here is an example of one (which i dare not run again!)
test2.py (running from pydev)
import multiprocessing
class MyFancyClass(object):
def __init__(self, name):
self.name = name
def do_something(self):
proc_name = multiprocessing.current_process().name
print(proc_name, self.name)
def worker(q):
obj = q.get()
obj.do_something()
queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put(MyFancyClass('Fancy Dan'))
# Wait for the worker to finish
queue.close()
queue.join_thread()
p.join()
When I run this my computer slows down imminently. It gets incrementally slower. After some time I managed to get into the task manager only to see MANY MANY python.exe under the processes tab. after trying to end process on some, my mouse stopped moving. It was the second time i was forced to reboot.
I am too scared to attempt a third example...
running - Intel(R) Core(TM) i7 CPU 870 # 2.93GHz (8 CPUs), ~2.9GHz on win7 64
If anyone know what the issue is and can provide a VERY SIMPLE example of multiprocessing (send a string too a multiprocess, alter it and send it back for printing) I would be very grateful.
From the docs:
Make sure that the main module can be safely imported by a new Python
interpreter without causing unintended side effects (such a starting a
new process).
Thus, on Windows, you must wrap your code inside a
if __name__=='__main__':
block.
For example, this sends a string to the worker process, the string is reversed and the result is printed by the main process:
import multiprocessing as mp
def worker(inq,outq):
obj = inq.get()
obj = obj[::-1]
outq.put(obj)
if __name__=='__main__':
inq = mp.Queue()
outq = mp.Queue()
p = mp.Process(target=worker, args=(inq,outq))
p.start()
inq.put('Fancy Dan')
# Wait for the worker to finish
p.join()
result = outq.get()
print(result)
Because of the way multiprocessing works on Windows (child processes import the __main__ module) the __main__ module cannot actually run anything when imported -- any code that should execute when run directly must be protected by the if __name__ == '__main__' idiom. Your corrected code:
import multiprocessing
class MyFancyClass(object):
def __init__(self, name):
self.name = name
def do_something(self):
proc_name = multiprocessing.current_process().name
print(proc_name, self.name)
def worker(q):
obj = q.get()
obj.do_something()
if __name__ == '__main__':
queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put(MyFancyClass('Fancy Dan'))
# Wait for the worker to finish
queue.close()
queue.join_thread()
p.join()
Might I suggest this link? It's using threads, instead of multiprocessing, but many of the principles are the same.

Categories