Python multiprocessing on Eclipse

I write Python code in Eclipse using PyDev.
The code is the following:
from multiprocessing import Process, Queue
import time

g_workercount = 1

def calc_step():
    print('calc_step started')
    q = Queue()
    p_worker = []
    for i in range(0, g_workercount):
        ww = Process(target=worker_calc_step, args=(q,i,))
        ww.start()
        p_worker.append(ww)
    for ww in p_worker:
        ww.join()
    print('calc_step ended')

def worker_calc_step(q, n):
    print('worker_calc_step started')
    print('worker_calc_step ended')

if __name__ == '__main__':
    calc_step()
    print('finished')
It is very simple code, and I expected the output to be:
calc_step started
worker_calc_step started
worker_calc_step ended
calc_step ended
finished
It works fine when executed from the console,
but when run from Eclipse the output is:
calc_step started
calc_step ended
finished
I guess the main process finishes before the worker process starts.
So I added a sleep call to the main process function, but the result in Eclipse is the same.
Do you have any idea what is going on?
Writing multiprocessing code in Python with Eclipse is a little difficult for me.
Thanks.

I'm not really sure what may be wrong there... I've tried it here inside and outside Eclipse/PyDev, in Python 2 and 3, and got the expected output in every case.
Some questions to help diagnose the issue:
Which OS are you using?
What's the Python version?
Have you tried running it under the debugger to see where it might fail?
Have you tried printing to a file instead of stdout? (Maybe the process is starting but its output isn't reaching stdout; a quick sketch of that check follows this list.)
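For that last check, here is a minimal sketch of my own (not part of the original answer) where the worker writes to a file; if the file appears when you run this from Eclipse, the process did start and only its stdout is being lost:
from multiprocessing import Process

def worker(n):
    # The file is proof that the worker ran, even if its stdout never
    # reaches the Eclipse console.
    with open('worker_{}.log'.format(n), 'w') as f:
        f.write('worker {} ran\n'.format(n))

if __name__ == '__main__':
    p = Process(target=worker, args=(0,))
    p.start()
    p.join()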


Multiprocessor Numerical Adder

I am currently involved in a university project looking at thousands of samples of genetic data from cancer patients. My program was going to take too long to run, so I used multiprocessing. It worked fine on an Apple Mac my friend lent me, but the moment I transferred it over to a university Windows system it failed, and I'm unsure why the program no longer works.
I decided to strip my code down as far as possible to see the error. My program itself, without the multiprocessing element used to speed up the number of samples, works fine. I believe the problem revolves around the code below. Instead of placing my very long program here, I've switched it out for a simple addition, and it still does not work: it uses a very high amount of CPU and I cannot see where I am going wrong. Kind regards.
The expected result is 5, 15, 25, 35, instantaneously. I have Windows 10 on the computer I'm currently using.
import multiprocessing
from multiprocessing import Pool
import collections

value=collections.namedtuple('value',['vectx','vecty'])
Values=(value(vectx=0,vecty=5),value(vectx=5,vecty=10),value(vectx=10,vecty=15),value(vectx=15,vecty=20))
print(1)

def Alter(x):
    vectx=x.vectx
    vecty=x.vecty
    Z=(vectx+vecty)
    return(Z)

if __name__ == '__main__':
    with Pool(2) as p:
        result=p.map(Alter, Values)

print(2)
new=[]
for i in result:
    new.append(i)
print(new)
I don't know why exactly but this part
print(2)
new=[]
for i in result:
    new.append(i)
print(new)
needs to be in the suite of the if statement. Similar to the example in the documentation.
if __name__ == '__main__':
    with Pool(2) as p:
        result=p.map(Alter, Values)

    print(2)
    new=[]
    for i in result:
        new.append(i)
    print(new)
I suspect that "Compulsory usage of if __name__ == '__main__' in Windows while using multiprocessing" may be relevant.
If you run your original code from a command shell (like PowerShell or the command prompt) with python -m mymodulename, you will see everything that is going on: tracebacks from the multiple spawned processes.
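To illustrate why the guard matters here, a small sketch of my own (not part of the original answer): on Windows, multiprocessing uses the spawn start method, so every worker re-imports the module and re-runs any top-level code.
import multiprocessing

# On Windows (spawn start method) this line runs once in the main process and
# again in every worker, because each worker re-imports this module.
print('module imported in process: ' + multiprocessing.current_process().name)

def square(x):
    return x * x

if __name__ == '__main__':
    # Only the main process executes this block.
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))
Anything that depends on the Pool's result, like the print(2) block above, therefore has to live inside the guard; otherwise the workers try to run it too, before result exists.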

Python multiprocessing pool.map self._event.wait(timeout) is hanging. Why is pool.map wait not responding?

multiprocessing pool.map works nicely on my old PC but does not work on the new PC.
It hangs in the call to
def wait(self, timeout=None):
    self._event.wait(timeout)
at which point CPU utilization drops to zero percent, with no further response, as if it has gone to sleep.
I wrote a simple test.py as follows
import multiprocessing as mp

letters = ['A','B','C']

def doit(letter):
    for i in range(1000):
        print(str(letter) + ' ' + str(i))

if __name__ == '__main__':
    pool = mp.Pool()
    pool.map(doit,letters)
This works on the old PC (i7-7700K, 4 cores / 8 logical, Python 3.6.5 64-bit, Windows 10 Pro, PyCharm 2018.1), where stdout displays the letters and numbers in non-sequential order, as expected.
The same code does not work on the new build (i9-7960, 16 cores / 32 logical, Python 3.7 64-bit, Windows 10 Pro, PyCharm 2018.3).
The new PC's BIOS has not been updated since 2017/11 (4 months older).
pool.py appears to be the same on both machines (2006-2008 R Oudkerk).
The code line where it hangs, in the 'wait' function, is:
    self._event.wait(timeout)
Any help on where I might look next to find the cause would be appreciated.
Thanks in advance.
EDIT:
My further interpretation:
1. The GIL (Global Interpreter Lock) is not relevant here, as it applies to multithreading only, not multiprocessing.
2. multiprocessing.Manager is unnecessary here, as the code consumes static input and produces independent output. So pool.close and pool.join are not required either, as I am not joining results after processing.
3. This link is a good introduction to multiprocessing, though I don't see a solution in it:
https://docs.python.org/2/library/multiprocessing.html#windows
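One way to gather more clues (my own suggestion, not something from the original post) is to turn on multiprocessing's built-in logging, so process start-up and pool internals are written to stderr and you can see how far things get before the hang:
import logging
import multiprocessing as mp

letters = ['A','B','C']

def doit(letter):
    for i in range(1000):
        print(str(letter) + ' ' + str(i))

if __name__ == '__main__':
    # Log multiprocessing's internal events (worker start, task dispatch, ...)
    logger = mp.log_to_stderr()
    logger.setLevel(logging.DEBUG)

    pool = mp.Pool()
    pool.map(doit, letters)
    pool.close()
    pool.join()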

Trying to understand multiprocessing and queues across python modules

I'm trying to understand multiprocessing. My actual application is to display log messages in real time on a PyQt5 GUI, but I ran into some problems using queues, so I made a simple program to test it out.
The issue I'm seeing is that I am unable to add elements to a Queue across Python modules and across processes. Here is my code and my output, along with the expected output.
Config file for globals:
# cfg.py
# Using a config file to import my globals across modules
#import queue
import multiprocessing
# q = queue.Queue()
q = multiprocessing.Queue()
Main module:
# mod1.py
import cfg
import mod2
import multiprocessing

def testq():
    global q
    print("q has {} elements".format(cfg.q.qsize()))

if __name__ == '__main__':
    testq()
    p = multiprocessing.Process(target=mod2.add_to_q)
    p.start()
    p.join()
    testq()
    mod2.pullfromq()
    testq()
Secondary module:
# mod2.py
import cfg

def add_to_q():
    cfg.q.put("Hello")
    cfg.q.put("World!")
    print("qsize in add_to_q is {}".format(cfg.q.qsize()))

def pullfromq():
    if not cfg.q.empty():
        msg = cfg.q.get()
        print(msg)
Here is the output that I actually get from this:
q has 0 elements
qsize in add_to_q is 2
q has 0 elements
q has 0 elements
vs the output that I would expect to get:
q has 0 elements
qsize in add_to_q is 2
q has 2 elements
Hello
q has 1 elements
So far I have tried using both multiprocessing.Queue and queue.Queue. I have also tested this with and without Process.join().
If I run the same program without using multiprocessing, I get the expected output shown above.
What am I doing wrong here?
EDIT:
Process.run() gives me the expected output, but it also blocks the main process while it is running, which is not what I want.
My understanding is that Process.run() runs the created process's target in the context of the calling process (in my case the main process), meaning that it is no different from the main process calling the same function directly.
I still don't understand why my queue behavior isn't working as expected.
I've discovered the root of the issue and I'll document it here for future searches, but I'd still like to know if there's a standard solution to creating a global queue between modules so I'll accept any other answers/comments.
I found the problem when I added the following to my cfg.py file.
print("cfg.py is running in process {}".format(multiprocessing.current_process()))
This gave me the following output:
cfg.py is running in process <_MainProcess(MainProcess, started)>
cfg.py is running in process <_MainProcess(Process-1, started)>
cfg.py is running in process <_MainProcess(Process-2, started)>
It would appear that I'm creating separate Queue objects for each process that I create, which would certainly explain why they aren't interacting as expected.
This question has a comment stating that
a shared queue needs to originate from the master process, which is then passed to all of its subprocesses.
All this being said, I'd still like to know if there is an effective way to share a global queue between modules without having to pass it between methods.
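For reference, a minimal sketch of the pattern described in that comment (my own example, not from the linked question): the Queue is created once in the main process and passed to the child explicitly, so both sides share the same underlying queue.
import multiprocessing

def add_to_q(q):
    q.put("Hello")
    q.put("World!")

if __name__ == '__main__':
    # Create the queue in the parent and hand it to the child as an argument.
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=add_to_q, args=(q,))
    p.start()
    print(q.get())  # Hello
    print(q.get())  # World!
    p.join()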

Using multiprocessing in python script with Django models

I am writing a custom script to run multiple instances of the same functions using multiprocessing with Django models.
The code that concerns this post consists of:
if __name__ == '__main__':
    for count, script in enumerate(scripts):
        for counter in range(0, len(counters)):
            p = Process(target=script, args=(counters[counter][count],))
            p.start()
            p.join()
The loops execute correctly, but I am having a problem with the __name__ == '__main__' statement. I could hack it together by setting __name__ = '__main__' before that line, but then I would run into a problem where p.start() throws an error:
PicklingError: Can't pickle <function nordstrom_script at 0x0000000003B2A208>: it's not found as __main__.nordstrom_script
I am relatively new to Python/Django and have never experimented with multiprocessing before, so please excuse my lack of knowledge if something is dreadfully wrong with my logic.
Any help resolving this would be greatly appreciated. I know that Django does not work well with multiprocessing, and the problem comes from me using:
>>>python manage.py shell
>>>execscript('scripts/script.py')
and not
>>>python scripts/script.py
This version is directly runnable and works for me. Could you modify this code to produce the same error? Note that it only processes the 1st element of 'counters'; I assume this is by design.
source
import multiprocessing

def produce(arg):
    print 'arg:',arg

scripts = [produce]
counters = [ [3350000, 7000000] ]

if __name__ == '__main__':
    for count, script in enumerate(scripts):
        for counter in range(0, len(counters)):
            p = multiprocessing.Process(
                target=script, args=(counters[counter][count],)
            )
            p.start()
            p.join()
output
arg: 3350000

Python threading.thread.start() doesn't return control to main thread

I'm trying to write a program that executes a piece of code in such a way that the user can stop its execution at any time without stopping the main program. I thought I could do this using threading.Thread, but then I ran the following code in IDLE (Python 3.3):
from threading import *
import math

def f():
    eval("math.factorial(1000000000)")

t = Thread(target = f)
t.start()
The last line doesn't return: I eventually had to restart the shell. Is this a consequence of the Global Interpreter Lock, or am I doing something wrong? I didn't see anything specific to this problem in the threading documentation (http://docs.python.org/3/library/threading.html).
I tried to do the same thing using a process:
from multiprocessing import *
import math

def f():
    eval("math.factorial(1000000000)")

p = Process(target = f)
p.start()
p.is_alive()
The last line returns False, even though I ran it only a few seconds after I started the process! Based on my processor usage, I am forced to conclude that the process never started in the first place. Can somebody please explain what I am doing wrong here?
Thread.start() never returns! Could this have something to do with the C implementation of the math library?
As #eryksun pointed out in the comment: math.factorial() is implemented as a C function that doesn't release the GIL, so no other Python code may run until it returns.
Note: the multiprocessing version should work as is: each Python process has its own GIL.
factorial(1000000000) has hundreds of millions of digits. Try import time; time.sleep(10) as a dummy calculation instead (a quick sketch follows below).
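A small sketch of that test (my own, following the suggestion above): the huge factorial is replaced by a short sleep, so the child does a bounded amount of work and is_alive() can be checked before and after it finishes.
from multiprocessing import Process
import time

def f():
    time.sleep(10)  # dummy work instead of math.factorial(1000000000)

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    print(p.is_alive())  # True while the sleep is still running
    p.join()
    print(p.is_alive())  # False once the process has exited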
If you have issues with multithreaded code in IDLE, then try the same code from the command line to make sure that the error persists.
If p.is_alive() returns False after p.start() has already been called, it might mean that there is an error in the f() function, e.g., a MemoryError.
On my machine, p.is_alive() returns True and one of the CPUs is at 100% if I paste your code from the question into a Python shell.
Unrelated: remove wildcard imports such as from multiprocessing import *. They may shadow other names in your code, so you can't be sure what a given name means; e.g., threading could define an eval function (it doesn't, but it could) with similar but different semantics that might break your code silently.
I want my program to be able to handle ridiculous inputs from the user gracefully.
If you pass user input directly to eval(), then the user can do anything.
Is there any way to get a process to print, say, an error message without constructing a pipe or other similar structure?
It is ordinary Python code:
print(message)  # works
The difference is that if several processes call print() then the output might be garbled. You could use a lock to synchronize the print() calls; see the sketch below.
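A minimal sketch of that last point (my own example, not from the original answer): a multiprocessing.Lock shared by the workers keeps their print() calls from interleaving.
from multiprocessing import Process, Lock

def worker(lock, n):
    with lock:  # only one process prints at a time
        print('worker', n, 'reporting')

if __name__ == '__main__':
    lock = Lock()
    procs = [Process(target=worker, args=(lock, n)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()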
