Multiprocessing numerical adder - Python

I am currently involved in a university project looking at thousands of samples of genetic data from cancer patients. My program was going to take too long to run, so I used multiprocessing. It worked fine on an Apple Mac my friend lent me, but the moment I transferred it over to a university Windows system it failed, and I am unsure why the program doesn't work anymore.
I stripped my code down as far as possible to isolate the error; my program without the multiprocessing element works fine. I believe the problem is in the code below. Instead of posting my very long program, I've switched it out for a simple addition, and it still does not work: it uses very high CPU and I cannot see where I am going wrong. Kind regards.
The expected result is 5, 15, 25, 35, printed almost instantly. I am running Windows 10 on the computer I'm currently using.
import multiprocessing
from multiprocessing import Pool
import collections

value = collections.namedtuple('value', ['vectx', 'vecty'])
Values = (value(vectx=0, vecty=5), value(vectx=5, vecty=10),
          value(vectx=10, vecty=15), value(vectx=15, vecty=20))
print(1)

def Alter(x):
    vectx = x.vectx
    vecty = x.vecty
    Z = vectx + vecty
    return Z

if __name__ == '__main__':
    with Pool(2) as p:
        result = p.map(Alter, Values)

print(2)
new = []
for i in result:
    new.append(i)
print(new)

I don't know why exactly, but this part

print(2)
new = []
for i in result:
    new.append(i)
print(new)

needs to be in the suite of the if statement, similar to the example in the documentation:
if __name__ == '__main__':
    with Pool(2) as p:
        result = p.map(Alter, Values)
    print(2)
    new = []
    for i in result:
        new.append(i)
    print(new)
I suspect that "Compulsory usage of if __name__ == '__main__' in Windows while using multiprocessing" may be relevant.
If you run your original code from a command shell (like PowerShell or the command prompt) with python -m mymodulename, you will see what is actually going on: tracebacks from multiple spawned processes.
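To make the mechanism concrete, here is a minimal sketch (the square function is hypothetical, not from the question): on Windows, multiprocessing starts workers with the spawn method, so every worker re-imports the module, and anything outside the if __name__ == '__main__': block runs again in each worker. That is why result is undefined in the spawned processes and why the tracebacks pile up.

import multiprocessing as mp

# this line runs once in the main process and once in every spawned worker
print('module imported by', mp.current_process().name)

def square(x):
    return x * x

if __name__ == '__main__':
    # spawn is the default on Windows; forcing it here reproduces the behaviour elsewhere
    mp.set_start_method('spawn', force=True)
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3, 4]))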

Related

How to utilize multiprocessing to run separate trials on separate cores

My problem seems to be a simple one, but so far I haven't found a satisfactory answer. The code I am running is very time consuming, and I need to run it many times (ideally 100 times or more) and average the results from each trial. I have been told to try multiprocessing, and I have made some progress (in JupyterLab).
# my_code.py
def Run_Code():
    <code>
    return result

import multiprocessing as mp
import numpy as np
import my_code as mc

trial_amount = 2

if __name__ == '__main__':
    pool = mp.Pool(2)
    result = pool.map(mc.Run_Code, np.arange(trial_amount))
    print(result)
I was guided by this introduction (https://sebastianraschka.com/Articles/2014_multiprocessing.html#sections). The ultimate goal is to run each trial simultaneously (or as many as possible at once, starting another as each one finishes) and put the results in a list that is then averaged. I tried this and it continued running for hours, much longer than a single trial, and never finished.
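For reference, a minimal sketch of the Pool pattern the question is aiming for (assuming Run_Code accepts the trial index as an argument and that the guarded code lives in a top-level script rather than a notebook cell):

import multiprocessing as mp
import numpy as np
import my_code as mc      # assumes my_code.Run_Code(trial) returns one number

if __name__ == '__main__':
    trial_amount = 100
    with mp.Pool() as pool:                          # one worker per CPU core by default
        results = pool.map(mc.Run_Code, range(trial_amount))
    print(np.mean(results))                          # average over all trials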
Try mpi4py - https://stackoverflow.com/a/15717768/1021819 gives an example of how it goes.
There is a great and simple tutorial here:
https://mpi4py.readthedocs.io/en/stable/tutorial.html
It really is just a few lines. The following also has enough to get you going; it splits a loop of work over the cores and then aggregates the results on the master process:
https://stackoverflow.com/a/51318100/1021819
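For example, a minimal mpi4py sketch of that split-and-gather pattern (run_trial is a stand-in for the real computation; launch with something like mpiexec -n 4 python trials.py):

# trials.py - split trials across MPI ranks, gather and average on rank 0
from mpi4py import MPI
import numpy as np

def run_trial(i):
    return i * i                                     # placeholder for the real trial

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

n_trials = 100
my_trials = range(rank, n_trials, size)              # round-robin split over ranks
my_results = [run_trial(i) for i in my_trials]

all_results = comm.gather(my_results, root=0)        # list of per-rank lists on rank 0
if rank == 0:
    flat = [r for chunk in all_results for r in chunk]
    print(np.mean(flat))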

Python multiprocessing pool.map self._event.wait(timeout) is hanging. Why is pool.map wait not responding?

multiprocessing pool.map works nicely on my old PC but does not work on the new PC.
It hangs in the call to

def wait(self, timeout=None):
    self._event.wait(timeout)

at which point the CPU utilization drops to 0% with no further response, as if it has gone to sleep.
I wrote a simple test.py as follows
import multiprocessing as mp

letters = ['A', 'B', 'C']

def doit(letter):
    for i in range(1000):
        print(str(letter) + ' ' + str(i))

if __name__ == '__main__':
    pool = mp.Pool()
    pool.map(doit, letters)
This works on the old PC with an i7-7700k (4 cores, 8 logical), Python 3.6.5 64-bit, Win10 Pro, PyCharm 2018.1, where stdout displays the letters and numbers in non-sequential order, as expected.
However, the same code does not work on the new build with an i9-7960 (16 cores, 32 logical), Python 3.7 64-bit, Win10 Pro, PyCharm 2018.3.
The new PC's BIOS version has not been updated from 2017/11 (4 months older).
pool.py appears to be the same on both machines (2006-2008 R Oudkerk).
The code line where it hangs in the 'wait' function is ...
self._event.wait(timeout)
Any help please on where I might look next to find the cause.
Thanks in advance.
....
EDIT::
My further interpretation:
1. The GIL (Global Interpreter Lock) is not relevant here, as it applies to multi-threading only, not multiprocessing.
2. multiprocessing.Manager is unnecessary here, as the code consumes static input and produces independent output. So pool.close and pool.join are not required either, as I am not joining results afterwards.
3. This link is a good introduction to multiprocessing, though I don't see a solution in it:
https://docs.python.org/2/library/multiprocessing.html#windows
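For what it's worth, a minimal sketch of the Windows-friendly pattern those programming guidelines describe (freeze_support() only matters for frozen executables, but it is harmless):

import multiprocessing as mp

def doit(letter):
    return letter * 3                  # trivial stand-in for the real work

if __name__ == '__main__':
    mp.freeze_support()                # no-op unless the script is frozen into an exe
    pool = mp.Pool()
    try:
        print(pool.map(doit, ['A', 'B', 'C']))
    finally:
        pool.close()                   # no more tasks
        pool.join()                    # wait for the workers to exit cleanly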

multiprocessing Linux and Windows - passing variables

I wrote a program that uses the multiprocessing module. The program was initially written for Linux, but I want to allow Windows users to use it as well. The main issue is when passing variables (or rather interpolation functions) to subroutines: on Linux I don't have to make explicit use of the interpolation function 'interpolator', and my call to the subroutines looks like:
pool = multiprocessing.Pool()
print 'Executing main loop...'
result2 = []
for i in range(0, NN):
    pool.apply_async(SliceCalculate, (i,), callback=result2.append)
pool.close()
pool.join()
and everything seems to work fine! Inside, the 'SliceCalculate' function uses

interpolator = interpolate.NearestNDInterpolator(xyzpoints, weights, rescale=True)

and finds 'interpolator' automatically. On Windows the call needs to look like:
pool = multiprocessing.Pool()
print 'Executing main loop...'
result2 = []
for i in range(0, NN):
    pool.apply_async(SliceCalculate, (i, interpolator), callback=result2.append)
pool.close()
pool.join()
and it also works, except for one thing: performance drops to something like 40% (same machine used). I compared both versions; on Windows the version written for Windows shows the same poor performance, while if I try to run the Linux version (no passing of 'interpolator') I don't get any results. Any ideas what is wrong?
I can't share the whole program; it's simply too long.
PS. I did some more tests and it seems that when the 'interpolator' is big (e.g. the interpolation values in 3D go to 100x100x100, which is what I need) the performance is as described above, but when I limit it to e.g. 40x40x40 interpolation points (source points) the performance of both solutions is the same, and it starts to diverge as the 'interpolator' size increases (NearestNDInterpolator is used). Could it be an OS 'issue', meaning there is not much more I can do?
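One common workaround for a large argument like this (a sketch, not the poster's actual code): on Linux the forked workers inherit 'interpolator' for free, while on Windows it has to be pickled and sent for every single task. Using a Pool initializer sends it to each worker only once, at start-up:

import multiprocessing
import numpy as np
from scipy import interpolate

_interpolator = None

def init_worker(interp):
    # runs once in each worker process; keeps the big object as a worker-level global
    global _interpolator
    _interpolator = interp

def SliceCalculate(i):
    # placeholder slice: evaluate the interpolator at a single query point
    return _interpolator([[i, i, i]])[0]

if __name__ == '__main__':
    xyzpoints = np.random.rand(1000, 3) * 100        # stand-in source points
    weights = np.random.rand(1000)
    interpolator = interpolate.NearestNDInterpolator(xyzpoints, weights, rescale=True)

    NN = 20
    pool = multiprocessing.Pool(initializer=init_worker, initargs=(interpolator,))
    result2 = pool.map(SliceCalculate, range(NN))
    pool.close()
    pool.join()
    print(result2)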

Python threading.thread.start() doesn't return control to main thread

I'm trying to write a program that executes a piece of code in such a way that the user can stop its execution at any time without stopping the main program. I thought I could do this using threading.Thread, but then I ran the following code in IDLE (Python 3.3):
from threading import *
import math

def f():
    eval("math.factorial(1000000000)")

t = Thread(target=f)
t.start()
The last line doesn't return: I eventually restarted the shell. Is this a consequence of the Global Interpreter Lock, or am I doing something wrong? I didn't see anything specific to this problem in the threading documentation (http://docs.python.org/3/library/threading.html)
I tried to do the same thing using a process:
from multiprocessing import *
import math

def f():
    eval("math.factorial(1000000000)")

p = Process(target=f)
p.start()
p.is_alive()
The last line returns False, even though I ran it only a few seconds after I started the process! Based on my processor usage, I am forced to conclude that the process never started in the first place. Can somebody please explain what I am doing wrong here?
Thread.start() never returns! Could this have something to do with the C implementation of the math library?
As @eryksun pointed out in the comment: math.factorial() is implemented as a C function that doesn't release the GIL, so no other Python code may run until it returns.
Note: the multiprocessing version should work as is; each Python process has its own GIL.
factorial(1000000000) has hundreds of millions of digits. Try import time; time.sleep(10) as a dummy calculation instead.
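For instance, a quick sketch of that substitution: time.sleep() releases the GIL, so the thread starts and control returns immediately:

from threading import Thread
import time

def f():
    time.sleep(10)            # releases the GIL, unlike the long C-level factorial call

t = Thread(target=f)
t.start()                     # returns immediately; the main thread stays responsive
print('main thread is free while f() sleeps')
t.join()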
If you have issues with multithreaded code in IDLE then try the same code from the command line, to make sure that the error persists.
If p.is_alive() returns False after p.start() has been called, it might mean that there is an error in the f() function, e.g., a MemoryError.
On my machine, p.is_alive() returns True and one of the CPUs is at 100% if I paste your code from the question into a Python shell.
Unrelated: remove wildcard imports such as from multiprocessing import *. They may shadow other names in your code, so that you can't be sure what a given name means; e.g., threading could define an eval function (it doesn't, but it could) with similar but different semantics that might break your code silently.
I want my program to be able to handle ridiculous inputs from the user gracefully
If you pass user input directly to eval() then the user can do anything.
Is there any way to get a process to print, say, an error message without constructing a pipe or other similar structure?
It is ordinary Python code:
print(message) # works
The difference is that if several processes run print() then the output might be garbled. You could use a lock to synchronize print() calls.
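For example, a minimal sketch of that locking idea (the worker function and messages are made up for illustration):

import multiprocessing as mp

def worker(lock, n):
    with lock:                                 # only one process prints at a time
        print('worker', n, 'reporting')

if __name__ == '__main__':
    lock = mp.Lock()
    procs = [mp.Process(target=worker, args=(lock, n)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()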

How do I manage multiple processes in Python?

I have a simple (I hope) question:
My problems started when I wrote a GUI: I cannot refresh the user interface while executing heavy computations.
- If I use threads there is the GIL (not too slow, but the GUI freezes).
I tried so many things that my last hope is starting a new process (and here is the problem).
First of all:
- I have never used processes before (it could be a semantic error).
- I don't know the limitations (and exceptions) of processes.
- I am running CPython 3.1.2 on Mac OS X 10.6.8.
Here is an example (not the real code, but the result is the same) of what I need to solve:
from multiprocessing import *

def bob(q):
    print(q)

A = Process(target=bob, args=("something"))
A.start()
A.is_alive()
A.join()
and the output is:
True
It doesn't print "something", so I guess it doesn't run the process, but A.is_alive() says it is running, and when the interpreter arrives at A.join() it waits more or less forever.
Can someone explain this to me?
You need to add a comma: args=("something",).
The comma creates a tuple; otherwise it is just a string in parentheses.
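A quick interpreter check makes the difference obvious:

>>> type(("something"))
<class 'str'>
>>> type(("something",))
<class 'tuple'>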
You should give a list of arguments, not just the argument. This does the job for me:
from multiprocessing import *

def bob(q):
    print(q)

A = Process(target=bob, args=["something"])
A.start()
A.is_alive()
A.join()
The following uses sleep sort (http://stackoverflow.com/questions/6474318/what-is-the-time-complexity-of-the-sleep-sort) to sort upper-case characters A-Z:
somestring = "DGAECBF"

from multiprocessing import *

def bob(t):
    import time
    time.sleep(ord(t) - ord("A"))
    print(t)

p = []
for c in somestring:
    p.append(Process(target=bob, args=([c])))
    p[-1].start()
for pp in p:
    pp.join()
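Coming back to the original goal (keeping a GUI responsive), a common pattern is to hand the heavy computation to a Process and let the GUI poll a multiprocessing.Queue for the result. A minimal sketch of the idea (heavy_work and the polling loop are illustrative, not the poster's code):

from multiprocessing import Process, Queue
import time

def heavy_work(q):
    total = sum(i * i for i in range(10**6))   # stand-in for the heavy computation
    q.put(total)                               # send the result back to the parent

if __name__ == '__main__':
    q = Queue()
    worker = Process(target=heavy_work, args=(q,))
    worker.start()
    while q.empty():                           # in a real GUI this check sits in a timer callback
        time.sleep(0.1)                        # ...so the event loop keeps running
    print('result:', q.get())
    worker.join()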
