python multiprocessing pool.map hangs

I cannot get even the simplest examples of parallel processing with the multiprocessing package to run in Python 2.7 (using Spyder as a UI on Windows), and I need help figuring out the issue. I have run conda update, so all of the packages should be up to date and compatible.
Even the first example in the multiprocessing package documentation (given below) won't work: it spawns 4 new processes, but the console just hangs. I have tried everything I can find over the last 3 days, but none of the code that runs without hanging will allocate more than 25% of my computing power to this task (I have a 4-core computer).
For now I have given up on running the procedure I actually designed and need parallel processing for; I am only trying to get a proof of concept so I can build from there. Can someone explain and point me in the right direction? Thanks.
Example 1 from https://docs.python.org/2/library/multiprocessing.html
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool()
    print(p.map(f, [1, 2, 3]))
Example 2 (modified from original) from http://chriskiehl.com/article/parallelism-in-one-line/
from multiprocessing import Pool

def fn(i):
    return [i, i*i, i*i*i]

test = range(10)

if __name__ == '__main__':
    pool = Pool()
    results = map(fn, test)
    pool.close()
    pool.join()
I apologize if there is indeed an answer to this already; it seems as though I should be able to manage such a modest task, but I am not a programmer, and the resources I have found have been less than helpful given my very limited level of knowledge. Please let me know what further information is needed.
Thank you.

After installing Spyder on my virtual machine, this seems to be a Spyder-specific bug. Example 1 works in IDLE, when executed from the command line, and when executed from within Spyder (first saved to a file and then run), but not when executed line by line in Spyder.
I would suggest simply creating a new file in Spyder, adding the lines of code, saving it, and then running it.
For related reports see:
https://groups.google.com/forum/#!topic/spyderlib/LP5d8QZTXd0
QtConsole in Spyder cannot use multiprocessing.Manager
Multiprocessing working in Python but not in iPython
https://github.com/spyder-ide/spyder/issues/1900

Related

How to use multiprocessing in jupyter notebook via windows 10 [duplicate]

Marked as a duplicate of: Jupyter notebook never finishes processing using multiprocessing (Python 3)
I tried to use multiprocessing for a simple piece of code on Windows 10, Python 2.7 and Jupyter Notebook. When I run the code the kernel gets stuck without any errors thrown. I checked the Task Manager for performance and saw 8 processes (the number of cores in my CPU) running with 0% CPU use for each and every one of them.
I looked almost everywhere but didn't find anything. I also tried it in the Anaconda prompt but got an endless loop of errors.
Here is my code:
import multiprocessing

n_cpu = multiprocessing.cpu_count()

def foo(x):
    return x**2

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=n_cpu)
    res = pool.map(foo, [1, 2, 3])
    pool.close()
Is this your actual code? With foo being so simple, this code will run so quickly that it's likely to look like 0% CPU usage.
If you see that sub-processes are spawned, then pool.map is doing its job. If you want to see all of them actually use the full CPU, give them some real meat to chew on (maybe something like sum([sum(range(x)) for x in range(1000)])).
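For example, here is a variant of the code above where each task does a noticeable amount of work; heavy and the loop bounds are only illustrative, but with something like this the worker processes should be visibly busy in the Task Manager:

import multiprocessing

def heavy(x):
    # Illustrative CPU-bound work so each worker stays busy for a while.
    total = 0
    for i in range(10000):
        total += sum(range(i))
    return total + x

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    results = pool.map(heavy, range(16))
    pool.close()
    pool.join()
    print(len(results))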

Running scipy.odeint through multiple cores

I'm working in Jupyter (Anaconda) with Python 2.7.
I'm trying to get an odeint function I wrote to run multiple times; however, it takes an incredible amount of time.
While trying to figure out how to decrease the run time, I realized that when I ran it, it only took up about 12% of my CPU.
I operate off of an Intel Core i7-3740QM @ 2.70GHz:
https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-3740QM+%40+2.70GHz&id=1481
So I'm assuming this is because of Python's GIL causing my script to run on only one core.
After some research on parallel processing in Python, I thought I found the answer by using this code:
import sys
import multiprocessing as mp
import numpy as np

# calc is the odeint function I wrote (definition not shown here)
Altitude = np.array([[550], [500], [450], [400], [350], [300]])

if __name__ == "__main__":
    processes = 4
    p = mp.Pool(processes)
    mp_solutions = p.map(calc, Altitude)
This doesn't seem to work, though. Once I run it, Jupyter just stays constantly busy. My first thought was that it was just a heavy computation, so it was taking a long time, but then I looked at my CPU usage and, although there were multiple Python processes, none of them were using any CPU.
I can't figure out the reason for this. I found this post as well and tried using their code, but it simply did the same thing:
Multiple scipy.integrate.ode instances
Any help would be much appreciated.
Thanks!
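As with the Spyder and Jupyter questions above, one likely culprit is that the worker processes cannot import calc when it is only defined inside a notebook cell. A rough sketch of the same pattern saved as its own .py file and run from the command line; deriv and calc below are placeholders, since the actual odeint wrapper isn't shown in the question:

import numpy as np
import multiprocessing as mp
from scipy.integrate import odeint

def deriv(y, t):
    # Placeholder right-hand side; the real model is not shown in the question.
    return -0.1 * y

def calc(altitude):
    # Stand-in for the asker's odeint wrapper.
    t = np.linspace(0.0, 100.0, 1000)
    return odeint(deriv, altitude, t)

Altitude = np.array([[550], [500], [450], [400], [350], [300]])

if __name__ == '__main__':
    pool = mp.Pool(processes=4)
    mp_solutions = pool.map(calc, Altitude)
    pool.close()
    pool.join()
    print(len(mp_solutions))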

Puzzling behaviour of __name__ variable using multiprocessing (Python 2.7)

Today I was messing around with the multiprocessing library, and I noticed something weird. I was trying to figure out whether it is possible to have nested scripts using multiprocessing (a script that uses multiprocessing to run part of a script, which in turn uses multiprocessing to run more parts of the script). To figure this out I started looking at what the __name__ variable is for the child scripts, because, if you are familiar with multiprocessing, you know this is going to be a problem.
When I wrote a test script, the result surprised me. I wrote this simple script:
from multiprocessing import Pool

def Child(Inf):
    print "Child" + __name__

if __name__ == "__main__":
    Process = Pool(4)
    Process.map(Child, [0, 0, 0, 0])
    print "Parent" + __name__
(don't mind the list of four zeros)
The console printed out this:
Child__main__
Child__main__
Child__main__
Child__main__
Parent__main__
This means that the __name__ of the child processes is also __main__.
if __name__ == "__main__":
This is the part that puzzles me. After testing, it appears that the child process gets run 4 times, while the if statement only gets run once. That makes sense when reading the code, but the testing shows that all the processes have the same name, and the computer shouldn't be able to tell the difference between them, because the child looks at a variable that is no different from the parent's.
I am puzzled by this, because I thought I understood how putting in the if statement prevents the child processes from running the main program as well, but this appears to be untrue.
Am I missing an important clue, or is this just something weird I shouldn't be looking at? :p
Regards,
Harm
What happens is that each process is not fed the same input.
The parent process receives and executes the full .py file you run, while child processes are forked from the parent with certain functions loaded into memory and are asked to run one specific function instead of running the entire program (which would lead to infinite recursion...).
The fact that the __name__ variable is the same is because each child process is a copy of the parent; they are just at different execution points.
On Windows OS:
I didn't notice until now, but somehow Python runs the code again when creating multiple processes on Windows.
On Python 3.5 (maybe other versions of Python 3 too, but I didn't check), it sets the __name__ variable to __mp_main__ and avoids the problem.
On Python 2.7, if the value really is __main__, the only explanation I have is that the variable is set after the input code runs; otherwise the if block would be executed as well.
Edit: I just tested on Python 2.7.11, and the __name__ variable is set to __parents_main__ before being set to __main__. I would not rely on this, because it was changed in Python 3 as you can see.
There is no reason the __name__ should be different at all. It just gives you the name of the module (or __main__ in the case of the program itself) where the code using it resides, and you are always calling it either from the original program or from a forked copy of it.
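One way to see the Windows behaviour described above, assuming you run this as a saved script: put a print at module level, and each child process will re-run the module top to bottom before calling the target function, so the line fires once per child (the exact __name__ value printed in the children depends on the Python version, as the edit above notes). A rough sketch:

from multiprocessing import Pool

# On Windows this line runs once in the parent and once more in every
# child process, because each child re-imports this module.
print "module loaded, __name__ is " + __name__

def child(x):
    return x

if __name__ == "__main__":
    pool = Pool(4)
    pool.map(child, [0, 0, 0, 0])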

Compiled Python Multiprocessing Locks CPU

I'm running a scraper program (using the requests library) that uses a simplistic threading scheme. Each thread goes to the internet, scrapes some data, and returns a dictionary. The code (using the multiprocessing library's Pool) that I'm using looks like the following:
from multiprocessing import Pool
from datetime import datetime

# create_input_list, return_info and create_output_file are defined elsewhere in scraper.py
def get_stats():
    symbols = create_input_list('.\\combined_in.csv')
    pool = Pool(4)
    results = pool.map(return_info, symbols)
    print results
    curr_date_time = datetime.now().strftime('%m-%d-%y_[%H_%M_%S]')
    out_uri = '.\\scraped_info_out_' + curr_date_time + '.csv'
    create_output_file(out_uri, results)
This works GREAT as a script run from PowerShell, but not so well when compiled to an exe. I used py2exe initially, which created the exe just fine, but when run it opens a blank terminal, locks up the whole computer, spawns about 10 processes that I can see in Task Manager, and eventually the machine has to be manually rebooted. The py2exe script is similarly simple and looks like:
from distutils.core import setup
import py2exe
setup(console=['scraper.py'])
Thinking that py2exe might just not play nice with the multiprocessing library, I also tried PyInstaller, with the same result. Additionally, I do have the if __name__ == '__main__': guard on the main function call, as follows:
if __name__ == '__main__':
    get_stats()
Is there a simple trick that I'm missing for when compiling with the multiprocessing library? I'm trying to figure out why it would work fine as a script but break so hard as an exe.
I can't answer for py2exe ... but for PyInstaller this is a known problem with a well-documented workaround (that has always worked for me):
https://stackoverflow.com/a/27694505/541038
It provides a good overview of the problem and the solution.
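The one part of that workaround that is easy to show here is multiprocessing.freeze_support(), which frozen Windows executables need before any Pool is created; the linked answer covers the PyInstaller-specific details. A rough sketch, assuming the rest of scraper.py stays as in the question:

import multiprocessing

def get_stats():
    # ... same body as in the question ...
    pass

if __name__ == '__main__':
    # Keeps the frozen exe's child processes from re-running the whole
    # program; see the linked answer for the rest of the workaround.
    multiprocessing.freeze_support()
    get_stats()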

Multiprocessing launching too many instances of Python VM

I am writing some multiprocessing code (Python 2.6.4, WinXP) that spawns processes to run background tasks. While playing around with some trivial examples, I am running into an issue where my code just continuously spawns new processes, even though I only tell it to spawn a fixed number.
The program itself runs fine, but if I look in the Windows Task Manager, I keep seeing new 'python.exe' processes appear. They just keep spawning, more and more, as the program runs (eventually starving my machine).
For example, I would expect the code below to launch 2 python.exe processes: the first being the program itself, and the second being the child process it spawns. Any idea what I am doing wrong?
import time
import multiprocessing

class Agent(multiprocessing.Process):
    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        while True:
            print 'hello from %i' % self.i
            time.sleep(1)

agent = Agent(1)
agent.start()
It looks like you didn't carefully follow the guidelines in the documentation, specifically this section where it talks about "Safe importing of main module".
You need to protect your launch code with an if __name__ == '__main__': block, or you'll get what you're getting, I believe.
I believe it comes down to the multiprocessing module not being able to use os.fork() as it does on Linux, where an already-running process is basically cloned in memory. On Windows (which has no such fork()) it must start a new Python interpreter and tell it to import your main module, and then execute the start/run method once that's done. If you have code at "module level", unprotected by the name check, then during that import it starts the whole sequence over again, ad infinitum.
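A minimal adjustment of the code in the question along those lines, moving the launch under the guard so the child's re-import of the module doesn't start yet another Agent:

import time
import multiprocessing

class Agent(multiprocessing.Process):
    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        while True:
            print 'hello from %i' % self.i
            time.sleep(1)

if __name__ == '__main__':
    # Only the original program enters here; the child interpreter that
    # re-imports this module on Windows skips this block.
    agent = Agent(1)
    agent.start()
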
When I run this on Linux with Python 2.6, I see a maximum of 4 python2.6 processes, and I can't guarantee that they're all from this process. They're definitely not filling up the machine.
Need a new Python version? A Linux/Windows difference?
I don't see anything wrong with that. It works fine on Ubuntu 9.10 (Python 2.6.4).
Are you sure you don't have cron or something else starting multiple copies of your script? Or that the spawned script isn't calling anything that would start a new instance, for example as a side effect of import, if your code runs directly on import?
