Running scipy.odeint through multiple cores - python

I'm working in Jupyter (Anaconda) with Python 2.7.
I'm trying to get an odeint function I wrote to run multiple times, but it takes an incredibly long time.
While trying to figure out how to decrease the run time, I noticed that when I ran it, it only used about 12% of my CPU.
I'm running on an Intel Core i7-3740QM @ 2.70GHz:
https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i7-3740QM+%40+2.70GHz&id=1481
So, I'm assuming this is because of Python's GIL causing my script to only run off of one core.
After some research on parallel processing in Python, I thought I found the answer by using this code:
import numpy as np
import multiprocessing as mp

Altitude = np.array([[550], [500], [450], [400], [350], [300]])

if __name__ == "__main__":
    processes = 4
    p = mp.Pool(processes)
    mp_solutions = p.map(calc, Altitude)  # calc is the odeint-based function I defined earlier
This doesn't seem to work, though. Once I run it, Jupyter just stays busy indefinitely. My first thought was that the computation was simply heavy and therefore taking a long time, but when I looked at my CPU usage, although there were multiple Python processes, none of them were using any CPU.
I can't figure out the reason for this. I found this post as well and tried using its code, but it did exactly the same thing:
Multiple scipy.integrate.ode instances
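For reference, a stripped-down, self-contained version of what I'm trying to do (saved as a standalone script rather than typed into the notebook) looks roughly like this; the calc body here is only a placeholder ODE, not my real system:

import numpy as np
import multiprocessing as mp
from scipy.integrate import odeint

def calc(altitude):
    # placeholder right-hand side: simple exponential decay scaled by altitude
    def rhs(y, t):
        return -y / float(altitude)
    t = np.linspace(0.0, 100.0, 1000)
    return odeint(rhs, 1.0, t)

Altitude = [550, 500, 450, 400, 350, 300]

if __name__ == "__main__":
    processes = 4
    p = mp.Pool(processes)
    mp_solutions = p.map(calc, Altitude)
    p.close()
    p.join()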
Any help would be much appreciated.
Thanks!

Related

Why do two CPUs work when I am only working with one process?

I'm running a script like this:
from time import time

def test0():
    start = time()
    for i in range(int(1e8)):
        i += 1
    print(time() - start)
When I run this on my machine, which has 4 CPUs, I get the following CPU-usage trace in the Ubuntu 20.04 system monitor.
In the image you can see that I ran two experiments separated by some time, but in each experiment the activity of two of the CPUs peaks. Why?
This seems normal to me. The process running your Python code is not, by default, pinned to a specific core, which means the scheduler can move it between cores; that is what is happening here. The spikes are not simultaneous, which indicates that the process was switched from one core to another.
On Linux, you can observe this using
watch -tdn0.5 ps -mo pid,tid,%cpu,psr -p 172810
where 172810 is the PID of the Python process (which you can get, for example, from the output of top).
If you want to pin the process to a particular core, you can use psutil in your code.
import psutil
p = psutil.Process()
p.cpu_affinity([0]) # pinning the process to cpu 0
Now you should see only one core spiking (but avoid doing this unless you have a good reason for it).
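If you want to double-check the pinning from inside Python rather than with ps, something like this should work on Linux (os.sched_getaffinity is only available there):

import os
import psutil

p = psutil.Process()
p.cpu_affinity([0])             # pin this process to CPU 0

print(p.cpu_affinity())         # psutil's view of the affinity mask -> [0]
print(os.sched_getaffinity(0))  # standard-library view (Linux)      -> {0}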

multiprocessing pool map incredibly slow running a Monte Carlo ensemble of python simulations

I have a python code that runs a 2D diffusion simulation for a set of parameters. I need to run the code many times, O(1000), like a Monte Carlo approach, using different parameter settings each time. In order to do this more quickly I want to use all the cores on my machine (or cluster), so that each core runs one instance of the code.
In the past I have done this successfully for serial Fortran codes by writing a Python wrapper that used multiprocessing map (or starmap in the case of multiple arguments) to call the Fortran code in an ensemble of simulations. It works very nicely in that you loop over the 1000 simulations, and the Python wrapper farms out a new integration to a core as soon as that core becomes free after completing a previous integration.
However, now when I set this up to do the same to run multiple instances of my python (instead of fortran) code, I find it is incredibly slow, much slower than simply running the code 1000 times in serial on a single core. Using the system monitor I see that one core is working at a time, and it never goes above 10-20% load, while of course I expected to see N cores running near 100% (as is the case when I farm out fortran jobs).
I thought it might be a write issue, and so I checked the code carefully to ensure that all plotting is switched off, and in fact there is no file/disk access at all, I now merely have one print statement at the end to print out a final diagnostic.
The structure of my code is like this
I have the main python code in toy_diffusion_2d.py which has a single arg of a dictionary with the run parameters in it:
def main(arg):
    loop over timesteps:
        calculate the simulation over a large grid
    print the result statistic
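If anyone wants to reproduce the setup without downloading the full model, a dummy stand-in for main with the same interface (a dictionary of parameters in, one printed statistic out) could be as simple as this; the grid size, timestep count, and arithmetic are placeholders, not the real model:

# toy_diffusion_2d.py -- dummy stand-in, not the real model
import numpy as np

def main(arg):
    grid = np.random.rand(500, 500)
    for _ in range(200):  # stand-in for the timestep loop
        # fake "diffusion" update scaled by the two run parameters
        grid += arg["par1"] * 1e-3 * np.roll(grid, 1, axis=0)
        grid -= arg["par2"] * 1e-4 * grid
    print("final statistic:", grid.mean())
    return grid.mean()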
And then I wrote a "wrapper" script, where I import the main simulation code and try to run it in parallel:
from multiprocessing import Pool, cpu_count
import toy_diffusion_2d

# dummy list of arguments
par1 = [1, 2, 3]
par2 = [4, 5, 6]

# make a list of dictionaries to loop over, 3x3=9 simulations in all
arglist = [{"par1": p1, "par2": p2} for p1 in par1 for p2 in par2]

ncore = min(len(arglist), int(cpu_count()))

with Pool(processes=ncore) as p:
    p.map(toy_diffusion_2d.main, arglist)
The above is a shorter paraphrased example, my actual codes are longer, so I have placed them here:
Main code: http://clima-dods.ictp.it/Users/tompkins/files/toy_diffusion_2d.py
You can run this with the default values like this:
python3 toy_diffusion_2d.py
Wrapper script: http://clima-dods.ictp.it/Users/tompkins/files/toy_diffusion_loop.py
You can run a 4 member ensemble like this:
python3 toy_diffusion_loop.py --diffK=[33000,37500] --tau_sub=[20,30]
(Note that the final statistic is slightly different on each run, even with the same parameter values, as the model is stochastic; it is a version of the stochastic Allen-Cahn equations, in case anyone is interested, but uses a naive explicit solver on the diffusion term.)
As I said, the second (parallel) code works, but it is really slow, as if something is constantly blocking it.
I also tried using starmap, but that was no different; it is almost as if the desktop only allows one Python interpreter to run at a time. I have spent hours on it and am almost at the point of rewriting the code in Fortran. I'm sure I'm just doing something really stupid that prevents parallel execution.
EDIT(1): this problem is occurring on
4.15.0-112-generic x86_64 GNU/Linux, with Python 3.6.9
In response to the comments: in fact, I also find that it runs fine on my Mac laptop...
EDIT(2): so it seems my question was a bit of a duplicate of several other postings, apologies! As well as the useful links provided by Pavel, I also found this page very helpful: Importing scipy breaks multiprocessing support in Python. I'll edit the solution into the accepted answer below.
The code sample you provide works just fine on my macOS Catalina 10.15.6. I would guess you're using some Linux distribution where, according to this answer, importing numpy can meddle with core affinity because it is linked against the OpenBLAS library.
If your Unix supports the scheduler interface, something like this will work:
>>> import os
>>> os.sched_setaffinity(0, set(range(os.cpu_count())))
Another question that has a good explanation of this problem is found here and the solution suggested is this:
os.system('taskset -cp 0-%d %s' % (ncore, os.getpid()))
inserted right before the multiprocessing call.
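Putting the two parts together, the wrapper from the question with the affinity reset placed just before the pool is created would look roughly like this (Linux only; the sched_setaffinity call could equally be swapped for the taskset line above):

import os
from multiprocessing import Pool, cpu_count

import toy_diffusion_2d  # importing this pulls in numpy/OpenBLAS, which may restrict affinity

par1 = [1, 2, 3]
par2 = [4, 5, 6]
arglist = [{"par1": p1, "par2": p2} for p1 in par1 for p2 in par2]
ncore = min(len(arglist), cpu_count())

if __name__ == "__main__":
    # undo any affinity restriction before spawning the workers
    os.sched_setaffinity(0, set(range(cpu_count())))

    with Pool(processes=ncore) as p:
        p.map(toy_diffusion_2d.main, arglist)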

Python script runs slower when executed multiple times

I have a Python script with a normal runtime of ~90 seconds. However, when I change only minor things in it (like the colors in my final pyplot figure) and then execute it multiple times in quick succession, its runtime increases to close to 10 minutes.
Some bullet points of what I'm doing:
I'm not downloading anything neither creating new files with my script.
I merely open some locally saved .dat-files using numpy.genfromtxt and crunch some numbers with them.
I transform my data into a rec-array and use indexing via array.columnname extensively.
For each file I loop over a range of criteria that basically constitute different maximum and minimum values for evaluation, and embedded in that I use an inner loop over the lines of the data arrays. A few if's here and there but nothing fancy, really.
I use the multiprocessing module as follows
import multiprocessing
npro = multiprocessing.cpu_count() # Count the number of processors
pool = multiprocessing.Pool(processes=npro)
bigdata = list(pool.map(analyze, range(len(FileEndings))))
pool.close()
where analyze is my main function and FileEndings provides its input, a string used to construct the right name of the file I want to load and then evaluate. Afterwards, I use multiprocessing a second time with
pool2 = multiprocessing.Pool(processes=npro)
listofaverages = list(pool2.map(averaging, range(8)))
pool2.close()
where averaging is another function of mine.
I use numba's @jit decorator to speed up the basic calculations in my inner loops, with nogil, nopython, and cache all set to True (a minimal example of what these helpers look like is sketched after this list). Commenting these out doesn't resolve the issue.
I run the script on Ubuntu 16.04 and am using a recent Anaconda build of Python.
I write the code in PyCharm and run it in its console most of the time. However, changing to bash doesn't help either.
Simply not running the script for about 3 minutes lets it go back to its normal runtime.
Using htop reveals that all processors are at full capacity when running. I also see a lot of processes stemming from PyCharm (50 or so), each at an equal MEM% of 7.9. The CPU% is at 0 for most of them; a few exceptions are in the range of a few percent.
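For illustration, the jitted helpers are of this general shape (the arithmetic in the body is a placeholder, not my actual evaluation):

import numpy as np
from numba import jit

@jit(nopython=True, nogil=True, cache=True)
def inner_loop(data, vmin, vmax):
    # placeholder for the per-line evaluation inside the criteria loop
    total = 0.0
    for i in range(data.shape[0]):
        if data[i] > vmin and data[i] < vmax:
            total += data[i] * data[i]
    return total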
Has anyone experienced such an issue before? And if so, any suggestions what might help? Or are any of the things I use simply prone to cause these problems?
This can be closed; the problem was caused by a malfunctioning fan in my machine.

python multiprocessing pool.map hangs

I cannot make even the simplest examples of parallel processing with the multiprocessing package run in Python 2.7 (using Spyder as a UI on Windows), and I need help figuring out the issue. I have run conda update, so all of the packages should be up to date and compatible.
Even the first example in the multiprocessing package documentation (given below) won't work: it generates 4 new processes, but the console just hangs. I have tried everything I can find over the last 3 days, but none of the code that runs without hanging will allocate more than 25% of my computing power to this task (I have a 4-core computer).
I have given up, for now, on running the procedure I actually designed and need parallel processing for; at this point I am only trying to get a proof of concept so I can build from there. Can someone explain and point me in the right direction? Thanks.
Example 1 from https://docs.python.org/2/library/multiprocessing.html
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool()
    print(p.map(f, [1, 2, 3]))
Example 2 (modified from original) from http://chriskiehl.com/article/parallelism-in-one-line/
from multiprocessing import Pool

def fn(i):
    return [i, i*i, i*i*i]

test = range(10)

if __name__ == '__main__':
    pool = Pool()
    results = pool.map(fn, test)  # use the pool's map, not the built-in map
    pool.close()
    pool.join()
I apologize if there is indeed an answer to this already; it seems as though I should be able to manage such a modest task, but I am not a programmer, and the resources I have found have been less than helpful given my very limited level of knowledge. Please let me know what further information is needed.
Thank you.
After installing Spyder on my virtual machine, this appears to be a Spyder-specific bug. Example 1 works in IDLE, when executed via the command line, and when executed from within Spyder (first saved and then run), but not when executed line by line in Spyder's console.
I would suggest simply creating a new file in Spyder, adding the lines of code, saving it, and then running it.
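One pattern that tends to sidestep this whole class of interactive-console problems (in Spyder, and in IPython/Jupyter on Windows as well) is to keep the worker function in its own importable module and keep the Pool code behind the __main__ guard; a minimal sketch, using hypothetical file names:

# workers.py
def f(x):
    return x * x

# run_pool.py
from multiprocessing import Pool
from workers import f

if __name__ == '__main__':
    p = Pool()
    print(p.map(f, [1, 2, 3]))
    p.close()
    p.join()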
For related reports see:
https://groups.google.com/forum/#!topic/spyderlib/LP5d8QZTXd0
QtConsole in Spyder cannot use multiprocessing.Manager
Multiprocessing working in Python but not in iPython
https://github.com/spyder-ide/spyder/issues/1900

Different time taken by a Python script every time it is run?

I am working on an OpenCV-based Python project, and I am trying to reduce the program's execution time. To investigate, I timed a small "hello world" Python program. I ran it many times, and every run gives a different run time.
Can you explain why a simple program takes a different amount of time to execute on each run?
I need my program to be independent of other system processes.
Python gets different amounts of system resources depending upon what else the CPU is doing at the time. If you're playing Skyrim with the highest graphics levels at the time, then your script will run slower than if no other programs were open. But even if your task bar is empty, there may be invisible background processes confounding things.
If you're not already using it, consider using timeit. It performs multiple runs of your program in order to smooth out bad runs caused by a busy OS.
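For example, a repeated measurement with timeit might look like this (the summed range is just a stand-in for your own code):

import timeit

# run the snippet 10 times per repeat, keep the best of 5 repeats
best = min(timeit.repeat("sum(range(1000000))", number=10, repeat=5))
print(best)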
If you absolutely insist on requiring your program to run in the same amount of time every time, you'll need to use an OS that doesn't support multitasking. For example, DOS.
