Python script runs slower when executed multiple times

I have a Python script with a normal runtime of ~90 seconds. However, when I change only minor things in it (like altering the colors in my final pyplot figure) and execute it multiple times in quick succession, its runtime increases to close to 10 minutes.
Some bullet points of what I'm doing:
I'm not downloading anything, nor creating new files with my script.
I merely open some locally saved .dat files using numpy.genfromtxt and crunch some numbers with them.
I transform my data into a rec-array and use indexing via array.columnname extensively.
For each file I loop over a range of criteria that basically constitute different maximum and minimum values for evaluation, and embedded in that I use an inner loop over the lines of the data arrays. A few if's here and there but nothing fancy, really.
I use the multiprocessing module as follows
import multiprocessing
npro = multiprocessing.cpu_count() # Count the number of processors
pool = multiprocessing.Pool(processes=npro)
bigdata = list(pool.map(analyze, range(len(FileEndings))))
pool.close()
with analyze being my main function and FileEndings its input, a string, used to create the right name of the file I want to load and then evaluate. Afterwards, I use it a second time with
pool2 = multiprocessing.Pool(processes=npro)
listofaverages = list(pool2.map(averaging, range(8)))
pool2.close()
averaging being another function of mine.
I use numba's @jit decorator to speed up the basic calculations I do in my inner loops, with nogil, nopython, and cache all set to True (a sketch of such a decorated helper is shown after this list). Commenting these out doesn't resolve the issue.
I run the script on Ubuntu 16.04 and am using a recent Anaconda build of Python.
I write the code in PyCharm and run it in its console most of the time. However, switching to bash doesn't help either.
Simply not running the script for about 3 minutes lets it go back to its normal runtime.
Using htop reveals that all processors are at full capacity when running. I am also seeing a lot of processes stemming from PyCharm (50 or so), each at an equal MEM% of 7.9. The CPU% is at 0 for most of them; a few exceptions are in the range of several percent.
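For reference, a minimal sketch of the kind of jit-decorated inner-loop helper described above (the function name, arguments, and logic are illustrative, not taken from the actual script):
import numpy as np
from numba import jit

# hypothetical inner-loop helper using the decorator options from the question
@jit(nopython=True, nogil=True, cache=True)
def count_in_range(values, vmin, vmax):
    # count how many entries fall between the minimum and maximum criteria
    n = 0
    for v in values:
        if vmin <= v <= vmax:
            n += 1
    return n

# example call on a plain float64 array, e.g. one column of a rec-array
data = np.array([0.5, 1.5, 2.5, 3.5])
print(count_in_range(data, 1.0, 3.0))  # -> 2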
Has anyone experienced such an issue before? And if so, any suggestions what might help? Or are any of the things I use simply prone to cause these problems?

This may be closed: the problem was caused by a malfunctioning fan in my machine.

Related

multiprocessing pool map incredibly slow running a Monte Carlo ensemble of python simulations

I have a python code that runs a 2D diffusion simulation for a set of parameters. I need to run the code many times, O(1000), like a Monte Carlo approach, using different parameter settings each time. In order to do this more quickly I want to use all the cores on my machine (or cluster), so that each core runs one instance of the code.
In the past I have done this successfully for serial Fortran codes by writing a Python wrapper that then used multiprocessing map (or starmap in the case of multiple arguments) to call the Fortran code in an ensemble of simulations. It works very nicely in that you loop over the 1000 simulations, and the Python wrapper farms out a new integration to a core as soon as it becomes free after completing a previous integration.
However, now when I set this up to do the same thing to run multiple instances of my Python (instead of Fortran) code, I find it is incredibly slow, much slower than simply running the code 1000 times in serial on a single core. Using the system monitor I see that only one core is working at a time, and it never goes above 10-20% load, while of course I expected to see N cores running near 100% (as is the case when I farm out Fortran jobs).
I thought it might be a write issue, and so I checked the code carefully to ensure that all plotting is switched off, and in fact there is no file/disk access at all, I now merely have one print statement at the end to print out a final diagnostic.
The structure of my code is like this
I have the main python code in toy_diffusion_2d.py which has a single arg of a dictionary with the run parameters in it:
def main(arg):
    loop over timesteps:
        calculate the simulation over a large grid
    print the result statistic
And then I wrote a "wrapper" script, where I import the main simulation code and try to run it in parallel:
from multiprocessing import Pool, cpu_count
import toy_diffusion_2d

# dummy list of arguments
par1 = [1, 2, 3]
par2 = [4, 5, 6]

# make a list of dictionaries to loop over, 3x3=9 simulations in all
arglist = [{"par1": p1, "par2": p2} for p1 in par1 for p2 in par2]

ncore = min(len(arglist), int(cpu_count()))
with Pool(processes=ncore) as p:
    p.map(toy_diffusion_2d.main, arglist)
The above is a shorter paraphrased example, my actual codes are longer, so I have placed them here:
Main code: http://clima-dods.ictp.it/Users/tompkins/files/toy_diffusion_2d.py
You can run this with the default values like this:
python3 toy_diffusion_2d.py
Wrapper script: http://clima-dods.ictp.it/Users/tompkins/files/toy_diffusion_loop.py
You can run a 4 member ensemble like this:
python3 toy_diffusion_loop.py --diffK=[33000,37500] --tau_sub=[20,30]
(Note that the final statistic is slightly different on each run, even with the same values, as the model is stochastic; it is a version of the stochastic Allen-Cahn equations, in case anyone is interested, but uses a stupid explicit solver on the diffusion term.)
As I said, the second, parallel code works, but it is reeeeeallly slow... like it is constantly gating.
I also tried using starmap, but that was not any different; it is almost like the desktop only allows one Python interpreter to run at a time...? I spent hours on it, and I'm almost at the point of rewriting the code in Fortran. I'm sure I'm just doing something really stupid to prevent parallel execution.
EDIT(1): this problem is occurring on
4.15.0-112-generic x86_64 GNU/Linux, with Python 3.6.9
In response to the comments: in fact, I also find it runs fine on my Mac laptop...
EDIT(2): so it seems my question was a bit of a duplicate of several other postings, apologies! As well as the useful links provided by Pavel, I also found this page very helpful: Importing scipy breaks multiprocessing support in Python. I'll edit the solution into the accepted answer below.
The code sample you provide works just fine on my macOS Catalina 10.15.6. I can guess you're using some Linux distribution where, according to this answer, it can be the case that importing numpy meddles with core affinity because it is linked against the OpenBLAS library.
If your Unix supports the scheduler interface, something like this will work:
>>> import os
>>> os.sched_setaffinity(0, set(range(os.cpu_count())))
Another question that has a good explanation of this problem is found here and the solution suggested is this:
os.system('taskset -cp 0-%d %s' % (ncore, os.getpid()))
inserted right before the multiprocessing call.
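Put together, a sketch of the wrapper with the affinity reset applied (toy_diffusion_2d and the dummy parameters are from the question; the sched_setaffinity call is the only addition):
import os
from multiprocessing import Pool, cpu_count

import toy_diffusion_2d  # importing this pulls in numpy, which may clamp the affinity mask

# dummy list of arguments, 3x3=9 simulations in all
par1 = [1, 2, 3]
par2 = [4, 5, 6]
arglist = [{"par1": p1, "par2": p2} for p1 in par1 for p2 in par2]

if __name__ == "__main__":
    # undo any affinity restriction set during import, then fan out
    os.sched_setaffinity(0, set(range(os.cpu_count())))
    ncore = min(len(arglist), cpu_count())
    with Pool(processes=ncore) as p:
        p.map(toy_diffusion_2d.main, arglist)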

Is it possible to restart spyder console and run script again, every n iterations?

Currently I am trying to do 3000 iterations of some function using a for loop (in Spyder). However, the function, which I imported from a package, contains a bug. Because of the bug, the function gets slower with every iteration. When I restart the IPython console, the function is fast again, but it again gets slower with every iteration.
So, this is what I want to do:
Run (for example) 100 iterations
Restart console
Run the next 100 iterations
Restart console
And so forth, until I've done 3000 iterations.
I know this sounds like a cumbersome process but I don't know how to do it differently since I don't have the skills (and time) to debug the complicated function I am using.
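Something like this driver script is roughly what I have in mind: run each batch of 100 iterations in its own fresh Python process so the slowdown resets (a sketch; run_batch.py and its start/end arguments are hypothetical stand-ins for my real code, which lives in the imported package):
import subprocess
import sys

TOTAL = 3000
BATCH = 100

for start in range(0, TOTAL, BATCH):
    end = start + BATCH
    # run_batch.py is assumed to perform iterations [start, end) and then exit,
    # so every batch starts with a clean interpreter
    subprocess.run([sys.executable, "run_batch.py", str(start), str(end)], check=True)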
Thanks in advance

Big difference in performance looping through 1-1000 in PyCharm and PythonIDLE/Shell

Recently, when I was fiddling Python with different IDE/shells, I was most surprised at the performance differences among them.
The code I wrote is a simple for loop through 1-1000. When executed by Python IDLE or Windows PowerShell, it took about 16 seconds to finish, while PyCharm finished it almost immediately, within about 500 ms.
I'm wondering why the difference is so huge.
for x in range(0, 1000, 1):
    print(x)
The time to execute the loop is almost zero. The time you're seeing elapse is due to the printing, which is tied to the output facilities of the particular shell you are using. For example, the sort of buffering it does, maybe the graphics routines being used to render the text, etc. There is no practical application for printing numbers in a loop as fast as possible to a human-readable display, so perhaps you can try the same test writing to a file instead. I expect the times will be more similar.
On my laptop your code takes 4.8 milliseconds if writing to the terminal. It takes only 460 microseconds if writing to a file.
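A quick way to check this yourself (a sketch; the millisecond figures above are from the answerer's laptop, and your numbers will differ):
import sys
import time

def run(stream):
    # time the same 1000 prints against an arbitrary output stream
    t0 = time.perf_counter()
    for x in range(1000):
        print(x, file=stream)
    return time.perf_counter() - t0

terminal_time = run(sys.stdout)
with open('out.txt', 'w') as f:
    file_time = run(f)
print('terminal: {:.4f}s  file: {:.4f}s'.format(terminal_time, file_time), file=sys.stderr)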
TL;DR: run stupid benchmarks, get stupid times.
IDLE is written in Python and uses tkinter, which wraps tcl/tk. By default, IDLE runs your code in a separate process, with output sent through a socket for display in IDLE's Shell window. So there is extra overhead for each print call. For me, on a years-old Windows machine, the 1000 line prints take about 3 seconds, or 3 milliseconds per print.
If you print the 1000 lines with one print call, as with
print('\n'.join(str(i) for i in range(1000)))
the result may take a bit more than 3 milliseconds, but it is still subjectively almost 'instantaneous'.
Note: in 3.6.7 and 3.7.1, single 'large' prints, where 'large' can be customized by the user, are squeezed down to a label that can be expanded either in-place or in a separate window.

Different time taken by Python script every time it is run?

I am working on an OpenCV-based Python project, and I am trying to develop a program that takes as little time as possible to execute. To test this, I wrote a small "hello world" program in Python and measured the time it takes to run. I ran it many times, and every run gives me a different runtime.
Can you explain why a simple program takes a different amount of time to execute each run?
I need my program to be independent of system processes.
Python gets different amounts of system resources depending upon what else the CPU is doing at the time. If you're playing Skyrim with the highest graphics levels at the time, then your script will run slower than if no other programs were open. But even if your task bar is empty, there may be invisible background processes confounding things.
If you're not already using it, consider using timeit. It performs multiple runs of your program in order to smooth out bad runs caused by a busy OS.
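For example, a minimal timeit sketch (the statement being timed is just a placeholder):
import timeit

# five repeats of 100,000 runs each; the minimum is the least
# disturbed by whatever else the OS happened to be doing
times = timeit.repeat('sum(range(100))', repeat=5, number=100000)
print(min(times))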
If you absolutely insist on requiring your program to run in the same amount of time every time, you'll need to use an OS that doesn't support multitasking. For example, DOS.

Python: Run multiple instances at once with different value set for each instance

I'm starting out in Python and have done a lot of programming in the past in VB. Python seems much easier to work with and far more powerful. I'm in the process of ditching Windows altogether and have quite a few VB programs I've written that I want to use on Linux without having to touch anything that involves Windows.
I'm trying to take one of my VB programs and convert it to Python. I think I pretty much have.
One thing I never could find a way to do in VB was to use Program 1, a calling program, to call Program 2 and run it multiple times. I had a program that would search a website looking for newly updated material; everything was numbered sequentially (1234567890_1.mp3, for example). Not every value was used, and I would have to search to find which files existed and which didn't. Typically the site would run through around 100,000 possible files a day, with only 2-3 files actually being used each day. I had the program set up to search 10,000 files: if it found a file that existed, it downloaded it, then moved on to the next possible file and tested it. I would run this program 10 times simultaneously, with each instance set up to search a separate 10,000-file block. I always wanted to set it up so that a calling program would have the user set the Main Block (1234) and the Secondary Block (5), with the Secondary Block possibly being a range of values. The calling program would then start up 10 separate programs (well, 0-9 in reality) and would use the Main Block and Secondary Block values to set up the call for each of the 10 instances of Program 2. When each of the 10 programs got called, all running at the same time, they would be given the appropriate search locations, so they would be searching the website to find what new files had been added throughout the previous day. It would only take 35-45 minutes to complete each day, versus multiple hours if I ran through everything in one long continuous program.
I think I could do this with Python by using a .txt file that Program 1 writes (sets) and Program 2 reads. I'm not sure if I would run into problems with changing the set value before Program 2 has read the value and started using it. I think I would have to add a pause into the program to play it safe... I'm not really sure.
Is there another way I could pass a value from Program 1 to Program 2 to accomplish the task I'm looking to accomplish?
It seems simple enough to have a master class (a block, as you would call it) that would contain all of the different threads/classes in a list. Communication-wise, there could simply be a communication structure like so:
Thread <--> Master <--> Thread
or
Thread <---> Thread <---> Thread
The latter would look more like a web structure.
The first option would probably be a bit more complicated since you have a "middle man". However, it does allow for mass communication. Also, all that would need to be passed to the other classes is the Master class, or a function that provides communication.
The second option allows for direct communication between threads. So if there are two threads that need to work together a lot, and you don't want other threads possibly interfering with the commands, then the second option is best. However, the list of classes would have to be sent to the function.
Either way, you are going to need a Master thread at the center of communication (for simplicity's sake). Otherwise you could get into socket and file stuff, but the thread approach is faster, more efficient, and less likely to cause headaches.
Hopefully you understood what I was saying. It is all theoretical, but I have used similar systems in my programs.
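A minimal sketch of the "middle man" idea using queue.Queue (the names and message format are illustrative, not from this answer):
import threading
import queue

to_master = queue.Queue()                        # workers -> master
to_workers = [queue.Queue() for _ in range(2)]   # master -> each worker

def worker(wid):
    # send a message to the master, then wait for whatever it routes back
    to_master.put((wid, "hello from worker %d" % wid))
    reply = to_workers[wid].get()
    print("worker %d got: %s" % (wid, reply))

def master():
    # route each incoming message to the other worker
    for _ in range(2):
        wid, msg = to_master.get()
        to_workers[1 - wid].put(msg)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
threads.append(threading.Thread(target=master))
for t in threads:
    t.start()
for t in threads:
    t.join()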
There are a bunch of ways to do this; probably the simplest is to use os.spawnv to run multiple instances of Program 2 and pass the appropriate values to each, i.e.
import os
import sys

def main(urlfmt, start, end, threads):
    start, end, threads = int(start), int(end), int(threads)
    per_thread = (end - start + 1) // threads
    for i in range(threads):
        # work out this instance's block of values
        s = start + i * per_thread
        e = min(s + per_thread - 1, end)
        outfile = 'output{}.txt'.format(i + 1)
        # launch an independent copy of Program 2 without waiting for it
        os.spawnv(os.P_NOWAIT, 'program2.exe',
                  ['program2.exe', urlfmt, str(s), str(e), outfile])

if __name__ == "__main__":
    if len(sys.argv) != 5:
        print('Usage: web_search.py urlformat start end threads')
    else:
        main(*sys.argv[1:])
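Since the questioner is moving to Linux, a subprocess-based launch may be more natural than os.spawnv with 'program2.exe'; a sketch, assuming a hypothetical program2.py that takes the same four arguments:
import subprocess
import sys

def launch(urlfmt, s, e, outfile):
    # start one independent worker process and return immediately,
    # much like os.spawnv with os.P_NOWAIT
    return subprocess.Popen(
        [sys.executable, 'program2.py', urlfmt, str(s), str(e), outfile])

# usage idea: collect the Popen handles and wait for them all at the end,
# e.g. procs = [launch(...) for each block]; then [p.wait() for p in procs]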
