OK, a bit of background first: I'm working intensively on a Python implementation of Grammatical Evolution. So far I've managed to solve a lot of deployment issues and made optimisations regarding exec, strings, and class methods.
Currently I'm working on multi-threading, and after getting a lot of advice from different sources I decided to use mpi4py, since it is already featured in the PyOpus library that I'm using as part of my framework.
Running the example directly from the Windows command line works like a charm, but I've come up with a hypothetical situation that I would like to solve (or discover that it is too much of a hassle).
The issue:
I'm running this piece of code from a separate file, as suggested by the Spyder community:
from IPython import get_ipython
ip = get_ipython()
ip.run_cell("!mpiexec -n 4 python myfile.py")
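For context, the "!" form simply hands the command to the system shell, so only the child processes' printed output ever comes back into IPython. A roughly equivalent plain-Python sketch using subprocess (same command as in the snippet above) would be:

import subprocess
proc = subprocess.run(["mpiexec", "-n", "4", "python", "myfile.py"],
                      capture_output=True, text=True)
print(proc.stdout)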
The original file contents are simple:
from pyopus.parallel.cooperative import cOS
from pyopus.parallel.mpi import MPI
from funclib import jobProcessor
if __name__=='__main__':
    # Set up MPI
    cOS.setVM(MPI())
    # This generator produces 100 jobs which are tuples of the form
    # (function, args)
    jobGen=((jobProcessor, [value]) for value in range(100))
    # Dispatch jobs and collect results
    results=cOS.dispatch(jobList=jobGen, remote=True)
    # Results are put in the list in the same order as the jobs are generated by jobGen
    print("Results: "+str(results))
    # Finish, need to do this if MPI is used
    cOS.finalize()
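(funclib is not shown in the post; jobProcessor is presumably just a plain function of one value, something like this hypothetical stand-in:)

# funclib.py -- hypothetical stand-in, the real module is not included in the post
def jobProcessor(value):
    # process a single input value and return a result
    return 2*value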
Now the question: how do I access the results variable? Running the file leaves me with the ip object and no idea how to access the variables stored within it (or even whether the results exist inside it at all).
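A sketch of one possible workaround, assuming myfile.py can be modified: because mpiexec starts separate Python processes, the results list never exists inside the IPython kernel itself, but the script can serialize it to disk and the Spyder session can read it back once run_cell returns:

# At the end of myfile.py, just before cOS.finalize() (sketch):
import pickle
with open("results.pkl", "wb") as fh:
    pickle.dump(results, fh)

# Back in the Spyder/IPython session, after run_cell has returned (sketch):
import pickle
with open("results.pkl", "rb") as fh:
    results = pickle.load(fh)
print(results)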
Thank you!
Related
I have a python code that runs a 2D diffusion simulation for a set of parameters. I need to run the code many times, O(1000), like a Monte Carlo approach, using different parameter settings each time. In order to do this more quickly I want to use all the cores on my machine (or cluster), so that each core runs one instance of the code.
In the past I have done this successfully for serial Fortran codes by writing a Python wrapper that then used multiprocessing map (or starmap in the case of multiple arguments) to call the Fortran code in an ensemble of simulations. It works very nicely: you loop over the 1000 simulations, and the Python wrapper farms out a new integration to a core as soon as it becomes free after completing the previous integration.
However, when I set this up in the same way to run multiple instances of my Python (instead of Fortran) code, I find it is incredibly slow, much slower than simply running the code 1000 times in serial on a single core. Using the system monitor I see that only one core is working at a time, and it never goes above 10-20% load, while of course I expected to see N cores running near 100% (as is the case when I farm out the Fortran jobs).
I thought it might be a write issue, so I checked the code carefully to ensure that all plotting is switched off; in fact there is no file/disk access at all, and I now merely have one print statement at the end to print a final diagnostic.
The structure of my code is like this
I have the main Python code in toy_diffusion_2d.py, which takes a single argument, a dictionary with the run parameters in it:
def main(arg):
    loop over timesteps:
        calculate the simulation over a large grid
    print the result statistic
And then I wrote a "wrapper" script, where I import the main simulation code and try to run it in parallel:
from multiprocessing import Pool,cpu_count
import toy_diffusion_2d
# dummy list of arguments
par1=[1,2,3]
par2=[4,5,6]
# make a list of dictionaries to loop over, 3x3=9 simulations in all.
arglist=[{"par1":p1,"par2":p2} for p1 in par1 for p2 in par2]
ncore=min(len(arglist),int(cpu_count()))
with Pool(processes=ncore) as p:
    p.map(toy_diffusion_2d.main,arglist)
The above is a shorter, paraphrased example; my actual codes are longer, so I have placed them here:
Main code: http://clima-dods.ictp.it/Users/tompkins/files/toy_diffusion_2d.py
You can run this with the default values like this:
python3 toy_diffusion_2d.py
Wrapper script: http://clima-dods.ictp.it/Users/tompkins/files/toy_diffusion_loop.py
You can run a 4 member ensemble like this:
python3 toy_diffusion_loop.py --diffK=[33000,37500] --tau_sub=[20,30]
(Note that the final statistic is slightly different on each run even with the same values, as the model is stochastic; it is a version of the stochastic Allen-Cahn equations, in case anyone is interested, but uses a crude explicit solver on the diffusion term.)
As I said, the second, parallel code works, but it is really slow, as if it is constantly being gated.
I also tried using starmap, but that was no different; it is almost as if the desktop only allows one Python interpreter to run at a time. I've spent hours on it, and I'm almost at the point of rewriting the code in Fortran. I'm sure I'm just doing something really stupid that prevents parallel execution.
EDIT(1): this problem is occurring on
4.15.0-112-generic x86_64 GNU/Linux, with Python 3.6.9
In response to the comments: in fact I find that it also runs fine on my Mac laptop...
EDIT(2): It seems my question was a bit of a duplicate of several other postings, apologies! As well as the useful links provided by Pavel, I also found this page very helpful: Importing scipy breaks multiprocessing support in Python. I'll edit the solution into the accepted answer below.
The code sample you provide works just fine on my macOS Catalina 10.15.6. I would guess you're using some Linux distribution where, according to this answer, importing numpy can meddle with core affinity because it is linked against the OpenBLAS library.
If your Unix supports the scheduler interface, something like this should work:
>>> import os
>>> os.sched_setaffinity(0, set(range(os.cpu_count())))
Another question with a good explanation of this problem can be found here, and the solution suggested there is this:
os.system('taskset -cp 0-%d %s' % (ncore, os.getpid()))
inserted right before the multiprocessing call.
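A minimal sketch of where such a reset could go in the wrapper script, just before the Pool is created (os.sched_setaffinity is Linux-only):

import os
from multiprocessing import cpu_count

# Undo any core pinning that an OpenBLAS-linked numpy import may have
# applied to this process, so the Pool workers can use all cores.
os.sched_setaffinity(0, range(cpu_count()))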
I am using a commercial application called Abaqus/CAE [1] with a built-in Python 2.6 interpreter and API. I've developed a long-running script that I'm attempting to split into simultaneous, independent tasks using Python's multiprocessing module. However, once spawned the processes just hang.
The script itself uses various objects/methods available only through Abaqus's proprietary cae module, which can only be loaded by starting up the Python bundled with Abaqus/CAE first, which then executes my script with Python's execfile.
To try to get multiprocessing working, I've attempted to run a script that avoids accessing any Abaqus objects, and instead just performs a calculation and prints the result to file [2]. This way, I can run the same script from the regular system Python installation as well as from the Python bundled with Abaqus.
The example code below works as expected when run from the command line using either of the following:
C:\some\path>python multi.py # <-- Using system Python
C:\some\path>abaqus python multi.py # <-- Using Python bundled with Abaqus
This spawns the new processes, and each runs the function and writes the result to file as expected. However, when called from the Abaqus/CAE Python environment using:
abaqus cae noGUI=multi.py
Abaqus will then start up, automatically import its own proprietary modules, and then execute my file using:
execfile("multi.py", __main__.__dict__)
where the global namespace arg __main__.__dict__ is set up by Abaqus. Abaqus then checks out licenses for each process successfully, spawns the new processes, and ... that's it. The processes are created, but they all hang and do nothing. There are no error messages.
What might be causing the hang-up, and how can I fix it? Is there an environment variable that must be set? Are there other commercial systems that use a similar procedure that I can learn from/emulate?
Note that any solution must be available in the Python 2.6 standard library.
System details: Windows 10 64-bit, Python 2.6, Abaqus/CAE 6.12 or 6.14
Example Test Script:
# multi.py
import multiprocessing
import time

def fib(n):
    a,b = 0,1
    for i in range(n):
        a, b = a+b, a
    return a

def workerfunc(num):
    fname = ''.join(('worker_', str(num), '.txt'))
    with open(fname, 'w') as f:
        f.write('Starting Worker {0}\n'.format(num))
        count = 0
        while count < 1000: # <-- Repeat a bunch of times.
            count += 1
            a=fib(20)
            line = ''.join((str(a), '\n'))
            f.write(line)
        f.write('End Worker {0}\n'.format(num))

if __name__ == '__main__':
    jobs = []
    for i in range(2): # <-- Setting the number of processes manually
        p = multiprocessing.Process(target=workerfunc, args=(i,))
        jobs.append(p)
        print 'starting', p
        p.start()
        print 'done starting', p
    for j in jobs:
        print 'joining', j
        j.join()
        print 'done joining', j
[1] A widely known finite element analysis package
[2] The script is a blend of a fairly standard Python function for fib(), and examples from PyMOTW
I have to write an answer as I cannot comment yet.
What I can imagine as a reason is that Python multiprocessing spawns a whole new process with its own non-shared memory. So if you create an object in your script and then start a new process, that new process contains a copy of the memory, and you have two objects that can go in different directions. When something of Abaqus is present in the original Python process (which I suspect), that gets copied too, and this copy could create such behaviour.
As a solution I think you could extend Python with C (which is capable of using multiple cores in a single process) and use threads there.
Just wanted to say that I have run into this exact issue. My solution at the current time is to compartmentalize my scripting. This may work for you if you're trying to run parameter sweeps over a given model, or run geometric variations on the same model, etc.
I first generate scripts to accomplish each portion of my modelling process:
1. Generate input file using CAE/Python.
2. Extract data that I want and put it in a text file.
With these created, I use text replacement to quickly generate N python scripts of each type, one for each discrete parameter set I'm interested in.
I then wrote a parallel processing tool in Python to call multiple Abaqus instances as subprocesses. This does the following:
1. Call CAE through subprocess.call for each model-generation script. The tool lets you choose how many instances to run at once, to keep you from taking every license on the server.
2. Execute the Abaqus solver for the generated models in the same way, with parameters for cores per job and total number of cores used.
3. Extract data using the same process as in step 1.
There is some overhead in repeatedly checking out licenses for CAE when generating the models, but in my testing it is far outweighed by the benefit of being able to generate 10+ input files simultaneously.
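A minimal sketch of that pattern (the script names, pool size, and counts here are hypothetical, not taken from the actual tool):

import subprocess
from multiprocessing.pool import ThreadPool

def run_cae(script_name):
    # Each call starts a fresh Abaqus/CAE session for one generated script.
    return subprocess.call("abaqus cae noGUI=%s" % script_name, shell=True)

if __name__ == "__main__":
    scripts = ["model_%02d.py" % i for i in range(10)]  # produced by text replacement
    max_instances = 4  # keep this below the number of available licenses
    pool = ThreadPool(processes=max_instances)
    return_codes = pool.map(run_cae, scripts)
    pool.close()
    pool.join()
    print("return codes: %s" % return_codes)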
I can put some of the scripts up on Github if you think the process outlined above would be helpful for your application.
Cheers,
Nathan
I have a Python script with a normal runtime of ~90 seconds. However, when I change only minor things in it (like the colors in my final pyplot figure) and then execute it multiple times in quick succession, its runtime increases to close to 10 minutes.
Some bullet points of what I'm doing:
I'm not downloading anything, nor creating any new files with my script.
I merely open some locally saved .dat-files using numpy.genfromtxt and crunch some numbers with them.
I transform my data into a rec-array and use indexing via array.columnname extensively.
For each file I loop over a range of criteria that basically constitute different maximum and minimum values for evaluation, and embedded in that I use an inner loop over the lines of the data arrays. A few if's here and there but nothing fancy, really.
I use the multiprocessing module as follows
import multiprocessing
npro = multiprocessing.cpu_count() # Count the number of processors
pool = multiprocessing.Pool(processes=npro)
bigdata = list(pool.map(analyze, range(len(FileEndings))))
pool.close()
with analyze being my main function and FileEndings its input, a string used to create the right name of the file I want to load and then evaluate. Afterwards, I use a second pool with
pool2 = multiprocessing.Pool(processes=npro)
listofaverages = list(pool2.map(averaging, range(8)))
pool2.close()
averaging being another function of mine.
I use numba's @jit decorator to speed up the basic calculations I do in my inner loops, with nogil, nopython, and cache all set to True (a small usage sketch follows below). Commenting these out doesn't resolve the issue.
I run the script on Ubuntu 16.04 and am using a recent Anaconda build of Python.
I write the code in PyCharm and run it in its console most of the time. However, changing to bash doesn't help either.
Simply not running the script for about 3 minutes lets it go back to its normal runtime.
Using htop reveals that all processors are at full capacity when running. I am also seeing a lot of processes stemming from PyCharm (50 or so), each at an equal MEM% of 7.9. The CPU% is at 0 for most of them; a few exceptions are in the range of a few percent.
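For reference, the decorator setup mentioned above looks roughly like this (the function body is a hypothetical stand-in for the real inner loop):

from numba import jit

@jit(nopython=True, nogil=True, cache=True)
def count_in_range(values, vmin, vmax):
    # hypothetical stand-in for the real per-line evaluation loop
    n = 0
    for v in values:
        if vmin <= v <= vmax:
            n += 1
    return n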
Has anyone experienced such an issue before? And if so, any suggestions what might help? Or are any of the things I use simply prone to cause these problems?
This can be closed; the problem was caused by a malfunctioning fan in my machine.
I'm trying to write a simple task manager in Python that will be used to run a large number of jobs on an LSF cluster. I'm stuck trying to determine (within a Python script) the number of running jobs for a given user. On the command line this would come from the command bjobs.
IBM makes available a python wrapper to the LSF C API. Working with one of their examples and some documentation from a copy of the C API that I found online, I have been able to cobble together the following script.
from pythonlsf import lsf
lsf.lsb_init("test")
userArr = lsf.new_stringArray(1)
lsf.stringArray_setitem(userArr, 0, 'my_username')
intp_num_users = lsf.new_intp()
lsf.intp_assign(intp_num_users, 1)
user_info = lsf.lsb_userinfo(userArr,intp_num_users)
The variable user_info has attributes 'numPEND', 'numRESERVE', 'numRUN', and 'numStartJobs', but all of these are 0. They remain zero even when bjobs reports a running job.
Can anyone tell me what I might be doing wrong in the code snippet above? I've read through both the C and Python documentation several times, but can't find an error.
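As a cross-check while debugging the API call, one fallback sketch is to parse the bjobs output directly (this assumes bjobs is on PATH, that -u selects the user, and that the default output has a header line with a STAT column):

import subprocess

def count_running_jobs(user):
    # Count rows of "bjobs -u <user>" whose STAT column reads RUN.
    try:
        out = subprocess.check_output(["bjobs", "-u", user]).decode()
    except subprocess.CalledProcessError:
        return 0  # bjobs may exit non-zero when the user has no jobs
    lines = out.strip().splitlines()
    if len(lines) < 2:
        return 0
    stat_col = lines[0].split().index("STAT")
    return sum(1 for line in lines[1:] if line.split()[stat_col] == "RUN")

print(count_running_jobs("my_username"))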
Is there a way to run cProfile or line_profile on a script on a server?
i.e., how could I get the results from one of the two methods for http://www.Example.com/cgi-bin/myScript.py?
Thanks!
Not sure what line_profile is. For cProfile, you just need to direct the results to a file you can later read on the server (depending on what kind of access you have to the server).
To quote the example from the docs,
import cProfile
cProfile.run('foo()', 'fooprof')
and put all the rest of the code into a def foo(); then later retrieve that fooprof file and analyze it at leisure (assuming your script runs with permission to write it in the first place, of course).
Of course you can ensure different runs get profiled into different files, etc. Whether this is practical also depends on what kind of access and permissions you're getting from your hosting provider, i.e., how you are allowed to persist data in a way that lets you retrieve it later. That's not a question of Python; it's a question of the arrangement between you and your hosting provider ;-).
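For example, a minimal sketch (script and file names are arbitrary) that profiles its own main function on the server and leaves the stats file behind for later analysis with pstats:

# myScript.py (sketch): profile the real work and dump the stats next to the script
import cProfile

def foo():
    # the actual request handling / page generation would go here
    total = sum(i * i for i in range(10000))
    print("Content-Type: text/plain\n")
    print(total)

cProfile.run('foo()', 'fooprof')

# Later, after retrieving fooprof from the server, inspect it locally:
# import pstats
# pstats.Stats('fooprof').sort_stats('cumulative').print_stats(10)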