I am using a commercial application called Abaqus/CAE [1] with a built-in Python 2.6 interpreter and API. I've developed a long-running script that I'm attempting to split into simultaneous, independent tasks using Python's multiprocessing module. However, once spawned, the processes just hang.
The script itself uses various objects/methods available only through Abaqus's proprietary cae module, which can only be loaded by starting up the Python bundled with Abaqus/CAE first, which then executes my script with Python's execfile.
To try to get multiprocessing working, I've attempted to run a script that avoids accessing any Abaqus objects, and instead just performs a calculation and prints the result to file [2]. This way, I can run the same script from the regular system Python installation as well as from the Python bundled with Abaqus.
The example code below works as expected when run from the command line using either of the following:
C:\some\path>python multi.py # <-- Using system Python
C:\some\path>abaqus python multi.py # <-- Using Python bundled with Abaqus
This spawns the new processes, and each runs the function and writes the result to file as expected. However, when called from the Abaqus/CAE Python environment using:
abaqus cae noGUI=multi.py
Abaqus will then start up, automatically import its own proprietary modules, and then execute my file using:
execfile("multi.py", __main__.__dict__)
where the global namespace arg __main__.__dict__ is set up by Abaqus. Abaqus then checks out licenses for each process successfully, spawns the new processes, and ... and that's it. The processes are created, but they all hang and do nothing. There are no error messages.
What might be causing the hang-up, and how can I fix it? Is there an environment variable that must be set? Are there other commercial systems that use a similar procedure that I can learn from/emulate?
Note that any solution must be available in the Python 2.6 standard library.
System details: Windows 10 64-bit, Python 2.6, Abaqus/CAE 6.12 or 6.14
Example Test Script:
# multi.py
import multiprocessing
import time

def fib(n):
    a, b = 0, 1
    for i in range(n):
        a, b = a+b, a
    return a

def workerfunc(num):
    fname = ''.join(('worker_', str(num), '.txt'))
    with open(fname, 'w') as f:
        f.write('Starting Worker {0}\n'.format(num))
        count = 0
        while count < 1000:  # <-- Repeat a bunch of times.
            count += 1
            a = fib(20)
            line = ''.join((str(a), '\n'))
            f.write(line)
        f.write('End Worker {0}\n'.format(num))

if __name__ == '__main__':
    jobs = []
    for i in range(2):  # <-- Setting the number of processes manually
        p = multiprocessing.Process(target=workerfunc, args=(i,))
        jobs.append(p)
        print 'starting', p
        p.start()
        print 'done starting', p
    for j in jobs:
        print 'joining', j
        j.join()
        print 'done joining', j
[1] A widely known finite element analysis package.
[2] The script is a blend of a fairly standard Python function for fib(), and examples from PyMOTW.
I have to write an answer as I cannot comment yet.
What I can imagine as a reason is that Python's multiprocessing spawns a whole new process with its own non-shared memory. So if you create an object in your script and then start a new process, that new process contains a copy of the memory, and you have two objects that can go in different directions. When something of Abaqus is present in the original Python process (which I suspect), that gets copied too, and this copy could create such behaviour.
As a solution, I think you could extend Python with C (which is capable of using multiple cores in a single process) and use threads there.
Just wanted to say that I have run into this exact issue. My solution at the current time is to compartmentalize my scripting. This may work for you if you're trying to run parameter sweeps over a given model, or run geometric variations on the same model, etc.
I first generate scripts to accomplish each portion of my modelling process:
1. Generate input file using CAE/Python.
2. Extract data that I want and put it in a text file.
With these created, I use text replacement to quickly generate N python scripts of each type, one for each discrete parameter set I'm interested in.
I then wrote a parallel processing tool in Python to call multiple Abaqus instances as subprocesses. This does the following:
1. Call CAE through subprocess.call for each model generation script. The script allows you to choose how many instances to run at once to keep you from taking every license on the server.
2. Execute the Abaqus solver for the generated models using the same approach, with parameters for cores per job and total number of cores used.
3. Extract data using the same process as in step 1.
There is some overhead in repeatedly checking out licenses for CAE when generating the models, but in my testing it is far outweighed by the benefit of being able to generate 10+ input files simultaneously.
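A minimal sketch of that kind of runner is below. This is not the actual tool; it uses subprocess.Popen rather than subprocess.call so that several Abaqus instances can run at once, and the script names and concurrency limit are placeholders:

# parallel_runner.py - minimal sketch of throttled Abaqus subprocess calls.
# Script names and MAX_CONCURRENT are hypothetical placeholders.
import subprocess
import time

MAX_CONCURRENT = 4                                   # keep below your CAE license count
scripts = ['model_%02d.py' % i for i in range(10)]   # generated by text replacement

running = []
for script in scripts:
    # Wait for a free slot before starting the next CAE instance.
    while len(running) >= MAX_CONCURRENT:
        running = [p for p in running if p.poll() is None]
        if len(running) >= MAX_CONCURRENT:
            time.sleep(5)
    running.append(subprocess.Popen('abaqus cae noGUI=' + script, shell=True))

# Wait for the remaining instances to finish.
for p in running:
    p.wait()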
I can put some of the scripts up on Github if you think the process outlined above would be helpful for your application.
Cheers,
Nathan
Related
Let's say I have a Python script to read and process a CSV in which each line can be processed independently. Then let's say I have another Python script that I am using to call the original script using os.system() as such:
Script A:
import sys
with open(sys.argv[1], 'r') as f:
    pass  # do some processing for each line
Script B:
import os
os.system('python Script_A.py somefile.csv')
How are computing resources shared between Script A and Script B? How does the resource allocation change if I call Script A as a subprocess of Script B instead of a system command? What happens if I multiprocess within either of those scenarios?
To complicate things further, how would the GIL come into play with these different instances of Python?
I am not looking for a library or a solution, but rather I'd like to understand, from the lens of python, how resources would be allocated in such scenarios so that I can optimize my code to my processing use-case.
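For reference, the subprocess variant mentioned above might look like this minimal sketch (same hypothetical file names as before):

# Script B, calling Script A through subprocess instead of os.system.
# Either way, a separate Python interpreter with its own GIL is started.
import subprocess

subprocess.call(['python', 'Script_A.py', 'somefile.csv'])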
Cheers!
OK, a bit of background first - I'm working intensively on a Python implementation of Grammatical Evolution. So far I've managed to solve a lot of deployment issues and optimisations regarding exec, strings and class methods.
Currently I'm working on multi-threading and after getting a large amount of advice from different sources I decided to use Mpi4py since it is already featured in the PyOpus library that I'm using as part of my framework.
Running the example directly from the Windows command line works like a charm, but I've created a hypothetical situation in my head that I would like to solve (or discover that it is too much of a hassle).
The issue:
I'm running this piece of code from a separate file, as suggested by the Spyder community:
from IPython import get_ipython
ip = get_ipython()
ip.run_cell("!mpiexec -n 4 python myfile.py")
The original file contents are simple:
from pyopus.parallel.cooperative import cOS
from pyopus.parallel.mpi import MPI
from funclib import jobProcessor
if __name__=='__main__':
    # Set up MPI
    cOS.setVM(MPI())

    # This generator produces 100 jobs which are tuples of the form
    # (function, args)
    jobGen=((jobProcessor, [value]) for value in range(100))

    # Dispatch jobs and collect results
    results=cOS.dispatch(jobList=jobGen, remote=True)

    # Results are put in the list in the same order as the jobs are generated by jobGen
    print("Results: "+str(results))

    # Finish, need to do this if MPI is used
    cOS.finalize()
Now the question - how do I access the results variable? Running the file leaves me with the ip object and no idea how to access the variables stored within it (or even how to know whether the results exist inside it at all).
Thank you!
I have a bunch of .py scripts as part of a project. Some of them I want to start and have running in the background whilst the others run through what they need to do.
For example, I have a script which takes a screenshot every 10 seconds until the script is closed, and I wish to have this running in the background whilst the other scripts get called and run through to the finish.
Another example is a script which calculates the hash of every file in a designated folder. This has the potential to run for a fair amount of time so it would be good if the rest of the scripts could be kicked off at the same time so they do not have to wait for the Hash script to finish what it is doing before they are invoked.
Is multiprocessing the right method for this kind of processing, or is there another way to achieve these results that would be better, such as this answer: Run multiple python scripts concurrently
You could also use something like Celery to run the tasks asynchronously; you'll be able to call tasks from within your Python code instead of through the shell.
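A minimal Celery sketch might look like the following; the broker URL and task bodies are placeholders, and a message broker such as Redis or RabbitMQ has to be running:

# tasks.py - hypothetical Celery tasks; the broker URL is an assumption.
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def take_screenshot():
    pass  # grab and save a screenshot here

@app.task
def hash_folder(path):
    pass  # walk the folder and hash each file here

You would then start a worker with something like celery -A tasks worker and queue work from your own code with take_screenshot.delay() or hash_folder.delay('/some/folder').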
It depends. With multiprocessing you can create a process manager that spawns the processes the way you want, but there are more flexible ways to do it without writing that code yourself. Multiprocessing is usually hard to get right.
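If you do go the multiprocessing route, a minimal process-manager sketch might look like this (the script names are hypothetical stand-ins for the scripts in the question):

# run_all.py - minimal process-manager sketch; script names are hypothetical.
import multiprocessing
import runpy

def run_script(path):
    # Execute a script in a fresh process, as if it were run as __main__.
    runpy.run_path(path, run_name='__main__')

if __name__ == '__main__':
    background = multiprocessing.Process(target=run_script, args=('screenshot_every_10s.py',))
    background.start()           # keeps taking screenshots in the background

    hasher = multiprocessing.Process(target=run_script, args=('hash_folder.py',))
    hasher.start()
    hasher.join()                # wait only for the hashing job
    background.terminate()       # stop the screenshot loop when it is no longer needed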
Check out circus; it's a process manager written in Python that you can use as a library, standalone, or via a remote API. You can define hooks to model dependencies between processes; see the docs.
A simple configuration could be:
[watcher:one-shot-script]
cmd = python script.py
numprocesses = 1
warmup_delay = 30
[watcher:snapshots]
cmd = python snapshots.py
numprocesses = 1
warmup_delay = 30
[watcher:hash]
cmd = python hashing.py
numprocesses = 1
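With that saved as, say, circus.ini, you would then start everything with something like:

circusd circus.ini

(the file name here is just an example).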
Currently, I have two programs, one running on Ruby and the other in Python. I need to read a file in Ruby, but I first need a library written in Python to parse the file. Currently, I use XMLRPC to have the two programs communicate. Porting the Python library to Ruby is out of the question. However, I have found and read that using XMLRPC has some performance overhead. Recently, I read that another solution for the Ruby-Python conundrum is the use of pipes, so I tried to experiment with that. For example, I wrote this master script in Ruby:
(0..2).each do
  slave = IO.popen(['python','slave.py'], mode='r+')
  slave.write "master"
  slave.close_write
  line = slave.readline
  while line do
    sleep 1
    p eval line
    break if slave.eof
    line = slave.readline
  end
end
The following is the Python slave:
import sys

cmd = sys.stdin.read()
while cmd:
    x = cmd
    for i in range(0,5):
        print "{'%i'=>'%s'}" % (i, x)
        sys.stdout.flush()
    cmd = sys.stdin.read()
Everything seems to work fine:
~$ ruby master.rb
{"0"=>"master"}
{"1"=>"master"}
{"2"=>"master"}
{"3"=>"master"}
{"4"=>"master"}
{"0"=>"master"}
{"1"=>"master"}
{"2"=>"master"}
{"3"=>"master"}
{"4"=>"master"}
{"0"=>"master"}
{"1"=>"master"}
{"2"=>"master"}
{"3"=>"master"}
{"4"=>"master"}
My question is, is it really feasible to use pipes for working with objects between Ruby and Python? One consideration is that there may be multiple instances of master.rb running. Will concurrency be an issue? Can pipes handle extensive operations and objects being passed between the processes? If so, would it be a better alternative to RPC?
Yes. No. If you implement it, yes. Depends on what your application needs.
Basically, if all you need is simple data passing, pipes are fine; if you need to be constantly calling functions on objects in your remote process, then you'll probably be better off using some form of existing RPC instead of reinventing the wheel. Whether that should be XMLRPC or something else is another matter.
Note that RPC will have to use some underlying IPC mechanism, which could well be pipes, but might also be sockets, message queues, shared memory, whatever.
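As a sketch of the "simple data passing" case, the slave from the question could emit JSON lines instead of strings intended for eval, which keeps the protocol language-neutral. A minimal, hypothetical variant:

# slave.py - same structure as the example above, but emitting JSON lines
# so the Ruby side can parse them safely instead of calling eval.
import sys
import json

cmd = sys.stdin.read()
while cmd:
    for i in range(0, 5):
        print json.dumps({str(i): cmd})
        sys.stdout.flush()
    cmd = sys.stdin.read()

On the Ruby side, require 'json' and JSON.parse(line) would then replace the p eval line call.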
I've written a little Python (2.7.2+) module (called TWProcessing) that can be described as an improvised batch manager. The way it works is that I pass it a long list of commands that it will then run in parallel, but limiting the total number of simultaneous processes. That way, if I have 500 commands I would like to run, it will loop through all of them, but only running X of them at a time so as to not overwhelm the machine. The value of X can be easily set when declaring an instance of this batch manager (the class is called TWBatchManager) :
batch = TWProcessing.TWBatchManager(MaxJobs=X)
I then add a list of jobs to this object in a very straightforward manner :
batch.Queue.append(/CMD goes here/)
Where Queue is a list of commands that the batch manager will run. When the queue has been filled, I then call Run() which loops through all the commands, only running X at a time :
batch.Run()
So far, everything works fine. Now what I'd like to do is be able to change the value of X (i.e. the maximum number of processes running at once) dynamically, i.e. while the processes are still running. My old way of doing this was rather straightforward: I had a file called MAXJOBS that the class would know to look at and, if it existed, it would check it regularly to see if the desired value had changed. Now I'd like to try something a bit more elegant. I would like to be able to write something along the lines of export MAXJOBS=newX in the bash shell that launched the script containing the batch manager, and have the batch manager realize that this is now the value of X it should be using. Obviously os.environ['MAXJOBS'] is not what I'm looking for, because this is a dictionary that is loaded on startup. os.getenv('MAXJOBS') doesn't cut it either, because the export will only affect child processes that the shell spawns from then on. So what I need is a way to get back to the environment of the parent process that launched my Python script. I know os.getppid() will give me the parent pid, but I have no idea how to get from there to the parent environment. I've poked around the interwebz to see if there was a way in which the parent shell could modify the child process environment, and I've found that people tend to insist I not try anything like that unless I'm prepared to do some of the ugliest things one can possibly do with a computer.
Any ideas on how to pull this off? Granted my "read from a standard text file" idea is not so ugly, but I'm new to Python and am therefore trying to challenge myself to do things in an elegant and clean manner to learn as much as I can. Thanks in advance for your help.
It looks to me like you are asking for inter-process communication between a bash script and a Python program.
I'm not completely sure about all your requirements, but it might be a candidate for a FIFO (named pipe):
1) make the fifo:
mkfifo batch_control
2) Start the Python server, which reads from the fifo. (Note: the following is only a minimalistic example; you must adapt things.)
while True:
    fd = file("batch_control", "r")
    for cmd in fd:
        print("New command [%s]" % cmd[:-1])
    fd.close()
3) From the bash script you can then 'send' things to the Python server by echoing strings into the fifo:
$ echo "newsize 800" >batch_control
$ echo "newjob /bin/ps" >batch_control
The output of the python server is:
New command [newsize 800]
New command [newjob /bin/ps]
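To connect this back to the MAXJOBS question, the reader loop could parse those lines and update the limit. A minimal sketch, assuming the running TWBatchManager instance is called batch and exposes MaxJobs and Queue attributes (those names are assumptions):

# fifo_listener.py - hypothetical glue between the fifo and the batch manager.
def listen_for_commands(batch, fifo_path="batch_control"):
    while True:
        fd = open(fifo_path, "r")              # blocks until a writer opens the fifo
        for line in fd:
            parts = line.split()
            if parts and parts[0] == "newsize":
                batch.MaxJobs = int(parts[1])          # e.g. "newsize 800"
            elif parts and parts[0] == "newjob":
                batch.Queue.append(" ".join(parts[1:]))
        fd.close()

Run in a background thread, this would let echo "newsize 8" > batch_control change the limit while Run() is looping.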
Hope this helps.