I want to run a Python script in several parallel processes under MPI, and I need to pass command-line arguments. I'm using the argparse module in Python, but it's a little messy sometimes. If I don't specify the right arguments, all the processes complain, so I get many copies of the same error message.
I tried making only process 0 parse the arguments and then broadcast the results to the other processes, but then the other processes hang when the parsing fails and nothing gets broadcast.
How can I parse the command-line arguments, and print a readable message when the parsing fails?
The extra piece I needed was to wrap a try/finally around the argument parsing step in process 0. In the finally block, broadcast something to the other processes. If parsing failed, None gets broadcast, and all the processes can silently exit.
from mpi4py import MPI
from time import sleep
import argparse


def parseOptions(comm):
    parser = argparse.ArgumentParser(
        description='Print some messages.')
    parser.add_argument('iteration_count', help='How many times', type=int)
    parser.add_argument('message',
                        help='What to say',
                        nargs=argparse.OPTIONAL,
                        default='Hello, World!')
    args = None
    try:
        if comm.Get_rank() == 0:
            args = parser.parse_args()
    finally:
        args = comm.bcast(args, root=0)
    if args is None:
        exit(0)
    return args


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    args = parseOptions(comm)

    if rank == 0:
        print args.message
    for i in range(args.iteration_count):
        if i % size == rank:
            print '{} in rank {} started.'.format(i, rank)
            sleep(.5)
            print '...'
            sleep(.5)
            print '{} in rank {} ended.'.format(i, rank)


if __name__ == '__main__':
    main()
I run the code with a command like this:
mpirun -np 4 python scratch.py 13
If you have an error case, it's usually easiest to just have the processes abort rather than trying to do something fancy to clean up. In your case, you could just have the root process (rank 0) call Abort and cause everyone else to quit:
comm.Abort()
That way, you don't have everyone trying to match up the broadcasts; they all just abort automatically.
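Here is a minimal sketch of how that could be combined with the argument parsing above (the parse_options name and the error code passed to Abort are illustrative, not part of the answer):

from mpi4py import MPI
import argparse

def parse_options(comm):
    parser = argparse.ArgumentParser(description='Print some messages.')
    parser.add_argument('iteration_count', help='How many times', type=int)
    args = None
    if comm.Get_rank() == 0:
        try:
            args = parser.parse_args()
        except SystemExit:
            # argparse has already printed its usage/error text on rank 0
            comm.Abort(1)  # tears down every rank, including those waiting below
    # only reached when parsing succeeded on rank 0
    return comm.bcast(args, root=0)

if __name__ == '__main__':
    args = parse_options(MPI.COMM_WORLD)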
How can I keep the ROS Publisher publishing the messages while calling a sub-process:
import subprocess

import rospy
from std_msgs.msg import String


class Pub():
    def __init__(self):
        pass

    def updateState(self, msg):
        cmd = ['python3', planner_path, "--alias", search_options, "--plan-file", plan_path, domain_path, problem_path]
        subprocess.run(cmd, shell=False, stdout=subprocess.PIPE)
        self.plan_pub.publish(msg)

    def myPub(self):
        rospy.init_node('problem_formulator', anonymous=True)
        self.plan_pub = rospy.Publisher("plan", String, queue_size=10)
        rate = rospy.Rate(10)  # 10hz
        rospy.Subscriber('model', String, self.updateState)
        rospy.sleep(1)
        rospy.spin()


if __name__ == "__main__":
    p_ = Pub()
    p_.myPub()
subprocess.run is a blocking call (just like the older subprocess.call), so your subscription callback may take a long time. The documentation for subprocess.call puts it plainly:
Run the command described by args. Wait for command to complete, then return the returncode attribute.
ROS itself will not invoke the callback again while it is still executing. This means you are blocking this callback and potentially also delaying other callbacks.
The simplest solution would be to replace subprocess.run with subprocess.Popen, which will, per its documentation,
Execute a child program in a new process
without blocking.
But keep in mind that this can start the process many times in quick succession.
Consider starting the process only if it is not already running. This can be achieved by checking in another thread whether the process has finished, using a simple but effective boolean flag. Here is a small prototype:
# requires "import threading" and "import time" alongside the existing imports,
# and self._process_running = False initialised in __init__
def updateState(self, msg):
    # start the process only if it is not already running
    if not self._process_running:
        p = subprocess.Popen(...)
        self._process_running = True

        def wait_process():
            # poll() returns None while the child is still running
            while p.poll() is None:
                time.sleep(0.1)
            self._process_running = False

        threading.Thread(target=wait_process).start()

    # other callback code
    self.plan_pub.publish(msg)
I have to record a wav file and at the same time analyze it with sox. I am using a fifo (named pipe) file for this operation.
So I need to start two threads at the same time, but even when I use threads I cannot achieve what I want: one always executes first and then the other. I want them to run in parallel so that I can do some other work meanwhile.
# this should be in one thread
def test_wav(self):
    """ analyze the data """
    bashCommand = "sox {} -n stat".format(self.__rawfile)
    while self.__rec_thread.is_alive():
        process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        wav_output = process.communicate()[1]  # sox outputs the details in stderr
        # do something and return

# this should be in another thread
def record_wav(self):
    bashCommand = "arecord -d 10 -c 2 -r 48000 -f S32_LE > {}".format(self.__rawfile)
    pid = subprocess.Popen(bashCommand.split())
    pid.wait()
    if pid.returncode != 0:
        raise RecordException("Failed while recording with error {}".format(pid.returncode))
I tried the following code to run them as threads, but it failed in the same way: one always executes completely before the other starts, and I want them to run in parallel.
from threading import Thread
self.__rec_thread = Thread(target = self.record_wav())
amp_thread = Thread(target = self.test_wav())
self.__rec_thread.start()
amp_thread.start()
EDIT: First it executes the record function completely (it takes at least 10 seconds because of the -d 10 option) and only then the test_wav function. It is like calling them one after the other.
... target = self.record_wav() ...
is calling record_wav(): it executes immediately, and the program doesn't proceed until record_wav() completes. You almost always want to pass a function (or method) object to target=, almost never the result of executing the function/method. So just lose the parentheses:
... target = self.record_wav ...
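For reference, here is your snippet with the parentheses removed; the join() calls at the end are an optional addition in case you want to wait for both threads to finish:

from threading import Thread

# pass the bound methods themselves; Thread.start() will call them concurrently
self.__rec_thread = Thread(target=self.record_wav)
amp_thread = Thread(target=self.test_wav)

self.__rec_thread.start()
amp_thread.start()

# optionally wait for both to finish
self.__rec_thread.join()
amp_thread.join()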
If you use Python 3, you can use asyncio to run the shell commands concurrently, in a goroutine-like way.
import asyncio
import sys


async def execute(command, cwd=None):
    # create_subprocess_exec runs the program directly (no shell involved),
    # so it must not be given shell=True
    process = await asyncio.create_subprocess_exec(*command,
                                                   stdout=asyncio.subprocess.PIPE,
                                                   stderr=asyncio.subprocess.PIPE,
                                                   cwd=cwd)
    std_out, std_err = await process.communicate()
    error = std_err.decode().strip()
    result = std_out.decode().strip()
    print(result)
    print(error)
    return result


if sys.platform == "win32":
    loop = asyncio.ProactorEventLoop()
    asyncio.set_event_loop(loop)
else:
    loop = asyncio.get_event_loop()

try:
    loop.run_until_complete(
        asyncio.gather(execute(["bash", "-c", "echo hello && sleep 2"]),
                       execute(["bash", "-c", "echo ok && sleep 1"])))
finally:
    loop.close()
Below is my code, and I'm really new to Python. My code creates many threads (over 1000), but at some point, at nearly 800 threads, I get an error saying "error: cannot start new thread". I did read a bit about thread pools but couldn't really understand it. How can I implement a thread pool in my code? Or at least, please explain it to me in a simple way.
#!/usr/bin/python
import threading
import urllib

lock = threading.Lock()

def get_wip_info(query_str):
    try:
        temp = urllib.urlopen(query_str).read()
    except:
        temp = 'ERROR'
    return temp

def makeURLcall(arg1, arg2, arg3, file_output, dowhat, result):
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    lock.acquire()
    report = open(file_output, "a")
    report.writelines("%s - %s\n" % (serial, result))
    report.close()
    lock.release()
    return

testername = "arg1"
stationcode = "arg2"
dowhat = "OUT"
result = "PASS"
file_source = "sourcefile.txt"
file_output = "resultfile.txt"

readfile = open(file_source, "r")
Data = readfile.readlines()
threads = []

for SNs in Data:
    SNs = SNs.strip()
    print SNs
    thread = threading.Thread(target=makeURLcall, args=(SNs, args1, testername, file_output, dowhat, result))
    thread.start()
    threads.append(thread)

for thread in threads:
    thread.join()
Don't implement your own thread pool, use the one that ships with Python.
On Python 3, you can use concurrent.futures.ThreadPoolExecutor to use threads explicitly; on Python 2.6 and higher, you can import Pool from multiprocessing.dummy, which mirrors the multiprocessing API but is backed by threads instead of processes (a short ThreadPoolExecutor sketch follows the Pool example below).
Of course, if you need to do CPU bound work in CPython (the reference interpreter), you'd want to use multiprocessing proper, not multiprocessing.dummy; Python threads are fine for I/O bound work, but the GIL makes them pretty bad for CPU bound work.
Here's code to replace your explicit use of Threads with multiprocessing.dummy's Pool, using a fixed number of workers that each complete tasks one after another as fast as possible, rather than an unbounded number of single-job threads. First off, since the local I/O is likely to be fairly cheap and you want to synchronize the output, we'll make the worker task return the resulting data rather than write it out itself, and have the main thread do the write to local disk (removing the need for locking, as well as the need for opening the file over and over). This changes makeURLcall to:
# Accept args as a single sequence to ease use of imap_unordered,
# and unpack on the first line
def makeURLcall(args):
    arg1, arg2, arg3, dowhat, result = args
    url1 = "some URL call with args"
    url2 = "some URL call with args"
    if dowhat == "IN":
        result = get_wip_info(url1)
    elif dowhat == "OUT":
        result = get_wip_info(url2)
    return "%s - %s\n" % (serial, result)
And now for the code that replaces your explicit thread use:
import multiprocessing.dummy as mp
from contextlib import closing

# Open input and output files and create the pool.
# Odds are that 32 is enough workers to saturate the connection,
# but you can play around; somewhere between 16 and 128 is likely to be the
# sweet spot for network I/O
with open(file_source) as inf,\
     open(file_output, 'w') as outf,\
     closing(mp.Pool(32)) as pool:
    # Define a generator that creates tuples of arguments to pass to makeURLcall.
    # We also read the file lazily instead of using readlines, to
    # start producing results faster
    tasks = ((SNs.strip(), args1, testername, dowhat, result) for SNs in inf)
    # Pull and write results from the workers as they become available
    outf.writelines(pool.imap_unordered(makeURLcall, tasks))
# Once we leave the with block, the input and output files are closed, and
# the pool workers are cleaned up
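If you are on Python 3, the concurrent.futures.ThreadPoolExecutor mentioned above looks almost identical; here is a rough sketch under the same assumptions (same makeURLcall, same file and variable names as the question):

from concurrent.futures import ThreadPoolExecutor

# Same structure as the multiprocessing.dummy version above: a fixed pool of
# worker threads, with the main thread doing all the writing
with open(file_source) as inf, \
     open(file_output, 'w') as outf, \
     ThreadPoolExecutor(max_workers=32) as pool:
    tasks = ((SNs.strip(), args1, testername, dowhat, result) for SNs in inf)
    # unlike imap_unordered, Executor.map preserves input order;
    # each result is written as soon as it is ready
    outf.writelines(pool.map(makeURLcall, tasks))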
It seems to me that in Python code that runs in parallel, an assert that fails on at least one processor should abort all the processors, so that:
1) the error message is clearly visible (with the stack trace)
2) the remaining processors do not keep waiting forever.
However this is not what the standard assert does.
This question has already been asked in
python script running with mpirun not stopping if assert on processor 0 fails
but I am not satisfied by the answer. There it is suggested to use the comm.Abort() function, but this only answers point 2) above.
So I was wondering: is there a standard "assert" function for parallel codes (eg with mpi4py), or should I write my own assert for that purpose?
Thanks!
Edit -- here is my attempt (in a class, but it could be outside), which can surely be improved:
import mpi4py.MPI as mpi
import traceback


class My_code():

    def __init__(self, some_parameter=None):
        self.current_com = mpi.COMM_WORLD
        self.rank = self.current_com.rank
        self.nb_procs = self.current_com.size
        self.my_assert(some_parameter is not None)
        self.parameter = some_parameter
        print "Ok, parameter set to " + repr(self.parameter)

    # some class functions here...

    def my_assert(self, assertion):
        """
        this is a try for an assert function that kills
        every process in a parallel run
        """
        if not assertion:
            print 'Traceback (most recent call last):'
            for line in traceback.format_stack()[:-1]:
                print(line.strip())
            print 'AssertionError'
            if self.nb_procs == 1:
                exit()
            else:
                self.current_com.Abort()
I think that the following piece of code answers the question. It is derived from the discussion pointed to by Dan D.
import mpi4py.MPI as mpi
import sys

# put this somewhere, but before calling the asserts
sys_excepthook = sys.excepthook

def mpi_excepthook(type, value, traceback):
    sys_excepthook(type, value, traceback)
    if mpi.COMM_WORLD.size > 1:
        mpi.COMM_WORLD.Abort(1)

sys.excepthook = mpi_excepthook

# example:
if mpi.COMM_WORLD.rank == 0:
    # with sys.excepthook redefined as above this will kill every processor;
    # otherwise this would only kill processor 0
    assert 1 == 0

# assume here we have a lot of print messages
for i in range(50):
    print "rank = ", mpi.COMM_WORLD.rank

# with standard asserts the code would be stuck here,
# and the error message from the failed assert above would hardly be visible
mpi.COMM_WORLD.Barrier()
I'm trying to make my Python program interactive on the command line; the user should be able to do things like:
python myprogram.py --create
then
python myprogram.py --send
The problem is that the program stops and restarts each time, so I lose the variables and objects that I created with the first command.
I'm using argparse on this way:
parser = argparse.ArgumentParser()
parser.add_argument('-c', '--create', help='', action='store_true')
parser.add_argument('-s', '--send', help='', action='store_true')
args = parser.parse_args()

if args.create:
    create()
elif args.send:
    send()
I don't want to stop the program between commands; how can I do this?
example : https://coderwall.com/p/w78iva
Here's a simple interactive script. I use argparse to parse the input lines, but otherwise it is not essential to the action. Still, it can be a handy way of adding options to your 'create' command. For example, ipython uses argparse to handle its %magic commands:
import argparse

parser = argparse.ArgumentParser(prog='PROG', description='description')
parser.add_argument('cmd', choices=['create', 'delete', 'help', 'quit'])

while True:
    astr = raw_input('$: ')
    # print astr
    try:
        args = parser.parse_args(astr.split())
    except SystemExit:
        # trap argparse error message
        print 'error'
        continue
    if args.cmd in ['create', 'delete']:
        print 'doing', args.cmd
    elif args.cmd == 'help':
        parser.print_help()
    else:
        print 'done'
        break
This could be stripped down to the while loop, the raw_input line, and your own evaluation of the astr variable.
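Stripped down that way, with your own evaluation in place of argparse, it might look something like this (a sketch that assumes the create() and send() functions from your script):

while True:
    astr = raw_input('$: ').strip()
    if astr == 'create':
        create()          # your create() from the original script
    elif astr == 'send':
        send()            # your send() from the original script
    elif astr in ('quit', 'exit'):
        break
    else:
        print 'unknown command:', astr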
The keys to using argparse here are:
parse_args can take a list of strings (the result of split()) instead of using the default sys.argv[1:].
if parse_args sees a problem (or '-h') it prints a message and tries to 'exit'. If you want to continue, you need to trap that error, hence the try block.
the output of parse_args is a simple namespace object. You access the arguments as attributes.
you could easily substitute your own parser.
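To make the first and third points concrete, here is a tiny standalone example:

import argparse

parser = argparse.ArgumentParser(prog='PROG')
parser.add_argument('cmd', choices=['create', 'delete', 'help', 'quit'])

# parse_args accepts an explicit list of strings instead of reading sys.argv[1:]
args = parser.parse_args('create'.split())
print args.cmd    # the result is a Namespace; values are plain attributes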
The difference between cmd and argparse is that cmd is a "line-oriented command interpreter" while argparse is a parser for sys.argv.
Your example parses the sys.argv that you pass when running your program, and if it finds a matching flag it calls a function and then quits.
argparse only parses sys.argv once, when the program is run.
You could add some code to keep working with the arguments you pass, such as a function or class, or build an in-program menu that you operate with raw_input.
Example:
import argparse


class Main():
    def __init__(self, create=None, send=None):
        if create:
            self.create(create)
        elif send:
            self.send(send)
        option = raw_input('What do you want to do now?')
        print option

    def create(self, val):
        print val

    def send(self, val):
        print val


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('-c', '--create', help='', action='store_true')
    parser.add_argument('-s', '--send', help='', action='store_true')
    args = parser.parse_args()
    Main(args.create, args.send)
Other than that, Python argparse and controlling/overriding the exit status code or python argparse - add action to subparser with no arguments? might help.
The first shows how you can override the exit behaviour, and the second how you can add subcommands or quit actions.
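As one illustration of overriding the exit behaviour (a sketch, not necessarily what those threads do): you can subclass ArgumentParser and override its error() method so that a bad command raises an exception instead of exiting the interpreter, which keeps an interactive loop alive:

import argparse

class NonExitingParser(argparse.ArgumentParser):
    # error() normally prints usage and calls sys.exit(2); raising instead
    # lets the caller catch the problem and keep prompting
    def error(self, message):
        raise ValueError(message)

parser = NonExitingParser(prog='PROG')
parser.add_argument('cmd', choices=['create', 'delete', 'quit'])

try:
    args = parser.parse_args(['bogus'])
except ValueError as err:
    print 'bad command:', err
# note: -h/--help still calls parser.exit(), which would need a similar override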