Kill an MPI process on all machines - python

Suppose that I run an MPI program involving 25 processes on 25 different machines. The program is initiated on one of them, called the "master", with a command like
mpirun -n 25 --hostfile myhostfile.txt python helloworld.py
This is executed on Linux from a bash script, and the program uses mpi4py. Sometimes, in the middle of execution, I want to stop the program on all machines. I don't care whether this is done gracefully or not, since the data I might need is already saved.
Usually, I press Ctrl + C in the terminal of the "master" and I think it works as described above. Is this true? In other words, will it stop this specific MPI program on all machines?
Another method I tried is to get the PID of the process on the "master" and kill it. I am not sure about this either.
Do the above methods work as described? If not, what else do you suggest? Note that I want to avoid using MPI calls for that purpose, like MPI_Abort, which some other discussions here and here suggest.
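If neither of those turns out to reach every node, a blunt fallback would be to kill the script by name on every host. A minimal sketch, assuming passwordless ssh to each machine and that myhostfile.txt lists one hostname (possibly followed by slots=...) per line:
import subprocess

with open("myhostfile.txt") as f:
    hosts = [line.split()[0] for line in f if line.strip()]

for host in hosts:
    # pkill -f matches against the full command line, so the python helloworld.py ranks are hit too
    subprocess.call(["ssh", host, "pkill", "-f", "helloworld.py"])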

Related

Disassociating process pipes from the calling shell

I am trying to use Fabric to send commands which will run many physics simulations (executables) on many different computers that all share the same storage. I would like my script to:
1. ssh into a machine,
2. begin the simulation, for example by running run('nohup nice -n 5 ./interp 1 2 7') (the executable is called interp and run is a function from the fabric.api library),
3. detach from the shell and run another simulation on another (or the same) computer.
However, I cannot get Fabric to accomplish step 3. It hangs on the first simulation and doesn't detach until the simulation stops, which defeats the whole point.
My problem, according to the documentation, is that:
Because Fabric executes a shell on the remote end for each invocation of run or sudo (see also), backgrounding a process via the shell will not work as expected. Backgrounded processes may still prevent the calling shell from exiting until they stop running, and this in turn prevents Fabric from continuing on with its own execution.
The key to fixing this is to ensure that your process’ standard pipes are all disassociated from the calling shell
The documentation provides three suggestions, but it is not possible for me to "use a pre-existing daemonization technique": the computers I have access to do not have screen, tmux, or dtach installed (nor can I install them), and the second proposal, including >& /dev/null < /dev/null in my command, has not worked either (as far as I can tell it changed nothing).
Is there another way I can disassociate the process pipes from the calling shell?
The documentation you linked to gives an example of nohup use which you haven't followed all that closely. Merging that example with what you've tried so far gives something that I cannot test (I don't have Fabric installed), but which might be interesting to try:
run('nohup nice -n 5 ./interp 1 2 7 < /dev/null &> /tmp/interp127.out &')
Redirect output to /dev/null rather than my contrived output file (/tmp/interp127.out) if you don't care what the interp command emits to its stdout/stderr.
Assuming the above works, I'm unsure how you would detect that a simulation has completed, but your question doesn't seem to concern itself with that detail.
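For context, a minimal fabfile around that call might look like the following (again a sketch I cannot test; the host names are placeholders, and pty=False is an extra guess, since an allocated pseudo-terminal can also keep the remote process tied to the session):
# fabfile.py (assumes Fabric 1.x and its fabric.api module)
from fabric.api import run, env

env.hosts = ['user@machine1', 'user@machine2']  # placeholder hosts

def start_sim():
    # nohup plus redirected stdin/stdout/stderr plus '&' lets the remote shell
    # exit immediately, so Fabric can move on to the next host.
    run('nohup nice -n 5 ./interp 1 2 7 < /dev/null &> /tmp/interp127.out &', pty=False)
Running fab start_sim would then start one simulation per host in env.hosts without blocking on any of them.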

Run bash scripts in parallel from python script

I'm facing a problem in python:
My script, at a certain point, has to run some test scripts written in bash; I have to run them in parallel and wait until they end.
I've already tried:
os.system("./script.sh &")
inside a for loop, but it did not work.
Any suggestions?
Thank you!
Edit:
I have not explained my situation correctly:
My Python script resides in the home dir;
my .sh scripts reside in other dirs, for instance /tests/folder1 and /tests/folder2;
Using os.system requires calling os.chdir before os.system (to avoid "no such file or directory" errors, since my .sh scripts contain some relative references), and this method also blocks my terminal output.
Using Popen and passing the full path from the home folder to my .sh scripts launches zombie processes that give no response.
Hope to find a solution,
Thank you guys!
Have you looked at subprocess? The convenience functions call and check_output block, but the default Popen object doesn't:
import subprocess

# Start each script without waiting for it to finish.
processes = []
processes.append(subprocess.Popen(['./script.sh']))
processes.append(subprocess.Popen(['./script2.sh']))
...
# Block until every script has finished and collect the exit codes.
return_codes = [p.wait() for p in processes]
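Regarding the relative-path issue from the edit: instead of calling os.chdir before each script, Popen accepts a cwd argument, so every script can be started from its own directory (a sketch using the example folders from the question):
import subprocess

jobs = [("/tests/folder1", "./script.sh"),
        ("/tests/folder2", "./script.sh")]

# Start each script from its own directory so relative references keep working.
processes = [subprocess.Popen([script], cwd=folder) for folder, script in jobs]

# Wait for all of them and collect the exit codes.
return_codes = [p.wait() for p in processes]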
Can you use GNU Parallel?
ls test_scripts*.sh | parallel
Or:
parallel ::: script1.sh script2.sh ... script100.sh
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

How to kill a process in Python 2.5 on a 64-bit Windows platform?

I have a Python script with which I am opening three executables as follows:
Aa=subprocess.Popen([r"..\location\learning\A.exe"])
Bb=subprocess.Popen([r"..\location\learning\new\B.bat"])
Cc=subprocess.Popen([r"..\location\learning\new\B.bat"])
All three files open correctly. As the next step, I want to kill these three processes. So, first I tried to kill "Aa" as follows:
import ctypes

PROCESS_TERMINATE = 1
k = ctypes.windll.kernel32
handle = k.OpenProcess(PROCESS_TERMINATE, False, Aa.pid)
k.TerminateProcess(handle, -1)
k.CloseHandle(handle)
But after adding these lines, the three processes "Aa", "Bb" and "Cc" no longer open at all. So I would like a clean solution in which all three programs first run and then, after a while, are closed. As I am using Python 2.5 on a 64-bit Windows platform, kindly suggest a solution accordingly.
The process returned by subprocess.Popen() has a terminate() method to kill the child process since Python 2.6. To use that you would have to upgrade your Python install or install two versions of Python in parallel.
But a better way is usually to find out how you can tell this process to stop and use that. Many processes stop when you send them certain standard input or when you close the pipes which Popen created.
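With Python 2.6 or newer, the clean-up could then look something like this (a sketch; the 30-second pause is just a stand-in for "after a while"):
import subprocess
import time

Aa = subprocess.Popen([r"..\location\learning\A.exe"])
Bb = subprocess.Popen([r"..\location\learning\new\B.bat"])
Cc = subprocess.Popen([r"..\location\learning\new\B.bat"])

time.sleep(30)  # let the three programs run for a while

for proc in (Aa, Bb, Cc):
    proc.terminate()  # calls TerminateProcess() on Windows
    proc.wait()       # reap the child so no handle is left open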

Python script embedded in shell script, does not exit and daemon solution does not fit the needs

This question extends/revives this one.
The reason for reviving this topic is that the answers given there did not solve the problem.
The bash script executes an embedded Python script, something like:
#!/bin/bash
./pyscript.py
Permission was given with chmod +x pyscript.py.
Alternative ways to run the script were also tried (python -u pyscript.py or /usr/bin/python pyscript.py).
As the title states, the Python program does not exit.
I have tried the following attempts within the python script to solve the issue:
sys.exit(0)         # the program catches the correct exception
os._exit(1)         # does not work, and the correct exception is caught
sys.stdout.flush()  # to flush the stdout buffer
The daemon solution is not suitable for what I need, because running the Python program in the background, independently of the main script, means the bash script will not wait for it to finish.
What are the alternative solutions that remain for this case?
Have you tried using strace -p $PID on the Python process? The output will not always be useful, however.
From the code perspective, in addition to threads, I would check whether there are any signal handlers (which perhaps do not terminate for some reason).
As far as threads are concerned, you might be interested in this, although I believe someone mentioned it in the other thread.
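A quick way to check which threads are still alive (and whether a non-daemon thread is what keeps the interpreter from exiting) is a snippet like this near the end of pyscript.py (a sketch):
import threading
import signal

# List every thread the interpreter is still tracking.
for t in threading.enumerate():
    print("%s daemon=%s alive=%s" % (t.name, t.daemon, t.is_alive()))

# Registered signal handlers can be inspected too, e.g. for SIGTERM:
print(signal.getsignal(signal.SIGTERM))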
Finally, the problem is solved.
The Python program whose process I have been trying to kill runs with multiple threads.
sys.exit(0) only terminates the thread in which it is called.
os._exit(1) was originally called with sys.exit(0) before it, so it was never reached (fail!).
By running os._exit(1) without sys.exit(0) before it, the program exits the Python script.
The reason must be that sys.exit() only terminates the thread in which it is called and allows the program to clean up resources, while os._exit() does an abrupt program termination.
Found here.
With this solution, it is better to make sure that any task the program should finish has terminated, and only then call os._exit.
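The difference is easy to reproduce (a minimal sketch): with a non-daemon worker thread still running, sys.exit() in the main thread raises SystemExit but the interpreter keeps waiting for the worker, whereas os._exit() ends the whole process immediately without any cleanup:
import os
import sys
import threading
import time

def worker():
    # a non-daemon thread that never finishes
    while True:
        time.sleep(1)

threading.Thread(target=worker).start()

sys.exit(0)     # the main thread exits, but the process hangs waiting for the worker
# os._exit(0)   # would terminate the whole process at once, skipping any cleanup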
What I usually do to separate a script from the main shell terminal process is to run the script inside a detached screen session. Then I can kill the PID of the script without any trouble.
But in this particular case I want the program to wait for the end of the Python subscript, not to run it as a parallel process.
Also, you might want to try the trace module, i.e. running your program with python -m trace --trace pyscript.py. If Python is executing some of your code (which it probably is), it should show you details on that.

Optionally daemonize a Python process

I am looking into daemonization of my Python script and I have found some libraries that can help: daemonic, daemonize and daemon. Each of them have some issues:
daemonic and daemonize will terminate when they cannot create the PID file. daemonic will not even log or print anything. Looking into the code, they actually call os.exit(). I would like an exception or another error message, so I can fall back to running my code in the foreground.
daemon doesn't even install correctly for Python 3, and seeing that the last commit was in 2010, I don't expect any updates soon (if ever).
How can I portably (both Python 2 and 3) and optionally (falling back to running in the foreground) create a daemonized Python script? Of course I could fall back to using the & operator when starting it, but I would like to implement PEP 3143.
I am using two solutions:
one based on zdaemon
one based on supervisor
Both packages are written in Python and can daemonize anything that can be run from the command line. The requirement is that the command to be run stays in the foreground and does not try to daemonize itself.
supervisor is even packaged in Linux distributions, and even though the packaged version is a bit outdated, it is perfectly usable.
Note that since it controls a general command-line-driven program, its Python version does not need to match that of the controlled code.
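For illustration, a minimal supervisord program section might look like this (a sketch; the program name, paths, and log files are placeholders):
[program:myscript]
; the command must stay in the foreground; supervisord does the daemonizing
command=/usr/bin/python /home/user/myscript.py
directory=/home/user
autostart=true
autorestart=true
stdout_logfile=/var/log/myscript.out.log
stderr_logfile=/var/log/myscript.err.log
supervisord then takes care of daemonization, restarts, and logging, and supervisorctl start myscript / supervisorctl stop myscript controls the process.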
