I'm using GNU Parallel to run a Python script for a list of different arguments. Inside the Python script, I'm writing data to a file (in fact, the name of the file is the script argument). The Python script writes the data to the file after processing N trials, where N is another argument. Consequently, the data does not get written until all trials are finished. But the time to go through a trial can vary depending on a number of test arguments. For this reason, should the script take too long for a certain set of arguments, the script allows me to raise a KeyboardInterrupt error (Ctrl+C) and write the data it has obtained so far before terminating.
However, when running under GNU Parallel, pressing Ctrl+C kills the parallel command and stops the Python jobs outright, so the data gathered so far is never written.
Is it possible to raise KeyboardInterrupt in these Python scripts so that they finish handling the error before parallel is killed? Ideally it would go something like this: 1. Execute parallel python script.py ::: args. 2. After some amount of time, cancel with Ctrl+C. 3. Parallel makes each Python script see a KeyboardInterrupt (or any error, it doesn't matter) and pauses to let the Python jobs finish handling it. 4. Parallel terminates. 5. I have files with the data obtained in that time.
Note: I would like an answer that doesn't require rewriting the Python script's data-writing method.
I believe you are looking for --termseq. Take this small test program, myprog.pl:
#!/usr/bin/perl
$SIG{'TERM'} = sub { print "TERM received. Flush files.\n"; sleep(1); };
sleep(100);
Now run:
parallel --termseq TERM,2000,KILL,20 -u ./myprog.pl ::: 1 2 3
Wait a few seconds and press Ctrl-C.
When GNU Parallel receives Ctrl-C it will send SIGTERM to the child, wait 2000 ms, and if the child is still alive, kill it.
If you are absolutely sure the Python program will exit after receiving the SIGTERM then you can remove ,KILL,20. It is just a fall back if the Python program is stuck for some reason.
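On the Python side, a minimal sketch of the idea could look like this; write_partial_results and the trial loop below are placeholders for whatever your script already does:
import signal
import sys
import time

results = []

def write_partial_results():
    # Placeholder for the script's existing "write what we have so far" routine.
    with open(sys.argv[1] if len(sys.argv) > 1 else 'out.txt', 'w') as f:
        f.write('\n'.join(map(str, results)))

def handle_term(signum, frame):
    # GNU Parallel sends SIGTERM first (per --termseq above); flush and exit.
    write_partial_results()
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_term)

for trial in range(1000):    # placeholder for the real N-trial loop
    time.sleep(1)            # placeholder for real work
    results.append(trial)

write_partial_results()
The handler only triggers the write that the script would have done anyway, so the data-writing method itself stays untouched.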
Related
I am working with a groundwater modeling executable (HYDRUS1D) which I call with a Python script. I want to do some Monte Carlo runs but sometimes the program gets hung up and does not converge for extended periods of time.
Is there a way to give the executable a certain amount of time to run, cancel it if it goes over this time, and then start a new simulation all without interrupting the Python script? The simulation should take no more than 3-5 seconds, so I am hoping to give it a maximum of 10 seconds to finish.
I first run a function that changes some input parameters to the model, then execute Hydrus via the 'run_single_sim' function:
for value in n_variations_21:
    for value2 in n_variations_23:
        write_hydraulic_params('foo',layers,value,value2)
        run_single_sim()
Where run_single_sim() executes Hydrus via os.system:
def run_single_sim():
    os.system('./hydrus LEVEL_01.DIR')
I have tried a few solutions involving threading such as this, and this; but it seems like my script gets stuck on the os.system call and therefore cannot check to see how long the thread has been running or kill the thread after sleeping the script for some specified amount of time.
You asked "how to stop an executable called via Python ...", but I feel
this question is simply about "how to stop an executable".
The interesting part is that we have a child process that might misbehave.
The parent is uninteresting; it could be written in Rust, Ruby, or any other language.
The timeout issue you pose is a sensible question,
and there's a stock answer for it, in the GNU coreutils package.
Instead of
os.system('./hydrus LEVEL_01.DIR')
you want
os.system('timeout 10 ./hydrus LEVEL_01.DIR')
Here is a quick demo, using a simpler command than hydrus.
$ timeout 2 sleep 1; echo $?
0
$
$ timeout 2 sleep 3; echo $?
124
As an entirely separate matter, prefer subprocess.check_output() over the old os.system().
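For instance, a rough sketch using subprocess with its built-in timeout (Python 3.3+), so no shell wrapper is needed at all:
import subprocess

try:
    # Give hydrus 10 seconds; the child is killed and TimeoutExpired raised if it runs over.
    subprocess.check_output(['./hydrus', 'LEVEL_01.DIR'], timeout=10)
except subprocess.TimeoutExpired:
    print('hydrus did not finish within 10 seconds')
except subprocess.CalledProcessError as err:
    print('hydrus exited with status', err.returncode)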
You quoted a pair of answer articles that deal with threading.
But you're spawning a separate child process,
with no shared memory, so threading's not relevant here.
We wish to eventually send a SIGTERM signal to an ill behaved process,
and we hope it obeys the signal by quickly dropping out.
Timing out a child that explicitly ignores such signals would
be a slightly stickier problem.
An uncatchable SIGKILL can be sent
by using the --kill-after=duration flag.
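Mirroring the os.system line above, that would look something like this (the durations are only illustrative):
os.system('timeout --kill-after=5 10 ./hydrus LEVEL_01.DIR')
This sends SIGTERM after 10 seconds, and SIGKILL 5 seconds later if hydrus still has not exited.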
I want my program to wait until a specific file contains text instead of an empty string. Another program writes data to the file. When I run the first program, my computer starts overheating because of the while loop that continuously checks the file's contents. What can I do instead of that loop?
A better solution would be to start that process from within your Python script:
from subprocess import call
retcode = call(['myprocess', 'arg1', 'arg2', 'argN'])
Check whether retcode is zero; zero means success, i.e. your process ran with no problems. You could also use os.system instead of subprocess.call. Once the process has finished, you know you can read the file.
Why is this method better than monitoring the file?
The process might fail and there might be no output in the file you're trying to read from.
In that scenario, your script would check the file again and again looking for data, wasting kernel I/O time, and there is nothing to guarantee that the process will succeed at all.
The process may receive signals (e.g. STOP and CONT). If it receives STOP, the kernel stops it and there may be nothing for you to read from the output file, which is a problem especially if you intend to read all the data at once, as when sorting a file. Once the process receives CONT it starts again; in the meantime your Python script would be trying to read from the file while the producing process is stopped.
The disadvantage of this method is that the process needs to finish before your Python script can work on its output: subprocess.call blocks, so the next line won't be executed by the Python interpreter until the spawned process finishes. You could instead use subprocess.Popen, which is non-blocking. Even better, if possible, redirect the output of the process to stdout and use Popen to read it from the process's stdout, then write it to a file from the Python script.
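A rough sketch of that last suggestion; myprocess and output.txt are just placeholders:
from subprocess import Popen, PIPE

# Start the process without blocking and capture its stdout.
proc = Popen(['myprocess', 'arg1', 'arg2'], stdout=PIPE, universal_newlines=True)

with open('output.txt', 'w') as out:
    for line in proc.stdout:    # reads lines as the process produces them
        out.write(line)

proc.wait()
print('process finished with return code', proc.returncode)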
I want to start and stop a Python script from a shell script. The start works fine, but I want to stop / terminate the Python script after 10 seconds (it's a counter that keeps counting), and it won't stop... I think it is hanging on the first line.
What is the right way to start it, wait 10 seconds, and stop it?
Shell script:
python /home/pi/count1.py
sleep 10
kill /home/pi/count1.py
It's not working yet. I get the point of running the script in the background; that's working! But I get another message from my Raspberry Pi after doing:
python /home/pi/count1.py &
sleep 10; kill /home/pi/count1.py
/home/pi/sebastiaan.sh: line 19: kill: /home/pi/count1.py: arguments must be process or job IDs
The problem has to be somewhere in this line (but where? Thanks for helping out!):
sleep 10; kill /home/pi/count1.py
You're right, the shell script "hangs" on the first line until the python script finishes. If it doesn't, the shell script won't continue. Therefore you have to use & at the end of the shell command to run it in the background. This way, the python script starts and the shell script continues.
The kill command doesn't take a path, it takes a process id. After all, you might run the same program several times, and then try to kill the first, or last one.
The bash shell supports the $! variable, which is the pid of the last background process.
Your current example script is wrong, because it doesn't run the python job and the sleep job in parallel. Without adornment, the script will wait for the python job to finish, then sleep 10 seconds, then kill.
What you probably want is something like:
python myscript.py & # <-- Note '&' to run in background
LASTPID=$! # Save $! in case you do other background-y stuff
sleep 10; kill $LASTPID # Sleep then kill to set timeout.
You can terminate any process from any other, as long as the OS lets you do it, i.e. it isn't some critical process belonging to the OS itself.
The command kill uses PID to kill the process, not the process's name or command.
Use pkill for that.
You can also send a different signal instead of SIGTERM (the request to terminate a program), one that you can detect inside your Python application and respond to.
For instance, you may wish to check whether the process is alive and get some data from it.
To do this, choose one of the user-defined signals (SIGUSR1 or SIGUSR2) and register a handler for it in your Python program using the signal module, as in the sketch below.
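For example, a minimal sketch of a counter that reports its state on SIGUSR1 and exits cleanly on SIGTERM (the handler names are made up):
import signal
import sys
import time

count = 0

def report(signum, frame):
    # Custom "are you alive?" signal: print the current count and keep going.
    print('still counting, count =', count)

def quit_cleanly(signum, frame):
    # SIGTERM: finish up and exit instead of being killed mid-count.
    sys.exit(0)

signal.signal(signal.SIGUSR1, report)
signal.signal(signal.SIGTERM, quit_cleanly)

while True:
    count += 1
    time.sleep(1)
From the shell you would then send kill -USR1 <pid> to query it, or plain kill <pid> to stop it.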
To see why your script hangs, see Austin's answer.
I am trying to constantly monitor a process which is basically a Python program. If the program stops, then I have to start the program again. I am using another Python program to do so.
For example, say I have to constantly run a process called run_constantly.py. I initially run this program manually, which writes its process ID to the file "PID" (in the location out/PROCESSID/PID).
Now I run another program which has the following code to monitor the program run_constantly.py from a Linux environment:
def Monitor_Periodic_Process():
    TIMER_RUNIN = 1800
    foo = imp.load_source("Run_Module", "run_constantly.py")
    PROGRAM_TO_MONITOR = ['run_constantly.py', 'out/PROCESSID/PID']
    while(1):
        # call the function checkPID to see if the program is running or not
        res = checkPID(PROGRAM_TO_MONITOR)
        # if res is 0 then program is not running so schedule it
        if (res == 0):
            date_time = datetime.now()
            scheduler.add_cron_job(foo.Run_Module, year=date_time.year, day=date_time.day, month=date_time.month, hour=date_time.hour, minute=date_time.minute+2)
            scheduler.start()
            scheduler.get_jobs()
            time.sleep(TIMER_NOT_RUNIN)
            continue
        else:
            # the process is running; sleep and then monitor again
            time.sleep(TIMER_RUNIN)
            continue
I have not included the checkPID() function here. checkPID() basically checks whether the process ID still exists (i.e. whether the program is still running); if it does not, it returns 0. In the above program, I check if res == 0, and if so, I use Python's scheduler to schedule the program. However, the major problem I am currently facing is that the process ID of this monitoring program and of run_constantly.py turn out to be the same once I schedule run_constantly.py with scheduler.add_cron_job(). So if run_constantly.py crashes, the monitoring program still thinks run_constantly.py is running (since both process IDs are the same), and therefore keeps taking the else branch to sleep and monitor again.
Can someone tell me how to solve this issue? Is there a simple way to constantly monitor a program and reschedule it when it has crashed?
There are many programs that can do this.
On Ubuntu there is upstart (installed by default)
Lots of people like http://supervisord.org/
monit, as mentioned by @nathan
If you are looking for a python alternative there is a library that has just been released called circus which looks interesting.
And pretty much every Linux distro probably has one of these built in.
The choice is really just down to which one you like better, but you would be far better off using one of these than writing it yourself.
Hope that helps
If you are willing to control the monitored program directly from Python instead of using cron, have a look at the subprocess module:
The subprocess module allows you to spawn new processes,
connect to their input/output/error pipes, and obtain their return codes.
See questions like "track process status with python" on SO for examples and references.
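A bare-bones sketch of that idea; the command line and the restart delay are just examples:
import subprocess
import time

# Restart run_constantly.py whenever it exits, for whatever reason.
while True:
    proc = subprocess.Popen(['python', 'run_constantly.py'])
    ret = proc.wait()    # blocks until the child exits or crashes
    print('child exited with code', ret, '- restarting in 5 seconds')
    time.sleep(5)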
You could just use monit
http://mmonit.com/monit/
It monitors processes and restarts them (and other things.)
I thought I'd add a more versatile solution, which is one that I personally use all the time as well.
Its name is Immortal (source at https://github.com/immortal/immortal).
To have it monitor and instantly restart a program if it stops, simply run the following command:
immortal <command>
So in your case I would run run_constantly.py like so:
immortal python run_constantly.py
The command ps aux | grep run_constantly.py should return two process IDs: one for the Immortal process itself and one for the command it started (just your regular command). As long as the Immortal process is running, run_constantly.py will stay running.
I'm writing a program in Python that uses a closed-source API in Linux. The API sometimes works and sometimes segfaults, crashing my program as well. However, if the program runs for 10 seconds, it's past the point where it has a chance of segfaulting and runs forever (errors only happen in the beginning).
I think I need some type of script that:
starts my python program,
waits 10 seconds,
checks if python is still running
if it is running, the script should end itself without ending python
if python is NOT running, then repeat.
Is such a program possible? Will a segfault kill the script also?
Yes, such a program is perfectly possible. You just have to run these two programs in separate processes; a SEGFAULT only kills the process in which it occurred.
If you are under Linux, you can use either bash or Python, whichever you prefer. Just start the failing script in a separate process. Code in Python could look similar to this:
import subprocess
import time

threshold = 10                      # seconds; per the question, crashes only happen early on
start = time.monotonic()            # time.clock() was removed in Python 3.8
ret = subprocess.call(['myprog', 'myarg0'])   # fill in the real command and arguments
end = time.monotonic()

if end - start < threshold:
    # the child exited sooner than expected, so it probably crashed; start it again
    restart()
Also, the return code from such a process can carry meaningful information when it terminated because of a SEGFAULT: with subprocess.call it will be negative (e.g. -11, the negated SIGSEGV signal number).
Can you isolate the calls to this buggy API inside a child process? That way you can check the exit status and handle crashes with a try ... except block, something like the sketch below.
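Something along these lines, assuming the risky API calls are moved into a hypothetical worker.py:
import subprocess

try:
    subprocess.check_call(['python', 'worker.py'])
except subprocess.CalledProcessError as err:
    # A negative return code means the child was killed by a signal (-11 is SIGSEGV).
    print('worker crashed with return code', err.returncode)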