So I'm somewhat new to programming and mostly self-taught, so sorry if this question is a bit on the novice side.
I have a python script that runs over long periods (e.g. it downloads pages every few seconds for days at a time.) Sort of a monitoring script for a web app.
Every so often, something will disrupt it, and it'll need restarted. I've gotten these events to a bare minimum but it still happens every few days, and when it does get killed it could be bad news if I don't notice for a few hours.
Right now it's running in a screen session on a VPS.
Could someone point me in the right direction as far as knowing when the script dies / and having it automatically restart?
Would this be something to write in Bash? Or something else? I've never done anything like it before and don't know where to start or even look for information.
You could try supervisord, it's a tool for controlling daemon processes.
You should daemonize your program.
As described in Efficient Python Daemon, you can install and use the python-daemon which implements the well-behaved daemon specification of PEP 3143, "Standard daemon process library".
Create a file mydaemon.py with contents like this:
#!/usr/bin/env python
import daemon
import time
import logging
def do_something():
name = 'mydaemon'
logger = logging.getLogger(name)
handler = logging.FileHandler('/tmp/%s.log' % (name))
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)
logger.setLevel(logging.WARNING)
while True:
try:
time.sleep(5)
with open("/tmp/file-does-not-exist", "r") as f:
f.write("The time is now " + time.ctime())
except Exception, ex:
logger.error(ex)
def run():
with daemon.DaemonContext():
do_something()
if __name__ == "__main__":
run()
To actually run it use:
python mydaemon.py
Which will spawn do_something() within the DaemonContext and then the script mydaemon.py will exit. You can see the running daemon with: pgrep -fl mydaemon.py. This short example will simply log errors to a log file in /tmp/mydaemon.log. You'll need to kill the daemon manually or it will run indefinitely.
To run your own program, just replace the contents of the try block with a call to your code.
I believe a wrapper bash script that executes the python script inside a loop should do the trick.
while true; do
# Execute python script here
echo "Web app monitoring script disrupted ... Restarting script."
done
Hope this helps.
That depends on the kind of failure you want to guard against. If it's just the script crashing, the simplest thing to do would be to wrap your main function in a try/except:
import logging as log
while True:
try:
main()
except:
log.exception("main() crashed")
If something is killing the Python process, it might be simplest to run it in a shell loop:
while sleep 1; do python checker.py; done
And if it's crashing because the machine is going down… well… Quis custodiet ipsos custodes?
However, to answer your question directly: the absolute simplest way to check if it's running from the shell would be to grep the output of ps:
ps | grep "python checker.py" 2>&1 > /dev/null
running=$?
Of course, this isn't fool-proof, but it's generally Good Enough.
Related
In a program I am writing in python I need to completely restart the program if a variable becomes true, looking for a while I found this command:
while True:
if reboot == True:
os.execv(sys.argv[0], sys.argv)
When executed it returns the error [Errno 8] Exec format error. I searched for further documentation on os.execv, but didn't find anything relevant, so my question is if anyone knows what I did wrong or knows a better way to restart a script (by restarting I mean completely re-running the script, as if it were been opened for the first time, so with all unassigned variables and no thread running).
There are multiple ways to achieve the same thing. Start by modifying the program to exit whenever the flag turns True. Then there are various options, each one with its advantages and disadvantages.
Wrap it using a bash script.
The script should handle exits and restart your program. A really basic version could be:
#!/bin/bash
while :
do
python program.py
sleep 1
done
Start the program as a sub-process of another program.
Start by wrapping your program's code to a function. Then your __main__ could look like this:
def program():
### Here is the code of your program
...
while True:
from multiprocessing import Process
process = Process(target=program)
process.start()
process.join()
print("Restarting...")
This code is relatively basic, and it requires error handling to be implemented.
Use a process manager
There are a lot of tools available that can monitor the process, run multiple processes in parallel and automatically restart stopped processes. It's worth having a look at PM2 or similar.
IMHO the third option (process manager) looks like the safest approach. The other approaches will have edge cases and require implementation from your side to handle edge cases.
This has worked for me. Please add the shebang at the top of your code and os.execv() as shown below
#!/usr/bin/env python3
import os
import sys
if __name__ == '__main__':
while True:
reboot = input('Enter:')
if reboot == '1':
sys.stdout.flush()
os.execv(sys.executable, [sys.executable, __file__] + [sys.argv[0]])
else:
print('OLD')
I got the same "Exec Format Error", and I believe it is basically the same error you get when you simply type a python script name at the command prompt and expect it to execute. On linux it won't work because a path is required, and the execv method is basically encountering the same error.
You could add the pathname of your python compiler, and that error goes away, except that the name of your script then becomes a parameter and must be added to the argv list. To avoid that, make your script independently executable by adding "#!/usr/bin/python3" to the top of the script AND chmod 755.
This works for me:
#!/usr/bin/python3
# this script is called foo.py
import os
import sys
import time
if (len(sys.argv) >= 2):
Arg1 = int(sys.argv[1])
else:
sys.argv.append(None)
Arg1 = 1
print(f"Arg1: {Arg1}")
sys.argv[1] = str(Arg1 + 1)
time.sleep(3)
os.execv("./foo.py", sys.argv)
Output:
Arg1: 1
Arg1: 2
Arg1: 3
.
.
.
I have some script in Python, which does some work. I want to re-run this script automatically. Also, I want to relaunch it on any crashes/freezes.
I can do something like this:
while True:
try:
main()
except Exception:
os.execv(sys.executable, ['python'] + sys.argv)
But, for unknown reason, this still crashes or freezes one time in few days. So I see crash, write "Python main.py" in cmd and it started, so I don't know why os.execv don't do this work by self. I guess it's because this code is part of this app. So, I prefer some script/app, which will control relaunch in external way. I hope it will be more stable.
So this script should work in this way:
Start any script
Check that process of this script is working, for example check some file time change and control it by process name|ID|etc.
When it dissapears from process list, launch it again
When file changed more than 5 minutes ago, stop process, wait few sec, launch it again.
In general: be cross-platform (Linux/Windows)
not important log all crashes.
I can do this by self (right now working on it), but I'm pretty sure something like this must already be done by somebody, I just can't find it in Google\Github.
UPDATE: added code from the #hansaplast answer to GitHub. Also added some changes to it: relauncher. Feel free to copy/use it.
As it needs to work both in windows and on linux I don't know a way to do that with standard tools, so here's a DIY solution:
from subprocess import Popen
import os
import time
# change into scripts directory
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
while True:
p = Popen(['python', 'my_script.py', 'arg1', 'arg2'])
time.sleep(20) # give the program some time to write into logfile
while True:
if p.poll() != None:
print('crashed or regularly terminated')
break
file_age_in_s = time.time() - os.path.getmtime('output.log')
if file_age_in_s > 60:
print('frozen, killing process')
p.kill()
break
time.sleep(1)
print('restarting..')
Explanation:
time.sleep(20): give script 20 seconds to write into the log file
poll(): regularly check if script died (either crashed or regularly terminated, you can check the return value of poll() to differentiate that)
getmtime(): regularly check output.log and check if that was changed the past 60 seconds
time.sleep(1): between every check wait for 1s as otherwise it would eat up too many system resources
The script assumes that the check-script and the run-script are in the same directory. If that is not the case, change the lines beneath "change into scripts directory"
I personally like supervisor daemon, but it has two issues here:
It is only for unix systems
It restarts app only on crashes, not freezes.
But it has simple XML-RPC API, so It makes your job to write an freeze-watchdog app simplier. You could just start your process under supervisor and restart it via supervisor API when you see it freezes.
You could install it via apt install supervisor on ubuntu and write config like this:
[program:main]
user=vladimir
command=python3 /var/local/main/main.py
process_name=%(program_name)s
directory=/var/local/main
autostart=true
autorestart=true
I already searched for solutions to my questions and found some, but they don't work for me or are very complicated for what I want to achieve.
I have a python (2.7) script that creates 3 BaseHTTPServers using threads. I now want to be able to close the python script from itself and restart it. For this, I create an extra file called "restart_script" with this content:
sleep 2
python2 myScript.py
I then start this script and after that, close my own python script:
os.system("nohup bash restart_script & ")
exit()
This works quite well, the python script closes and the new one pops up 2 seconds later, but the BaseHTTPServers do not come up, the report that the Address is already in use. (socket.error Errno 98).
I initiate the server with:
httpd = server_class((HOST_NAME, PORT_NUMBER), MyHandler)
Then I let it serve forever:
thread.start_new_thread(httpd.serve_forever, tuple())
I alternatively tried this:
httpd_thread = threading.Thread(target=httpd.serve_forever)
httpd_thread.daemon = True
httpd_thread.start()
But this has the same result.
If I kill the script using strg+c and then start it right again right after that, everything works fine. I think as long as I want to restart the script from its own, the old process is still somehow active and I need to somehow disown it so that the sockets can be cleared.
I am running on Linux (Xubuntu).
How can I really really kill my own script and then bring it up again seconds later so that all sockets are closed?
I found an answer to my specific problem.
I just use another script which starts my main program using os.system(). If the script wants to restart, I just close it regularly and the other script just starts it again, over and over...
If I want to actually close my script, I add a file and check in the other script if this file exists..
The restart-helper-script looks like this:
import os, time
cwd = os.getcwd()
#first start --> remove shutdown:
try:
os.remove(os.path.join(cwd, "shutdown"))
except:
pass
while True:
#check if shutdown requested:
if os.path.exists(os.path.join(cwd, "shutdown")):
break
#else start script:
os.system("python2 myMainScript.py")
#after it is done, wait 2 seconds: (just to make sure sockets are closed.. might be optional)
time.sleep(2)
I'm using a commercial application that uses Python as part of its scripting API. One of the functions provided is something called App.run(). When this function is called, it starts a new Java process that does the rest of the execution. (Unfortunately, I don't really know what it's doing under the hood as the supplied Python modules are .pyc files, and many of the Python functions are SWIG generated).
The trouble I'm having is that I'm building the App.run() call into a larger Python application that needs to do some guaranteed cleanup code (closing a database, etc.). Unfortunately, if the subprocess is interrupted with Ctrl+C, it aborts and returns to the command line without returning control to the main Python program. Thus, my cleanup code never executes.
So far I've tried:
Registering a function with atexit... doesn't work
Putting cleanup in a class __del__ destructor... doesn't work. (App.run() is inside the class)
Creating a signal handler for Ctrl+C in the main Python app... doesn't work
Putting App.run() in a Thread... results in a Memory Fault after the Ctrl+C
Putting App.run() in a Process (from multiprocessing)... doesn't work
Any ideas what could be happening?
This is just an outline- but something like this?
import os
cpid = os.fork()
if not cpid:
# change stdio handles etc
os.setsid() # Probably not needed
App.run()
os._exit(0)
os.waitpid(cpid)
# clean up here
(os.fork is *nix only)
The same idea could be implemented with subprocess in an OS agnostic way. The idea is running App.run() in a child process and then waiting for the child process to exit; regardless of how the child process died. On posix, you could also trap for SIGCHLD (Child process death). I'm not a windows guru, so if applicable and subprocess doesn't work, someone else will have to chime in here.
After App.run() is called, I'd be curious what the process tree looks like. It's possible its running an exec and taking over the python process space. If thats happening, creating a child process is the only way I can think of trapping it.
If try: App.run() finally: cleanup() doesn't work; you could try to run it in a subprocess:
import sys
from subprocess import call
rc = call([sys.executable, 'path/to/run_app.py'])
cleanup()
Or if you have the code in a string you could use -c option e.g.:
rc = call([sys.executable, '-c', '''import sys
print(sys.argv)
'''])
You could implement #tMC's suggestion using subprocess by adding
preexec_fn=os.setsid argument (note: no ()) though I don't see how creating a process group might help here. Or you could try shell=True argument to run it in a separate shell.
You might give another try to multiprocessing:
import multiprocessing as mp
if __name__=="__main__":
p = mp.Process(target=App.run)
p.start()
p.join()
cleanup()
Are you able to wrap the App.Run() in a Try/Catch?
Something like:
try:
App.Run()
except (KeyboardInterrupt, SystemExit):
print "User requested an exit..."
cleanup()
I have a python script, which I daemonise using this code
def daemonise():
from os import fork, setsid, umask, dup2
from sys import stdin, stdout, stderr
if fork(): exit(0)
umask(0)
setsid()
if fork(): exit(0)
stdout.flush()
stderr.flush()
si = file('/dev/null', 'r')
so = file('daemon-%s.out'%os.getpid(), 'a+')
se = file('daemon-%s.err'%os.getpid(), 'a+')
dup2(si.fileno(), stdin.fileno())
dup2(so.fileno(), stdout.fileno())
dup2(se.fileno(), stderr.fileno())
print 'this file has the output from daemon%s'%os.getpid()
print >> stderr, 'this file has the errors from daemon%s'%os.getpid()
The script is in
while True: try: funny_code(); sleep(10); except:pass;
loop. It runs fine for a few hours and then dies unexpectedly. How do I go about debugging such demons, err daemons.
[Edit]
Without starting a process like monit, is there a way to write a watchdog in python, which can watch my other daemons and restart when they go down? (Who watches the watchdog.)
You really should use python-daemon for this which is a library that implements PEP 3141 for a standard daemon process library. This way you will ensure that your application does all the right things for whichever type of UNIX it is running under. No need to reinvent the wheel.
Why are you silently swallowing all exceptions? Try to see what exceptions are being caught by this:
while True:
try:
funny_code()
sleep(10)
except BaseException, e:
print e.__class__, e.message
pass
Something unexpected might be happening which is causing it to fail, but you'll never know if you blindly ignore all the exceptions.
I recommend using supervisord (written in Python, very easy to use) for daemonizing and monitoring processes. Running under supervisord you would not have to use your daemonise function.
What I've used in my clients is daemontools. It is a proven, well tested tool to run anything daemonized.
You just write your application without any daemonization, to run on foreground; Then create a daemontools service folder for it, and it will discover and automatically restart your application from now on, and every time the system restarts.
It can also handle log rotation and stuff. Saves a lot of tedious, repeated work.