Daemon dies unexpectedly - python

I have a python script, which I daemonise using this code
def daemonise():
    from os import fork, setsid, umask, dup2
    from sys import stdin, stdout, stderr
    if fork(): exit(0)
    umask(0)
    setsid()
    if fork(): exit(0)
    stdout.flush()
    stderr.flush()
    si = file('/dev/null', 'r')
    so = file('daemon-%s.out'%os.getpid(), 'a+')
    se = file('daemon-%s.err'%os.getpid(), 'a+')
    dup2(si.fileno(), stdin.fileno())
    dup2(so.fileno(), stdout.fileno())
    dup2(se.fileno(), stderr.fileno())
    print 'this file has the output from daemon%s'%os.getpid()
    print >> stderr, 'this file has the errors from daemon%s'%os.getpid()
The script runs in a loop like this:
while True:
    try:
        funny_code()
        sleep(10)
    except:
        pass
It runs fine for a few hours and then dies unexpectedly. How do I go about debugging such demons, err, daemons?
[Edit]
Without starting a process like monit, is there a way to write a watchdog in python, which can watch my other daemons and restart when they go down? (Who watches the watchdog.)

You really should use python-daemon for this. It is a library that implements PEP 3143, the proposal for a standard daemon process library. This way you ensure that your application does all the right things for whichever type of UNIX it is running under. No need to reinvent the wheel.

Why are you silently swallowing all exceptions? Try to see what exceptions are being caught by this:
while True:
    try:
        funny_code()
        sleep(10)
    except BaseException, e:
        print e.__class__, e.message
        pass
Something unexpected might be happening which is causing it to fail, but you'll never know if you blindly ignore all the exceptions.
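To make that concrete, here is a minimal sketch (Python 3 syntax, unlike the Python 2 snippets in this thread) that logs the full traceback rather than only the message. The names run_forever, interval and max_iterations are made up for illustration; funny_code would be passed in as task. It catches Exception rather than BaseException, so SystemExit and KeyboardInterrupt still end the loop.

```python
import logging
import time


def run_forever(task, interval=10, log=None, max_iterations=None):
    """Call task() repeatedly, logging any exception with its traceback.

    max_iterations is only there so the loop can be bounded in tests;
    leave it as None for an endless daemon loop.
    """
    if log is None:
        log = logging.getLogger("daemon")
    done = 0
    while max_iterations is None or done < max_iterations:
        try:
            task()
        except Exception:
            # log.exception() records the message plus the full traceback.
            log.exception("task failed")
        done += 1
        time.sleep(interval)
```

Pointing the logger at a file (for example with logging.basicConfig(filename=...)) before daemonising should leave a record of what was raised just before the process went away.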
I recommend using supervisord (written in Python, very easy to use) for daemonizing and monitoring processes. Running under supervisord you would not have to use your daemonise function.

What I've used for my clients is daemontools. It is a proven, well-tested tool for running anything daemonized.
You just write your application without any daemonization, so that it runs in the foreground. Then you create a daemontools service folder for it, and daemontools will discover it and automatically restart your application from then on, and every time the system restarts.
It can also handle log rotation and stuff. Saves a lot of tedious, repeated work.

Related

Ensuring order of commands in Python

I have a .jar file that I'm running with arguments via Popen. This server takes about 4 seconds to start up and then dumps out "Server Started" on the terminal and then runs until the user quits the terminal. However, the print and webbrowser.open execute immediately because of Popen and if I use call, they never run at all. Is there a way to ensure that the print and webbrowser don't run until after the server is started other than using wait? Maybe grep for server started?
from subprocess import Popen
import glob
import sys
import webbrowser
reasoner = glob.glob("reasoner*.jar")
reasoner = reasoner.pop()
port = str(input("Enter connection port: "))
space = ""
portArg = ("-p", port)
portArg = space.join(portArg)
print "Navigate to the Reasoner at http://localhost:" + port
reasoner_process = Popen(["java", "-jar", reasoner, "-i", "0.0.0.0", portArg, "--dbconnect", "jdbc:h2:tcp://localhost//tmp/UXDemo;user=sa;password=admin"])
# I want the following to execute after the .jar process above
print "Opening http://localhost:" + port + "..."
webbrowser.open("http://localhost:" + port)
What you're looking to do is a very simple, special version of interacting with a CLI app. So, you have two options.
First, you can use a library like pexpect that's designed to handle driving almost any CLI application. It may be overkill, and there is a bit of a learning curve, but once you get the basics down this will make your problem trivial: you launch the JAR, block expecting "Server Started", then close.
Alternatively, you can do this manually with the Popen pipes. In general this has a lot of problems, but when you know there's going to be exactly one output that fits easily into 128 bytes and you don't want to do anything but block on that output and then close the pipe, none of those problems comes up. So:
from subprocess import Popen, PIPE
reasoner_process = Popen(args, stdout=PIPE)
line = reasoner_process.stdout.readline()
if line.strip() != 'Server Started':
    pass  # error handling goes here
# Any code that you want to do while the server is running goes here
reasoner_process.stdout.close()
reasoner_process.kill()
reasoner_process.wait()
But first make sure you actually have to kill it; often closing the pipe is sufficient, in which case you can and should leave out the kill(), in which case you can also check the exit code and raise if it's not 0.
Also, you probably want a with contextlib.closing(…) or whatever's appropriate, or just a try/finally to make sure you can raise an exception for error handling and not leak the child. (Python 3.2+ makes this a lot simpler, because it guarantees that both the pipes and the Popen itself are usable as context managers.)
Finally, I was assuming that "runs until the user quits the terminal" means you want to wait for it to start, then leave it running while you do other stuff, then kill it. If your workflow is different, you obviously need to change the order in which you do things.
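For what it's worth, the pipe-based approach can be sketched in Python 3 roughly like this. The helper name wait_for_line is hypothetical, and the sketch assumes the server prints the marker on stdout as a line of its own; it also skips over any other lines printed first, which goes slightly beyond the single-readline version above.

```python
import subprocess
import sys


def wait_for_line(cmd, expected, encoding="utf-8"):
    """Start cmd and block until it prints `expected` on stdout.

    Returns the still-running Popen object. Raises RuntimeError if the
    child's stdout reaches EOF before the expected line shows up.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    for raw in iter(proc.stdout.readline, b""):
        if raw.decode(encoding, errors="replace").strip() == expected:
            return proc
    # EOF: the child exited (or closed stdout) without printing the line.
    proc.stdout.close()
    proc.wait()
    raise RuntimeError("child exited with %r before printing %r"
                       % (proc.returncode, expected))
```

You would call it with the same argument list you pass to Popen, then do the print and webbrowser.open once it returns. Keep in mind that the pipe stays open afterwards: if the server goes on printing a lot and nobody reads it, the pipe buffer can fill up and block the server.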

Python losing control of subprocess?

I'm using a commercial application that uses Python as part of its scripting API. One of the functions provided is something called App.run(). When this function is called, it starts a new Java process that does the rest of the execution. (Unfortunately, I don't really know what it's doing under the hood as the supplied Python modules are .pyc files, and many of the Python functions are SWIG generated).
The trouble I'm having is that I'm building the App.run() call into a larger Python application that needs to do some guaranteed cleanup code (closing a database, etc.). Unfortunately, if the subprocess is interrupted with Ctrl+C, it aborts and returns to the command line without returning control to the main Python program. Thus, my cleanup code never executes.
So far I've tried:
Registering a function with atexit... doesn't work
Putting cleanup in a class __del__ destructor... doesn't work. (App.run() is inside the class)
Creating a signal handler for Ctrl+C in the main Python app... doesn't work
Putting App.run() in a Thread... results in a Memory Fault after the Ctrl+C
Putting App.run() in a Process (from multiprocessing)... doesn't work
Any ideas what could be happening?
This is just an outline- but something like this?
import os
cpid = os.fork()
if not cpid:
    # change stdio handles etc
    os.setsid() # Probably not needed
    App.run()
    os._exit(0)
os.waitpid(cpid, 0)
# clean up here
(os.fork is *nix only)
The same idea could be implemented with subprocess in an OS agnostic way. The idea is running App.run() in a child process and then waiting for the child process to exit; regardless of how the child process died. On posix, you could also trap for SIGCHLD (Child process death). I'm not a windows guru, so if applicable and subprocess doesn't work, someone else will have to chime in here.
After App.run() is called, I'd be curious what the process tree looks like. It's possible it's running an exec and taking over the Python process space. If that's happening, creating a child process is the only way I can think of to trap it.
If try: App.run() finally: cleanup() doesn't work; you could try to run it in a subprocess:
import sys
from subprocess import call
rc = call([sys.executable, 'path/to/run_app.py'])
cleanup()
Or if you have the code in a string you could use -c option e.g.:
rc = call([sys.executable, '-c', '''import sys
print(sys.argv)
'''])
You could implement @tMC's suggestion using subprocess by adding the preexec_fn=os.setsid argument (note: no ()), though I don't see how creating a process group might help here. Or you could try the shell=True argument to run it in a separate shell.
You might give another try to multiprocessing:
import multiprocessing as mp
if __name__=="__main__":
    p = mp.Process(target=App.run)
    p.start()
    p.join()
    cleanup()
Are you able to wrap App.run() in a try/except?
Something like:
try:
    App.run()
except (KeyboardInterrupt, SystemExit):
    print "User requested an exit..."
cleanup()

How to know if a running script dies?

So I'm somewhat new to programming and mostly self-taught, so sorry if this question is a bit on the novice side.
I have a python script that runs over long periods (e.g. it downloads pages every few seconds for days at a time.) Sort of a monitoring script for a web app.
Every so often, something will disrupt it, and it'll need to be restarted. I've gotten these events to a bare minimum but it still happens every few days, and when it does get killed it could be bad news if I don't notice for a few hours.
Right now it's running in a screen session on a VPS.
Could someone point me in the right direction as far as knowing when the script dies / and having it automatically restart?
Would this be something to write in Bash? Or something else? I've never done anything like it before and don't know where to start or even look for information.
You could try supervisord, it's a tool for controlling daemon processes.
You should daemonize your program.
As described in Efficient Python Daemon, you can install and use the python-daemon which implements the well-behaved daemon specification of PEP 3143, "Standard daemon process library".
Create a file mydaemon.py with contents like this:
#!/usr/bin/env python
import daemon
import time
import logging
def do_something():
    name = 'mydaemon'
    logger = logging.getLogger(name)
    handler = logging.FileHandler('/tmp/%s.log' % (name))
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    handler.setFormatter(formatter)
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)
    while True:
        try:
            time.sleep(5)
            with open("/tmp/file-does-not-exist", "r") as f:
                f.write("The time is now " + time.ctime())
        except Exception, ex:
            logger.error(ex)
def run():
    with daemon.DaemonContext():
        do_something()
if __name__ == "__main__":
    run()
To actually run it use:
python mydaemon.py
Which will spawn do_something() within the DaemonContext and then the script mydaemon.py will exit. You can see the running daemon with: pgrep -fl mydaemon.py. This short example will simply log errors to a log file in /tmp/mydaemon.log. You'll need to kill the daemon manually or it will run indefinitely.
To run your own program, just replace the contents of the try block with a call to your code.
I believe a wrapper bash script that executes the python script inside a loop should do the trick.
while true; do
    # Execute python script here
    echo "Web app monitoring script disrupted ... Restarting script."
done
Hope this helps.
That depends on the kind of failure you want to guard against. If it's just the script crashing, the simplest thing to do would be to wrap your main function in a try/except:
import logging as log
while True:
    try:
        main()
    except:
        log.exception("main() crashed")
If something is killing the Python process, it might be simplest to run it in a shell loop:
while sleep 1; do python checker.py; done
And if it's crashing because the machine is going down… well… Quis custodiet ipsos custodes?
However, to answer your question directly: the absolute simplest way to check if it's running from the shell would be to grep the output of ps:
ps | grep "python checker.py" > /dev/null 2>&1
running=$?
Of course, this isn't fool-proof, but it's generally Good Enough.
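If you would rather keep the restart loop in Python than in the shell, a rough Python 3 sketch might look like the following. The function name keep_alive and its parameters are invented for this example; checker.py stands for whatever script you are monitoring.

```python
import subprocess
import sys
import time


def keep_alive(cmd, delay=1.0, max_restarts=None):
    """Run cmd and start it again every time it exits.

    Returns the list of exit codes seen. max_restarts=None means
    restart forever; a number bounds the loop (handy for testing).
    """
    exit_codes = []
    while max_restarts is None or len(exit_codes) <= max_restarts:
        code = subprocess.call(cmd)  # blocks until the child exits
        exit_codes.append(code)
        print("%r exited with code %s" % (cmd, code), file=sys.stderr)
        time.sleep(delay)  # avoid a tight loop if the child dies at once
    return exit_codes
```

For example, keep_alive([sys.executable, "checker.py"]) would keep the checker running for as long as the wrapper itself survives. It does nothing about the machine going down, and something still has to start the wrapper.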

Start background process/daemon from CGI script

I'm trying to launch a background process from a CGI script. Basically, when a form is submitted the CGI script will indicate to the user that his or her request is being processed, while the background script does the actual processing (because the processing tends to take a long time). The problem I'm facing is that Apache won't send the output of the parent CGI script to the browser until the child script terminates.
I've been told by a colleague that what I want to do is impossible because there is no way to prevent Apache from waiting for the entire process tree of a CGI script to die. However, I've also seen numerous references around the web to a "double fork" trick which is supposed to do the job. The trick is described succinctly in this Stack Overflow answer, but I've seen similar code elsewhere.
Here's a short script I wrote to test the double-fork trick in Python:
import os
import sys
if os.fork():
    print 'Content-type: text/html\n\n Done'
    sys.exit(0)
if os.fork():
    os.setsid()
    sys.exit(0)
# Second child
os.chdir("/")
sys.stdout.close()
sys.stderr.close()
sys.stdin.close()
f = open('/tmp/lol.txt', 'w')
while 1:
    f.write('test\n')
If I run this from the shell, it does exactly what I'd expect: the original script and first descendant die, and the second descendant keeps running until it's killed manually. But if I access it through CGI, the page won't load until I kill the second descendant or Apache kills it because of the CGI timeout. I've also tried replacing the second sys.exit(0) with os._exit(0), but there is no difference.
What am I doing wrong?
Don't fork - run batch separately
This double-forking approach is some kind of hack, which to me is indication it shouldn't be done :). For CGI anyway. Under the general principle that if something is too hard to accomplish, you are probably approaching it the wrong way.
Luckily you give the background info on what you need - a CGI call to initiate some processing that happens independently and to return back to the caller. Well sure - there are unix commands that do just that - schedule command to run at specific time (at) or whenever CPU is free (batch). So do this instead:
import os
os.system("batch <<< '/home/some_user/do_the_due.py'")
# or if you don't want to wait for system idle,
# os.system("at now <<< '/home/some_user/do_the_due.py'")
print 'Content-type: text/html\n'
print 'Done!'
And there you have it. Keep in mind that if there is some output to stdout/stderr, that will be mailed to the user (which is good for debugging but otherwise script probably should keep quiet).
PS. I just remembered that Windows also has a version of at, so with a minor modification of the invocation you can have this work under Apache on Windows too (unlike the fork trick, which won't work on Windows).
PPS. make sure the process running CGI is not excluded in /etc/at.deny from scheduling batch jobs
I think there are two issues: setsid is in the wrong place and doing buffered IO operations in one of the transient children:
if os.fork():
    print "success"
    sys.exit(0)
if os.fork():
    os.setsid()
    sys.exit()
You've got the original process (grandparent, prints "success"), the middle parent, and the grandchild ("lol.txt").
The os.setsid() call is being performed in the middle parent after the grandchild has been spawned. The middle parent can't influence the grandchild's session after the grandchild has been created. Try this:
print "success"
sys.stdout.flush()
if os.fork():
    sys.exit(0)
os.setsid()
if os.fork():
    sys.exit(0)
This creates a new session before spawning the grandchild. Then the middle parent dies, leaving the session without a process group leader, ensuring that any calls to open a terminal will fail, making sure there's never any blocking on terminal input or output, or sending unexpected signals to the child.
Note that I've also moved the success to the grandparent; there's no guarantee of which child will run first after calling fork(2), and you run the risk that the child would be spawned, and potentially try to write output to standard out or standard error, before the middle parent could have had a chance to write success to the remote client.
In this case, the streams are closed quickly, but still, mixing standard IO streams among multiple processes is bound to give difficulty: keep it all in one process, if you can.
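Putting the ordering described above together (flush, fork, setsid in the middle child, fork again), a Python 3 sketch could look like this. The helper name spawn_detached is hypothetical, and pointing the grandchild's stdio at /dev/null is an extra step taken here so that it does not hold the web server's pipe open; it is Unix-only because of os.fork.

```python
import os
import sys


def spawn_detached(work):
    """Run work() in a grandchild process in its own session.

    Returns in the caller as soon as the short-lived middle child
    has exited; the grandchild keeps running in the background.
    """
    sys.stdout.flush()
    sys.stderr.flush()

    pid = os.fork()
    if pid:
        os.waitpid(pid, 0)  # reap the middle child, then carry on
        return

    status = 1
    try:
        # Middle child: create the new session BEFORE forking again.
        os.setsid()
        if os.fork():
            os._exit(0)  # session leader exits right away

        # Grandchild: replace inherited stdin/stdout/stderr.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        if devnull > 2:
            os.close(devnull)
        os.chdir("/")
        work()
        status = 0
    finally:
        # Never fall back into the caller's code from a forked child.
        os._exit(status)
```

In a CGI script you would print and flush the response first, then call spawn_detached(long_job). Because the working directory changes to /, work should only use absolute paths.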
Edit I've found a strange behavior I can't explain:
#!/usr/bin/python
import os
import sys
import time
print "Content-type: text/plain\r\n\r\npid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
sys.stdout.flush()
if os.fork():
    print "\nfirst fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
    sys.exit(0)
os.setsid()
print "\nafter setsid pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
sys.stdout.flush()
if os.fork():
    print "\nsecond fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
    sys.exit(0)
#os.sleep(1) # comment me out, uncomment me, notice following line appear and disappear
print "\nafter second fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
The last line, after second fork pid, only appears when the os.sleep(1) call is commented out. When the call is left in place, the last line never appears in the browser. (But otherwise all the content is printed to the browser.)
I wouldn't suggest going about the problem this way. If you need to execute some task asynchronously, why not use a work queue like beanstalkd instead of trying to fork off the tasks from the request? There are client libraries for beanstalkd available for Python.
I needed to break the stdout as well as the stderr like this:
sys.stdout.flush()
os.close(sys.stdout.fileno()) # Break web pipe
sys.stderr.flush()
os.close(sys.stderr.fileno()) # Break web pipe
if os.fork(): # Get out parent process
    sys.exit()
#background processing follows here
OK, I'm adding a simpler solution for when you don't need to start another script but want to continue in the same one to do the long process in the background. This lets you show a waiting message that the client sees instantly, and continue your server processing even if the client kills the browser session:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
import time
import datetime
print "Content-Type: text/html;charset=ISO-8859-1\n\n"
print "<html>Please wait...</html>\n"
sys.stdout.flush()
os.close(sys.stdout.fileno()) # Break web pipe
if os.fork(): # Get out parent process
    sys.exit()
# Continue with new child process
time.sleep(1) # Be sure the parent process reach exit command.
os.setsid() # Become process group leader
# From here I cannot print to Webserver.
# But I can write in other files or do any long process.
f=open('long_process.log', 'a+')
f.write( "Starting {0} ...\n".format(datetime.datetime.now()) )
f.flush()
time.sleep(15)
f.write( "Still working {0} ...\n".format(datetime.datetime.now()) )
f.flush()
time.sleep(300)
f.write( "Still alive - Apache didn't scalped me!\n" )
f.flush()
time.sleep(150)
f.write( "Finishing {0} ...\n".format(datetime.datetime.now()) )
f.flush()
f.close()
I read half the Internet for a week without success on this one. Finally I tested whether there is a difference between sys.stdout.close() and os.close(sys.stdout.fileno()), and there is a huge one: the first didn't do anything, while the second closed the pipe from the web server and completely disconnected from the client. The fork is only necessary because the webserver will kill its processes after a while and your long process probably needs more time to complete.
As other answers have noted, it is tricky to start a persistent process from your CGI script because the process must cleanly dissociate itself from the CGI program. I have found that a great general-purpose program for this is daemon. It takes care of the messy details involving open file handles, process groups, root directory, etc etc for you. So the pattern of such a CGI program is:
#!/bin/sh
foo-service-ping || daemon --restart foo-service
# ... followed below by some CGI handler that uses the "foo" service
The original post describes the case where you want your CGI program to return quickly, while spawning off a background process to finish handling that one request. But there is also the case where your web application depends on a running service which must be kept alive. (Other people have talked about using beanstalkd to handle jobs. But how do you ensure that beanstalkd itself is alive?) One way to do this is to restart the service (if it's down) from within the CGI script. This approach makes sense in an environment where you have limited control over the server and can't rely on things like cron or an init.d mechanism.
There are situations where passing work off to a daemon or cron is not appropriate. Sometimes you really DO need to fork, let the parent exit (to keep Apache happy) and let something slow happen in the child.
What worked for me: When done generating web output, and before the fork:
fflush(stdout), close(0), close(1), close(2); // in the process BEFORE YOU FORK
Then fork() and have the parent immediately exit(0);
The child then AGAIN does
close(0), close(1), close(2);
and also a
setsid();
...and then gets on with whatever it needs to do.
Why you need to close them in the child even though they were closed in the primordial process in advance is confusing to me, but this is what worked. It didn't without the 2nd set of closes. This was on Linux (on a raspberry pi).
I haven't tried using fork but I have accomplished what you're asking by executing a sys.stdout.flush() after the original message, before calling the background process.
i.e.
print "Please wait..."
sys.stdout.flush()
output = some_processing() # put what you want to accomplish here
print output # in my case output was a redirect to a results page
My head is still hurting from that one. I tried every possible way to use your code with fork and with closing or nulling stdout, but nothing worked. Whether the output of an unfinished process is displayed depends on the webserver (Apache or other) config, and in my case changing it wasn't an option, so attempts with "Transfer-Encoding: chunked;chunk=CRLF" and "sys.stdout.flush()" didn't work either. Here is the solution that finally worked.
In short, use something like:
if len(sys.argv) == 1: # I'm in the parent process
    childProcess = subprocess.Popen('./myScript.py X', bufsize=0, stdin=open("/dev/null", "r"), stdout=open("/dev/null", "w"), stderr=open("/dev/null", "w"), shell=True)
    print "My HTML message that says to wait a long time"
else: # Here comes the child and his long process
    # From here I cannot print to Webserver, but I can write in files that will be refreshed in my web page.
    time.sleep(15) # To verify the parent completes rapidly.
I use the "X" parameter to make the distinction between parent and child because I call the same script for both, but you could do it simpler by calling another script. If a complete example would be useful, please ask.
For those who get "sh: 1: Syntax error: redirection unexpected" with the at/batch solution, try using something like this:
os.system("echo sudo /srv/scripts/myapp.py | /usr/bin/at now")
Make sure that the at command is installed and that the user running the application isn't in /etc/at.deny.

Graceful exiting of a program in Python?

I have a script that runs as a
while True:
doStuff()
What is the best way to communicate with this script if I need to stop it but I don't want to kill it if it is in the middle of an operation?
And I'm assuming you mean killing from outside the python script.
The way I've found easiest is
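import atexit
import os
@atexit.register
def cleanup():
    # Remove the flag file on exit if it is still there
    if os.path.exists("myfile.%d" % os.getpid()):
        os.unlink("myfile.%d" % os.getpid())
f = open("myfile.%d" % os.getpid(), "w" )
f.write("Nothing")
f.close()
while os.path.exists("myfile.%d" % os.getpid() ):
    doSomething()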
Then to terminate the script just remove the myfile.xxx and the application should quit for you. You can use this even with multiple instances of the same script running at once if you only need to shut one down. And it tries to clean up after itself....
The best way is to rewrite the script so it doesn't use while True:.
Sadly, it's impossible to conjecture a good way to terminate this.
You could use the Linux signals.
You could use a timer and stop after a while.
You could have dostuff return a value and stop if the value is False.
You could check for a local file and stop if the file exists.
You could check an FTP site for a remote file and stop if the file exists.
You could check an HTTP web page for information that indicates if your loop should stop or not stop.
You could use OS-specific things like semaphores or shared memory.
I think the most elegant would be:
keep_running = True
while keep_running:
    dostuff()
and then dostuff() can set keep_running = False (after declaring it global) whenever it no longer wants to keep running, then the while loop ends, and everything cleans up nicely.
If that's a console application and exiting by pressing Ctrl+C is OK, could that solve your problem?
try:
    while True:
        doStuff()
except KeyboardInterrupt:
    doOtherStuff()
I guess the problem with that approach is that you wouldn't have any control exactly when and where in doStuff the execution is terminated.
A long time ago I implemented such a thing. It catches Ctrl+C (the keyboard interrupt). It uses my package snuff-utils.
To install:
pip install snuff-utils
from snuff_utils.graceful_exit import graceful_exit
while True:
    do_task_until_complete()
    if graceful_exit:
        do_stuff_before_exit()
        break
On Ctrl+C it will log:
An interrupt signal has been received. The signal will be processed according to the logic of the application.
The goal I was after was to exit the program, but only after finishing the task that is already running.
Be careful with multiprocessing/multithreading. It is not tested.
The signal module can trap signals and react accordingly?
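As a rough illustration of the signal idea (Python 3, Unix signals assumed), the handler can simply set a flag that the loop checks between operations, so the current operation always finishes. StopFlag and run_until_stopped are names made up for this sketch, and signal.signal has to be called from the main thread.

```python
import os
import signal


class StopFlag:
    """Remembers that a stop signal arrived, without interrupting work."""

    def __init__(self, signals=(signal.SIGINT, signal.SIGTERM)):
        self.requested = False
        for sig in signals:
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only record the request; the loop decides when to stop.
        self.requested = True


def run_until_stopped(do_stuff, stop):
    """Call do_stuff() until a stop was requested; return the call count."""
    iterations = 0
    while not stop.requested:
        do_stuff()  # never cut short by the handler
        iterations += 1
    return iterations
```

With this in place, kill <pid> (SIGTERM) or Ctrl+C (SIGINT) from outside lets the operation in progress complete before the loop ends. os.getpid() is handy for printing the pid to send the signal to.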
