How to configure Supervisord for FreeSWITCH? - python

I am trying to configure Supervisor to control FreeSWITCH. The following is the configuration currently in supervisord.conf:
[program:freeswitch]
command=/usr/local/freeswitch/bin/freeswitch -nc -u root -g root
numprocs=1
stdout_logfile=/var/log/supervisor/freeswitch.log
stderr_logfile=/var/log/supervisor/freeswitch.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
When I start the supervisor using the supervisord command, it starts the freeswitch process without any error. But when I try to restart freeswitch using the supervisorctl command, it does not work and gives the following errors:
freeswitch: ERROR (not running)
freeswitch: ERROR (abnormal termination)
I am not able to see any error reported in the log (/var/log/supervisor/freeswitch.log). However, I am seeing the following:
1773 Backgrounding.
1777 Backgrounding.
1782 Backgrounding.
It seems it is starting three freeswitch processes. Isn't this wrong?
Can someone point out what the problem is here and provide a correct configuration if required?

supervisor requires that the programs it runs do not fork into the background; after all, it was created precisely to run as background services those programs for which writing correct daemonising code would be infeasible or difficult. So for each program you run under supervisor, make sure it does not fork into the background.
In the case of freeswitch, just remove the -nc option so that it runs in the foreground; supervisor will direct the standard streams appropriately and restart the process if it crashes.
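For example, the [program:freeswitch] block from the question would then look something like this (same settings as above, only the -nc flag dropped so FreeSWITCH stays in the foreground and supervisord manages it directly):
[program:freeswitch]
command=/usr/local/freeswitch/bin/freeswitch -u root -g root
numprocs=1
stdout_logfile=/var/log/supervisor/freeswitch.log
stderr_logfile=/var/log/supervisor/freeswitch.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs=600
killasgroup=true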

Keep in mind that processes inherit the ulimit values of their parent process, so in your case freeswitch will run with the same ulimits as its parent process supervisord, which I don't think is what you would want for a resource-intensive application such as freeswitch. That said, using supervisord with freeswitch is actually a very bad idea.
If you must stick to it, then you will have to find out in the documentation how to raise all of the ulimit values for supervisord.
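For reference, supervisord exposes minfds and minprocs settings in the [supervisord] section of supervisord.conf; when started as root it tries to raise its own limits to those values, and child processes such as freeswitch then inherit them. A hedged sketch, with placeholder values you would need to tune for your system:
[supervisord]
minfds=65535   ; minimum number of file descriptors supervisord starts with
minprocs=8192  ; minimum number of processes/threads available to it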

Related

Watchdog for specific python process

I am working on Ubuntu 16.04 and I have a python running process in the background
python myFunction.py
From time to time, myFunction process gets killed for unknown reason, however, I want to restart it automatically. I have multiple python process running in the background, and I do not know which one runs myFunctions.py (e.g. by using the pgrep command).
Is it possible? Can I make a bash or python script to restart the command python myFunction.py whenever the python process running it gets killed?
You can look at Supervisord which is (from its own documentation) :
a client/server system that allows its users to monitor and control a
number of processes on UNIX-like operating systems
Supervisord will keep your script in check. If it crashes, it will restart it again. If your machine reboots, it will make sure the script starts automatically after booting.
It works based on a config file formatted like this (more info in the docs):
[program:myFunction]
command=/path_to_script/myFunction.py
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/myFunction.log
stderr_logfile=/var/log/myFunction.error.log
directory=/path_to_script
I hope this will help you
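If you would rather not install anything, the bash/python watchdog you describe is also possible. A minimal Python sketch, assuming myFunction.py lives in the current directory (the restart delay and the print are just illustrative):
import subprocess
import time

# Restart "python myFunction.py" whenever it exits, for any reason.
while True:
    proc = subprocess.Popen(["python", "myFunction.py"])
    proc.wait()  # block until the child terminates
    print("myFunction.py exited with code", proc.returncode, "- restarting")
    time.sleep(5)  # avoid a tight restart loop if it dies immediately
Note that this only restarts the script; unlike Supervisord it will not survive a reboot or give you logging for free.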

Run python script with supervisord

I have a simple python script (a discord bot) and it works well when I run it with the command python3 discord_bot.py & or sh start_bot.sh.
But how can I run it with supervisord?
Update:
I have installed supervisord. But when I try to run the process, I get this error:
exit status 0; not expected
My supervisord config:
[program:AFI]
command=/home/maksymov/www/Bots/discord_bots/afi/start_bot.sh
autostart=true
autorestart=true
stderr_logfile=/var/log/afi.err.log
stdout_logfile=/var/log/afi.out.log
user=www-data
Probably you need to use one of the "supervisors", like systemd or Ramona.
The first one is more general; the second is more "python-specific".
I guess your program tries to run as a daemon. I pasted the most relevant part from the documentation:
Supervisord subprocess
Programs meant to be run under supervisor should not daemonize themselves. Instead, they should run in the foreground. They should not detach from the terminal from which they are started.
The easiest way to tell if a program will run in the foreground is to run the command that invokes the program from a shell prompt. If it gives you control of the terminal back, but continues running, it’s daemonizing itself and that will almost certainly be the wrong way to run it under supervisor.
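Applied to your setup, that usually means skipping the wrapper script and running the interpreter in the foreground. A sketch, assuming start_bot.sh just launches python3 discord_bot.py (the bot path below is guessed from the paths in your question):
[program:AFI]
command=python3 /home/maksymov/www/Bots/discord_bots/afi/discord_bot.py
directory=/home/maksymov/www/Bots/discord_bots/afi
autostart=true
autorestart=true
stderr_logfile=/var/log/afi.err.log
stdout_logfile=/var/log/afi.out.log
user=www-data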

Is it possible to run SLURM jobs in the background using SRUN instead of SBATCH?

I was trying to run slurm jobs with srun in the background. Unfortunately, right now, due to the fact that I have to run things through docker, it's a bit annoying to use sbatch, so I am trying to find out if I can avoid it altogether.
From my observations, whenever I run srun, say:
srun docker image my_job_script.py
and close the window where I was running the command (to avoid receiving all the print statements) and open another terminal window to see if the command is still running, it seems that my running script is cancelled for some reason. Since it isn't run through sbatch it doesn't send me a file with the error log (as far as I know), so I have no idea why it closed.
I also tried:
srun docker image my_job_script.py &
to give control back to me in the terminal. Unfortunately, if I do that it still keeps printing things to my terminal screen, which I am trying to avoid.
Essentially, I log into a remote computer through ssh and then do an srun command, but it seems that if I terminate my ssh connection, the srun command is automatically killed. Is there a way to stop this?
Ideally I would like to essentially send the script to run and not have it be cancelled for any reason unless I cancel it through scancel and it should not print to my screen. So my ideal solution is:
keep running srun script even if I log out of the ssh session
keep running my srun script even if close the window from where I sent the command
keep running my srun script, let me leave the srun session, and not print to my screen (i.e. essentially run in the background)
This would be my ideal solution.
For the curious crowd that want to know the issue with sbatch, I want to be able to do (which is the ideal solution):
sbatch docker image my_job_script.py
however, as people will know it does not work because sbatch receives the command docker which isn't a "batch" script. Essentially a simple solution (that doesn't really work for my case) would be to wrap the docker command in a batch script:
#!/usr/bin/sh
docker image my_job_script.py
Unfortunately I am actually using my batch script to encode a lot of information (sort of like a config file) about the task I am running. So doing that might affect the jobs I run, because their underlying file is changing. That is avoided by sending the job directly to sbatch, since it essentially creates a copy of the batch script (as noted in this question: Changing the bash script sent to sbatch in slurm during run a bad idea?). So the real solution to my problem would be to have my batch script contain all the information that my script requires and then somehow, in python, call docker and at the same time pass it all the information. Unfortunately, some of the information consists of function pointers and objects, so it's not even clear to me how I would pass such a thing to a docker command run from python.
Or maybe being able to pass docker directly to sbatch instead of using a batch script would also solve the problem.
The outputs can be redirected with the options -o for stdout and -e for stderr.
So, the job can be launched in background and with the outputs redirected:
$ srun -o file.out -e file.err docker image my_job_script.py &
Another approach is to use a terminal multiplexer like tmux or screen.
For example, create a new tmux session by typing tmux. In that session, run srun with your script. From there you can detach the tmux session, which returns you to your main shell so you can go about your other business, or you can log off entirely. When you want to check in on your script, just reattach to the tmux session. See the tmux documentation (tmux -h or man tmux) for how to detach and reattach on your OS.
Any output redirects using the -o or -e will still work with this technique and you can run multiple srun commands concurrently in different tmux windows. I’ve found this approach useful to run concurrent pipelines (genomics in this case).
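A short sketch of that workflow (the session name and output files are arbitrary):
$ tmux new -s myjob                 # start a new tmux session
$ srun -o file.out -e file.err docker image my_job_script.py
# press Ctrl-b then d to detach; srun keeps running inside tmux
$ tmux attach -t myjob              # reattach later to check on it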
I was wondering this too because the differences between sbatch and srun are not very clearly explained or motivated. I looked at the code and found:
sbatch
sbatch pretty much just sends a shell script to the controller, tells it to run it and then exits. It does not need to keep running while the job is happening. It does have a --wait option to stay running until the job is finished but all it does is poll the controller every 2 seconds to ask it.
sbatch can't run a job across multiple nodes - the code simply isn't in sbatch.c. sbatch is not implemented in terms of srun, it's a totally different thing.
Also its argument must be a shell script. Bit of a weird limitation but it does have a --wrap option so that it can automatically wrap a real program in a shell script for you. Good luck getting all the escaping right with that!
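For example, applied to the docker command from the question (the quoting here is exactly the escaping headache mentioned above):
$ sbatch -o file.out -e file.err --wrap="docker image my_job_script.py"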
srun
srun is more like an MPI runner. It directly starts tasks on lots of nodes (one task per node by default though you can override that with --ntasks). It's intended for MPI so all of the jobs will run simultaneously. It won't start any until all the nodes have a slot free.
It must keep running while the job is in progress. You can send it to the background with & but this is still different to sbatch. If you need to start a million sruns you're going to have a problem. A million sbatches should (in theory) work fine.
There is no way to have srun exit and leave the job still running like there is with sbatch. srun itself acts as a coordinator for all of the nodes in the job, and it updates the job status etc. so it needs to be running for the whole thing.

Run Python script forever, logging errors and restarting when crashes

I have a python script that continuously processes new data and writes it to a mongodb. In the script, a while loop and a sleep keep the code running continuously.
What is the recommended way to run the Python script forever, logging errors when they occur, and restarting when it crashes?
Will node.js's forever be suitable? I'm also running node/meteor on the same Ubuntu server.
supervisord is perfect for this sort of thing. While I used to check that programs were still running every couple of minutes with a cron job, supervisord runs your programs as its own child processes, so in the event your program terminates, supervisord will automatically restart it. I no longer need to parse the output of ps to see if a program crashed.
It has a simple declarative config file and configurable logging. By default it creates log files like your-program-name-stdout.log and your-program-name-stderr.log, which are automatically handled by logrotate when supervisord is installed from an OS package manager (Debian for me).
If you don't want to configure supervisord's logging, you should look at logging in python so you can control what goes into those files.
If you're on a Debian derivative you should be able to install and start the daemon simply by executing apt-get install supervisor as root.
The config file is very straightforward too:
[program:myprogram]
command=/path/to/my/program/script
directory=/path/to/my/program/base
user=myuser
autostart=true
autorestart=true
redirect_stderr=True
supervisorctl also allows you to see interactively what your programs are doing, and can start and stop multiple programs, e.g. supervisorctl start myprogram.
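For example, assuming the program name myprogram from the config above:
$ supervisorctl status             # list all programs and their states
$ supervisorctl restart myprogram  # restart a single program
$ supervisorctl tail -f myprogram  # follow its stdout log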
Recently wrote something similar. The basic pattern I follow is
import time

while True:
    try:
        pass  # functionality
    except SpecificError:  # the specific exception you expect
        pass  # log exception
    except Exception:  # catch everything else
        pass
    finally:
        time.sleep(600)
To handle reboots you can use init.d or cron jobs.
If you are writing a daemon, you should probably do it with this command:
http://manpages.ubuntu.com/manpages/lucid/man8/start-stop-daemon.8.html
You can spawn this from a System V /etc/init.d/ script, or use Upstart which is slowly replacing it.
Upstart: http://upstart.ubuntu.com/getting-started.html
System V: http://www.cyberciti.biz/tips/linux-write-sys-v-init-script-to-start-stop-service.html
I find System V easier to write, but if this will ever be packaged and distributed in a debian file, I recommend writing an Upstart conf.
Definitely keep the sleep so it won't hog the CPU.
I don't know if this is still relevant to you, but I have been reading forever about how to do this and want to share somewhere what I did.
For me, the goal was to have a python script always running (on my Linux computer). The python script also has a "while True" loop in it which should theoretically run forever, but if it crashes for any reason I cannot foresee, I want the script to restart. Also, when I restart the computer it should run the script.
I am not an expert but for me the best and most understandable was to use systemd (assuming you use Linux).
There are two nice examples of how to do this given here and here, showing how to write your .service files in either /etc/systemd/system or /lib/systemd/system. If you want to be completely correct you should take the former:
" /etc/systemd/system/: units installed by the system administrator" 1
The documentation of systemd here is actually nice to read, even if you are not an expert.
Hope this helps someone!

Python daemon and systemd service

I have a simple Python script working as a daemon. I am trying to create a systemd service file to be able to start this script during startup.
Current systemd script:
[Unit]
Description=Text
After=syslog.target
[Service]
Type=forking
User=node
Group=node
WorkingDirectory=/home/node/Node/
PIDFile=/var/run/zebra.pid
ExecStart=/home/node/Node/node.py
[Install]
WantedBy=multi-user.target
node.py:
if __name__ == '__main__':
    with daemon.DaemonContext():
        check = Node()
        check.run()
run contains a while True loop.
I try to run this service with systemctl start zebra-node.service. Unfortunately the service never finishes its starting sequence - I have to press Ctrl+C.
The script is running, but the status is activating and after a while it changes to deactivating.
Now I am using python-daemon (but before I tried without it and the symptoms were similar).
Should I implement some additional features in my script, or is the systemd file incorrect?
The reason it does not complete the startup sequence is that for Type=forking your startup process is expected to fork and exit (see $ man systemd.service and search for forking).
Simply use only the main process, do not daemonize
One option is to do less. With systemd, there is often no need to create daemons and you may directly run the code without daemonizing.
#!/usr/bin/python -u
from somewhere import Node
check = Node()
check.run()
This allows using a simpler service Type called simple, so your unit file would look like this:
[Unit]
Description=Simplified simple zebra service
After=syslog.target
[Service]
Type=simple
User=node
Group=node
WorkingDirectory=/home/node/Node/
ExecStart=/home/node/Node/node.py
StandardOutput=syslog
StandardError=syslog
[Install]
WantedBy=multi-user.target
Note that the -u in the python shebang is not necessary, but in case you print something to stdout or stderr, the -u makes sure there is no output buffering in place and printed lines will be immediately caught by systemd and recorded in the journal. Without it, output would appear with some delay.
For this purpose I added the lines StandardOutput=syslog and StandardError=syslog to the unit file. If you do not care about printed output in your journal, you can leave these lines out (they do not have to be present).
systemd makes daemonization obsolete
While the title of your question explicitly asks about daemonizing, I guess the core of the question is "how to make my service run", and since using the main process directly is much simpler (you do not have to care about daemons at all), it can be considered an answer to your question.
I think many people use daemonizing just because "everybody does it". With systemd the reasons for daemonizing are often obsolete. There might be some reasons to use daemonization, but those are rare cases now.
EDIT: fixed python -p to proper python -u. thanks kmftzg
It is possible to daemonize like Schnouki and Amit describe. But with systemd this is not necessary. There are two nicer ways to initialize the daemon: socket-activation and explicit notification with sd_notify().
Socket activation works for daemons which want to listen on a network port or UNIX socket or similar. Systemd would open the socket, listen on it, and then spawn the daemon when a connection comes in. This is the preferred approach because it gives the most flexibility to the administrator. [1] and [2] give a nice introduction, [3] describes the C API, while [4] describes the Python API.
[1] http://0pointer.de/blog/projects/socket-activation.html
[2] http://0pointer.de/blog/projects/socket-activation2.html
[3] http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[4] http://www.freedesktop.org/software/systemd/python-systemd/daemon.html#systemd.daemon.listen_fds
Explicit notification means that the daemon opens the sockets itself and/or does any other initialization, and then notifies init that it is ready and can serve requests. This can be implemented with the "forking protocol", but actually it is nicer to just send a notification to systemd with sd_notify().
The Python wrapper is called systemd.daemon.notify and is one line to use [5].
[5] http://www.freedesktop.org/software/systemd/python-systemd/daemon.html#systemd.daemon.notify
In this case the unit file would have Type=notify, and call
systemd.daemon.notify("READY=1") after it has established the sockets. No forking or daemonization is necessary.
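A minimal sketch of that pattern, assuming the python-systemd package is available and reusing the placeholder Node import from the earlier example:
#!/usr/bin/python -u
from somewhere import Node  # placeholder, as in the example above
from systemd import daemon

check = Node()              # do all initialization first
daemon.notify("READY=1")    # then tell systemd the service is ready
check.run()                 # and enter the main loop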
You're not creating the PID file.
systemd expects your program to write its PID in /var/run/zebra.pid. As you don't do it, systemd probably thinks that your program is failing, hence deactivating it.
To add the PID file, install lockfile and change your code to this:
import daemon
import daemon.pidlockfile

pidfile = daemon.pidlockfile.PIDLockFile("/var/run/zebra.pid")
with daemon.DaemonContext(pidfile=pidfile):
    check = Node()
    check.run()
(Quick note: some recent update of lockfile changed its API and made it incompatible with python-daemon. To fix it, edit daemon/pidlockfile.py, remove LinkFileLock from the imports, and add from lockfile.linklockfile import LinkLockFile as LinkFileLock.)
Be careful of one other thing: DaemonContext changes the working dir of your program to /, making the WorkingDirectory of your service file useless. If you want DaemonContext to chdir into another directory, use DaemonContext(pidfile=pidfile, working_directory="/path/to/dir").
I came across this question when trying to convert some python init.d services to systemd under CentOS 7. This seems to work great for me, by placing this file in /etc/systemd/system/:
[Unit]
Description=manages worker instances as a service
After=multi-user.target
[Service]
Type=idle
User=node
ExecStart=/usr/bin/python /path/to/your/module.py
Restart=always
TimeoutStartSec=10
RestartSec=10
[Install]
WantedBy=multi-user.target
I then dropped my old init.d service file from /etc/init.d and ran sudo systemctl daemon-reload to reload systemd.
I wanted my service to auto restart, hence the restart options. I also found using idle for Type made more sense than simple.
Behavior of idle is very similar to simple; however, actual execution
of the service binary is delayed until all active jobs are dispatched.
This may be used to avoid interleaving of output of shell services
with the status output on the console.
More details on the options I used here.
I also experimented with keeping the old service and having systemd restart the service, but I ran into some issues.
[Unit]
# Added this to the above
#SourcePath=/etc/init.d/old-service
[Service]
# Replace the ExecStart from above with these
#ExecStart=/etc/init.d/old-service start
#ExecStop=/etc/init.d/old-service stop
The issue I experienced was that the init.d service script was used instead of the systemd service if both were named the same. If you killed the init.d-initiated process, the systemd script would then take over. But if you ran service <service-name> stop it would refer to the old init.d service. So I found the best way was to drop the old init.d service, after which the service command referred to the systemd service instead.
Hope this helps!
Also, you most likely need to set detach_process=True when creating the DaemonContext().
This is because, if python-daemon detects that it is running under an init system, it doesn't detach from the parent process. systemd expects that a daemon process running with Type=forking will do so. Hence you need that, else systemd will keep waiting, and finally kill the process.
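A hedged sketch of that change, reusing the Node class from the question (detach_process is the python-daemon keyword argument meant here, as far as I can tell):
import daemon

# Force python-daemon to detach even though it detects an init system,
# so the fork that systemd's Type=forking expects actually happens.
with daemon.DaemonContext(detach_process=True):
    check = Node()
    check.run()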
If you are curious, in python-daemon's daemon module, you will see this code:
def is_detach_process_context_required():
    """ Determine whether detaching process context is required.

        Return ``True`` if the process environment indicates the
        process is already detached:

        * Process was started by `init`; or
        * Process was started by `inetd`.

        """
    result = True
    if is_process_started_by_init() or is_process_started_by_superserver():
        result = False
    return result
Hopefully this explains it better.
