I'm using uwsgi version 2.0.13.1 with the following config:
bin/uwsgi -M -p 5 -C -A 4 -m -b 8192 -s :3031 --wsgi-file bin/django.wsgi --pidfile var/run/uwsgi.pid --touch-reload=var/run/reload-uwsgi.touch --max-requests=1000 --reload-on-rss=450 --py-tracebacker var/run/pytrace --auto-procname --stats 127.0.0.1:3040 --threads 40 --reload-mercy 600 --listen 200
(absolute path names cut)
When I run uwsgitop, all 5 workers appear as busy. When I try to get the stack trace for each worker / thread, using the py-tracebacker, I get no answer. The processes just hang.
How can I find out exactly what makes the uwsgi processes hang?
How could I prevent this situation?
I know about the harakiri parameter, but I'm not sure whether a process is killed if it still has other active threads.
PS: "reload mercy" is set to a very high value to avoid killing workers that still have active threads (which seems to be a bug). We have some web requests that still take a very long time (they are in the process of being converted to background jobs).
Thanks in advance.
Although I already added a comment, here is a longer description.
Warning: the problem only arose when using more than one worker process AND more than one thread (-p and --threads).
Short version: In Python 2.7.x some modules are not 100% thread safe during import (logging, and the implicit imports done by codecs). Import all such problematic modules in the wsgi file given to uwsgi, i.e., before uwsgi forks.
Long version: In https://github.com/unbit/uwsgi/issues/1599 I analysed the problem and found that it could be related to a Python bug in the logging module. The problem can be solved by importing and initializing any critical modules BEFORE uwsgi forks, which happens after the wsgi script given to uwsgi has been executed.
I finally resolved my problem by importing the Django settings.ROOT_URLCONF, directly or indirectly, from the wsgi file. This also has the benefit of reducing memory consumption (because a much bigger part of the code base is shared between workers) and per-worker initialization time.
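A minimal sketch of such a preload in the wsgi file, assuming a Django-style setup (the Django-specific lines are shown as comments, since the exact project layout varies):

```python
# Executed by uWSGI before it forks workers, so everything imported
# here is fully initialized in the master process.
import codecs
import logging

logging.basicConfig()          # force logging to finish its module-level setup
utf8 = codecs.lookup("utf-8")  # touch the codec registry so lazy imports happen now

# For Django, importing the URLconf here pulls in most of the code base
# pre-fork, e.g. (illustrative):
#   from importlib import import_module
#   from django.conf import settings
#   import_module(settings.ROOT_URLCONF)
```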
Related
I need to execute some slow tasks upon receiving a POST request.
My server runs under UWSGI which behaves in a weird manner
Localhost (python manage.py runserver):
when receiving a request from the browser, I do p = Process(target=workload); p.start(); return redirect(...). The browser immediately follows the redirect, and the worker process starts in the background.
UWSGI (2 workers):
The background process starts, but the browser doesn't get redirected. It waits until the child exits.
Note: I have added the close-on-exec=true parameter (as advised in the documentation and in Running a subprocess in uwsgi application) to the uWSGI configuration, but it has no visible effect; the application still waits for the child to exit.
I imagine Python gets confused, since the interpreter multiprocessing.Process() defaults to is the uwsgi binary, not the regular Python interpreter.
Additionally, you may be using the fork start method (depending on the OS), and forking the uWSGI worker isn't a great idea.
You will likely need to call multiprocessing.set_executable() and multiprocessing.set_start_method() when you're running under uWSGI, with something like this in e.g. your Django settings.py:
import multiprocessing
import os
import sys

try:
    import uwsgi  # magic module, only available when running under uwsgi
except ImportError:
    uwsgi = None

if uwsgi:
    multiprocessing.set_start_method('spawn')
    # Point child processes at the real interpreter instead of the uwsgi
    # binary; the exact path layout may differ (e.g. on Windows).
    multiprocessing.set_executable(os.path.join(sys.exec_prefix, 'bin', 'python'))
However, you may want to look into using e.g. uWSGI's spooler system, or some other work queue/background task system such as Huey, rq, Minique, Celery, etc.
I am working on a basic crawler which crawls 5 websites concurrently using threads.
For each site it creates a new thread. When I run the program from the shell, the output log indicates that all 5 threads run as expected.
But when I run this program as a supervisord program, the log indicates that only 2 threads run every time! The log shows that all 5 threads have started, but only the same two of them are actually executed; the rest get stuck.
I cannot understand why this inconsistency happens between running it from a shell and running it from supervisor. Is there something I am not taking into account?
Here is the code which creates the threads:
for sid in entries:
    url = entries[sid]
    threading.Thread(target=self.crawl_loop, args=(sid, url)).start()
UPDATES:
As suggested by tdelaney in the comments, I changed the working directory in the supervisord configuration and now all the threads run as expected. Though I still don't understand why setting the working directory to the crawler file's directory fixes the issue. Perhaps someone who knows how supervisor manages processes can explain?
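For reference, a supervisord stanza with an explicit working directory might look like this (the program name and paths are illustrative):

```ini
[program:crawler]
command=/usr/bin/python /home/user/crawler/crawler.py
directory=/home/user/crawler   ; working directory for the child process
autostart=true
autorestart=true
```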
AFAIK Python threads can't run in parallel because of the Global Interpreter Lock; the threading module only gives you concurrency, simulating simultaneous execution of the code. Your code will still use only one core.
https://wiki.python.org/moin/GlobalInterpreterLock
https://en.wikibooks.org/wiki/Python_Programming/Threading
Therefore it is possible that it does not actually run more processes/threads in parallel.
I think you should use multiprocessing instead?
https://docs.python.org/2/library/multiprocessing.html
I was having the same silent problem, but then realised that I was setting daemon to true, which was causing problems under supervisor.
https://docs.python.org/2/library/threading.html#threading.Thread.daemon
So the answer is: daemon = True when running the script yourself, False when running under supervisor.
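A small illustration of the flag in question (the sleep and the results list are just stand-ins for real work):

```python
import threading
import time

results = []

def crawl():
    # stand-in for real crawl work
    time.sleep(0.05)
    results.append("done")

t = threading.Thread(target=crawl)
t.daemon = False  # non-daemon: the interpreter waits for this thread at exit
t.start()
t.join()          # with daemon=True and no join, the thread can be
                  # abandoned when the main thread exits
```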
Just to say, I was just experiencing a very similar problem.
In my case, I was working on a low powered machine (RaspberryPi), with threads that were dedicated to listening to a serial device (an Arduino nano on /dev/ttyUSB0). Code worked perfectly on the command line - but the serial reading thread stalled under supervisor.
After a bit of hacking around (and trying all of the options here), I tried running python in unbuffered mode and managed to solve the issue! I got the idea from https://stackoverflow.com/a/17961520/741316.
In essence, I simply invoked python with the -u flag.
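For the record, the equivalent supervisord stanza might look like this (paths are illustrative); setting PYTHONUNBUFFERED=1 in the environment has the same effect as the -u flag:

```ini
[program:serialreader]
command=/usr/bin/python -u /home/pi/reader.py
; or keep plain python and set:
; environment=PYTHONUNBUFFERED=1
```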
I am trying to configure Supervisor to control FreeSWITCH. The following is the configuration currently present in supervisord.conf.
[program:freeswitch]
command=/usr/local/freeswitch/bin/freeswitch -nc -u root -g root
numprocs=1
stdout_logfile=/var/log/supervisor/freeswitch.log
stderr_logfile=/var/log/supervisor/freeswitch.log
autostart=true
autorestart=true
startsecs=10
; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 600
; When resorting to send SIGKILL to the program to terminate it
; send SIGKILL to its whole process group instead,
; taking care of its children as well.
killasgroup=true
When I start supervisor with the supervisord command, it starts the freeswitch process without any error. But when I try to restart freeswitch using supervisorctl, it doesn't work and gives the following errors.
freeswitch: ERROR (not running)
freeswitch: ERROR (abnormal termination)
I am not able to see any error reported in the log (/var/log/supervisor/freeswitch.log). However, I am seeing the following:
1773 Backgrounding.
1777 Backgrounding.
1782 Backgrounding.
It seems it's starting three freeswitch processes. Isn't this wrong?
Can someone point out what the problem is here, and provide the correct configuration if required?
supervisor requires that the programs it runs do not fork into the background; after all, it was created precisely to run as background processes those programs for which writing correct daemonising code would be infeasible or difficult. So, for each program you run under supervisor, make sure that it does not fork into the background.
In the case of freeswitch, just remove the -nc option so that it runs in the foreground; supervisor will redirect the standard streams appropriately and restart the process if it crashes.
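Applying that advice, the stanza from the question would keep everything else and only drop -nc, e.g.:

```ini
[program:freeswitch]
command=/usr/local/freeswitch/bin/freeswitch -u root -g root
numprocs=1
stdout_logfile=/var/log/supervisor/freeswitch.log
stderr_logfile=/var/log/supervisor/freeswitch.log
autostart=true
autorestart=true
startsecs=10
stopwaitsecs=600
killasgroup=true
```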
Keep in mind that processes inherit the ulimit values of their parent process, so in your case freeswitch will run with the same ulimits as its parent process, supervisord ... which I don't think is what you would want for a resource-intensive application such as freeswitch. This said, using supervisord with freeswitch is actually a very bad idea.
If you must stick to it, then you will have to find out from the documentation how to raise all of the ulimit values for supervisord.
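supervisord does expose settings for the limits it applies to itself at startup (which its children then inherit); a sketch with illustrative values:

```ini
[supervisord]
minfds=65536   ; minimum file-descriptor limit supervisord will try to set
minprocs=8192  ; minimum process limit
```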
I have a simple Python script working as a daemon. I am trying to create a systemd service file so that the script can be started during boot.
Current systemd script:
[Unit]
Description=Text
After=syslog.target
[Service]
Type=forking
User=node
Group=node
WorkingDirectory=/home/node/Node/
PIDFile=/var/run/zebra.pid
ExecStart=/home/node/Node/node.py
[Install]
WantedBy=multi-user.target
node.py:
if __name__ == '__main__':
    with daemon.DaemonContext():
        check = Node()
        check.run()
run contains while True loop.
I try to run this service with systemctl start zebra-node.service. Unfortunately the service never finishes its starting sequence - I have to press Ctrl+C.
The script is running, but the status is activating, and after a while it changes to deactivating.
Now I am using python-daemon (but before I tried without it and the symptoms were similar).
Should I implement some additional features in my script, or is the systemd file incorrect?
The reason it does not complete the startup sequence is that for Type=forking, your startup process is expected to fork and exit (see $ man systemd.service - search for "forking").
Simply use only the main process, do not daemonize
One option is to do less. With systemd, there is often no need to create daemons and you may directly run the code without daemonizing.
#!/usr/bin/python -u
from somewhere import Node
check = Node()
check.run()
This allows using a simpler Type of service called simple, so your unit file would look like:
[Unit]
Description=Simplified simple zebra service
After=syslog.target
[Service]
Type=simple
User=node
Group=node
WorkingDirectory=/home/node/Node/
ExecStart=/home/node/Node/node.py
StandardOutput=syslog
StandardError=syslog
[Install]
WantedBy=multi-user.target
Note that the -u in the python shebang is not necessary, but in case you print something to stdout or stderr, the -u makes sure there is no output buffering, so printed lines are immediately caught by systemd and recorded in the journal. Without it, they would appear with some delay.
For this purpose I added the lines StandardOutput=syslog and StandardError=syslog to the unit file. If you do not care about seeing printed output in your journal, you can omit these lines.
systemd makes daemonization obsolete
While the title of your question explicitly asks about daemonizing, I guess the core of the question is "how to make my service run", and since using the main process directly is much simpler (you do not have to care about daemons at all), it could be considered an answer to your question.
I think many people use daemonizing just because "everybody does it". With systemd, the reasons for daemonizing are often obsolete. There might still be some reasons to daemonize, but those will be rare cases now.
EDIT: fixed python -p to proper python -u. thanks kmftzg
It is possible to daemonize like Schnouki and Amit describe. But with systemd this is not necessary. There are two nicer ways to initialize the daemon: socket-activation and explicit notification with sd_notify().
Socket activation works for daemons which want to listen on a network port, a UNIX socket, or similar. systemd opens the socket, listens on it, and then spawns the daemon when a connection comes in. This is the preferred approach because it gives the most flexibility to the administrator. [1] and [2] give a nice introduction; [3] describes the C API, while [4] describes the Python API.
[1] http://0pointer.de/blog/projects/socket-activation.html
[2] http://0pointer.de/blog/projects/socket-activation2.html
[3] http://www.freedesktop.org/software/systemd/man/sd_listen_fds.html
[4] http://www.freedesktop.org/software/systemd/python-systemd/daemon.html#systemd.daemon.listen_fds
Explicit notification means that the daemon opens the sockets itself and/or does any other initialization, and then notifies init that it is ready and can serve requests. This can be implemented with the "forking protocol", but actually it is nicer to just send a notification to systemd with sd_notify().
The Python wrapper is called systemd.daemon.notify, and using it is one line [5].
[5] http://www.freedesktop.org/software/systemd/python-systemd/daemon.html#systemd.daemon.notify
In this case the unit file would have Type=notify, and the service would call systemd.daemon.notify("READY=1") after it has established its sockets. No forking or daemonization is necessary.
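If you'd rather avoid the python-systemd dependency, the notify protocol itself is simple enough to sketch by hand: write the state string to the datagram socket named by $NOTIFY_SOCKET. This is an illustrative re-implementation under that assumption, not the library's code:

```python
import os
import socket

def sd_notify(state="READY=1"):
    """Send a notify-protocol message to systemd; a no-op when not supervised."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False  # not started by systemd with Type=notify
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # abstract socket namespace
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as s:
        s.sendto(state.encode("utf-8"), addr)
    return True
```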
You're not creating the PID file.
systemd expects your program to write its PID in /var/run/zebra.pid. As you don't do it, systemd probably thinks that your program is failing, hence deactivating it.
To add the PID file, install lockfile and change your code to this:
import daemon
import daemon.pidlockfile

pidfile = daemon.pidlockfile.PIDLockFile("/var/run/zebra.pid")
with daemon.DaemonContext(pidfile=pidfile):
    check = Node()
    check.run()
(Quick note: some recent update of lockfile changed its API and made it incompatible with python-daemon. To fix it, edit daemon/pidlockfile.py, remove LinkFileLock from the imports, and add from lockfile.linklockfile import LinkLockFile as LinkFileLock.)
Be careful of one other thing: DaemonContext changes the working dir of your program to /, making the WorkingDirectory of your service file useless. If you want DaemonContext to chdir into another directory, use DaemonContext(pidfile=pidfile, working_directory="/path/to/dir").
I came across this question when trying to convert some python init.d services to systemd under CentOS 7. This seems to work great for me, by placing this file in /etc/systemd/system/:
[Unit]
Description=manages worker instances as a service
After=multi-user.target
[Service]
Type=idle
User=node
ExecStart=/usr/bin/python /path/to/your/module.py
Restart=always
TimeoutStartSec=10
RestartSec=10
[Install]
WantedBy=multi-user.target
I then dropped my old init.d service file from /etc/init.d and ran sudo systemctl daemon-reload to reload systemd.
I wanted my service to auto-restart, hence the restart options. I also found that using idle for Type made more sense than simple.
Behavior of idle is very similar to simple; however, actual execution
of the service binary is delayed until all active jobs are dispatched.
This may be used to avoid interleaving of output of shell services
with the status output on the console.
More details on the options I used here.
I also experimented with keeping the old init.d service and having systemd restart the service, but I ran into some issues.
[Unit]
# Added this to the above
#SourcePath=/etc/init.d/old-service
[Service]
# Replace the ExecStart from above with these
#ExecStart=/etc/init.d/old-service start
#ExecStop=/etc/init.d/old-service stop
The issue I experienced was that the init.d service script was used instead of the systemd service when both were named the same. If you killed the process started by init.d, the systemd unit would then take over. But if you ran service <service-name> stop, it would refer to the old init.d service. So I found the best way was to drop the old init.d service, so that the service command referred to the systemd service instead.
Hope this helps!
Also, you most likely need to set detach_process=True when creating the DaemonContext().
This is because, if python-daemon detects that it is running under an init system, it doesn't detach from the parent process. systemd expects that a daemon running with Type=forking will do so. Hence you need that option, else systemd will keep waiting and finally kill the process.
If you are curious, in python-daemon's daemon module, you will see this code:
def is_detach_process_context_required():
    """ Determine whether detaching process context is required.

        Return ``True`` if the process environment indicates the
        process is already detached:

        * Process was started by `init`; or
        * Process was started by `inetd`.

        """
    result = True
    if is_process_started_by_init() or is_process_started_by_superserver():
        result = False
    return result
Hopefully this explains better.
I keep getting Error R14 (Memory quota exceeded) on Heroku.
Profiling the memory on the django app locally I don't see any issues. We've installed New Relic, and things seem to be fine there, except for one oddity:
http://screencast.com/t/Uv1W3bjd
Memory use hovers around 15mb per dyno, but for some reason the 'dynos running' figure quickly scales up to 10+. Not sure how that makes any sense, since we are currently only running one web dyno.
We are also running celery, and things look normal there as well (around 15mb). Although it is suspect, because I believe we started having the error around the time this launched.
Some of our requests do take a while, as they make a SOAP request to EchoSign, which can sometimes take 6-10 seconds to respond. Is this somehow blocking and causing new dynos to spin up?
Here is my proc file:
web: python manage.py collectstatic --noinput; python manage.py compress; newrelic-admin run-program python manage.py run_gunicorn -b "0.0.0.0:$PORT" -w 9 -k gevent --max-requests 250
celeryd: newrelic-admin run-program python manage.py celeryd -E -B --loglevel=INFO
The main problem is the memory error though.
I BELIEVE I may have found the issue.
Based on posts like these I thought that I should have somewhere in the area of 9-10 gunicorn workers. I believe this is incorrect (or at least, it is for the work my app is doing).
I had been running 9 gunicorn workers, and finally realized that was the only real difference between heroku and local (as far as configuration).
According to the gunicorn design document the advice for workers goes something like this:
DO NOT scale the number of workers to the number of clients you expect
to have. Gunicorn should only need 4-12 worker processes to handle
hundreds or thousands of requests per second.
Gunicorn relies on the operating system to provide all of the load
balancing when handling requests. Generally we recommend (2 x
$num_cores) + 1 as the number of workers to start off with. While not
overly scientific, the formula is based on the assumption that for a
given core, one worker will be reading or writing from the socket
while the other worker is processing a request.
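As a sanity check, the quoted rule of thumb is easy to compute (cpu_count is used as a fallback when no core count is given):

```python
import multiprocessing

def suggested_workers(cores=None):
    """Gunicorn's (2 x $num_cores) + 1 rule of thumb."""
    if cores is None:
        cores = multiprocessing.cpu_count()
    return 2 * cores + 1

# On a full 4-core box this suggests 9 workers; at roughly a quarter
# of a core per dyno, the sensible number is far smaller.
```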
And while information about Heroku dyno CPU capability is sparse, I've now read that each dyno runs on roughly 1/4 of a core. Not super powerful, but powerful enough I guess.
Dialing my workers down to 3 (which is still high according to their rough formula) appears to have stopped my memory issues, at least for now. When I think about it, the interesting thing about the memory warning is that usage would never keep going up. It got to around 103% and then just stayed there, whereas if it were actually a leak, it should have kept rising until the process was shut down. So my theory is that my workers were eventually consuming just enough memory to go above 512mb.
HEROKU SHOULD ADD THIS INFORMATION SOMEWHERE!! And at the very least I should be able to run top on my running dyno to see what's going on. It would have saved me hours and days.