I have a Docker container in which I am running a Python Flask API with GUnicorn and a server process, also written in Python. This server process spawns long-running child processes and waits for them (one observation thread per child process). The root process is tini. It runs supervisord, which in turn spawns GUnicorn and the server process. The processes spawned by the server process use multiple threads.
Sometimes, I observe defunct processes appearing in the process list. To counter this problem, I originally introduced tini as root process, but apparently, this is not enough.
As far as I understand, defunct processes are processes that have terminated but have not yet been reaped by their parent process. However, my server process explicitly joins its child processes (using threads), and I would assume GUnicorn and supervisord do the same.
How can I determine where these defunct processes are coming from and how can I debug/handle this problem?
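One way to narrow this down is to look at the parent PID of each defunct process: a zombie that stays around belongs to a live parent that has not yet waited for it. The following is a minimal sketch, assuming a Linux /proc filesystem inside the container; the function names parse_stat and find_zombies are made up for illustration.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state, ppid) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    left, _, rest = stat_line.partition("(")
    comm, _, tail = rest.rpartition(")")
    fields = tail.split()
    return int(left), comm, fields[0], int(fields[1])

def find_zombies(proc_root="/proc"):
    """Yield (pid, comm, ppid, parent_comm) for every process in state Z."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state, ppid = parse_stat(f.read())
            if state != "Z":
                continue
            with open(os.path.join(proc_root, str(ppid), "stat")) as f:
                parent_comm = parse_stat(f.read())[1]
        except (OSError, ValueError, IndexError):
            continue  # the process vanished while we were reading it
        yield pid, comm, ppid, parent_comm

if __name__ == "__main__":
    for pid, comm, ppid, parent_comm in find_zombies():
        print(f"zombie {pid} ({comm}) parent {ppid} ({parent_comm})")
```

Running this periodically inside the container should show whether the zombies hang off the server process, GUnicorn, supervisord, or one of the spawned children.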
Related
I have successfully run the example gRPC server from GitHub with a thread executor with max workers=10. I could also run the same server multiple times with the so_reuseport option. Each server process spawns 10 threads to process incoming requests.
The problem is running the server without a thread executor, so that each server processes requests in its main thread. Running the server with a thread executor with max workers=1 does not meet this requirement.
I didn't find any documentation on that. Could someone point me to an example?
We have a Django application served through gunicorn sync workers. A tornado server runs in a thread started from the Django application itself (let's not argue about the architecture, it's legacy code) and is tightly coupled with the worker.
It binds itself to a free port stored in a db. Every time a worker restarts, I need to start tornado again, but since the port has already been used, it won't start.
I would need to somehow detect that the worker went down and mark the port as available again. However, I am not able to detect that.
signal.signal(signal.SIGTERM, stop_handler)
signal.signal(signal.SIGINT, stop_handler)
The handlers above are only triggered when I manually kill the worker, not when gunicorn restarts the worker itself.
How can I detect that case as well?
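One possible direction, instead of signal handlers, is gunicorn's server hooks, which are plain functions defined in the gunicorn config file. The sketch below assumes a gunicorn.conf.py; worker_exit and child_exit are real hook names, while release_port is a hypothetical helper standing in for whatever marks the port as free in the db.

```python
# gunicorn.conf.py (sketch)
import logging

log = logging.getLogger(__name__)

def release_port(worker_pid):
    """Hypothetical helper: mark the port held by worker_pid as free in the db."""
    log.info("releasing port held by worker %s", worker_pid)
    return worker_pid

def worker_exit(server, worker):
    # Called in the worker process just after the worker has exited.
    release_port(worker.pid)

def child_exit(server, worker):
    # Called in the master process just after a worker has exited, so it
    # also covers workers that were killed and could not clean up themselves.
    release_port(worker.pid)
```

Since both hooks may fire for the same worker, the release operation should be safe to run twice.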
I have read the documentation of uWSGI mules, and some info from other sources. But I'm confused about the differences between mules and python threads. Can anyone explain to me, what can mules do that threads cannot and why do mules even exist?
A uWSGI mule can be thought of as a separate worker process that is not accessible via sockets (e.g. direct web requests). It executes an instance of your application and can be used for offloading tasks, for example with the mulefunc Python decorator. Also, as mentioned in the documentation, a mule can be configured to execute custom logic.
On the other hand, a thread runs in its parent's (the uWSGI worker's) address space. So if the worker dies or is reloaded, so does the thread. It can handle requests and can also execute specified tasks (functions) via the thread decorator.
Python threads do not spread across multiple CPUs; roughly speaking, they can't use all the CPU power. This is a limitation of the Python GIL (see: What is the global interpreter lock (GIL) in CPython?).
This is one of the reasons for using web servers: their duty is to spawn a worker process, or use an idle one, for each task received (HTTP request).
A mule works on the same principle, but is particular in the sense that it is intended to run tasks outside of an HTTP request context. The idea is that you can reserve some mules, each running in a separate process (spread across multiple CPUs) as regular workers do; they don't serve any HTTP requests, only tasks set up as described in the uWSGI documentation.
It is worth mentioning that mules are also monitored by the master process of the web server, so they are respawned when they are killed or die.
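As a rough illustration of offloading work to a mule, here is a minimal sketch using the mulefunc decorator from uwsgidecorators. It assumes uWSGI was started with at least one mule; the task build_report and the fallback decorator for running outside uWSGI are made up for this example.

```python
try:
    # Only importable when the code runs inside uWSGI.
    from uwsgidecorators import mulefunc
except ImportError:
    def mulefunc(func):
        """Fallback for running outside uWSGI: call the function inline."""
        return func

@mulefunc
def build_report(user_id):
    # Hypothetical CPU-heavy task that should not block a web worker.
    total = sum(i * i for i in range(100_000))
    return f"report for {user_id}: {total}"
```

Under uWSGI, calling build_report(42) from a request handler hands the call over to a mule and returns immediately, so the return value is not available to the caller. With the fallback, the function simply runs in the calling process.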
I have a server with the above configuration. I am processing long tasks, but I have to update the user about the progress, which I am doing through Firebase. To respond to the client immediately, I enqueue the job in Redis using python-rq.
I am using Flask, uWSGI and Nginx. In the uWSGI conf file, there is a field that asks for the number of processes.
My question is: do I need to start multiple uWSGI processes, or more Redis workers?
Will starting more uWSGI workers create more Redis workers?
How would the scaling work? My server has 1 vCPU and 2 GB RAM. I have AWS autoscaling for production. Should I run more uWSGI workers, and how many Redis workers with only one queue?
I am starting the worker independently. The Flask app imports the connection and adds the job.
my startup script
my worker code
It depends on how you're running the rq workers. There are two cases:
1) Running rq workers from inside the app. Increasing the number of workers in the uwsgi settings will then automatically spawn num_rq_workers_in_app_conf * num_app_workers_in_uwsgi_conf rq workers.
2) Running rq workers outside the application, for example under supervisord, where you can control the number of rq workers independently of the app.
In my opinion, running rq workers under supervisord is the better option. It makes debugging each worker easier. One more issue I've encountered with rq is that workers started via option 1 unregister themselves from rq every few weeks, i.e. they become dead as far as rq is concerned although they are still running in the background.
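To make the multiplication in case 1 concrete, here is a tiny worked example; the numbers are made up.

```python
def total_rq_workers(rq_workers_per_app_process, uwsgi_processes):
    """Case 1: every uWSGI process starts its own set of rq workers."""
    return rq_workers_per_app_process * uwsgi_processes

# 2 rq workers configured in the app, 4 uWSGI processes
print(total_rq_workers(2, 4))  # 8
```

In case 2 the count is simply whatever supervisord is configured to start, regardless of the uWSGI process count.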
I would like to run some code in a uWSGI app, but on a long-lived process, not inside workers. That's because the process blocks on a socket recv() call, and only one thread of execution should do this.
I am hoping to avoid creating my own daemon by somehow starting a long-lived process on uWSGI startup that does not get spawned in each worker.
Does uWSGI support anything like this?
uWSGI Mules are like workers but without network access:
http://uwsgi-docs.readthedocs.org/en/latest/Mules.html
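A mule can be pointed at a script, e.g. with --mule=listener.py, so that a single process runs the blocking loop. Below is a minimal sketch of such a script. The file name, host and port are placeholders, and it assumes uWSGI executes the mule script as __main__, which is worth verifying for your version.

```python
import socket

def receive_forever(sock, handle, bufsize=4096):
    """Block on recv() and pass each chunk to handle until the peer closes."""
    while True:
        data = sock.recv(bufsize)
        if not data:
            break  # an empty read means the peer closed the connection
        handle(data)

if __name__ == "__main__":
    # Placeholder endpoint; replace with the real host and port.
    conn = socket.create_connection(("127.0.0.1", 9000))
    receive_forever(conn, print)
```

Because the mule is a single process managed by the uWSGI master, only one thread of execution blocks on recv(), independent of the number of workers.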