uWSGI: Spawning a long-lived process - python

I would like to run some code in a uWSGI app, but in a long-lived process, not inside the workers. That's because the process blocks on a socket recv() call, and only one thread of execution should do this.
I am hoping to avoid creating my own daemon by somehow starting a long-lived process at uWSGI startup that does not get spawned in each worker.
Does uWSGI support anything like this?

uWSGI Mules are like workers but without network access:
http://uwsgi-docs.readthedocs.org/en/latest/Mules.html
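For illustration, here is a minimal sketch of a long-lived mule loop, assuming uWSGI is started with something like mule = mule_loop.py in the config; the socket details are placeholders, not taken from the question:

# mule_loop.py -- runs once, in the mule process only, never in the HTTP workers.
import socket

def main():
    # Illustrative placeholder for the single blocking recv() loop from the question.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 9999))
    while True:
        data, addr = sock.recvfrom(4096)
        # handle the received message here

if __name__ == "__main__":
    main()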

Related

Defunct processes in Docker container

I have a Docker container in which I am running a Python Flask API with GUnicorn and a server process, also written in Python. This server process spawns long-running child processes and waits for them (one observation thread per child process). The root process is tini, which runs supervisord, which in turn spawns GUnicorn and the server process. The processes spawned by the server process use multiple threads.
Sometimes I observe defunct processes appearing in the process list. To counter this problem I originally introduced tini as the root process, but apparently that is not enough.
As far as I understand, defunct processes are processes that have terminated but have not yet been reaped by their parent process. However, my server process explicitly waits for (joins) its child processes via its observation threads, and I would assume GUnicorn and supervisord do the same.
How can I determine where these defunct processes are coming from and how can I debug/handle this problem?
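For what it's worth, defunct entries only disappear once their direct parent calls wait() on them (the PPID column of ps shows which process that is). Below is a minimal sketch of explicit reaping in that parent; the handler name is illustrative, not from the original post:

import os
import signal

def _reap_children(signum, frame):
    # Collect every exited child so no defunct (zombie) entries remain.
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break
        if pid == 0:
            break

signal.signal(signal.SIGCHLD, _reap_children)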

Detect gunicorn worker restart

We have a Django application served through gunicorn sync workers. A Tornado server is run in a thread from the Django application itself (let's not argue about the architecture, it's legacy code) and is tightly coupled to the worker.
It binds itself to a free port, which is stored in a DB. Every time a worker restarts I need to start Tornado again, but since the port is still marked as used, it won't start.
I need to somehow detect that the worker went down and mark the port as available again. However, I have not been able to detect that.
import signal

signal.signal(signal.SIGTERM, stop_handler)
signal.signal(signal.SIGINT, stop_handler)
The above signals are only received when I kill the worker manually, not when gunicorn restarts the worker itself.
How can I detect the same?
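One possible direction, sketched under the assumption that gunicorn's server hooks fit this setup: the worker_exit and child_exit hooks in the gunicorn config file fire even when the master restarts a worker on its own. The release_port helper below is hypothetical, standing in for whatever marks the port available in the DB:

# gunicorn_conf.py

def release_port(pid):
    # Hypothetical helper: mark the port recorded for this worker PID as free in the DB.
    ...

def worker_exit(server, worker):
    # Called in the worker process just after it exits.
    release_port(worker.pid)

def child_exit(server, worker):
    # Called in the master process after a worker has exited (also covers crashes).
    release_port(worker.pid)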

uWSGI mules VS native Python threads

I have read the documentation on uWSGI mules and some information from other sources, but I'm confused about the differences between mules and Python threads. Can anyone explain what mules can do that threads cannot, and why mules even exist?
A uWSGI mule can be thought of as a separate worker process that is not accessible via sockets (e.g. direct web requests). It executes an instance of your application and can be used for offloading tasks, for example using the mulefunc Python decorator. Also, as mentioned in the documentation, a mule can be configured to execute custom logic.
A thread, on the other hand, runs in its parent's (the uWSGI worker's) address space, so if the worker dies or is reloaded, the thread goes with it. The worker can handle requests and can also execute specified tasks (functions) via the thread decorator.
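As a rough sketch of that difference (the function names are illustrative, and at least one mule must be configured for mulefunc to have somewhere to run):

from uwsgidecorators import mulefunc, thread

@mulefunc
def rebuild_index(doc_id):
    # Runs inside a mule process, outside the request/response cycle.
    ...

@thread
def warm_cache(key):
    # Runs in a thread inside the worker that called it; dies with that worker.
    ...

# Called from request code: both return immediately, but the work lands
# in a mule process vs. a worker thread respectively.
rebuild_index(42)
warm_cache("front-page")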
Python threads do not spread across multiple CPUs; roughly speaking, they cannot use all the CPU power. This is a limitation of the Python GIL (see "What is the global interpreter lock (GIL) in CPython?").
This is one of the reasons for using web servers: their job is to spawn a worker process, or reuse an idle one, for each task received (HTTP request).
A mule works on the same principle, but with one particularity: it is intended to run tasks outside of an HTTP request context. The idea is that you can reserve some mules, each running in a separate process (spread across multiple CPUs) just as regular workers do, but they serve no HTTP requests, only the tasks set up as described in the uWSGI documentation.
It is also worth mentioning that mules are monitored by the web server's master process, so they are respawned when they are killed or die.

Python-rq with flask + uwsgi + Nginx : Do I need more uwsgi processes or redis workers?

I have a server with the above configuration and I am processing long tasks, but I have to update the user about the state of the process, which I do through Firebase. To respond to the client immediately, I enqueue the job in Redis using python-rq.
I am using Flask, uWSGI and Nginx. In the uWSGI conf file there is a field for the number of processes.
My question is: do I need to start multiple uWSGI processes, or more RQ workers?
Does starting more uWSGI workers create more RQ workers?
How would scaling work? My server has 1 vCPU and 2 GB RAM, and I have AWS autoscaling for production. Should I run more uWSGI workers, and how many RQ workers, with only one queue?
I am starting the worker independently. The Flask app imports the connection and enqueues the job.
my startup script
my worker code
It depends on how you're running the RQ workers. There are two cases:
1) Running RQ workers from inside the app. Then increasing the number of workers in the uWSGI settings will automatically spawn num_rq_workers_in_app_conf * num_app_workers_in_uwsgi_conf RQ workers.
2) Running RQ workers outside the application, for example under supervisord, where you can control the number of RQ workers manually and independently of the app.
In my opinion, running RQ workers under supervisord is a better option than point 1. It makes debugging each worker easier, and another issue I've encountered with point 1 is that RQ workers started that way unregister themselves from RQ after a few weeks, i.e. they appear dead to RQ even though they are still running in the background.
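A minimal sketch of option 2, assuming a single default queue (the names are illustrative): the Flask app only enqueues, while a separate worker process, e.g. managed by supervisord, consumes the queue:

from redis import Redis
from rq import Queue, Worker

redis_conn = Redis()
queue = Queue("default", connection=redis_conn)

def process_task(task_id):
    # Long-running work; progress updates (e.g. to Firebase) would go here.
    ...

# In the Flask view: enqueue and return to the client immediately.
# queue.enqueue(process_task, task_id)

# Run as its own process (e.g. under supervisord), independent of the uWSGI workers:
if __name__ == "__main__":
    Worker([queue], connection=redis_conn).work()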

celery launches more processes than configured

I'm running a celery machine, using redis as the broker with the following configuration:
celery -A project.tasks:app worker -l info --concurrency=8
When I check the number of running Celery processes, I see more than 8.
Is there something I am missing? Is there a limit on max concurrency?
This problem causes huge memory allocation, and is killing the machine.
With the default settings, Celery will always start one more process than the number you ask for. This additional process is a kind of bookkeeping process that coordinates the other processes belonging to the worker. It communicates with the rest of Celery and dispatches the tasks to the processes that actually run them.
Switching to a pool implementation other than the default "prefork" might reduce the number of processes created, but that opens a new can of worms.
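For example, the thread-based pool (available in Celery 4.4+) keeps everything in one process; this command is only an adaptation of the one in the question, not verified against your setup:
celery -A project.tasks:app worker -l info --pool=threads --concurrency=8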
For the concurrency problem, I have no suggestion.
For the memory problem, you can look at the Redis configuration in ~/.redis/redis.conf. There is a maxmemory attribute which sets a limit on the memory used for tasks…
See the Redis configuration
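If it is easier than editing redis.conf, the same limit can be inspected and set at runtime from Python via redis-py; the values below are illustrative:

import redis

r = redis.Redis()
print(r.config_get("maxmemory"))                 # current limit; 0 means unlimited
r.config_set("maxmemory", "512mb")               # cap Redis memory usage
r.config_set("maxmemory-policy", "noeviction")   # error out instead of evicting task data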
