I have read the documentation on uWSGI mules and some info from other sources, but I'm confused about the differences between mules and Python threads. Can anyone explain what mules can do that threads cannot, and why mules even exist?
A uWSGI mule can be thought of as a separate worker process that is not accessible via sockets (e.g. direct web requests). It executes an instance of your application and can be used for offloading tasks, for example via the mulefunc Python decorator. Also, as mentioned in the documentation, a mule can be configured to execute custom logic.
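For example, offloading a function to a mule can look roughly like this (a minimal sketch; the function name and module layout are made up, and uWSGI must be started with at least one mule):

    # tasks.py -- hypothetical module imported by the uWSGI app
    from uwsgidecorators import mulefunc

    @mulefunc
    def send_report(email):
        # runs inside a mule process, not in the worker
        # that handled the original request
        print("sending report to", email)

    # calling send_report("user@example.com") from a request handler
    # just enqueues the call and returns immediately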
A thread, on the other hand, runs in its parent's (the uWSGI worker's) address space, so if the worker dies or is reloaded, the thread goes with it. It can handle requests and can also execute specified tasks (functions) via the thread decorator.
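By contrast, the thread decorator simply runs the function in a thread inside the current worker process (again a minimal sketch with made-up names; uWSGI needs enable-threads for application threads to run):

    from uwsgidecorators import thread

    @thread
    def warm_cache(key):
        # executes in the worker's address space; dies with the worker
        print("warming cache for", key)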
Python threads do not span multiple CPUs and, roughly speaking, cannot use all of the CPU power; this is a limitation of the Python GIL (see What is the global interpreter lock (GIL) in CPython?).
This is one of the reasons for using web servers: their duty is to spawn a worker process, or reuse an idle one, for each task received (HTTP request).
A mule works on the same principle, but is particular in the sense that it is intended to run tasks outside of an HTTP request context. The idea is that you can reserve some mules, each running in a separate process (spanning multiple CPUs) just as regular workers do, but they do not serve any HTTP requests, only the tasks you set up as described in the uWSGI documentation.
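Reserving mules in the uWSGI configuration can look roughly like this (a sketch; the module name and counts are assumptions):

    [uwsgi]
    # hypothetical WSGI entry point
    module = myapp.wsgi:application
    # regular HTTP workers
    processes = 4
    # two extra processes that never serve HTTP requests
    mules = 2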
It is worth mentioning that mules are also monitored by the web server's master process, so they are respawned when they are killed or die.
Related
I have two servers, with one Celery worker on each server. I use Redis as the broker that the workers collaborate through.
My question is: how can I make only one worker run most of the time, and once this worker breaks, have the other worker turn on as a backup?
Basically, just keep one worker as a backup.
I know how to direct a task to a certain worker via a dedicated queue on that worker, after reading the docs: http://docs.celeryproject.org/en/latest/userguide/routing.html#redis-message-priorities
This is, in my humble opinion, completely against the point of having a distributed system: to off-load CPU-heavy or long-running tasks, or to run thousands of small tasks that you can't run elsewhere...
- You are running two servers anyway, so why keep the other one idle? More workers mean you will be able to process more tasks concurrently.
If you are not convinced and still want to do this, you need to write a tiny service on the machine with the idle Celery worker. This service periodically checks the health of the active worker, and if that check fails, it starts the Celery worker on the backup server.
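A sketch of such a watchdog, assuming both machines share the same Celery app and broker (the app module, timings, and the restart command are all made up):

    # healthcheck.py -- hypothetical watchdog run on the backup machine
    import subprocess
    import time

    from project.tasks import app  # hypothetical Celery app

    def active_worker_alive():
        # ping() broadcasts to all workers and collects their replies
        replies = app.control.ping(timeout=5)
        return len(replies) > 0

    while True:
        if not active_worker_alive():
            # start the local (backup) worker; the command is an assumption
            subprocess.Popen(
                ["celery", "-A", "project.tasks:app", "worker", "-l", "info"]
            )
            break
        time.sleep(30)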
Here is a question for you: why wouldn't this service simply restart the Celery worker on the active server? It is perfectly possible to do that, so again, I see no justification for having a completely idle machine doing nothing. If you are on a cloud platform, you can easily spin up a new instance from an existing image of your Celery worker. This is the scenario I use in production.
I'm running Celery on a machine, using Redis as the broker, with the following configuration:
celery -A project.tasks:app worker -l info --concurrency=8
When checking the number of running Celery processes, I see more than 8.
Is there something that I am missing? Is there a limit for max concurrency?
This problem causes huge memory allocation, and is killing the machine.
With the default settings, Celery will always start one more process than the number you ask for. This additional process is a kind of bookkeeping process: it coordinates the other processes that are part of the worker, communicates with the rest of Celery, and dispatches the tasks to the processes that actually run them.
Switching to a different pool implementation than the default "prefork" might reduce the number of processes created, but that opens a new can of worms.
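For example, the gevent pool keeps everything in a single process and uses greenlets for concurrency (a sketch; it requires the gevent package to be installed and only helps for I/O-bound tasks):

    celery -A project.tasks:app worker -l info --pool=gevent --concurrency=8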
For the concurrency problem, I have no suggestion.
For the memory problem, you can look at the Redis configuration in ~/.redis/redis.conf. There is a maxmemory attribute which sets a limit on memory usage, and thus on queued tasks…
See the Redis configuration
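For example (a sketch; the exact values are assumptions and depend on your machine):

    # redis.conf -- cap Redis memory usage
    maxmemory 256mb
    # what to do when the limit is reached; noeviction refuses new writes
    maxmemory-policy noeviction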
In the uWSGI documentation there is a sentence that says: "If you start uWSGI without threads, the Python GIL will not be enabled, so threads generated by your application will never run."
I wonder how uWSGI disables the Python GIL?
It replaces the functions for acquiring and releasing the GIL (the ones that handle thread switching) with dummy functions that do nothing. See the related source code:
Initializing thread switching to dummy by default:
https://github.com/unbit/uwsgi/blob/edb93f6c174a61858be88c9c2eb2c34bf87ae07d/plugins/python/python_plugin.c#L309-L311
Dummy GIL functions:
https://github.com/unbit/uwsgi/blob/abac960e62700117cb96af3cd22e27e04242e096/plugins/python/gil.c
I would like to run some code in a uWSGI app, but in a single long-lived process, not inside the workers. That's because the process blocks on a socket recv() call, and only one thread of execution should do this.
I am hoping to avoid creating my own daemon by somehow starting a long-lived process on uWSGI startup that does not get spawned in each worker.
Does uWSGI support anything like this?
uWSGI Mules are like workers but without network access:
http://uwsgi-docs.readthedocs.org/en/latest/Mules.html
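A sketch of attaching a long-lived script to a dedicated mule (the file name, port, and protocol are made up):

    # listener.py -- hypothetical script run by a dedicated mule,
    # attached in the uWSGI config with:  mule = listener.py
    import socket

    # only this single mule process blocks on recv(); workers are untouched
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 9999))

    while True:
        data, addr = sock.recvfrom(4096)
        print("got", len(data), "bytes from", addr)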
When new versions of my Django application are deployed to Heroku, the workers are forced to restart. I have some long-running tasks which should perform some cleanup before being killed.
I have tried registering a worker_shutdown hook, which doesn't ever seem to get called.
I have also tried the answer in "Notify celery task of worker shutdown", but I am unclear how to abort a given task from within this context, as calling celery.task.control.active() throws an exception (celery is no longer running).
Thanks for any help.
If you control the deployment, maybe you can run a script that does a Control.broadcast to a custom command that you register beforehand, and only after receiving the required replies (you'd have to implement that logic) continue with the deployment (or raise a TimeoutException)? See the sketch below.
Also, Celery already has a predefined command for shutdown, which I'm guessing you could overload in your instance or subclass of the worker. Commands have the advantage of being passed a Panel instance, which gives you access to the consumer. That should expose a lot of control right there...
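A rough sketch of what registering and broadcasting such a command might look like (the command name and the cleanup logic are made up; Panel.register is the old-style registry the answer refers to):

    # hypothetical custom remote-control command, imported by the worker
    from celery.worker.control import Panel

    @Panel.register
    def prepare_for_shutdown(state, **kwargs):
        # run your cleanup / stop accepting new work here
        return {"ok": "drained"}

    # from the deployment script, before restarting the workers:
    # from project.tasks import app
    # replies = app.control.broadcast("prepare_for_shutdown",
    #                                 reply=True, timeout=10)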