Celery pool processes, tasks, and system processes & memory space - python

Is each task executed in a unique process space?
Do Celery pool (not master) processes spawn off a new process for each task execution?
In other words, is each task executed in a new process spawned by a worker pool process?
Or is it the other way around: is the task executed as part of the worker pool process itself?
One implication of that: if a Celery task relies on data stored in process memory, that data belongs to the worker pool process executing it, and all tasks executed by that worker pool process have access to the same copy of the data.

These details depend on the concurrency model you pick for your workers.
In the default prefork model (based on processes), every task is executed inside one of the pre-forked processes (worker processes). So yes - it is a process pool. You can configure Celery to create a new worker process for each task, but that is not the default behaviour. By default Celery does not replace old worker processes with new ones, but you can control that with the worker_max_tasks_per_child setting.
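As a minimal sketch of both points (the broker URL and the remember task are made up for illustration; worker_max_tasks_per_child is a real Celery 4+ setting): module-level data is per pool child, and children can be recycled after a number of tasks.

# celery_app.py - a sketch, assuming Celery 4+ lowercase setting names
from celery import Celery

app = Celery("demo", broker="amqp://localhost")

# Recycle each pool child after it has executed 100 tasks
# (worker_max_tasks_per_child = 1 would give a fresh process per task).
app.conf.worker_max_tasks_per_child = 100

# Module-level data: each prefork child holds its own copy of this dict,
# and every task executed by that child sees the same copy.
_cache = {}

@app.task
def remember(key, value):
    _cache[key] = value
    return len(_cache)   # grows only within the child process that ran the task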

Related

Gunicorn: multiple background worker threads

Setup: My application uses multiple workers to process elements in parallel. Processing those elements is CPU-intensive, so I need worker processes. The application is used via a Flask API served by Gunicorn. Gunicorn itself has multiple worker processes to handle requests in parallel. In the Flask API, the request data is put onto a queue, and the worker processes of my background application take this data from that queue.
Problem: Forking worker processes is quite time-intensive and the application has to meet a certain speed requirement. Therefore, I would like to spawn the background worker processes when the app starts. To avoid mixing results, I need n background worker processes for every Gunicorn worker.
Question: How can I determine at construction time how many background workers I have to spawn, and how can I link those workers to Gunicorn workers?
Approach: I can read the number of Gunicorn workers from gunicorn_config.py by importing the workers variable. However, at this point I do not know the Gunicorn worker process IDs. Do they have internal IDs that I could use at this point (e.g., Gunicorn worker #1, ...)?
You need to be aware of (and account for) the fact that a Gunicorn worker can be stopped at any time (for example due to a crash of the request handler, or a timeout in processing). This means that hardwiring one of your background workers to a particular Gunicorn process cannot be permanent.
If you want to link a background worker to a particular Gunicorn worker, the relinking has to happen every time a Gunicorn worker is restarted.
One approach would be to define your own post_fork and/or post_worker_init handler that does that wiring.
Initially you can start the required number of background workers, and the post_fork handler can then "borrow" (or, as you call it, "link") some of them to the Gunicorn worker that has just been created, as in the sketch below.
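A rough sketch of that idea (post_fork(server, worker) is a real Gunicorn server hook; the queue, the background loop, and N_BACKGROUND_PER_GUNICORN are assumptions for illustration, and this simplified variant forks the background workers inside post_fork rather than pre-spawning and borrowing them):

# gunicorn_config.py - a sketch, not a drop-in solution
import multiprocessing

workers = 4                       # number of Gunicorn workers
N_BACKGROUND_PER_GUNICORN = 2     # hypothetical: background workers per Gunicorn worker

def _background_loop(task_queue):
    # Hypothetical CPU-bound consumer; replace with your real processing loop.
    while True:
        item = task_queue.get()
        if item is None:
            break
        # ... crunch item ...

def post_fork(server, worker):
    # Runs in each Gunicorn worker right after it is forked, so the
    # background processes (and their queue) belong to this worker only.
    worker.task_queue = multiprocessing.Queue()
    worker.background = [
        multiprocessing.Process(target=_background_loop,
                                args=(worker.task_queue,), daemon=True)
        for _ in range(N_BACKGROUND_PER_GUNICORN)
    ]
    for p in worker.background:
        p.start()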

Looking for ways and options to stop the worker process after task completion in celery?

The question is asked from the standpoint of disposable infrastructure, using Docker containers for Celery workers.
I am looking for options or ways to achieve the following:
I start a Docker container, which starts the Celery worker (with some concurrency).
It waits and picks up one task per worker process.
The worker processes shut down after the task is completed (success or failure).
The main process dies after all worker processes are down.
I dispose of the container and start a new one.
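No answer is recorded in this excerpt, but one commonly suggested direction (an assumption on my part, not a confirmed solution from this thread) combines worker_max_tasks_per_child = 1 with a self-issued shutdown so the main worker process, and therefore the container, exits once its work is done:

# tasks.py - a sketch of one possible approach, not a confirmed answer
from celery import Celery

app = Celery("disposable", broker="amqp://localhost")

# Each prefork child is replaced after it has executed exactly one task.
app.conf.worker_max_tasks_per_child = 1

@app.task(bind=True)
def one_shot(self, payload):
    result = do_work(payload)   # do_work is a hypothetical work function
    # Ask this worker (only) to shut down once the task has finished; when the
    # main worker process exits, the container's entrypoint returns and the
    # container can be disposed of.
    app.control.shutdown(destination=[self.request.hostname])
    return result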

CPU Concurrency behavior with celery task that calls a subprocess

I have a celery task which uses subprocess.Popen() to call out to an executable which does some CPU-intensive data crunching. It works well but does not take full advantage of the celery worker concurrency.
If I start up celeryd with --concurrency 8 -P prefork, I can confirm with ps aux | grep celeryd that 8 child processes have been spawned. OK.
Now when I start e.g. 3 tasks in parallel, I see each of the three tasks picked up by a different child worker process:
[2014-05-08 13:41:23,839: WARNING/Worker-2] running task a...
[2014-05-08 13:41:23,839: WARNING/Worker-4] running task b...
[2014-05-08 13:41:24,661: WARNING/Worker-7] running task c...
... and they run for several minutes before completing successfully. However, when you observe the CPU usage during that time, it's clear that all three tasks are sharing the same CPU even though other cores are free.
If I add two more tasks, each subprocess takes ~20% of that one CPU, etc.
I would expect that each child celery process (which are created using multiprocessing.Pool via the prefork method) would be able to operate independently and not be constrained to a single core. If not, how can I take full advantage of multiple CPU cores with a CPU-bound celery task?
According to http://bugs.python.org/issue17038 and https://stackoverflow.com/a/15641148/519385, there is a problem whereby some C extensions mess with core affinity and prevent multiprocessing from accessing all available CPUs. The solution is a total hack but appears to work:
os.system("taskset -p 0xff %d" % os.getpid())

Is celery's apply_async thread or process?

Can someone tell me whether Celery executes a task in a thread or in a separate child process? The documentation doesn't seem to explain it (I've read it about 3 times). If it is a thread, how does it get past the GIL (in particular, who is notified of an event, and how)?
How would you compare Celery's async execution with Twisted's reactor model? Is Celery using the reactor model after all?
Thanks,
Can someone tell me whether Celery executes a task in a thread or in a
separate child process?
Neither; the task will be executed in a separate process, possibly on a different machine. It is not a child of the process in which you call delay. The -c and -P options control how the worker process manages its own concurrency. The worker processes get tasks through a message broker, which is also completely independent.
How would you compare celery's async with Twisted's reactor model? Is
celery using reactor model after all?
Twisted is an event loop (a reactor). It is asynchronous, but it is not designed for parallel processing.
-c and -P are the concurrency-related options for celery worker.
-c CONCURRENCY, --concurrency=CONCURRENCY
Number of child processes processing the queue. The
default is the number of CPUs available on your
system.
-P POOL_CLS, --pool=POOL_CLS
Pool implementation: processes (default), eventlet,
gevent, solo or threads.
using eventlet:
http://docs.celeryproject.org/en/latest/userguide/concurrency/eventlet.html#enabling-eventlet
http://docs.celeryproject.org/en/latest/internals/reference/celery.concurrency.processes.html
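As a concrete illustration (the proj.tasks.add task is hypothetical, and the worker command is shown only as a comment): apply_async itself just publishes a message to the broker, and one of the worker's pool child processes executes the task.

# Start a worker with a prefork pool of 4 processes (shell command, as a comment):
#   celery -A proj worker -c 4 -P prefork --loglevel=info
# Then, from any Python process, dispatch a task:
from proj.tasks import add          # hypothetical task module

result = add.apply_async(args=(2, 3))
print(result.get(timeout=10))       # blocks until a worker child returns the result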

Python Celery - lookup task by pid

A pretty straightforward question, maybe -
I often see a celery task process running on my system that I cannot find when I use celery.task.control.inspect()'s active() method. Often this process will be running for hours, and I worry that it's a zombie of some sort. Usually it's using up a lot of memory, too.
Is there a way to look up a task by Linux PID? Does Celery or the AMQP result backend save that?
If not, any other way to figure out which particular task is the one that's sitting around eating up memory?
---- updated:
What can I do when active() tells me that there are no tasks running on a particular box, but the box's memory is in full use, and htop shows that these worker pool threads are the ones using it while sitting at 0% CPU? If it turns out this is related to some quirk of my current Rackspace setup and nobody can answer, I'll still accept Loren's answer.
Thanks~
I'm going to make the assumption that by 'task' you mean 'worker'. The question would make little sense otherwise.
For some context it's important to understand the process hierarchy of Celery worker pools. A worker pool is a group of worker processes (or threads) that share the same configuration (process messages of the same set of queues, etc.). Each pool has a single parent process that manages the pool. This process controls how many child workers are forked and is responsible for forking replacement children when children die. The parent process is the only process bound to AMQP and the children ingest and process tasks from the parent via IPC. The parent process itself does not actually process (run) any tasks.
Additionally, and towards an answer to your question, the parent process is the process responsible for responding to your Celery inspect broadcasts, and the PIDs listed as workers in the pool are only the child workers. The parent PID is not included.
If you're starting the Celery daemon using the --pidfile command-line parameter, that file will contain the PID of the parent process, and you should be able to cross-reference that PID with the process you're referring to, to determine whether it is in fact a pool parent process. If you're using celery multi to start multiple instances (multiple worker pools), then by default the PID files should be located in the directory from which you invoked celery multi. If you're not using either of these means to start Celery, try using one of them to verify that the process isn't a zombie and is in fact simply a pool parent.
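A rough, Linux-only sketch of that cross-referencing (the pidfile path and the suspect PID are placeholders):

# check_parent.py - a sketch; adjust PIDFILE to whatever you passed to --pidfile
PIDFILE = "/var/run/celery/worker1.pid"   # example path
suspect_pid = 12345                        # the PID you see eating memory in htop

with open(PIDFILE) as f:
    parent_pid = int(f.read().strip())

# Read the suspect's parent PID from /proc to see whether it is a pool child
# of the Celery parent identified by the pidfile.
with open("/proc/%d/stat" % suspect_pid) as f:
    stat = f.read()
# Fields after the closing ')' of the command name are: state, ppid, ...
ppid = int(stat.rsplit(")", 1)[1].split()[1])

if suspect_pid == parent_pid:
    print("suspect is the pool parent process")
elif ppid == parent_pid:
    print("suspect is a pool child of this Celery parent")
else:
    print("suspect does not belong to this worker pool")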
