Celery signal task_success executes in which process?

If I hook up a callback to the celery task_success signal handler, which process does it get executed in? The child or the worker process?
The documentation does not explicitly list it. (It lists it for the signal task_sent, but not for the other signals: http://docs.celeryproject.org/en/latest/userguide/signals.html#task-sent)
thanks...

There's no such thing as a "child" process here; there is the process sending the task (which can be any Python process: a celery worker, celery beat, or anything else) and there is the worker that processes the task.
All the task signals except task_sent are executed in the worker that processes the task; in fact, they can't possibly execute anywhere else. Celery signals (like Django signals) are not like operating-system events, or like Celery tasks, which can originate in one process and trigger something in another process; they are handled in the same process in which they originate. They also have nothing to do with the Python standard-library signal module.
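A quick way to see this for yourself is a handler that prints its own PID. Here is a minimal sketch, assuming a hypothetical app named demo and a Redis broker URL: run a worker, call add.delay(2, 2) from a shell, and the printed PID will belong to the worker's pool process, not the calling process.
import os
from celery import Celery
from celery.signals import task_success

app = Celery('demo', broker='redis://localhost')  # hypothetical app name and broker URL

@app.task
def add(x, y):
    return x + y

@task_success.connect
def on_success(sender=None, result=None, **kwargs):
    # This runs inside the worker process that executed the task.
    print('task %s returned %r in pid %d' % (sender.name, result, os.getpid()))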

Related

Is celery's apply_async thread or process?

Can someone tell me whether Celery executes a task in a thread or in a separate child process? The documentation doesn't seem to explain it (I've read it like 3 times). If it is a thread, how does it get past the GIL (in particular, who is notified of an event, and how)?
How would you compare celery's async model with Twisted's reactor model? Is celery using the reactor model after all?
Thanks,
Can someone tell me whether Celery executes a task in a thread or in a separate child process?
Neither: the task will be executed in a separate worker process, possibly on a different machine. It is not a child of the process in which you call delay(). The -c and -P options control how the worker manages its own pool of processes or threads, and the worker gets tasks through a message service (the broker), which is also completely independent.
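A minimal sketch of what that means at the call site (the app name, task, and URLs are illustrative assumptions, not from the question): delay() just enqueues a message and returns an AsyncResult immediately; the function body runs later in whatever worker picks the message up.
from celery import Celery

app = Celery('demo', broker='redis://localhost', backend='redis://localhost')  # assumed URLs

@app.task
def add(x, y):
    return x + y

result = add.delay(2, 3)       # returns at once; nothing executes locally
print(result.get(timeout=10))  # blocks until some worker has run the task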
How would you compare celery's async with Twisted's reactor model? Is celery using the reactor model after all?
Twisted is built around an event queue (the reactor). It is asynchronous, but it is not designed for parallel processing.
-c and -P are the concurrency-related options for celery worker:
-c CONCURRENCY, --concurrency=CONCURRENCY
    Number of child processes processing the queue. The default is the number of CPUs available on your system.
-P POOL_CLS, --pool=POOL_CLS
    Pool implementation: processes (default), eventlet, gevent, solo or threads.
using eventlet:
http://docs.celeryproject.org/en/latest/userguide/concurrency/eventlet.html#enabling-eventlet
http://docs.celeryproject.org/en/latest/internals/reference/celery.concurrency.processes.html
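For example, to start a worker with eight green threads under the eventlet pool, the two options combine like this (assuming an app module named proj):
celery worker --app=proj -c 8 -P eventlet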

Reduce the number of workers on a machine in Python-RQ?

What is a good way to reduce the number of workers on a machine in Python-RQ?
According to the documentation, I need to send a SIGINT or SIGTERM command to one of the worker processes on the machine:
Taking down workers
If, at any time, the worker receives SIGINT (via Ctrl+C) or SIGTERM (via kill), the worker waits until the currently running task is finished, stops the work loop, and gracefully registers its own death.
If, during this takedown phase, SIGINT or SIGTERM is received again, the worker will forcefully terminate the child process (sending it SIGKILL), but will still try to register its own death.
This seems to imply a lot of coding overhead:
I would need to keep track of the PID of each worker process
I would need a way to send a SIGINT command from a remote machine
Do I really need to custom build this, or is there a way to do this easily using the Python-RQ library or some other existing library?
Get all running workers using rq.Worker.all()
Select the worker you want to kill
Use os.kill(worker.pid, signal.SIGINT)
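Put together, a hedged sketch of those three steps, assuming a default local Redis and that you run it on the same machine as the worker (os.kill cannot signal a process on a remote host):
import os
import signal

from redis import Redis
from rq import Worker

conn = Redis()                          # assumed default local Redis
workers = Worker.all(connection=conn)   # all registered workers
victim = workers[0]                     # select the worker you want to take down
os.kill(victim.pid, signal.SIGINT)      # warm shutdown: finishes the current job first
For a remote machine you would still need some transport (SSH, a deployment tool, etc.) to run this next to the worker.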

Processes sharing queue not terminating properly

I have a multiprocessing application in which the parent process creates a queue and passes it to the worker processes. All processes use this queue to create a QueueHandler for logging, and a dedicated logger process reads from the queue and writes the log.
The worker processes continuously check whether the parent is alive. The problem is that when I kill the parent process from the command line, all workers are killed except one; the logger process also terminates. I don't know why one process keeps executing. Is it because of locks etc. in the queue? What is the proper way to exit in this scenario? I am using
sys.exit(0)
for exiting.
I would use sys.exit(0) only as a last resort; it's always better to finish each thread/process cleanly. You will have some while loop in your Process, so just break out of it there so that the process can come to an end (see the sketch below).
Tidy up before you leave, i.e., release all handles to external resources, e.g., files, sockets, pipes.
Somewhere in these handles might be the reason for the behavior you see.
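Here is a sketch of the break-instead-of-exit pattern; all the names are illustrative, not taken from the question's code:
import multiprocessing

def handle(item):
    print('processing', item)  # stand-in for the real work

def worker(task_queue):
    while True:
        item = task_queue.get()
        if item is None:   # sentinel value: the parent asks this child to stop
            break          # end the loop instead of calling sys.exit(0)
        handle(item)
    task_queue.close()     # tidy up: release the queue handle before exiting

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    q.put('job-1')
    q.put(None)            # send the sentinel so the child exits cleanly
    p.join()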

Python Celery - lookup task by pid

A pretty straightforward question, maybe -
I often see a celery task process running on my system that I cannot find when I use celery.task.control.inspect()'s active() method. Often this process will be running for hours, and I worry that it's a zombie of some sort. Usually it's using up a lot of memory, too.
Is there a way to look up a task by Linux PID? Does Celery or the AMQP result backend save that?
If not, any other way to figure out which particular task is the one that's sitting around eating up memory?
---- updated:
What can I do when active() tells me that there are no tasks running on a particular box, but the box's memory is in full use, and htop shows that these worker pool threads are the ones using it while sitting at 0% CPU? If it turns out this is related to some quirk of my current Rackspace setup and nobody can answer, I'll still accept Loren's answer.
Thanks~
I'm going to make the assumption that by 'task' you mean 'worker'. The question would make little sense otherwise.
For some context it's important to understand the process hierarchy of Celery worker pools. A worker pool is a group of worker processes (or threads) that share the same configuration (they process messages from the same set of queues, etc.). Each pool has a single parent process that manages the pool. This process controls how many child workers are forked and is responsible for forking replacement children when children die. The parent process is the only process bound to AMQP, and the children ingest and process tasks from the parent via IPC. The parent process itself does not actually process (run) any tasks.
Additionally, and towards an answer to your question, the parent process is the one responsible for responding to your Celery inspect broadcasts, and the PIDs listed as workers in the pool are only those of the child workers; the parent PID is not included.
If you're starting the Celery daemon with the --pidfile command-line parameter, that file will contain the PID of the parent process, and you should be able to cross-reference that PID with the process you're referring to, to determine whether it is in fact a pool parent process. If you're using celery multi to start multiple instances (multiple worker pools), then by default the PID files are located in the directory from which you invoked celery multi. If you're not using either of these means to start Celery, try using one of them to verify that the process isn't a zombie and is in fact simply a pool parent.
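If the broker is reachable, the inspect API can also report each pool's child PIDs, which you can compare against what you see in htop. A hedged sketch (the exact shape of the stats payload can vary between Celery versions):
from celery.task.control import inspect

stats = inspect().stats() or {}
for node, info in stats.items():
    # 'pool' -> 'processes' lists the child worker PIDs for that node
    print(node, info.get('pool', {}).get('processes'))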

python/django spawn background process and avoid zombie process

I need to spawn a background process in Django: the view returns immediately while the background process continues to make some changes and then updates the DB. I do this with the os.spawnl() function, calling a separate .py file.
The problem is that after the background process is done, it becomes a zombie, shown as [python] <defunct>.
How do I avoid that? I followed this and this example, but I still get a zombie child process after the Django render process finishes.
I want to take this chance to practice my *nix process management skills so please do me a favor, don't give me Celery or other mq/async task solutions, and I hate dependencies.
This got too long for a comment.
The wait syscall (which os.wait wraps) reaps exit codes/PIDs from dead processes. You will want to call os.wait in the process one generation above your zombie processes, i.e. their parent. A parent process receives a SIGCHLD signal when one of its child processes dies. If you insist on doing all of this yourself, you will need to install a signal handler that traps SIGCHLD and calls os.wait from the handler. Read some documentation on Unix process handling and the Python documentation on the os module, as there are non-blocking variations of os.wait that may be helpful.
import os
import signal

# Reap the dead child's exit status as soon as SIGCHLD arrives,
# so it never lingers as a zombie.
signal.signal(signal.SIGCHLD, lambda signum, frame: os.wait())
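If you would rather not block inside the handler, a non-blocking variant using os.WNOHANG reaps every finished child and returns immediately (a sketch, not tied to the question's code):
import os
import signal

def reap(signum, frame):
    # Collect every child that has already exited, without blocking.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except OSError:   # no child processes left at all
            break
        if pid == 0:      # children exist, but none have exited yet
            break

signal.signal(signal.SIGCHLD, reap)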
I had a similar problem. I used active_children() from the multiprocessing module; calling it has the side effect of joining (and thereby reaping) any finished child processes.
import multiprocessing

# Somewhere in middleware, or wherever appropriate, call:
multiprocessing.active_children()
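In a Django project that call could live in a small middleware; a hypothetical sketch using the old-style middleware API:
import multiprocessing

class ReapChildrenMiddleware(object):  # hypothetical name
    def process_request(self, request):
        # Joining finished children as a side effect prevents zombies.
        multiprocessing.active_children()
        return None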
