Can a daemon process fork child processes in Python?

I have a long-running daemon process that I created using the runit package. I want the daemon to watch a table and perform tasks based on the value of a column that says which task needs to be performed.
E.g., table 'A' has a column job_type.
I was thinking of forking child processes from this daemon process every time it gets a new task to perform (based on a new row inserted into table A, which the daemon watches).
The multiprocessing module says I can't (or shouldn't) fork child processes from a daemonic process, because if it dies, its child processes are orphaned.
What is a good approach to achieve this: the daemon listens to the table and, based on the column value, forks child processes (all independent of each other) that do the task, report back to the daemon, and exit?
I will also need some locking mechanism if the child processes access and modify shared data.

I assume the daemon process you have is also spawned from a Python script that called multiprocessing with daemon=True.
In that case, the daemon running implies that your creator process is still running, so you can just send the creator a message via a pipe asking it to spawn a new process for you (see the sketch below). If your daemon needs to talk with that process, use sockets or any IPC method of your choice.
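Here is a minimal sketch of that idea, assuming the creator is a plain Python script and using multiprocessing.Pipe as the message channel. The names daemon_loop and do_task and the hard-coded job list are illustrative, not from any library:

import multiprocessing as mp

def do_task(job_type):
    print("running job:", job_type)  # stand-in for the real work

def daemon_loop(conn):
    # Stand-in for watching table A: pretend two new rows arrived.
    for job_type in ["export", "cleanup"]:
        conn.send(job_type)          # ask the creator to spawn a worker
    conn.send(None)                  # sentinel: nothing more to do

def main():
    parent_conn, child_conn = mp.Pipe()
    daemon = mp.Process(target=daemon_loop, args=(child_conn,), daemon=True)
    daemon.start()
    workers = []
    while True:
        job_type = parent_conn.recv()
        if job_type is None:
            break
        # The creator, not the daemonic process, forks the worker, so the
        # "daemonic processes cannot have children" restriction is avoided.
        w = mp.Process(target=do_task, args=(job_type,))
        w.start()
        workers.append(w)
    for w in workers:
        w.join()

if __name__ == "__main__":
    main()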

Related

Is it possible to terminate a process that was started by Pool in Python?

I used multiprocessing.Pool to improve the performance of my Python server. When a task fails, I want to terminate its child processes immediately.
I found that if I create a process using Process, the terminate method meets my needs, but if I submit a task with Pool.apply_async, the return type is ApplyResult, and it can't terminate the corresponding process.
Is there any other way to do it?
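For illustration, a minimal sketch of the contrast the question describes (slow_task is a placeholder for the real work):

import time
import multiprocessing as mp

def slow_task():
    time.sleep(60)

if __name__ == "__main__":
    # With Process you hold the process object, so terminate() is available:
    p = mp.Process(target=slow_task)
    p.start()
    p.terminate()          # kills this specific child immediately
    p.join()

    # With a Pool, apply_async only returns an ApplyResult, which has no
    # terminate(); Pool.terminate() is all-or-nothing and stops every worker.
    with mp.Pool(2) as pool:
        result = pool.apply_async(slow_task)
        pool.terminate()   # terminates the whole pool, not just one task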

Daemon threads vs daemon processes in Python

Based on the Python documentation, daemon threads are threads that die once the main thread dies. This seems to be the complete opposite of the behavior of daemon processes, which involve creating a child process and terminating the parent process so that init takes over the child process (i.e., killing the parent process does NOT kill the child process).
So why do daemon threads die when the parent dies? Is this a misnomer? I would think that "daemon" threads would keep running after the main process has been terminated.
It's just names meaning different things in different contexts.
In case you are not aware: like threading.Thread, multiprocessing.Process can also be flagged as "daemon". Your description of "daemon processes" fits Unix daemons, not Python's daemon processes.
The docs also have a section about Process.daemon:
... Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are not Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited.
The only thing Python's daemon processes and Unix daemons (or Windows services) have in common is that you would use them for background tasks (in Python's case, only an option for tasks that don't need proper cleanup on shutdown, though).
Python imposes its own abstraction layer on top of OS threads and processes. The daemon attribute of Thread and Process is about this OS-independent, Python-level abstraction.
At the Python level, a daemon thread is a thread that doesn't get joined (awaited to exit voluntarily) when the main thread exits, and a daemon process is a process that gets terminated (not joined) when the parent process exits. Both show the same behavior in that their natural exit is not awaited when the main thread or parent process is shutting down. That's all.
Note that Windows doesn't even have the concept of "related processes" like Unix, but Python implements this relationship of "child" and "parent" in a cross-platform manner.
I would think that "daemon" threads would keep running after the main process has been terminated.
A thread cannot exist outside of a process. A process always hosts and gives context to at least one thread.
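A small demonstration of these Python-level semantics (a sketch; the sleep time is arbitrary):

import threading
import time
import multiprocessing as mp

def linger():
    time.sleep(10)
    print("finished")      # only reached if someone waits for us

if __name__ == "__main__":
    threading.Thread(target=linger, daemon=True).start()
    mp.Process(target=linger, daemon=True).start()
    # The main thread exits here: the daemon thread is abandoned (not
    # joined) and the daemon process is terminated (not joined), so
    # "finished" is never printed. With daemon=False, Python would wait
    # for both and print it twice.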

Processes sharing queue not terminating properly

I have a multiprocessing application where the parent process creates a queue and passes it to the worker processes. All processes use this queue to create a QueueHandler for logging, and a dedicated worker process reads from the queue and does the actual logging.
The worker processes continuously check whether the parent is alive. The problem is that when I kill the parent process from the command line, all workers are killed except one. The logger process also terminates. I don't know why one process keeps executing. Is it because of locks etc. in the queue? How do I exit properly in this scenario? I am using
sys.exit(0)
for exiting.
I would use sys.exit(0) only as a last resort; it's always better to finish each thread/process cleanly. You will have some while loop in your Process, so just break out of it there so that the process can come to an end (see the sketch below).
Tidy up before you leave, i.e., release all handles of external resources such as files, sockets, and pipes.
One of these handles may well be the reason for the behavior you see.
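A minimal sketch of such a worker loop, assuming a Unix system where an orphaned child's os.getppid() becomes 1; the parent-liveness check and the None sentinel are illustrative choices, not taken from the question's code:

import os
import queue
import multiprocessing as mp

def worker(q):
    while True:
        if os.getppid() == 1:       # parent died and we were reparented
            break
        try:
            record = q.get(timeout=1)
        except queue.Empty:
            continue                # no log record yet; re-check the parent
        if record is None:          # sentinel from the parent: shut down
            break
        print("logging:", record)   # stand-in for the QueueHandler work
    # Tidy up before leaving: release the queue's internal pipe and
    # feeder thread instead of bailing out with sys.exit(0).
    q.close()
    q.join_thread()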

Python Celery - lookup task by pid

A pretty straightforward question, maybe -
I often see a celery task process running on my system that I cannot find when I use celery.task.control.inspect()'s active() method. Often this process will be running for hours, and I worry that it's a zombie of some sort. Usually it's using up a lot of memory, too.
Is there a way to look up a task by Linux PID? Does Celery or the AMQP result backend save that?
If not, any other way to figure out which particular task is the one that's sitting around eating up memory?
---- updated:
What can I do when active() tells me there are no tasks running on a particular box, but the box's memory is fully used, and htop shows that these worker pool threads are the ones using it while sitting at 0% CPU? If it turns out this is related to some quirk of my current Rackspace setup and nobody can answer, I'll still accept Loren's answer.
Thanks~
I'm going to make the assumption that by 'task' you mean 'worker'. The question would make little sense otherwise.
For some context it's important to understand the process hierarchy of Celery worker pools. A worker pool is a group of worker processes (or threads) that share the same configuration (process messages of the same set of queues, etc.). Each pool has a single parent process that manages the pool. This process controls how many child workers are forked and is responsible for forking replacement children when children die. The parent process is the only process bound to AMQP and the children ingest and process tasks from the parent via IPC. The parent process itself does not actually process (run) any tasks.
Additionally, and towards an answer to your question, the parent process is the process responsible for responding to your Celery inspect broadcasts, and the PIDs listed as workers in the pool are only the child workers. The parent PID is not included.
If you're starting the Celery daemon with the --pidfile command-line parameter, that file will contain the PID of the parent process, and you should be able to cross-reference that PID with the process you're referring to, to determine whether it is in fact a pool parent process. If you're using celery multi to start multiple instances (multiple worker pools), then by default the PID files should be located in the directory from which you invoked celery multi. If you're not using either of these means to start Celery, try using one of them to verify that the process isn't a zombie and is in fact simply a parent.
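A quick sketch of that cross-check. The pidfile path is an assumption; use whatever you passed to --pidfile or the files celery multi wrote:

with open("/var/run/celery/worker1.pid") as f:   # hypothetical path
    parent_pid = int(f.read().strip())

suspect_pid = 12345   # the long-running PID you saw in htop (illustrative)
if suspect_pid == parent_pid:
    print("that process is the pool parent, not a stuck task")
else:
    print("not the pool parent; check whether it is one of its children")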

scheduling jobs using python apscheduler

I have to monitor a process continuously, and I use the process ID to do so. I wrote a program to send an email once the process had stopped, so that I could manually reschedule it, but I often forget to reschedule the process (basically another Python program). I then came across the apscheduler module and used its cron-style scheduling (http://packages.python.org/APScheduler/cronschedule.html) to spawn the process once it has stopped. I am now able to respawn the process once it has been killed, but when I spawn it using apscheduler I cannot get the process ID (PID) of the newly scheduled process, so I am not able to monitor it. Is there a function in apscheduler to get the PID of the scheduled process?
Instead of relying on APScheduler to return the PID, why not have your program report the PID itself? It's quite common for daemons to have pidfiles: files at a known location that just contain the PID of the running process. Just wrap your main function in something like this:
import os

try:
    # Record our own PID at a well-known location; "w" opens for writing.
    with open("/tmp/myproc.pid", "w") as pidfile:
        pidfile.write(str(os.getpid()))
    main()
finally:
    # Remove the pidfile on the way out, even if main() raised.
    os.remove("/tmp/myproc.pid")
Now whenever you want to monitor your process, you can first check whether the pidfile exists and, if it does, retrieve the PID of the process for further monitoring (see the sketch below). This has the benefit of being independent of a specific implementation of cron, and it will make it easier in the future if you want to write more programs that interact with this program locally.
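For example, a minimal monitoring sketch against the pidfile written above, assuming a Unix system (sending signal 0 with os.kill only probes whether the PID exists):

import os

def is_running(pidfile="/tmp/myproc.pid"):
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return False                # no pidfile: the process isn't running
    try:
        os.kill(pid, 0)             # signal 0 checks existence only
        return True
    except ProcessLookupError:
        return False                # stale pidfile; the process is gone

if __name__ == "__main__":
    print("running" if is_running() else "stopped")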
