Python - Notifying another thread blocked on subprocess

I am creating a custom job scheduler with a web frontend in Python 3.4 on Linux. This program creates a daemon (consumer) thread that waits for jobs to become available in a PriorityQueue. These jobs can be added manually through the web interface, which puts them on the queue. When the consumer thread finds a job, it executes a program using subprocess.run and waits for it to finish.
The basic idea of the worker thread:
import subprocess
import threading

class Worker(threading.Thread):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                proc = subprocess.run("myprogram", timeout=my_timeout)
                # do some more things
            except subprocess.TimeoutExpired:
                # do some administration
                self.queue.put(job)
However:
- This consumer should be able to receive some kind of signal from the frontend (main thread) telling it to stop the current job and work on the next job in the queue instead (saving the state of the current job and adding it to the end of the queue again). This can (and most likely will) happen while it is blocked on subprocess.run().
- The subprocesses can simply be killed (the program that is executed saves some state in a file), but the worker thread needs to do some administration on the killed job to make sure it can be resumed later on.
- There can be multiple such worker threads.
- Signal handlers are not an option (since they are always handled by the main thread, which is a webserver and should not be bothered with this).
- Having an event loop in which the process actively polls for events (such as the child exiting, the timeout occurring, or the interrupt event) is in this context not really a solution but an ugly hack. The jobs are performance-heavy and constant context switches are unwanted.
What synchronization primitives should I use to interrupt this thread or to make sure it waits for several events at the same time in a blocking fashion?

I think you've accidentally glossed over a simple solution: your second bullet point says that you have the ability to kill the programs that are running in subprocesses. Notice that subprocess.call returns the return code of the subprocess. This means that you can let the main thread kill the subprocess and just check the return code to see if you need to do any cleanup. Even better, you could use subprocess.check_call instead, which will raise an exception for you if the return code isn't 0. I don't know what platform you're working on, but on Linux, killed processes generally don't return 0.
It could look something like this:
class Worker(threading.Thread):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                subprocess.check_call("myprogram", timeout=my_timeout)
                # do some more things
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                # do some administration
                self.queue.put(job)
Note that if you're using Python 3.5, you can use subprocess.run instead, and set the check argument to True.
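A minimal sketch of that 3.5+ variant (the command string and timeout are placeholders): run() raises CalledProcessError when check=True and the exit status is non-zero, and TimeoutExpired on a timeout, so both cases funnel into the same except clause.

import subprocess

try:
    subprocess.run("myprogram", check=True, timeout=30)
except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
    # the process timed out, failed, or was killed: requeue the job here
    pass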
If you have a strong need to handle the cases where the worker needs to be interrupted when it isn't running the subprocess, then I think you're going to have to use a polling loop, because I don't think the behavior you're looking for is supported for threads in Python. You can use a threading.Event object to pass the "stop working now" pseudo-signal from your main thread to the worker, and have the worker periodically check the state of that event object.
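For example, a minimal sketch of that polling pattern (stop_event and the timings are illustrative):

import threading

stop_event = threading.Event()  # the main thread sets this to interrupt the worker

def worker_loop():
    while not stop_event.is_set():
        # do one bounded chunk of work here, then re-check the event;
        # wait() doubles as a sleep that wakes up early when the event is set
        stop_event.wait(timeout=0.5)
    # do the save-state-and-requeue administration here

t = threading.Thread(target=worker_loop)
t.start()
stop_event.set()  # from the main thread: ask the worker to stop
t.join()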
If you're willing to consider using multiple processes instead of threads, consider switching over to the multiprocessing module, which would allow you to handle signals. There is more overhead to spawning full-blown processes instead of threads, but you're essentially looking for signal-like asynchronous behavior, and I don't think Python's threading library supports anything like that. One benefit, though, would be that you would be freed from the Global Interpreter Lock, so you may actually see some speed benefits if your worker processes (formerly threads) do anything CPU-intensive.
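A rough sketch of that approach, assuming POSIX and the fork start method (the handler and all names here are illustrative, not from the question): each worker process installs its own SIGTERM handler, something a thread cannot do, and the parent interrupts it with terminate().

import multiprocessing
import signal
import time

def worker(job_queue):
    # each process can install its own handlers (threads cannot)
    def handle_term(signum, frame):
        # save the current job's state here so it can be resumed later
        raise SystemExit(0)
    signal.signal(signal.SIGTERM, handle_term)
    while True:
        job = job_queue.get()
        # run the job here

if __name__ == '__main__':
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    time.sleep(0.5)   # give the worker time to install its handler
    p.terminate()     # sends SIGTERM on POSIX, invoking the handler
    p.join()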

Related

python - linux - starting a thread immediately

Here's some example code:
while True:  # main loop
    if command_received:
        thread = Thread(target=doItNOW)
        thread.start()
    ......

def doItNOW():
    some_blocking_operations()
My problem is that I need some_blocking_operations to start IMMEDIATELY (as soon as command_received is True),
but since they're blocking I can't execute them in my main loop,
and I can't change some_blocking_operations to be non-blocking either.
By "IMMEDIATELY" I mean as soon as possible, with no more than 10 ms of delay
(I once got a whole second of delay).
If that's not possible, a constant delay would also be acceptable (but it MUST be constant, with very few milliseconds of error).
I'm currently working on a Linux system (Ubuntu, but it may be another one in the future; always Linux).
A Python solution would be amazing, but a different one would be better than nothing.
Any ideas?
Thanks in advance.
from threading import Thread

class worker(Thread):
    def __init__(self, someParameter=True):
        Thread.__init__(self)
        # This is how you "send" parameters/variables
        # into the thread on start-up that the thread can use.
        # This is just an example in case you need it.
        self.someVariable = someParameter
        self.start()  # Note: This makes the thread self-starting;
                      # you could also call .start() in your main loop.

    def run(self):
        # And this is how you use the variable/parameter
        # that you passed in when you created the thread.
        if self.someVariable is True:
            some_blocking_operations()

while True:  # main loop
    if command_received:
        worker()
This executes some_blocking_operations() in a thread without blocking your main loop. I'm not sure whether you actually want to wait for the thread to finish, or if you even care.
If all you want is to wait for a "command" to be received and then execute the blocking operations without waiting for them or verifying that they complete, then this should work for you.
Python mechanics of threading
Python will only run on one CPU core; what you're doing here is simply interleaving multiple executions on overlapping time slices of the CPU. In one cycle the main thread gets to execute, and in the next your blocking call gets a chance to run. They won't actually run in parallel.
There are some "You can, but..." threads about this, like this one:
is python capable of running on multiple cores?

How to check if thread made by module thread (in Python 3 _thread) is running?

Yet the thread module works for me. How do I check whether a thread made by the thread module (in Python 3, _thread) is running? When the function the thread is executing ends, does the thread end too?
def __init__(self):
    self.thread = None

......

if self.thread == None or not self.thread.isAlive():
    self.thread = thread.start_new_thread(self.dosomething, ())
else:
    tkMessageBox.showwarning("XXXX", "There's no need to have more than two threads")
I know there is no function called isAlive() in the thread module; is there any alternative?
And there isn't any reason to use the threading module instead, is there?
Unless you really need the low-level capabilities of the internal thread (_thread) module, you really should use the threading module instead. It makes everything easier to use and comes with helpers such as is_alive.
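Roughly, the question's check could look like this with threading (App and start_if_needed are hypothetical stand-ins for the question's class; the warning branch is omitted):

import threading

class App(object):
    def __init__(self):
        self.thread = None

    def start_if_needed(self):
        # is_alive() returns False once dosomething has returned,
        # i.e. the thread ends when its function ends
        if self.thread is None or not self.thread.is_alive():
            self.thread = threading.Thread(target=self.dosomething)
            self.thread.start()

    def dosomething(self):
        pass  # the actual work goes here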
Btw., the alternative to restarting a thread like you do in your example code would be to keep it running but have it wait for additional jobs, as sketched below. E.g. you could have a queue somewhere which keeps track of all jobs you want the thread to do, and the thread keeps working on them until the queue is empty; then it does not terminate but waits for new jobs to appear. Only at the end of the application do you signal the thread to stop waiting and terminate.
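A small sketch of that pattern (the STOP sentinel and names are illustrative):

import queue
import threading

jobs = queue.Queue()
STOP = object()  # sentinel: tells the worker to terminate at shutdown

def worker():
    while True:
        job = jobs.get()  # blocks until a job (or the sentinel) arrives
        if job is STOP:
            break
        # process the job here

t = threading.Thread(target=worker)
t.start()
jobs.put("some job")
jobs.put(STOP)  # at the end of the application
t.join()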

Processes sharing queue not terminating properly

I have a multiprocessing application where the parent process creates a queue and passes it to the worker processes. All processes use this queue to create a QueueHandler for logging, and one worker process reads from the queue and does the actual logging.
The worker processes continuously check whether the parent is alive. The problem is that when I kill the parent process from the command line, all workers are killed except one. The logger process also terminates. I don't know why one process keeps executing. Is it because of locks etc. in the queue? How do I exit properly in this scenario? I am using
sys.exit(0)
for exiting.
I would use sys.exit(0) only as a last resort; it's always better to cleanly finish each thread/process. You will have some while loop in your Process, so just break there so that it can come to an end.
Tidy up before you leave, i.e., release all handles of external resources, e.g., files, sockets, pipes.
Somewhere in these handles might be the reason for the behavior you see.
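One way to structure such a clean exit, sketched here with a multiprocessing.Event (the names are illustrative, not from the question):

import multiprocessing
import time

def worker(shutdown_event):
    while not shutdown_event.is_set():
        time.sleep(0.1)  # stands in for one unit of real work
    # release files, sockets and pipes here, then fall off the end

if __name__ == '__main__':
    shutdown = multiprocessing.Event()
    p = multiprocessing.Process(target=worker, args=(shutdown,))
    p.start()
    shutdown.set()  # instead of killing: ask the worker to finish
    p.join()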

What is a python thread

I have several questions regarding Python threads.
Is a Python thread a Python or OS implementation?
When I use htop, a multi-threaded script has multiple entries: the same memory consumption, the same command, but a different PID. Does this mean that a [Python] thread is actually a special kind of process? (I know there is a setting in htop to show these threads as one process: Hide userland threads.)
Documentation says:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left.
My interpretation/understanding was: main thread terminates when all non-daemon threads are terminated.
So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?
Python threads are implemented using OS threads in all implementations I know (CPython, PyPy, and Jython). For each Python thread, there is an underlying OS thread.
Some operating systems (Linux being one of them) show all different threads launched by the same executable in the list of all running processes. This is an implementation detail of the OS, not of Python. On some other operating systems, you may not see those threads when listing all the processes.
The process will terminate when the last non-daemon thread finishes. At that point, all the daemon threads will be terminated. So, those threads are part of your process, but are not preventing it from terminating (while a regular thread will prevent it). That is implemented in pure Python. A process terminates when the system _exit function is called (it will kill all threads), and when the main thread terminates (or sys.exit is called), the Python interpreter checks if there is another non-daemon thread running. If there is none, then it calls _exit, otherwise it waits for the non-daemon threads to finish.
The daemon thread flag is implemented in pure Python by the threading module. When the module is loaded, a Thread object is created to represent the main thread, and its _exitfunc method is registered as an atexit hook.
The code of this function is:
class _MainThread(Thread):

    def _exitfunc(self):
        self._Thread__stop()
        t = _pickSomeNonDaemonThread()
        if t:
            if __debug__:
                self._note("%s: waiting for other threads", self)
        while t:
            t.join()
            t = _pickSomeNonDaemonThread()
        if __debug__:
            self._note("%s: exiting", self)
        self._Thread__delete()
This function will be called by the Python interpreter when sys.exit is called or when the main thread terminates, and it returns only once no non-daemon threads are left running (only daemon threads, if any, remain).
When it returns, the interpreter calls the system _exit function: the OS terminates all of the process's threads and then terminates the process itself. So the Python runtime does not call _exit until all the non-daemon threads are done.
All threads are part of the process.
My interpretation/understanding was: main thread terminates when all
non-daemon threads are terminated.
So python daemon threads are not part of python program if "the entire
Python program exits when only daemon threads are left"?
Your understanding is incorrect. For the OS, a process is composed of many threads, all of which are equal (there is nothing special about the main thread for the OS, except that the C runtime adds a call to _exit at the end of the main function). And the OS doesn't know about daemon threads; this is purely a Python concept.
The Python interpreter uses native threads to implement Python threads, but has to remember the list of threads it has created. Using its atexit hook, it ensures that _exit is called only when the last non-daemon thread has terminated. By "the entire Python program", the documentation means the whole process.
The following program can help understand the difference between daemon thread and regular thread:
import sys
import time
import threading

class WorkerThread(threading.Thread):
    def run(self):
        while True:
            print 'Working hard'
            time.sleep(0.5)

def main(args):
    use_daemon = False
    for arg in args:
        if arg == '--use_daemon':
            use_daemon = True
    worker = WorkerThread()
    worker.setDaemon(use_daemon)
    worker.start()
    time.sleep(1)
    sys.exit(0)

if __name__ == '__main__':
    main(sys.argv[1:])
If you execute this program with the '--use_daemon' flag, you will see that it only prints a small number of 'Working hard' lines. Without the flag, the program does not terminate even when the main thread finishes, and it prints 'Working hard' lines until it is killed.
I'm not familiar with the implementation, so let's make an experiment:
import threading
import time

def target():
    while True:
        print 'Thread working...'
        time.sleep(5)

NUM_THREADS = 5

for i in range(NUM_THREADS):
    thread = threading.Thread(target=target)
    thread.start()
The number of threads reported using ps -o cmd,nlwp <pid> is NUM_THREADS+1 (one more for the main thread), so as long as the OS tools detect the number of threads, they should be OS threads. I tried both with CPython and Jython and, although in Jython there are some other threads running, for each extra thread that I add, ps increments the thread count by one.
I'm not sure about htop behaviour, but ps seems to be consistent.
I added the following line before starting the threads:
thread.daemon = True
When I executed the script using CPython, the program terminated almost immediately and no process was found using ps, so my guess is that the program terminated together with the threads. In Jython the program behaved as before (it didn't terminate), so maybe there are some other threads from the JVM preventing the program from terminating, or daemon threads aren't supported.
Note: I used Ubuntu 11.10 with Python 2.7.2+ and Jython 2.2.1 on Java 1.6.0_23.
Python threads are practically an interpreter-level implementation because of the so-called global interpreter lock (GIL), even though they are technically built on OS-level threading mechanisms. On *nix they use pthreads, but the GIL effectively makes them a hybrid tied to the application-level threading paradigm. So you will see them multiple times in ps/top output on *nix systems, yet performance-wise they still behave like software-implemented threads.
No, you are just seeing the underlying thread implementation of your OS. This kind of behaviour is exposed by *nix pthread-like threading, and I'm told even Windows implements threads this way.
When your program closes, it waits for all threads to finish as well. If you have threads that could postpone the exit indefinitely, it may be wise to flag those threads as "daemons" and allow your program to finish even if those threads are still running.
Some reference material you might be interested in:
Linux Gazette: Understanding Threading in Python
Doug Hellmann: Multi-processing techniques in Python
David Beazley: PyCon 2010: Understanding the Python GIL (video presentation)
There are great answers to the question, but I feel the daemon threads part is still not explained in a simple fashion, so this answer refers just to the third question:
"main thread terminates when all non-daemon threads are terminated."
So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?
If you think about what a daemon is, it is usually a service: some code that runs in an infinite loop serving requests, filling queues, accepting connections, and so on. Other threads use it. It has no purpose when running by itself (in single-process terms).
So the program can't wait for the daemon thread to terminate, because it might never happen. Python ends the program when all non-daemon threads are done, and at that point it also stops the daemon threads.
To wait until a daemon thread has completed its work, use the join() method.
daemon_thread.join() will make Python wait for the daemon thread as well before exiting. join() also accepts a timeout argument.
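For example (the two-second sleep stands in for the daemon's actual work):

import threading
import time

daemon_thread = threading.Thread(target=time.sleep, args=(2,))
daemon_thread.daemon = True
daemon_thread.start()

daemon_thread.join(timeout=5)  # wait at most 5 seconds for it
if daemon_thread.is_alive():
    print("daemon thread is still running after the timeout")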

Parent Thread exiting before Child Threads [python]

I'm using Python in a webapp (CGI for testing, FastCGI for production) that needs to send an occasional email (when a user registers or something else important happens). Since communicating with an SMTP server takes a long time, I'd like to spawn a thread for the mail function so that the rest of the app can finish up the request without waiting for the email to finish sending.
I tried using thread.start_new(func, (args)), but the parent returns and exits before the sending is complete, thereby killing the sending thread before it does anything useful. Is there any way to keep the process alive long enough for the child thread to finish?
Take a look at the Thread.join() method in the threading module. Basically it will block your calling thread until the child thread has returned (thus preventing it from exiting before it should).
Update:
To avoid making your main thread unresponsive to new requests, you can use a while loop:
# active_count() includes the main thread, so wait until only it is left
while threading.active_count() > 1:
    # ... look for new requests to handle ...
    time.sleep(0.1)
    # or try joining your threads with a timeout
    # for thread in my_threads:
    #     thread.join(0.1)
Update 2:
It also looks like thread.start_new(func, args) is obsolete; it was replaced by thread.start_new_thread(function, args[, kwargs]). You can also create threads with the higher-level threading package (this is the package that provides the active_count() used in the previous code block):
import threading
my_thread = threading.Thread(target=func, args=(), kwargs={})
my_thread.daemon = True
my_thread.start()
You might want to use threading.enumerate if you have multiple workers and want to see which ones are still running.
Other alternatives include using threading.Event: the main thread sets the event and starts the worker thread off. The worker thread clears the event when it finishes its work, and the main thread checks whether the event is still set to figure out whether it can exit.
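A compact sketch of that handshake (send_mail_worker and the event name are illustrative):

import threading
import time

busy = threading.Event()

def send_mail_worker():
    # ... talk to the SMTP server here ...
    busy.clear()  # worker clears the event when it finishes

busy.set()  # main thread sets the event before starting the worker
worker = threading.Thread(target=send_mail_worker)
worker.start()

# main thread: before exiting, check whether the event is still set
while busy.is_set():
    time.sleep(0.1)  # keep handling other requests here instead of just sleeping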
