I'm using Python in a webapp (CGI for testing, FastCGI for production) that needs to send an occasional email (when a user registers or something else important happens). Since communicating with an SMTP server takes a long time, I'd like to spawn a thread for the mail function so that the rest of the app can finish up the request without waiting for the email to finish sending.
I tried using thread.start_new(func, (args)), but the parent returns and exits before the sending is complete, thereby killing the sending thread before it does anything useful. Is there any way to keep the process alive long enough for the child thread to finish?
Take a look at the Thread.join() method. Basically, it blocks your calling thread until the child thread has returned (thus preventing it from exiting before it should).
Update:
To avoid making your main thread unresponsive to new requests, you can use a while loop:
while threading.active_count() > 1:  # note: > 0 would never exit, since the main thread counts itself
    # ... look for new requests to handle ...
    time.sleep(0.1)
    # or try joining your threads with a timeout
    # for thread in my_threads:
    #     thread.join(0.1)
Update 2:
It also looks like thread.start_new(func, args) is obsolete; it was replaced by thread.start_new_thread(function, args[, kwargs]). You can also create threads with the higher-level threading module (the module that provides the active_count() used in the previous code block):
import threading

my_thread = threading.Thread(target=func, args=(), kwargs={})
my_thread.daemon = True  # note: daemon threads are killed abruptly when the main thread exits
my_thread.start()
You might want to use threading.enumerate() if you have multiple workers and want to see which ones are still running.
Other alternatives include using threading.Event: the main thread sets the event and starts the worker thread off, the worker thread clears the event when it finishes its work, and the main thread checks whether the event is still set to figure out whether it can exit.
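A minimal sketch of that handshake, where send_email is a placeholder for the real work:

import threading
import time

def send_email():
    time.sleep(1)  # stands in for talking to the SMTP server

busy = threading.Event()

def worker():
    send_email()
    busy.clear()  # unset the event to tell the main thread we are done

busy.set()  # the main thread marks the worker as busy...
threading.Thread(target=worker).start()  # ...and starts it off

while busy.is_set():  # the main thread may exit once the event is unset
    time.sleep(0.1)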
Related
I have a query. I have seen examples where developers write something like the code as follows:
import threading

def do_something():
    return True

t = threading.Thread(target=do_something)
t.start()
t.join()
I know that join() signals the interpreter to wait till the thread is completely executed. But what if I do not write t.join()? Will the thread get closed automatically and will it be reused later?
Please let me know the answer. It's my first attempt at creating a multi-threaded application in Python 3.5.0.
A Python thread is just a regular OS thread. If you don't join it, it still keeps running concurrently with the current thread. It will eventually die when the target function completes or raises an exception. There is no such thing as "thread reuse"; once a thread is dead, it rests in peace.
Unless the thread is a "daemon thread" (via the daemon constructor argument or by assigning the daemon property), it will be implicitly joined before the program exits; otherwise, it is killed abruptly.
One thing to remember when writing multithreaded programs in Python is that they are of limited use, due to the infamous Global Interpreter Lock (GIL). In short, using threads won't make a CPU-intensive program any faster. They are useful only when you perform something that involves waiting (e.g., waiting in a thread for a certain file system event to happen).
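A tiny illustration of that point: threads overlap waiting time (time.sleep below stands in for blocking I/O), but they would not speed up pure computation.

import threading
import time

def wait_for_io():
    time.sleep(1)  # simulated blocking I/O

start = time.time()
threads = [threading.Thread(target=wait_for_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("4 one-second waits took %.1fs" % (time.time() - start))  # ~1s, not 4s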
The join part means the main program will wait for the thread to end before continuing. Without join, the main program will end and the thread will continue.
Now if you set the daemon parameter to True, the thread depends on the main program: it will end if the main program ends first.
Here is an example to understand this better:
import threading
import time

def do_something():
    time.sleep(2)
    print("do_something")
    return True

t = threading.Thread(target=do_something)
t.daemon = True  # without the daemon flag, the thread keeps running even after the main program ends
t.start()
t.join()  # with this, the main program waits until the thread ends
print("end of main program")
no daemon, no join:
end of main program
do_something
daemon only:
end of main program
join only:
do_something
end of main program
daemon and join:
do_something
end of main program
# Note: in this case the daemon flag has no effect
Without join(), non-daemon threads run concurrently with the main thread and complete in their own time.
Without join(), daemon threads also run concurrently with the main thread, but when the main thread completes, any daemon threads still running are terminated without being allowed to finish.
I have a client and a server module, each of which can be started by a function. I just need to find a way to run both in parallel which:
- in case of an exception in the client/server, would stop the other so the test runner does not stay stuck
- in case of an exception in the client/server, would print the exception or propagate it to the runner so I can see it and debug the client/server using the test suite
- would preferably use threads for performance reasons
My first attempt with simple threads ended with an ugly os._exit(1) when catching an exception in the thread's run method (which kills the test runner...). Edit: this was with the threading package.
My second attempt (to avoid os._exit()) was with concurrent.futures.ThreadPoolExecutor. It lets me get the exception out of the thread, but I still can't find a way to abort the other thread:
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    server_future = executor.submit(server)
    client_future = executor.submit(client)
    concurrent.futures.wait([server_future, client_future],
                            return_when=concurrent.futures.FIRST_EXCEPTION)
    if client_future.done() and client_future.exception():
        # we can handle the client exception here,
        # but how do we stop the server from waiting for the client?
        # also, raise is blocking
        pass
    if server_future.done() and server_future.exception():
        # same here
        pass
Is there a way to achieve this with threads?
If not with threads, is there a simple way to test a client server app at all? (I think the two first requirements are enough to have a usable solution)
Edit: The client or the server will be blocked on an accept() or a recv() call, so I can't periodically poll a flag and decide to exit (one of the classic methods for stopping a thread).
You can use the threading package. Be aware, though, that force-killing a thread is not a good idea, as discussed here. There seems to be no official way to kill a Thread in Python, but you can follow one of the examples given in the linked post.
Now you need to wait for one thread to exit before stopping the other one, so your test runner does not get stuck. You can wrap your server/client launches in Threads and have your main thread wait for either the client or the server Thread to exit before killing the other one.
You can define your client/server Thread like this:
import threading
import traceback

# Server thread (the client thread is analogous)
class testServerThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        # Do stuff if required

    def run(self):
        try:
            startServer()  # Or startClient() for your client thread
        except Exception:
            # Print the exception here, so you can debug
            traceback.print_exc()
Then, start both the client and server threads, and wait for one of them to exit. Once one of them is no longer alive, you can kill the other and continue testing.
# Create and start client/server
serverThread = testServerThread()
clientThread = testClientThread()
serverThread.start()
clientThread.start()

# Wait at most 5 seconds for them to exit, and loop if they're still both alive
while serverThread.is_alive() and clientThread.is_alive():
    serverThread.join(5)
    clientThread.join(5)

# Either the client or the server exited. Kill the other one.
# Note: the kill function you'll have to define yourself, as said above
if serverThread.is_alive():
    serverThread.kill()
if clientThread.is_alive():
    clientThread.kill()

# Done! Your test runner can continue its work
The central piece of code is the join() function:
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
So in our case, it will wait 5 seconds for the client and 5 seconds for the server, and if both of them are still alive afterward, it will loop again. Whenever one of them exits, the loop stops and the remaining thread is killed.
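For completeness, since the linked examples are not included here: one commonly cited (CPython-specific) way to implement the kill() method used above is to raise an exception asynchronously inside the thread. This is a sketch of a well-known hack, not an official API, and it will not interrupt a thread that is blocked in a system call such as accept() or recv():

import ctypes

def kill_thread(thread, exctype=SystemExit):
    # Ask the CPython interpreter to raise exctype inside the given thread.
    # The exception is only delivered when that thread next executes Python
    # bytecode, so blocking system calls are NOT interrupted.
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exctype))
    if res > 1:
        # more than one thread state was affected: undo and complain
        ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread.ident), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")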
I am creating a custom job scheduler with a web frontend in Python 3.4 on Linux. This program creates a daemon (consumer) thread that waits for jobs to become available in a PriorityQueue. These jobs can be added manually through the web interface, which puts them in the queue. When the consumer thread finds a job, it executes a program using subprocess.run and waits for it to finish.
The basic idea of the worker thread:
class Worker(threading.Thread):
    def __init__(self, queue):
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                proc = subprocess.run("myprogram", timeout=my_timeout)
                # do some more things
            except subprocess.TimeoutExpired:
                # do some administration
                self.queue.put(job)  # re-queue the job
However:
This consumer should be able to receive some kind of signal from the frontend (main thread) that it should stop the current job and instead work on the next job in the queue (saving the state of the current job and adding it to the end of the queue again). This can (and will most likely) happen while blocked on subprocess.run().
The subprocesses can simply be killed (the program that is executed saves some state in a file), but the worker thread needs to do some administration on the killed job to make sure it can be resumed later on.
There can be multiple such worker threads.
Signal handlers are not an option (since they are always handled by the main thread which is a webserver and should not be bothered with this).
Having an event loop in which the process actively polls for events (such as the child exiting, the timeout occurring or the interrupt event) is in this context not really a solution but an ugly hack. The jobs are performance-heavy and constant context switches are unwanted.
What synchronization primitives should I use to interrupt this thread or to make sure it waits for several events at the same time in a blocking fashion?
I think you've accidentally glossed over a simple solution: your second bullet point says that you have the ability to kill the programs that are running in subprocesses. Notice that subprocess.call returns the return code of the subprocess. This means that you can let the main thread kill the subprocess and just check the return code to see if you need to do any cleanup. Even better, you could use subprocess.check_call instead, which will raise an exception for you if the return code isn't 0. I don't know what platform you're working on, but on Linux, killed processes generally don't return 0.
It could look something like this:
class Worker(threading.Thread):
    def __init__(self, queue):
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                subprocess.check_call("myprogram", timeout=my_timeout)
                # do some more things
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                # do some administration
                self.queue.put(job)  # re-queue the job
Note that if you're using Python 3.5, you can use subprocess.run instead, and set the check argument to True.
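A sketch of that variant (command and timeout stand in for the question's "myprogram" and my_timeout):

import subprocess

def run_job(command, timeout):
    # run() with check=True raises CalledProcessError on a non-zero exit
    # status, mirroring check_call(); TimeoutExpired is raised as before
    try:
        subprocess.run(command, timeout=timeout, check=True)
        return True
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False  # the caller can re-queue the job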
If you have a strong need to handle the cases where the worker needs to be interrupted when it isn't running the subprocess, then I think you're going to have to use a polling loop, because I don't think the behavior you're looking for is supported for threads in Python. You can use a threading.Event object to pass the "stop working now" pseudo-signal from your main thread to the worker, and have the worker periodically check the state of that event object.
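A sketch of that polling fallback, assuming the worker can break its work into bounded chunks:

import threading

stop_requested = threading.Event()

def worker():
    while not stop_requested.is_set():
        # do one bounded chunk of work here, then wait a little;
        # wait() returns early as soon as the event is set
        stop_requested.wait(timeout=1.0)

t = threading.Thread(target=worker)
t.start()
stop_requested.set()  # the main thread asks the worker to stop
t.join()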
If you're willing to consider using multiple processes instead of threads, consider switching over to the multiprocessing module, which would allow you to handle signals. There is more overhead to spawning full-blown subprocesses instead of threads, but you're essentially looking for signal-like asynchronous behavior, and I don't think Python's threading library supports anything like that. One benefit, though, is that you would be freed from the Global Interpreter Lock, so you may actually see some speed benefits if your worker processes (formerly threads) are doing anything CPU-intensive.
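A minimal sketch of that idea on Linux (SIGUSR1 is POSIX-only, and the sleep stands in for the long-running job):

import multiprocessing
import os
import signal
import time

def worker():
    # unlike a thread, a process can install its own signal handlers
    def on_interrupt(signum, frame):
        raise KeyboardInterrupt

    signal.signal(signal.SIGUSR1, on_interrupt)
    try:
        time.sleep(60)  # stands in for the long-running job
    except KeyboardInterrupt:
        print("job interrupted, saving state so it can be resumed")

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    time.sleep(1)
    os.kill(p.pid, signal.SIGUSR1)  # the main process interrupts the worker
    p.join()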
I am writing a class that creates threads that timeout if not used within a certain time. The class allows you to pump data to a specific thread (by keyword), and if it doesn't exist it creates the thread.
Anywho, the problem I have is that the main supervisor class doesn't know when threads have ended. I can't use blocking code like join or poll to see if a thread is alive. What I want is an event handler that is called when a thread ends (or is just about to end), so that I can inform the supervisor that the thread is no longer active.
Is this something that can be done with signal or something similar?
As pseudocode, I'm looking for something like:
def myHandlerFunc():
    # inform the supervisor that the thread is dead

t1 = ThreadFunc()
t1.eventHandler(condition=thread_dies, handler=myHandlerFunc)
EDIT: Perhaps a better way would be to pass a reference to the parent down to the thread, and have the thread tell the parent class directly. I'm sure someone will tell me off for data-flow inversion.
EDIT: Here is some pseudocode:
class supervisor():
    def __init__:
        Setup thread dict with all threads as inactive
    def dispatch(target, message):
        if (target thread inactive):
            create new thread
        send message to thread
    def thread_timeout_handler():
        # Func is called asynchronously when a thread dies
        # Does some stuff over here

def ThreadFunc():
    while( !timeout ):
        wait for message:
            do stuff with message
    (Tell supervisor thread is closing?)
    return
The main point is that you send messages to the threads (referenced by keyword) through the supervisor. The supervisor makes sure the thread is alive (since threads time out after a while), creates a new one if it dies, and sends the data over.
Looking at this again, it's easy to avoid needing an event handler as I can just check if the thread is alive using threadObj.isAlive() instead of dynamically keeping a dict of thread statuses.
But out of curiosity, is it possible to get a handler to be called in the supervisor class by signals sent from the thread? The main App code would call the supervisor.dispatch() function once, then do other stuff. It would later be interrupted by the thread_timeout_handler function, as the thread had closed.
You still don't mention if you are using a message/event loop framework, which would provide a way for you to dispatch a call to the "main" thread and call an event handler.
Assuming you're not, then you can't just interrupt or call into the main thread.
You don't need to, though, as you only need to know if a thread is alive when you decide if you need to create a new one. You can do your checking at this time. This way, you only need a way to communicate the "finished" state between threads. There are a lot of ways to do this (I've never used .isAlive(), but you can pass information back in a Queue, Event, or even a shared variable).
Using Event it would look something like this:
class supervisor():
    def __init__:
        Setup thread dict with all threads as inactive
    def dispatch(target, message):
        if (thread.event.is_set()):
            create new thread
            thread.event = Event()
        send message to thread

def ThreadFunc(event):
    while( !timeout ):
        wait for message:
            do stuff with message
    event.set()
    return
Note that this way there is still a possible race condition: the supervisor thread might check is_set() right before the worker thread calls set(), which will lie about the thread's ability to do work. The same problem would exist with isAlive().
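For reference, a minimal runnable version of this pattern (the names are illustrative, and the race just described still applies):

import queue
import threading

def thread_func(inbox, done_event):
    while True:
        try:
            message = inbox.get(timeout=5)  # idle timeout
        except queue.Empty:
            break  # no work for a while: let the thread die
        print("working on", message)
    done_event.set()  # signal the supervisor that this worker is finished

class Supervisor:
    def __init__(self):
        self.threads = {}  # keyword -> (inbox, done_event)

    def dispatch(self, target, message):
        entry = self.threads.get(target)
        if entry is None or entry[1].is_set():
            # no live worker for this keyword: start a fresh one
            inbox, done = queue.Queue(), threading.Event()
            threading.Thread(target=thread_func, args=(inbox, done)).start()
            self.threads[target] = (inbox, done)
        self.threads[target][0].put(message)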
Is there a reason you don't just use a threadpool?
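For what it's worth, concurrent.futures gets you close to the event handler asked for: Future.add_done_callback fires when a submitted work item finishes, though the callback typically runs in the worker thread, not the main thread. A sketch:

from concurrent.futures import ThreadPoolExecutor

def do_work(message):
    print("handling", message)

def on_done(future):
    # called once the work item has finished or raised
    print("worker finished, exception:", future.exception())

executor = ThreadPoolExecutor(max_workers=4)
future = executor.submit(do_work, "hello")
future.add_done_callback(on_done)
executor.shutdown(wait=True)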
Yet the thread module works for me. How do I check whether a thread made by the thread module (_thread in Python 3) is running? When the function the thread is executing ends, does the thread end too, or not?
def __init__(self):
    self.thread = None
    ......
    if self.thread == None or not self.thread.isAlive():
        self.thread = thread.start_new_thread(self.dosomething, ())
    else:
        tkMessageBox.showwarning("XXXX", "There's no need to have more than two threads")
I know there is no function called isAlive() in the thread module; is there any alternative?
But there isn't any reason to use the threading module, is there?
Unless you really need the low-level capabilities of the internal thread (_thread in Python 3) module, you really should use the threading module instead. It makes everything easier to use and comes with helpers such as is_alive().
Btw, the alternative to restarting a thread like you do in your example code would be to keep it running but have it wait for additional jobs, as sketched below. E.g., you could have a queue somewhere which keeps track of all the jobs you want the thread to do, and the thread keeps working on them until the queue is empty; then it does not terminate but waits for new jobs to appear. Only at the end of the application do you signal the thread to stop waiting and terminate.
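A minimal sketch of that pattern (the job payloads are placeholders; None is used as a shutdown sentinel):

import queue
import threading

jobs = queue.Queue()

def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: time to shut down
            break
        print("processing", job)

t = threading.Thread(target=worker)
t.start()
jobs.put("job 1")
jobs.put("job 2")
jobs.put(None)  # at application exit, signal the worker to stop
t.join()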