I'm using the amqp module with RabbitMQ. I needed to create a non-blocking consumer, so I used the threading module.
There is one problem: the threads must be stopped on app exit. Here is the relevant part of the code:
c.start_consuming(message_callback)
while not self._stop.isSet():
    if self._stop.isSet():
        print("thread will be stopped")
    else:
        print("thread will NOT BE STOPPED")
    c.channel.wait()
The problem is that c.channel.wait() sometimes blocks and sometimes doesn't, depending on whether there are messages in the queue it is listening on (I did some experiments, but they are not conclusive).
If the c.channel.wait() function accepted a timeout argument, I could achieve my goal by setting a small timeout, for example 0.1 seconds. As far as I can tell from the source code, there is no timeout option.
Main Question: How can I create a non-blocking consumer with the amqp module?
Sub Question 1: How can I patch the amqp code so that it accepts a timeout value?
Fallback solution: I may consider using the multiprocessing module so that I can kill that process at any time.
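For reference, recent py-amqp releases expose Connection.drain_events() with a timeout argument, which gives exactly the kind of loop described above without patching the library. A minimal sketch, assuming conn is the Connection object behind the c wrapper from the question:

import socket

c.start_consuming(message_callback)
while not self._stop.isSet():
    try:
        # drain_events() dispatches pending messages to the registered
        # callbacks; the timeout makes it return control periodically
        # so the stop flag can be re-checked.
        conn.drain_events(timeout=0.1)
    except socket.timeout:
        pass  # no message within 0.1s; loop back and re-check the flag
print("thread will be stopped")

If your version lacks that argument, setting a timeout on the underlying transport socket and catching socket.timeout around channel.wait() is a less invasive patch than modifying the library itself.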
Related
I have a Kafka consumer running in a thread in my Django application, and I want to apply some monitoring and alerting to that thread. How can I add thread monitoring (checking whether it is alive or dead) and raise an alert if the thread is dead?
I have tried monitoring by creating a scheduler which runs every 10 minutes and calls thread.is_alive(). The problem is that the scheduler runs in a different process and cannot access the main process's threads. How can I resolve this?
To monitor a thread in Python, you can use the built-in threading module, which provides a number of functions for working with threads. Here is an example of how you can use it to monitor a thread:
import threading
import time

def my_function():
    time.sleep(5)  # simulate some work

my_thread = threading.Thread(target=my_function)
my_thread.start()

if my_thread.is_alive():
    print("thread is still running")  # Do something
The threading module provides a number of other functions and methods that you can use to monitor and control threads, such as join() and the daemon attribute (the older isDaemon() and setDaemon() methods are deprecated aliases). For more information, you can read the official Python documentation for the threading module: https://docs.python.org/3/library/threading.html.
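For the cross-process problem in the question, one option is to move the monitoring into the same process as the consumer: a small daemon watchdog thread can call is_alive() directly. A sketch, where alert() is a hypothetical stand-in for whatever alerting hook you use:

import threading
import time

def watchdog(watched_thread, interval=600):
    # Runs in the same process as the consumer, so is_alive() is reliable.
    while True:
        if not watched_thread.is_alive():
            alert("kafka consumer thread died")  # hypothetical alerting hook
            break
        time.sleep(interval)  # check every 10 minutes

monitor = threading.Thread(target=watchdog, args=(my_thread,), daemon=True)
monitor.start()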
I have a client and a server module; each one can be started by a function. I just need to find a way to run both in parallel which:
in case of an exception in the client/server, would stop the other so the test runner would not stay stuck
in case of an exception in the client/server, would print the exception or propagate it to the runner so I could see it and debug the client/server using the test suite
would preferably use threads for performance reasons
The first attempt, with simple threads (using the threading package), ended with an ugly os._exit(1) when catching an exception in the thread's run method (which kills the test runner...).
The second attempt (to avoid os._exit()) was with concurrent.futures.ThreadPoolExecutor. It allows getting the exception out of the thread, but I still can't find a way to abort the other thread.
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    server_future = executor.submit(server)
    client_future = executor.submit(client)
    concurrent.futures.wait([server_future, client_future],
                            return_when=concurrent.futures.FIRST_EXCEPTION)
    if client_future.done() and client_future.exception():
        # we can handle the client exception here
        # but how to stop the server from waiting for the client?
        # also, raise is blocking
        pass
    if server_future.done() and server_future.exception():
        # same here
        pass
Is there a way to achieve this with threads?
If not with threads, is there a simple way to test a client-server app at all? (I think the first two requirements are enough to have a usable solution.)
Edit: The client or the server would be blocked on an accept() or a receive() call, so I can't periodically poll a flag and decide to exit (one of the classic methods of stopping a thread).
You can use the threading package. Be aware, though, that force-killing a thread is not a good idea, as discussed here. There seems to be no official way to kill a Thread in Python, but you can follow one of the examples given in the linked post.
Now, you need to wait for one thread to exit before stopping the other one, so your test runner doesn't get stuck. You can wrap your server/client launches in Threads and have your main thread wait for either the client or server Thread to exit before killing the other one.
You can define your client/server Thread like this:
# Server thread (replace startServer() with startClient() for the client thread)
class testServerThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        # Do stuff if required

    def run(self):
        try:
            startServer()  # Or startClient() for your client thread
        except Exception:
            # Print your exception here, so you can debug
            pass
Then, start both the client and server threads, and wait for one of them to exit. Once one of them is no longer alive, you can kill the other one and continue testing.
# Create and start client/server
serverThread = testServerThread()
clientThread = testClientThread()
serverThread.start()
clientThread.start()

# Wait at most 5 seconds for them to exit, and loop if they're still both alive
while serverThread.is_alive() and clientThread.is_alive():
    serverThread.join(5)
    clientThread.join(5)

# Either client or server exited. Kill the other one.
# Note: you'll have to define the kill function yourself, as said above
if serverThread.is_alive():
    serverThread.kill()
if clientThread.is_alive():
    clientThread.kill()

# Done! Your test runner can continue its work
The central piece of code is the join() function:
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception –, or until the optional timeout occurs.
So in our case, it will wait 5 seconds for the client and 5 seconds for the server, and if both of them are still alive afterward, it will loop again. Whenever one of them exits, the loop will stop, and the remaining thread will be killed.
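If you also need the "propagate it to the runner" requirement from the question, the run() override can stash the exception on the instance instead of only printing it, and the runner can re-raise it after the join loop. A sketch building on the class above:

import threading

class testServerThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.exc = None  # filled in if startServer() raises

    def run(self):
        try:
            startServer()
        except Exception as e:
            self.exc = e  # stash it for the test runner

# After the join/kill logic, surface the failure in the runner:
# if serverThread.exc is not None:
#     raise serverThread.exc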
I am creating a custom job scheduler with a web frontend in Python 3.4 on Linux. This program creates a daemon (consumer) thread that waits for jobs to become available in a PriorityQueue. These jobs can be added manually through the web interface, which puts them into the queue. When the consumer thread finds a job, it executes a program using subprocess.run and waits for it to finish.
The basic idea of the worker thread:
class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                proc = subprocess.run("myprogram", timeout=my_timeout)
                # do some more things
            except subprocess.TimeoutExpired:
                # do some administration
                self.queue.put(job)
However:
This consumer should be able to receive some kind of signal from the frontend (main thread) telling it to stop the current job and work on the next job in the queue instead (saving the state of the current job and adding it to the end of the queue again). This can (and most likely will) happen while blocked on subprocess.run().
The subprocesses can simply be killed (the program being executed saves some state in a file), but the worker thread needs to do some administration on the killed job to make sure it can be resumed later on.
There can be multiple such worker threads.
Signal handlers are not an option (since they are always handled by the main thread which is a webserver and should not be bothered with this).
Having an event loop in which the process actively polls for events (such as the child exiting, the timeout occurring, or the interrupt event) is not really a solution in this context but an ugly hack. The jobs are performance-heavy, and constant context switches are unwanted.
What synchronization primitives should I use to interrupt this thread or to make sure it waits for several events at the same time in a blocking fashion?
I think you've accidentally glossed over a simple solution: your second bullet point says that you can kill the programs running in the subprocesses. Notice that subprocess.call returns the return code of the subprocess. This means you can let the main thread kill the subprocess and just check the return code to see whether you need to do any cleanup. Even better, you could use subprocess.check_call instead, which raises an exception for you if the return code isn't 0. I don't know what platform you're working on, but on Linux, killed processes generally don't return 0.
It could look something like this:
class Worker(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue
        # more code here

    def run(self):
        while True:
            try:
                job = self.queue.get()
                # do some work
                subprocess.check_call("myprogram", timeout=my_timeout)
                # do some more things
            except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
                # do some administration
                self.queue.put(job)
Note that if you're using Python 3.5, you can use subprocess.run instead, and set the check argument to True.
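For example, a sketch reusing "myprogram" and my_timeout from the question's code:

import subprocess

# Python 3.5+: behaves like check_call, raising CalledProcessError
# on a non-zero exit status.
subprocess.run("myprogram", timeout=my_timeout, check=True)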
If you have a strong need to handle the cases where the worker needs to be interrupted when it isn't running the subprocess, then I think you're going to have to use a polling loop, because I don't think the behavior you're looking for is supported for threads in Python. You can use a threading.Event object to pass the "stop working now" pseudo-signal from your main thread to the worker, and have the worker periodically check the state of that event object.
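A minimal sketch of that polling pattern, where the job's work is split into hypothetical steps so the flag can be checked between them:

import threading

stop_event = threading.Event()  # shared "stop working now" pseudo-signal

def run_job(job):
    for step in job.steps:      # hypothetical: work split into chunks
        if stop_event.is_set():
            stop_event.clear()
            return False        # caller saves state and requeues the job
        step()
    return True

# Main thread, to interrupt the worker:
# stop_event.set()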
If you're willing to consider using multiple processes instead of threads, consider switching over to the multiprocessing module, which would allow you to handle signals. There is more overhead in spawning full-blown subprocesses instead of threads, but you're essentially looking for signal-like asynchronous behavior, and I don't think Python's threading library supports anything like that. One benefit, though, is that you would be freed from the Global Interpreter Lock (PDF link), so you may actually see some speed benefits if your worker processes (formerly threads) do anything CPU-intensive.
I am working on an implementation of a very small library in Python that has to be non-blocking.
In some production code, at some point, a call to this library will be made and it will need to do its own work; in its simplest form it would be a callable that needs to pass some information to a service.
This "passing information to a service" is a non-intensive task, probably sending some data to an HTTP service or something similar. It also doesn't need to be concurrent or to share information, however it does need to terminate at some point, possibly with a timeout.
I have used the threading module before and it seems the most appropriate thing to use, but the application where this library will be used is so big that I am worried about hitting the threading limit.
On local testing I was able to hit that limit at around ~2500 threads spawned.
There is a good possibility (given the size of the application) that I will hit that limit easily. It also makes me wary of using a Queue, given the memory implications of placing tasks into it at a high rate.
I have also looked at gevent, but I couldn't find an example of spawning something that would do some work and terminate without joining. The examples I went through were calling .join() on a spawned Greenlet or on an array of greenlets.
I don't need to know the result of the work being done! It just needs to fire off and try to talk to the HTTP service and die with a sensible timeout if it didn't.
Have I misinterpreted the guides/tutorials for gevent? Is there any other way to spawn a callable in a fully non-blocking fashion that won't hit the ~2500 limit?
This is a simple example using threading that works as I would expect:
from threading import Thread
import time

class Synchronizer(Thread):
    def __init__(self, number):
        self.number = number
        Thread.__init__(self)

    def run(self):
        # Simulating some work
        time.sleep(5)
        print(self.number)

for i in range(4000):  # totally doesn't get past 2,500
    sync = Synchronizer(i)
    sync.daemon = True
    sync.start()
    print("spawned a thread, number %s" % i)
And this is what I've tried with gevent, where it obviously blocks at the end to see what the workers did:

import gevent

def task(pid):
    """
    Some non-deterministic task
    """
    gevent.sleep(1)
    print('Task', pid, 'done')

for i in range(100):
    gevent.spawn(task, i)
EDIT:
My problem stemmed from my lack of familiarity with gevent. While the threading code was indeed spawning threads, it also prevented the script from terminating while it did some work.
gevent doesn't do that in the code above unless you add a .join(). All I had to do to see the gevent code do some work with the spawned greenlets was to make it a long-running process. This definitely fixes my problem, since the code that needs to spawn the greenlets runs within a framework that is itself a long-running process.
Nothing requires you to call join in gevent if you expect your main thread to last longer than any of your workers.
The only reason for the join call is to make sure the main thread lasts at least as long as all of the workers (so that the program doesn't terminate early).
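To make that concrete, the question's snippet only needs a join if the process might otherwise exit before the greenlets run; a sketch:

import gevent

def task(pid):
    gevent.sleep(1)
    print('Task', pid, 'done')

greenlets = [gevent.spawn(task, i) for i in range(100)]

# Only needed if the main thread could finish first; in a long-running
# process (like the asker's framework) this line can be dropped.
gevent.joinall(greenlets)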
Why not spawn a subprocess with a connected pipe or similar, and then, instead of a callable, just drop your data onto the pipe and let the subprocess handle it completely out of band?
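One way that could look, using multiprocessing with a pipe; post_with_timeout() is a hypothetical stand-in for the actual HTTP call:

import multiprocessing

def sender(pipe):
    # Long-lived child: receives payloads and talks to the HTTP
    # service completely out of band from the caller.
    while True:
        data = pipe.recv()
        post_with_timeout(data)  # hypothetical HTTP call with a timeout

parent_end, child_end = multiprocessing.Pipe()
worker = multiprocessing.Process(target=sender, args=(child_end,), daemon=True)
worker.start()

# The library call then reduces to a cheap, non-blocking send:
parent_end.send({"event": "user_registered"})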
As explained in Understanding Asynchronous/Multiprocessing in Python, the asyncoro framework supports asynchronous, concurrent processes. You can run tens or hundreds of thousands of concurrent processes; for reference, running 100,000 simple processes takes about 200 MB. If you want to, you can mix threads in the rest of the system with coroutines from asyncoro (provided the threads and coroutines don't share variables, but use coroutine interface functions to send messages, etc.).
I'm using Python in a webapp (CGI for testing, FastCGI for production) that needs to send an occasional email (when a user registers or something else important happens). Since communicating with an SMTP server takes a long time, I'd like to spawn a thread for the mail function so that the rest of the app can finish the request without waiting for the email to be sent.
I tried using thread.start_new(func, (args)), but the parent returns and exits before the sending is complete, thereby killing the sending thread before it does anything useful. Is there any way to keep the process alive long enough for the child thread to finish?
Take a look at the thread.join() method. Basically it will block your calling thread until the child thread has returned (thus preventing it from exiting before it should).
Update:
To avoid making your main thread unresponsive to new requests you can use a while loop.
import threading
import time

while threading.active_count() > 1:  # the main thread itself is counted
    # ... look for new requests to handle ...
    time.sleep(0.1)
    # or try joining your threads with a timeout:
    # for thread in my_threads:
    #     thread.join(0.1)
Update 2:
It also looks like thread.start_new(func, args) is obsolete; it was updated to thread.start_new_thread(function, args[, kwargs]). You can also create threads with the higher-level threading package (this is the package that provides the active_count() used in the previous code block):
import threading
my_thread = threading.Thread(target=func, args=(), kwargs={})
my_thread.daemon = True
my_thread.start()
You might want to use threading.enumerate if you have multiple workers and want to see which ones are still running.
Another alternative is threading.Event: the main thread sets the event and starts the worker thread off. The worker thread clears the event when it finishes its work, and the main thread checks whether the event is still set to figure out whether it can exit.
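A sketch of that Event pattern, with send_email() as a hypothetical stand-in for the mail function from the question:

import threading
import time

busy = threading.Event()

def mail_job():
    try:
        send_email()   # hypothetical: the slow SMTP work
    finally:
        busy.clear()   # unset the event so the main thread may exit

busy.set()                                 # main thread: mark work as pending
threading.Thread(target=mail_job).start()

# ... handle the rest of the request ...

while busy.is_set():                       # main thread: safe-exit check
    time.sleep(0.1)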