How to get exception occurring in a polling thread? - python

I am using the concurrent.futures module and my code looks like this:
from concurrent.futures import ThreadPoolExecutor

class SomeClass:
    def __init__(self):
        executor = ThreadPoolExecutor(max_workers=1)
        # submit the bound method itself, not the result of calling it
        executor.submit(self._start_polling)

    def _start_polling(self):
        while True:
            ...  # polling some stuff

    def get_stuff(self):
        ...  # returns some stuff
I will be calling get_stuff() on this class multiple times in my code, and I need the polling to make sure I always get the latest stuff (the stuff is changed from time to time by some other program). Now, if an exception occurs in the polling thread, how do I raise it in the main thread and stop the entire program? Currently, if there's an exception, the polling thread dies and get_stuff() returns stale data.
I tried getting the future object, but if I use it in any way, like future.exception(), it just blocks the main thread on it. Any advice would be much appreciated.
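For what it's worth, one non-blocking way to surface the poller's exception is to keep the future around and check it before serving data; this is only a minimal sketch of that idea (my illustration, not code from the post), assuming get_stuff() is called from the main thread:
from concurrent.futures import ThreadPoolExecutor

class SomeClass:
    def __init__(self):
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._poll_future = self._executor.submit(self._start_polling)

    def _start_polling(self):
        while True:
            ...  # polling some stuff

    def get_stuff(self):
        # done() returns immediately; if the polling thread has died,
        # result() re-raises its exception here, in the calling thread
        if self._poll_future.done():
            self._poll_future.result()
        ...  # return the latest stuff as before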
Edit:
I first looked into asyncio for this, but after reading about it a bunch, it looks like asyncio is not really suited to running background tasks like this. Correct me if I am wrong.

Related

Is resetting Python thread/process tasks by calling self.run() pythonic?

Regarding the code below for the process class MyProcessClass: sometimes I want to rerun all of the self.run tasks.
self.run(retry=True) is what I use to rerun the run(self) tasks within the class. It allows me to rerun the tasks of the process class run(self) whenever I want to, from wherever I want to, from any class function.
from multiprocessing import Process
import sys

class MyProcessClass(Process):
    def __init__(self):
        Process.__init__(self)
        # run() gets called automatically on class initialization via
        # process.start(); it also gets called when a class function
        # calls self.run(retry=True)

    def run(self, end=False, retry=False):
        if end:
            sys.exit()
        elif retry:
            self.redo_prep()
        self.do_stuff()

    def do_stuff(self):
        # represents class functions doing stuff
        # stuff happens well:
        return
        # stuff happens and everything needs to be redone:
        self.run(retry=True)
I don't want the thread/process to end, but I want everything to rerun. Could this cause problems, since the run function is being called recursively-ish and I am running hundreds of these process class objects at one time? The box hits about 32GB of memory when all are running. Only the objects that need to be rerun will be.
My goal is to rerun the self.run tasks if needed or end the thread if needed from anywhere in the class, be it 16 functions deep or 2. In a sense, I am resetting the thread's tasks, since I know resetting the thread from within doesn't work. I have seen other ideas regarding "resetting" threads from How to close a thread from within?. I am looking for the most pythonic way of dealing with rerunning class self.run tasks.
I usually use try/except throughout the class:
def function():
    while True:
        try:
            ...  # something bad
        except Exception as e:
            # if throttled, just wait and loop again;
            # otherwise, raise
            pass
        else:
            return
Additional Question: If I were to raise a custom exception to trigger a #retry for the retries module, would I have to re-raise? Is that more or less pythonic than the example above?
My script had crapped out in a way I hadn't seen before, and I worried that calling self.run(retry=True) had caused it to do this. I am trying to see if there is anything crazy about the way I am calling self.run() within the process class.
It looks like you're implementing a rudimentary retrying scenario. You should consider delegating this to a library built for that purpose, such as retrying. That will probably be a better approach than the logic you're trying to implement within the thread to 'reset' it.
By raising/retrying on specific exceptions, you should be able to implement the proper error-handling logic cleanly with retrying. As a best-practice, you should avoid broad excepts and catch specific exceptions whenever possible.
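As a rough illustration only (the exception class and timing values below are made up, not taken from the question), retrying's @retry decorator can express the throttle-and-retry logic declaratively:
from retrying import retry

class ThrottleError(Exception):
    """Hypothetical exception raised when the remote end throttles us."""

def _retry_on_throttle(exception):
    # only retry on throttling; let everything else propagate
    return isinstance(exception, ThrottleError)

@retry(retry_on_exception=_retry_on_throttle,
       wait_fixed=2000,               # wait 2 seconds between attempts
       stop_max_attempt_number=5)     # give up after 5 tries
def do_stuff():
    ...  # the work that may raise ThrottleError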
Consider a pattern whereby the thread itself does not need to know if it will need to be 'reset' or restarted. Instead, if possible, try to have your thread return some value or exception info so the main thread can decide whether to re-queue a task.
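A minimal sketch of that pattern (the names here are illustrative, not from the question): the worker catches its own exceptions and hands back either a result or the error, and the calling code decides whether to re-queue the task.
def worker(task):
    # the worker never tries to 'reset' itself; it only reports what happened
    try:
        return ("ok", do_stuff(task))   # do_stuff stands in for the real work
    except Exception as exc:
        return ("error", exc)

def main_loop(tasks, max_retries=3):
    pending = [(task, 0) for task in tasks]
    while pending:
        task, attempts = pending.pop()
        status, payload = worker(task)  # or submit worker to a pool and collect results
        if status == "error" and attempts < max_retries:
            pending.append((task, attempts + 1))  # the caller decides to re-queue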

Saving and retrieving thread object/state python

I am using the Python threading module for the first time, and this is what I am trying to find out: can we save and load the state of a Python thread object in some database or file? Let's say I have the piece of code below in a program (this program runs like a server using Python Twisted) to do something asynchronously (like copying files in a non-blocking fashion).
import threading

def foo():
    print('Hello world!')

thr = threading.Thread(target=foo, args=(), kwargs={})
thr.start()
I don't do thr.join() here, as I don't want to wait for the thread to complete. Now, is it even possible to save the state of the thread or the thread object and later retrieve it to find out whether the thread is still alive? I can get the thread id and status like this, but only through the thread object:
# thr.ident
# check status of thread by doing thr.is_alive()
I may be completely wrong here - but is it possible to save/load thread objects? Also, looking for suggestions. Thanks a lot!
is it possible to save/load thread objects?
No, it is not.

Twisted callRemote

I have to make remote calls that can take quite a long time (over 60 seconds). Our entire code relies on processing the return value from callRemote, so that's pretty bad, since we're blocking on IO the whole time despite using Twisted + 50 worker threads.
We currently use something like
result = threads.blockingCallFromThread(reactor, callRemote, "method", args)
and get the result/go on, but as its name says it's blocking the event loop so we cannot wait for several results at the same time.
There's no way I can refactor the whole code to make it asynchronous, so I think the only way is to defer the long IO tasks to threads.
I'm trying to make the remote calls in threads, but I can't find a way to get the result from the blocking calls back. The remoteCalls are made, the result is somewhere but I just can't get a hook on it.
What I'm trying to do currently looks like
reactor.callInThread(callRemote, name, *args, **kw)
which returns an empty Deferred (why?).
I'm trying to put the result in some sort of queue but it just won't work. How do I do that ?
AFAIK, blockingCallFromThread executes code in the reactor's thread. That's why it doesn't work as you need.
If I understand you properly, you need to move some operation out of the reactor's thread and get the result back into the reactor's thread.
I use an approach with deferToThread for the same case.
Example with deferreds:
import time
from twisted.internet import reactor, threads

def doLongCalculation():
    time.sleep(1)
    return 3

def printResult(x):
    print(x)

# run the method in a thread and get the result as a defer.Deferred
d = threads.deferToThread(doLongCalculation)
d.addCallback(printResult)
reactor.run()
Also, you might be interested in threads.deferToThreadPool.
Documentation about threading in Twisted.
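The original complaint was about not being able to wait for several results at the same time. As a rough extension of the example above (my addition, reusing the doLongCalculation and printResult names, not part of the original answer), several deferToThread calls can run concurrently and be collected with DeferredList:
from twisted.internet import defer, threads

def gatherLongCalculations():
    # fire off several blocking calls at once, each in its own thread
    ds = [threads.deferToThread(doLongCalculation) for _ in range(5)]
    # DeferredList fires once all of them have finished
    dl = defer.DeferredList(ds, consumeErrors=True)
    dl.addCallback(printResult)  # receives a list of (success, result) pairs
    return dl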

Access a Python Code That is Running

I apologize if this isn't the correct way to word it, but I'm not sure where to start. If this question needs to be reworded, I will definitely do that.
I have just finished writing a piece of code that is collecting data from a variety of servers. It is currently running, and I would like to be able to start writing other pieces of code that can access the data being collected. Obviously I can do this by dumping the data into files, and have my data analysis code read the files stored on disk. However, for some forms of my analysis I would like to have something closer to real time data. Is there a way for me to access the class from my data collection piece of code without explicitly instantiating it? I mean, can I set up one piece of code to start the data collection, and then write other pieces of code later that are able to access the data collection class without stopping and restarting the data collection piece of code?
I hope that makes sense. I realize the data can just be stored to disk, and I could do things like just have my data analysis code search directories for changes. However, I am just curious to know if something like this can be done.
This seems to be a Producer-Consumer problem.
The producer's job is to generate a piece of data, put it into the buffer and start again. At the same time, the consumer is consuming the data (i.e., removing it from the buffer) one piece at a time.
The catch here is "At the same time". So, the producer and consumer need to run concurrently. Hence we need separate threads for the Producer and the Consumer.
I am taking code from the above link; you should go through it for extra details.
from threading import Thread
from queue import Queue
import time
import random

queue = Queue(10)

class ProducerThread(Thread):
    def run(self):
        nums = range(5)
        global queue
        while True:
            num = random.choice(nums)
            queue.put(num)
            print("Produced", num)
            time.sleep(random.random())

class ConsumerThread(Thread):
    def run(self):
        global queue
        while True:
            num = queue.get()
            queue.task_done()
            print("Consumed", num)
            time.sleep(random.random())

ProducerThread().start()
ConsumerThread().start()
Explanation:
We are using a Queue instance (hereafter queue). Queue has a Condition, and that condition has its lock. You don't need to bother about the Condition and Lock if you use Queue.
The producer uses put(), available on queue, to insert data into the queue. put() has the logic to acquire the lock before inserting data into the queue.
Also, put() checks whether the queue is full. If so, it calls wait() internally, and so the producer starts waiting.
The consumer uses get().
get() acquires the lock before removing data from the queue.
get() checks whether the queue is empty. If so, it puts the consumer into a waiting state.
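One gap worth noting in the sketch above is shutdown: both threads loop forever. A common pattern (my addition, not part of the quoted answer) is to push a sentinel value so the consumer knows when to stop:
STOP = object()  # sentinel marking the end of the stream

def consumer_loop(queue):
    while True:
        item = queue.get()
        if item is STOP:
            queue.task_done()
            break
        handle(item)        # handle() stands in for the real analysis code
        queue.task_done()

# on the producer side, once data collection is finished:
# queue.put(STOP)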

Pattern for a background Twisted server that fills an incoming message queue and empties an outgoing message queue?

I'd like to do something like this:
twistedServer.start()  # This would be a nonblocking call
while True:
    while twistedServer.haveMessage():
        message = twistedServer.getMessage()
        response = handleMessage(message)
        twistedServer.sendResponse(response)
    doSomeOtherLogic()
The key thing I want to do is run the server in a background thread. I'm hoping to do this with a thread instead of through multiprocessing/queue because I already have one layer of messaging for my app and I'd like to avoid two. I'm bringing this up because I can already see how to do this in a separate process, but what I'd like to know is how to do it in a thread, or if I can. Or if perhaps there is some other pattern I can use that accomplishes this same thing, like perhaps writing my own reactor.run method. Thanks for any help.
:)
The key thing I want to do is run the server in a background thread.
You don't explain why this is key, though. Generally, things like "use threads" are implementation details. Perhaps threads are appropriate, perhaps not, but the actual goal is agnostic on the point. What is your goal? To handle multiple clients concurrently? To handle messages of this sort simultaneously with events from another source (for example, a web server)? Without knowing the ultimate goal, there's no way to know if an implementation strategy I suggest will work or not.
With that in mind, here are two possibilities.
First, you could forget about threads. This would entail defining your event handling logic above as only the event handling parts. The part that tries to get an event would be delegated to another part of the application, probably something ultimately based on one of the reactor APIs (for example, you might set up a TCP server which accepts messages and turns them into the events you're processing, in which case you would start off with a call to reactor.listenTCP of some sort).
So your example might turn into something like this (with some added specificity to try to increase the instructive value):
from twisted.internet import reactor

class MessageReverser(object):
    """
    Accept messages, reverse them, and send them onwards.
    """
    def __init__(self, server):
        self.server = server

    def messageReceived(self, message):
        """
        Callback invoked whenever a message is received. This implementation
        will reverse and re-send the message.
        """
        self.server.sendMessage(message[::-1])
        doSomeOtherLogic()

def main():
    twistedServer = ...
    twistedServer.start(MessageReverser(twistedServer))
    reactor.run()

main()
Several points to note about this example:
I'm not sure how your twistedServer is defined. I'm imagining that it interfaces with the network in some way. Your version of the code would have had it receiving messages and buffering them until they were removed from the buffer by your loop for processing. This version would probably have no buffer, but instead just call the messageReceived method of the object passed to start as soon as a message arrives. You could still add buffering of some sort if you want, by putting it into the messageReceived method.
There is now a call to reactor.run which will block. You might instead write this code as a twistd plugin or a .tac file, in which case you wouldn't be directly responsible for starting the reactor. However, someone must start the reactor, or most APIs from Twisted won't do anything. reactor.run blocks, of course, until someone calls reactor.stop.
There are no threads used by this approach. Twisted's cooperative multitasking approach to concurrency means you can still do multiple things at once, as long as you're mindful to cooperate (which usually means returning to the reactor once in a while).
The exact times the doSomeOtherLogic function is called are changed slightly, because there's no notion of "the buffer is empty for now" separate from "I just handled a message". You could change this so that the function is called once a second, or after every N messages, or whatever is appropriate (see the LoopingCall sketch just below).
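As a small aside (my addition, not part of the original answer, reusing the placeholder names from the example above), Twisted's LoopingCall is a ready-made way to get the once-a-second variant without leaving the reactor thread:
from twisted.internet import reactor
from twisted.internet.task import LoopingCall

def main():
    twistedServer = ...
    twistedServer.start(MessageReverser(twistedServer))
    # run doSomeOtherLogic every second, cooperatively, in the reactor thread
    LoopingCall(doSomeOtherLogic).start(1.0)
    reactor.run()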
The second possibility would be to really use threads. This might look very similar to the previous example, but you would call reactor.run in another thread, rather than the main thread. For example,
from queue import Queue
from threading import Thread
from twisted.internet import reactor

class MessageQueuer(object):
    def __init__(self, queue):
        self.queue = queue

    def messageReceived(self, message):
        self.queue.put(message)

def main():
    queue = Queue()
    twistedServer = ...
    twistedServer.start(MessageQueuer(queue))
    Thread(target=reactor.run, args=(False,)).start()
    while True:
        message = queue.get()
        response = handleMessage(message)
        reactor.callFromThread(twistedServer.sendResponse, response)

main()
This version assumes a twistedServer which works similarly, but uses a thread to let you have the while True: loop. Note:
You must invoke reactor.run(False) if you use a thread, to prevent Twisted from trying to install any signal handlers, which Python only allows to be installed in the main thread. This means the Ctrl-C handling will be disabled and reactor.spawnProcess won't work reliably.
MessageQueuer has the same interface as MessageReverser, only its implementation of messageReceived is different. It uses the threadsafe Queue object to communicate between the reactor thread (in which it will be called) and your main thread where the while True: loop is running.
You must use reactor.callFromThread to send the message back to the reactor thread (assuming twistedServer.sendResponse is actually based on Twisted APIs). Twisted APIs are typically not threadsafe and must be called in the reactor thread. This is what reactor.callFromThread does for you.
You'll want to implement some way to stop the loop and the reactor, one supposes. The Python process won't exit cleanly until after you call reactor.stop().
Note that while the threaded version gives you the familiar, desired while True loop, it doesn't actually do anything much better than the non-threaded version. It's just more complicated. So, consider whether you actually need threads, or if they're merely an implementation technique that can be exchanged for something else.
