I'm porting one of my projects from C# and am having trouble solving a multithreading issue in Python. The problem involves a long-lived HTTP request that is expected to block: the server responds only when a certain event occurs. Here's the summary:
I send the request using urllib2 on a separate thread. When the request returns or times out, the main thread is notified. This works fine. However, there are cases where I need to abort this outstanding request and switch to a different URL. There are four solutions that I can consider:
Abort the outstanding request. C# has WebRequest.Abort(), which I can call cross-thread to abort the request. Python urllib2.Request appears to be a pure data class, in that instances only store request information; responses are not connected to Request objects. So I can't do this.
Interrupt the thread. C# has Thread.Interrupt(), which will raise a ThreadInterruptedException in the thread if it is in a wait state, or the next time it enters such a state. (Waiting on a monitor and file/socket I/O are both waiting states.) Python doesn't seem to have anything comparable; there does not appear to be a way to wake up a thread that is blocked on I/O.
Set a low timeout on the request. On a timeout, check an "aborted" flag. If it's false, restart the request.
Similar to option 3, add an "aborted" flag to the state object so that when the request does finally end in one way or another, the thread knows that the response is no longer needed and just shuts itself down.
Options 3 and 4 seem to be the only ones supported by Python, but option 3 is a horrible solution and 4 will keep open a connection I don't need. I am hoping to be a good netizen and close this connection when I no longer need it. Is there any way to actually abort the outstanding request, one way or another?
Consider using gevent. Gevent uses cooperatively scheduled units of execution called greenlets (they are not OS threads). Greenlets can "block" on I/O, which really means "go to sleep until the I/O is ready". You could have a requester greenlet that owns the socket and a main greenlet that decides when to abort. When you want to abort and switch URLs, the main greenlet kills the requester greenlet. The requester catches the resulting exception, closes its socket/urllib2 request, and starts over.
Edited to add: Gevent is not compatible with threads, so be careful with that. You'll have to either use gevent all the way or threads all the way. Threads in python are kinda lame anyway because of the GIL.
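A minimal sketch of that kill-and-catch pattern (the URLs and timings are stand-ins of mine; `gevent.sleep` stands in for the blocking HTTP call):

```python
import gevent

def requester(url):
    try:
        gevent.sleep(10)                 # stand-in for the blocking HTTP request
        return "response from %s" % url
    except gevent.GreenletExit:
        # kill() raises GreenletExit inside the greenlet while it is "blocked";
        # close the socket / request object here, then let the exception unwind.
        raise

g = gevent.spawn(requester, "http://example.com/a")
gevent.sleep(0.1)                        # main greenlet decides to abort...
g.kill(block=True)                       # ...which wakes and unwinds the requester
g2 = gevent.spawn(requester, "http://example.com/b")  # ...and switches URLs
```

The requester is woken immediately even though it was "blocked", which is exactly what the thread-based options above cannot do.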
Similar to Spike Gronim's answer, but even more heavy-handed.
Consider rewriting this in twisted. You probably would want to subclass twisted.web.http.HTTPClient, in particular implementing handleResponsePart to do your client interaction (or handleResponseEnd if you don't need to see it before the response ends). To close the connection early, you just call the loseConnection method on the client protocol.
Maybe this snippet of a "killable thread" could be useful to you if you have no other choice. But I agree with Spike Gronim and recommend using gevent.
I found this question using google and used Spike Gronim's answer to come up with:
from gevent import monkey
monkey.patch_all()

import gevent
import requests

def post(*args, **kwargs):
    # Optional threading.Event; setting it aborts the request.
    stop_event = kwargs.pop('stop_event', None)
    req = gevent.spawn(requests.post, *args, **kwargs)
    # Use ready() rather than checking req.value, so a request that raises
    # an exception does not leave this loop spinning forever.
    while not req.ready():
        req.join(timeout=0.1)
        if stop_event is not None and stop_event.is_set():
            req.kill()
            break
    return req.value
I thought it might be useful for other people as well.
It works just like a regular requests.post call, but takes an extra keyword argument, stop_event, which is a threading.Event. The request is aborted if stop_event gets set.
Use with caution: if the greenlet is not blocked waiting on the connection or on communication, kill() cannot interrupt it and it will hold the GIL (as mentioned). gevent does seem compatible with threading these days (through monkey patching).
I'm currently working on a Benchmark project, where I'm trying to stress the server out with zmq requests.
I was wondering what the best way to approach this would be. I was thinking of having a context create one socket per thread, and each thread would send requests and wait for responses on its own socket, but I'm not sure this is possible given Python's limitations.
Moreover, would it be the same socket for all threads? That is, if I'm waiting for a response on one thread (with its own socket), could another thread catch that response?
Thanks.
EDIT:
Test flow logic would be like this:
Client socket would use zmq.REQ.
Client sends message.
Client waits for a response.
If no response, client reconnects and tries again until limit.
I'd like to scale this operation up to any number of clients, preferring not to deal with processes unless the performance difference is significant.
How would you do this?
Q : "...can I have one context and use several sockets?"
Oh sure you can.
Moreover, you can have several Context()-instances, each one managing almost any number of Socket()-instances, where each Socket()-instance's methods may get called from one and only one python-thread (a Zen-of-Zero rule: zero-sharing).
Due to the known GIL-lock re-[SERIAL]-isation of all thread-based code-execution, each thread still has to wait to acquire GIL-lock ownership, which in turn permits the GIL-lock owner (and nobody else) to execute a small batch of python instructions before it re-releases the GIL-lock to some other thread...
In my (python) code I have a thread listening for changes from a couchdb feed (continuous changes). The changes request has a timeout parameter which is too big in certain circumstances (for example when a user wants to interrupt the program manually with ^C).
How can I abort a long-running blocking http request?
Is this possible, or do I need to reduce the timeout to make my program more responsive?
This would be unfortunate: a timeout small enough to make the program really responsive (say, 1 s) means lots of connections being created (one per second!), which defeats the purpose of listening for changes and makes it very difficult to ensure no changes are missed (changes can indeed arrive during the reconnection window, so special code is needed to handle that case).
The other option is to forcefully abort the thread, but that is not really an option in Python.
If I understand correctly, it looks like you are waiting too long between requests before deciding whether to respond to the user. You are right that continuously closing and creating new connections will defeat the purpose of the changes feed.
A solution could be to use the heartbeat query parameter, with which CouchDB will keep sending newlines to tell the client that the connection is still alive:
http://localhost:5984/hello/_changes?feed=continuous&heartbeat=1000&include_docs=true
As long as you are getting heartbeats (newlines) you can be sure that the connection is alive. A bare newline indicates that no changes have occurred, whereas an actual change is reported back as a line of JSON. There is no need to close the connection; respond to your clients whenever resp != "\n".
Blocking the thread's execution generally prevents the thread from being terminated; you need to wait until the request times out. But this is already clear.
Using a library that supports non-blocking requests may be a solution, but I don't know whether one exists.
Anyway... you mentioned that reducing the timeout will lead to more connections. I'd suggest implementing a waiting loop between requests that can be interrupted by an external signal to terminate the thread. With this loop you can control the number of requests independently of the timeout.
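A sketch of such an interruptible waiting loop (names are mine), using threading.Event.wait so the pause between requests ends the moment another thread asks the worker to stop:

```python
import threading

def poll_loop(make_request, stop, interval=30.0):
    # Issue short-timeout requests; between them, wait on the Event instead
    # of time.sleep, so that stop.set() interrupts the pause immediately.
    while not stop.is_set():
        make_request()
        if stop.wait(timeout=interval):   # returns True as soon as stop is set
            break
```

The thread running poll_loop terminates promptly after stop.set(), no matter how long interval is, so the polling rate and the shutdown latency are decoupled.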
I am familiar with evented servers but not threaded ones. A common feature of REST APIs implemented in evented systems like node.js+express or tornado is, in a handler, to do some I/O work asynchronously in a callback, then return the actual HTTP response from the callback. In Express, we have things like:
app.post('/products', function (req, res) {
  Product.create(req.body, function (err, product) {
    if (err) return res.json(400, err);
    res.send(201, product);
  });
});
where Product.create hits the database and calls the callback after the product is persisted. The response, whether 201 or 400, is sent in the callback. This keeps the server free to do other things while the database is working, but from the point of view of the client, the request seems to take a while.
Suppose I wanted to do the same thing in Flask (which is not an evented server). If I have a POST handler to create several objects that needs to make several database writes that could take several seconds to complete, it seems I have two choices:
I could immediately return a 202 ACCEPTED but then this burdens the client with having to check back to see whether all the writes were committed.
I could just implement all the database writes directly inside the handler. The client will have to wait the few seconds for the reply, but it's synchronous and simple from the client perspective.
My question is whether, if I go with #2, Flask is smart enough to block only the current request's thread, so that other requests can still be handled during the database writes. I would hope the whole server doesn't block here.
BTW, I have done long polling, but this is for a public REST API where clients expect simple requests and responses, so I think either approach 1 or 2 is best. Option 1 seems rare to me, but I am worried about #2 blocking the server. Am I right to be concerned, or are Flask (and threaded servers) smart enough that I need not worry?
Blocking vs. non-blocking
Flask itself (much like express) is not inherently blocking or non-blocking - it relies on the underlying container to provide the features necessary for operation (reading data from the user and writing responses to the user). If the server does not provide an event loop (e.g. mod_wsgi) then Flask will block. If the server is a non-blocking one (e.g. gunicorn) then Flask will not block.
On the other end of things, if the code that you write in your handlers is blocking Flask will block, even if it is run on a non-blocking container.
Consider the following:
app.post('/products', function (req, res) {
  var response = Product.createSync(req.body);
  // Event loop is blocked until the Product is created
  if (response.isError) return res.json(400, response.error);
  res.send(201, response.product);
});
If you run that on a node server you will quickly bring everything to a screeching halt. Even though node itself is non-blocking your code is not and it blocks the event loop preventing you from handling any other request from this node until the loop is yielded at res.json or res.send. Node's ecosystem makes it easy to find non-blocking IO libraries - in most other common environments you have to make a conscious choice to use non-blocking libraries for the IO you need to do.
Threaded servers and how they work
Most non-evented containers use multiple threads to manage the workload of a concurrent system. The container accepts requests in the main thread and then farms off the handling of the request and the serving of the response to one of its worker threads. The worker thread executes the (most often blocking) code necessary to handle the request and generate a response. While the handling code is running that thread is blocked and cannot take on any other work. If the request rate exceeds the total thread pool count then clients start backing up, waiting for a thread to complete.
What's the best thing to do with a long-running request in a threaded environment?
Knowing that blocking IO blocks one of your workers, the question now is "how many concurrent users are you expecting to have?" (Where concurrent means "occurring over the span of time it takes to accept and process one request".) If the answer is "less than the total number of threads in my worker thread pool" then you are golden - your server can handle the load, and its blocking nature is in no way a threat to stability. Choosing between #1 and #2 is largely a matter of taste.
On the other hand, if the answer to the above question is "more than the total number of workers in my thread pool" then you will need to handle the requests by passing off the user's data to another worker pool (generally via a queue of some kind) and responding to the request with a 202 (Option #1 in your list). That will enable you to lower the response time, which will, in turn, enable you to handle more users.
TL;DR
Flask is not blocking or non-blocking as it does no direct IO
Threaded servers block on the request / response handling thread, not the accept request thread
Depending on the expected traffic you will almost certainly want to go with option #1 (return a 202 and push the work into a queue to be handled by a different thread pool / evented solution).
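A sketch of option #1 under those assumptions (route name and queue wiring are mine, not from the question): the request thread only enqueues the payload and returns 202, while a separate worker thread does the slow writes.

```python
import queue
import threading
from flask import Flask, jsonify, request

app = Flask(__name__)
work_q = queue.Queue()
processed = []            # stand-in for "rows committed to the database"

def worker():
    while True:
        payload = work_q.get()
        # The slow database writes would happen here, off the request thread.
        processed.append(payload)
        work_q.task_done()

threading.Thread(target=worker, daemon=True).start()

@app.route("/products", methods=["POST"])
def create_product():
    work_q.put(request.get_json())
    # 202 Accepted: the work is queued, not yet committed
    return jsonify({"status": "accepted"}), 202
```

The request thread is freed almost instantly, so even a small thread pool keeps up; the cost is that clients must poll (or be notified) to learn when the writes actually land.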
I have read in this post that using ThreadingMixIn (from the SocketServer module) you are able to create a threaded server with BaseHTTPServer. I have tried it, and it does work. However, how can I stop active threads spawned by the server (for example, during a server shutdown)? Is this possible?
The simplest solution is to just use daemon_threads. The short version is: just set this to True, and don't worry about it; when you quit, any threads still working will stop automatically.
As the ThreadingMixIn docs say:
When inheriting from ThreadingMixIn for threaded connection behavior, you should explicitly declare how you want your threads to behave on an abrupt shutdown. The ThreadingMixIn class defines an attribute daemon_threads, which indicates whether or not the server should wait for thread termination. You should set the flag explicitly if you would like threads to behave autonomously; the default is False, meaning that Python will not exit until all threads created by ThreadingMixIn have exited.
Further details are available in the threading docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
Sometimes this isn't appropriate, because you want to shut down without quitting, or because your handlers may have cleanup that needs to be done. But when it is appropriate, you can't get any simpler.
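As a minimal stand-alone illustration of what the flag does (no HTTP server involved; the worker stands in for a handler stuck in a blocking call):

```python
import threading
import time

def worker():
    while True:            # never returns, like a handler blocked forever
        time.sleep(0.1)

t = threading.Thread(target=worker)
t.daemon = True            # what ThreadingMixIn's daemon_threads = True sets
t.start()
# Because the thread is a daemon, the interpreter may exit even though
# worker() never returns; a non-daemon thread would hang the process forever.
```

This is exactly the "don't worry about it" behavior: no cleanup, no negotiation, the thread simply dies with the process.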
If all you need is a way to shutdown without quitting, and don't need guaranteed cleanup, you may be able to use platform-specific thread-cancellation APIs via ctypes or win32api. This is generally a bad idea, but occasionally it's what you want.
If you need clean shutdown, you need to build your own machinery for that, where the threads cooperate. For example, you could create a global "quit flag" variable protected by a threading.Condition, and have your handle function check this periodically.
This is great if the threads are just doing slow, non-blocking work that you can break up into smaller pieces. For example, if the handle function always checks the quit flag at least once every 5 seconds, you can guarantee being able to shutdown the threads within 5 seconds. But what if the threads are doing blocking work—as they probably are, because the whole reason you used ThreadingMixIn was to let you make blocking calls instead of writing select loops or using asyncore or the like?
Well, there is no good answer. Obviously if you just need the shutdown to happen "eventually" rather than "within 5 seconds" (or if you're willing to abandon clean shutdown after 5 seconds, and revert to either using platform-specific APIs or daemonizing the threads), you can just put the checks before and after each blocking call, and it will "often" work. But if that's not good enough, there's really nothing you can do.
If you need this, the best answer is to change your architecture to use a framework that has ways to do this. The most popular choices are Twisted, Tornado, and gevent. In the future, PEP 3156 will bring similar functionality into the standard library, and there's a partly-complete reference implementation tulip that's worth playing with if you're not trying to build something for the real world that has to be ready soon.
Here's example code showing how to use threading.Event to shut down the server on any POST request:
import SocketServer
import BaseHTTPServer
import threading

quit_event = threading.Event()

class MyRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    """This handler fires the quit event on POST."""
    def do_GET(self):
        self.send_response(200)

    def do_POST(self):
        quit_event.set()
        self.send_response(200)

class MyThreadingHTTPServer(
        SocketServer.ThreadingMixIn, BaseHTTPServer.HTTPServer):
    pass

server = MyThreadingHTTPServer(('', 8080), MyRequestHandler)
threading.Thread(target=server.serve_forever).start()
quit_event.wait()
server.shutdown()
The server is shutdown cleanly, so you can immediately restart the server and the port is available rather than getting "Address already in use".
I'd like to do something like this:
twistedServer.start() # This would be a nonblocking call
while True:
    while twistedServer.haveMessage():
        message = twistedServer.getMessage()
        response = handleMessage(message)
        twistedServer.sendResponse(response)
    doSomeOtherLogic()
The key thing I want to do is run the server in a background thread. I'm hoping to do this with a thread instead of through multiprocessing/queue because I already have one layer of messaging for my app and I'd like to avoid two. I'm bringing this up because I can already see how to do this in a separate process, but what I'd like to know is how to do it in a thread, or if I can. Or if perhaps there is some other pattern I can use that accomplishes this same thing, like perhaps writing my own reactor.run method. Thanks for any help.
:)
The key thing I want to do is run the server in a background thread.
You don't explain why this is key, though. Generally, things like "use threads" are implementation details. Perhaps threads are appropriate, perhaps not, but the actual goal is agnostic on the point. What is your goal? To handle multiple clients concurrently? To handle messages of this sort simultaneously with events from another source (for example, a web server)? Without knowing the ultimate goal, there's no way to know if an implementation strategy I suggest will work or not.
With that in mind, here are two possibilities.
First, you could forget about threads. This would entail defining your event handling logic above as only the event handling parts. The part that tries to get an event would be delegated to another part of the application, probably something ultimately based on one of the reactor APIs (for example, you might set up a TCP server which accepts messages and turns them into the events you're processing, in which case you would start off with a call to reactor.listenTCP of some sort).
So your example might turn into something like this (with some added specificity to try to increase the instructive value):
from twisted.internet import reactor

class MessageReverser(object):
    """
    Accept messages, reverse them, and send them onwards.
    """
    def __init__(self, server):
        self.server = server

    def messageReceived(self, message):
        """
        Callback invoked whenever a message is received. This implementation
        will reverse and re-send the message.
        """
        self.server.sendMessage(message[::-1])
        doSomeOtherLogic()

def main():
    twistedServer = ...
    twistedServer.start(MessageReverser(twistedServer))
    reactor.run()

main()
Several points to note about this example:
I'm not sure how your twistedServer is defined. I'm imagining that it interfaces with the network in some way. Your version of the code would have had it receiving messages and buffering them until they were removed from the buffer by your loop for processing. This version would probably have no buffer, but instead just call the messageReceived method of the object passed to start as soon as a message arrives. You could still add buffering of some sort if you want, by putting it into the messageReceived method.
There is now a call to reactor.run which will block. You might instead write this code as a twistd plugin or a .tac file, in which case you wouldn't be directly responsible for starting the reactor. However, someone must start the reactor, or most APIs from Twisted won't do anything. reactor.run blocks, of course, until someone calls reactor.stop.
There are no threads used by this approach. Twisted's cooperative multitasking approach to concurrency means you can still do multiple things at once, as long as you're mindful to cooperate (which usually means returning to the reactor once in a while).
The exact times at which the doSomeOtherLogic function is called are changed slightly, because there's no longer a notion of "the buffer is empty for now" separate from "I just handled a message". You could change this so that the function is instead called once a second, or after every N messages, or whatever is appropriate.
The second possibility would be to really use threads. This might look very similar to the previous example, but you would call reactor.run in another thread, rather than the main thread. For example,
from Queue import Queue
from threading import Thread

from twisted.internet import reactor

class MessageQueuer(object):
    def __init__(self, queue):
        self.queue = queue

    def messageReceived(self, message):
        self.queue.put(message)

def main():
    queue = Queue()
    twistedServer = ...
    twistedServer.start(MessageQueuer(queue))

    Thread(target=reactor.run, args=(False,)).start()

    while True:
        message = queue.get()
        response = handleMessage(message)
        reactor.callFromThread(twistedServer.sendResponse, response)

main()
This version assumes a twistedServer which works similarly, but uses a thread to let you have the while True: loop. Note:
You must invoke reactor.run(False) if you use a thread, to prevent Twisted from trying to install any signal handlers, which Python only allows to be installed in the main thread. This means the Ctrl-C handling will be disabled and reactor.spawnProcess won't work reliably.
MessageQueuer has the same interface as MessageReverser, only its implementation of messageReceived is different. It uses the threadsafe Queue object to communicate between the reactor thread (in which it will be called) and your main thread where the while True: loop is running.
You must use reactor.callFromThread to send the message back to the reactor thread (assuming twistedServer.sendResponse is actually based on Twisted APIs). Twisted APIs are typically not threadsafe and must be called in the reactor thread. This is what reactor.callFromThread does for you.
You'll want to implement some way to stop the loop and the reactor, one supposes. The python process won't exit cleanly until after you call reactor.stop.
Note that while the threaded version gives you the familiar, desired while True loop, it doesn't actually do anything much better than the non-threaded version. It's just more complicated. So, consider whether you actually need threads, or if they're merely an implementation technique that can be exchanged for something else.