cherrypy check connection status - python

I have a cherrypy api that is intended to run for a long time on the server.
I have an unreliable client that can die or close connection for various reasons that are out of my control.
During the time my server api runs, I want to periodically check the status of the connection, making sure the client is still listening and abort my operation if the client has gone away.
I could not find any good place describing how to poll the connection status while serving a cherrypy request.
One example of such a long run is computing md5 of multiple big files (of tens of GBs) in chunks of small buffer (limited memory).
I don't need any solutions that shorten the runtime since that is not my goal here. I want to keep this connection open for as long as I can, but abort if it is closed.
Here is the simple sample of my code:
#cherrypy.expose
def foo(self):
cherrypy.response.headers['Content-Type'] = 'text/plain'
def run():
for result in get_results(): # get_results() is a heavy method mentioned
yield json.dumps(result)
return run()
foo._cp_config = {'response.stream': True}

The only reliable way to know that the client has died is to try to write some data to the socket, which for CherryPy can be done with yield. You must yield non-empty strings, so you'd have to be returning a Content-Type that can handle some filler text, like some extra spaces after the opening <head> tag of an HTML document. If the client closes the connection, the CherryPy server will stop requesting additional yielded data from the handler (and call any close method on the generator so you can clean up).

As far as I know, CherryPy doesn't provide you with any mechanism to detect that a client died. It will only tell you if a response took too long to complete (and therefore be sent out).
You may refer to this SO thread for more information.

Related

Python requests ignore data from stream

Is it possible to pretend reading the stream so I don't have to use a thread in order to keep the connection alive? If I don't read the stream, then the given websocket will be closed by the server. (How this works is roughly explained by the comments in the code)
My goal is to just receive messages from the websocket. I don't care about the stream itself.
Currently my code looks like this:
import requests
import threading
import websockets
def fake_read(response):
"""
if the server finds out that I'm not reading the data, the websocket will be closed.
This function is there to receive the data.
"""
for _ in response.iter_content():
pass # do nothing with the data.
def handle_websocket(wss_uri):
"""
using websockets lib, not relevant for this question. (ping_interval=None if somebody wants to connect to the server)
You can also use https://www.piesocket.com/websocket-tester to receive the messages.
"""
print(wss_url)
response = requests.get("https://stream.antenne.de/rockantenne", stream=True) # radio station stream. A cookie of the response contains the websocket URL
wss_addr = response.cookies["sm_websocket_url"]
threading.Thread(target=lambda: fake_read(response), daemon=True).start() # instantly start the thread
handle_websocket(wss_addr)
I want to avoid threads and, if possible, save up some bandwidth. Is there another way to handle streams or any alternatives?
I planned to look into aiohttp, since using async might be better here, but this doesn't eliminate all the problems I have, since there's a function needed to handle the stream and it's still downloading the data aswell.

Connection reset when server doesn't fully consume request body

I'm now familiar with the general cause of this problem from another SO answer and from the uWSGI documentation, which states:
If an HTTP request has a body (like a POST request generated by a
form), you have to read (consume) it in your application. If you do
not do this, the communication socket with your webserver may be
clobbered.
However, I don't understand what exactly is happening at the TCP level for this problem to occur. Not knowing the details of this process, I would assume the server can simply discard what remains in the stream, but that's obviously not the case.
If I consume only part of the request body in my application and ultimately return a 200 response, a web browser will report a connection reset error. Who reset the connection? The webserver or the client? It seems like all the data has been sent by the client already, but the application has just not exhausted the stream. Is there something that happens when the stream is exhausted in the application that triggers the webserver to indicate it has finished reading?
My application is Python/Flask, but I've seen questions about this from several languages and frameworks. For example, this fails if exhaust() is not called on the request stream:
#app.route('/upload', methods=['POST'])
def handle-upload():
file = request.stream
pandas.read_csv(file, nrows=100)
response = # Do stuff
file.exhaust()
return jsonify(response)
While there is some buffering throughout the chain, large file transfers are not going to complete until the receiver has consumed them. The buffers will fill up, and packets will be dropped until the buffers are drained. Eventually, the browser will give up trying to send the file and drop the connection.

Python Requests Not Cleaning up Connections and Causing Port Overflow?

I'm doing something fairly outside of my comfort zone here, so hopefully I'm just doing something stupid.
I have an Amazon EC2 instance which I'm using to run a specialized database, which is controlled through a webapp inside of Tomcat that provides a REST API. On the same server, I'm running a Python script that uses the Requests library to make hundreds of thousands of simple queries to the database (I don't think it's possible to consolidate the queries, though I am going to try that next.)
The problem: after running the script for a bit, I suddenly get a broken pipe error on my SSH terminal. When I try to log back in with SSH, I keep getting "operation timed out" errors. So I can't even log back in to terminate the Python process and instead have to reboot the EC2 instance (which is a huge pain, especially since I'm using ephemeral storage)
My theory is that each time requests makes a REST call, it activates a pair of ports between Python and Tomcat, but that it never closes the ports when it's done. So python keeps trying to grab more and more ports and eventually either somehow grabs away and locks the SSH port (booting me off), or it just uses all the ports and that causes the network system to crap out somehow (as I said, I'm out of my depth.)
I also tried using httplib2, and was getting a similar problem.
Any ideas? If my port theory is correct, is there a way to force requests to surrender the port when it's done? Or otherwise is there at least a way to tell Ubuntu to keep the SSH port off-limits so that I can at least log back in and terminate the process?
Or is there some sort of best practice to using Python to make lots and lots of very simple REST calls?
Edit:
Solved...do:
s = requests.session()
s.config['keep_alive'] = False
Before making the request to force Requests to release connections when it's done.
My speculation:
https://github.com/kennethreitz/requests/blob/develop/requests/models.py#L539 sets conn to connectionpool.connection_from_url(url)
That leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L562, which leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L167.
This eventually leads to https://github.com/kennethreitz/requests/blob/develop/requests/packages/urllib3/connectionpool.py#L185:
def _new_conn(self):
"""
Return a fresh :class:`httplib.HTTPConnection`.
"""
self.num_connections += 1
log.info("Starting new HTTP connection (%d): %s" %
(self.num_connections, self.host))
return HTTPConnection(host=self.host, port=self.port)
I would suggest hooking a handler up to that logger, and listening for lines that match that one. That would let you see how many connections are being created.
Figured it out...Requests has a default 'Keep Alive' policy on connections which you have to explicitly override by doing
s = requests.session()
s.config['keep_alive'] = False
before you make a request.
From the doc:
"""
Keep-Alive
Excellent news — thanks to urllib3, keep-alive is 100% automatic within a session! Any requests that you make within a session will automatically reuse the appropriate connection!
Note that connections are only released back to the pool for reuse once all body data has been read; be sure to either set prefetch to True or read the content property of the Response object.
If you’d like to disable keep-alive, you can simply set the keep_alive configuration to False:
s = requests.session()
s.config['keep_alive'] = False
"""
There may be a subtle bug in Requests here because I WAS reading the .text and .content properties and it was still not releasing the connections. But explicitly passing 'keep alive' as false fixed the problem.

How can i ignore server response to save bandwidth?

I am using a server to send some piece of information to another server every second. The problem is that the other server response is few kilobytes and this consumes the bandwidth on the first server ( about 2 GB in an hour ). I would like to send the request and ignore the return ( not even receive it to save bandwidth ) ..
I use a small python script for this task using (urllib). I don't mind using any other tool or even any other language if this is going to make the request only.
A 5K reply is small stuff and is probably below the standard TCP window size of your OS. This means that even if you close your network connection just after sending the request and checking just the very first bytes of the reply (to be sure that request has been really received) probably the server already sent you the whole answer and the packets are already on the wire or on your computer.
If you cannot control (i.e. trim down) what is the server reply for your notification the only alternative I can think to is to add another server on the remote machine waiting for a simple command and doing the real request locally and just sending back to you the result code. This can be done very easily may be even just with bash/perl/python using for example netcat/wget locally.
By the way there is something strange in your math as Glenn Maynard correctly wrote in a comment.
For HTTP, you can send a HEAD request instead of GET or POST:
import urllib2
request = urllib2.Request('https://stackoverflow.com/q/5049244/')
request.get_method = lambda: 'HEAD' # override get_method
response = urllib2.urlopen(request) # make request
print response.code, response.url
Output
200 https://stackoverflow.com/questions/5049244/how-can-i-ignore-server-response-t
o-save-bandwidth
See How do you send a HEAD HTTP request in Python?
Sorry but this does not make much sense and is likely a violation of the HTTP protocol. I consider such an idea as weird and broken-by-design. Either make the remote server shut up or configure your application or whatever is running on the remote server on a different protocol level using a smarter protocol with less bandwidth usage. Everything else is hard being considered as nonsense.

Making moves w/ websockets and python / django ( / twisted? )

The fun part of websockets is sending essentially unsolicited content from the server to the browser right?
Well, I'm using django-websocket by Gregor Müllegger. It's a really wonderful early crack at making websockets work in Django.
I have accomplished "hello world." The way this works is: when a request is a websocket, an object, websocket, is appended to the request object. Thus, I can, in the view interpreting the websocket, do something like:
request.websocket.send('We are the knights who say ni!')
That works fine. I get the message back in the browser like a charm.
But what if I want to do that without issuing a request from the browser at all?
OK, so first I save the websocket in the session dictionary:
request.session['websocket'] = request.websocket
Then, in a shell, I go and grab the session by session key. Sure enough, there's a websocket object in the session dictionary. Happy!
However, when I try to do:
>>> session.get_decoded()['websocket'].send('With a herring!')
I get:
Traceback (most recent call last):
File "<console>", line 1, in <module>
error: [Errno 9] Bad file descriptor
Sad. :-(
OK, so I don't know much of anything about sockets, but I know enough to sniff around in a debugger, and lo and behold, I see that the socket in my debugger (which is tied to the genuine websocket from the request) has fd=6, while the one that I grabbed from the session-saved websocket has fd=-1.
Can a socket-oriented person help me sort this stuff out?
I'm the author of django-websocket. I'm not a real expert in the topic of websockets and networking, however I think I have a decent understanding of whats going on. Sorry for going into great detail. Even if most of the answer isn't specific to your question it might help you at some other point. :-)
How websockets work
Let me explain shortly what a websocket is. A websocket starts as something that really looks like a plain HTTP request, established from the browser. It indicates through a HTTP header that it wants to "upgrade" the protocol to be a websocket instead of a HTTP request. If the server supports websockets, it agrees on the handshake and both - server and client - now know that they will use the established tcp socket formerly used for the HTTP request as a connection to interchange websocket messages.
Beside sending and waiting for messages, they have also of course the ability to close the connection at any time.
How django-websocket abuses the python's wsgi request environment to hijack the socket
Now lets get into the details of how django-websocket implements the "upgrading" of the HTTP request in a django request-response cylce.
Django usually uses the WSGI specification to talk to the webserver like apache or gunicorn etc. This specification was designed just with the very limited communication model of HTTP in mind. It assumes that it gets a HTTP request (only incoming data) and returns the response (only outgoing data). This makes it tricky to force django into the concept of a websocket where bidirectional communication is allowed.
What I'm doing in django-websocket to achieve this is that I dig very deeply into the internals of WSGI and django's request object to retrieve the underlaying socket. This tcp socket is then used to handle the upgrade the HTTP request to a websocket instance directly.
Now to your original question ...
I hope the above makes it obvious that when a websocket is established, there is no point in returning a HttpResponse. This is why you usually don't return anything in a view that is handled by django-websocket.
However I wanted to stick close to the concept of a view that holds the logic and returns data based on the input. This is why you should only use the code in your view to handle the websocket.
After you return from the view, the websocket is automatically closed. This is done for a reason: We don't want to keep the socket open for an undefined amount of time and relying on the client (the browser) to close it.
This is why you cannot access a websocket with django-websocket outside of your view. The file descriptor is then of course set to -1 indicating that its already closed.
Disclaimer
I explained above that I'm digging in the surrounding environment of django to get somehow -- in a very hackish way -- access to the underlaying socket. This is very fragile and also not supposed to work since WSGI is not designed for this! I also explained above that the websocket is closed after the view ends - however after the websocket closed down (AND closed the tcp socket), django's WSGI implementation tries to send a HTTP response - it doesn't know about websockets and thinks it is in a normal HTTP request-response cycle. But the socket is already closed an the sending will fail. This usually causes an exception in django.
This didn't affected my testings with the development server. The browser will never notice (you know .. the socket is already closed ;-) - but raising an unhandled error in every request is not a very good concept and may leak memory, doesn't handle database connection shutdown correctly and many athor things that will break at some point if you use django-websocket for more than experimenting.
This is why I would really advise you not to use websockets with django yet. It doesn't work by design. Django and especially WSGI would need a total overhaul to solve these problems (see this discussion for websockets and WSGI). Since then I would suggest using something like eventlet. Eventlet has a working websocket implementation (I borrowed some code from eventlet for the initial version of django-websocket) and since its just plain python code you can import your models and everything else from django. The only drawback is that you need a second webserver running just to handle websockets.
As Gregor Müllegger pointed out, Websockets can't be properly handled by WSGI, because that protocol never was designed to handle such a feature.
uWSGI, since version 1.9.11, can handle Websockets out of the box. Here uWSGI communicates with the application server using raw HTTP rather than the WSGI protocol. A server written that way, can therefore handle the protocol internals and keep the connection open over a long period. Having long living connections handled by a Django view is not a good idea either, because they then would block a worker thread, which is a limited resource.
The main purpose of Websockets, is to have the server push messages to the client in an asynchronous way. This can be a Django view triggered by other browsers (ex.: chat clients, multiplayer games), or an event triggered by, say django-celery (ex.: sport results). It therefore is fundamental for these Django services, to use a message queue for pushing messages to the client.
To handle this in a scalable way, I wrote django-websocket-redis, a Django module which can keep open all those long living Websocket connections in one single thread/process using Redis as the backend message queue.
You could give stargate a bash: http://boothead.github.com/stargate/ and http://pypi.python.org/pypi/stargate/.
It's built on top of pyramid and eventlet (I also contributed a fair bit of the websocket support and tests to eventlet). The big advantage of pyramid for this sort of thing is that it's got the concept of a resource which the url maps to, rather than just the result of a callable. So you end up with a graph of persistent resources that maps to your url structure and websocket connections are simply routed and connected to those resources.
So you end up only needing to do two things:
class YourView(WebSocketView):
def handler(self, websocket):
self.request.context.add_listener(websocket)
while True:
msg = websocket.wait()
# Do something with message
To receive messages
and
resource.send(some_other_message)
Here resource is an instance of a stargate.resource.WebSocketAwareContext (as is self.request.context) above and the send method sends the message to all clients connected with the add_listener method.
To publish a message to all of the connected clients you just call node.send(message)
I'm hopefully going to write up a little example app in the next week or two to demonstrate this a little better.
Feel free to ping me on github if you want some help with it.
request.websocket is probably get closed when you return from the request handler (view). The simple solution is to keep the handler alive (by not returning from the view). If your server is not multi-threaded you won't be able to accept any other simultaneous requests.

Categories