Compress the streamed output with cherrypy - python

I am using cherrypy for a web server which is able to stream the output of some methods.
Server uses yield to send lines of data and client uses onprogress event of $.ajax method.
But enabling 'tools.gzip' config of cherrypy caused the output not to be cached by the client. In fact the onprogress event of client is not called unless the server method is finished completely. It seems the cherrypy compression tool is not able to compress the output in streaming mode (it can compress the output only when get it completely).
Now my first question is how to fix this problem. If it is not fixable my second question is how to diable the cherrypy compression for a specific method.

You have to enable the streaming capabilities of the request.
Set the following configuration:
{'response.stream': True}
The gzip tools inspect the current request and look for the stream and response accordingly.
For more information: http://docs.cherrypy.org/en/latest/advanced.html#streaming-the-response-body

Related

How to handle callbacks in Python 3?

I have a custom HTTP method/verb (lets say LISTEN) which allows me to listen for an update on a resource stored on a remote server. The API available for this has a blocking call which will get my client code to listen for an update till I interrupt the execution of that call. Just to provide an example, if I were to perform a curl as follows:
curl -X LISTEN http://<IP-Address>:<Port>/resource
The execution of this creates a blocking call, providing me updates on the resource whenever a new value for this resource is pushed to the server (similar to a pub-sub model), the response for that would look similar to this:
{"data":"value update 1","id":"id resource"}
{"data":"value update 2","id":"id resource"}
(...)
If I were to write code to handle this in Python, how do I call my url using this custom verb and handle the blocking call/call back while ensuring that this does not block the execution of the rest of my code?
If you're using Python requests lib with a custom HTTP verb and need to read stream content, you can do something like this:
import json
import requests # sudo pip3 install requests
url = "http://........."
r = requests.request('LISTEN', url, stream=True)
for line in r.iter_lines():
# filter out keep-alive new lines
if line:
decoded_line = line.decode('utf-8')
print(json.loads(decoded_line))
Note: by default all requests calls are blocking, so you need to run this code in a separate thread/process to avoid that.
...while ensuring that this does not block the execution of the rest of my code
Since you provided no details about your application, I will try to list some general thoughts on question.
Your task can be solved in many ways. Solution depends on your app architecture.
If this is a web server, you can take a look at tornado(see streaming callback) or aiohttp streaming examples.
On the other hand you can run the code above in a separate process and communicate with other applications/services using RabbitMQ for example (or other ipc mechanism).

python stream handler for multiple servers

There are servers running in multiple locations,I need to stream the application log data from these servers to a ZMQ(Zero Message Queue) using python stream handler.How do i use the stream handler to get this done? I have already referred the Python Handlers documentation https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler
You can post your logs from different servers as json to the ZMQ iteratively. For the ZMQ make a PyZMQ application which will have a message handler, listening to your incoming json from these servers. Then as per requirement the incoming json data can be processed and stored in a file (or wherever you want to store). This file can be read for the incoming logs ( eg: tail -f fileName.txt or fileName.log)
Here is link which will help you setup a PyZMQ application:
Designing and Testing PyZMQ Applications – Part 1
For logging specifically you can use these example:
A simple Python logging example
Logging, StreamHandler and standard streams

Timeout with Python Requests + Clojure HttpKit Server but not Ring server

I have some Ring routes which I'm running one of two ways.
lein ring server, with the lein-ring plugin
using org.httpkit.server, like (hs/run-server app {:port 3000}))
It's a web app (being consumed by an Angular.js browser client).
I have some API tests written in Python using the Requests library:
my_r = requests.post(MY_ROUTE,
data=MY_DATA,
headers={"Content-Type": "application/json"},
timeout=10)
When I use lein ring server, this request works fine in the JS client and the Python tests.
When I use httpkit, this works fine in the JS client but the Python client times out with
socket.timeout: timed out
I can't figure out why the Python client is timing out. It happens with httpkit but not with lein-ring, so I can only assume that the cause is related to the difference.
I've looked at the traffic in WireShark and both look like they give the correct response. Both have the same Content-Length field (15 bytes).
I've raised the number of threads to 10 (shouldn't need to) and no change.
Any ideas what's wrong?
I found how to fix this, but no satisfactory explanation.
I was using wrap-json-response Ring middleware to take a HashMap and convert it to JSON. I switched to doing my own conversion in my handler with json/write-str, and this fixes it.
At a guess it might be something to do with the server handling output buffering, but that's speculation.
I've combed through the Wireshark dumps and I can't see any relevant differences between the two. The sent Content-Length fields are identical. The 'bytes in flight' differ, at 518 and 524.
No clue as to why the web browser was happy with this but Python Requests wasn't, and whether or this is a bug in Requests, httpkit, ring-middleware-format or my own code.

Subprocess doesn't wait and makes PhantomJS crash

I have two scripts running, one on port :80 and one on port :81. Because some of our users are having issues with stuff happening on the server with port :81, I'm trying to implement a workaround like this;
Old way of doing it, which works fine for most users:
AngularJS app makes request to example.com:81/getpdf/1
Flask server generates PNG and PDF files using PhantomJS and ImageMagick using two separate subprocess.Popen calls and the .wait() method
Using Flask's send_file(), the PDF gets sent back to the user and starts downloading
My workaround for this issue:
AngularJS makes request to example.com/getpdf/1
Flask server (:80) makes a new GET request, r = requests.get(url_with_port_81), faking the old AngularJS request to create the PNG/PDF
Instead of using send_file(), I now return the path of the generated PDF
I return send_file(r.text)
Now, using my workaround, the subprocesses I run to create the PNG/PDFs somehow crash. I have to sudo pkill python, and only when I do so, I'm getting a PNG with no data in the folder on my server.
Basically, PhantomJS has run but hasn't loaded any data (only html/css, but no important stuff that needs to come from the Flask server) and crashes. How is this even possible? I'm just faking the request the browser makes using requests.get, or am I not aware of something here?
I thought subprocess.Popen is non-blocking, so my requests for data could still be answered to fill the PNG/PDFs?
I finally found the reason my subprocess kept crashing.
Apparently, it's a bug in Python < 2.7.3, described here: http://bugs.python.org/issue12786
I had to use 'close_fds=True' in my Popen call and all was fixed. Thanks for your effort either way, #Mark Hildreth!

Persistent HTTPS Connections in Python

I want to make an HTTPS request to a real-time stream and keep the connection open so that I can keep reading content from it and processing it.
I want to write the script in python. I am unsure how to keep the connection open in my script. I have tested the endpoint with curl which keeps the connection open successfully. But how do I do it in Python. Currently, I have the following code:
c = httplib.HTTPSConnection('userstream.twitter.com')
c.request("GET", "/2/user.json?" + req.to_postdata())
response = c.getresponse()
Where do I go from here?
Thanks!
It looks like your real-time stream is delivered as one endless HTTP GET response, yes? If so, you could just use python's built-in urllib2.urlopen(). It returns a file-like object, from which you can read as much as you want until the server hangs up on you.
f=urllib2.urlopen('https://encrypted.google.com/')
while True:
data = f.read(100)
print(data)
Keep in mind that although urllib2 speaks https, it doesn't validate server certificates, so you might want to try and add-on package like pycurl or urlgrabber for better security. (I'm not sure if urlgrabber supports https.)
Connection keep-alive features are not available in any of the python standard libraries for https. The most mature option is probably urllib3
httplib2 supports this. (I'd have thought this the most mature option, didn't know urllib3 yet, so TokenMacGuy may still be right)
EDIT: while httplib2 does support persistent connections, I don't think you can really consume streams with it (ie. one long response vs. multiple requests over the same connection), which I now realise you may need.

Categories