Persistent HTTPS Connections in Python

I want to make an HTTPS request to a real-time stream and keep the connection open so that I can keep reading content from it and processing it.
I want to write the script in Python, but I am unsure how to keep the connection open in my script. I have tested the endpoint with curl, which keeps the connection open successfully. But how do I do it in Python? Currently, I have the following code:
import httplib

c = httplib.HTTPSConnection('userstream.twitter.com')
c.request("GET", "/2/user.json?" + req.to_postdata())
response = c.getresponse()
Where do I go from here?
Thanks!

It looks like your real-time stream is delivered as one endless HTTP GET response, yes? If so, you could just use Python's built-in urllib2.urlopen(). It returns a file-like object, from which you can read as much as you want until the server hangs up on you.
import urllib2

f = urllib2.urlopen('https://encrypted.google.com/')
while True:
    data = f.read(100)
    if not data:
        break  # the server hung up on us
    print(data)
Keep in mind that although urllib2 speaks https, it doesn't validate server certificates, so you might want to try an add-on package like pycurl or urlgrabber for better security. (I'm not sure if urlgrabber supports https.)

Connection keep-alive features are not available in any of the Python standard libraries for https. The most mature option is probably urllib3.
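For example, a minimal sketch with urllib3 (assuming a reasonably recent version; the URL is the one from the question). The PoolManager keeps the underlying connection alive, and preload_content=False lets you consume the body incrementally:
import urllib3

http = urllib3.PoolManager()
r = http.request('GET', 'https://userstream.twitter.com/2/user.json',
                 preload_content=False)
for chunk in r.stream(1024):
    print(chunk)
r.release_conn()  # return the connection to the pool once the server hangs up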

httplib2 supports this. (I'd have thought it the most mature option, but I didn't know about urllib3 yet, so TokenMacGuy may still be right.)
EDIT: while httplib2 does support persistent connections, I don't think you can really consume streams with it (i.e. one long response vs. multiple requests over the same connection), which I now realise you may need.
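For reference, a minimal httplib2 sketch (the URL is a placeholder): requests issued through the same Http object reuse the underlying connection via keep-alive:
import httplib2

h = httplib2.Http()
resp, content = h.request('https://www.example.com/')      # opens a connection
resp, content = h.request('https://www.example.com/other') # reuses it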

Related

Using Python urllib2, How can I stream between a GET and a POST?

I want to write code to transfer a file from one site to another. This can be a large file, and I'd like to do it without creating a local temporary file.
I saw the trick of using mmap to upload a large file in Python: "HTTP Post a large file with streaming", but what I really need is a way to link up the response from the GET to creating the POST.
Anyone done this before?
You can't, or at least shouldn't.
urllib2 request objects have no way to stream data into them on the fly, period. And in the other direction, response objects are file-like objects, so in theory you can read(8192) out of them instead of read(), but for most protocols—including HTTP—it will either often or always read the whole response into memory and serve your read(8192) calls out of its buffer, making it pointless. So, you have to intercept the request, steal the socket out of it, and deal with it manually, at which point urllib2 is getting in your way more than it's helping.
urllib2 makes some things easy, some things much harder than they should be, and some things next to impossible; when it isn't making things easy, stop using it.
One solution is to use a higher-level third-party library. For example, requests gets you half-way there (it makes it very easy to stream from a response, but can only stream into a request in limited situations), and requests-toolbelt gets you the rest of the way there (it adds various ways to stream-upload).
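As a rough illustration of that split (URLs are placeholders): with stream=True, requests hands you the response incrementally, and in simple cases it accepts a generator as the request body and streams it out as a chunked upload:
import requests

getresp = requests.get('http://www.example.com/spam', stream=True)
# passing a generator as `data` makes requests send a chunked upload
requests.post('http://www.example.com/eggs',
              data=getresp.iter_content(8192))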
The other solution is to use a lower-level library. And here, you don't even have to leave the stdlib. httplib forces you to think in terms of sending and receiving things bit by bit, but that's exactly what you want. On the GET request, you can just call connect and request, and then call read(8192) repeatedly on the response object. On the POST request, you call connect, putrequest, putheader, endheaders, then repeatedly send each buffer from the GET request, then getresponse when you're done.
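A rough Python 2 sketch of that bit-by-bit relay (the hostnames, paths, and the Content-Length handling here are placeholder assumptions, not a definitive implementation):
import httplib

getconn = httplib.HTTPConnection('source.example.com')
getconn.request('GET', '/spam')
getresp = getconn.getresponse()

postconn = httplib.HTTPConnection('dest.example.com')
postconn.putrequest('POST', '/eggs')
postconn.putheader('Content-Length', getresp.getheader('Content-Length'))
postconn.endheaders()
while True:
    buf = getresp.read(8192)
    if not buf:
        break
    postconn.send(buf)
postresp = postconn.getresponse()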
In fact, in Python 3.2+'s http.client (the equivalent of 2.x's httplib), the body you pass to HTTPConnection.request doesn't have to be a string; it can be any iterable, or any file-like object with read and fileno methods… which includes a response object. So, it's this simple:
import http.client

getconn = http.client.HTTPConnection('www.example.com')
getconn.request('GET', '/spam')
getresp = getconn.getresponse()

postconn = http.client.HTTPConnection('www.example.com')
postconn.request('POST', '/eggs', body=getresp)  # streams the GET response in as the POST body
postresp = postconn.getresponse()
… except, of course, that you probably want to craft appropriate headers (you can actually use urllib.request, the 3.x version of urllib2, to build a Request object and not send it…), and pull the host and port out of the URL with urlparse instead of hardcoding them, and you want to exhaust or at least check the response from the POST request, and so on. But this shows the hard part, and it's not hard.
Unfortunately, I don't think this works in 2.x.
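As an aside, the urlparse bit mentioned above might look something like this (a sketch with a made-up URL):
import http.client
from urllib.parse import urlparse

u = urlparse('http://www.example.com:8080/spam')
conn = http.client.HTTPConnection(u.hostname, u.port or 80)
conn.request('GET', u.path)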
Finally, if you're familiar with libcurl, there are at least three wrappers for it (including one that comes with the source distribution). I'm not sure whether to call libcurl higher-level or lower-level than urllib2, it's sort of on its own weird axis of complexity. :)
urllib2 may be too simple for this task. You might want to look into pycurl. I know it supports streaming.

Timeout with Python Requests + Clojure HttpKit Server but not Ring server

I have some Ring routes which I'm running one of two ways.
lein ring server, with the lein-ring plugin
using org.httpkit.server, like (hs/run-server app {:port 3000})
It's a web app (being consumed by an Angular.js browser client).
I have some API tests written in Python using the Requests library:
my_r = requests.post(MY_ROUTE,
                     data=MY_DATA,
                     headers={"Content-Type": "application/json"},
                     timeout=10)
When I use lein ring server, this request works fine in the JS client and the Python tests.
When I use httpkit, this works fine in the JS client but the Python client times out with
socket.timeout: timed out
I can't figure out why the Python client is timing out. It happens with httpkit but not with lein-ring, so I can only assume that the cause is related to the difference.
I've looked at the traffic in WireShark and both look like they give the correct response. Both have the same Content-Length field (15 bytes).
I've raised the number of threads to 10 (it shouldn't be needed) and saw no change.
Any ideas what's wrong?
I found how to fix this, but no satisfactory explanation.
I was using wrap-json-response Ring middleware to take a HashMap and convert it to JSON. I switched to doing my own conversion in my handler with json/write-str, and this fixes it.
At a guess it might be something to do with the server handling output buffering, but that's speculation.
I've combed through the Wireshark dumps and I can't see any relevant differences between the two. The sent Content-Length fields are identical. The 'bytes in flight' differ, at 518 and 524.
No clue as to why the web browser was happy with this but Python Requests wasn't, and whether or not this is a bug in Requests, httpkit, ring-middleware-format or my own code.

Listening to the Output of a Web Application

I have been trying, in vain, to make a program that reads text out loud using the web application found here (http://www.ispeech.org/text.to.speech.demo.php). It is a demo text-to-speech program that works very well and is relatively fast. What I am trying to do is make a Python program that would input text to the application, then output the result. The result, in this case, would be sound. Is there any way in Python to do this, like, say, a library? And if not, is it possible to do this through any other means? I have looked into the iSpeech API (found here), but the only problem with it is that there is a limited number of free uses (I believe that it is 200). While this program is only meant to be used a couple of times, I would rather it be able to use the service more than 200 times. Also, if this solution is impractical, could anyone direct me towards another alternative?
@AKX: I am currently using eSpeak, and it works well. It just, well, doesn't sound too good, and it is hard to tell at times what is being said.
If using iSpeech is not required, there's a decent (it's surely not as beautifully articulated as many commercial solutions) open-source text-to-speech solution available called eSpeak.
It's usable from the command line (subprocess with Python), or as a shared library. It seems there's also a Python wrapper (python-espeak) for it.
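For instance, a minimal sketch of the subprocess route (this assumes the espeak binary is on your PATH):
import subprocess

# -v selects a voice; -w <file> would write a wav file instead of playing aloud
subprocess.call(['espeak', '-v', 'en', 'Hello from eSpeak'])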
Hope this helps.
OK. I found a way to do it, seems to work fine. Thanks to everyone who helped! Here is the code I'm using:
from urllib import quote_plus

def speak(text):
    import pydshow
    words = text.split()
    temp = []
    stuff = []
    while words:
        temp.append(words.pop(0))
        if len(temp) == 24:
            stuff.append(' '.join(temp))
            temp = []
    stuff.append(' '.join(temp))
    for i in stuff:
        pydshow.PlayFileWait('http://api.ispeech.org/api/rest?apikey=8d1e2e5d3909929860aede288d6b974e&format=mp3&action=convert&voice=ukenglishmale&text='+quote_plus(i))

if __name__ == '__main__':
    speak('Hello. This is a text-to speech test.')
I find this ideal because it DOES use the API, but it uses the API key that is used for the demo program. Therefore, it never runs out. The key is 8d1e2e5d3909929860aede288d6b974e.
You can actually see this at work without the program by typing the following into your address bar:
http://api.ispeech.org/api/rest?apikey=8d1e2e5d3909929860aede288d6b974e&format=mp3&action=convert&voice=ukenglishmale&text=
Followed by the text you want to speak. You can also adjust the voice by changing, in this case, ukenglishmale to something else that iSpeech offers, for example ukenglishfemale, which will speak the same text in a feminine voice.
NOTE: Pydshow is my wrapper around DirectShow. You can use yours instead.
The flow of your application would be like this:
1. Client side: the user inputs text into a form, and the form submits a request to the server.
2. Server: may be Python or whatever language/framework you want; receives the HTTP request with the text.
3. Server: runs text-to-speech, either with a pure Python library or by running a subprocess to a utility that can generate speech as a wav/mp3/aiff/etc.
4. Server: sends the HTTP response back by streaming the file with an audio mime type to the client.
5. Client: receives the HTTP response and plays the content.
Specifically about step 3...
I don't have any particular advice on the most articulate open-source speech-synthesizing software available, but I can say that it does not necessarily have to be pure Python, or even Python at all for that matter. Most of these packages have some form of a command-line utility to take stdin or a file and produce an audio file as output. You would simply launch this utility as a subprocess to generate the file, and then stream the file back in your HTTP response.
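A hypothetical sketch of steps 3 and 4 using only the standard library (the espeak utility and the /tmp path are assumptions, and real code would parse the text out of the request):
import subprocess
import http.server

class TTSHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        text = 'hello world'  # in practice, parsed from the request
        subprocess.call(['espeak', '-w', '/tmp/out.wav', text])
        self.send_response(200)
        self.send_header('Content-Type', 'audio/wav')
        self.end_headers()
        with open('/tmp/out.wav', 'rb') as f:
            self.wfile.write(f.read())

http.server.HTTPServer(('', 8000), TTSHandler).serve_forever()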
If you decide to make use of an existing web service that provides text-to-speech via an API (iSpeech), then step 3 would be replaced with making your own server-side HTTP request out to iSpeech, receiving the response, and pretty much forwarding that response back to the original client request, like a proxy. I would say the benefit is not having to maintain your own speech-synthesis solution, or getting better quality than you could from an open-source one... but the downside is that you will probably have a bit more latency in your response time, since your server has to make its own external HTTP request and download the data first.

python facebook sdk call to facebook is slow when compared to command line curl

I am using the facebook python Graph API. When I call put_object to write to the news feed, it takes about 12-14 seconds to complete the call. When I run the same request from the command line using curl with the same parameters, I get the response back in 1.2 seconds.
I ran the profiler on the Python code and I see that it is spending 99.5% of its time in socket.recv. I am not sure if it is a problem with the facebook python sdk or something else.
I am on python 2.6. I see from facebook.py that it is using urllib.
file = urllib.urlopen("https://graph.facebook.com/" + path + "?" +
                      urllib.urlencode(args), post_data)
Has anyone experienced a similar slowdown? Any suggestions will be highly appreciated.
Direct command-line curl is bound to be faster than urllib or urllib2. If you want speed, you could replace the call with pycurl (which is also a C extension), whereas urllib is a Python module written on top of httplib.
Beyond that, if you're flexible enough to use a Tornado server, you could use Tornado's asynchronous HTTP client, which talks to sockets directly and is non-blocking.
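As a sketch of that Tornado route (the older callback-style API; the URL is a placeholder):
from tornado import httpclient, ioloop

def on_response(response):
    print(response.body)
    ioloop.IOLoop.instance().stop()

httpclient.AsyncHTTPClient().fetch('https://graph.facebook.com/me', on_response)
ioloop.IOLoop.instance().start()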
Also, if neither of these is an option, try replacing urllib with urllib2 and creating a non-blocking caller with callback returns. This is all that I've done to improve the native 3rd-party wrappers of facebook/twitter/amazon etc.
Are you behind an http proxy server? Curl honors proxy server environment variables, while urllib doesn't do so by default, and also doesn't support calling an https url (such as https://graph.facebook.com) over a proxy server.
In any event I expect it's more likely a network issue than a Python vs C issue. Yes C is faster, but this isn't a CPU-bound task, and there's no way that you're burning 12-14 seconds inside the Python interpreter to make this call.
If curl is happy but urllib is not, perhaps trying pycurl will solve your problem. http://pycurl.sourceforge.net/
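A minimal pycurl sketch, using the classic WRITEFUNCTION callback (the URL is a placeholder; a real Graph API call would also need an access token):
import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'https://graph.facebook.com/me')
c.setopt(pycurl.WRITEFUNCTION, buf.write)  # collect the response body
c.perform()
c.close()
print(buf.getvalue())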

How do I get urllib2 to log ALL transferred bytes

I'm writing a web-app that uses several 3rd-party web APIs, and I want to keep track of the low-level requests and responses for ad-hoc analysis. So I'm looking for a recipe that will get Python's urllib2 to log ALL bytes transferred via HTTP. Maybe a sub-classed Handler?
Well, I've found how to set up the built-in debugging mechanism of the library:
import logging, urllib2, sys

hh = urllib2.HTTPHandler()
hsh = urllib2.HTTPSHandler()
hh.set_http_debuglevel(1)
hsh.set_http_debuglevel(1)
opener = urllib2.build_opener(hh, hsh)
urllib2.install_opener(opener)  # so plain urlopen() calls go through the debugging handlers

logger = logging.getLogger()
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.NOTSET)
But I'm still looking for a way to dump all the information transferred.
This looks pretty tricky to do. There are no hooks in urllib2, urllib, or httplib (which this builds on) for intercepting either input or output data.
The only thing that occurs to me, other than switching tactics to use an external tool (of which there are many, and most people use such things), would be to write a subclass of socket.socket in your own new module (say, "capture_socket") and then insert that into httplib using "import capture_socket; import httplib; httplib.socket = capture_socket". You'd have to copy all the necessary references (anything of the form "socket.foo" that is used in httplib) into your own module, but then you could override things like recv() and sendall() in your subclass to do what you like with the data.
Complications would likely arise if you were using SSL, and I'm not sure whether this would be sufficient or if you'd also have to make your own socket._fileobject as well. It appears doable though, and perusing the source in httplib.py and socket.py in the standard library would tell you more.
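As a simpler (and admittedly partial) variant of the same idea: httplib routes every outgoing byte through HTTPConnection.send, so a subclass can log that side without touching sockets at all; the incoming side would still need the socket-replacement trick described above. A Python 2 sketch:
import httplib

class LoggingHTTPConnection(httplib.HTTPConnection):
    def send(self, data):
        # every outgoing byte of the request passes through here
        print(">> %r" % (data,))
        httplib.HTTPConnection.send(self, data)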
