Understanding streaming in python-requests and in general

Understanding streaming in python-requests and in general - python

I'm really having troubles wrapping my head around this and maybe someone can point me in the right direction:
I'm using python (django framework) for an web-application and i have an additional web socket server that receives chunked binary data from the browser. I want to send (or stream) those chunks to another server using the python-requests library.
According to the official documentation you have to provide a generator as the data attribute:
arr = []
def streamer():
global arr
for i in arr:
yield i
#lets say this function will get called when a "stream-start" message is sent to the web-socket server
def onStart():
resp = requests.post("http://some.url/chunked", data=streamer())
#lets say this function will get called when a chunk of binary data is sent to the web-socket server
def onChunk(chunk):
arr.append(chunk)
In this scenario, how would I possible be able to send anything since arr is empty when I send the request. How can I keep the connection open, so that every chunk will be sent?
I think there is some major issue that I don't understand about streaming in general. So, next to hints about solving my actual problem I would also appreciate any recommendation of tutorials or a good read on this subject.

Related

How to use the API to get the top 20 streams

I am trying to write a Python script to get the top 20 streams using the API. However, I could not find a guide online. I am going off of the python-twitch-client docs but so far I couldn’t find something helpful. I’ll admit, its my first time ever working with this API.
Precisely, this is what I want to accomplish: https://dev.twitch.tv/docs/api/reference#get-streams I know that the default return is 20 streams.
At the moment, this is all the code I have:
from twitch import TwitchClient
client = TwitchClient(client_id='<my client id>')

Welcome to working with APIs - it is definitely a blast.
In this scenario, assuming you've successfully created your application and retrieved your client id as outlined in the instructions, to get the latest streams you would use get_live_streams
from twitch import TwitchClient
client = TwitchClient(client_id='<client-id>')
streams = client.streams.get_live_streams()
print(streams)
The API provides other functions you can use to retrieve other data such as get_featured, get_stream_by_user, and more. You can view all those functions in the documentation.

Can I send a generator through a REST service in Python?

We have been sending a huge Result Set(which is over a million rows) from our database to the client using a REST service. This causes the whole result set to be loaded up in memory, it packs the whole thing up and sends it over to the client. The client has to download this huge amount of data before any processing can be done. I am trying to figure out a way to stream data between a server and a client.
We already have the REST service, so I am trying to send over a a Python generator from the server side (database) to the client, and the client just has to deal with the current chunk of data, and when it is done with the current chunk it can just call next().
I am not sure if this works as intended, do any of you know if the generator on the server side generates the whole data and then sends it over? Or does this actually works?
I am also thinking of using Python sockets if the REST service does not work, but I am open to suggestions. If anybody has dealt with something like this before, what did you do?
I have to use Python for this by the way.
I have tried using generators and I get all the data, but I am not sure if it is actually being sent over in chunks or if it is being generated in whole on the server side and being sent over, which defeats the purpose of it.
This just psudo-code to show what I am doing
#server
#app.route(/stream_data)
def stream_data():
def gen()
while data:
yield data[:chunk_size]
return
return requests.response(gen())
#client
def main():
url = 'blabla#rest.service.com'
with requests.get(url, stream=True) as r:
for x in r.iter_content():
print(x)
I am getting the data and everything works okay, but I am not sure if it actually streaming or just sending the whole thing over.

Python requests caching authentication headers

I have used python's requests module to do a POST call(within a loop) to a single URL with varying sets of data in each iteration. I have already used session reuse to use the underlying TCP connection for each call to the URL during the loop's iterations.
However, I want to further speed up my processing by 1. caching the URL and the authentication values(user id and password) as they would remain the same in each call 2. Spawning multiple sub-processes which could take a certain number of calls and pass them as a group thus allowing for parallel processing of these smaller sub-processes
Please note that I pass my authentication as headers in base64 format and a pseudo code of my post call would typically look like this:
S=requests.Session()
url='https://example.net/'
Loop through data records:
headers={'authorization':authbase64string,'other headers'}
data="data for this loop"
#Post call
r=s.post(url,data=data,headers=headers)
response=r.json()
#end of loop and program
Please review the scenario and suggest any techniques/tips which might be of help-
Thanks in advance,
Abhishek

You can:
do it as you described (if you want to make it faster then you can run it using multiprocessing) and e.g. add headers to session, not request.
modify target server and allow to send one post request with multiple data (so you're going to limit time spent on connecting, etc)
do some optimalizations on server side, so it's going to reply faster (or just store requests and send you response using some callback)
It would be much easier if you described the use case :)

Can you hook pyramid into twisted, skipping the wsgi part?

Given that :
WSGI doesn't play very well with async.
Twisted ergonomics suck.
Pyramid is very clean and component oriented.
How could I use Pyramid and Twisted ?
I can imagine making a twisted protocol to get the raw HTML request. But then I can't see how to parse it into a pyramid request objects. All documented pyramid tools seems to expect some wsgi interface at some point.
I could use waitress code to parse the request and turn it into a WSGI env then pass the env to pyramid but that's a lot of work with many issues I'm sure I can't even imagine down the road.
I know twisted includes a WSGI server, but it implies synchronicity in the app code, which does not serve my purpose. I want to be able to use the request and response objects, renderers, routers, and others pyramid tools in a twisted asynchronous protocol, with an asynchronous, non blocking app code as well. Hence I won't want to use WSGI.
Twisted API is verbose, heavy and uninuitive compared to any other asynchronous toolkit you'll find in Python or even other languages. Hence the critic about its ergonomics. I can use it, but training newcomers in my team to do it has a high cost. I wish to lower it.
Indeed, it packs a lot of power that I want to use.
To elaborate on my needs, I'm building a tool using crossbar.io and cyclone to have a WAMP/HTTP framework a bit friendlier to my team that the current tools. But cyclone is not as complete as pyramid, and I was hoping pyramid components were decoupled enough that the WSGI paradigm was not enforced, so I could leverage the tremendous work they did on it. All I need is an entry point : somewhere to get the HTML, and parse it into a request objet, and somewhere to take a response object, and returns HTML to the client. I wish i don't have to write a protocol manually for this, http is tricky and I'm sure I'll get it wrong in many ways.
One precision : i don't wish to use the full pyramid framework, just some components here and there, such as rooting, cookie parsing, CSRF protection, etc. I won't use their view system for it assumes a synchronous API.
Looking at Pyramid, I can see that it expects the entire request be be parsed and turned into a request object. it also returns the response as an object as well. So a part of the problem, to hook twisted and pyramid together, is to :
get the http request text as one big chunk from twisted;
parse it into the request object somehow (couldn't find a simple function to do this, but if I can turn it into an WSGI environ + request object, pyramid can convert it to it's format).
get the pyramid response object and turn it into a generator of strings (an adaptor can be find since that's what WSGI does).
Send the response back with twisted from this generator of strings.
Alternatives can be to use something simpler than pyramid like werkzeug for the glue.

Twisted Web lets you interpret HTTP request bodies (regardless of content-type, HTML or otherwise) incrementally as they're received - but it doesn't make doing so very easy. There's a very old ticket that we never seem to make much progress on for improving this situation. Until it's resolved, there probably isn't a better answer than the one I'm about to give. This incremental HTTP request body delivery, I think, is what you're looking for here (because you said you expect requests to "be a big HTML chunk").
The hook for incremental request body handling is Request.handleContentChunk. You can see a complete demonstration of its use in my answer to Python server for streaming request body content.
This gives you the data as it arrives at the server. If you want to use Pyramid, you'll have to construct a Pyramid request that uses this data. Most of the initialization of the Pyramid request object should be straightforward (eg filling the environ dictionary with the request headers - you can take these from Request.requestHeaders). The slightly trickier part will be initializing the Pyramid request object's body - which is supposed to be a file-like object that provides synchronous access to the request body.
On the one hand, if you dispatch the request before the request body has been completely received then you avoid the cost of buffering the entire request body in memory. On the other hand, if you let application code begin to read the request body then you have to deal with the circumstance that it tries to read beyond the point in the data which has actually arrived at the server. This can be dealt with. The body file-like object is expected to present a blocking interface. All you have to do is block until the data is available.
Here's a brief (incomplete, not meant to actually work) sketch of what I mean:
# XXX Note: Queue is not actually thread-safe. Use a safer primitive.
from Queue import Queue
class Body(object):
def __init__(self):
self._buffer = Queue()
self._pending = b""
self._eof = False
def read(self, how_many):
if self._eof:
return b""
if self._pending == b"":
data = self._buffer.get()
if data is None:
self._eof = True
return b""
else:
self._pending = data
if self._pending is None:
result = self._pending[:how_many]
self._pending = self._pending[how_many:]
return result
def _add_data(self, data):
self._buffer.put(data)
You can create an instance of this type, initialize the Pyramid request object's body attribute with it, and then call _add_data on it in the Twisted Request class's handleContentChunk callback.
You could also implement this as an enhancement to Twisted's own WSGI server. For the sake of simplicity, Twisted's WSGI server does read the entire request body before dispatching the request to the WSGI application - but it doesn't have to. If this is the only problem with WSGI then it'd be better to improve the quality of the WSGI implementation and keep the interface rather than both implementing the improvement and stepping outside of the interface (tying you more closely to both Twisted and Pyramid - unnecessarily).
The second half of the problem, generating response bodies incrementally, shouldn't really be a problem. Twisted's WSGI container will write out response data as the WSGI application object yields it. Or if you use twisted.web.resource instead of the WSGI interface, you can call request.write as many times as you like, at any time you like (up until you call request.finish). The only trick is that if you want to do this you must return NOT_DONE_YET from the render method.

Listening to the Output of a Web Application

I have been trying, in vain, to make a program that reads text out loud using the web application found here (http://www.ispeech.org/text.to.speech.demo.php). It is a demo text-to-speech program, that works very well, and is relatively fast. What I am trying to do is make a Python program that would input text to the application, then output the result. The result, in this case, would be sound. Is there any way in Python to do this, like, say, a library? And if not, is it possible to do this through any other means? I have looked into the iSpeech API (found here), but the only problem with it is that there is a limited number of free uses (I believe that it is 200). While this program is only meant to be used a couple of times, I would rather it be able to use the service more then 200 times. Also, if this solution is impractical, could anyone direct me towards another alternative?
# AKX I am currently using eSpeak, and it works well. It just, well, doesn't sound too good, and it is hard to tell at times what is being said.

If using iSpeech is not required, there's a decent (it's surely not as beautifully articulated as many commercial solutions) open-source text-to-speech solution available called eSpeak.
It's usable from the command line (subprocess with Python), or as a shared library. It seems there's also a Python wrapper (python-espeak) for it.
Hope this helps.

OK. I found a way to do it, seems to work fine. Thanks to everyone who helped! Here is the code I'm using:
from urllib import quote_plus
def speak(text):
import pydshow
words = text.split()
temp = []
stuff = []
while words:
temp.append(words.pop(0))
if len(temp) == 24:
stuff.append(' '.join(temp))
temp = []
stuff.append(' '.join(temp))
for i in stuff:
pydshow.PlayFileWait('http://api.ispeech.org/api/rest?apikey=8d1e2e5d3909929860aede288d6b974e&format=mp3&action=convert&voice=ukenglishmale&text='+quote_plus(i))
if __name__ == '__main__':
speak('Hello. This is a text-to speech test.')
I find this ideal because it DOES use the API, but it uses the API key that is used for the demo program. Therefore, it never runs out. The key is 8d1e2e5d3909929860aede288d6b974e.
You can actually test this at work without the program, by typing the following into your address bar:
http://api.ispeech.org/api/rest?apikey=8d1e2e5d3909929860aede288d6b974e&format=mp3&action=convert&voice=ukenglishmale&text=
Followed by the text you want to speak. You can also adjust the language, by changing, in this case, the ukenglishmale to something else that iSpeech offers. For example, ukenglishfemale. This will speak the same text, but in a feminine voice.
NOTE: Pydshow is my wrapper around DirectShow. You can use yours instead.

The flow of your application would be like this:
Client-side: User inputs text into form, and form submits a request to server
Server: may be python or whatever language/framework you want. Receives http request with text.
Server: Runs text-to-speech either with pure python library or by running a subprocess to a utility that can generate speech as a wav/mp3/aiff/etc
Server: Sends HTTP response back by streaming file with a mime type to Client
Client: Receives the http response and plays the content
Specifically about step 3...
I don't have any particular advise on the most articulate open source speech synthesizing software available, but I can say that it does not have to necessarily be pure python, or even python at all for that matter. Most of these packages have some form of a command line utility to take stdin or a file and produce an audio file as output. You would simply launch this utility as a subprocess to generate the file, and then stream the file back in your http response.
If you decide to make use of an existing web service that provides text-to-speech via an API (iSpeech), then step 3 would be replaced with making your own server-side http request out to iSpeech, receiving the response and pretty much forwarding that response back to the original client request, like a proxy. I would say the benefit is not having to maintain your own speech synthesis solution or getting better quality that you could from an open source... but the downside is that you probably will have a bit more latency in your response time since your server has to make its own external http request and download the data first.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.