We have been sending a huge result set (over a million rows) from our database to the client through a REST service. Right now the whole result set gets loaded into memory on the server, packed up, and sent over in a single response, so the client has to download the entire thing before any processing can begin. I am trying to figure out a way to stream the data between the server and the client.
We already have the REST service, so I am trying to serve the data through a Python generator on the server side (reading from the database), so that the client only has to deal with the current chunk of data and simply fetches the next one when it is done.
I am not sure this works as intended: does the generator on the server side produce all of the data first and then send it over, or does this actually stream?
I am also thinking of using Python sockets if the REST service does not work out, but I am open to suggestions. If anybody has dealt with something like this before, what did you do?
I have to use Python for this, by the way.
I have tried using generators and I do get all the data, but I am not sure whether it is actually being sent over in chunks or being generated in full on the server side and then sent over, which would defeat the purpose.
This is just pseudo-code to show what I am doing:

# server
from flask import Flask, Response

app = Flask(__name__)

@app.route('/stream_data')
def stream_data():
    def gen():
        # fetch_next_chunk() stands in for however you page through the
        # result set, e.g. cursor.fetchmany(chunk_size)
        while True:
            chunk = fetch_next_chunk()
            if not chunk:
                break
            yield chunk
    # returning a generator makes Flask send the response in chunks
    return Response(gen())

# client
import requests

def main():
    url = 'blabla#rest.service.com'
    with requests.get(url, stream=True) as r:
        for x in r.iter_content(chunk_size=8192):
            print(x)
I am getting the data and everything works okay, but I am not sure if it is actually streaming or just sending the whole thing over.
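One way to tell is to look at the response headers and time the chunks on the client: when Flask is handed a generator, the response typically goes out with Transfer-Encoding: chunked instead of a Content-Length, and if you put a short time.sleep() in the server's generator, the chunks arrive spread out in time rather than all at once. A rough check (a sketch, not part of the original code):

import time
import requests

def check_streaming(url):
    with requests.get(url, stream=True) as r:
        # a streamed response usually has no Content-Length header
        print(r.headers.get('Transfer-Encoding'))  # expect 'chunked'
        start = time.monotonic()
        for chunk in r.iter_content(chunk_size=8192):
            # with a time.sleep() in the server's generator, these lines
            # appear one by one if the data is genuinely streamed
            print(f'{time.monotonic() - start:.2f}s: {len(chunk)} bytes')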
I want to send a bulk of requests using the requests-futures library in Python.
The problem is that some of the requests come back with a server error in the API response body even though the status code is 200 OK.
I need to implement a fallback mechanism to resend the failed requests and get the actual data from the server.
This is what I have implemented.
Code to send the bulk requests and collect the responses using the futures library:
def sendBulkRequests(requestURLs):
    # this would send the requests to the server (e.g. with a
    # requests_futures FuturesSession) and collect the responses
    # in the same order as requestURLs
    return responseList
Validate the responses:
requestURLsToRetry = []
for i in range(len(responseList)):
    if 'sometext' not in responseList[i].text:
        requestURLsToRetry.append(requestURLs[i])
Now I need to resend requestURLsToRetry to sendBulkRequests():
responseListRetried = sendBulkRequests(requestURLsToRetry)
How do I merge responseListRetried back into the original responseList to be returned, so that the downstream code can process it, this time hopefully with correct data in place?
I might need to call sendBulkRequests() multiple times until all API responses come back correct and the validation above passes with no issues.
You cannot safely modify the list while you are iterating over it. You could simply iterate over the second list, or, once the first pass is finished and you have the list of failed requests, just set list1 = listOfFailedRequests and run the previously defined function over list1 again.
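Building on that, here is a sketch of the retry loop, assuming each URL is unique and that sendBulkRequests() returns its responses in the same order as the URLs passed in (the position-based merge relies on both):

def fetchAllWithRetry(requestURLs, max_attempts=3):
    # map each URL to its final response so results keep their
    # original positions after any number of retries
    responseByURL = {}
    pending = list(requestURLs)
    for _ in range(max_attempts):
        if not pending:
            break
        responses = sendBulkRequests(pending)
        stillFailing = []
        for url, resp in zip(pending, responses):
            if 'sometext' in resp.text:
                responseByURL[url] = resp   # passed validation
            else:
                stillFailing.append(url)    # retry on the next pass
        pending = stillFailing
    # rebuild the list in the original request order; entries that
    # never succeeded come back as None
    return [responseByURL.get(url) for url in requestURLs]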
I have a simple bidirectional streaming RPC proto interface, such as:
rpc RouteData(stream RouteNote) returns (stream ProcessedRouteNote) {}
The thing is that it might take a while before I can return a ProcessedRouteNote.
I would like to know what the recommended way is to store away a connected client so I can stream back a response (i.e. a ProcessedRouteNote) at a later time.
"def RouteData(self, request_iterator, servicer_context)"
It seems that saving the "request_iterator" of "def RouteData" (the "RpcMethodHandler") and then calling stream_stream on it directly would do the job.
I will appreciate any feedback.
I could probably simplify this question by asking: how can I send data/a response to a specific client that has previously sent a request (on the bidirectional stream) to the server? It should be noted that the intention is to send the response outside the context of the server's RPC handler. Moreover, there could be dozens of requests but only a single response, so I have no interest in blocking the RPC handler to wait for the response. I really hope this is possible with gRPC; otherwise it is a deal breaker for us.
Thanks,
Mike
You can have a generator that yields response values and waits on a threading.Event (stored in a hashtable somewhere, depending on your application logic) to be triggered.
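A minimal sketch of that idea, swapping the threading.Event for a queue.Queue (a queue both signals and carries the value) and keying the table by a client id taken from the first RouteNote. The module name, servicer base class, and client_id field below are placeholders for whatever your generated code actually provides:

import queue
import threading

import route_pb2_grpc  # code generated from your .proto (name assumed)

class RouteServicer(route_pb2_grpc.RouteServiceServicer):  # base class name assumed
    def __init__(self):
        self._outboxes = {}  # client id -> Queue of ProcessedRouteNote
        self._lock = threading.Lock()

    def RouteData(self, request_iterator, context):
        first = next(request_iterator)        # register using the first note
        outbox = queue.Queue()
        with self._lock:
            self._outboxes[first.client_id] = outbox  # client_id field assumed
        # keep consuming incoming RouteNotes in the background so this
        # handler is free to block on the outbox
        threading.Thread(target=self._drain, args=(request_iterator,),
                         daemon=True).start()
        while True:
            note = outbox.get()               # blocks until a response is pushed
            if note is None:                  # sentinel: close the stream
                return
            yield note

    def _drain(self, request_iterator):
        for _note in request_iterator:
            pass                              # process incoming notes as needed

    def send_to_client(self, client_id, processed_note):
        # call this from anywhere in the application, outside the RPC context
        with self._lock:
            self._outboxes[client_id].put(processed_note)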
I have used Python's requests module to make a POST call (within a loop) to a single URL, with a varying set of data in each iteration. I already reuse a session so the underlying TCP connection is shared across the calls during the loop's iterations.
However, I want to speed up my processing further by:
1. caching the URL and the authentication values (user id and password), as they remain the same in each call
2. spawning multiple sub-processes, each taking a group of calls, allowing the groups to be processed in parallel
Please note that I pass my authentication as a base64-encoded header; pseudo-code of my POST call typically looks like this:
import requests

s = requests.Session()
url = 'https://example.net/'

for record in data_records:  # loop through data records
    headers = {'authorization': authbase64string}  # plus other headers
    data = record  # data for this loop iteration
    # POST call
    r = s.post(url, data=data, headers=headers)
    response = r.json()
# end of loop and program
Please review the scenario and suggest any techniques/tips which might be of help.
Thanks in advance,
Abhishek
You can:
do it as you described (if you want to make it faster, run the calls using multiprocessing) and, e.g., set the headers on the session rather than on each request (see the sketch below)
modify the target server to accept one POST request carrying multiple data records (so you limit the time spent on connections, etc.)
do some optimizations on the server side so it replies faster (or just have it store the requests and send you the responses later via some callback)
It would be much easier if you described the use case :)
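A minimal sketch of the first option: the authorization header is set once per session, and the records are split across a multiprocessing pool. Each worker builds its own session, since sessions don't transfer across processes; the URL, header value, and payloads are placeholders:

import requests
from multiprocessing import Pool

URL = 'https://example.net/'

def post_batch(batch):
    # one session per worker process; the auth header is set once here
    # instead of being rebuilt for every request
    s = requests.Session()
    s.headers.update({'authorization': 'Basic <base64 credentials>'})
    results = []
    for data in batch:
        r = s.post(URL, data=data)
        results.append(r.json())
    return results

if __name__ == '__main__':
    records = ['data for call 1', 'data for call 2']  # placeholder payloads
    n = 4  # number of worker processes
    batches = [records[i::n] for i in range(n)]
    with Pool(n) as pool:
        all_results = pool.map(post_batch, batches)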
I'm using the Django Python framework with the Django REST Framework. When a new instance of a model is saved, I need to generate a PDF that is saved locally on the server. Is there a way that I can branch off the task of generating the PDF so that the user immediately gets a 201 return while the server is generating the PDF? I don't know if this would be a suitable situation for multithreading.
The parent's save function is called before the PDF generation starts, so right in between there it would be safe to return the 201.
def save(self, *args, **kwargs):
    set_pdf = False
    if self.id is None and self.nda_pdf is not None and len(self.nda_pdf) > 0:
        set_pdf = True
    super(Visitor, self).save(*args, **kwargs)
    if set_pdf: generate_pdf(self)
I want to call that generate_pdf(self) function after returning something to the client.
Depending on how long it takes to generate the PDF, you may want to block the response until the file is generated and only then return HTTP 201.
This has little to do with multithreading, either for the client or for the server:
The client should make non-blocking requests anyway (or at least make them from a thread different from the one handling UI events). Moreover, if the client doesn't care about the response (i.e. whether the PDF was generated correctly or not), it's up to the client to send the request without waiting for the response.
The server... well, the server has to do the PDF generation anyway. Returning HTTP 201 immediately won't change anything. Also, the fact that the server is currently responding to one request doesn't mean it won't process another one (unless you have too many requests or a very strangely configured HTTP server).
If PDF generation actually takes a long time (say, more than a minute), then returning HTTP 202 Accepted (and not HTTP 201!) can be a solution, to avoid timeouts or situations where clients can't understand why the server is taking so long to respond.
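If you do decide to return immediately and generate the PDF in the background, the simplest variant is to hand the work to a thread from within save(). This is only a sketch based on the code above; for anything heavier, a proper task queue (e.g. Celery) is the usual choice:

import threading

def save(self, *args, **kwargs):
    set_pdf = (
        self.id is None
        and self.nda_pdf is not None
        and len(self.nda_pdf) > 0
    )
    super(Visitor, self).save(*args, **kwargs)
    if set_pdf:
        # run PDF generation in a background thread so the request
        # handler can return its 201/202 response right away
        threading.Thread(target=generate_pdf, args=(self,), daemon=True).start()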
I'm really having trouble wrapping my head around this; maybe someone can point me in the right direction:
I'm using Python (the Django framework) for a web application, and I have an additional WebSocket server that receives chunked binary data from the browser. I want to send (or stream) those chunks to another server using the python-requests library.
According to the official documentation, you provide a generator as the data argument:
import requests

arr = []

def streamer():
    global arr
    for i in arr:
        yield i

# let's say this function will get called when a "stream-start" message
# is sent to the web-socket server
def onStart():
    resp = requests.post("http://some.url/chunked", data=streamer())

# let's say this function will get called when a chunk of binary data
# is sent to the web-socket server
def onChunk(chunk):
    arr.append(chunk)
In this scenario, how would I possibly be able to send anything, since arr is empty at the moment I send the request? How can I keep the connection open so that every chunk gets sent?
I think there is something fundamental about streaming that I don't understand. So, besides hints on solving my actual problem, I would also appreciate recommendations for tutorials or a good read on this subject.
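For what it's worth, the usual fix for the empty-arr problem is a blocking queue: the generator waits for chunks instead of iterating over an already-finished list, so the POST connection stays open until a sentinel value ends it. A minimal sketch (onEnd is an assumed hook for a "stream-end" message, and the response is simply discarded here):

import queue
import threading
import requests

chunks = queue.Queue()

def streamer():
    while True:
        chunk = chunks.get()   # blocks until the next chunk arrives
        if chunk is None:      # sentinel: the stream is finished
            return
        yield chunk

def onStart():
    # run the POST in its own thread; it keeps the connection open and
    # pulls chunks from the queue until the sentinel is queued
    threading.Thread(
        target=requests.post,
        args=("http://some.url/chunked",),
        kwargs={"data": streamer()},
        daemon=True,
    ).start()

def onChunk(chunk):
    chunks.put(chunk)

def onEnd():
    chunks.put(None)  # close the stream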