Cloud Function HTTP - send back results live as they arrive - Python

I have a Cloud Function (Python) that does a long (but not heavy) calculation that depends on other external APIs, so the response can take some time (~30 seconds).
def test(request):
    request_json = request.get_json()
    for x in y:  # y: whatever collection the calculation iterates over
        r = get_external_api_respond()
        # calculate on r and return a partial response
The problems and questions are:
Is there a way to start returning results to the web client as they arrive in the Function? Right now I understand an HTTP function can only return once and then the connection is closed.
Pagination in this case would be too complicated to achieve, as results depend on previous results, etc. Are there any solutions in Google Cloud to return live results as they come? Another type of Function?
Will it be very expensive if the function stays open for a minute, even though it does no heavy calculation, just multiple API requests in a loop?

You need to use some intermediary storage, which you top up from your function and which the web page reads with separate HTTP requests. I wouldn't call it a producer-consumer pattern, really, as you produce once but consume as many times as you need.
You can use Table Storage or Blob Storage if you use Azure.
https://learn.microsoft.com/en-us/azure/storage/tables/table-storage-overview
https://azure.microsoft.com/en-gb/products/storage/blobs/
With Table, you can just add records as they get calculated.
With Blob, you can use the Append blob type, or just read and rewrite the blob each time (it sounds like you have a single producer).
As a bonus, you can distribute your task across multiple functions and get results much faster. This is called scale-out.
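For the append-blob variant, a minimal sketch might look like the following, assuming the azure-storage-blob package, a connection string in an environment variable, and illustrative container/blob names (get_external_api_respond is the slow call from the question):

import os
import json
from azure.storage.blob import BlobClient

# Illustrative names; point these at your own storage account and container.
blob = BlobClient.from_connection_string(
    os.environ["STORAGE_CONNECTION_STRING"],
    container_name="partial-results",
    blob_name="job-123.jsonl",
)

def run_long_job(items):
    # Create the append blob once, then append one JSON line per partial result.
    if not blob.exists():
        blob.create_append_blob()
    for item in items:
        result = get_external_api_respond()  # the slow external call from the question
        blob.append_block(json.dumps(result) + "\n")

The web page then polls the blob (or a small HTTP endpoint in front of it) every few seconds and renders whatever lines have arrived so far.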


What should be the strategy/technology to capture streaming tick data coming from a stock broker

I am using a stock broker's API to create an application in Python. The API has a websocket. I need to pass a list of tokens (of stock tickers), and the callback function is called whenever any data for those tokens changes (for example, the price of the stock). The structure is:
api.start_websocket(order_update_callback=event_handler_order_update, subscribe_callback=event_handler_quote_update, socket_open_callback=open_callback)
In the socket open callback, I need to subscribe to a list of tokens. For example:
api.subscribe(['NSE|22', 'BSE|522032'])
I can use event_handler_quote_update to process the streaming data, such as showing the live prices, Open Interest, Volume, etc. on a web page or in a spreadsheet.
The processing of the data will be slower than the rate at which it streams in. In that case, what should be the strategy or technology to make sure that all the data gets processed in real time? Can I use a Queue? It appears that the blocking mechanism may hamper the functionality in this case. Is threading a good solution? If yes, how should it be implemented here? Any other options?
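A minimal sketch of the queue-plus-worker-thread pattern the question is asking about: the websocket callback only enqueues, and a separate thread does the slow work (update_display is a stand-in for whatever processing you do):

import queue
import threading

tick_queue = queue.Queue()

def event_handler_quote_update(tick):
    # Called by the websocket library; keep it fast and just enqueue the tick.
    tick_queue.put(tick)

def process_ticks():
    # Runs in its own thread; the blocking get() is fine here because it
    # never blocks the websocket callback itself.
    while True:
        tick = tick_queue.get()
        update_display(tick)  # stand-in for the slow processing / UI update
        tick_queue.task_done()

threading.Thread(target=process_ticks, daemon=True).start()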

Retrieving the number of tasks in a particular state in the Azure Batch API

The Azure Batch API provides the list function, which retrieves an enumerable list of tasks in a job and takes TaskListOptions to, for instance, filter the tasks by state.
I would like to query the API only for the number of tasks in a particular state, and the API does not provide a function for that. I can do it by downloading and enumerating all the tasks, for instance like so:
n = sum(1 for t in bsc.task.list(job.id, bm.TaskListOptions(filter="state eq 'Completed'")))
This is of course horribly slow. The OData specification does provide the $count query option, but I can't find a way to add that onto the query. Is there a way to use $count with the Batch API, or is there perhaps a completely different alternative, e.g., via raw REST queries bypassing the Batch API?
Updated 2017-07-31:
You can now query the task counts for a job directly using the get_task_counts API. This will return a TaskCounts object for the specified job.
As it appears you are using the Azure Batch Python SDK, please use azure-batch version 3.1.0 or later.
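A minimal sketch, assuming an authenticated BatchServiceClient called batch_client and the job object from the question:

# requires azure-batch >= 3.1.0
counts = batch_client.task.get_task_counts(job.id)
print(counts.active, counts.running, counts.completed)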
Original Answer:
Right now, doing a list query as you have it is the only way to get counts. You can slightly optimize your query by providing a select clause so that only the properties you care about are returned by the server, which reduces the amount of data transferred. This is a common ask, and improvements in this space are on their way; this answer will be updated when they are available.
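For example, a sketch of that select optimization on the same query (assuming azure.batch.models is imported as bm, as in the question):

# return only the id property of each task; fewer bytes over the wire,
# but the tasks still have to be enumerated to count them
opts = bm.TaskListOptions(filter="state eq 'Completed'", select="id")
n = sum(1 for t in bsc.task.list(job.id, opts))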
To your other question, the language SDKs are built on top of the REST API and expose the full functionality of the REST layer.

Python requests caching authentication headers

I have used Python's requests module to do a POST call (within a loop) to a single URL with varying sets of data in each iteration. I have already used session reuse so that the underlying TCP connection is reused for each call to the URL during the loop's iterations.
However, I want to further speed up my processing by:
1. caching the URL and the authentication values (user ID and password), as they remain the same in each call
2. spawning multiple sub-processes, each handling a group of calls, to allow parallel processing of these smaller batches
Please note that I pass my authentication as headers in base64 format, and pseudocode of my POST call would typically look like this:
s = requests.Session()
url = 'https://example.net/'
for record in data_records:  # loop through data records
    headers = {'authorization': authbase64string}  # plus any other headers
    data = record  # data for this iteration
    # POST call
    r = s.post(url, data=data, headers=headers)
    response = r.json()
# end of loop and program
Please review the scenario and suggest any techniques/tips which might be of help.
Thanks in advance,
Abhishek
You can:
do it as you described (if you want to make it faster, you can run it using multiprocessing) and, e.g., add the headers to the session instead of to each request (see the sketch below)
modify the target server so that it accepts one POST request carrying multiple data records (so you limit the time spent on connecting, etc.)
do some optimizations on the server side so it replies faster (or it just stores the requests and sends you the response later via some callback)
It would be much easier if you described the use case :)
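As a rough sketch of the first suggestion, combining session-level headers with multiprocessing (the chunk size, pool size, and placeholder record list are illustrative):

import requests
from multiprocessing import Pool

url = 'https://example.net/'
authbase64string = 'Basic ...'  # the same base64 value the question builds

def post_chunk(records):
    # One session per worker process; the auth header is set once on the
    # session instead of being rebuilt for every request.
    s = requests.Session()
    s.headers.update({'authorization': authbase64string})
    return [s.post(url, data=record).json() for record in records]

if __name__ == '__main__':
    all_records = [...]  # the data records from the question's loop
    chunks = [all_records[i:i + 100] for i in range(0, len(all_records), 100)]
    with Pool(processes=4) as pool:
        grouped_responses = pool.map(post_chunk, chunks)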

Aggregate multiple APIs request results using python

I'm working on an application that will have to use multiple external APIs for information and, after processing the data, output the result to a client. The client uses a web interface to query; once the query is sent to the server, the server sends requests to the different API providers, joins their responses, and returns the combined response to the client.
All responses are in JSON.
current approach:
import requests

def get_results(city, country, query, type, position):
    # get list of APIs with authentication code for this query
    apis = get_list_of_apis(type, position)
    results = []
    for api in apis:
        result = requests.get(api)
        # parse JSON and combine the result into a uniform format to display
        results.append(result.json())
    return results
Server uses Django to generate response.
Problems with this approach:
(i) This may generate huge amounts of data even though the client is not interested in all of it.
(ii) Each JSON response has to be parsed according to a different API spec.
How can this be done efficiently?
Note: Queries are being done to serve job listings.
Most APIs of this nature allow for some sort of "paging". You should code your requests to only draw a single page from each provider. You can then consolidate the several pages locally into a single stream.
If we assume you have 3 providers and the page size is fixed at 10, you will get 30 responses. Assuming you only show 10 listings to the client, you will have to discard and re-query 20 listings. A better idea might be to locally cache the query results for a short time (say 15 minutes to an hour) so that you don't have to re-query the upstream providers each time your user advances a page in the consolidated list.
As far as the different parsing required for different providers, you will have to handle that internally. Create a different class for each. The list of providers is fixed and small, so you can code a table of which provider URL gets which class behavior.
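A rough sketch of that dispatch table, plus a short-lived page cache, might look like this (every provider, URL, and field name below is made up for illustration):

import time
import requests

class ProviderAParser:
    # Each parser knows the JSON layout of one upstream API.
    def parse(self, payload):
        return [{'title': j['title'], 'city': j['city']} for j in payload['jobs']]

class ProviderBParser:
    def parse(self, payload):
        return [{'title': j['name'], 'city': j['location']} for j in payload['results']]

# Fixed, small table: provider URL -> parser that understands its response format.
PROVIDERS = {
    'https://provider-a.example/jobs': ProviderAParser(),
    'https://provider-b.example/search': ProviderBParser(),
}

_cache = {}  # (url, page) -> (timestamp, parsed results)

def get_page(url, page, ttl=900):
    # Serve from the local cache for up to 15 minutes before re-querying upstream.
    key = (url, page)
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < ttl:
        return hit[1]
    payload = requests.get(url, params={'page': page, 'limit': 10}).json()
    results = PROVIDERS[url].parse(payload)
    _cache[key] = (time.time(), results)
    return results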
Shameless plug, but I wrote a post on how I did exactly this in Django REST framework here.
I highly recommend using Django REST framework; it makes everything so much easier.
Basically, the model on your API's end is extremely simple and just contains information on which external API is used and the ID of that API resource. A GenericProvider class then provides an abstract interface to perform CRUD operations on the external source. This GenericProvider uses other providers that you create and determines which provider to use via the provider field on the model. All of the data returned by the GenericProvider is then serialised as usual.
Hope this helps!
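A very rough sketch of the layout being described, with every model field, class, and provider key invented here purely for illustration (not the actual code from the post):

from django.db import models

class ExternalResource(models.Model):
    # Which upstream API owns this resource, and its ID over there.
    provider = models.CharField(max_length=50)
    external_id = models.CharField(max_length=100)

class GenericProvider:
    # Maps the model's provider field to a concrete client you write per API.
    providers = {}  # e.g. {'provider_a': ProviderAClient(), 'provider_b': ProviderBClient()}

    def retrieve(self, resource):
        client = self.providers[resource.provider]
        return client.get(resource.external_id)

The serializer then works on whatever GenericProvider returns, so the rest of the view code stays ordinary.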

Real-time data on webpage with jQuery

I would like a webpage that constantly updates a graph with new data as it arrives. Normally, all the data you have is passed to the page at the time of the request. However, I need the page to be able to update itself with fresh information every few seconds and redraw the graph.
Background
The webpage will be similar to this http://www.panic.com/blog/2010/03/the-panic-status-board/. The data coming in will be temperature values to be graphed, measured by an Arduino and saved to the Django database (this part is already complete).
Update
It sounds as though the solution is to use the jQuery.ajax() function (http://api.jquery.com/jQuery.ajax/) with a function as the complete callback that will schedule another request several seconds later to a URL that returns the data in JSON format.
How can that method be scheduled? With the .delay() function?
So the page must perform periodic jQuery.ajax calls with a url parameter set to a server URL where the latest up-to-date information is served, ideally in JSON form (possibly just as an incremental delta from the last instant for which the client has data; the client can send that instant as a query parameter in the Ajax call). The callback at the completion of the async request can schedule the next Ajax call for a few seconds in the future and then repaint the graph.
The fact that, server-side, you're using Django, doesn't seem all that crucial -- the server just needs to format a bunch of data as JSON and send it back, after all.
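For instance, a sketch of such a Django view (the model and field names are assumptions, not part of the original setup):

import json
from django.http import HttpResponse
from myapp.models import TemperatureReading  # hypothetical model holding the Arduino samples

def latest_readings(request):
    # An optional 'since' query parameter lets the client ask only for the delta.
    since = request.GET.get('since')
    readings = TemperatureReading.objects.order_by('timestamp')
    if since:
        readings = readings.filter(timestamp__gt=since)
    data = [{'t': r.timestamp.isoformat(), 'value': r.value} for r in readings]
    return HttpResponse(json.dumps(data), content_type='application/json')

The jQuery.ajax callback reads this JSON, appends the new points to the graph, and schedules the next poll with setTimeout.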
If the data consists of graphics generated on the fly, you can just serve them (in GIF, PNG or any other graphic format) through Django and reload each one individually, or even reload the whole page at once.
It depends on the load and performance requirements, but you can start by simply reloading the whole page and then, if necessary, use AJAX to reload just each specific part (it's very easy to achieve with jQuery or Prototype using their update helpers). If you expect a lot of data, then you should switch to generating the graphic on the client, using just incremental JSON data.
