I am designing a solution that will perform concurrent HTTP requests to a certain REST API. The thing is that this API requires a POST request to an authentication endpoint (with username + api_key in the header), and the server then returns a JWT that is valid for 20 minutes.
As far as I can see, whenever you ask for a new token, the previous token is no longer valid.
In this scenario, is concurrency a possibility (e.g. using multi-threading in Python)? As far as I understand, with that way of working with JWTs only one thread should be doing the job, not "n" threads concurrently, since each thread would invalidate the previous token.
Create one thread which only manages authentication, i.e. fetches a new JWT every 15 minutes (anything under 20).
Create N worker threads which make the POST requests.
All threads should share a variable holding the JWT in use and a synchronization primitive like threading.Event. Where you want to store them (global, class, etc) is up to you.
The workers wait for this Event to be set via ev.wait() before every POST request.
The auth thread clears the event via ev.clear() when it needs to fetch a new JWT, and once it has updated the JWT variable, it sets the Event again via ev.set().
This way the POST workers will run freely until the time to refresh the JWT, then pause only while a new one is being fetched.
This can also be done with asyncio.Event if you use async concurrency.
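A minimal sketch of that pattern with threading; the endpoint URLs, header field names, and payloads below are assumptions, not the real API:

```python
import threading
import time
import requests

AUTH_URL = "https://api.example.com/auth"   # hypothetical endpoints
WORK_URL = "https://api.example.com/work"

jwt_token = None                  # shared JWT, written only by the auth thread
token_ready = threading.Event()   # set while the current JWT is usable

def auth_loop(username, api_key):
    global jwt_token
    while True:
        token_ready.clear()       # pause workers while refreshing
        resp = requests.post(AUTH_URL, headers={"username": username, "api_key": api_key})
        jwt_token = resp.json()["token"]
        token_ready.set()         # workers may resume
        time.sleep(15 * 60)       # refresh well before the 20-minute expiry

def worker(payload):
    token_ready.wait()            # blocks only while a refresh is in progress
    headers = {"Authorization": f"Bearer {jwt_token}"}
    return requests.post(WORK_URL, json=payload, headers=headers)

threading.Thread(target=auth_loop, args=("user", "key"), daemon=True).start()
for i in range(5):
    threading.Thread(target=worker, args=({"job": i},)).start()
```

The workers only block during the short window between ev.clear() and ev.set(); the rest of the time they run fully concurrently.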
Related
There are two different services. One service (Django) receives the request from the front-end and then calls an API in the other service (Flask).
But the response time of the Flask service is high, and if the user navigates to another page, that request will be cancelled.
Should it be a background task or a pub/sub pattern? If so, how do I run it in the background and then tell the user "here is your last result"?
You have two main options:
Make an initial request to a "simple" Django view, which loads a skeleton HTML page with a spinner; some JS then triggers an XHR request to a second Django view, which contains the call to the other service (Flask). This way you can properly tell the user that the loading takes time, and handle the exit on the browser side (ask for confirmation before leaving, abort the request, ...).
If possible, cache the result of the Flask service so you don't need to call it on each page load.
You can combine those two solutions by calling the service in an asynchronous request and caching its result, as sketched below (depending on context, you may need to key the cache on the connected user, for example).
The first solution could also be built with pub/sub, websockets, or whatever, but a classical XHR seems fine for your case.
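A rough sketch of the two-view approach combined with caching; the Flask URL, template name, and cache timeout are placeholders:

```python
# views.py -- sketch only; FLASK_URL and the timeouts are assumptions
import requests
from django.core.cache import cache
from django.http import JsonResponse
from django.shortcuts import render

FLASK_URL = "http://flask-service/api/slow-endpoint"

def page(request):
    # Cheap first view: returns the skeleton page with a spinner;
    # its JS fires an XHR to result_view below.
    return render(request, "skeleton.html")

def result_view(request):
    # Second view, called via XHR: returns the (possibly cached) Flask result.
    key = f"flask-result-{request.user.id}"        # per-user key if results differ per user
    data = cache.get(key)
    if data is None:
        data = requests.get(FLASK_URL, timeout=60).json()
        cache.set(key, data, timeout=300)          # cache for 5 minutes
    return JsonResponse(data)
```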
On our project, we have a couple of time-expensive endpoints. Our solution was similar to a previous answer:
Once we receive a request, we call a Celery task that does its expensive work asynchronously. We do not wait for its result and return a quick response to the user. The Celery task sends its progress/results to the user via WebSockets, and the frontend handles this WS message. The benefit of this approach is that we do not spend the CPU of our backend; we spend the CPU of the Celery worker, which runs on another machine.
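In outline, the flow looks like the sketch below; notify_user is a stand-in for whatever WebSocket layer you use (e.g. Django Channels), not a real API:

```python
# tasks.py -- sketch; notify_user wraps your WebSocket push mechanism
from celery import shared_task

def notify_user(user_id, message):
    """Placeholder: push `message` to the user's WebSocket (e.g. via a channel layer)."""
    pass

@shared_task
def expensive_work(user_id, numbers):
    notify_user(user_id, {"status": "started"})
    result = sum(numbers)                       # stand-in for the real expensive work
    notify_user(user_id, {"status": "done", "result": result})

# views.py -- respond immediately; the task reports progress over the WebSocket
from django.http import JsonResponse

def start_work(request):
    expensive_work.delay(request.user.id, [1, 2, 3])
    return JsonResponse({"status": "accepted"}, status=202)
```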
How can I persist an API object across different Celery tasks? I have one API object per user, with an authenticated session (python requests) used to make API calls. A user_id, csrftoken, etc. are sent with each request.
I need to schedule different tasks in Celery to perform API requests without re-authenticating for each task.
How can I do this?
You can put this data into the database/memcache and fetch it by user_id as the key.
If this data is stateless, that's fine: concurrent processes take the authentication parameters, construct the request, and send it (see the sketch after this answer).
If it changes state (a unique incrementing request id, a changing token, etc.) after each request (or on some requests), you need to implement a singleton manager that provides the correct credentials on request. All tasks should ask this manager for credentials. It can also limit the rate, for example.
If you would like to pass this object to the task as a parameter, then you need to serialize it. Just make sure it is serializable.
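For the stateless case, one way is to keep only the raw credentials in a shared cache and rebuild the requests.Session inside each task; the cache backend and key layout here are assumptions:

```python
# tasks.py -- sketch using Django's cache as the shared store; adapt to memcache/Redis as needed
import requests
from celery import shared_task
from django.core.cache import cache

def save_credentials(user_id, csrftoken, session_cookie):
    # Called once after authenticating; plain strings, so they serialize fine.
    cache.set(f"api-creds-{user_id}",
              {"csrftoken": csrftoken, "sessionid": session_cookie},
              timeout=60 * 60)

def build_session(user_id):
    creds = cache.get(f"api-creds-{user_id}")
    s = requests.Session()
    s.cookies.set("csrftoken", creds["csrftoken"])
    s.cookies.set("sessionid", creds["sessionid"])
    s.headers["X-CSRFToken"] = creds["csrftoken"]
    return s

@shared_task
def call_api(user_id, url):
    # Each task rebuilds the session from cached credentials: no re-authentication needed.
    return build_session(user_id).get(url).json()
```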
I have an HTTP API using Flask, and in one particular operation clients use it to retrieve information obtained from a 3rd-party API. The retrieval is done with a Celery task. Usually, my approach would be to accept the client request for that information and return a 303 See Other response with a URI that can be polled for the result once the background job is finished.
However, some clients require the operation to be done in a single request. They don't want to poll or follow redirects, which means I have to run the background job synchronously, hold the connection until it's finished, and return the result in the same response. I'm aware of Flask streaming, but how do I do such long-polling with Flask?
Tornado would do the trick.
Flask is not designed for asynchronous operation. A Flask instance processes one request at a time in one thread; therefore, while you hold the connection, it will not proceed to the next request.
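If you must keep it to a single request, one common workaround is to fire the Celery task and block on its result with a timeout, returning the data in the same Flask response. A sketch, where fetch_from_third_party is a hypothetical task name:

```python
# app.py -- sketch: hold the connection and wait for the Celery task to finish
from flask import Flask, jsonify
from celery.exceptions import TimeoutError as CeleryTimeout

from tasks import fetch_from_third_party   # hypothetical task doing the 3rd-party call

app = Flask(__name__)

@app.route("/info/<item_id>")
def info(item_id):
    async_result = fetch_from_third_party.delay(item_id)
    try:
        data = async_result.get(timeout=30)   # block this worker until the task finishes
    except CeleryTimeout:
        return jsonify({"error": "upstream timeout"}), 504
    return jsonify(data)
```

Note that this ties up a web worker for the whole duration, which is exactly the limitation described above, so you need enough workers (or a server built for this, like Tornado) to absorb the load.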
I want to make a Google App Engine app that does the following:
Client makes an asynchronous http request
Server starts processing that request
Client makes ajax http requests to get progress
The problem is that the server processing (step #2) may take more than 30 seconds.
I know that you can't have threads on Google App Engine and that all tasks must complete within 30 seconds or they get shut down. Is there some way to work around this?
Also, I'm using python-django as a backend.
You'll want to use the Task Queue API, probably via deferred tasks. The deferred API makes working with Task Queues dramatically simpler.
Essentially, you'll want to spawn a task to start the processing. That task should catch DeadlineExceeded exceptions and reschedule itself (again via the deferred API) to continue processing. This requires that your tasks be able to keep track of their own progress. They can also update their own status in memcache, which you can use to write a view that checks a task's status. That view can then be polled via Ajax.
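Roughly, on the legacy GAE Python runtime, that looks like the sketch below; do_one_item and TOTAL_ITEMS are stand-ins for your actual processing, and newer runtimes would use Cloud Tasks instead:

```python
# Sketch for the legacy GAE Python runtime
from google.appengine.api import memcache
from google.appengine.ext import deferred
from google.appengine.runtime import DeadlineExceededError

TOTAL_ITEMS = 1000

def do_one_item(i):
    pass  # placeholder for one unit of the expensive processing

def process(job_id, cursor=0):
    try:
        while cursor < TOTAL_ITEMS:
            do_one_item(cursor)
            cursor += 1
            memcache.set("progress-%s" % job_id, cursor)   # read by the Ajax polling view
    except DeadlineExceededError:
        deferred.defer(process, job_id, cursor)            # reschedule from where we stopped
        return
    memcache.set("progress-%s" % job_id, "done")

def start(job_id):
    deferred.defer(process, job_id)                        # kick off the background work
```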
I'm using CherryPy to make a web-based frontend for SymPy that uses an asynchronous process library on the server side, so that multiple requests can be processed at once without waiting for each one to complete. To allow the frontend to function as expected, I use one process for the entirety of each session. The client-side Javascript sends the session id from the cookie to the server when the user submits a request. The server side currently keeps a pair of lists, storing instances of a controller class in one and the corresponding session ids in the other, and creates a new interpreter proxy (and sends it the input) when a non-existent session id is submitted. The only problem with this is that the proxy objects are not deleted when their corresponding sessions expire. Also, I can't see any way to retrieve the session id for which the current request is being served.
My questions about all this are: is there any way to "connect" an arbitrary object to a CherryPy session so that it gets deleted upon session expiration? Is there something I am overlooking here that would greatly simplify things? And does CherryPy's multi-threading negate the problem of synchronous reading of the stdout filehandle from the child process?
You can create your own session type, derived from CherryPy's base session. Use its clean_up method to do your cleanup.
Look at cherrypy/lib/sessions.py for details and sample session implementations.
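For example, a subclass of the built-in RamSession could drop the per-session interpreter proxy when sessions expire. This is only a sketch; how you register the custom class differs between CherryPy versions (e.g. tools.sessions.storage_class in recent releases):

```python
# Sketch: tie an arbitrary object's lifetime to the CherryPy session
from cherrypy.lib.sessions import RamSession

interpreters = {}   # session id -> interpreter proxy (replaces the parallel lists)

class InterpreterSession(RamSession):
    def clean_up(self):
        # Note which sessions exist, let the base class expire the stale ones,
        # then drop the proxies whose sessions disappeared.
        before = set(self.cache.keys())
        super(InterpreterSession, self).clean_up()
        for sid in before - set(self.cache.keys()):
            interpreters.pop(sid, None)

# Inside a request handler, the current session id should be available as
# cherrypy.session.id, so the client-side Javascript does not need to send it.
```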