How can I persist an authenticated API object across different Celery tasks? - python

How can I persist an API object across different Celery tasks? I have one API object per user with an authenticated session (python requests) to make API calls. A user_id, csrftoken, etc. are sent with each request.
I need to schedule different tasks in Celery to perform API requests without re-authenticating for each task.
How can I do this?

You can put this data into a database or memcache and fetch it by user id as the key.
If the data is stateless, that's fine: concurrent processes take the authentication parameters, construct the request, and send it.
If the state changes after each request, or in some requests (a unique incrementing request id, a rotating token, etc.), you need to implement a singleton manager that provides the correct credentials on request. All tasks should ask this manager for credentials; it can also limit the request rate, for example.
If you would like to pass this object to the task as a parameter, then you need to serialize it. Just make sure it is serializable.
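A minimal sketch of the cache approach, assuming Redis as the store and that the authenticated session reduces to cookies plus a csrftoken; the X-CSRFToken header name and the URLs are placeholders, so adjust them to whatever your API actually uses.

import json

import redis
import requests
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")
store = redis.Redis(host="localhost", port=6379, db=1)

def save_credentials(user_id, session, csrftoken):
    # Persist only the stateless parts of the authenticated session.
    store.set(
        f"api-creds:{user_id}",
        json.dumps({
            "csrftoken": csrftoken,
            "cookies": session.cookies.get_dict(),
        }),
        ex=3600,  # let stale credentials expire
    )

def load_session(user_id):
    # Rebuild a requests.Session from the cached credentials.
    raw = store.get(f"api-creds:{user_id}")
    if raw is None:
        raise RuntimeError("no cached credentials; re-authenticate first")
    creds = json.loads(raw)
    session = requests.Session()
    session.cookies.update(creds["cookies"])
    session.headers["X-CSRFToken"] = creds["csrftoken"]
    return session

@app.task
def call_api(user_id, path):
    # Every task rebuilds the session from the cache; no re-authentication.
    session = load_session(user_id)
    return session.get(f"https://api.example.com{path}").json()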

Related

Concurrency with a REST API that uses JWT

I am designing a solution that will perform concurrent HTTP requests to a certain REST API. The thing is that this API requires making a POST request to an authentication endpoint (with username + api_key in the header) so that the server gives you a JWT that is valid for 20 minutes.
As far as I can see, whenever you ask for a new token, the previous token is no longer valid.
In this scenario, is concurrency a possibility (e.g. using multi-threading in Python)? As far as I understand, with this way of working with JWTs only one thread should be doing the job, not "n" threads concurrently, as each thread would invalidate the previous one's token.
Create one thread which only manages the authentication, i.e. fetches a new JWT every 15 minutes (something under 20).
Create N worker threads which make the POST requests.
All threads should share a variable holding the JWT in use and a synchronization primitive like threading.Event. Where you store them (global, class, etc.) is up to you.
The workers wait for this Event to be set via ev.wait() before every POST request.
The auth thread clears the event via ev.clear() when it needs to fetch a new JWT, and once it has stored the new JWT in the shared variable, it sets the Event again via ev.set().
This way the POST workers run freely until it is time to refresh the JWT, then pause only while a new one is being fetched.
This can also be done with asyncio.Event if you use async concurrency.
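A minimal sketch of this pattern with threading.Event; the endpoints, the header names, and the 15-minute refresh interval are placeholders (pick anything safely under your token's lifetime).

import threading

import requests

jwt_lock = threading.Lock()
jwt_token = None
jwt_ready = threading.Event()
stop = threading.Event()

def auth_loop():
    # The only thread that talks to the authentication endpoint.
    global jwt_token
    while not stop.is_set():
        jwt_ready.clear()                     # pause the workers
        resp = requests.post(
            "https://api.example.com/auth",
            headers={"username": "me", "api_key": "secret"},
        )
        with jwt_lock:
            jwt_token = resp.json()["token"]
        jwt_ready.set()                       # release the workers
        stop.wait(15 * 60)                    # refresh before the 20 min expiry

def worker(payload):
    jwt_ready.wait()                          # blocks only during a refresh
    with jwt_lock:
        token = jwt_token
    return requests.post(
        "https://api.example.com/resource",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
    )

threading.Thread(target=auth_loop, daemon=True).start()
for i in range(5):
    threading.Thread(target=worker, args=({"n": i},)).start()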

How to send partial status of request to frontend by django python?

Suppose I have sent a POST request from React to a Django REST API, and that request takes a long time to process. How can I report to the frontend what percentage of it has been processed, before sending the real response?
There are two broad ways to approach this.
Polling (which I would recommend starting with): break the request up. The initial request doesn't start the work; it sends a message to an async task queue (such as Celery) to do the work. The response to the initial request is the ID of the Celery task that was spawned. The frontend can then use that task ID to poll the backend periodically, check whether the task is finished, and grab the results when they are ready (a sketch follows after this list).
Websockets, wherein the connection to the backend is kept open across many requests and either side can initiate sending data. I wouldn't recommend this to start with, since it's not really how Django is built, but with a higher level of investment it will give an even smoother experience.
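A minimal sketch of the first (polling) option, assuming Celery is configured with a result backend; the task body and the URL wiring are placeholders.

from celery import shared_task
from celery.result import AsyncResult
from django.http import JsonResponse

@shared_task
def long_job(payload):
    ...                                       # the expensive work goes here
    return {"status": "done"}

def start(request):
    # Kick off the work and hand the task ID back to the frontend.
    task = long_job.delay(request.POST.dict())
    return JsonResponse({"task_id": task.id})

def poll(request, task_id):
    # The frontend calls this periodically with the ID it received.
    result = AsyncResult(task_id)
    if result.ready():
        return JsonResponse({"state": result.state, "result": result.get()})
    return JsonResponse({"state": result.state})  # e.g. PENDING / STARTED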

How to handle high response time

There are two different services. One service (Django) receives the request from the front-end and then calls an API in the other service (Flask).
But the response time of the Flask service is high, and if the user navigates to another page, that request will be canceled.
Should this be a background task or a pub/sub pattern? If so, how do I run it in the background and then tell the user "here is your last result"?
You have two main options:
Make an initial request to a "simple" Django view which loads a skeleton HTML page with a spinner, where some JS triggers an XHR request to a second Django view that performs the call to the other service (Flask). This way you can properly warn your user that loading takes time, and handle the exit on the browser side (ask for confirmation before leaving, abort the request, ...).
If possible, cache the result of the Flask service, so you don't need to call it on each page load.
You can combine those two solutions by calling the service in an asynchronous request and caching its result (depending on context, you may need to vary the cache per connected user, for example); a sketch of this follows below.
The first solution can also be implemented with pub/sub, websockets, whatever, but a classical XHR seems fine for your case.
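A minimal sketch of the combined approach (an XHR-targeted view plus caching), assuming Django's cache framework is configured; the Flask URL, the per-user cache key, and the 5-minute TTL are placeholders.

import requests
from django.core.cache import cache
from django.http import JsonResponse

def flask_proxy(request):
    # Cache per connected user, as suggested above.
    key = f"flask-result:{request.user.pk}"
    data = cache.get(key)
    if data is None:
        # Slow call to the other service; only made on a cache miss.
        resp = requests.get("http://flask-service.internal/api/slow", timeout=60)
        data = resp.json()
        cache.set(key, data, timeout=300)     # skip the call on later page loads
    return JsonResponse(data)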
On our project, we have a couple of time-expensive endpoints. Our solution was similar to the previous answer:
Once we receive a request, we call a Celery task that does its expensive work asynchronously. We do not wait for its results and return a quick response to the user. The Celery task sends its progress/results to the user via WebSockets, and the frontend handles the WS messages. The benefit of this approach is that we do not spend our backend's CPU; we spend the CPU of the Celery worker running on another machine.
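As a rough illustration of that setup, a minimal sketch of a Celery task pushing progress over Django Channels; the group name and the "progress.update" message type are assumptions that must match a consumer you define, and a configured channel layer is assumed.

from asgiref.sync import async_to_sync
from celery import shared_task
from channels.layers import get_channel_layer

@shared_task
def expensive_work(user_id, items):
    layer = get_channel_layer()
    for i, item in enumerate(items, start=1):
        ...                                   # process one item
        async_to_sync(layer.group_send)(
            f"progress-{user_id}",            # the user's WebSocket joins this group
            {"type": "progress.update",       # dispatched to the consumer's method
             "percent": int(100 * i / len(items))},
        )
    return "done"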

Flask request waiting for asynchronous background job

I have an HTTP API using Flask, and in one particular operation clients use it to retrieve information obtained from a 3rd-party API. The retrieval is done with a Celery task. Usually, my approach would be to accept the client request for that information and return a 303 See Other response with a URI that can be polled for the response once the background job is finished.
However, some clients require the operation to be done in a single request. They don't want to poll or follow redirects, which means I have to run the background job synchronously, hold on to the connection until it's finished, and return the result in the same response. I'm aware of Flask streaming, but how do I do such long-polling with Flask?
Tornado would do the trick.
Flask is not designed for asynchronous handling: a Flask instance processes one request at a time in one thread. Therefore, while you hold the connection open, it will not proceed to the next request.
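If holding the connection is unavoidable, a minimal sketch is to block the request on the Celery task's result, assuming Flask runs under a WSGI server with several workers or threads (so other requests can still be served); fetch_info is a hypothetical task.

from celery.exceptions import TimeoutError as CeleryTimeout
from flask import Flask, jsonify

from tasks import fetch_info  # hypothetical module holding the Celery task

app = Flask(__name__)

@app.route("/info")
def info():
    result = fetch_info.delay()
    try:
        # Block this worker until the background job completes.
        return jsonify(result.get(timeout=30))
    except CeleryTimeout:
        return jsonify({"error": "backend timed out"}), 504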

Python Webserver: How to serve requests asynchronously

I need to create a Python middleware that will do the following:
a) Accept HTTP GET/POST requests from multiple clients.
b) Modify and dispatch these requests to a backend remote application (via socket communication). I do not have any control over this remote application.
c) Receive processed results from the backend application and return these results to the requesting clients.
Now, the clients expect a synchronous request/response scenario, but the backend application does not return the results synchronously; some requests take much longer to process than others. Hence:
Client 1 : send http request C1 --> get response R1
Client 2 : send http request C2 --> get response R2
Client 3 : send http request C3 --> get response R3
The Python middleware receives them in some order: C2, C3, C1, and dispatches them in this order to the backend (as non-HTTP messages). The backend responds with results in a mixed order: R1, R3, R2. The Python middleware should package these responses back into HTTP response objects and send each response back to the relevant client.
Is there any sample code for programming this sort of behavior? There seem to be some 20 different web frameworks for Python, and I'm confused as to which one would be best for this scenario (I would prefer something as lightweight as possible; I would consider Django too heavy. I tried Bottle, but I'm not sure how to go about programming it for this scenario).
================================================
Update (based on discussions below): Requests have a request id. Responses have a response id (which should match the request id they correspond to). There is only one socket connection between the middleware and the remote backend application. While we can maintain a {request_id : ip_address} dictionary, the issue is how to construct an HTTP response object for the correct client. I assume threading might solve this problem, where each thread maintains its own response object.
Screw frameworks. This is exactly the kind of task for asyncore. This module allows event-based network programming: given a set of sockets, it calls back given handlers when data is ready on any of them. That way, threads are not necessary just to dumbly wait for data on one socket to arrive and painfully pass it to another thread. You would have to implement the HTTP handling yourself, but examples of that can be found. Alternatively, you could use the async feature of uWSGI, which would allow your application to be integrated with an existing webserver, but it does not integrate with asyncore by default, though it wouldn't be hard to make it work. It depends on your specific needs. (Note that asyncore has since been deprecated and was removed in Python 3.12; asyncio is its modern replacement.)
Quoting your comment:
The middleware uses a single persistent socket connection to the backend. All requests from middleware are forwarded via this single socket. Clients do send a request id along with their requests. Response id should match the request id. So the question remains: How does the middleware (web server) keep track of which request id belonged to which client? I mean, is there any way for a cgi script in middleware to create a db of tuples like and once a response id matches, then send a http response to clientip:clienttcpport ?
Is there any special reason for doing all this processing in a middleware? You should be able to do all this in a decorator, or somewhere else, if more appropriate.
Anyway, you need to maintain a global concurrent dictionary (extend dict and protect it using threading.Lock). Upon a new request, store the given request id as the key and associate it with the respective client (sender). Whenever your backend responds, retrieve the client from this dictionary and remove the entry so it doesn't accumulate forever.
UPDATE: someone has already extended the dictionary for you - check this answer.
Ultimately you're going from the synchronous HTTP request/response protocol of your clients to an asynchronous queueing/messaging protocol with your backend. So you have two choices: (1) make requests wait until the backend has no outstanding work, then process one at a time, or (2) write something that marries the backend responses with their associated requests (using a dictionary of requests or similar); a sketch of the second option follows below.
One way might be to run your server in one thread while dealing with your backend in another (see "Run Python HTTPServer in Background and Continue Script Execution"), or maybe look at aiohttp (https://docs.aiohttp.org/en/v0.12.0/web.html).
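To make the dictionary approach concrete, a minimal threaded sketch: each handler registers a per-request queue keyed by request id, and a single reader thread matches backend responses to those queues. The one-line "id:payload" wire format and the backend address are made-up placeholders.

import queue
import socket
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

pending = {}                                  # request_id -> per-client queue
pending_lock = threading.Lock()
backend = socket.create_connection(("backend.internal", 9000))

def backend_reader():
    # Single thread reading backend responses and waking up clients.
    for line in backend.makefile("r"):
        resp_id, _, payload = line.strip().partition(":")
        with pending_lock:
            q = pending.pop(resp_id, None)
        if q is not None:
            q.put(payload)                    # hand the result to the handler thread

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        req_id = self.path.lstrip("/")        # assume the request id is in the URL
        q = queue.Queue(maxsize=1)
        with pending_lock:
            pending[req_id] = q
        # A real implementation would serialize concurrent writes to the socket.
        backend.sendall(f"{req_id}:work\n".encode())
        try:
            payload = q.get(timeout=60)       # block this handler thread only
        except queue.Empty:
            with pending_lock:
                pending.pop(req_id, None)     # clean up the abandoned entry
            self.send_error(504, "backend timed out")
            return
        self.send_response(200)
        self.end_headers()
        self.wfile.write(payload.encode())

threading.Thread(target=backend_reader, daemon=True).start()
ThreadingHTTPServer(("", 8080), Handler).serve_forever()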
