Django: get process status messages via REST - python

For learning purposes I want to implement the following:
I have a script that, for example, runs Selenium in the background, and some log messages in the terminal help me see what is going on.
But I want to receive the same messages through my REST request in the Angular app.
print('Started')
print('Logged in')
...
print('Processing')
...
print('Success')
In my views.py file:
from rest_framework import viewsets
from rest_framework.decorators import action
from rest_framework.response import Response

class RunTask(viewsets.ViewSet):
    queryset = Task.objects.all()

    @action(detail=False, methods=['GET'], name='Run Test Script')
    def run(self, request, *args, **kwargs):
        result = task()
        if result['success']:
            return Response(data=result)
        else:
            return Response(data=result['message'])
def task():
    print('Started')
    print('Logged in')
    ...
    print('Processing')
    ...
    print('Success')
    return {
        'success': True,  # or False
        'message': 'my status message'
    }
Right now it only shows me the final result of the task. But I want to receive the same intermediate messages, to indicate the process status in the frontend.
I can't figure out how to organize this.
How can I tell Angular about my process status?

Unfortunately, it's not that simple. Indeed, the REST API lets you start the task, but since it runs in the same thread, the HTTP request will block until the task is finished before sending the response. Your print statements won't appear in the HTTP response but in your server output (if you look at the shell where you ran python manage.py runserver, you'll see those print statements).
Now, if you wish to have that output in real time, you'll have to look at WebSockets. They allow you to open a "tunnel" between the browser and the server, and to send/receive messages in real time. The django-channels library allows you to implement them.
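As an illustration, here is a minimal sketch of what that could look like with django-channels; the consumer name, group name, and message shape are made up, and this assumes Channels and a channel layer are installed and routed:

# consumers.py - hypothetical sketch; assumes Channels is wired up in asgi.py
import json
from channels.generic.websocket import AsyncWebsocketConsumer

class TaskProgressConsumer(AsyncWebsocketConsumer):
    async def connect(self):
        # Join a group so the task can broadcast progress to this socket
        await self.channel_layer.group_add('task_progress', self.channel_name)
        await self.accept()

    async def task_progress(self, event):
        # Forward each status message to the Angular app
        await self.send(text_data=json.dumps({'status': event['message']}))

Inside the task, each print could then become a group_send:

from asgiref.sync import async_to_sync
from channels.layers import get_channel_layer

def report(message):
    async_to_sync(get_channel_layer().group_send)(
        'task_progress', {'type': 'task_progress', 'message': message})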
However, for long-running background tasks (like a Selenium scraper), I would advise looking into the Celery task queue. Basically, your Django process schedules tasks in a queue, and the tasks in the queue are then executed by one (or more!) "worker" processes. The advantage is that your Django process won't be blocked by the long task: it just adds some work to the queue and then responds.
When you add a task to the queue, Celery gives you a unique identifier for it, which you can return in the HTTP response. You can then implement another endpoint which takes a task id as a parameter and returns the state of the task (is it pending? done? failed?).
For this to work, you'll have to set up a "broker", a kind of database that stores the tasks to do and their results (typically RabbitMQ or Redis). The Celery documentation explains this well: https://docs.celeryproject.org/en/latest/getting-started/brokers/index.html
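Put together, a rough sketch of that pattern could look like the following; the Redis URLs, task body, and view names are assumptions, not your actual code:

# tasks.py - sketch only; assumes a Redis broker and result backend are running
from celery import Celery

celery_app = Celery('myproject',
                    broker='redis://localhost:6379/0',
                    backend='redis://localhost:6379/0')

@celery_app.task
def run_selenium_task():
    # ... the long-running Selenium work goes here ...
    return {'success': True, 'message': 'my status message'}

# views.py - enqueue the task, then let the frontend poll its state
from celery.result import AsyncResult
from rest_framework.decorators import api_view
from rest_framework.response import Response

@api_view(['POST'])
def start_task(request):
    result = run_selenium_task.delay()  # returns immediately
    return Response({'task_id': result.id})

@api_view(['GET'])
def task_status(request, task_id):
    result = AsyncResult(task_id, app=celery_app)
    # result.state is e.g. PENDING, STARTED, SUCCESS or FAILURE
    return Response({'state': result.state})

The Angular app can then poll the status endpoint with the returned task_id until it sees SUCCESS or FAILURE.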
Whichever way you choose, it's not a trivial thing and will need quite some work before you have results; but it's interesting to see how it expands the possibilities of a classical HTTP server.

Related

Python/Quart: how to call client back when the app's background task is done?

I need help with the Python web framework Quart. I want to build a Python server that returns 202 as soon as a client requests some time-consuming I/O task, and calls the client back to deliver the return value of that task as soon as the task is done.
For that purpose, I add the task requested by the client to the background tasks using app.add_background_task(task), and that gives me the expected behaviour, as it returns 202 immediately. But I'm not sure how I can get at the return value of the background task and call the client back to deliver that value.
I'm reading this article on server-sent events, https://quart.palletsprojects.com/en/latest/how_to_guides/server_sent_events.html, but I'm not sure how to apply it.
import asyncio
from datetime import datetime
from quart import Quart

app = Quart(__name__)

async def background_task(timeout=10):
    print("background task started at", datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
    await asyncio.sleep(timeout)
    print("background task completed at", datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
    return "requested task done"

@app.route("/", methods=["GET"])
async def main_route():
    print("Hello from main route")
    app.add_background_task(background_task, 10)
    return "request accepted", 202
To push information to the client, you'll need WebSockets or some other mechanism; it'll require both server-side and client-side implementations.
A simpler solution is to poll the server from the client to determine whether the task is complete. That is, send requests repeatedly to the server until you get confirmation of what you expect, or your maximum number of attempts is exceeded (or a request simply times out).
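A minimal sketch of that polling approach could look like this; the in-memory results dict and the task-id scheme are made up, and this assumes a single-process Quart server:

# sketch only: results are kept in a plain dict, which works for one process
import asyncio
import uuid
from quart import Quart

app = Quart(__name__)
results = {}  # task_id -> return value, filled in when a task finishes

async def background_task(task_id, timeout=10):
    await asyncio.sleep(timeout)  # stand-in for the time-consuming I/O work
    results[task_id] = "requested task done"

@app.route("/", methods=["GET"])
async def main_route():
    task_id = str(uuid.uuid4())
    app.add_background_task(background_task, task_id, 10)
    return {"task_id": task_id}, 202

@app.route("/result/<task_id>", methods=["GET"])
async def poll_result(task_id):
    # the client calls this repeatedly until the task has finished
    if task_id in results:
        return {"done": True, "value": results[task_id]}, 200
    return {"done": False}, 202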

AWS lambda does not finish execution when response is sent back to client

I'm trying to implement a fire-and-forget mechanism using FastAPI, and I'm facing a few difficulties.
I have two applications. One is developed with FastAPI and the other with Flask. The FastAPI app will run in AWS Lambda and will send requests to the Flask app running on AWS ECS.
Currently, I am able to send a request to the Flask API and receive an immediate response from the FastAPI app. But I can see that FastAPI keeps running bg_tasks.add_task(make_request, request) in the background, which times out after the Lambda execution threshold time (15 minutes).
FastAPI application:
import requests
from typing import Dict
from fastapi import APIRouter, BackgroundTasks

router = APIRouter()
root_url = "http://localhost:5000/"  # the Flask app

def make_request(data):
    """
    Function to make a POST request to the Flask application
    :param data: Data from the user to write into the file
    :return: None
    """
    print("***** Inside post *****")
    requests.post(url=root_url, data=data)
    print("***** Post completed *****")

@router.post("/write-to-file")
async def write_to_file(request: Dict, bg_tasks: BackgroundTasks):
    """
    Function to queue the request and return to the caller
    :param request: Request from the user
    :param bg_tasks: Background task instance
    :return: Some message
    """
    print("****** Request call started ******")
    bg_tasks.add_task(make_request, request)
    print("****** Request completed ******")
    return {"Message": "Data will be written into the file"}
Flask Application:
import json
import time
from datetime import datetime
from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['POST'])
def write():
    """
    Function to write the request data into the file
    :return:
    """
    request_data = request.form
    try:
        print(f"Sleep time {int(request_data['sleep_time'])}")
        time.sleep(int(request_data["sleep_time"]))
        request_data = dict(request_data)
        request_data['current_time'] = str(datetime.now())
        with open("data.txt", "a") as f:
            f.write("\n")
            f.write(json.dumps(request_data, indent=4))
        return {"Message": "Success"}
    except Exception as e:
        return {"Message": str(e)}  # str(), since an exception isn't JSON-serializable
FastAPI (http://localhost:8000/write-to-file/) calls the write_to_file method, which adds the task (the request) to the background queue and runs it in the background.
This function does not wait for the process to complete; it returns the response to the client immediately. The make_request method then triggers the Flask endpoint (http://localhost:5000/), which in turn processes the request and writes to a file. Think of make_request as running inside one AWS Lambda: if the Flask application takes hours to process, the Lambda will wait just as long.
Is it possible to kill the Lambda once the request is published, or to do something else to solve the timeout issue?
With the current setup, your Lambda will run for as long as the Flask endpoint needs to process your request. Effectively, both APIs run for exactly the same time.
This is because requests.post in the Lambda function must wait for the response to finish. Given that you don't care about the results of that response, I can think of several other ways to solve this.
If I were you, I would move the queue processing to the ECS side. The Lambda would then only be responsible for putting a job into a queue, which the ECS worker would process when it has capacity.
This option would also let you get rid of one of the APIs: you could either query the Flask API directly and kill the Lambda function, or kill the Flask API and run a worker process on ECS.
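As an illustration, the Lambda's hand-off could shrink to a single queue write, for instance with Amazon SQS; the queue URL below is a placeholder, and this assumes an SQS queue exists and an ECS worker polls it:

# sketch only: the queue URL is a placeholder and the worker side is not shown
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/write-jobs"

def make_request(data):
    # enqueue the job and return immediately instead of waiting on Flask
    sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(data))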
Alternatively, you could respond early on the Flask API side, which would finish your HTTP request, and thus the Lambda execution, sooner. This can be confusing to set up and defeats the purpose of exposing an HTTP API in the first place. Also, under some circumstances, the Flask request execution could be terminated by the webserver after a default timeout (~30 seconds).
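One way to sketch that early response is to hand the slow work to a background thread and return at once; the do_write helper below is hypothetical, and the work is lost if the process dies before it finishes:

# sketch only: a daemon thread gives no durability guarantees
from threading import Thread
from flask import Flask, request

app = Flask(__name__)

def do_write(data):
    # hypothetical helper holding the slow file-writing logic from above
    ...

@app.route('/', methods=['POST'])
def write():
    data = dict(request.form)
    Thread(target=do_write, args=(data,), daemon=True).start()
    return {"Message": "Accepted"}, 202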
And finally, in case you really, really want to leave your code as it is, you could set the request to time out after a short period. If you go this route, make sure to choose a timeout long enough for Flask to start processing the request:
try:
    requests.post(url=root_url, data=data, timeout=5)  # raises after 5 seconds of waiting
except requests.exceptions.Timeout:
    pass

How to provide the user constant notification about a Celery task's execution status?

I integrated my project with Celery in this way; inside views.py, after receiving a request from the user:
def upload(request):
    if "POST" == request.method:
        # save the file
        task_parse.delay()
        # continue
and in tasks.py
from __future__ import absolute_import
from celery import shared_task
from uploadapp.main import aunit

@shared_task
def task_parse():
    aunit()
    return True
In short, the shared task runs the function aunit() from a third Python file, main.py, located in the uploadapp/ directory.
Let's assume that aunit() is a resource-heavy process which takes time (like file parsing). Since I integrated it with Celery, it now runs completely asynchronously, which is good for me. So the task starts -> Celery processes it -> it finishes, and Celery sets its status to finished. I can view that using flower.
But what I want is to also notify the user of my app, through the Django UI, that "Your task is done processing" as soon as Celery has finished processing on the back side and set the status to SUCCESS.
Now, I know this is possible if:
1.) I constantly request the STATUS and check whether it returns SUCCESS or not.
How do I do that via Celery? How can you query a Celery task's status from your views.py and notify the user asynchronously with just Celery's Python module?
You need a real-time mechanism. I would suggest Firebase: update the Firebase Realtime Database field for the user id with a boolean True at the end of the Celery task, then implement a JavaScript function that listens for changes to the user_id object in the Firebase database and updates the UI.
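The server side of that could look roughly like this with the firebase-admin package; the credential path, database URL, and the users/<user_id> layout are all assumptions:

# sketch only: credential path, database URL and data layout are made up
import firebase_admin
from firebase_admin import credentials, db
from celery import shared_task
from uploadapp.main import aunit

cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred, {"databaseURL": "https://your-app.firebaseio.com"})

@shared_task
def task_parse(user_id):
    aunit()
    # flip the flag the frontend listener is watching
    db.reference("users/{}".format(user_id)).update({"task_done": True})
    return True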

Celery: have a task wait for completion of the same task called previously with a shared argument

I am currently trying to set up Celery to handle responses from a chatbot and forward those responses to a user.
The chatbot hits the /response endpoint of my server, which triggers the following function in my server.py module:
def handle_response(user_id, message):
    """Endpoint to handle the response from the chatbot."""
    tasks.send_message_to_user.apply_async(args=[user_id, message])
    return ('OK', 200,
            {'Content-Type': 'application/json; charset=utf-8'})
In my tasks.py file, I import celery and create the send_message_to_user function:
from celery import Celery

celery_app = Celery('tasks', broker='redis://')

@celery_app.task(name='send_message_to_user')
def send_message_to_user(user_id, message):
    """Send the message to a user."""
    # Here is the logic to send the message to a specific user
My problem is that my chatbot may answer a user with multiple messages. The send_message_to_user tasks are properly put in the queue, but then a race condition arises and sometimes the messages reach the user in the wrong order.
How could I make each send_message_to_user task wait for the previous task with the same name and the same user_id argument before executing?
I have looked at this thread, Running "unique" tasks with celery, but a lock isn't my solution, as I don't want to implement ugly retries for when the lock is released.
Does anyone have an idea how to solve this issue in a clean(-ish) way?
Also, it's my first post here, so I'm open to any suggestions to improve my question.
Thanks!

How can I get the Python Task Queue and Channel API to send messages and respond to requests during a long-running process?

This is probably a basic question, but I have not been able to find the answer.
I have a long-running process that produces data every few minutes, and I would like the client to receive that data as soon as it is ready. Currently I run the long-running process in a Task Queue, and it adds Channel messages to another Task Queue from within a for loop. The client successfully receives the channel messages and downloads the data using a GET request; however, the messages are only sent from the task queue after the long-running process finishes (after about 10 minutes), instead of when they are added to the task queue.
How can I have the messages in the task queue sent immediately? Do I need to break the for loop into several tasks? The for loop creates a number of dictionaries that I think I would need to post to the datastore and then retrieve for the next iteration (which does not seem like an ideal solution), unless there is an easier way to return data from a task.
When I do not add the messages to a Task Queue and instead send them directly from the for loop, the server does not seem to respond to the client's GET request for the data (possibly because the for loop of the long-running process is blocking the response?).
Here is a simplified version of my server code:
import webapp2
from google.appengine.ext import db
from google.appengine.api import channel
from google.appengine.api import taskqueue
from google.appengine.api import rdbms

class MainPage(webapp2.RequestHandler):
    def get(self):
        ## This opens the GWT app
        pass

class Service_handler(webapp2.RequestHandler):
    def get(self, parameters):
        ## This is called by the GWT app and generates the data to be
        ## sent to the client.
        # This adds the long process to a task queue
        taskqueue.Task(url='/longprocess/',
                       params={'json_request': json_request}).add(queue_name='longprocess-queue')

class longprocess_handler(webapp2.RequestHandler):
    def post(self):
        # This has a for loop that recursively uses data in dictionaries to
        # produce kml files every few minutes
        for j in range(0, Time):
            # Process data
            # Send message to client using a task queue to send the message.
            taskqueue.Task(url='/send/', params=params).add(queue_name=send_queue_name)

class send_handler(webapp2.RequestHandler):
    def post(self):
        # This sends the message to the client
        # This is currently not happening until the long process finishes,
        # but I would like it to occur immediately.
        pass

class kml_handler(webapp2.RequestHandler):
    def get(self, client_id):
        ## When the client receives the message, it picks up the data here.
        pass

app = webapp2.WSGIApplication([
    webapp2.Route(r'/', handler=MainPage),
    webapp2.Route(r'/Service/', handler=Service_handler),
    webapp2.Route(r'/_ah/channel/<connected>/', handler=connection_handler),
    webapp2.Route(r'/longprocess/', handler=longprocess_handler),
    webapp2.Route(r'/kml/<client_id>', handler=kml_handler),
    webapp2.Route(r'/send/', handler=send_handler)
], debug=True)
Do I need to break up the long process into tasks that send and retrieve results from the datastore in order to have send_handler execute immediately, or am I missing something? Thanks.
The App Engine development server only processes one request at a time; in production, these things will happen concurrently. Try it in production, and check that things behave as expected there.
There's also not much reason to use a separate task to send the channel messages in production; just send them directly from the main task.
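In other words, the loop in longprocess_handler could call the Channel API directly, roughly like this; it reuses the question's placeholders (Time) and assumes the client_id was passed along in the task params, with a made-up payload:

# sketch only: client_id handling and the payload shape are assumptions
import json

class longprocess_handler(webapp2.RequestHandler):
    def post(self):
        client_id = self.request.get('client_id')
        for j in range(0, Time):
            # process data and produce the kml file for this iteration,
            # then notify the client immediately - no /send/ task needed
            channel.send_message(client_id, json.dumps({'kml_ready': j}))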
