Problem with @response.call_on_close in Google Cloud Run - Python

I'm trying to write a script that uses @response.call_on_close on Google Cloud Run, to send an immediate return to the invoker and do some processing after that, so that the invoker doesn't keep waiting.
The script involves the use of Selenium, and it works fine in a local Cloud Run container, but when deployed to the actual Cloud I get a "devToolsActivePort file doesn't exist" error.
When I comment out all of the @response.call_on_close part and invoke the scraper directly, it also works fine, so it has nothing to do with Selenium; there must be a problem with the decorator part, but I can't figure it out.
This is the code I'm using to make the call:
from flask import Flask, request
from scraper import scrap

app = Flask(__name__)

@app.after_request
def response_processor(response):
    request_json = request.get_json()
    keyword = request_json['keyword']
    topic = request_json['topic']  # assumed: 'topic' is also read from the payload (the original snippet used it without defining it)
    tztimezone = request_json['tztimezone']

    @response.call_on_close
    def process_after_request():
        scrap(keyword, topic, tztimezone)

    return response

@app.route("/", methods=['GET', 'POST'])
def main():
    if request.method != 'POST':
        return 'Only POST requests are accepted', 405
    return ''
Any help will be greatly appreciated.
Thanks!

You cannot perform CPU processing after you return a response to the client.
Google Cloud Run considers the request complete once you return a response; the container's CPU is throttled (effectively put to sleep) until the next request.
This link will help:
Lifecycle of a container on Cloud Run

Cloud Run just got a new feature: always-on CPU, which disables the CPU throttling.
However, instead of paying for processing time only while a request is being handled, you will pay for the instance until it is offloaded (about 15 minutes after the last request is received).
In exchange, you get a CPU and memory cost discount (25% and 20% respectively).
Be careful:
If your background job takes more than 15 minutes and the instance doesn't receive a new request, the instance will be offloaded and your job killed. This feature is designed to let processing continue for several seconds after the request ends.
The Cloud Run team doesn't guarantee the 15 minutes (though it is commonly observed), especially if the platform gets heavy demand for new Cloud Run instances.
In that case, you can combine this feature with the min-instances feature (a sketch of both settings follows). We can discuss the different tradeoffs of this design further.
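A minimal sketch of enabling both settings with gcloud on an existing service (the service name is illustrative):

gcloud run services update my-service \
  --no-cpu-throttling \
  --min-instances=1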

Related

Use Multiple Azure Application Insights in one Flask app

Hi, I have a Flask application that is built as a Docker image to serve as an API.
This image is deployed to multiple environments (DEV/QA/PROD),
and I want to use a separate Application Insights resource for each environment.
Using a single Application Insights resource works fine.
Here is a code snippet:
app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = APPINSIGHTS_INSTRUMENTATIONKEY
appinsights = AppInsights(app)

@app.after_request
def after_request(response):
    appinsights.flush()
    return response
But to have multiple Application Insights resources I need to configure app.config with the key of the right resource.
I thought of this solution, which throws errors.
Here is a snippet:
app = Flask(__name__)

def monitor(key):
    app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = key
    appinsights = AppInsights(app)

    @app.after_request
    def after_request(response):
        appinsights.flush()
        return response

@app.route("/")
def hello():
    hostname = urlparse(request.base_url).hostname
    print(hostname)
    if hostname == "dev url":
        print('Dev')
        monitor('3ed57a90-********')
    if hostname == "prod url":
        print('prod')
        monitor('941caeca-********-******')
    return "hello"
This example contains the function monitor, which reads the URL and decides which instrumentation key to use so that metrics are sent to the right place, but apparently I can't register those handlers after the first request has been handled. (Is there a way a config variable can be changed based on the URL condition?)
Error message:
AssertionError: The setup method 'errorhandler' can no longer be called on the application. It has already handled its first request, any changes will not be applied consistently. Make sure all imports, decorators, functions, etc. needed to set up the application are done before running it.
I hope someone can guide me to a better solution.
Thanks in advance!
AFAIK, the Application Insights SDK normally collects telemetry data and sends it to Azure in batches, and it expects a single Application Insights resource per application. Use staging/deployment slots if you want different Application Insights resources for the same application.
Application Insights tracks the lifecycle of the service from the moment a request starts until its response completes, and it tracks the application from start to end. That is why you can't use more than one Application Insights resource in a single application.
When the application starts, the SDK starts collecting telemetry data, and when the application stops, it stops gathering it. flush() is used so that, even if the application stops in between, the buffered information is still sent to Application Insights.
I tried what you have used, and the log confirms the same.
With a single Application Insights resource I was able to collect all the telemetry information.
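A minimal sketch of the usual alternative, assuming the same applicationinsights Flask extension used in the question: keep a single AppInsights setup in code and inject a different instrumentation key per environment through an environment variable at startup, instead of switching keys per request:

import os

from flask import Flask
from applicationinsights.flask.ext import AppInsights

app = Flask(__name__)
# Each environment (DEV/QA/PROD) sets its own APPINSIGHTS_INSTRUMENTATIONKEY,
# so the right Application Insights resource is chosen once, at startup.
app.config['APPINSIGHTS_INSTRUMENTATIONKEY'] = os.environ['APPINSIGHTS_INSTRUMENTATIONKEY']
appinsights = AppInsights(app)

@app.after_request
def after_request(response):
    appinsights.flush()
    return response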

How to prevent the 230-second Azure gateway timeout using Python Flask for long-running workloads

I have a Python Flask application running as an Azure web app, and one function is a compute-intensive workload which takes more than 5 minutes to process. Is there any hack to prevent the gateway timeout error by keeping the TCP connection active between the client and the API while the function is processing the data? A sample of the current code is below.
from flask import Flask

app = Flask(__name__)

@app.route('/data')
def data():
    mydata = super_long_process_function()  # takes more than 5 minutes to process
    return mydata
Since super_long_process_function takes more than 5 minutes, it always times out with 504 Gateway Time-out. One thing I want to mention is that this is an idle timeout at the TCP level, which means the timeout is hit only if the connection is idle and no data transfer is happening. So is there any hack in Flask that can be used to prevent this timeout while we process the data? Based on my research and on reading the Microsoft documentation, the 230-second limit cannot be changed for web apps.
In short: the 230 second timeout, as you stated, cannot be changed.
230 seconds is the maximum amount of time that a request can take without sending any data back to the response. It is not configurable.
Source: GitHub issue
The timeout occurs if there's no response; simply keeping the connection open without sending data back will not help.
There are a couple of ways you can go about this. Here are two of the possible solutions you could use to trigger your long-running tasks without the timeout being an issue (a minimal sketch follows below):
Only trigger the long running task with an HTTP call, but don't wait for their completion before returning a response.
Trigger the task using a messaging mechanism like Storage Queues or Service Bus.
For updating the web application with the result of the long-running task, think along the lines of having the response hold a URL the frontend can call periodically to check for task completion, having your request include a callback URL to invoke when the task has completed, or implementing Azure Web PubSub to send status updates to the client.
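A minimal sketch of the first approach, using a background thread and a status URL the client can poll. All names are illustrative, and the in-memory dict would need to be replaced by shared storage (a database or cache) when running more than one worker process:

import threading
import uuid

from flask import Flask, jsonify, url_for

app = Flask(__name__)
tasks = {}  # task_id -> {"status": ..., "result": ...}; illustrative in-memory store

def super_long_process_function():
    return {"done": True}  # placeholder for the real 5+ minute workload

def run_task(task_id):
    try:
        tasks[task_id] = {"status": "done", "result": super_long_process_function()}
    except Exception as e:
        tasks[task_id] = {"status": "error", "result": str(e)}

@app.route('/data', methods=['POST'])
def data():
    task_id = str(uuid.uuid4())
    tasks[task_id] = {"status": "running", "result": None}
    threading.Thread(target=run_task, args=(task_id,), daemon=True).start()
    # return immediately (202 Accepted) with a URL to poll, well under 230 seconds
    return jsonify({"status_url": url_for('status', task_id=task_id)}), 202

@app.route('/status/<task_id>')
def status(task_id):
    return jsonify(tasks.get(task_id, {"status": "unknown"}))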

Shutdown Flask server after response

I've been reading a lot about different ways to shut down a Flask app, but I don't get how I could implement something for my use case.
I wrote, and am testing, a simple Flask app which takes a POST request to create some resources within Google Cloud. This Flask app is deployed in a container and runs on Cloud Run.
My question is: can I shut down the app right after a 200 response, or is there a way to handle one request per Cloud Run instance?
from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['POST'])
def main():
    # some validation on the request.json
    try:
        kick_off_terraform()
        return ("Success", 200)
    except Exception as e:
        print(e)
        return ("Error", 500)  # assumed: the original snippet returned nothing on failure
After doing some research I found out I can control the concurrency on the GCP side, and that way I can allow only one request per instance on Cloud Run.
gcloud run deploy $SERVICE_NAME \
  --image gcr.io/$GCP_PROJECT_ID/$SERVICE_NAME:$CI_COMMIT_SHORT_SHA \
  --region=us-east1 \
  --platform managed \
  --concurrency=1
Sadly, hacks like --concurrency=1 or --max-instances=1 are not great, because shutting down the server after a request may cause that request to fail. (When I did that in the past, requests failed.)
Based on your question, I am guessing you might not have fully grasped Cloud Run's runtime behavior. Please note that:
You don't need to "shut down" a container on Cloud Run. It automatically suspends once all requests finish, and you are not even charged for the idle time outside of a request.
Operations like kick_off_terraform() can't happen in the background (they have to finish before you return the response), because Cloud Run currently doesn't allocate CPU outside of a request.
What you need is something like "run to completion" containers, and you may need to wait a bit for that to be supported by Cloud Run.

Flask restful GET doesn't respond within app

I have a Flask-RESTful API with an endpoint
api.add_resource(TestGet, '/api/1/test')
and I want to use the data from that endpoint to populate my Jinja template. But every time I try to call it in a sample route like this:
@app.route('/mytest')
def mytest():
    t = get('http://localhost:5000/api/1/test')
It never returns anything and stays stuck, meaning it is doing something with the request but never returns. Is there a reason I am not able to call it from within the same Flask app? I am able to reach the endpoint from the browser and from another Python REPL. I'm thoroughly confused about why this happens and why it never returns anything; I was at least expecting an error.
Here is the entire sample of what I am trying to run:
from flask import Flask
from flask_restful import Api, Resource
from requests import get

app = Flask('test')
api = Api(app)

class TestGet(Resource):
    def get(self):
        return {'test': 'message'}

api.add_resource(TestGet, '/test')

@app.route('/something')
def something():
    resp = get('http://localhost:5000/test').json()  # .json() is a method call; the doubled slash in the URL is also fixed
    print(resp)

from gevent.wsgi import WSGIServer  # note: recent gevent versions expose this as gevent.pywsgi
WSGIServer(('', 5000), app).serve_forever()
Use app.run(threaded=True) if you just want to debug your program. This will start a new thread for every request.
Please see this SO thread with a nice explanation of Flask's limitations: https://stackoverflow.com/a/20862119/5167302
Specifically, in your case you are hitting this one:
The main issue you would probably run into is that the server is single-threaded. This means that it will handle each request one at a time, serially. This means that if you are trying to serve more than one request (including favicons, static items like images, CSS and Javascript files, etc.) the requests will take longer. If any given requests happens to take a long time (say, 20 seconds) then your entire application is unresponsive for that time (20 seconds).
Hence, by making a request from within a request, you are putting your application into a deadlock.
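Beyond threaded=True, a minimal sketch of a way to sidestep the problem entirely: call the resource's logic directly instead of making an HTTP request back into the same single-threaded server (TestGet is the class from the question above):

from flask import jsonify

@app.route('/something')
def something():
    resp = TestGet().get()  # direct call, no HTTP round trip into the same server
    print(resp)
    return jsonify(resp)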

Is there a less clunky way to interact with an AWS worker tier?

I have an Elastic Beanstalk application which is running a web server environment and a worker tier environment. My goal is to pass some parameters to an endpoint in the web server which submits a request to the worker which will then go off and do a long computation and write the results to an S3 bucket. For now I'm ignoring the "long computation" part and just writing a little hello world application which simulates the workflow. Here's my Flask application:
from flask import Flask, request
import boto3
import json

application = Flask(__name__)

@application.route("/web")
def test():
    data = json.dumps({"file": request.args["file"], "message": request.args["message"]})
    boto3.client("sqs").send_message(
        QueueUrl = "really_really_long_url_for_the_workers_sqs_queue",
        MessageBody = data)
    return data

@application.route("/worker", methods = ["POST"])
def worker():
    data = request.get_json()
    boto3.resource("s3").Bucket("myBucket").put_object(Key = data["file"], Body = data["message"])
    return data["message"]

if __name__ == "__main__":
    application.run(debug = True)
(Note that I changed the worker's HTTP Path from the default / to /worker.) I deployed this application to both the web server and the worker, and it does exactly what I expected. Of course, I had to do the usual IAM configuration.
What I don't like about this is the fact that I have to hard code my worker's SQS URL into my web server code. This makes it more complicated to change which queue the worker polls, and more complicated to add additional workers, both of which will be convenient in production. I would like some code which says "send this message to whatever queue worker X is currently polling". It's obviously not a huge deal, but I thought I would see if anyone knows a way to do this.
Given the nature of the queue URLs, you may want to try keeping them in some external storage (an in-memory database or key-value store, perhaps) that associates the URLs with the IDs of the workers currently using them. That way you can update them as needed without having to modify your application. (The downside would be that you then have an additional source of data to maintain, and you'd need to write the interfacing code for both the server and the workers.) A sketch of two lookup options follows.
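As a minimal sketch of that idea (parameter and queue names are illustrative), the web tier could resolve the URL at request time, either from the queue's name or from a small external store such as SSM Parameter Store:

import boto3

def queue_url_by_name(queue_name):
    # Option A: resolve the URL from the queue's name, if you control naming
    return boto3.client("sqs").get_queue_url(QueueName=queue_name)["QueueUrl"]

def queue_url_from_ssm(param_name):
    # Option B: read the URL from SSM Parameter Store so it can change
    # without redeploying the web tier
    ssm = boto3.client("ssm")
    return ssm.get_parameter(Name=param_name)["Parameter"]["Value"]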
