I need to perform a task whenever the mobile app requests certain data. The user does not need the task performed right away, but may need it within the next 2 minutes.
I am still fairly new to Python / web dev so I am not quite sure how to accomplish this.
I don't want the user to wait for the task to finish. It will probably take 30 seconds, and I'd rather the response arrive 30 seconds sooner.
Is there any way that I can send a response, so that the user gets the required info immediately, and then perform the task right after sending the JSON?
Is it possible to send a Response to the mobile app that asked for the data without using return so that the method can continue to perform the task the user does not need to wait for?
@app.route('/image/<image_id>/')
def images(image_id):
    # get the resource (unnecessary code removed)
    return Response(js, status=200, mimetype='application/json')
    # once the JSON response is returned, do some action
    # (what I would like to do somehow, but don't know how to get it to work)
On second thought, maybe I need to do this action asynchronously so it does not block the router (but it still needs to be done right after returning the JSON).
UPDATE - in response to some answers
For me to perform such tasks, is a Worker server on Heroku recommended / a must or is there another, cheaper way to do this?
You can create a second thread to do the extra work:
import threading
t = threading.Thread(target=some_function, args=[argument])
t.daemon = False  # non-daemon, so the thread is allowed to finish its work
t.start()
You should also take a look at Celery or python-rq.
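To see why the thread approach lets the response go out first, here is a minimal sketch using only the standard library. `handle_request` stands in for the Flask view, and `slow_task`, `results` and `background_threads` are hypothetical names invented for the illustration; the returned dict plays the role of the JSON response.

```python
import threading
import time

results = []             # hypothetical: whatever the slow task produces
background_threads = []  # kept only so callers (or tests) can join the threads


def slow_task(image_id):
    """Hypothetical slow work the client does not need to wait for."""
    time.sleep(0.5)  # simulate the long-running part
    results.append(image_id)


def handle_request(image_id):
    """Stand-in for the view function: start the work, then return at once."""
    worker = threading.Thread(target=slow_task, args=(image_id,))
    worker.daemon = False  # let the thread finish even if the main thread exits
    worker.start()
    background_threads.append(worker)
    # In Flask this would be the `return Response(...)` line.
    return {"image_id": image_id, "status": "ok"}
```

The caller gets its response while `slow_task` is still sleeping. Note that a thread started this way lives inside the web process, so the work is lost if that process is restarted; that is the gap a task queue such as Celery or python-rq is meant to close.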
Yes, you need a task queue. There are a couple of options.
Look at this other question: uWSGI for uploading and processing files
And of course your code is wrong: once you return, you terminate execution of the function you're in, so nothing after the return statement runs.
I'm trying to build a micro server:
@serv.route('/booking', methods=['POST'])
def booking():
    # Do job A
    # Do job B
    # etc...
    return redirect('/direct_site')
If jobs A and B connect to a Google API, my server has to finish A and B first, and only then does it redirect.
How can I make it faster? For example, something like this:
@serv.route('/booking', methods=['POST'])
def booking():
    redirect('/direct_site')
    return Do job A, do job C, etc..
Two options:
Redirect first, then do the jobs.
The main issue here is that once you do the redirect, you lose the channel to the user. That means that if job A or job B has some sort of error (or, indeed, success), you will no longer be able to directly display it to the user; you will instead need to track it elsewhere, and if necessary display it to the user later (or to yourself as the site admin).
With that in mind, the new code will probably look something like this:
@serv.route('/booking', methods=['POST'])
def booking():
    # put job A on the queue
    # put job B on the queue
    return redirect('/direct_site')
Elsewhere, you'll have something handling the queue (which may be in memory, database or a dedicated queueing system), other pages for the user and/or yourself to check on the status of jobs, etc. It will no longer really be "micro"...
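As a rough illustration of the "something handling the queue" part, here is a minimal in-memory sketch built on the standard library. The names `enqueue`, `worker`, `job_queue` and `job_status` are made up for this example, and an in-memory queue is only one of the options mentioned above; it does not survive a restart of the process.

```python
import queue
import threading

job_queue = queue.Queue()
job_status = {}  # job_id -> "queued", "done" or "failed: <reason>"
status_lock = threading.Lock()


def enqueue(job_id, func, *args):
    """What the request handler would call before returning the redirect."""
    with status_lock:
        job_status[job_id] = "queued"
    job_queue.put((job_id, func, args))


def worker():
    """Runs in the background and drains the queue one job at a time."""
    while True:
        job_id, func, args = job_queue.get()
        try:
            func(*args)
            outcome = "done"
        except Exception as exc:  # record the failure instead of losing it
            outcome = f"failed: {exc}"
        with status_lock:
            job_status[job_id] = outcome
        job_queue.task_done()


# Daemon thread: it must not keep the process alive on shutdown.
threading.Thread(target=worker, daemon=True).start()
```

A status page would then read `job_status` to tell the user (or the admin) whether a given job succeeded, which is the tracking that the redirect-first option makes necessary.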
Do the jobs in parallel, then redirect.
This will be a lot simpler, but the redirect will still need to wait for the longest of the jobs to complete. You will be able to collect the results before redirecting the user, which means you'll be able to report any errors directly and only redirect the user to the "success" page if the jobs were, in fact, successful.
There are a number of ways to do the jobs in parallel (threads, futures, async). Which one you choose depends on what you're using elsewhere in the code, what you're familiar with, and what sort of work is done in the jobs. If they're mostly API calls across the network, with little processing on your end, any of them will do the job. Futures will be easiest if each job is a single API call, threads if some of the jobs involve multiple API calls which depend on each other.
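For the futures variant, a minimal sketch with `concurrent.futures` could look like the following. `job_a`, `job_b`, `run_jobs_in_parallel` and the `/error` target are placeholders invented for the example, and `booking` returns the redirect target as a plain string instead of calling Flask's `redirect`.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def job_a():
    time.sleep(0.2)  # stand-in for an API call
    return "A ok"


def job_b():
    time.sleep(0.3)  # stand-in for another API call
    return "B ok"


def run_jobs_in_parallel(jobs):
    """Run all jobs concurrently; return (results, errors) once all finish."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = {name: pool.submit(func) for name, func in jobs.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()  # waits for this job
            except Exception as exc:
                errors[name] = str(exc)
    return results, errors


def booking():
    results, errors = run_jobs_in_parallel({"A": job_a, "B": job_b})
    # Redirect to the success page only if every job succeeded.
    return "/direct_site" if not errors else "/error"
```

The total wait is roughly that of the slowest job (about 0.3 seconds here) instead of the sum, and any exception is available before the redirect is chosen.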
I have a particularly large task that takes ~60 seconds to complete. Heroku routers send a timeout error after 30 seconds if nothing is returned, so using a yield statement helps solve that:
def foo():
    while not isDone:
        print("yield")
        yield " "
        time.sleep(10)

return Response(foo(), mimetype='text/html')
(or something similar)
And that works all well and good, except in my case, at the end of my very long task, it makes a decision on where to 302 forward next. It's easy enough to set a forwarding location:
response = Response(foo(), 302, mimetype='text/html')
response.headers['Location'] = '/bar'
return response
except that in this example /bar is static, and I need to assign that dynamically, and only at the end of the very long process.
So is there a way to dynamically assign the forwarding location at the end of the very long async process?
Making sure I'm interpreting this correctly: you have a request that generates something and issues a redirect. The generation takes a long time, triggering the Heroku timeout. You want to get around the timeout somehow.
Can you do this with Flask alone, with the constraint of the Heroku routing tier? Sort of. Note, though, that the simple "keep alive" you are doing fails because it results in an invalid HTTP response. Once you start sending anything that is not a header, you can't send a header (i.e., the redirect).
Your two options -
Polling. Launch an async job, pre-calculate the URL for the job result, but have it guarded by a "monitor" of some kind (e.g., check the job and display "In progress", refreshing every 2-5 seconds until it's done). You have a worker dyno that is used to calculate the result. Ordinarily, I'd say "Redis + Python RQ" for a fast start, but if you can't add any new server-side dependencies, a simple database queue table could suffice.
Pushing. Use an add-on like Pusher. No new server-side dependencies, just an account (which has a low-cost entry option). If that's not an option, roll a WebSocket-based solution.
In general, spending some time to do a good async return will pay off in the long run. It's one of the single best performance enhancements to make to any site - return fast, give the user a responsive experience. You do have to spawn the async task in another process or thread, in order to free up request threads for other responses.
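The polling option can be sketched roughly as follows. This is a simplification under stated assumptions: `start_job` and `poll` stand in for two routes and return plain values instead of Flask responses, the `/status/<job_id>` URL shape is invented, and a thread with an in-memory dict replaces the worker dyno and the queue table, which only works within a single process.

```python
import threading
import time
import uuid

jobs = {}  # job_id -> {"done": bool, "location": str or None}
jobs_lock = threading.Lock()


def long_task(job_id):
    time.sleep(0.3)    # stand-in for the ~60 second computation
    location = "/bar"  # decided only at the end of the task
    with jobs_lock:
        jobs[job_id] = {"done": True, "location": location}


def start_job():
    """First request: launch the task and return the URL to poll."""
    job_id = uuid.uuid4().hex
    with jobs_lock:
        jobs[job_id] = {"done": False, "location": None}
    threading.Thread(target=long_task, args=(job_id,)).start()
    return f"/status/{job_id}"


def poll(job_id):
    """Status request: (200, 'In progress') or (302, final location)."""
    with jobs_lock:
        job = dict(jobs[job_id])
    if job["done"]:
        return 302, job["location"]
    return 200, "In progress"
```

Because every poll is a fresh request, the redirect header can be chosen after the task has finished, and no single request comes near the 30 second limit.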
I'm writing a small web server using Flask that needs to do the following things:
On the first request, serve the basic page and kick off a long (15-60 second) data processing task. The data processing task queries a second server which I do not control, updates a local database, and then performs some calculations on the results to show in the web page.
The page issues several AJAX requests that all depend on parts of the result from the long task, so I need to wait until the processing is done.
Subsequent requests for the first page would ideally re-use the previous request's result if they come in while the processing task is ongoing (or even shortly thereafter)
I tried using flask-cache (specifically SimpleCache), but ran into an issue as it seems the cache pickles the result, when I'd really rather keep the exact object.
I suppose I could re-write what I'm caching to be pickle-able, and then implement a single worker thread to do the processing.
Is there a better way of handling this kind of workflow?
I think the best way to handle long data processing is something like Celery.
Send request to run task and receive task ID.
Periodically send ajax requests to check task progress and receive result of task execution.
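If adding Celery is more than the project needs, the single worker thread idea from the question can be sketched with a shared `Future`, which keeps the exact result object in memory without pickling it. Everything named here (`process_data`, `get_result`, `calls`, and the 30 second `REUSE_SECONDS` window) is an assumption made for the illustration, and the approach only works while all requests are served by one process.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

executor = ThreadPoolExecutor(max_workers=1)  # the single worker thread
_lock = threading.Lock()
_current = None     # Future of the most recent processing run
REUSE_SECONDS = 30  # assumed meaning of "shortly thereafter"
calls = []          # only here to show how often the task really ran


def process_data():
    """Hypothetical long task; returns an arbitrary, unpickled object."""
    calls.append(1)
    time.sleep(0.2)
    return {"rows": [1, 2, 3]}


def _run():
    # Return the result together with the time it was finished.
    return process_data(), time.monotonic()


def _is_reusable(future):
    if future is None:
        return False
    if not future.done():
        return True   # still running: share it
    if future.exception() is not None:
        return False  # the last run failed: start again
    _, finished_at = future.result()
    return time.monotonic() - finished_at < REUSE_SECONDS


def get_result():
    """Block until the shared run finishes and return its result object."""
    global _current
    with _lock:
        if not _is_reusable(_current):
            _current = executor.submit(_run)
        future = _current
    result, _ = future.result()
    return result
```

The AJAX handlers would each call `get_result()` and wait on the same run; a second page load within the reuse window gets the identical object back without triggering new processing.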
Most of the longest (most time-consuming) logic I've encountered basically involves two things: sending email and committing items to the database.
Is there any kind of built-in mechanism for doing these things asynchronously so as not to slow down page load?
Validation should be handled synchronously, but it really seems that the most performant way to email and write to the database should be asynchronously.
For example, let's say that I want to track pageviews. Thus, every time I get a view, I do:
pv = PageView.objects.get(page = request.path)
pv.views = pv.views + 1
pv.save() # SLOWWWWWWWWWWWWWW
Is it natural to think that I should speed this up by making the whole process asynchronous?
Take a look at Celery. It gives you asynchronous workers to offload tasks exactly like you're asking about: sending e-mails, counting page views, etc. It was originally designed to work only with Django, but now works in other environments too.
I use this pattern to update the text index (which is slow), since this can be done in background. This way the user sees a fast response time:
# create empty file
dir=os.path.join(settings.DT.HOME, 'var', 'belege-changed')
file=os.path.join(dir, str(self.id))
fd=open(file, 'a') # like "touch" shell command
fd.close()
A cron-job scans this directory every N minutes and updates the text index.
In your case, I would write the request.path to a file, and update the PageView model in background. This would improve the performance, since you don't need to hit the database for every increment operator.
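A minimal sketch of that idea for page views, assuming a plain log file: `record_view` is what the request would call, and `collect_views` is what the cron job would call before applying the counts to the PageView model. Both function names and the log location are hypothetical.

```python
import os
import tempfile
from collections import Counter

# Hypothetical location of the log file.
LOG_PATH = os.path.join(tempfile.gettempdir(), "pageviews.log")


def record_view(path, log_path=LOG_PATH):
    """Called during the request: one cheap append, no database access."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(path + "\n")


def collect_views(log_path=LOG_PATH):
    """Called by the background job: count views per path, clear the log."""
    if not os.path.exists(log_path):
        return Counter()
    # Rename first so that requests arriving meanwhile start a fresh log.
    batch_path = log_path + ".batch"
    os.replace(log_path, batch_path)
    with open(batch_path, encoding="utf-8") as batch:
        counts = Counter(line.rstrip("\n") for line in batch if line.strip())
    os.remove(batch_path)
    return counts
```

The background job can then add each count to the matching row in one write per page instead of one write per view. A write that races with the rename could in principle be missed, which is usually acceptable for a view counter.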
You can use a Python thread pool and hand the database writes to it. Although the GIL prevents Python threads from running bytecode in parallel, it is released while a thread waits on I/O, so the response can continue before the write has finished.
I use this technique when the result of the write is not important to render the response.
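Of course, if you are handling a POST request and want to return a 201, this is not good practice, because the response would claim the resource was created before the write has actually happened.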
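A rough sketch of the thread pool technique, using `concurrent.futures` and a throwaway SQLite file so that it runs on its own. The table layout, `DB_PATH`, `increment_views` and `handle_view` are all invented for the example; in Django the write would go through the ORM instead.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor

DB_PATH = os.path.join(tempfile.mkdtemp(), "views.db")  # hypothetical database
pool = ThreadPoolExecutor(max_workers=2)


def init_db():
    conn = sqlite3.connect(DB_PATH)
    try:
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS pageview "
                "(page TEXT PRIMARY KEY, views INTEGER NOT NULL)"
            )
    finally:
        conn.close()


def increment_views(page):
    """The slow write; runs on a pool thread with its own connection."""
    conn = sqlite3.connect(DB_PATH, timeout=10)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR IGNORE INTO pageview (page, views) VALUES (?, 0)",
                (page,),
            )
            # Increment inside the database to avoid a read-modify-write race.
            conn.execute(
                "UPDATE pageview SET views = views + 1 WHERE page = ?",
                (page,),
            )
    finally:
        conn.close()


def handle_view(page):
    """Stand-in for the view: queue the write, respond without waiting."""
    pool.submit(increment_views, page)
    return "200 OK"
```

An exception raised inside `increment_views` stays on the returned future and is never seen by the request, which is why this only suits writes whose outcome the response does not depend on.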
http://www.mongodb.org/ can do this.
Sometimes, with requests that do a lot, Google AppEngine returns an error. I have been handling this by some trickery: memcaching intermediate processed data and just requesting the page again. This often works because the memcached data does not have to be recalculated and the request finishes in time.
However... this hack requires seeing an error, going back, and clicking again. Obviously less than ideal.
Any suggestions?
inb4: "optimize your process better", "split your page into sub-processes", and "use taskqueue".
Thanks for any thoughts.
Edit - To clarify:
Long wait for requests is ok because the function is administrative. I'm basically looking to run a data-mining function. I'm searching over my datastore and modifying a bunch of objects. I think the correct answer is that AppEngine may not be the right tool for this. I should be exporting the data to a computer where I can run functions like this on my own. It seems AppEngine is really intended for serving with lighter processing demands. Maybe the quota/pricing model should offer the option to increase processing timeouts and charge extra.
If interactive user requests are hitting the 30 second deadline, you have bigger problems: your user has almost certainly given up and left anyway.
What you can do depends on what your code is doing. There's a lot to be optimized by batching datastore operations, or reducing them by changing how you model your data; you can offload work to the Task Queue; for URLFetches, you can execute them in parallel. Tell us more about what you're doing and we may be able to provide more concrete suggestions.
I have been handling something similar by building a custom automatic retry dispatcher on the client. Whenever an ajax call to the server fails, the client will retry it.
This works very well if your page is ajaxy. If your app spits out entire HTML pages then you can use a two pass process: first send an empty page containing only an ajax request. Then, when AppEngine receives that ajax request, it outputs the same HTML you had before. If the ajax call succeeds it fills the DOM with the result. If it fails, it retries once.
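The retry logic itself is small. The dispatcher described above runs in the browser, so the following is only a language-neutral illustration written in Python: `call_with_retry` is a hypothetical helper, and `flaky_request` imitates a request that times out once but leaves its intermediate result in a cache, as in the memcache trick from the question.

```python
import time


def call_with_retry(request, retries=1, delay=0.0):
    """Call `request()`; on failure, try again up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return request()
        except Exception:
            if attempt >= retries:
                raise  # out of retries: surface the last error
            attempt += 1
            time.sleep(delay)


# Hypothetical request that times out once but caches its partial work.
cache = {}
attempts = []


def flaky_request():
    attempts.append(1)
    if "partial" not in cache:
        cache["partial"] = 10  # intermediate result survives the failure
        raise TimeoutError("deadline exceeded")
    return cache["partial"] * 2


result = call_with_retry(flaky_request)
```

The second attempt finishes quickly because the cached intermediate value does not have to be recomputed, so the user never sees the first failure.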