Improving the user experience of a slow Flask view - python

I have an anchor tag that hits a route which generates a report in a new tab. I am lazy-loading the report specs because I don't want to keep copies of my data both in the original place and on the report object, but collecting that data takes 10-20 seconds.
from flask import render_template

@app.route('/report/')
@app.route('/report/<id>')
def report(id=None):
    report_specs = function_that_takes_20_seconds(id)
    return render_template('report.html', report_specs=report_specs)
I'm wondering what I can do so that the server responds immediately with a spinner and then when function_that_takes_20_seconds is done, load the report.

You are right: an HTTP view is not the place for long-running tasks.
You need to think about your system architecture: what can be prepared outside the view, and what computation must happen in real time inside the view.
The usual solution is to make the processing asynchronous and do it in a separate process; people often use a task queue such as Celery for this.
Prepare the data in a scheduled process that runs, e.g., every 10 minutes
Cache the results by storing them in a database
Have the HTTP view always return the last cached version (a sketch of this follows the list)
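A minimal sketch of that cached approach, assuming Celery with a Redis broker; the save_to_database helper and the table behind it are hypothetical:

from celery import Celery

celery_app = Celery('reports', broker='redis://localhost:6379/0')

@celery_app.task
def refresh_report_cache():
    # The slow computation from the question, stored for later reads
    specs = function_that_takes_20_seconds(None)
    save_to_database(specs)  # hypothetical persistence helper

# Run the task every 10 minutes with Celery beat
celery_app.conf.beat_schedule = {
    'refresh-report-cache': {
        'task': 'tasks.refresh_report_cache',
        'schedule': 600.0,  # seconds
    },
}

The Flask view then only reads the cached row, so it returns immediately.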
Alternatively, make your view fire an AJAX call via JavaScript ("the spinner approach"). This requires some basic JavaScript skills. Note that it doesn't make the results appear any faster for the end user - it's just user-experience smoke and mirrors.
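If you do want the spinner, here is a rough sketch of the polling pattern in Flask; the in-memory RESULTS dict, the spinner.html template and the status route are illustrative assumptions and would need a real store (Redis, a database) behind a production server:

import threading
import uuid
from flask import Flask, jsonify, render_template

app = Flask(__name__)
RESULTS = {}  # job_id -> report_specs; in-memory, single-process only

@app.route('/report/<id>')
def report(id):
    job_id = uuid.uuid4().hex

    def work():
        RESULTS[job_id] = function_that_takes_20_seconds(id)

    threading.Thread(target=work).start()
    # spinner.html shows a spinner and polls /report/status/<job_id> from JavaScript
    return render_template('spinner.html', job_id=job_id)

@app.route('/report/status/<job_id>')
def report_status(job_id):
    if job_id in RESULTS:
        return jsonify(done=True, specs=RESULTS[job_id])
    return jsonify(done=False)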

Related

My Azure Function in Python v2 doesn't show any signs of running, but it probably is

I have a simple function app in Python v2. The plan is to process millions of images, but right now I just want to get the scaffolding right, i.e. no image processing, just dummy data. So I have two functions:
process, with an HTTP trigger (@app.route), which inserts 3 random image URLs into Azure Queue Storage,
process_image, with a queue trigger (@app.queue_trigger), which processes one image URL from above (currently it only logs the event).
I trigger the first one with a curl request and, as expected, I can see the invocation in the Azure portal in the function's invocations section, and I can see the items in Storage Explorer's queue.
But unexpectedly, I do not see any invocations for the second function, even though after a few seconds the items disappear from the images queue and end up in the images-poison queue. So something did run with the queue items 5 times. Checking traces and exceptions in Application Insights, I see the following warning:
Message has reached MaxDequeueCount of 5. Moving message to queue 'case-images-deduplication-poison'.
Can anyone help with what's going on? Here's the gist of the code.
If I were to guess, something else is hitting that storage queue, like your dev machine or another function. Can you put logging into the second function? (Sorry, C# guy, so I don't know the code for logging.)
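For reference, a hedged sketch of what logging inside the queue-triggered function could look like with the Python v2 programming model; the queue name and connection setting are guesses based on the question, not the actual gist:

import logging
import azure.functions as func

app = func.FunctionApp()

@app.queue_trigger(arg_name="msg", queue_name="images",
                   connection="AzureWebJobsStorage")
def process_image(msg: func.QueueMessage) -> None:
    # Any unhandled exception here returns the message to the queue; after
    # MaxDequeueCount (5) retries it is moved to the poison queue.
    logging.info("Dequeued message: %s", msg.get_body().decode("utf-8"))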
Have you checked the individual function metrics in the portal: Function App >> Functions >> Function name >> Overview >> Total Execution Count, expanded to the relevant time period?
Do note that it can take up to 5 minutes for executions to show, but after that you'll see them in the metrics.

Django : Stop calling same view if it's already called

I have a view, and when the user clicks a button it gets called. But if the user clicks the button twice, it gets called again even if the first call is still executing.
This produces a problem (if I am correct): it stops execution of the first call and starts executing the second. How can I stop the view from being called twice?
When a request has been sent to the server, Django will pick it up and send a response when it's finished. If no one is there to receive the response then nothing visible happens, but the processing has already been done. You should ask for confirmation if it is important that the result of your process is received.
When the user sends the second request, both requests will be processed and both will return a response. If you're working with an API and your frontend isn't rendered on the backend server, the user will probably receive both responses, and depending on the front-end code many things can happen: they might see both responses, or just one, or the first might be replaced as soon as the second response arrives.
This can lead to problems depending on the processing you are running. For example, if you are updating a record in a database and the update can only happen once, you'll probably get an error on the second try. It is very rare, and mostly impossible, for both updates to happen at exactly the same time; it depends on your database, your code, and many other things.
There are many ways to handle such a problem, and most of them depend on your situation, but here are a few options (a server-side sketch follows this list):
1 - Hide the button (set its display to none) so the user can't click it again. This is a very simple solution and works most of the time, unless you have users deliberately trying to hurt your system.
2 - Redirect the user to another page and wait for the response there.
3 - If your view runs heavy, expensive processing, create a queue system with limits per user. This is usually done in scaled projects with a lot of users.
4 - Use a rate-limiting system to reject too many requests at once or to block abnormal traffic.
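As a server-side complement to the options above, a rough sketch using Django's cache as a simple lock; the view, key name and run_long_process helper are illustrative, not a prescribed implementation:

from django.core.cache import cache
from django.http import HttpResponse

def start_report(request):
    lock_key = "report-lock-%s" % request.user.pk
    # cache.add() only sets the key if it does not already exist, so a second
    # click within the timeout is rejected instead of starting the work again.
    if not cache.add(lock_key, "running", timeout=60):
        return HttpResponse("Your report is already being generated.", status=409)
    try:
        run_long_process()  # hypothetical long-running work
    finally:
        cache.delete(lock_key)
    return HttpResponse("Done.")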
My advice: use jQuery.
$("btnSubmit").click(function(event){
event.preventDefault();
//disable the submit button
$("#btnSubmit").attr("disabled", true);
// call ur view using ajax here
});
and then, once the AJAX call has finished, re-enable the button:
$("#btnSubmit").attr("disabled", false);

How to continuously read data from xively in a (python) heroku app?

I am trying to write a Heroku app in Python which will read and store data from a Xively feed in real time. I want the app to run independently, as a sort of 'backend process', simply storing the data in a database. (It does not need to serve anything up to site visitors.)
Right now I am working on the 'continuous reading' part. I have included my code below. It reads the datastream once each time I hit my app's Heroku URL. How do I get it to run continuously so that it keeps reading the data from Xively?
import os
from flask import Flask
import xively

app = Flask(__name__)

@app.route('/')
def run_xively_script():
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    return "level is %s" % (level)
I am new to web development, Heroku, and Python, so I would really appreciate any help (pointers).
PS: I have read about the Heroku Scheduler and, from what I understand, it can be used to schedule a task at specific time intervals; when it does so, it starts a one-off dyno for the task. But as I mentioned, my app is really meant to perform just one function: continuously reading and storing data from Xively. Is it necessary to schedule a separate task for that? The one-off dyno that the Scheduler starts will also consume dyno hours, which I think will exceed the free 750 dyno-hour limit (my app's web dyno already consumes 720 dyno-hours per month)...
Using the scheduler, as you and @Calumb have suggested, is one way to go about this.
Another method would be to set up a trigger on Xively: https://xively.com/dev/docs/api/metadata/triggers/
Have the trigger fire when your feed is updated. The trigger should POST to your Flask app, and the Flask app can then take the new data, manipulate it, and store it as you wish. This would be the closest to real time, I'd think, because Xively is pushing the update to your system.
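A rough sketch of a Flask endpoint such a trigger could POST to; the URL, payload handling and save_reading helper are assumptions, since the exact trigger body depends on your feed:

import json
from flask import Flask, request

app = Flask(__name__)

@app.route('/xively-trigger', methods=['POST'])
def xively_trigger():
    # Xively sends a JSON body describing the datastream event that fired
    payload = json.loads(request.data)
    save_reading(payload)  # hypothetical storage helper
    return '', 204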
This question is more about high-level architecture decisions and what you are trying to accomplish than about a specific thing you should do.
Ultimately, Flask is probably not the best choice for an app that does what you are trying to do. You would be better off with a plain Python or Ruby script. That being said, using the Heroku Scheduler (which you alluded to) makes it possible to do something like what you are trying to do.
The simplest way to accomplish your goal (assuming you want to change a minimal amount of code, and that constantly reading data is really what you want to do - both of which you should reconsider) is to write a loop that runs when you call that task and grabs data for a few seconds. Just use a for loop and increment a counter for however many times you want to get the data.
Something like:
import time
import xively

for i in range(0, 5):
    key = 'FEED_KEY'
    feedid = 'FEED_ID'
    client = xively.XivelyAPIClient(key)
    feed = client.feeds.get(feedid)
    datastream = feed.datastreams.get("level")
    level = datastream.current_value
    time.sleep(1)
However, Heroku limits how long a request can run before it must return a value; otherwise the router returns a 503 or 500. You could use the Scheduler to run this task at regular intervals instead.
Again, I think that Flask and Heroku are not the best solution for what it sounds like you are trying to do. I would review your use case and go back to the drawing board on the best way to accomplish it.
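For what it's worth, the "plain Python" approach could be a standalone polling script run as a Heroku worker process (worker: python poller.py in the Procfile) instead of a web dyno; the store_level helper and the 10-second interval are assumptions:

import time
import xively

client = xively.XivelyAPIClient('FEED_KEY')

while True:
    feed = client.feeds.get('FEED_ID')
    level = feed.datastreams.get("level").current_value
    store_level(level)  # hypothetical database write
    time.sleep(10)      # poll every 10 seconds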

GAE Backend fails to respond to start request

This is probably a truly basic thing that I'm simply having an odd time figuring out in a Python 2.5 app.
I have a process that will take roughly an hour to complete, so I made a backend. To that end, I have a backend.yaml that has something like the following:
- name: mybackend
  options: dynamic
  start: /path/to/script.py
(The script is just raw computation. There's no notion of an active web session anywhere.)
On toy data, this works just fine.
This used to be public, so I would navigate to the page, the script would start, and it would time out after about a minute (the HTTP timeout plus the 30s shutdown grace period, I assume). I figured this was a browser issue, so I repeated the same thing with a cron job. No dice. I then switched to using a push queue and adding a targeted task, since on paper it looks like that would wait for 10 minutes. Same thing.
All three time out after about a minute, which means I'm not decoupling the request from the backend like I believe I am.
I'm assuming that I need to write a proper handler for the backend to do the work, but I don't exactly know how to write the handler/webapp2 route. Do I handle /_ah/start/ or make a new endpoint for the backend? How do I handle the subdomain? It still seems like the wrong thing to do (I'm sticking a long-running process directly into a request of sorts), but I'm at a loss otherwise.
So the root cause ended up being the following in the script itself:
models = MyModel.all()
for model in models:
    pass  # Magic happens
I was basically taking for granted that the query would automatically batch my Query.all() over many entities, but it was dying at around the 1000th entity. I originally wrote that it was computational only because I completely ignored the fact that the reads can fail.
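For anyone hitting the same limit without switching to map-reduce, a hedged sketch of paging through the entities with query cursors on the old db API:

from google.appengine.ext import db

query = MyModel.all()
cursor = None
while True:
    if cursor:
        query.with_cursor(cursor)
    batch = query.fetch(500)
    if not batch:
        break
    for model in batch:
        pass  # magic happens here, one bounded batch at a time
    cursor = query.cursor()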
The actual solution ended up being "use the map-reduce library", since we were trying to look at each model for analysis.

Progress bar with long web requests

In a Django application I am working on, I have just added the ability to archive a number of files (starting at around 50 MB in total) into a zip file. Currently, I am doing it something like this:
get files to zip
zip all files
send HTML response
Obviously, this causes a big wait on step two while the files are being compressed. What can I do to make this process a whole lot better for the user? Although a progress bar would be best, even a static page saying 'please wait' or whatever would be an improvement.
Any thoughts and ideas would be loved.
You should keep in mind that showing a progress bar may not be a good idea, since you can run into timeouts or make your server suffer from lots of simultaneous requests.
Put the zipping task in a queue and have it call back to notify the user somehow - by e-mail, for instance - that the process has finished.
Take a look at django-lineup
Your code will look pretty much like:
from lineup import registry
from lineup import _debug

def create_archive(queue_id, queue):
    queue.set_param("zip_link", _create_archive(resource=queue.context_object, user=queue.user))
    return queue

def create_archive_callback(queue_id, queue):
    _send_email_notification(subject=queue.get_param("zip_link"), user=queue.user)
    return queue

registry.register_job('create_archive', create_archive, callback=create_archive_callback)
In your views, create queued tasks by:
from lineup.factory import JobFactory
j = JobFactory()
j.create_job(self, 'create_archive', request.user, your_resource_object_containing_files_to_zip, { 'extra_param': 'value' })
Then run your queue processor (probably inside of a screen session):
./manage.py run_queue
Oh, and on this subject, you might also be interested in estimating zip file creation time - I got some pretty slick answers there.
Fun fact: You might be able to use a progress bar to trick users into thinking that things are going faster than they really are.
http://www.chrisharrison.net/projects/progressbars/index.html
You could use a 'log file' to keep track of the zipped files and of how many files remain.
The procedure would be like this:
Count the number of files, and write it to a text file in a format like totalfiles.filesprocessed
Every time you zip a file, simply update the log file
So, if you have to zip 3 files, the log file will grow like this:
3.0 -> begin, no files processed yet
3.1 -> 1 file of 3 processed, 33% of the task complete
3.2 -> 2 files of 3 processed, 66% of the task complete
3.3 -> 3 files of 3 processed, 100% of the task complete
Then, with a simple AJAX function (an interval), check the log file every second.
In Python, opening, reading and writing such a small file should be very quick, but it could cause some trouble if you have many users doing it at the same time; obviously you'll need to create a log file for each request, perhaps with a random name, and delete it after the task is completed.
One problem could be that, to let the AJAX call read the log file, you'll need to open and close the file handle in Python every time you update it.
Eventually, for a more accurate progress meter, you could even use the file size instead of the number of files as the parameter.
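A rough Django sketch of the idea; the file paths, URL layout and get_files_to_zip helper are illustrative choices, not exact code:

import zipfile
from django.http import HttpResponse, JsonResponse

PROGRESS_PATH = "/tmp/zip-progress-%s.txt"

def zip_files(request, job_id):
    files = get_files_to_zip(request)  # hypothetical helper
    total = len(files)
    with zipfile.ZipFile("/tmp/archive-%s.zip" % job_id, "w") as archive:
        for done, path in enumerate(files, start=1):
            archive.write(path)
            with open(PROGRESS_PATH % job_id, "w") as log:
                log.write("%d.%d" % (total, done))  # "totalfiles.filesprocessed"
    return HttpResponse("archive ready")

def zip_progress(request, job_id):
    # Polled by the AJAX interval on the client
    with open(PROGRESS_PATH % job_id) as log:
        total, done = log.read().split(".")
    return JsonResponse({"total": int(total), "done": int(done)})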
Better than a static page, show a JavaScript dialog (using Shadowbox, jQuery UI or some custom method) with a throbber (you can get some at hxxp://www.ajaxload.info/). You can also show the throbber in your page, without dialogs. Most users only want to know their action is being handled, and can live without reliable progress information ("Please wait, this could take some time...").
jQuery UI also has a progress bar API. You could make periodic AJAX queries to a dedicated page on your website to get a progress report and update the progress bar accordingly. Depending on how often the archiving is run, how many users can trigger it and how you authenticate your users, this could be quite hard.
