reload image/page after computation is complete from the server side - python

I have a python code that performs some fairly intense computations, and then generates a plot (png file) for display on the web. I'm using python, flask, mod_wsgi, and apache. Since the computation takes several seconds (around 10 seconds), I'd like to display a "LOADING" type image while the computation is happening so that the viewer doesn't think the server is messed up, and then the actual image when computations are complete. How can I do this from the server side (not using javascript in the web browser)? In the past I remember seeing a lot of web pages where it seemed like the server was pushing a new page to the browser (from what a recall most it was search engines on message forums). The answer to this question I believe is really an http related question, so it doesn't necessarily have to be specific to serving an image (it could be an html page), or using python, flask, mod_wsgi, or apache, but it would be great if an example could be giving for just that configuration.

Before Javascript I did implement this by generating a page that had a refresh in the HTML header, with a delay of 2-3 seconds.
This page would redisplay itself until the code generating that page noticed that the 'result' was finished then generating different HTML code (without the refresh).
<HEAD>
<META http-equiv="refresh" content="3">
</HEAD>

I'm aware that this question is a bit old now, but I feel like there is not enough information available on this topic. I was struggling with this myself to find a relevant information, so I hope this will help:
Suggested solution
There are different technologies that can be used here, but I believe the simplest would be Server Sent Events. The implementation for Flask can be found here. The last part of the documentation is really important:
Subscribers will connect and block for a long time, so you should seriously consider running under an asynchronous WSGI server, such as gunicorn+gevent
So make sure fulfil the requirement. Also, it's pretty important to understand that this approach is good if you want to send the messages from your server to the client. In case you have an external worker that does the calculations for you this method will only make it more complicated for you, since your server will have to play the role of a middle man between the browser and the worker machine. On some hostings it may even not work as expected (e.g. Heroku - still not sure why it misbehaves, looks like too many updates from the worker and are not propagated properly to the client). In case you use the same host for your app and the workers, you should have no problem though.
Alternate solution
In my opinion this type of calculations belong to background, so this solution assumes that we have some kind of workers doing the job for you (like I had when I first encountered the problem). Note that this solution is not a server->client communication, but it's based on polling. I think this may be the only option if you don't run on the asynchronous server in production.
So let's assume you have a worker which status you can check, for example Iron Worker. The user visits your page and this invokes the calculations on the worker. From this point on you should use AJAX calls to get the status update directly from your worker. What I did in my app, I used jQuery to poll the worker web api and learn about it's status. After you discover that your worker is done, you can just reload the page or just the image or whatever else you need.
Additional information
If you need to update many places at the same time (not only the browser), you can use queue services, for example ironMQ, which allows you to propagate your messages to a special queue, and then subscribe to this queue with a client and receive the messages from it. This is what I did before I discovered I can query the worker directly for it's status.

Related

Regarding GIL in python

I know GIL blocks python from running its threads across cores. If it does so, why python is being used in webservers, how are the companies like youtube, instagram handling it.
PS: I know alternatives like multiprocessing can solve it. But it would be great if anyone can post it with a scenario that was handled by them.
Python is used for server-side handling in webservers, but not (usually) as webserver.
On normal setup: we have have Apache or other webserver to handles a lot of processes (server-side) (python uses usually wsgi). Note usually apache handles directly "static" files. So we have one apache server, many parallel apache processes (to handle connection and basic http) and many python processes which handles one connection per time.
Each of such process are independent each others (they just use the same resources), so you can program your server side part easily, without worrying about deadlocks. It is mostly a trade-off: performance of code, and easy and quickly to produce code without huge problems. But usually webserver with python scale very well (also on large sites), and servers are cheaper then programmers.
Note: security is also increased by having just one request in a process.
GIL exists in CPython, (Python interpreter made in C and most used), other interpreter versions such as Jython or IronPython don't have such problem, because they don't have GIL.
Even though, using CPython you can still have concurrency, just do your thing in C and then "link it" in your Python code, just like Numpy or similar do.
Other thing is, even though you have your page using Flask or Django, when you set up it in a production server, you have an Apache or Nginx, etc which has a real charge balancer (or load balancer, I can't remember the name in english now) that can serve the page to many people at the same time.
Take it from the Flask docs (link):
Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.
[...]
If you want to deploy your Flask application to a WSGI server not listed here, look up the server documentation about how to use a WSGI app with it. Just remember that your Flask application object is the actual WSGI application.
Although a bit late, but I will try to give a generic and useful answer.
#Giacomo Catenazzi's answer is a good one but some part of it is factually incorrect.
API requests (or other form of web requests) are served from an already running process. The creation of this 'already running' process is handled by some webserver like gunicorn which on startup creates specified number of processes that are running the code in your web application continuously waiting to serve any incoming request.
Needless to say, each of these processes are limited by the GIL to only run one thread at a time. But one process in its lifetime handles more than one (normally many) request. Here it would be better if we could understand the flow of a request.
We will take an example of flask but this is applicable to most web frameworks. When a request comes from Nginx, it is handed over to gunicorn which interacts with your web application via wsgi. When the request reaches to the framework, an app context is created and some variables are pushed into the app-context. Then it follows the normal route that mostly people are familiar with: routing, db calls, response creation and so on. The response is then handed back to the gunicorn via wsgi again. At the time of handing over the response, the app context is teared down. So it's the app context, not the process that is created on every new request.
Also, I have talked only about the sync worker in gunicorn but it also has an option of async worker which can handle multiple requests in parallel through coroutines. But thats a separate topic.
So answering your question:
Nginx (Capable of handling multiple requests at a time)
Gunicorn creates a pool of n number of processes at the start and also manages the pool in the sense that if a process exits or gets stuck, it kills/recreates ans adds that to the pool.
Each process handling 1 request at a time.
Read more about gunicorn's design and how it can be used to help you achieve your requirements. This is a good thread about gunicorn with flask understanding. And this is a great resource to understand flask app context

How to monitor the Internet connectivity on two PCs simultaneously?

I have two PCs and I want to monitor the Internet connectivity in both of them and make it available in a page as to whether they're currently online and running. How can I do that?
I'm thinking of a cron job that gets executed every minute that sends a POST to a file located in a server, which in turn would write the connectivity status "online" to a file. In the page where the statuses are displayed, read from both the status files and display whether they're online or not. But this feels like a sloppy idea. What alternative suggestion do you have?
(The answer doesn't necessarily have to be code; I'm not looking for copy-paste solutions. I just want an idea, a nudge in the right directio,)
I would suggest just a GET request (you just need a ping to indicate that the PC is on) sent periodically to maybe a Django server and if you query a page on the Django server, it shows a webpage indicating the status of each.
In the Django server have a loop where the time each GET is received is indicated, if the time between the last GET and current time is too large, set a flag to false.
That flag will later be visible when the URL is queried, via the views.
I don't think this would end up sloppy, just a trivial solution where you don't really have to dig too deep to make it work.
I have used Nagios in the past I like it a lot. It is free and open source. I have used it to monitor several Web, DNS, Mail servers and a proxy. You can check it here: https://www.nagios.com/products/nagioscore

Long-running connection HTTP server (Python)

I am trying to design a web application that processes large quantities of large mixed-media files coming from asynchronous processes. Each process can take several minutes.
The files are either uploaded as a POST body or pulled by the web server according to a source URL provided. The files can be processed by a variety of external tools in a synchronous or asynchronous way.
I need to be able to load balance this application so I can process multiple large files simultaneously for as much as I can afford to scale.
I think Python is my best choice for this project, but beside this, I am open to any solution. The app can either deliver the file back or rely on a messaging channel to notify the clients about the process completion.
Some approaches I thought I might use:
1) Use a non-blocking web server such as Tornado that keeps the connection open until the file processing is done. The external processing command is launched and the web server waits until the file is ready and pipes the resulting IO stream directly back to the web app that returns it. Since the processes sending requests are asynchronous, they might afford to wait (unless memory or some other issues come up).
2) Use a regular web server like Cherrypy (which I am more confident with) and have the webapp use a messaging channel to report the processing progress. The web server returns a HTTP response as soon as it receives the file, validates it and sends it to a background process. At the same time it sends a message notifying the process start. The background process then takes care of delivering the file to an available location and sending another message to the channel notifying the location of the new file. This solution looks more flexible than 1), but requires writing a separate script to handle the messages outside the web application, as well as a separate storage space for the temp files that have to be cleaned up at a certain point.
3) Use some internal messaging capability of any of the webserves mentioned above, which I am not familiar with...
Edit: something like CherryPy's pub-sub engine (http://cherrypy.readthedocs.org/en/latest/extend.html?highlight=messaging#publish-subscribe-pattern) could be a good solution.
Any suggestions?
Thank you,
gm
I had a similar situation come up with a really large scale data processing engine that my team implemented. We wanted to build our api calls in Flask, some of which can take many hours to complete, but have a way to notify the user in real time what is going on.
Basically what I came up with is was what you described as option 2. On the same machine that I am serving the flask app through apache, I created a tornado app that serves up a websocket that reports progress to the end user. Once my main page is served, it establishes the websocket connection to the tornado server, and the flask app periodically sends updates to the tornado app, and down to the end user. Even if the browser is closed during the long running application, apache keeps the request alive and processing, and upon logging back in, I can still see the current progress.
I wrote about this solution in some more detail here:
http://jonfeatherstone.com/2013/08/01/mongo-and-websockets-for-application-logging/
Good luck!

Compound custom service server using Twisted

I have an interesting project going on at our workplace. The task, that stands before us, is such:
Build a custom server using Python
It has a web server part, serving REST
It has a FTP server part, serving files
It has a SMTP part, which receives mail only
and last but not least, a it has a background worker that manages lowlevel file IO based on requests received from the above mentioned services
Obviously the go to place was Twisted library/framework, which is an excelent networking tool. However, studying the docs further, a few things came up that I'm not sure about.
Having Java background, I would solve the task (at least at the beginning) by spawning a separate thread for each service and going from there. Being in Python however, I cannot do that for any reasonable purpose as Python has GIL. I'm not sure, how Twisted handles this. I would expect, that Twisted has large (if not majority) code written in C, where GIL is not the issue, but that I couldn't find the docs explained to my satisfaction.
So the most oustanding question is: Given that Twisted uses Reactor as it's main design pattern, will it be able to:
Serve all those services needed
Do it in a non-blocking fashion (it should, according to docs, but if someone could elaborate, I'd be grateful)
Be able to serve about few hundreds of clients at once
Serve large file downloads in a reasonable way, meaning that it can serve multiple clients, using multiple services, downloading and uploading large files.
Large files being in the order of hundres of MB, or few GB. The size is not important, it's the time that the client has to stay connected to the server that matters.
Edit: I'm actually inclined to go the way of python multiprocessing, but not sure, whether that's a correct thing to do with Twisted etc.
Serve all those services needed
Yes.
Do it in a non-blocking fashion (it should, according to docs, but if someone could elaborate, I'd be grateful)
Twisted's uses the common reactor model. I/O goes through your choice of poll, select, whatever to determine if data is available. It handles only what is available, and passes the data along to other stages of your app. This is how it is non-blocking.
I don't think it provides non-blocking disk I/O, but I'm not sure. That feature not what most people need when they say non-blocking.
Be able to serve about few hundreds of clients at once
Yes. No. Maybe. What are those clients doing? Is each hitting refresh every second on a browser making 100 requests? Is each one doing a numerical simulation of galaxy collisions? Is each sending the string "hi!" to the server, without expecting a response?
Twisted can easily handle 1000+ requests per second.
Serve large file downloads in a reasonable way, meaning that it can serve multiple clients, using multiple services, downloading and uploading large files.
Sure. For example, the original version of BitTorrent was written in Twisted.

Streaming the result of a command back to the browser using Twisted and Comet

I'm writing an application that streams the output (by this I mean both sys.stdout and sys.stderr) of a python script excited on the server, in real time to the browser.
The users on the site will be allowed to select the script to run, excite and kill their chosen script, and change some parameters, so I will need a different thread per user on the site (user A can start, stop and change a script, whilst user B can do the same with a different script).
I know I need to use comet for the web clients, and seeing as the rest of the project is written in python, I'd like to use twisted for the server, however I'm not really sure of what I need to do next!
There are a daunting number of options (Divmod Mantissa, Divmod Nevow, twisted.web, STOMP, etc), and some are better documented that others, making the whole thing rather tricky!
I have a working demo using stompservice on orbited, using Orbited.TCPSocket for the javascript side of things, however I'm starting to think that STOMPs channel model isn't going to work for multithreading, multi-running scripts (unless I open a new channel per run, but that seems like the wrong use of the channel model).
Can anyone point me in the right direction, or some sample code I can learn from?
Thanks!
Nevow Athena is a framework specifically for AJAX and COMET applications and in theory is exactly the sort of thing you are looking for.
However, I am not sure that it is well used or supported at this time - looking at mailing list traffic and google search results suggests that it may not be.
There are a couple of tutorials you could look at to help you decide on it:
one on the 'official' site: http://divmod.org/trac/wiki/DivmodNevow/Athena/Tutorials/LiveElement
and one other that I found:
http://divmodsphinx.funsize.net/nevow/chattutorial/part01/index.html
The code for the latter seems to be included in the Nevow distribution when you download it under /doc/listings/partxx (I think...)
You can implement a very simple "HTTP streaming" by keeping the http connection open and appending javascript chunks that update the dom contents. This works since the browser evaluates the "script" chunks as they arrive.
I wrote a blog entry a while ago with a running example using twisted and very few lines of javascript: Simple HTTP streaming with Twisted & Javascript
You can easily mix this pattern with a publisher/subscriber pattern to make it multiuser, etc. I use this pattern to watch live log streams via web.
An example of serving for long-polling clients with Twisted is slosh. This might not be what you want, but because it's not a large framework, it can help you figure out how to use Twisted.

Categories