I am running django-plotly-dash and have multiple Python pages. The issue is that while one of the pages is loading and running some calculations, I cannot open the same page or other pages from a different session, and the web server does not respond for other users. If I look at the runserver output, it is busy rendering the first request only.
"If I look at the runserver output, it is busy rendering the first request only."
If I understand correctly, this means you are using Django's development server, and therefore that you are in development (if you use django-admin runserver in production, that is a serious issue in itself).
Now’s a good time to note: don’t use this server in anything resembling a production environment. It’s intended only for use while developing. (We’re in the business of making web frameworks, not web servers.)
Django's development server is supposed to be multithreaded and support concurrent requests. However, in my experience I have also noticed that it sometimes handles only one request at a time. I didn't dig too much into it, but I assume it might be caused by an app that overrides the runserver command and disables multithreading.
In development this shouldn't be too much of an issue, and in production you won't suffer this kind of blocking, as real WSGI servers such as gunicorn can handle several concurrent requests (provided they are configured to use the available resources correctly and the hardware can handle the load).
However, if your pages are actually slow to respond, this can be an issue for the user loading the page, and it will also require more resources to handle more concurrent requests. It all depends on whether "slow" means 2 seconds, 5 seconds, 30 seconds or even more. Reducing the response time will depend a lot on where the bottleneck in your code is, and could include:
Optimizing the algorithms
Reducing and optimizing SQL queries (See Database access optimization)
Offloading to Celery the calculations that do not affect the response (see the sketch after this list)
Using websockets to stream and display the data as it gets calculated, instead of blocking the client until the whole page is computed (see django-channels)
Using asyncio to avoid staying idle while waiting for I/O operations.
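For example, a minimal sketch of the Celery option (the task and view names are made up, and it assumes a Celery app and a message broker are already configured):

    # tasks.py -- hypothetical names; assumes a Celery app and a broker are set up
    from celery import shared_task

    @shared_task
    def recompute_statistics(dataset_id):
        # stand-in for the heavy calculation the HTTP response does not depend on
        return sum(i * i for i in range(10 ** 7))

    # views.py -- the view only enqueues the work and returns immediately
    from django.http import HttpResponse
    from .tasks import recompute_statistics

    def results_view(request, dataset_id):
        recompute_statistics.delay(dataset_id)  # returns right away
        return HttpResponse("Calculation started")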
So my friend told me that instances on Heroku are persistent (I'm not sure if the vocab is right, but he implied that all users share the same instance).
So, if I have app.py, and an instance runs it, then all users share that instance. That means we can use a dict as a temporary cache for storing small things for faster response time.
So for example, if I'm serving an API, I can maybe define a cache like this and then use it.
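What I have in mind is roughly this (the names are made up, purely for illustration):

    # app.py -- illustrative only
    _CACHE = {}  # module-level dict; lives only as long as this one process lives

    def expensive_lookup(key):
        # stand-in for whatever real work the API endpoint does
        return sum(ord(c) for c in key) ** 2

    def cached_lookup(key):
        if key not in _CACHE:
            _CACHE[key] = expensive_lookup(key)
        return _CACHE[key]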
How true is that? I tried looking this up, but could not find anything.
I deployed the linked API to Heroku on 1 dyno, and with just a few requests per second it was taking over 100 seconds to serve them. So my understanding is that the cache wasn't working. (It might be useful to note here that the majority of the time was spent in request queueing, according to New Relic.)
The Heroku Devcenter has several articles about the Heroku architecture.
Processes don't share memory. Moreover, your code is compiled into a slug and optimized for distribution to the dyno manager. In simple words, it means you don't even know which machine will execute your code. Theoretically, 5 users hitting your app may be routed to 5 different machines and processes.
Last but not least, keep in mind that if your app has only a single web dyno running, that web dyno will sleep. You have to have more than one web dyno to prevent web dynos from sleeping. When a dyno enters sleep mode, its memory is released and you will lose all the data held in memory.
This means that your approach will not work.
Generally speaking, in Heroku you should use external storages. For example, you can use the Memcached add-on and store your cache information in Memcached.
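For example, with a Memcached add-on the Python side would look roughly like this (the exact environment variable names depend on the add-on you pick; the ones below are placeholders):

    import os
    import bmemcached  # python-binary-memcached, a client commonly used on Heroku

    client = bmemcached.Client(
        os.environ["MEMCACHED_SERVERS"].split(","),   # placeholder variable names
        os.environ.get("MEMCACHED_USERNAME"),
        os.environ.get("MEMCACHED_PASSWORD"),
    )

    client.set("greeting", "hello", time=300)  # stored externally, shared by all dynos
    print(client.get("greeting"))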
Also note that you should not use the file system as a cache, not only because it's slower than Memcached but also because the Cedar stack file system should be considered ephemeral.
I have inherited a rather large code base that utilizes tornado to compute and serve big and complex data-types (imagine a 1 MB XML file). Currently there are 8 instances of tornado running to compute and serve this data.
That was a wrong design-decision from the start and I am facing many many timeouts from applications that access the servers.
I'd like to change as few lines of code as possible in the legacy code base because I do not want to break anything that has already been tested in the field. What can I do to transform this system into a threaded one that can execute more xml-computation in parallel?
"transform this system into a threaded one that can execute more xml-computation in parallel"
If there are enough Tornado instances to saturate the computational resources, moving to a threaded model will probably not gain much performance. Getting rid of blocking code however helps with connection timeouts.
Another option is getting rid of all asynchronous code and using tornado.wsgi.WSGIApplication. That way, you can run the application on a threaded WSGI server. Features that are not available in WSGI mode are listed here.
Use Tornado to just receive non-blocking requests. To do the actual XML processing you can then spawn another process or use an async task processor like Celery. Using Celery would make it easy to scale your system in the future. In fact, with this model you'll just need one Tornado instance.
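A rough sketch of that model with made-up names (it assumes a message broker such as Redis is available for Celery):

    # tasks.py -- made-up names; assumes a Redis broker is running
    from celery import Celery

    celery_app = Celery("tasks", broker="redis://localhost:6379/0",
                        backend="redis://localhost:6379/0")

    @celery_app.task
    def build_xml(doc_id):
        # placeholder for the expensive XML computation
        return "<result id='{0}'/>".format(doc_id)

    # server.py -- Tornado only accepts requests and hands the heavy work to Celery
    import tornado.ioloop
    import tornado.web
    from tasks import build_xml, celery_app

    class SubmitHandler(tornado.web.RequestHandler):
        def get(self, doc_id):
            result = build_xml.delay(doc_id)          # enqueue, return immediately
            self.write({"task_id": result.id})

    class StatusHandler(tornado.web.RequestHandler):
        def get(self, task_id):
            result = celery_app.AsyncResult(task_id)
            if result.ready():
                self.set_header("Content-Type", "application/xml")
                self.write(result.get())
            else:
                self.write({"status": "pending"})

    application = tornado.web.Application([
        (r"/xml/(.+)", SubmitHandler),
        (r"/status/(.+)", StatusHandler),
    ])

    if __name__ == "__main__":
        application.listen(8888)
        tornado.ioloop.IOLoop.current().start()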
@Eren - I don't think that the computational resources are getting saturated. It is just that more than 8 requests are not getting processed simultaneously, as Tornado would right now be serving requests in blocking mode.
Is anyone aware of any issues with Django's caching framework when deployed to Apache/Mod_WSGI?
When testing with the caching framework locally with the dev server, using the profiling middleware and either the FileBasedCache or LocMemCache, Django's very fast. My request time goes from ~0.125 sec to ~0.001 sec. Fantastic.
I deploy the identical code to a remote machine running Apache/Mod_WSGI and my request time goes from ~0.155 sec (before I deployed the change) to ~.400 sec (post deployment). That's right, caching slowed everything down.
I've spent hours digging through everything, looking for something I'm missing. I've tried using FileBasedCache with a location on tmpfs, but that also failed to improve performance.
I've monitored the remote machine with top, and it shows no other processes and it has 6GB available memory, so basically Django should have full rein. I love Django, but it's incredibly slow, and so far I've never been able to get the caching framework to make any noticeable impact in a production environment. Is there anything I'm missing?
EDIT: I've also tried memcached, with the same result. I confirmed memcached was running by telnetting into it.
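For reference, the cache settings I'm switching between look roughly like this (the exact settings syntax depends on your Django version, and the LOCATION values are just what I used):

    # settings.py -- file-based cache on a tmpfs mount
    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
            "LOCATION": "/dev/shm/django_cache",
        }
    }

    # ...or, for the memcached test:
    # CACHES = {
    #     "default": {
    #         "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
    #         "LOCATION": "127.0.0.1:11211",
    #     }
    # }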
Indeed, Django can be slow, but I must say most of the slowness comes from the app itself. Django just encourages you (by providing bad examples in the docs) to do lazy things that are slow in production.
First of all: try nginx + uWSGI. It is just the best.
To optimize your app you need to find out what is causing the slowness. It can be:
slow database queries (a lot of queries or just slow queries)
slow database itself
slow filesystem (nfs for example)
Try logging request queries and watch iostat or iotop or something like that.
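For example, a quick way to see the query load per request is to dump django.db.connection.queries (it is only populated when DEBUG is True); this decorator is just an illustration:

    # quick-and-dirty query logging; connection.queries is only filled when DEBUG = True
    from django.db import connection

    def log_queries(view):
        def wrapper(request, *args, **kwargs):
            response = view(request, *args, **kwargs)
            total = sum(float(q["time"]) for q in connection.queries)
            print("%d queries, %.3f s total" % (len(connection.queries), total))
            return response
        return wrapper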
I had this scenario with Apache + mod_wsgi: the first request from a browser was very slow, then a few requests from the same browser were fast, and then after sitting idle for 2 minutes it was again very slow. I don't know whether that was an improperly configured Apache, or whether it was shutting down the WSGI app and starting it again for each keepalive request. It just put me off - I installed nginx, and with nginx + FastCGI everything was a lot faster than Apache + mod_wsgi.
I had a similar problem with an app using memcached. The solution was running mod_wsgi in daemon mode instead of embedded mode, and Apache in mpm_worker mode. After that, the application worked much faster.
The same thing happened to me, and I was wondering what was taking so much time.
Each cache get was taking around 100 milliseconds.
So I debugged Django's locmem cache code and found out that pickling was taking a lot of time (I was caching a whole table in LocMemCache).
I wrapped locmem myself, as I didn't want anything advanced; if you remove the pickle and unpickle steps around set and get, you will see a major improvement.
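A minimal sketch of the idea, only to illustrate skipping the pickling (this is not Django's API, just a plain class with made-up names):

    import threading
    import time

    class PlainLocalCache(object):
        """Stores Python objects directly in a dict -- no pickling, per-process only."""

        def __init__(self):
            self._data = {}
            self._lock = threading.Lock()

        def set(self, key, value, timeout=300):
            with self._lock:
                self._data[key] = (value, time.time() + timeout)

        def get(self, key, default=None):
            with self._lock:
                entry = self._data.get(key)
                if entry is None:
                    return default
                value, expires = entry
                if time.time() > expires:
                    del self._data[key]
                    return default
                return value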
Hope it helps someone.
We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still with Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processes) on the server under real-world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the meantime, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.
Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.
Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.
You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). That queuing operation should return an ID of an object that you can query to get its status.
Next you have to write a background service that spins up worker threads. These workers simply dequeue a request, generate the data, then save the data's location in the request object.
The webpage can make AJAX calls to your server to find out the progress of the generation and to give a link to the file once it's available.
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work however.
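A bare-bones sketch of that pattern with an in-memory job table (all names are made up; a real system would likely use Celery or a database for the queue and the job records):

    import queue
    import threading
    import time
    import uuid

    jobs = {}                  # job_id -> {"status": ..., "location": ...}
    work_queue = queue.Queue()

    def generate_mp3_segment(spec):
        # placeholder for the expensive MP3 rendering
        time.sleep(5)
        return "/tmp/%s.mp3" % spec

    def enqueue_segment(segment_spec):
        """Called from the request handler: register the job and return its id."""
        job_id = str(uuid.uuid4())
        jobs[job_id] = {"status": "queued", "location": None}
        work_queue.put((job_id, segment_spec))
        return job_id

    def worker():
        while True:
            job_id, segment_spec = work_queue.get()
            jobs[job_id]["status"] = "working"
            path = generate_mp3_segment(segment_spec)
            jobs[job_id].update(status="done", location=path)

    # start a small pool of workers once at startup; AJAX polls jobs[job_id] for status
    for _ in range(4):
        threading.Thread(target=worker, daemon=True).start()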
Alternatively, look into running a couple of machines to distribute the load. Your threads on Apache will still block, but at least you won't consume resources on the web server.
Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork MPM, and make sure you're limiting yourself to an appropriate number of processes (at least one per CPU, no more than two per CPU).
It looks like you are doing things right -- you are just lacking CPU power. Can you determine what the CPU load is while generating these MP3s?
I think the next thing you have to do is add more hardware to render the MP3s on other machines. Either that, or find a way to deliver pre-rendered MP3s (maybe you can cache some of your media?).
BTW, scaling for the web was the theme of a keynote lecture by Jacob Kaplan-Moss at PyCon Brasil this year, and it is far from being a solved problem. The stack of technologies one needs to handle is quite impressive. (I could not find an online copy of the presentation, though -- sorry for that.)
For quite a long time I've wanted to start a pet project that will aim, in time, to become a web hosting control panel, but mainly focused on Python hosting -- meaning I would like to provide a way for users to generate/start Django or other framework projects right from the panel. I seem to have found the perfect tool to build my app with: CherryPy. This would allow me to do it the way I want, building the app with its own HTTP/HTTPS server, and all in my favorite programming language.
But now a new question arises: since CherryPy is a threaded server, will it be right for this kind of task? There will be lots of time-consuming tasks, so if one of them blocks, the rest of the users trying to access other pages will be left waiting and eventually time out. I imagine that this kind of problem wouldn't happen on a fork-based server.
What would you advise?
"Threaded" and "Fork based" servers are equivalent. A "threaded" server has multiple threads of execution, and if one blocks then the others will continue. A "Fork based" server has multiple processes executing, and if one blocks then the others will continue. The only difference is that threaded servers by default will share memory between the threads, "fork based" ones by default will not share memory.
One other point - the "subprocess" module is not thread safe, so if you try to use it from CherryPy you will get weird errors. (This is Python Bug 1731717.)