I am designing a web interface to a certain hardware appliance that provides its own custom API. The web interface can manage multiple appliances at once. The data is retrieved from each appliance by polling over the custom API, so it would be preferable to make that part asynchronous.
The most obvious approach is to have a poller thread that polls for data and saves it into a process-wide singleton guarded by semaphores, with the web server threads then retrieving data from that singleton and displaying it. I'm not a huge fan of singletons or mashed-together designs, so I was thinking of separating the poller data source from the web server, looping it back on the local interface, and using something like XML-RPC to consume the data.
The application need not be 'enterprisey' or really scalable, since it will be accessed by at most a couple of people at a time, but I'd rather make it robust by not mixing two kinds of logic together. There's a current implementation in Python using CherryPy, and it's the biggest mishmash of terrible design I've ever seen. I feel that if I go with the most obvious design, I'll just end up reimplementing the same horrible thing in my own way.
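A minimal sketch of that split, using the standard library's xmlrpc module (poll_appliance() below is a made-up stand-in for the custom appliance API):

```python
# poller_service.py - a minimal sketch of the separated poller, exposed
# over XML-RPC on the loopback interface only.
import threading
import time
from xmlrpc.server import SimpleXMLRPCServer

latest = {}                 # appliance id -> most recent polled data
lock = threading.Lock()


def poll_appliance():
    # Hypothetical stand-in for the real custom-API call.
    return {"status": "ok", "polled_at": time.time()}


def poll_loop():
    while True:
        data = poll_appliance()
        with lock:
            latest["appliance-1"] = data
        time.sleep(5)


def get_data(appliance_id):
    # Called by the web server over XML-RPC.
    with lock:
        return latest.get(appliance_id, {})


threading.Thread(target=poll_loop, daemon=True).start()
server = SimpleXMLRPCServer(("127.0.0.1", 8001), allow_none=True)
server.register_function(get_data)
server.serve_forever()
```

The web server side would then read from it with xmlrpc.client.ServerProxy("http://127.0.0.1:8001").get_data("appliance-1"), keeping the two kinds of logic in separate processes.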
If you use Django and Celery, you can create a Django project to be the web interface and a Celery job to run in the background and poll. In that job you can import your Django models, so it can save the results of the polling very simply.
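A minimal sketch of such a job, assuming hypothetical Appliance and Reading models and an already-configured Celery app:

```python
# tasks.py - a minimal sketch; Appliance, Reading, and appliance.poll()
# are hypothetical stand-ins for your own models and custom API wrapper.
from celery import shared_task

from appliances.models import Appliance, Reading  # hypothetical models


@shared_task
def poll_all_appliances():
    for appliance in Appliance.objects.all():
        data = appliance.poll()  # wraps the custom appliance API
        Reading.objects.create(appliance=appliance, payload=data)
```

Scheduled with Celery beat (e.g. every few seconds), this keeps the polling loop entirely out of the web process.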
I work on a project for our clients which is heavily ML-based and computationally intensive (as in complex, multi-level similarity scores, NLP, etc.). For the prototype, we delivered a Django REST Framework API with access to a database from the client; on every request to specific endpoints it would literally do all the ML work on the fly (in the backend).
Now that we are scaling and more user activity is taking place in production, the app lags a lot. Simple profiling shows that a single POST request can take up to 20 seconds to respond. No matter how much I optimize in terms of horizontal scaling, I can't get rid of the bottleneck of all the calculations happening during the API calls. I have a hunch that caching could be part of a solution, but I am not sure. I can imagine a lot of 'theoretical' solutions, but I don't want to reinvent the wheel (or shall I say, re-discover the wheel).
Are there specific design architectures for ML or computationally intensive REST API calls that I can refer to in redesigning my project?
Machine learning and natural language processing systems are often resource-hungry, and in many cases there is not much one can do about that directly. Some operations simply take longer than others, but this is actually not the main problem in your case. The main problem is that the user doesn't get any feedback while the backend does its job, which is not a good user experience.
Therefore, it is not recommended to perform resource-heavy computation within the traditional HTTP request-response cycle. Instead of calling the ML logic within the API view and waiting for it to finish, consider setting up an asynchronous task queue to perform the heavy lifting independently of the synchronous request-response cycle.
In the context of Django, the standard task queue implementation would be Celery. Setting it up will require some learning and additional infrastructure (e.g. a Redis instance and worker servers), but there is really no other way to avoid breaking the user experience.
Once you have set up everything, you can start an asynchronous task whenever your API endpoint receives a request and immediately inform the user, via a normal view response, that their request is being carried out. Once the ML task has finished and its results have been written to the database (using a Django model, of course), you can notify the user (e.g. via email, or directly in the browser via WebSockets) to view the analysis results in a dedicated results view.
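A minimal sketch of the enqueue-and-acknowledge view, assuming a hypothetical run_analysis Celery task:

```python
# views.py - a minimal sketch; run_analysis is a hypothetical Celery
# @shared_task that does the heavy ML work and saves its results.
from rest_framework.response import Response
from rest_framework.views import APIView

from myapp.tasks import run_analysis  # hypothetical task module


class AnalysisView(APIView):
    def post(self, request):
        # Enqueue the heavy lifting and answer immediately.
        task = run_analysis.delay(request.data)
        return Response({"task_id": task.id, "status": "pending"},
                        status=202)
```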
I want to create two applications in Python which should communicate with each other. One of these applications should behave like a server and the second should be the GUI of a client. They could be run on the same system (on the same machine) or remotely on different devices.
I want to ask which technology I should use: AMQP messaging (like RabbitMQ), a Twisted-style server (or Tornado), or ZeroMQ, connecting the applications through it. In the future I would like to have some kind of authentication, etc.
I have read a great many questions and articles (like this one: Why do we need to use rabbitmq), and a lot of people say "RabbitMQ and Twisted are different". I know they are. I would really love to know the differences, and why one of these solutions would be superior to the other in this case.
EDIT:
I want to use it with following requirements:
There will be more than one user connected at a time - I think there will be 1-10 users connected to the same program, working collaboratively
The data sent consists of "messages" describing what a user did - something like remote calls (but don't focus on that, because the GUIs can be written in different languages, so the messages will be something like JSON payloads).
The system should allow for collaborative work, so it should be as interactive as possible (data will be sent continuously as users type or perform actions).
Additionally, I would love to hear why one solution would be better than the other, not only in this particular case.
Twisted is used to solve the C10k networking problem by giving you asynchronous networking through the reactor pattern. It's also convenient because it provides a nice concurrency abstraction, since threading/concurrency in Python is not as easy as in, say, Erlang. Consequently, some people use Twisted to dispatch work tasks, but that's not what it is designed for.
RabbitMQ is based on the message queue pattern. It's all about reliable message passing, not about networking. I stress the reliable part, as there are many different asynchronous networking frameworks (Vert.x, for example) that provide message passing (also known as pub/sub).
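As a rough illustration of RabbitMQ's role, a producer using the pika client might look like this (a minimal sketch, assuming a broker on localhost):

```python
# A minimal pika sketch; assumes a RabbitMQ broker running on localhost.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="user_actions", durable=True)

# The broker, not the raw socket, takes care of reliable delivery.
channel.basic_publish(exchange="",
                      routing_key="user_actions",
                      body='{"user": "alice", "action": "typed"}')
connection.close()
```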
More often than not, people combine the two patterns to create a "message bus" that serves a variety of networking needs without unnecessary network blocking, and with great integration and scalability.
The reason a "message queue" goes so well with a networking "reactor loop" is that you should not block the reactor loop, so you have to dispatch blocking work to some other context (a thread, lightweight process, separate machine process, queue, etc.). In practice, the cleanest way to do this is distributed message passing.
Based on your requirements, it sounds like you should use asynchronous networking if you want the results to show up instantly and to scale, but you could probably get away with a simple system that just polls, given you only have a handful of clients. So the questions are: how many total users (Twisted)? How reliably do you want the updates delivered (RabbitMQ)? And do you want your architecture to be language- and platform-agnostic - maybe you want to use Node.js later (then focus on the message queue instead of async networking, i.e. RabbitMQ)? Personally, I would take a look at Vert.x, which allows you to write in Python.
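For instance, with Twisted you can push blocking work onto a thread pool so the reactor stays responsive; a minimal sketch (do_blocking_work is a made-up placeholder):

```python
# A minimal sketch of keeping the reactor unblocked; do_blocking_work
# is a hypothetical placeholder for any slow, blocking operation.
from twisted.internet import reactor
from twisted.internet.threads import deferToThread


def do_blocking_work(payload):
    # Pretend this is a slow computation or synchronous I/O.
    return payload.upper()


def handle_message(payload):
    d = deferToThread(do_blocking_work, payload)  # runs off the reactor thread
    d.addCallback(lambda result: print("done:", result))
    return d


reactor.callWhenRunning(handle_message, "hello")
reactor.callLater(1, reactor.stop)
reactor.run()
```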
When someone tells you that Twisted and RabbitMQ are different, it is because comparing the two is like comparing things that have different purposes.
Twisted is an asynchronous framework, like Tornado. RabbitMQ is a message queue system. You can't compare them directly.
You should split your question into two new ones. First: which protocol should I use to communicate between my processes? The answer will involve words like AMQP, Protocol Buffers, ...
And second: which framework should I use to write my client and server programs? Here the answer can fall on Twisted, Tornado, ....
I'm working on a turn-based web game that will perform all world updates (player orders, physics, scripted events, etc.) on the server. For now, I could simply update the world in a web request callback. Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
So what is the best way to separate the load from the web server, ideally in a way that could even be run on a separate machine?
A simple python module with infinite loop?
A distributed task in something like Celery?
Some sort of cross-platform Cron scheduler?
Some other fancy Django feature or third-party library that I don't know about?
I also want to minimize code duplication by using the same model layer. That probably means my service would need access to the Django model code, so that definitely determines how I architect the service.
I think Celery, which you mention in your question, is the way to go here. It will interface nicely with the rest of your setup, support your eventual aim of separating out the systems, and is compatible with Django.
I'd write the backend to just use the Django database interface (look at the setup code in your manage.py), spawn it as its own process, and interface with it using Protocol Buffers. That route should move to a separate machine with little work. MPI may be an option, too.
Pipes, FIFOs, and most other IPC mechanisms require both processes to be on the same box.
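A minimal sketch of that standalone process, using today's django.setup() bootstrap in place of the old manage.py incantation (the settings path and Game model here are hypothetical):

```python
# worker.py - a minimal sketch of a standalone process that reuses the
# Django model layer; "mysite.settings" and Game are hypothetical names.
import os

import django

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "mysite.settings")
django.setup()  # wires up the ORM outside the web server

from world.models import Game  # hypothetical model; import after setup()


def tick():
    for game in Game.objects.filter(active=True):
        game.advance_turn()  # hypothetical world-update method
        game.save()


if __name__ == "__main__":
    tick()
```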
Though I have to point out a flaw in your premise:
Unfortunately, that naive approach is not at all scalable. I don't want to bog down my web server when I start running many concurrent games.
If you run concurrent games, then as long as you keep all the parts of a given game on the same server, this is a non-issue - unless there's a common resource needed by all games, in which case the real issue becomes load balancing across the servers.
We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on the fly by a Python application. The model is: make a GET request to a URL specifying which segments you want, and get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still with Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processes) on the server under real-world conditions. We're working on tuning Apache/mod_wsgi right now, but we're still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informative article, I'm pretty sure this is an Apache tuning problem. In the meantime, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.
Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.
Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.
You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). The queuing operation should return an ID for an object whose status you can query.
Next, you write a background service that spins up worker threads. These workers simply dequeue a request, generate the data, then save the data's location in the request object.
The web page can make AJAX calls to your server to check the progress of the generation and to get a link to the file once it's available.
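A minimal, framework-agnostic sketch of the idea (render_mp3 is a hypothetical stand-in for the expensive generation):

```python
# A minimal sketch of the queue-plus-status pattern.
import threading
import uuid
from queue import Queue

jobs = {}            # job id -> {"status": ..., "location": ...}
pending = Queue()


def render_mp3(segment_spec):
    # Hypothetical stand-in for the expensive MP3 generation.
    return "/media/%s.mp3" % segment_spec


def submit(segment_spec):
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "location": None}
    pending.put((job_id, segment_spec))
    return job_id    # handed back to the client for status polling


def worker():
    while True:
        job_id, segment_spec = pending.get()
        jobs[job_id]["status"] = "working"
        path = render_mp3(segment_spec)
        jobs[job_id].update(status="done", location=path)


threading.Thread(target=worker, daemon=True).start()
```

The AJAX status endpoint would then just read jobs[job_id] to report progress and, once done, the file's location.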
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work, however.
Alternatively, look into running a couple of machines to distribute the load. Your threads on Apache will still block, but at least you won't consume resources on the web server.
Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork MPM, and make sure you're limiting yourself to an appropriate number of processes (no fewer than one per CPU, no more than two per CPU).
It looks like you are doing things right - you are simply lacking CPU power. Can you determine the CPU load in the process of generating these MP3s?
I think the next thing to do is add more hardware to render the MP3s on other machines. Either that, or find a way to deliver pre-rendered MP3s (maybe you can cache some of your media?).
BTW, scaling for the web was the theme of a keynote lecture by Jacob Kaplan-Moss at PyCon Brasil this year, and it is far from being a solved problem. The stack of technologies one needs to handle is quite impressive. (I could not find an online copy of the presentation, though - sorry for that.)
My question is: which python framework should I use to build my server?
Notes:
This server talks HTTP with its clients: GET and POST (via pyAMF)
Clients "submit" "tasks" for processing and then, some time later, retrieve the associated "task_result"
Submit and retrieve might be separated by days - different HTTP connections
The "task" is a lump of XML describing a problem to be solved, and a "task_result" is a lump of XML describing an answer.
When a server gets a "task", it queues it for processing
The server manages this queue and, when tasks reach the top, arranges for them to be processed.
The processing is performed by a long-running (15 minutes?) external program (via subprocess), which is fed the task XML and produces a "task_result" lump of XML that the server picks up and stores (for later client retrieval).
It serves a couple of basic HTML pages showing the queue and processing status (for admin purposes only)
I've experimented with twisted.web, using SQLite as the database and threads to handle the long running processes.
But I can't help feeling that I'm missing a simpler solution. Am I? If you were faced with this, what technology mix would you use?
I'd recommend using an existing message queue. There are many to choose from (see below), and they vary in complexity and robustness.
Also, avoid threads: let your processing tasks run in a different process (why do they have to run in the webserver?)
By using an existing message queue, you only need to worry about producing messages (in your webserver) and consuming them (in your long running tasks). As your system grows you'll be able to scale up by just adding webservers and consumers, and worry less about your queuing infrastructure.
Some popular Python implementations of message queues:
http://code.google.com/p/stomper/
http://code.google.com/p/pyactivemq/
http://xph.us/software/beanstalkd/
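For instance, a minimal beanstalkd sketch using the beanstalkc client (assuming a beanstalkd server on localhost:11300; process_task is a made-up placeholder):

```python
# A minimal beanstalkd sketch via the beanstalkc client.
import beanstalkc


def process_task(xml):
    print("processing:", xml)  # stand-in for the long external run


queue = beanstalkc.Connection(host="localhost", port=11300)

# Producer side (in the web server): enqueue the task XML.
queue.put("<task>...</task>")

# Consumer side (in a separate worker process): block until a job arrives.
job = queue.reserve()
process_task(job.body)
job.delete()  # acknowledge completion
```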
I'd suggest the following. (Since it's what we're doing.)
A simple WSGI server (wsgiref or Werkzeug). The HTTP requests coming in will naturally form a queue; no further queueing is needed. You get a request, you spawn the subprocess as a child, and you wait for it to finish. A simple list of children is about all you need.
I used a modification of the main "serve forever" loop in wsgiref to periodically poll all of the children to see how they're doing.
A simple SQLite database can track request status. Even this may be overkill, because your XML inputs and results can just sit in the file system.
That's it. Queueing and threads don't really enter into it. A single long-running external process is too complex to coordinate. It's simplest if each request is a separate, stand-alone child process.
If you get immense bursts of requests, you might want a simple governor to prevent creating thousands of children. The governor could be a simple queue, built using a list with append() and pop(). Every request goes in, but only as many requests as fit within some "max number of children" limit are taken out.
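A minimal sketch of the whole idea (the "sleep 5" command below just stands in for the real external solver):

```python
# A minimal "child per request, with a governor" sketch using wsgiref.
import subprocess
from wsgiref.simple_server import make_server

MAX_CHILDREN = 4
children = []           # the simple list of child processes


def reap():
    # Drop any children that have already finished.
    children[:] = [c for c in children if c.poll() is None]


def app(environ, start_response):
    reap()
    if len(children) >= MAX_CHILDREN:
        start_response("503 Service Unavailable",
                       [("Content-Type", "text/plain")])
        return [b"busy, try again later\n"]
    # Hypothetical external program; replace with the real command line.
    children.append(subprocess.Popen(["sleep", "5"]))
    start_response("202 Accepted", [("Content-Type", "text/plain")])
    return [b"accepted\n"]


if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```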
My reaction is to suggest Twisted, but you've already looked at it. Still, I stick by my answer. Without knowing your personal pain points, I can at least share some things that helped me reduce almost all of the deferred madness that arises when you have several dependent, blocking actions to perform for a client.
Inline callbacks (lightly documented here: http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.defer.html) provide a means to make long chains of deferreds much more readable (to the point of looking like straight-line code). There is an excellent example of the complexity reduction this affords here: http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html
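A tiny sketch of the readability win (fetch_user and fetch_orders are made-up deferred-returning calls):

```python
# A minimal inlineCallbacks sketch: a deferred chain that reads like
# straight-line code. The fetch_* functions are hypothetical stand-ins.
from twisted.internet import defer, reactor


def fetch_user(user_id):
    return defer.succeed({"id": user_id, "name": "alice"})


def fetch_orders(user):
    return defer.succeed(["order-1", "order-2"])


@defer.inlineCallbacks
def show_orders(user_id):
    user = yield fetch_user(user_id)      # looks synchronous, isn't
    orders = yield fetch_orders(user)
    print(user["name"], orders)


reactor.callWhenRunning(show_orders, 42)
reactor.callLater(0.1, reactor.stop)
reactor.run()
```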
You don't always have to get your bulk processing to integrate nicely with Twisted. Sometimes it is easier to break a large piece of your program off into a stand-alone, easily testable/tweakable/implementable command line tool and have Twisted invoke this tool in another process. Twisted's ProcessProtocol provides a fairly flexible way of launching and interacting with external helper programs. Furthermore, if you suddenly decide you want to cloudify your application, it is not all that big of a deal to use a ProcessProtocol to simply run your bulk processing on a remote server (random EC2 instances perhaps) via ssh, assuming you have the keys set up already.
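A minimal ProcessProtocol sketch, piping some input through wc as a stand-in for a real helper tool:

```python
# A minimal ProcessProtocol sketch; "wc" stands in for a real helper.
from twisted.internet import protocol, reactor


class HelperProtocol(protocol.ProcessProtocol):
    def connectionMade(self):
        self.transport.write(b"some bulk input\n")
        self.transport.closeStdin()   # signal EOF so the child can finish

    def outReceived(self, data):
        print("helper output:", data)

    def processEnded(self, reason):
        reactor.stop()


# env=None inherits the parent environment so "wc" is found on PATH.
reactor.spawnProcess(HelperProtocol(), "wc", ["wc", "-c"], env=None)
reactor.run()
```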
You could have a look at Celery.
It seems any Python web framework will suit your needs. I work with a similar system on a daily basis, and I can tell you that your solution with threads and SQLite for queue storage is about as simple as it gets.
Assuming order doesn't matter in your queue, threads should be acceptable. It's important to make sure you don't create race conditions with your queues - for example, two jobs of the same type running simultaneously. If that's a concern, I'd suggest a single-threaded application that works through the items in the queue one by one.