How Python handles asynchronous REST requests


Scenario: Let's say I have a REST API written in Python (using Flask, maybe) that stores a global variable. The API has two endpoints: one reads the variable and returns it, and the other writes it. Now two clients call the endpoints at the same time (one the read, one the write).
I know that in Python multiple threads will not actually run in parallel (due to the GIL), but some I/O operations behave asynchronously. Would this scenario cause any conflict? And how does it behave? I'm assuming that the request that "wins the race" will hold up the other request (is that right)?

In short: You should rethink your REST API design and implement some kind of FIFO queue.
You have two endpoints (W for writing and R for reading). Let's say the global variable has some value V0 in the beginning. If client A reads from R while, at the same time, client B writes a new value V1 to W, two things can happen:
The read request is faster. Client A will read V0.
The write request is faster. Client A will read V1.
You won't run into an inconsistent memory state, thanks to the GIL you mentioned, but which of the two cases above happens is completely unpredictable. One time the read request could be slightly faster, and the next time the write request could be slightly faster. Much of the request handling is done in your operating system (e.g. address resolution or TCP connection management). The requests may also traverse other machines, like routers or switches in your network. All these things are completely out of your control and could delay the read request slightly more than the write request, or the other way around. So no matter how many threads you run your REST server with, the return value is essentially unpredictable.
If you really need ordered read/write interaction, you can make the resource a FIFO queue. Each time any client reads, it pops the first element from the queue. Each time any client writes, it pushes that element onto the end of the queue. If you do this, you are guaranteed not to lose any data to overwriting, and you read the data in the same order in which it was written.
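The queue idea above can be sketched as follows. This is only a minimal illustration, assuming the resource lives in the server process; the class name FifoResource and its read/write methods are made up for this example, and the endpoint wiring (Flask or otherwise) is left out.

```python
import threading
from collections import deque


class FifoResource:
    """Thread-safe FIFO standing in for the single global variable."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def write(self, value):
        # Would be called by the write endpoint W: append to the tail.
        with self._lock:
            self._items.append(value)

    def read(self):
        # Would be called by the read endpoint R: pop from the head,
        # or return None when nothing has been written yet.
        with self._lock:
            return self._items.popleft() if self._items else None


resource = FifoResource()
resource.write("V1")
resource.write("V2")
first, second, third = resource.read(), resource.read(), resource.read()
```

Reads come back in write order, and a read on an empty queue yields None here; a real API would have to decide what an empty read should return.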

Related

Is it a bad practice to use sleep() in a web server in production?

I'm working with Django1.8 and Python2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 milliseconds) between each piece of data that I send:
    while True:
        send(data)
        sleep(0.01)
So my question is: is it considered bad practice to simply use sleep() to create that pause? Is there perhaps a more efficient approach?
UPDATED:
The reason I need that pause is that the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesn't return anything after receiving, let alone processing, the data. Leaving that brief pause ensures that each chunk of data I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.

Yes, this is bad practice and an anti-pattern. You will tie up the "worker" which is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to service a request as-fast-as-possible, as there is generally a fixed or max number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end-user, then you can generate a unique ID for the task and poll the result backend for an update by having the client refresh the URL every so often. (I think Celery will automatically generate a guid, but I usually specify one.)
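The hand-off-and-poll pattern described above can be sketched with the standard library alone. This is not Celery: a ThreadPoolExecutor stands in for the Celery worker and a plain dict stands in for the result backend, and the names send_chunks, start_task and task_status are hypothetical.

```python
import time
import uuid
from concurrent.futures import ThreadPoolExecutor

# One background worker; a stand-in for a Celery worker process.
executor = ThreadPoolExecutor(max_workers=1)
tasks = {}  # task_id -> Future (a stand-in for the result backend)


def send_chunks(chunks, pause=0.01, send=lambda chunk: None):
    # `send` is a placeholder for the real socket write.
    for chunk in chunks:
        send(chunk)
        time.sleep(pause)  # the sleep now blocks the worker, not the request
    return len(chunks)


def start_task(chunks):
    # What the request handler would do: enqueue and return immediately.
    task_id = str(uuid.uuid4())
    tasks[task_id] = executor.submit(send_chunks, chunks)
    return task_id


def task_status(task_id):
    # What a polling endpoint would return for a given task ID.
    future = tasks[task_id]
    if future.done():
        return {"state": "done", "sent": future.result()}
    return {"state": "pending"}


task_id = start_task([b"a", b"b", b"c"])
tasks[task_id].result(timeout=5)  # only for this demo: wait for completion
status = task_status(task_id)
```

The request handler only pays for start_task, which returns right away; the client then polls task_status with the ID it was given.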

Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (50+ for example) connections to the webserver, all of which are triggering the sleep code, you're really not going to like the behavior. I would strongly recommend looking at using something like celery/rabbitmq so Django can dump the time delayed part onto something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.

Threads vs Asynchronous Networking (Twisted) Python

I am writing an implementation of a NAT. My algorithm is as follows:
Packet comes in
Check against lookup table if external, add to lookup table if internal
Swap the source address and send the packet on its way
I have been reading about Twisted. I was curious whether Twisted takes advantage of multicore CPUs. Assume the system has thousands of users and one packet comes right after the other. With Twisted, can the lookup table operations take place at the same time on each core? I hear that with threads the GIL will not allow this anyway. Perhaps I could benefit from multiprocessing?
Nginx is asynchronous and happily serves thousands of users at the same time.

Using threads with Twisted is discouraged. It has very good performance when used asynchronously, but the code you write for the request handlers must not block. So if your handler is a pretty big piece of code, break it up into smaller parts and use Twisted's famous Deferreds to attach the other parts via callbacks. It certainly requires a somewhat different way of thinking than most programmers are used to, but it has benefits. If the code has blocking parts, like database operations or accessing other resources over the network to get some result, try to find asynchronous libraries for those tasks too, so you can use Deferreds in those cases as well. If you can't use asynchronous libraries, you can fall back on the deferToThread function, which runs the function you want to call in a different thread, returns a Deferred for it, and fires your callback when it finishes. It's better to use that as a last resort, if nothing else can be done.
Here is the official tutorial for Deferreds:
http://twistedmatrix.com/documents/10.1.0/core/howto/deferredindepth.html
And another nice guide, which can help you get used to thinking in "async mode":
http://ezyang.com/twisted/defer2.html
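To illustrate the "last resort" of pushing a blocking call into a thread so the event loop stays free, here is a rough sketch using the standard library's asyncio rather than Twisted: loop.run_in_executor plays a role similar to deferToThread, and awaiting the result plays the role of attaching a callback to the Deferred. The function blocking_lookup is a made-up placeholder for any blocking call.

```python
import asyncio
import time


def blocking_lookup(key):
    # Placeholder for a blocking call (e.g. a synchronous database driver).
    time.sleep(0.05)
    return key.upper()


async def handle(key):
    loop = asyncio.get_running_loop()
    # Run the blocking function in the default thread pool and await its
    # result; the event loop keeps serving other handlers in the meantime.
    return await loop.run_in_executor(None, blocking_lookup, key)


async def main():
    # Both handlers make progress concurrently.
    return await asyncio.gather(handle("a"), handle("b"))


results = asyncio.run(main())
```

asyncio.gather returns results in the order the awaitables were passed, regardless of which thread finishes first.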

Architecture for interactive webapp with concurrent users and heavy data processing

I've run into a specific problem and thought of a solution. But since the solution is pretty involved, I was wondering if others have encountered something similar and could comment on best practices or propose alternatives.
The problem is as follows:
I have a webapp written in Django which has some screen in which data from multiple tables is collected, grouped and aggregated in time intervals.
It's basically a big excel like matrix where we have data aggregated in time intervals on one axis, against resources for the aggregated data per interval on the other axis.
It involves many inner and left joins to gather all data, and because of the "report" like character of the presented data, I use raw sql to query everything together.
The problem is that multiple users can concurrently view and edit data in these intervals. They can also edit data at finer or coarser granularities than other users working with the same data, but in sub-intervals or overlapping intervals. Currently, when a user edits some data, a Django request is fired, the data is altered, and the affected intervals are aggregated and grouped again and presented back. But because of the volatile nature of this data, other users might have changed something before them. Also, grouping/aggregating and re-rendering the table each time is a very heavy operation (depending on the amount of data and the range of the intervals). This gets worse with concurrent users editing.
My proposed solution:
It's clear that an HTTP request/response mechanism is not really ideal for this kind of thing: the grouping/aggregation is pretty heavyweight and not ideal to do per request, the concurrency would ideally be channeled among users, and feedback should be real-time, like Google Docs, instead of full page refreshes.
I was thinking about making a daemon process which reads flat data of interest from the DBMS on request and caches it in memory. All changes to the data would then occur in memory, with a write-through to the DBMS. This daemon channels access to the data through a lock, so the daemon can control which users can overwrite others' changes.
The flat data is aggregated and grouped using Python code, and only the slices required by the user are returned; user/daemon communication would run over websockets. The daemon would provide a subscriber/publisher channel, where users interested in specific slices of data are notified when something changes. This daemon could be implemented using a framework like Twisted. But I'm not sure an event-driven approach would work here, as we want to "channel" all incoming requests... Maybe these should be put in a queue and run in a separate thread? Would it be better to have Twisted run in a thread next to my scheduler, or should the Twisted main loop spin off a thread that works on this queue? My understanding is that threading works best for IO, and Python-heavy code basically blocks other threads. I have both (websockets/DBMS and data processing); would that work?
Has anyone done something similar before?
Thanks in advance!
Karl

The scheme Google implemented for the concurrent editing features of its now-abandoned Wave product is documented at http://www.waveprotocol.org/whitepapers/operational-transform. This aspect of Wave seemed like a success, even though Wave itself was quickly abandoned.
As far as the questions you asked about implementing your proposed scheme:
An event driven system is perfectly capable of implementing this idea. Being event driven is a way to organize your code. It doesn't prevent you from implementing any particular functionality.
Threading doesn't work best for very much, particularly in Python.
It has significant disadvantages for CPU-bound work, since CPython only runs a single Python thread at a time (regardless of available hardware resources). This means a multi-threaded CPU-bound Python program is typically no faster, or even slower, than the single-threaded equivalent.
For IO, this shortcoming is less of a limitation, because waiting on IO does not involve running Python code on CPython (the IO APIs are all implemented in C, and the GIL is released while a thread blocks in them). This means you can do IO in multiple threads concurrently, so threading is potentially a benefit. However, doing IO concurrently in a single thread is exactly what Twisted is for. Threading offers no benefits over doing the IO in a single thread, as long as you're doing the IO in a non-blocking way (or perhaps asynchronously).

I tried something similar and you might be interested in the solution. Here is my question:
python Socket.IO client for sending broadcast messages to TornadIO2 server
And this is the answer:
https://stackoverflow.com/a/10950702/675065
He also wrote a blog post about the solution:
http://blog.y3xz.com/blog/2012/06/08/a-modern-python-stack-for-a-real-time-web-application/
The software stack consists of:
SockJS Client
SockJS Tornado Server
Redis Pub/Sub
Django Redis Client: Brukva
I implemented this myself and it works like a charm.

Memory bounds in twisted applications

Consider the following scenario: A process on the server is used to handle data from a network connection. Twisted makes this very easy with spawnProcess and you can easily connect the ProcessTransport with your protocol on the network side.
However, I was unable to determine how Twisted handles a situation where the data from the network is available faster than the process performs reads on its standard input. As far as I can see, Twisted code mostly uses an internal buffer (self._buffer or similar) to store unconsumed data. Doesn't this mean that concurrent requests from a fast connection (eg. over local gigabit LAN) could fill up main memory and induce heavy swapping, making the situation even worse? How can this be prevented?
Ideally, the internal buffer would have an upper bound. As I understand it, the OS's networking code would automatically stall the connection/start dropping packets if the OS's buffers are full, which would slow down the client. (Yes I know, DoS on the network level is still possible, but this is a different problem). This is also the approach I would take if implementing it myself: just don't read from the socket if the internal buffer is full.
Restricting the maximum request size is also not an option in my case, as the service should be able to process files of arbitrary size.

The solution has two parts.
One part is called producers. Producers are objects that data comes out of. A TCP transport is a producer. Producers have a couple useful methods: pauseProducing and resumeProducing. pauseProducing causes the transport to stop reading data from the network. resumeProducing causes it to start reading again. This gives you a way to avoid building up an unbounded amount of data in memory that you haven't processed yet. When you start to fall behind, just pause the transport. When you catch up, resume it.
The other part is called consumers. Consumers are objects that data goes into. A TCP transport is also a consumer. More importantly for your case, though, a child process transport is also a consumer. Consumers have a few methods, one of which is particularly useful to you: registerProducer. This tells the consumer which producer its data is coming from. The consumer can then call pauseProducing and resumeProducing according to its ability to process the data. When a transport (TCP or process) cannot send data as fast as a producer is asking it to, it will pause the producer. When it catches up, it will resume it again.
You can read more about producers and consumers in the Twisted documentation.
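The pause/resume handshake described above can be illustrated with a toy model in plain Python. This is not Twisted code: ToyProducer and BoundedConsumer are hypothetical classes that only borrow the method names pauseProducing, resumeProducing and registerProducer, and Twisted's real registerProducer also takes a second, streaming argument that is omitted here.

```python
class ToyProducer:
    """Stands in for a transport that reads from the network."""

    def __init__(self):
        self.paused = False

    def pauseProducing(self):
        self.paused = True

    def resumeProducing(self):
        self.paused = False


class BoundedConsumer:
    """Buffers at most `limit` chunks, pausing its producer when full."""

    def __init__(self, limit):
        self.limit = limit
        self.buffer = []
        self.producer = None

    def registerProducer(self, producer):
        self.producer = producer

    def write(self, chunk):
        self.buffer.append(chunk)
        if len(self.buffer) >= self.limit:
            self.producer.pauseProducing()  # falling behind: stop reading

    def drain(self):
        # Called once the slow side (e.g. the child process) took the data.
        self.buffer.clear()
        self.producer.resumeProducing()  # caught up: start reading again


producer = ToyProducer()
consumer = BoundedConsumer(limit=2)
consumer.registerProducer(producer)

consumer.write(b"x")
consumer.write(b"y")
paused_when_full = producer.paused
consumer.drain()
paused_after_drain = producer.paused
```

The buffer never grows past the limit because the producer is paused as soon as the limit is reached, which is the bounded-memory behaviour the question asks for.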

Python "Task Server"

My question is: which python framework should I use to build my server?
Notes:
This server talks HTTP with its clients: GET and POST (via pyAMF)
Clients "submit" "tasks" for processing and, then, sometime later, retrieve the associated "task_result"
submit and retrieve might be separated by days - different HTTP connections
The "task" is a lump of XML describing a problem to be solved, and a "task_result" is a lump of XML describing an answer.
When a server gets a "task", it queues it for processing
The server manages this queue and, when tasks get to the top, organises that they are processed.
the processing is performed by a long-running (15 mins?) external program (via subprocess) which is fed the task XML and which produces a "task_result" lump of XML which the server picks up and stores (for later Client retrieval).
it serves a couple of basic HTML pages showing the Queue and processing status (admin purposes only)
I've experimented with twisted.web, using SQLite as the database and threads to handle the long running processes.
But I can't help feeling that I'm missing a simpler solution. Am I? If you were faced with this, what technology mix would you use?

I'd recommend using an existing message queue. There are many to choose from (see below), and they vary in complexity and robustness.
Also, avoid threads: let your processing tasks run in a different process (why do they have to run in the webserver?)
By using an existing message queue, you only need to worry about producing messages (in your webserver) and consuming them (in your long running tasks). As your system grows you'll be able to scale up by just adding webservers and consumers, and worry less about your queuing infrastructure.
Some popular python implementations of message queues:
http://code.google.com/p/stomper/
http://code.google.com/p/pyactivemq/
http://xph.us/software/beanstalkd/

I'd suggest the following. (Since it's what we're doing.)
A simple WSGI server (wsgiref or werkzeug). The HTTP requests coming in will naturally form a queue. No further queueing needed. You get a request, you spawn the subprocess as a child and wait for it to finish. A simple list of children is about all you need.
I used a modification of the main "serve forever" loop in wsgiref to periodically poll all of the children to see how they're doing.
A simple SQLite database can track request status. Even this may be overkill because your XML inputs and results can just lay around in the file system.
That's it. Queueing and threads don't really enter into it. A single long-running external process is too complex to coordinate. It's simplest if each request is a separate, stand-alone, child process.
If you get immense bursts of requests, you might want a simple governor to prevent creating thousands of children. The governor could be a simple queue, built using a list with append() and pop(). Every request goes in, but only requests that fit within some "max number of children" limit are taken out.
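A governor along those lines might look roughly like this. It is a sketch under assumptions: the Governor class and its method names are invented for illustration, and the place where a real server would start the child process is only marked with a comment.

```python
class Governor:
    """Admits at most `max_children` jobs at once; the rest wait in FIFO order."""

    def __init__(self, max_children):
        self.max_children = max_children
        self.pending = []  # waiting requests, oldest first
        self.running = []  # requests whose child process would be alive

    def submit(self, request):
        self.pending.append(request)
        self._admit()

    def finished(self, request):
        # Call when polling shows the child for `request` has exited.
        self.running.remove(request)
        self._admit()

    def _admit(self):
        while self.pending and len(self.running) < self.max_children:
            request = self.pending.pop(0)
            # A real server would start the child process here.
            self.running.append(request)


gov = Governor(max_children=2)
for name in ["t1", "t2", "t3"]:
    gov.submit(name)
running_before = list(gov.running)
gov.finished("t1")
running_after = list(gov.running)
```

With a limit of two, the third request waits in the pending list until the polling loop reports that one of the first two children has finished.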

My reaction is to suggest Twisted, but you've already looked at this. Still, I stick by my answer. Without knowing your personal pain points, I can at least share some things that helped me reduce almost all of the deferred-madness that arises when you have several dependent, blocking actions you need to perform for a client.
Inline callbacks (lightly documented here: http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.defer.html) provide a means to make long chains of deferreds much more readable (to the point of looking like straight-line code). There is an excellent example of the complexity reduction this affords here: http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html
You don't always have to get your bulk processing to integrate nicely with Twisted. Sometimes it is easier to break a large piece of your program off into a stand-alone, easily testable/tweakable/implementable command line tool and have Twisted invoke this tool in another process. Twisted's ProcessProtocol provides a fairly flexible way of launching and interacting with external helper programs. Furthermore, if you suddenly decide you want to cloudify your application, it is not all that big of a deal to use a ProcessProtocol to simply run your bulk processing on a remote server (random EC2 instances perhaps) via ssh, assuming you have the keys setup already.

You can have a look at Celery.

It seems any python web framework will suit your needs. I work with a similar system on a daily basis and I can tell you, your solution with threads and SQLite for queue storage is about as simple as you're going to get.
Assuming order doesn't matter in your queue, then threads should be acceptable. It's important to make sure you don't create race conditions with your queues or, for example, have two of the same job type running simultaneously. If this is the case, I'd suggest a single threaded application to do the items in the queue one by one.
