I am going to create a web server that can handle a lot of connections. These 10,000 connected users will send numbers to the server, and the server will send the squared numbers back.
10,000 connections are too many for one thread per client, so an asynchronous approach seems appropriate here.
I found two libraries for Python 3.4 that can help:
socketserver
&
asyncio
With the socketserver library we can use the ThreadingMixIn and ForkingMixIn classes as asynchronous handlers, but this is restricted by the number of cores.
On the other hand we have the asyncio library, and I don't understand how exactly it works.
Which one should I use? And could these two libraries work together?
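For reference, here is roughly what I have in mind with socketserver (a sketch only, assuming each client sends one newline-terminated number per request; the class names are mine):

    # Thread-per-connection approach with socketserver's ThreadingMixIn.
    import socketserver

    class SquareHandler(socketserver.StreamRequestHandler):
        def handle(self):
            # One thread per client: read a number, send back its square.
            line = self.rfile.readline()
            result = int(line.decode().strip()) ** 2
            self.wfile.write((str(result) + "\n").encode())

    class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
        pass

    if __name__ == "__main__":
        server = ThreadedTCPServer(("0.0.0.0", 8888), SquareHandler)
        server.serve_forever()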
There are different approaches to asynchronous programming.
The first approach is to dedicate a thread (or a process) to each connection, so a blocking IO operation only blocks its own worker. This is what socketserver's ThreadingMixIn and ForkingMixIn do.
The second approach is to monitor IO operations in the main thread using an event loop and a selector. This is usually what people mean when they talk about asynchronous programming, and that's what asyncio, Twisted and gevent do.
The single-threaded approach has two advantages:
it limits the risk of race conditions, since the callbacks run in the same thread
it avoids the overhead of creating one thread per client (see the C10K problem)
Here is an example of an asyncio TCP server handler. In your case, this handle_client coroutine takes the place of the handle_echo coroutine from the asyncio documentation example:
    async def handle_client(reader, writer):
        # Read one newline-terminated number, square it, write it back.
        data = await reader.readline()
        result = int(data.decode().strip()) ** 2
        writer.write((str(result) + "\n").encode())
        await writer.drain()
        writer.close()
It should easily be able to handle thousands of clients.
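For completeness, a minimal sketch of how this handler might be wired into a running server (assuming Python 3.7+ for asyncio.run and serve_forever; host and port are arbitrary):

    import asyncio

    async def main():
        # Register handle_client as the per-connection callback.
        server = await asyncio.start_server(handle_client, "0.0.0.0", 8888)
        async with server:
            await server.serve_forever()

    asyncio.run(main())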
Related
How can I minimize blocking the thread with Tornado? I already have working code, but I suspect that it is not fully asynchronous.
I have a really long task.
It consists of making several requests to CouchDB to get metadata and construct a final link. Then I need to make one last request to CouchDB and stream a file (from 10 MB up to 100 MB). So, the result is streaming a large file to a client.
The problem is that the server can receive 100 simultaneous requests to download large files, and I must not block the thread; I have to keep receiving new requests (I have to minimize blocking the thread).
So, I am making several synchronous requests (with the requests library) and then streaming a large file in chunks with AsyncHTTPClient.
The questions are as follows:
1) Should I use AsyncHTTPClient EVERYWHERE? Since I have some existing interface code, it will take quite a lot of time to replace all synchronous requests with asynchronous ones. Is it worth doing?
2) Should I use tornado.curl_httpclient.CurlAsyncHTTPClient? Will the code run faster (file download, making requests)?
3) I see that Python 3.5 introduced async and await, and theoretically it can be faster. Should I use async/await or keep using the @gen.coroutine decorator?
Use AsyncHTTPClient or CurlAsyncHTTPClient. Since the requests library is synchronous, it blocks the Tornado event loop while a request is in progress, so you can only have one request in flight at a time. Doing asynchronous network operations with Tornado requires purpose-built asynchronous network code, like CurlAsyncHTTPClient.
Yes, CurlAsyncHTTPClient is a bit faster than AsyncHTTPClient, you may notice a speedup if you stream large amounts of data with it.
async and await are faster than gen.coroutine and yield, so if you have yield statements that are executed very frequently in a tight loop, or if you have deeply nested coroutines that call other coroutines, it will be worthwhile to port your code.
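As an illustration of the first point, a hedged sketch of what replacing a blocking requests call with AsyncHTTPClient might look like (the handler name and CouchDB URL are made up; assumes a recent Tornado with native-coroutine support):

    from tornado import web
    from tornado.httpclient import AsyncHTTPClient

    class MetaHandler(web.RequestHandler):  # hypothetical handler
        async def get(self):
            client = AsyncHTTPClient()
            # Non-blocking: the IOLoop keeps serving other requests
            # while this fetch is in flight.
            response = await client.fetch("http://couchdb.example:5984/db/doc")
            self.write(response.body)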
I have already worked with Python async frameworks like Twisted and Tornado. I also know that Python has a native implementation of async calls via the asyncio module. I thought that (threads, multiprocessing) and async calls were different concepts, but not long ago I watched a couple of videos on threading and multiprocessing, and it seems that all this async stuff is built on top of them. Is that true?
No. Async calls are a way to structure a program; threading and multiprocessing may be used to implement some of these calls, but they are neither necessary nor common in Python asynchronous frameworks.
Concurrency is not parallelism:
In programming, concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations.
Do not confuse how the program text is organized with how it is implemented (or executed). The exact same asynchronous code may be executed in a single thread, in multiple threads, or in multiple processes. It is easy to switch the same simple pool-based code between multiprocessing.Pool (processes), multiprocessing.dummy.Pool (threads), or their gevent-patched versions (single-threaded). Also, if there is only a single CPU, processes won't necessarily run in parallel, but the OS can still run them concurrently.
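A minimal sketch of that switch (the square function is just a placeholder workload):

    from multiprocessing import Pool                      # processes
    from multiprocessing.dummy import Pool as ThreadPool  # threads

    def square(n):
        return n * n

    if __name__ == "__main__":
        # The exact same code runs on processes or threads,
        # depending only on which Pool class is used.
        for pool_cls in (Pool, ThreadPool):
            with pool_cls(4) as pool:
                print(pool.map(square, range(10)))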
If by async you mean the async keyword in Python, then it marks a coroutine function (implemented on top of generators in CPython) -- just one of the ways to create awaitable objects. asyncio is not the only way to consume such objects; e.g., there is curio, which uses async functions but whose backend is independent of asyncio. Recommended video: Python Concurrency From the Ground Up: LIVE!.
No. Generally, async is single-threaded, and implementing async absolutely does not require multiple threads or processes (that's the whole point of async). But there are use cases where people may want to mix them for whatever reason.
In this model [the async model], the tasks are interleaved with one another, but in a single thread of control. This is simpler than the threaded case because the programmer always knows that when one task is executing, another task is not. Although in a single-processor system a threaded program will also execute in an interleaved pattern, a programmer using threads should still think in terms of Figure 2, not Figure 3, lest the program work incorrectly when moved to a multi-processor system. But a single-threaded asynchronous system will always execute with interleaving, even on a multi-processor system.
Source: http://krondo.com/?p=1209
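To make the interleaving concrete, a tiny asyncio sketch (my example, assuming Python 3.7+): both tasks run in the same thread, alternating at each await.

    import asyncio
    import threading

    async def task(name):
        for i in range(3):
            # Every step runs on the same thread; await yields control.
            print(name, i, "on", threading.current_thread().name)
            await asyncio.sleep(0)

    async def main():
        await asyncio.gather(task("a"), task("b"))

    asyncio.run(main())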
I am running a basic logger using a SocketHandler; essentially a minor variant of this code: https://docs.python.org/2.4/lib/network-logging.html.
My question is: is the logging from the client asynchronous? If it is not, is there a way to enforce a timeout? That is, the client should wait up to 't' seconds for the logging to happen and then move on. I have multiple processes logging through the same server.
It's asynchronous in the sense that the server can handle input from multiple processes interleaved with each other, but not asynchronous in the sense you mean: the client's socket calls are blocking. Since each client connection is handled in a new thread on the server, this doesn't matter too much as long as there aren't too many client connections.
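If you want the client never to block on the socket at all, one option (my suggestion, not part of the linked example; requires Python 3.2+) is to put a QueueHandler in front of the SocketHandler, so the blocking I/O happens in a separate listener thread:

    import logging
    import logging.handlers
    import queue

    log_queue = queue.Queue(-1)  # unbounded queue of log records
    socket_handler = logging.handlers.SocketHandler(
        "localhost", logging.handlers.DEFAULT_TCP_LOGGING_PORT)

    # The listener thread performs the blocking socket writes.
    listener = logging.handlers.QueueListener(log_queue, socket_handler)
    listener.start()

    logger = logging.getLogger(__name__)
    logger.addHandler(logging.handlers.QueueHandler(log_queue))
    logger.setLevel(logging.INFO)
    logger.info("this call only enqueues the record and returns immediately")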
Background:
I have a current implementation that receives data from about 120 different socket connections in Python. I handle each of these socket connections with a dedicated thread. Each thread parses the data and eventually stores it in a shared, lock-protected dictionary. These sockets DO NOT have uniform data rates; some receive more data than others.
Question:
Is this the best way to handle incoming data in Python, or does Python have a better way of handling multiple sockets per thread?
Using an asynchronous approach will make you much happier. For a well-done implementation of this approach in a well-known application, look at Tornado. You can easily use Tornado's ioloop for things other than web servers, too.
There are alternative libraries such as gevent, but I believe Tornado is a better place to look first, since it provides both the loop and a web server implemented on top of it as a great example of how to use the loop well.
If you're using threads, that's basically the way you'd go about it.
The alternative is to use one of the various asynchronous networking libraries out there, such as Twisted, Tornado, or gevent.
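For a sense of what the single-threaded alternative looks like, here is a minimal sketch using the standard-library selectors module (my illustration, not from any particular library; handle_data stands in for your parsing code):

    import selectors
    import socket

    sel = selectors.DefaultSelector()
    shared = {}  # single thread: no lock needed

    def handle_data(conn, mask):
        data = conn.recv(4096)
        if data:
            shared[conn.fileno()] = data  # placeholder for real parsing
        else:
            sel.unregister(conn)
            conn.close()

    def accept(sock, mask):
        conn, _ = sock.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, handle_data)

    server = socket.socket()
    server.bind(("0.0.0.0", 9000))
    server.listen()
    server.setblocking(False)
    sel.register(server, selectors.EVENT_READ, accept)

    while True:
        for key, mask in sel.select():
            key.data(key.fileobj, mask)  # dispatch to the registered callback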
As mentioned in your Asynchronous UDP Socket Reading question, asyncoro can be used to process many asynchronous sockets efficiently. Another benefit of asyncoro for your problem is that you don't need to worry about locking the shared dictionary, as with asyncoro at most one coroutine executes at any time and there is no forced preemption.
I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client, processes them, then sends them on to the original server. The protocol is a private one that uses TCP.
To minimize the effort, I want to use Python Twisted to handle receiving the requests (the part that acts as a server) and resending them (the part that acts as a client).
To maximize performance, I want to use Python multiprocessing (threading has the GIL limit) to separate the program into three processes. The first process runs Twisted to receive requests, puts each request in a queue, and returns success immediately to the original client. The second process takes requests from the queue, processes them further, and puts them into another queue. The third process takes requests from the second queue and sends them to the original server.
I am a newcomer to Python Twisted. I know it is event-driven, and I have also heard that it's better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using just Twisted?
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, multiprocessing will either try to pickle the reactor or use a different, non-working reactor in the subprocess; either way it will lose your call forever.
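For reference, a minimal sketch of spawnProcess (the worker.py script is hypothetical):

    import sys
    from twisted.internet import protocol, reactor

    class WorkerProtocol(protocol.ProcessProtocol):
        def outReceived(self, data):
            # Subprocess stdout arrives as an ordinary reactor event.
            print("worker said:", data.decode())

        def processEnded(self, reason):
            reactor.stop()

    reactor.spawnProcess(
        WorkerProtocol(), sys.executable,
        args=[sys.executable, "worker.py"],  # hypothetical worker script
    )
    reactor.run()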
ampoule is the first thing I think of when reading your question.
It is a simple process-pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function; it's very easy to use.
You can try something like the Cooperative Multitasking technique described here: http://us.pycon.org/2010/conference/schedule/event/73/ . It's similar to the technique Glyph mentioned and worth a try.
You can also try to use ZeroMQ with Twisted, but it's really hard and experimental for now :)