Using async-await over Thread + Queue or ThreadPoolExecutor?

I've never used the async-await syntax, but I do often need to make HTTP/S requests and parse responses while awaiting future responses. To accomplish this, I currently use the ThreadPoolExecutor class, which executes the calls asynchronously anyway; effectively I'm achieving (I believe) the same result I would get with more lines of code using async-await.
Operating under the assumption that my current implementations work asynchronously, I am wondering how the async-await implementation would differ from that of my original one which used Threads and a Queue to manage workers; it also used a Semaphore to limit workers.
That implementation was devised under the following conditions:
There may be any number of requests
Total number of active requests is capped at 4
Only send next request when a response is received
The basic flow of the implementation was as follows:
Generate container of requests
Create a ListeningQueue
For each request create a Thread and pass the URL, ListeningQueue and Semaphore
Each Thread attempts to acquire the Semaphore (limited to 4 Threads)
Main Thread continues in a while loop, checking the ListeningQueue
When a Thread receives a response, place in ListeningQueue and release Semaphore
A waiting Thread acquires Semaphore (process repeats)
Main Thread processes responses until count equals number of requests
Because I need to limit the number of active Threads, I use a Semaphore; if I were to try this using async-await, I would have to devise some logic in the Main Thread or in the async def that prevents a request from being sent once the limit has been reached. Apart from that constraint, I don't see where using async-await would be any more useful. Is it that it lowers overhead and the chance of race conditions by eliminating Threads? Is that the main benefit? If so, given that a ThreadPoolExecutor makes asynchronous calls but uses a pool of Threads, does that make async-await the better option?

Operating under the assumption that my current implementations work asynchronously, I am wondering how the async-await implementation would differ from that of my original one which used Threads and a Queue to manage workers
It would not be hard to implement very similar logic using asyncio and async-await, which has its own version of semaphore that is used in much the same way. See answers to this question for examples of limiting the number of parallel requests with a fixed number of tasks or by using a semaphore.
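To make that concrete, here is a minimal sketch of the semaphore variant, assuming the aiohttp library for the HTTP calls (the limit of 4 mirrors the conditions in the question; the list of URLs is hypothetical):

import asyncio
import aiohttp

async def fetch(session, semaphore, url):
    async with semaphore:  # waits here until one of the 4 slots is released
        async with session.get(url) as resp:
            return await resp.text()

async def main(urls):
    semaphore = asyncio.Semaphore(4)  # at most 4 requests in flight
    async with aiohttp.ClientSession() as session:
        # gather schedules all fetches; the semaphore throttles them to 4
        return await asyncio.gather(*(fetch(session, semaphore, url) for url in urls))

# responses = asyncio.run(main(list_of_urls))

Note how this reproduces the Thread + Queue + Semaphore design in a single thread: gather plays the role of the ListeningQueue, and the semaphore enforces the limit of 4.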
As for advantages of asyncio over equivalent code using threads, there are several:
Everything runs in a single thread regardless of the number of active connections. Your program can scale to a large number of concurrent tasks without swamping the OS with an unreasonable number of threads or the downloads having to wait for a free slot in the thread pool before they even start.
As you pointed out, single-threaded execution is less susceptible to race conditions because the points where a task switch can occur are clearly marked with await, and everything in-between is effectively atomic. The advantage of this is less obvious in small threaded programs where the executor just hands tasks to threads in a fire-and-collect fashion, but as the logic grows more complex and the threads begin to share more state (e.g. due to caching or some synchronization logic), this becomes more pronounced.
async/await allows you to easily create additional independent tasks for things like monitoring, logging and cleanup. When using threads, those do not fit the executor model and require additional threads, always with a design smell that suggests threads are being abused. With asyncio, each task can be written as if it were running in its own thread, using await to wait for something to happen (and yield control to others) - e.g. a timer-based monitoring task would consist of a loop that awaits asyncio.sleep(), but the logic could be arbitrarily complex. Despite the code looking sequential, each task is lightweight and costs the OS no more than a small allocated object.
async/await supports reliable cancellation, which threads never did and likely never will. This is often overlooked, but in asyncio it is perfectly possible to cancel a running task, which causes it to wake up from await with an exception that terminates it. Cancellation makes it straightforward to implement timeouts, task groups, and other patterns that are impossible or a huge chore when using threads.
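As an illustration of the cancellation point, a timeout built on cancellation might look like this sketch, with asyncio.sleep standing in for a slow network operation:

import asyncio

async def download(url):
    try:
        await asyncio.sleep(10)  # stand-in for a slow network operation
        return "data"
    except asyncio.CancelledError:
        print(url, "cancelled; cleanup runs here")
        raise  # re-raise so the task is properly marked cancelled

async def main():
    task = asyncio.create_task(download("https://example.com"))
    try:
        # wait_for cancels the task if it does not finish within 1 second
        print(await asyncio.wait_for(task, timeout=1.0))
    except asyncio.TimeoutError:
        print("timed out; the task was cancelled, not leaked")

asyncio.run(main())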
On the flip side, the disadvantage of async/await is that all your code must be async. Among other things, it means that you cannot use libraries like requests, you have to switch to asyncio-aware alternatives like aiohttp.

Related

Can you add coroutine to front of event loop queue?

Is there a way to create a task and have it specifically be the next task run in the event loop?
Suppose I have an event loop currently running several low priority coroutines. Perhaps a few high priority API request tasks come along and I want to immediately asynchronously make these requests and then yield control back to the tasks previously in the loop.
I realize that the latency with a network request is orders of magnitude larger than a few CPU cycles saved by reordering the cooperative tasks in the loop, but nevertheless I am curious if there is a way to achieve this.
I want to immediately asynchronously make these requests and then yield control back to the tasks previously in the loop.
There is no way to do that in the current asyncio, where all runnable tasks reside in a non-prioritized queue.
But there is a deeper issue with the above requirement. Asynchronous tasks potentially yield control to the event loop at every blocking IO call, or more generally at every await. So "immediately" and "asynchronously" don't go together: a truly asynchronous operation cannot be immediate because it has to be suspendable, and when it is suspended, other tasks will proceed.
If you really want something to happen immediately, you need to do it synchronously. Other tasks will be blocked anyway because the synchronous operation will not allow them to run.
This is likely the reason why asyncio doesn't support task prioritization. By their very nature tasks execute in short slices that can be interleaved in arbitrary ways, so the order in which they execute should not matter in general. In cases when the order does matter, one is expected to use the provided synchronization devices.
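If ordering genuinely matters, one such device is to feed a fixed pool of workers from an asyncio.PriorityQueue. A sketch, under the assumption that "priority" applies to jobs that have not yet started, not to tasks already scheduled on the loop:

import asyncio
import itertools

async def handle(name):
    await asyncio.sleep(0.1)  # stand-in for the actual request
    print("finished", name)

async def worker(queue):
    while True:
        _prio, _seq, coro = await queue.get()  # lowest priority number first
        await coro
        queue.task_done()

async def main():
    queue = asyncio.PriorityQueue()
    seq = itertools.count()  # tie-breaker so coroutines are never compared
    for i in range(3):
        queue.put_nowait((10, next(seq), handle("background-%d" % i)))
    # A high-priority request jumps ahead of everything still waiting in the queue
    queue.put_nowait((0, next(seq), handle("urgent-api-call")))
    workers = [asyncio.create_task(worker(queue)) for _ in range(2)]
    await queue.join()
    for w in workers:
        w.cancel()

asyncio.run(main())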

Is this multi-threaded function asynchronous?

I'm afraid I'm still a bit confused (despite checking other threads) whether:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
My initial guess is no to both, and that proper asynchronous code should be able to run in one thread; however, it can be improved by adding threads.
So I constructed this toy example:
from threading import Thread, current_thread
from queue import Queue
import time

def do_something_with_io_lag(in_work):
    out = in_work
    # Imagine we do some work that involves sending
    # something over the internet and processing the output
    # once it arrives
    time.sleep(0.5)  # simulate IO lag
    print("Hello, bee number: ",
          str(current_thread().name).replace("Thread-", ""))

class WorkerBee(Thread):
    def __init__(self, q):
        Thread.__init__(self)
        self.q = q

    def run(self):
        while True:
            # Get some work from the queue
            work_todo = self.q.get()
            # This function will simulate I/O lag
            do_something_with_io_lag(work_todo)
            # Remove task from the queue
            self.q.task_done()

if __name__ == '__main__':
    def time_me(nmbr):
        number_of_worker_bees = nmbr
        worktodo = ['some input for work'] * 50
        # Create a queue
        q = Queue()
        # Fill with work
        [q.put(onework) for onework in worktodo]
        # Launch worker threads
        for _ in range(number_of_worker_bees):
            t = WorkerBee(q)
            t.daemon = True  # daemon so the endless run() loops don't keep the process alive
            t.start()
        # Block until queue is empty
        q.join()

    # Run this code in serial mode (just one worker)
    %time time_me(nmbr=1)
    # Wall time: 25 s
    # Basically 50 requests * 0.5 seconds IO lag
    # For me everything gets processed by bee number: 59
    # Run this code using multi-tasking (launch 50 workers)
    %time time_me(nmbr=50)
    # Wall time: 507 ms
    # Basically the 0.5 second IO lag + 0.07 seconds it took to launch them
    # Now everything gets processed by different bees
Is it asynchronous?
To me this code does not seem asynchronous because it is Figure 3 in my example diagram. The I/O call blocks the thread (although we don't feel it because they are blocked in parallel).
However, if this is the case I am confused why requests-futures is considered asynchronous since it is a wrapper around ThreadPoolExecutor:
import concurrent.futures

# load_url and get_urls are defined elsewhere in the original example
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print("%r generated an exception: %s" % (url, exc))
Can this function on just one thread?
Especially when compared to asyncio, which can run single-threaded:
There are only two ways to have a program on a single processor do "more than one thing at a time." Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It's really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.
First of all, one note: concurrent.futures.Future is not the same as asyncio.Future. Basically it's just an abstraction: an object that allows you to refer to a job's result (or exception, which is also a result) after you have submitted the job but before it has completed. It's similar to assigning an ordinary function's result to a variable, except the result may not exist yet.
Multithreading: Regarding your example, when using multiple threads you can say that your code is "asynchronous", as several operations are performed in different threads at the same time without waiting for each other to complete, and you can see it in the timing results. And you're right, your function is blocking due to sleep: it blocks the worker thread for the specified amount of time, but when you use several threads, those threads are blocked in parallel. So if you had one job with sleep and another without, and ran them on multiple threads, the one without sleep would perform calculations while the other slept. With a single thread, the jobs are performed serially, one after the other, so when one job sleeps the other jobs wait for it; in fact they don't even start until it's their turn. All this is pretty much proven by your time tests. What happened with print has to do with "thread safety": print uses standard output, which is a single shared resource. So when your multiple threads tried to print at the same time, the switching happened mid-call and you got your strange output. (This also shows the "asynchronicity" of your multithreaded example.) To prevent such errors there are locking mechanisms, e.g. locks, semaphores, etc.
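For illustration, the locking idea in that last sentence might look like this minimal sketch (not from the original post):

import threading

print_lock = threading.Lock()

def safe_print(*args):
    # Only one thread at a time may enter, so output lines can no longer interleave
    with print_lock:
        print(*args)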
Asyncio: To better understand its purpose, note the "IO" part: it's not 'async computation' but 'async input/output'. When talking about asyncio you usually don't think about threads at first. Asyncio is about an event loop and generators (coroutines). The event loop is the arbiter that governs the execution of the coroutines (and their callbacks) that were registered to the loop. Coroutines are implemented as generators, i.e. functions that perform some actions iteratively, saving state at each iteration, 'returning', and continuing with the saved state on the next call. So basically the event loop is a while True: loop that calls all the coroutines/generators assigned to it, one after another, and each one provides a result or no result on each such call; this is what provides the "asynchronicity". (This is a simplification; there are scheduling mechanisms that optimize this behavior.) The event loop in this situation can run in a single thread, and if the coroutines are non-blocking it will give you true "asynchronicity", but if they are blocking then it's basically linear execution.
You can achieve the same thing with explicit multithreading, but threads are costly: they require memory to be allocated, switching them takes time, etc. On the other hand, the asyncio API allows you to abstract away from the actual implementation and just consider your jobs to be performed asynchronously. The implementation may differ: it involves calling the OS API, and the OS decides what to do, e.g. DMA, additional threads, some specific microcontroller use, etc. The point is that it works well for IO thanks to lower-level mechanisms and hardware. On the other hand, performing a computation asynchronously requires explicitly breaking the algorithm into pieces to use as asyncio coroutines, so a separate thread might be a better choice, as you can launch the whole computation there as one unit. (I'm not talking about algorithms designed for parallel computing.) But the asyncio event loop can be explicitly set to use separate threads for coroutines, and then it is asyncio with multithreading.
Regarding your example: if you implement your function with sleep as an asyncio coroutine, then schedule and run 50 of them single-threaded, you'll get a time similar to your first time test, i.e. around 25 s, as it is blocking. If you change it to something like yield from asyncio.sleep(0.5) (which is a coroutine itself), then schedule and run 50 of them single-threaded, they will be called asynchronously. So while one coroutine sleeps, the next one is started, and so on. The jobs will complete in a time similar to your second, multithreaded test, i.e. close to 0.5 s. If you add print here you'll get well-formed output, since it is used by a single thread in a serial manner, but the output might be in a different order than the order in which the coroutines were assigned to the loop, as the coroutines could be run in a different order. If you use multiple threads, the result will obviously be close to the last one anyway.
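A sketch of what that asyncio version of the toy example might look like, using the modern async/await spelling rather than yield from:

import asyncio
import time

async def do_something_with_io_lag(in_work):
    # Non-blocking sleep: suspends this coroutine and yields to the event loop
    await asyncio.sleep(0.5)
    print("Hello from task:", in_work)

async def main():
    worktodo = ['some input for work'] * 50
    # All 50 coroutines run concurrently on one thread
    await asyncio.gather(*(do_something_with_io_lag(w) for w in worktodo))

start = time.perf_counter()
asyncio.run(main())
print("Wall time: %.2fs" % (time.perf_counter() - start))  # ~0.5 s, single thread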
Simplification: the difference between multithreading and asyncio is in blocking/non-blocking, so basically blocking multithreading somewhat approaches non-blocking asyncio, but there are a lot of differences.
Multithreading for computations (i.e. CPU bound code)
Asyncio for input/output (i.e. I/O bound code)
Regarding your original statement:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
I hope that I was able to show, that:
asynchronous code might be both single threaded and multi-threaded
all multi-threaded functions could be called "asynchronous"
I think the main confusion comes from the meaning of asynchronous. From the Free Online Dictionary of Computing, "A process [...] whose execution can proceed independently" is asynchronous. Now, apply that to what your bees do:
Retrieve an item from the queue. Only one at a time can do that, while the order in which they get an item is undefined. I wouldn't call that asynchronous.
Sleep. Each bee does so independently of all others, i.e. the sleep duration runs on all, otherwise the time wouldn't go down with multiple bees. I'd call that asynchronous.
Call print(). While the calls are independent, at some point the data is funneled into the same output target, and at that point a sequence is enforced. I wouldn't call that asynchronous. Note however that the two arguments to print() and also the trailing newline are handled independently, which is why they can be interleaved.
Lastly, the call to q.join(). Here of course the calling thread is blocked until the queue is empty, so some kind of synchronization is enforced and wanted. I don't see why this "seems to break" for you.

reactor design pattern in a single thread vs multiple threads

I've been reading about the reactor design pattern, specifically in the context of the Python Twisted networking framework. My simple understanding of the reactor design is that there is a single thread that will sit and wait until one or more I/O sources (or file descriptors) become available, and then it will synchronously loop through each of those sources, doing whatever callbacks specified for each of these sources. Which does mean that the program as a whole would block if any of the callbacks are themselves blocking. And regardless, once all callbacks have executed, the reactor goes back to waiting for more I/O sources to become ready.
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source. I imagine this may be less efficient if all your callbacks are very fast, as the OS now has to deal with managing multiple threads and swapping between them. But it seems that it's now impossible to block the main program, and as an added benefit, the main reactor can keep listening for sources. In short, why does something like Twisted not do this, instead keeping to a single-threaded model?
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source.
What you're describing is basically what happens in a multithreaded program that uses blocking I/O APIs. In this case, the "reactor" moves into the kernel and the "asynchronous looping" is the kernel completing some outstanding blocking operation, freeing up a user-space thread to proceed.
The cons of this approach are the greatly increased complexity with respect to thread-safety (ie, correctness) that it incurs compared to a strictly single-threaded approach.
The pros are better utilization of multiple CPUs (but running multiple single-threaded event-driven processes often offers this benefit as well) and the greater number of programmers who are familiar and comfortable (though often mistakenly so) with the multithreading approach to concurrency.
Also related, though, are the PyPy team's efforts towards providing a better abstraction over the conventional multithreading model. PyPy's work towards Software Transactional Memory (STM) could offer a system in which work is dispatched asynchronously to multiple worker threads without violating the assumptions that are valid in a strictly single-threaded system. If this works out, it could offer the best of both worlds.
But it seems that it's now impossible to block the main program,
I'm not a Python guy but have done this in the context of Boost.Asio. You're correct: your callbacks need to execute quickly and return control to the main reactor. The idea is to use only asynchronous calls in your callbacks. For example, you wouldn't use an API for sending an IP datagram that blocks and returns a status code. Instead, you'd use a non-blocking API where you register success and failure callbacks. This lets the send call return immediately. The reactor will then invoke the success/failure callback once the OS has dealt with the packet.
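In Twisted terms, the same callback-registration style looks roughly like this sketch. It uses the classic twisted.web.client.getPage, which is deprecated (and removed in recent Twisted releases) in favor of Agent, but it keeps the example short:

from twisted.internet import reactor
from twisted.web.client import getPage

def on_success(body):
    print("received %d bytes" % len(body))
    reactor.stop()

def on_failure(failure):
    print("request failed:", failure.getErrorMessage())
    reactor.stop()

d = getPage(b"http://example.com/")    # returns a Deferred immediately, nothing blocks
d.addCallbacks(on_success, on_failure) # register success/failure callbacks
reactor.run()                          # the reactor fires the callbacks when I/O completes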

Twisted - should this code be run in separate threads?

I am running some code that has X workers, each worker pulling tasks from a queue every second. For this I use twisted's task.LoopingCall() function. Each worker fulfills its request (scrape some data) and then pushes the response back to another queue. All this is done in the reactor thread since I am not deferring this to any other thread.
I am wondering whether I should run all these jobs in separate threads or leave them as they are. And if so, is there a problem if I call task.LoopingCall every second from each thread ?
No, you shouldn't use threads. You can't call LoopingCall from a thread (unless you use reactor.callFromThread), but it wouldn't help you make your code faster.
If you notice a performance problem, you may want to profile your workload, figure out where the CPU-intensive work is, and then put that work into multiple processes, spawned with spawnProcess. You really can't skip the step where you figure out where the expensive work is, though: there's no magic pixie dust you can sprinkle on your Twisted application that will make it faster. If you choose a part of your code which isn't very intensive and doesn't require blocking resources like CPU or disk, then you will discover that the overhead of moving work to a different process may outweigh any benefit of having it there.
You shouldn't use threads for that. Doing it all in the reactor thread is ok. If your scraping uses twisted.web.client to do the network access, it shouldn't block, so you will go as fast as it gets.
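For reference, the non-threaded pattern the question describes would look roughly like this sketch, where poll_queue is a hypothetical stand-in for the worker logic:

from twisted.internet import reactor, task

def poll_queue():
    # Runs in the reactor thread once per second; must return quickly,
    # so it should only kick off non-blocking work (e.g. twisted.web.client)
    print("polling for work")

loop = task.LoopingCall(poll_queue)
loop.start(1.0)  # interval in seconds
reactor.run()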
First, beware that Twisted's reactor sometimes multithreads and assigns tasks without telling you anything. Of course, I haven't seen your program in particular.
Second, in Python (that is, in CPython) spawning threads for computation that doesn't block on I/O (i.e. CPU-bound work) has little benefit. Read up on the GIL (Global Interpreter Lock).

Threads vs. Async

I have been reading up on the threaded model of programming versus the asynchronous model from this really good article: http://krondo.com/blog/?p=1209
However, the article mentions the following points.
An async program will simply outperform a sync program by switching between tasks whenever there is I/O.
Threads are managed by the operating system.
I remember reading that threads are managed by the operating system by moving TCBs between the Ready Queue and the Waiting Queue (amongst other queues). In this case, threads don't waste time waiting either, do they?
In light of the above mentioned, what are the advantages of async programs over threaded programs?
It is very difficult to write code that is thread-safe. With asynchronous code, you know exactly where the code will shift from one task to the next, and race conditions are therefore much harder to come by.
Threads consume a fair amount of memory, since each thread needs to have its own stack. With async code, all the code shares the same stack, and the stack is kept small due to continuously unwinding between tasks.
Threads are OS structures and therefore require more memory for the platform to support. There is no such problem with asynchronous tasks.
Update 2022:
Many languages now support stackless coroutines (async/await). This allows us to write a task almost synchronously while yielding to other tasks (awaiting) at set places (sleeping or waiting for networking or other threads).
There are two ways to create threads:
synchronous threading - the parent creates one (or more) child threads and then must wait for each child to terminate. Synchronous threading is often referred to as the fork-join model.
asynchronous threading - the parent and child run concurrently/independently of one another. Multithreaded servers typically follow this model.
resource - http://www.amazon.com/Operating-System-Concepts-Abraham-Silberschatz/dp/0470128720
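A minimal sketch of the fork-join (synchronous threading) model in Python:

import threading

def child(n):
    print("child %d working" % n)

# Fork: the parent creates the child threads
threads = [threading.Thread(target=child, args=(i,)) for i in range(4)]
for t in threads:
    t.start()

# Join: the parent must wait for each child to terminate
for t in threads:
    t.join()
print("all children done")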
Assume you have 2 tasks which do not involve any I/O (on a multiprocessor machine). In this case, threads outperform async, because an async program, like a single-threaded one, executes the tasks in order, whereas threads can execute both tasks simultaneously.
Assume you have 2 tasks which involve I/O (on a multiprocessor machine). In this case, async and threads perform more or less the same (performance might vary based on the number of cores, scheduling, how processor-intensive the task is, etc.). Async also takes fewer resources, has lower overhead, and is less complex to program than a multi-threaded program.
How does it work?
Thread 1 executes Task 1; since it is waiting for I/O, it is moved to the I/O waiting queue. Similarly, Thread 2 executes Task 2; since it also involves I/O, it is moved to the I/O waiting queue. As soon as its I/O request is resolved, it is moved to the ready queue so the scheduler can schedule the thread for execution.
Async executes Task 1 and, without waiting for its I/O to complete, continues with Task 2; then it waits for the I/O of both tasks to complete. It completes the tasks in the order of I/O completion.
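A sketch of that "completes in order of I/O completion" behavior, with asyncio.sleep standing in for the I/O:

import asyncio
import random

async def fetch(task_id):
    delay = random.uniform(0.1, 1.0)  # simulated I/O latency
    await asyncio.sleep(delay)
    return task_id, delay

async def main():
    tasks = [fetch(i) for i in range(5)]
    # Results arrive in order of I/O completion, not submission order
    for fut in asyncio.as_completed(tasks):
        task_id, delay = await fut
        print("task %d finished after %.2fs" % (task_id, delay))

asyncio.run(main())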
Async is best suited for tasks that involve web service calls, database queries, etc.
Threads are best for processor-intensive tasks.
The video below explains the async vs. threaded model and when to use each:
https://www.youtube.com/watch?v=kdzL3r-yJZY
Hope this is helpful.
First of all, note that a lot of the detail of how threads are implemented and scheduled are very OS-specific. In general, you shouldn't need to worry about threads waiting on each other, since the OS and the hardware will attempt to arrange for them to run efficiently, whether asynchronously on a single-processor system or in parallel on multi-processors.
Once a thread has finished waiting for something, say I/O, it can be thought of as runnable. Threads that are runnable will be scheduled for execution at some point soon. Whether this is implemented as a simple queue or something more sophisticated is, again, OS- and hardware-specific. You can think of the set of blocked threads as a set rather than as a strictly ordered queue.
Note that on a single-processor system, asynchronous programs as defined here are equivalent to threaded programs.
see http://en.wikipedia.org/wiki/Thread_(computing)#I.2FO_and_scheduling
However, the use of blocking system calls in user threads (as opposed to kernel threads) or fibers can be problematic. If a user thread or a fiber performs a system call that blocks, the other user threads and fibers in the process are unable to run until the system call returns. A typical example of this problem is when performing I/O: most programs are written to perform I/O synchronously. When an I/O operation is initiated, a system call is made, and does not return until the I/O operation has been completed. In the intervening period, the entire process is "blocked" by the kernel and cannot run, which starves other user threads and fibers in the same process from executing.
According to this, your whole process might be blocked, and no thread will be scheduled when one thread is blocked in I/O. I think this is OS-specific and will not always hold.
To be fair, let's point out the benefits of threads under the CPython GIL compared to the async approach (a threaded sketch follows this list):
it's easier to first write typical code that has one flow of events (no parallel execution) and then run multiple copies of it in separate threads: each copy stays responsive, while the benefit of executing all I/O operations in parallel is achieved automatically;
many time-proven libraries are sync and therefore easy to include in the threaded version, but not in the async one;
some sync libraries actually release the GIL at the C level, which allows parallel execution for tasks beyond I/O-bound ones: e.g. NumPy;
it's harder to write async code in general: a heavy CPU-bound section will make concurrent tasks unresponsive, or one may forget to await a result so execution finishes earlier than intended.
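As promised, a sketch of that threaded version: one sequential flow, copied across a pool of threads (assuming the requests library and placeholder URLs):

import concurrent.futures
import requests  # sync, time-proven library, usable as-is in threads

def fetch(url):
    # Plain sequential code; CPython releases the GIL while waiting on the
    # socket, so the I/O of many of these runs in parallel across threads
    return url, requests.get(url, timeout=10).status_code

urls = ["https://example.com"] * 8
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for url, status in pool.map(fetch, urls):
        print(url, status)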
So if there are no immediate plans to scale your services beyond ~100 concurrent connections it may be easier to start with a threaded version and then rewrite it... using some other more performant language like Go.
Async I/O means there is already a thread in the driver that does the job, so you are duplicating functionality and incurring some overhead. On the other hand, often it is not documented how exactly the driver thread behaves, and in complex scenarios, when you want to control timeout/cancellation/start/stop behaviour, synchronization with other threads, it makes sense to implement your own thread. It is also sometimes easier to reason in sync terms.
