Threads vs. Async - python

I have been reading up on the threaded model of programming versus the asynchronous model from this really good article. http://krondo.com/blog/?p=1209
However, the article mentions the following points.
An async program will simply outperform a sync program by switching between tasks whenever there is a I/O.
Threads are managed by the operating system.
I remember reading that threads are managed by the operating system by moving around TCBs between the Ready-Queue and the Waiting-Queue(amongst other queues). In this case, threads don't waste time on waiting either do they?
In light of the above mentioned, what are the advantages of async programs over threaded programs?

It is very difficult to write code that is thread safe. With asyncronous code, you know exactly where the code will shift from one task to the next and race conditions are therefore much harder to come by.
Threads consume a fair amount of data since each thread needs to have its own stack. With async code, all the code shares the same stack and the stack is kept small due to continuously unwinding the stack between tasks.
Threads are OS structures and are therefore more memory for the platform to support. There is no such problem with asynchronous tasks.
Update 2022:
Many languages now support stackless co-routines (async/await). This allows us to write a task almost synchronously while yielding to other tasks (awaiting) at set places (sleeping or waiting for networking or other threads)

There are two ways to create threads:
synchronous threading - the parent creates one (or more) child threads and then must wait for each child to terminate. Synchronous threading is often referred to as the fork-join model.
asynchronous threading - the parent and child run concurrently/independently of one another. Multithreaded servers typically follow this model.
resource - http://www.amazon.com/Operating-System-Concepts-Abraham-Silberschatz/dp/0470128720

Assume you have 2 tasks, which does not involve any IO (on multiprocessor machine).
In this case threads outperform Async. Because Async like a
single threaded program executes your tasks in order. But threads can
execute both the tasks simultaneously.
Assume you have 2 tasks, which involve IO (on multiprocessor machine).
In this case both Async and Threads performs more or less same (performance
might vary based on number of cores, scheduling, how much process intensive
the task etc.). Also Async takes less amount of resources, low overhead and
less complex to program over multi threaded program.
How it works?
Thread 1 executes Task 1, since it is waiting for IO, it is moved to IO
waiting Queue. Similarly Thread 2 executes Task 2, since it is also involves
IO, it is moved to IO waiting Queue. As soon as it's IO request is resolved
it is moved to ready queue so the scheduler can schedule the thread for
execution.
Async executes Task 1 and without waiting for it's IO to complete it
continues with Task 2 then it waits for IO of both the task to complete. It
completes the tasks in the order of IO completion.
Async best suited for tasks which involve Web service calls, Database query
calls etc.,
Threads for process intensive tasks.
The below video explains aboutAsync vs Threaded model and also when to use etc.,
https://www.youtube.com/watch?v=kdzL3r-yJZY
Hope this is helpful.

First of all, note that a lot of the detail of how threads are implemented and scheduled are very OS-specific. In general, you shouldn't need to worry about threads waiting on each other, since the OS and the hardware will attempt to arrange for them to run efficiently, whether asynchronously on a single-processor system or in parallel on multi-processors.
Once a thread has finished waiting for something, say I/O, it can be thought of as runnable. Threads that are runnable will be scheduled for execution at some point soon. Whether this is implemented as a simple queue or something more sophisticated is, again, OS- and hardware-specific. You can think of the set of blocked threads as a set rather than as a strictly ordered queue.
Note that on a single-processor system, asynchronous programs as defined here are equivalent to threaded programs.

see http://en.wikipedia.org/wiki/Thread_(computing)#I.2FO_and_scheduling
However, the use of blocking system calls in user threads (as opposed to kernel threads) or fibers can be problematic. If a user thread or a fiber performs a system call that blocks, the other user threads and fibers in the process are unable to run until the system call returns. A typical example of this problem is when performing I/O: most programs are written to perform I/O synchronously. When an I/O operation is initiated, a system call is made, and does not return until the I/O operation has been completed. In the intervening period, the entire process is "blocked" by the kernel and cannot run, which starves other user threads and fibers in the same process from executing.
According to this, your whole process might be blocked, and no thread will be scheduled when one thread is blocked in IO. I think this is os-specific, and will not always be hold.

To be fair, let's point out the benefit of Threads under CPython GIL compared to async approach:
it's easier first to write typical code that has one flow of events (no parallel execution) and then to run multiple copies of it in separate threads: it will keep each copy responsive, while the benefit of executing all I/O operations in parallel will be achieved automatically;
many time-proven libraries are sync and therefore easy to be included in the threaded version, and not in async one;
some sync libraries actually let GIL go at C level that allows parallel execution for tasks beyond I/O-bound ones: e.g. NumPy;
it's harder to write async code in general: the inclusion of a heavy CPU-bound section will make concurrent tasks not responsive, or one may forget to await the result and finish execution earlier.
So if there are no immediate plans to scale your services beyond ~100 concurrent connections it may be easier to start with a threaded version and then rewrite it... using some other more performant language like Go.

Async I/O means there is already a thread in the driver that does the job, so you are duplicating functionality and incurring some overhead. On the other hand, often it is not documented how exactly the driver thread behaves, and in complex scenarios, when you want to control timeout/cancellation/start/stop behaviour, synchronization with other threads, it makes sense to implement your own thread. It is also sometimes easier to reason in sync terms.

Related

In c# do we have control to explicitly run code on single processor or a certain number of processors?

Coming from python background where we have:
Multiprocessing- to run multiple tasks in parallel on the processors - ideal for cpu bound tasks.
Multi threading- to run multiple tasks in parallel via threads within a single processor. Ideal for io bound tasks. Io bound only because of presence of GIL which results in only 1 thread being able to use the processor at any point in time.
Async await- to do cooperative programming for io bound tasks.
Using python code I can achieve either of the above.
In the case of c# I am learning that:
We can either start threads explicitly or use concept of async await and Task.Run that implicitly handles the threads creation and management.
Async await is good for io bound tasks. Async await + Task.Run is good for cpu bound tasks.
In windows forms app we have the concept of synchronization context which ensures that only the single Ui thread runs all the code.
However if we use async await + Task.Run (which is good for cpu bound tasks), then the code inside Task.Run will run in separate thread.
Either ways code below the await will run on ui thread.
Where as in console app, whether we use async await or async await + Task.Run, code after the await will be run by multiple threads (from threadpool) in parallel due to the absence of synchronization context. An example of this is shown here:
In console app, why does synchronous blocking code (Thread.Sleep(..)) when used in an awaited async task behave like multi threading?
In c#, threads are derived from the threadpool (I think that there is one thread per processor core, please correct me). However, unlike python, we don't have control to run code explicitly to target single or certain number of processors in c# do we?
Take a look at ProcessorAffinity. But remind: choosing to run on a single CPU core doesn't solve thread synchronisation problems.
There are several things we need to address here:
Python multi-processing
Python multi-processing refers more to the concept of process rather than processors (as in CPU cores). The idea is that you can launch sections of code in a whole different process with it's own interpreter runtime (therefore it's own GIL), memory space, etc...
It does not mean that a single Python process can't use several cores at once!
It's a very common misconception regarding Python. Many Python operations do not hold the GIL. You can create your own examples by writing a function that does mostly heavy NumPy operations, and starting this function in 20 different threads at once. If you check the CPU usage you'll see that 20 cores are at ~100% usage. You can also use Numba to JIT functions in a way that doesn't hold the GIL.
Limiting Core Usage
The way to use 2 specific cores is via the OS by changing the processor affinity of the process. I'd want to hear more about why you'd want to limit core usage though, because it sounds like you're trying to do something that can be achieved much more cleanly. Maybe you want to run some background tasks but need a Semaphore in order to make sure that only 2 tasks at most run at once?
Threads vs Async/Await
The point of the whole async/await syntax is to spare you from micro-managing threads. The idea is that you as a programmer write what you want to do asynchronously, or in other words, what operations shouldn't block the execution of the calling method, and the JIT/Runtime will handle the execution for you. It can be on a single thread, multiple threads, etc...
There are cases like you mentioned with WPF where you need to keep in mind that certain things can only be done on the UI thread. However, we ideally want to occupy the UI thread as little as necessary, so we use Tasks for most of the work. We can also use the Dispatcher inside the background Tasks for specific UI thread code.
After the await
You mentioned that:
Either ways code below the await will run on ui thread.
The code after the "await" by default will attempt to resume on the previous context. In the UI thread it means that both sections (before and after the await) will execute on the UI thread. However, this can be avoided (and many times should be avoided) by using ConfigureAwait(false) call on the task that's being awaited.

Using async-await over Thread + Queue or ThreadPoolExecutor?

I've never used the async-await syntax but I do often need to make HTTP/S requests and parse responses while awaiting future responses. To accomplish this task, I currently use the ThreadPoolExecutor class which execute the calls asynchronously anyways; effectively I'm achieving (I believe) the same result I would get with more lines of code to use async-await.
Operating under the assumption that my current implementations work asynchronously, I am wondering how the async-await implementation would differ from that of my original one which used Threads and a Queue to manage workers; it also used a Semaphore to limit workers.
That implementation was devised under the following conditions:
There may be any number of requests
Total number of active requests may be 4
Only send next request when a response is received
The basic flow of the implementation was as follows:
Generate container of requests
Create a ListeningQueue
For each request create a Thread and pass the URL, ListeningQueue and Semaphore
Each Thread attempts to acquire the Semaphore (limited to 4 Threads)
Main Thread continues in a while checking ListeningQueue
When a Thread receives a response, place in ListeningQueue and release Semaphore
A waiting Thread acquires Semaphore (process repeats)
Main Thread processes responses until count equals number of requests
Because I need to limit the number of active Threads I use a Semaphore, and if I were to try this using async-await I would have to devise some logic in the Main Thread or in the async def that prevents a request from being sent if the limit has been reached. Apart from that constraint, I don't see where using async-await would be any more useful. Is it that it lowers overhead and race condition chances by eliminating Threads? Is that the main benefit? If so, even though using a ThreadPoolExecutor is making asynchronous calls it is using a pool of Threads, thus making async-await a better option?
Operating under the assumption that my current implementations work asynchronously, I am wondering how the async-await implementation would differ from that of my original one which used Threads and a Queue to manage workers
It would not be hard to implement very similar logic using asyncio and async-await, which has its own version of semaphore that is used in much the same way. See answers to this question for examples of limiting the number of parallel requests with a fixed number of tasks or by using a semaphore.
As for advantages of asyncio over equivalent code using threads, there are several:
Everything runs in a single thread regardless of the number of active connections. Your program can scale to a large number of concurrent tasks without swamping the OS with an unreasonable number of threads or the downloads having to wait for a free slot in the thread pool before they even start.
As you pointed out, single-threaded execution is less susceptible to race conditions because the points where a task switch can occur are clearly marked with await, and everything in-between is effectively atomic. The advantage of this is less obvious in small threaded programs where the executor just hands tasks to threads in a fire-and-collect fashion, but as the logic grows more complex and the threads begin to share more state (e.g. due to caching or some synchronization logic), this becomes more pronounced.
async/await allows you to easily create additional independent tasks for things like monitoring, logging and cleanup. When using threads, those do not fit the executor model and require additional threads, always with a design smell that suggests threads are being abused. With asyncio, each task can be as if it were running in its own thread, and use await to wait for something to happen (and yield control to others) - e.g. a timer-based monitoring task would consist of a loop that awaits asyncio.sleep(), but the logic could be arbitrarily complex. Despite the code looking sequential, each task is lightweight and carries no more weight to the OS than that of a small allocated object.
async/await supports reliable cancellation, which threads never did and likely never will. This is often overlooked, but in asyncio it is perfectly possible to cancel a running task, which causes it to wake up from await with an exception that terminates it. Cancellation makes it straightforward to implement timeouts, task groups, and other patterns that are impossible or a huge chore when using threads.
On the flip side, the disadvantage of async/await is that all your code must be async. Among other things, it means that you cannot use libraries like requests, you have to switch to asyncio-aware alternatives like aiohttp.

When should I be using asyncio over regular threads, and why? Does it provide performance increases?

I have a pretty basic understanding of multithreading in Python and an even basic-er understanding of asyncio.
I'm currently writing a small Curses-based program (eventually going to be using a full GUI, but that's another story) that handles the UI and user IO in the main thread, and then has two other daemon threads (each with their own queue/worker-method-that-gets-things-from-a-queue):
a watcher thread that watches for time-based and conditional (e.g. posts to a message board, received messages, etc.) events to occur and then puts required tasks into...
the other (worker) daemon thread's queue which then completes them.
All three threads are continuously running concurrently, which leads me to some questions:
When the worker thread's queue (or, more generally, any thread's queue) is empty, should it be stopped until is has something to do again, or is it okay to leave continuously running? Do concurrent threads take up a lot of processing power when they aren't doing anything other than watching its queue?
Should the two threads' queues be combined? Since the watcher thread is continuously running a single method, I guess the worker thread would be able to just pull tasks from the single queue that the watcher thread puts in.
I don't think it'll matter since I'm not multiprocessing, but is this setup affected by Python's GIL (which I believe still exists in 3.4) in any way?
Should the watcher thread be running continuously like that? From what I understand, and please correct me if I'm wrong, asyncio is supposed to be used for event-based multithreading, which seems relevant to what I'm trying to do.
The main thread is basically always just waiting for the user to press a key to access a different part of the menu. This seems like a situation asyncio would be perfect for, but, again, I'm not sure.
Thanks!
When the worker thread's queue (or, more generally, any thread's queue) is empty, should it be stopped until is has something to do again, or is it okay to leave continuously running? Do concurrent threads take up a lot of processing power when they aren't doing anything other than watching its queue?
You should just use a blocking call to queue.get(). That will leave the thread blocked on I/O, which means the GIL will be released, and no processing power (or at least a very minimal amount) will be used. Don't use non-blocking gets in a while loop, since that's going to require a lot more CPU wakeups.
Should the two threads' queues be combined? Since the watcher thread is continuously running a single method, I guess the worker thread would be able to just pull tasks from the single queue that the watcher thread puts in.
If all the watcher is doing is pulling things off a queue and immediately putting it into another queue, where it gets consumed by a single worker, it sounds like its unnecessary overhead - you may as well just consume it directly in the worker. It's not exactly clear to me if that's the case, though - is the watcher consuming from a queue, or just putting items into one? If it is consuming from a queue, who is putting stuff into it?
I don't think it'll matter since I'm not multiprocessing, but is this setup affected by Python's GIL (which I believe still exists in 3.4) in any way?
Yes, this is affected by the GIL. Only one of your threads can run Python bytecode at a time, so won't get true parallelism, except when threads are running I/O (which releases the GIL). If your worker thread is doing CPU-bound activities, you should seriously consider running it in a separate process via multiprocessing, if possible.
Should the watcher thread be running continuously like that? From what I understand, and please correct me if I'm wrong, asyncio is supposed to be used for event-based multithreading, which seems relevant to what I'm trying to do.
It's hard to say, because I don't know exactly what "running continuously" means. What is it doing continuously? If it spends most of its time sleeping or blocking on a queue, it's fine - both of those things release the GIL. If it's constantly doing actual work, that will require the GIL, and therefore degrade the performance of the other threads in your app (assuming they're trying to do work at the same time). asyncio is designed for programs that are I/O-bound, and can therefore be run in a single thread, using asynchronous I/O. It sounds like your program may be a good fit for that depending on what your worker is doing.
The main thread is basically always just waiting for the user to press a key to access a different part of the menu. This seems like a situation asyncio would be perfect for, but, again, I'm not sure.
Any program where you're mostly waiting for I/O is potentially a good for for asyncio - but only if you can find a library that makes curses (or whatever other GUI library you eventually choose) play nicely with it. Most GUI frameworks come with their own event loop, which will conflict with asyncio's. You would need to use a library that can make the GUI's event loop play nicely with asyncio's event loop. You'd also need to make sure that you can find asyncio-compatible versions of any other synchronous-I/O based library your application uses (e.g. a database driver).
That said, you're not likely to see any kind of performance improvement by switching from your thread-based program to something asyncio-based. It'll likely perform about the same. Since you're only dealing with 3 threads, the overhead of context switching between them isn't very significant, so switching from that a single-threaded, asynchronous I/O approach isn't going to make a very big difference. asyncio will help you avoid thread synchronization complexity (if that's an issue with your app - it's not clear that it is), and at least theoretically, would scale better if your app potentially needed lots of threads, but it doesn't seem like that's the case. I think for you, it's basically down to which style you prefer to code in (assuming you can find all the asyncio-compatible libraries you need).

reactor design pattern in a single thread vs multiple threads

I've been reading about the reactor design pattern, specifically in the context of the Python Twisted networking framework. My simple understanding of the reactor design is that there is a single thread that will sit and wait until one or more I/O sources (or file descriptors) become available, and then it will synchronously loop through each of those sources, doing whatever callbacks specified for each of these sources. Which does mean that the program as a whole would block if any of the callbacks are themselves blocking. And regardless, once all callbacks have executed, the reactor goes back to waiting for more I/O sources to become ready.
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source. I imagine this may be less efficient if all your callbacks are very fast, as the OS now has to deal with managing multiple threads and swapping between them. But it seems that it's now impossible to block the main program, and as an added benefit, the main reactor can keep listening for sources. In short, why does something like Twisted not do this, instead keeping to a single-threaded model?
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source.
What you're describing is basically what happens in a multithreaded program that uses blocking I/O APIs. In this case, the "reactor" moves into the kernel and the "asynchronous looping" is the kernel completing some outstanding blocking operation, freeing up a user-space thread to proceed.
The cons of this approach are the greatly increased complexity with respect to thread-safety (ie, correctness) that it incurs compared to a strictly single-threaded approach.
The pros are better utilization of multiple CPUs (but running multiple single-threaded event-driven processes often offers this benefit as well) and the greater number of programmers who are familiar and comfortable (though often mistakenly so) with the multithreading approach to concurrency.
Also related, though, are the PyPy team's efforts towards providing a better abstraction over the conventional multithreading model. PyPy's work towards Software Transactional Memory (STM) could offer a system in which work is dispatched asynchronously to multiple worker threads without violating the assumptions that are valid in a strictly single-threaded system. If this works out, it could offer the best of both worlds.
But it seems that it's now impossible to block the main program,
I'm not a Python guy but have done this in the context of Boost. Asio. You're correct—your callbacks need to execute quickly and return control to the main reactor. The idea is to only use asynchronous calls in your callbacks. For example, you wouldn't use an API for sending an IP datagram that blocks and returns a status code. Instead, you'd use a non-blocking API where you register success and failure callbacks. This lets the call send call return immediately. The reactor will then call the success/failure callback once the OS has dealt with the packet.

Twisted - should this code be run in separate threads

I am running some code that has X workers, each worker pulling tasks from a queue every second. For this I use twisted's task.LoopingCall() function. Each worker fulfills its request (scrape some data) and then pushes the response back to another queue. All this is done in the reactor thread since I am not deferring this to any other thread.
I am wondering whether I should run all these jobs in separate threads or leave them as they are. And if so, is there a problem if I call task.LoopingCall every second from each thread ?
No, you shouldn't use threads. You can't call LoopingCall from a thread (unless you use reactor.callFromThread), but it wouldn't help you make your code faster.
If you notice a performance problem, you may want to profile your workload, figure out where the CPU-intensive work is, and then put that work into multiple processes, spawned with spawnProcess. You really can't skip the step where you figure out where the expensive work is, though: there's no magic pixie dust you can sprinkle on your Twisted application that will make it faster. If you choose a part of your code which isn't very intensive and doesn't require blocking resources like CPU or disk, then you will discover that the overhead of moving work to a different process may outweigh any benefit of having it there.
You shouldn't use threads for that. Doing it all in the reactor thread is ok. If your scraping uses twisted.web.client to do the network access, it shouldn't block, so you will go as fast as it gets.
First, beware that Twisted's reactor sometimes multithreads and assigns tasks without telling you anything. Of course, I haven't seen your program in particular.
Second, in Python (that is, in CPython) spawning threads to do non-blocking computation has little benefit. Read up on the GIL (Global Interpreter Lock).

Categories