Python asyncio: Can too many asynchronous calls decrease performance?

It is well known that starting too many threads is bad: it can significantly decrease performance and increase memory usage. However, I can't find anywhere whether the same is true if we call too many async functions.
As far as I know, asyncio is a kind of abstraction for parallel computing, and it may or may not use actual threading.
In my project, multiple asynchronous tasks are run, and each such task (currently implemented with threading) may start other threads. It is a risky situation. I'm thinking of two ways to solve the issue of too many threads. The first is to limit the number of 'software' threads to the number of 'hardware' threads. The other is to use asyncio. Is the second option reasonable in such a case?

As far as I know, asyncio is a kind of abstraction for parallel computing, and it may or may not use actual threading.
Please do not confuse parallelism with asynchrony. In Python, you can achieve true parallelism only by using multiprocessing.
In my project, multiple asynchronous tasks are run, and each such task may start other threads.
All asynchronous tasks are run in one event loop and use only one thread.
I'm thinking of two ways to solve the issue of too many threads. The first is to limit the number of 'software' threads to the number of 'hardware' threads. The other is to use asyncio. Is the second option reasonable in such a case?
In this answer I have demonstrated situations where async functions can be used. It mainly depends on the operations you perform. If your application works with threading and does not need multiprocessing, it can be converted to asynchronous tasks.
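To make that conversion concrete, here is a minimal sketch (the function names and the semaphore limit of 5 are illustrative, not from the question) of running many tasks on one event loop while capping how many are in flight at once. This is the asyncio analogue of limiting a thread pool's size:

```python
import asyncio

async def worker(sem: asyncio.Semaphore, task_id: int) -> int:
    # The semaphore caps how many coroutine bodies run at once;
    # the event loop itself still uses only one thread.
    async with sem:
        await asyncio.sleep(0.01)  # stand-in for real async I/O
        return task_id * 2

async def main() -> list[int]:
    sem = asyncio.Semaphore(5)  # at most 5 tasks "in flight"
    tasks = [worker(sem, i) for i in range(20)]
    # gather preserves the order of the awaited coroutines
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(results)
```

Because all coroutines share one thread, there is no per-task thread creation cost, which is exactly why "too many async tasks" is far cheaper than "too many threads".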

Related

Does concurrency in Python mean executing multiple functions at the same time?

I have read through several tutorials about concurrency in Python, and I also know the differences between concurrency and parallelism, but I am still a little bit confused about the definition of concurrency.
Many people define concurrency as executing multiple tasks at the same time. I am wondering what are tasks in python. Are they functions? Can I say concurrency in Python is executing multiple functions at the same time?
Many people define concurrency as executing multiple tasks at the same time
The tasks here are not defined from the computer's view, but from a human's view. As long as we can confirm things are not being served strictly in order (no one is blocking another), we can say they are happening concurrently.
Can I say concurrency in Python is executing multiple functions at the same time?
There are plenty of ways to support concurrency in Python. Executing multiple functions (via multiple threads or processes) at the same time is absolutely one of them (actually, this is parallelism), but not the only one.
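One of the other ways is cooperative multitasking on a single thread. A small sketch (the task names are illustrative): two coroutines take turns on one event loop, so they run concurrently without ever running in parallel:

```python
import asyncio

order = []

async def task(name: str) -> None:
    for _ in range(2):
        order.append(name)
        await asyncio.sleep(0)  # yield control back to the event loop

async def main() -> None:
    # Both coroutines run on the same thread, interleaving at each await
    await asyncio.gather(task("a"), task("b"))

asyncio.run(main())
print(order)
```

Neither coroutine blocks the other, so this is concurrency; yet only one of them executes at any instant, so it is not parallelism.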

Timemout in python multithread/multiprocess worker

I'm trying to write a program which executes time-consuming tasks in parallel. I'd like to limit task execution time using a signal within the worker. Is it going to work? Any better approach?
Depending on your goal, trying to implement a multi-task operation on your own might be extremely time-consuming and error-prone. I would suggest you look into pre-made frameworks, such as Celery, which handle these exact tasks and allow, among other things, setting timeouts on tasks and so forth.
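If the standard library is enough, concurrent.futures also supports per-task wait timeouts without signals. A minimal sketch (the task function and durations are illustrative); note that the timeout only stops the *wait*, the worker thread itself keeps running until the function returns:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def slow_task(seconds: float) -> str:
    time.sleep(seconds)  # stand-in for time-consuming work
    return "done"

timed_out = False
with ThreadPoolExecutor(max_workers=2) as pool:
    fast = pool.submit(slow_task, 0.01)
    slow = pool.submit(slow_task, 0.5)
    fast_result = fast.result(timeout=1.0)  # finishes well within the limit
    try:
        # Raises concurrent.futures.TimeoutError after 0.05 s of waiting,
        # but the underlying thread still runs slow_task to completion.
        slow.result(timeout=0.05)
    except FutureTimeout:
        timed_out = True

print(fast_result, timed_out)
```

If you need to actually kill a runaway task rather than just stop waiting for it, a process-based pool (where workers can be terminated) or a framework like Celery is the safer route.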

Using defer module in python to execute a bunch of tasks async

I saw this page suggesting the usage of the defer module to execute a series of tasks asynchronously.
I want to use it for my Project:
Calculate the median of each list of numbers I have (I have a list containing lists of numbers)
Get the minimum and maximum of all the medians.
But as a matter of fact, I did not quite understand how to use it.
I would love some explanation about defer in python, and whether you think it is the appropriate way achieving my goal (considering the Global Interpreter Lock).
Thanks in advance!
No, using asynchronous programming (cooperative routines, a.k.a. coroutines) will not help your use case. Async is great for I/O-intensive workloads, or anything else that has to wait for slower, external events to fire.
Coroutines work because they give up control (yield) to other coroutines whenever they have to wait for something (usually for some I/O to take place). If they do this frequently, the event loop can alternate between loads of coroutines, often far more than what threading could achieve, with a simpler programming model (no need to lock data structures all the time).
Your use case is not waiting for I/O, however; you have a computationally heavy workload. Such workloads do not have obvious places to yield, and because they don't need to wait for external events, there is no reason to do so anyway. For such a workload, use a multiprocessing model to do the work in parallel on different CPU cores.
Asynchronous programming does not defeat the GIL either, but does give the event loop the opportunity to move the waiting for I/O parts to C code that can unlock the GIL and handle all that I/O processing in parallel while other Python code (in a different coroutine) can execute.
See this talk by my colleague Ɓukasz Langa at PyCON 2016 for a good introduction to async programming.
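A sketch of that multiprocessing approach applied to the original median problem (the function and variable names are mine, not from the question). Each median is computed in a separate worker process, so the work can spread across CPU cores despite the GIL:

```python
from multiprocessing import Pool
from statistics import median

def summarize(lists_of_numbers):
    # Each inner list's median is computed in a worker process;
    # pool.map preserves input order.
    with Pool() as pool:
        medians = pool.map(median, lists_of_numbers)
    return min(medians), max(medians)

if __name__ == "__main__":  # required guard for multiprocessing on spawn platforms
    data = [[1, 2, 3], [10, 20, 30, 40], [5]]
    print(summarize(data))
```

For small inputs like this the process startup cost dominates, so multiprocessing only pays off when each task is genuinely CPU-heavy.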

python reuse thread object

Is it possible to reuse python thread object to avoid unnecessary creation of the thread?
It could be useful in the following situation: there are many tasks which must be parallelized using a thread pool whose size is much smaller than the number of tasks.
I know that there is multiprocessing.Pool, but it is very important to use threads not processes.
If you're using Python 3, the best way is definitely to use concurrent.futures.ThreadPoolExecutor.
Actually you should read the whole documentation of concurrent.futures, it's not long, and there are many great examples.
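A minimal illustration (the fetch function is a stand-in for real I/O-bound work): the executor creates its worker threads once and reuses them for every submitted task, which is exactly the thread reuse the question asks about:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # placeholder for real I/O-bound work, e.g. an HTTP request
    return f"fetched {url}"

urls = [f"https://example.com/{i}" for i in range(10)]

# Four threads are created once and recycled across all ten tasks;
# map returns results in the order of the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))

print(results[0])
```

The `with` block shuts the pool down cleanly when all tasks have finished, so you never manage individual Thread objects yourself.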

Is Multithreading (in python) the same as calling the script multiple times?

Let's assume we have some task, that could be divided into independent subtasks and we want to process these tasks in parallel on the same machine.
I read about multithreading and ran into this post, which describes the Global Interpreter Lock (GIL). Since I do not fully understand how processes are handled under the hood, I have to ask:
Putting aside the gain of threading: Is Multithreading (in my case in python) effectively the same as calling a script multiple times?
I hope this question does not lead too far, and that its answer is understandable for someone whose knowledge about what happens at the low levels of a computer is sparse. Thanks for any enlightenment in this matter.
Is multithreading (in my case in Python) effectively the same as calling a script multiple times?
In a word, no.
Due to the GIL, in Python it is far easier to achieve true parallelism by using multiple processes than it is by using multiple threads. Calling the script multiple times (presumably with different arguments) is an example of using multiple processes. The multiprocessing module is another way to achieve parallelism by using multiple processes. Both are likely to give better performance than using threads.
If I were you, I'd probably consider multiprocessing as the first choice for distributing work across cores.
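A hedged sketch of that first choice, using the concurrent.futures wrapper around multiprocessing (the workload function is illustrative, a CPU-heavy stand-in):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n: int) -> int:
    # CPU-heavy stand-in: sum of squares below n
    return sum(i * i for i in range(n))

if __name__ == "__main__":  # required guard for process pools on spawn platforms
    # Each input is computed in a separate process, so the work
    # runs on multiple cores and is not serialized by the GIL.
    with ProcessPoolExecutor() as pool:
        totals = list(pool.map(cpu_bound, [10_000, 20_000, 30_000]))
    print(totals)
```

Swapping `ProcessPoolExecutor` for `ThreadPoolExecutor` here would keep the code identical but lose the parallel speedup for CPU-bound work, which is the practical consequence of the GIL discussed above.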
It is not the same thing: one is multithreading, while the other opens a separate process for each invocation.
Here is a short explanation taken from here:
It is important to first define the differences between processes and threads. Threads are different from processes in that they share state, memory, and resources. This simple difference is both a strength and a weakness for threads. On one hand, threads are lightweight and easy to communicate with, but on the other hand, they bring up a whole host of problems including deadlocks, race conditions, and sheer complexity. Fortunately, due to both the GIL and the queuing module, threading in Python is much less complex to implement than in other languages.