As far as we know, it is bad if we start too many threads, and it may significantly decrease performance and increase memory usage. However, I can't find anywhere if the situation is the same if we call too many async functions.
As far as I know, asyncio is a kind of abstraction for parallel computing, and it may use or may not use actual threading.
In my project, multiple asynchronous tasks are run, and each such task (currently, it is done using threading) may start other threads. It is a risky situation. I'm thinking of two ways how to solve the issue with too many threads. The first one is to limit the number of 'software' threads to the number of 'hardware' threads. Another one is to use asyncio. Is the second option reasonable in such a case?
As far as I know, asyncio is a kind of abstraction for parallel computing and it may use or may not use actual threading.
Please do not confuse parallelism with asynchronous. In Python, you can achieve parallelism only using multiprocessing.
In my project, multiple asynchronous tasks are run, and each such task may start other threads.
All asynchronous tasks are run in one event loop and use only one thread.
I'm thinking of two ways how to solve the issue with too many threads. The first one is to limit the number of 'software' threads to the number of 'hardware' threads. Another one is to use asyncio. Is the second option reasonable in such a case?
In this answer I have demonstrated situations where we can use async functions. It mainly depends on the operations you do. If your application works with threading and does not need multiprocessing, it can be converted to asynchronous tasks.
I was wondering what is the most efficient way, from a performance perspective, to use the TWS/IB API in Python? I want to compute and update my strategies based on real-time data (Python has a lot of libraries that may be helpful in contrast to Java I think) and based on that send buy/sell orders. These strategies computations may involve quite some processing time, so in that sense, I was thinking about implementing some sort of threading/concurrency (for Java it uses 3 threads if I understand correctly, see *1).
I know there is IBpy (I think it is the same only wrapped up some things for convenience). I came accross IB-insync as an alternative to threading in Python due to Python's concurrency limitations, if I understand correctly:
https://ib-insync.readthedocs.io/api.html
which implements the IB API asynchronously and single-threaded.
Reading about concurrency in Python here:
https://realpython.com/python-concurrency/
async has some major advantages if I understand correctly since Python was designed using Global Interpreter Lock (GIL) (only one thread to hold the control of the Python interpreter). However, the IB-insync library may have some limitations too (but can be fixed by adapting code, as suggested below):
If, for example, the user code spends much time in a calculation, or
uses time.sleep() with a long delay, the framework will stop spinning,
messages accumulate and things may go awry
If a user operation takes a long time then it can be farmed out to a
different process. Alternatively the operation can be made such that
it periodically calls IB.sleep(0); This will let the framework handle
any pending work and return when finished. The operation should be
aware that the current state may have been updated during the sleep(0)
call.
For introducing a delay, never use time.sleep() but use sleep()
instead.
Would a multi-threading solution be better just like Java (I do not know if there is a Java Async equivalent which can be combined with a lot of easy tools/libs that manipulate data)? Or should I stick to Python Async? Other suggestions are welcome, too. With regard to multiple threads in Python (and Java), the following site:
https://interactivebrokers.github.io/tws-api/connection.html
mentions (*1):
API programs always have at least two threads of execution. One thread
is used for sending messages to TWS, and another thread is used for
reading returned messages. The second thread uses the API EReader
class to read from the socket and add messages to a queue. Everytime a
new message is added to the message queue, a notification flag is
triggered to let other threads now that there is a message waiting to
be processed. In the two-thread design of an API program, the message
queue is also processed by the first thread. In a three-thread design,
an additional thread is created to perform this task.
The phrase "The two-threaded design is used in the IB Python sample Program.py..." suggests that there are already two threads involved, which is a little bit confusion to me since the second reference mentions Python being single-threaded.
Python is not technically single-threaded, you can create multiple threads in Python, but there is GIL, which only allows one thread to run at a time, that is why it is sometimes said single-threaded ! But, GIL handles it so efficiently that it doesn't seem single threaded ! I have used multi-threading in Python and it is good . GIL handles all the orchestration of switching and swapping threads, but this proves to be a significance to single-threaded programs as a small speed boost, and a bit slow in multi-threaded programs .
I am also searching for a multi-threaded SDK for IB API ! I have not found one yet, except the Native one, which is a bit complicated for me .
And IB_Insync is not allowing for multi-threading :(
Btw, I am new to Stack Overflow, so don't mind me ...
Does the presence of python GIL imply that in python multi threading the same operation is not so different from repeating it in a single thread?.
For example, If I need to upload two files, what is the advantage of doing them in two threads instead of uploading them one after another?.
I tried a big math operation in both ways. But they seem to take almost equal time to complete.
This seems to be unclear to me. Can someone help me on this?.
Thanks.
Python's threads get a slightly worse rap than they deserve. There are three (well, 2.5) cases where they actually get you benefits:
If non-Python code (e.g. a C library, the kernel, etc.) is running, other Python threads can continue executing. It's only pure Python code that can't run in two threads at once. So if you're doing disk or network I/O, threads can indeed buy you something, as most of the time is spent outside of Python itself.
The GIL is not actually part of Python, it's an implementation detail of CPython (the "reference" implementation that the core Python devs work on, and that you usually get if you just run "python" on your Linux box or something.
Jython, IronPython, and any other reimplementations of Python generally do not have a GIL, and multiple pure-Python threads can execute simultaneously.
The 0.5 case: Even if you're entirely pure-Python and see little or no performance benefit from threading, some problems are really convenient in terms of developer time and difficulty to solve with threads. This depends in part on the developer, too, of course.
It really depends on the library you're using. The GIL is meant to prevent Python objects and its internal data structures to be changed at the same time. If you're doing an upload, the library you use to do the actual upload might release the GIL while it's waiting for the actual HTTP request to complete (I would assume that is the case with the HTTP modules in the standard library, but I didn't check).
As a side note, if you really want to have things running in parallel, just use multiple processes. It will save you a lot of trouble and you'll end up with better code (more robust, more scalable, and most probably better structured).
It depends on the native code module that's executing. Native modules can release the GIL and then go off and do their own thing allowing another thread to lock the GIL. The GIL is normally held while code, both python and native, are operating on python objects. If you want more detail you'll probably need to go and read quite a bit about it. :)
See:
What is a global interpreter lock (GIL)? and Thread State and the Global Interpreter Lock
Multithreading is a concept where two are more tasks need be completed simultaneously, for example, I have word processor in this application there are N numbers of a parallel task have to work. Like listening to keyboard, formatting input text, sending a formatted text to display unit. In this context with sequential processing, it is time-consuming and one task has to wait till the next task completion. So we put these tasks in threads and simultaneously complete the task. Three threads are always up and waiting for the inputs to arrive, then take that input and produce the output simultaneously.
So multi-threading works faster if we have multi-core and processors. But in reality with single processors, threads will work one after the other, but we feel it's executing with greater speed, Actually, one instruction executes at a time and a processor can execute billions of instructions at a time. So the computer creates illusion that multi-task or thread working parallel. It just an illusion.
I've been reading about the reactor design pattern, specifically in the context of the Python Twisted networking framework. My simple understanding of the reactor design is that there is a single thread that will sit and wait until one or more I/O sources (or file descriptors) become available, and then it will synchronously loop through each of those sources, doing whatever callbacks specified for each of these sources. Which does mean that the program as a whole would block if any of the callbacks are themselves blocking. And regardless, once all callbacks have executed, the reactor goes back to waiting for more I/O sources to become ready.
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source. I imagine this may be less efficient if all your callbacks are very fast, as the OS now has to deal with managing multiple threads and swapping between them. But it seems that it's now impossible to block the main program, and as an added benefit, the main reactor can keep listening for sources. In short, why does something like Twisted not do this, instead keeping to a single-threaded model?
What are the pros and cons of this, compared to asynchronously looping through each source as they appear, i.e. launching a separate thread for each source.
What you're describing is basically what happens in a multithreaded program that uses blocking I/O APIs. In this case, the "reactor" moves into the kernel and the "asynchronous looping" is the kernel completing some outstanding blocking operation, freeing up a user-space thread to proceed.
The cons of this approach are the greatly increased complexity with respect to thread-safety (ie, correctness) that it incurs compared to a strictly single-threaded approach.
The pros are better utilization of multiple CPUs (but running multiple single-threaded event-driven processes often offers this benefit as well) and the greater number of programmers who are familiar and comfortable (though often mistakenly so) with the multithreading approach to concurrency.
Also related, though, are the PyPy team's efforts towards providing a better abstraction over the conventional multithreading model. PyPy's work towards Software Transactional Memory (STM) could offer a system in which work is dispatched asynchronously to multiple worker threads without violating the assumptions that are valid in a strictly single-threaded system. If this works out, it could offer the best of both worlds.
But it seems that it's now impossible to block the main program,
I'm not a Python guy but have done this in the context of Boost. Asio. You're correct—your callbacks need to execute quickly and return control to the main reactor. The idea is to only use asynchronous calls in your callbacks. For example, you wouldn't use an API for sending an IP datagram that blocks and returns a status code. Instead, you'd use a non-blocking API where you register success and failure callbacks. This lets the call send call return immediately. The reactor will then call the success/failure callback once the OS has dealt with the packet.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
What are the modules used to write multi-threaded applications in Python? I'm aware of the basic concurrency mechanisms provided by the language and also of Stackless Python, but what are their respective strengths and weaknesses?
In order of increasing complexity:
Use the threading module
Pros:
It's really easy to run any function (any callable in fact) in its
own thread.
Sharing data is if not easy (locks are never easy :), at
least simple.
Cons:
As mentioned by Juergen Python threads cannot actually concurrently access state in the interpreter (there's one big lock, the infamous Global Interpreter Lock.) What that means in practice is that threads are useful for I/O bound tasks (networking, writing to disk, and so on), but not at all useful for doing concurrent computation.
Use the multiprocessing module
In the simple use case this looks exactly like using threading except each task is run in its own process not its own thread. (Almost literally: If you take Eli's example, and replace threading with multiprocessing, Thread, with Process, and Queue (the module) with multiprocessing.Queue, it should run just fine.)
Pros:
Actual concurrency for all tasks (no Global Interpreter Lock).
Scales to multiple processors, can even scale to multiple machines.
Cons:
Processes are slower than threads.
Data sharing between processes is trickier than with threads.
Memory is not implicitly shared. You either have to explicitly share it or you have to pickle variables and send them back and forth. This is safer, but harder. (If it matters increasingly the Python developers seem to be pushing people in this direction.)
Use an event model, such as Twisted
Pros:
You get extremely fine control over priority, over what executes when.
Cons:
Even with a good library, asynchronous programming is usually harder than threaded programming, hard both in terms of understanding what's supposed to happen and in terms of debugging what actually is happening.
In all cases I'm assuming you already understand many of the issues involved with multitasking, specifically the tricky issue of how to share data between tasks. If for some reason you don't know when and how to use locks and conditions you have to start with those. Multitasking code is full of subtleties and gotchas, and it's really best to have a good understanding of concepts before you start.
You've already gotten a fair variety of answers, from "fake threads" all the way to external frameworks, but I've seen nobody mention Queue.Queue -- the "secret sauce" of CPython threading.
To expand: as long as you don't need to overlap pure-Python CPU-heavy processing (in which case you need multiprocessing -- but it comes with its own Queue implementation, too, so you can with some needed cautions apply the general advice I'm giving;-), Python's built-in threading will do... but it will do it much better if you use it advisedly, e.g., as follows.
"Forget" shared memory, supposedly the main plus of threading vs multiprocessing -- it doesn't work well, it doesn't scale well, never has, never will. Use shared memory only for data structures that are set up once before you spawn sub-threads and never changed afterwards -- for everything else, make a single thread responsible for that resource, and communicate with that thread via Queue.
Devote a specialized thread to every resource you'd normally think to protect by locks: a mutable data structure or cohesive group thereof, a connection to an external process (a DB, an XMLRPC server, etc), an external file, etc, etc. Get a small thread pool going for general purpose tasks that don't have or need a dedicated resource of that kind -- don't spawn threads as and when needed, or the thread-switching overhead will overwhelm you.
Communication between two threads is always via Queue.Queue -- a form of message passing, the only sane foundation for multiprocessing (besides transactional-memory, which is promising but for which I know of no production-worthy implementations except In Haskell).
Each dedicated thread managing a single resource (or small cohesive set of resources) listens for requests on a specific Queue.Queue instance. Threads in a pool wait on a single shared Queue.Queue (Queue is solidly threadsafe and won't fail you in this).
Threads that just need to queue up a request on some queue (shared or dedicated) do so without waiting for results, and move on. Threads that eventually DO need a result or confirmation for a request queue a pair (request, receivingqueue) with an instance of Queue.Queue they just made, and eventually, when the response or confirmation is indispensable in order to proceed, they get (waiting) from their receivingqueue. Be sure you're ready to get error-responses as well as real responses or confirmations (Twisted's deferreds are great at organizing this kind of structured response, BTW!).
You can also use Queue to "park" instances of resources which can be used by any one thread but never be shared among multiple threads at one time (DB connections with some DBAPI compoents, cursors with others, etc) -- this lets you relax the dedicated-thread requirement in favor of more pooling (a pool thread that gets from the shared queue a request needing a queueable resource will get that resource from the apppropriate queue, waiting if necessary, etc etc).
Twisted is actually a good way to organize this minuet (or square dance as the case may be), not just thanks to deferreds but because of its sound, solid, highly scalable base architecture: you may arrange things to use threads or subprocesses only when truly warranted, while doing most things normally considered thread-worthy in a single event-driven thread.
But, I realize Twisted is not for everybody -- the "dedicate or pool resources, use Queue up the wazoo, never do anything needing a Lock or, Guido forbid, any synchronization procedure even more advanced, such as semaphore or condition" approach can still be used even if you just can't wrap your head around async event-driven methodologies, and will still deliver more reliability and performance than any other widely-applicable threading approach I've ever stumbled upon.
It depends on what you're trying to do, but I'm partial to just using the threading module in the standard library because it makes it really easy to take any function and just run it in a separate thread.
from threading import Thread
def f():
...
def g(arg1, arg2, arg3=None):
....
Thread(target=f).start()
Thread(target=g, args=[5, 6], kwargs={"arg3": 12}).start()
And so on. I often have a producer/consumer setup using a synchronized queue provided by the Queue module
from Queue import Queue
from threading import Thread
q = Queue()
def consumer():
while True:
print sum(q.get())
def producer(data_source):
for line in data_source:
q.put( map(int, line.split()) )
Thread(target=producer, args=[SOME_INPUT_FILE_OR_SOMETHING]).start()
for i in range(10):
Thread(target=consumer).start()
Kamaelia is a python framework for building applications with lots of communicating processes.
(source: kamaelia.org) Kamaelia - Concurrency made useful, fun
In Kamaelia you build systems from simple components that talk to each other. This speeds development, massively aids maintenance and also means you build naturally concurrent software. It's intended to be accessible by any developer, including novices. It also makes it fun :)
What sort of systems? Network servers, clients, desktop applications, pygame based games, transcode systems and pipelines, digital TV systems, spam eradicators, teaching tools, and a fair amount more :)
Here's a video from Pycon 2009. It starts by comparing Kamaelia to Twisted and Parallel Python and then gives a hands on demonstration of Kamaelia.
Easy Concurrency with Kamaelia - Part 1 (59:08)
Easy Concurrency with Kamaelia - Part 2 (18:15)
Regarding Kamaelia, the answer above doesn't really cover the benefit here. Kamaelia's approach provides a unified interface, which is pragmatic not perfect, for dealing with threads, generators & processes in a single system for concurrency.
Fundamentally it provides a metaphor of a running thing which has inboxes, and outboxes. You send messages to outboxes, and when wired together, messages flow from outboxes to inboxes. This metaphor/API remains the same whether you're using generators, threads or processes, or speaking to other systems.
The "not perfect" part is due to syntactic sugar not being added as yet for inboxes and outboxes (though this is under discussion) - there is a focus on safety/usability in the system.
Taking the producer consumer example using bare threading above, this becomes this in Kamaelia:
Pipeline(Producer(), Consumer() )
In this example it doesn't matter if these are threaded components or otherwise, the only difference is between them from a usage perspective is the baseclass for the component. Generator components communicate using lists, threaded components using Queue.Queues and process based using os.pipes.
The reason behind this approach though is to make it harder to make hard to debug bugs. In threading - or any shared memory concurrency you have, the number one problem you face is accidentally broken shared data updates. By using message passing you eliminate one class of bugs.
If you use bare threading and locks everywhere you're generally working on the assumption that when you write code that you won't make any mistakes. Whilst we all aspire to that, it's very rare that will happen. By wrapping up the locking behaviour in one place you simplify where things can go wrong. (Context handlers help, but don't help with accidental updates outside the context handler)
Obviously not every piece of code can be written as message passing and shared style which is why Kamaelia also has a simple software transactional memory (STM), which is a really neat idea with a nasty name - it's more like version control for variables - ie check out some variables, update them and commit back. If you get a clash you rinse and repeat.
Relevant links:
Europython 09 tutorial
Monthly releases
Mailing list
Examples
Example Apps
Reusable components (generator & thread)
Anyway, I hope that's a useful answer. FWIW, the core reason behind Kamaelia's setup is to make concurrency safer & easier to use in python systems, without the tail wagging the dog. (ie the big bucket of components
I can understand why the other Kamaelia answer was modded down, since even to me it looks more like an ad than an answer. As the author of Kamaelia it's nice to see enthusiasm though I hope this contains a bit more relevant content :-)
And that's my way of saying, please take the caveat that this answer is by definition biased, but for me, Kamaelia's aim is to try and wrap what is IMO best practice. I'd suggest trying a few systems out, and seeing which works for you. (also if this is inappropriate for stack overflow, sorry - I'm new to this forum :-)
I would use the Microthreads (Tasklets) of Stackless Python, if I had to use threads at all.
A whole online game (massivly multiplayer) is build around Stackless and its multithreading principle -- since the original is just to slow for the massivly multiplayer property of the game.
Threads in CPython are widely discouraged. One reason is the GIL -- a global interpreter lock -- that serializes threading for many parts of the execution. My experiance is, that it is really difficult to create fast applications this way. My example codings where all slower with threading -- with one core (but many waits for input should have made some performance boosts possible).
With CPython, rather use seperate processes if possible.
If you really want to get your hands dirty, you can try using generators to fake coroutines. It probably isn't the most efficient in terms of work involved, but coroutines do offer you very fine control of co-operative multitasking rather than pre-emptive multitasking you'll find elsewhere.
One advantage you'll find is that by and large, you will not need locks or mutexes when using co-operative multitasking, but the more important advantage for me was the nearly-zero switching speed between "threads". Of course, Stackless Python is said to be very good for that as well; and then there's Erlang, if it doesn't have to be Python.
Probably the biggest disadvantage in co-operative multitasking is the general lack of workaround for blocking I/O. And in the faked coroutines, you'll also encounter the issue that you can't switch "threads" from anything but the top level of the stack within a thread.
After you've made an even slightly complex application with fake coroutines, you'll really begin to appreciate the work that goes into process scheduling at the OS level.