python: waiting on multiple objects (Queue, Lock, Condition, etc.)

I'm using the Python threading library. Works fine (subject to the Global Interpreter Lock, of course).
Now I have a conundrum. I have two separate sources of concurrency: either two Queues, or a Queue and a Condition. How can I wait for the first one that is ready? (They have to be separate objects since they are owned by different modular parts of my application.)
Windows has the WaitForMultipleObjects function; is there something similar for Python concurrency primitives?

There is no existing function that I know of that does what you asked about. However, there is threading.enumerate(), which returns a list of all Thread objects that are currently alive (daemon or not), no matter where they were started. Once you have that list you could iterate over it looking for the condition you want. To make a thread a daemon, set thread.daemon = True (or call the older thread.setDaemon(True)) before the thread is started.
I can't say for sure that this is your answer. I don't have as much experience as you apparently do, but I looked this up in a book I have, The Python Standard Library by Example by Doug Hellmann. He has 23 pages on managing concurrent operations in the section on threading, and enumerate seemed to be something that would help.

You could create a new synchronization object (queue, condition, etc.), let's call it the ready_event, and one Thread for each sync object you want to watch. Each thread waits for its own sync object to be ready; when it is, the thread signals that via the ready_event. After you have created and started the threads, you can wait on that ready_event.
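A minimal sketch of that relay idea, assuming both sources are queue.Queue objects (a Condition source would need its own relay function that waits on the condition and then posts to the combined queue). The names relay, watch_all, source_a, source_b and ready are hypothetical. Here the ready_event is itself a Queue, so the waiting thread also learns which source fired and receives its item.

```python
import queue
import threading

def relay(name, source, ready):
    """Block on one source forever and forward each item, tagged with its source name."""
    while True:
        item = source.get()          # blocks without burning CPU
        ready.put((name, item))      # wakes whoever is waiting on the combined queue

def watch_all(sources):
    """Start one daemon relay thread per source and return the combined 'ready' queue."""
    ready = queue.Queue()
    for name, source in sources.items():
        t = threading.Thread(target=relay, args=(name, source, ready), daemon=True)
        t.start()
    return ready

source_a = queue.Queue()   # owned by one part of the application
source_b = queue.Queue()   # owned by another part
ready = watch_all({"a": source_a, "b": source_b})

source_b.put("hello")
print(ready.get(timeout=5))  # → ('b', 'hello')
```

One trade-off: each relay thread removes items from its source as soon as they arrive, so this works best when the waiting thread is the only consumer of those queues.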

Related

Does threading.Condition maintain a collection of Thread objects?

Trying to wrap my wits around how threading works. The high-level language in the docs and source code is helpful up to a degree but still leaves me scratching my head. What exactly, in terms of data structures, is the relationship between Thread and Condition objects? What does it mean when a thread "releases" a lock? That the Condition object dequeues its reference to the thread? Is there a lower-level description of these interactions, preferably in Python terms, to be found on the Internet?
A Condition maintains a list (actually a collections.deque) of what are notionally threads, waiting on the condition. It actually stores locks that the waiting threads are blocked on, but thinking of it storing the threads is a conceptual shortcut if you don't care too much about the implementation. The list is initially empty, but any time a thread calls the Condition's wait method, it will create a new lock and add it to the list before blocking on the lock (conceptually, this adds the thread to the list, and suspends it). Locks are removed from the list after another thread calls notify or notify_all, which unlocks one or more of the lock objects in the list, waking up the corresponding threads.
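To make the waiter-lock idea concrete, here is a stripped-down toy condition variable that follows the description above. It is a simplified sketch, not the standard library's code: MiniCondition is a hypothetical name, and it has no timeouts, no ownership checks and no RLock-specific handling.

```python
import threading
from collections import deque

class MiniCondition:
    """Toy condition variable: each waiting thread blocks on its own private lock."""

    def __init__(self, lock=None):
        self._lock = lock if lock is not None else threading.Lock()
        self._waiters = deque()  # one locked "waiter" lock per sleeping thread

    def __enter__(self):
        self._lock.acquire()
        return self

    def __exit__(self, *exc_info):
        self._lock.release()

    def wait(self):
        # Caller must hold self._lock (this toy version does not check).
        waiter = threading.Lock()
        waiter.acquire()               # locked now, so the next acquire() will block
        self._waiters.append(waiter)   # conceptually: "add this thread to the list"
        self._lock.release()           # let other threads use the condition meanwhile
        try:
            waiter.acquire()           # sleep until notify() releases our waiter lock
        finally:
            self._lock.acquire()       # re-acquire before returning to the caller

    def notify(self, n=1):
        # Caller must hold self._lock. Wake up to n waiters, oldest first.
        for _ in range(min(n, len(self._waiters))):
            self._waiters.popleft().release()

    def notify_all(self):
        self.notify(len(self._waiters))
```

The waiter lock is appended before the condition's lock is released, so a notify() that happens between those two steps still finds it, and the later waiter.acquire() simply returns immediately instead of missing the wakeup.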
Releasing a lock means unlocking it. It's a basic operation on a Lock object (the reverse of acquire, which locks the Lock). A lock is "held" in between an acquire and a release, and only one thread can hold a Lock at a given time (other threads will either block in acquire, or the operation will fail, perhaps after a timeout). You can use the context manager protocol to call acquire and release for you in simple cases:
with some_lock:  # this acquires some_lock, blocking until it's available
    do_stuff()  # some_lock is held while this runs
# some_lock will be released automatically when the with block ends
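For reference, the with statement above is roughly equivalent to an explicit acquire/try/finally/release. The sketch also shows the "fail after a timeout" case, using the timeout parameter of Lock.acquire (available since Python 3.2); do_stuff() is replaced by a placeholder.

```python
import threading

some_lock = threading.Lock()

# What the with block does for you, written out by hand:
some_lock.acquire()          # blocks until the lock is available
try:
    pass                     # do_stuff() would run here while the lock is held
finally:
    some_lock.release()      # always released, even if do_stuff() raises

# Acquiring without waiting forever:
some_lock.acquire()                       # now held by this thread
got_it = some_lock.acquire(timeout=0.1)   # a plain Lock is not re-entrant, so this gives up
print(got_it)                             # → False
some_lock.release()
```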
Each Condition object is associated with a Lock, either a pre-existing one that you pass to its constructor, or one it creates internally for you (if you don't pass anything). The main Condition operations (wait and notify, and their variants) require that you already hold the associated lock before you call them. You can do the lock operations directly on the Condition object itself, since it proxies the Lock's acquire and release methods (and the equivalent context manager methods).
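A short sketch of those rules with the real threading.Condition API: a Condition built on an existing Lock acquires that Lock when you enter it, and calling notify() without holding the lock raises RuntimeError.

```python
import threading

lock = threading.Lock()
cond = threading.Condition(lock)  # share an existing Lock; with no argument it creates an RLock

with cond:                        # Condition proxies the lock's context manager
    held_inside = lock.locked()   # True: entering `cond` acquired `lock`
    cond.notify()                 # allowed, because we hold the associated lock
held_after = lock.locked()        # False: released when the with block ended

try:
    cond.notify()                 # not holding the lock any more
except RuntimeError as exc:
    print("RuntimeError:", exc)   # wait() and notify() refuse to run without the lock
    raised = True
else:
    raised = False

print(held_inside, held_after)    # → True False
```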
The Condition class is written in pure Python, so if you want to know how it works on a low level, there's probably no better source of information than the source code itself!
It might also be useful to see how a Condition is used to synchronize multithreaded access to an object. A good example of that is the queue module in the standard library, where each Queue uses three Conditions (not_full, not_empty and all_tasks_done) to efficiently manage threads that are trying to access or modify its data.
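A cut-down sketch of that pattern: two Conditions share one mutex, producers wait on not_full and consumers wait on not_empty. TinyQueue is a hypothetical name; the real queue.Queue additionally tracks unfinished tasks (all_tasks_done), supports timeouts, and treats maxsize <= 0 as unbounded.

```python
import threading
from collections import deque

class TinyQueue:
    """Bounded FIFO guarded by one mutex and two Conditions built on it."""

    def __init__(self, maxsize):
        self.maxsize = maxsize
        self._items = deque()
        self.mutex = threading.Lock()
        self.not_empty = threading.Condition(self.mutex)  # signalled after a put
        self.not_full = threading.Condition(self.mutex)   # signalled after a get

    def put(self, item):
        with self.not_full:                        # acquires the shared mutex
            while len(self._items) >= self.maxsize:
                self.not_full.wait()               # sleep until a consumer makes room
            self._items.append(item)
            self.not_empty.notify()                # wake one waiting consumer

    def get(self):
        with self.not_empty:                       # same mutex, different waiters
            while not self._items:
                self.not_empty.wait()              # sleep until a producer adds something
            item = self._items.popleft()
            self.not_full.notify()                 # wake one waiting producer
            return item
```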

When should I be using asyncio over regular threads, and why? Does it provide performance increases?

I have a pretty basic understanding of multithreading in Python and an even basic-er understanding of asyncio.
I'm currently writing a small Curses-based program (eventually going to be using a full GUI, but that's another story) that handles the UI and user IO in the main thread, and then has two other daemon threads (each with their own queue/worker-method-that-gets-things-from-a-queue):
a watcher thread that watches for time-based and conditional (e.g. posts to a message board, received messages, etc.) events to occur and then puts required tasks into...
the other (worker) daemon thread's queue which then completes them.
All three threads are continuously running concurrently, which leads me to some questions:
When the worker thread's queue (or, more generally, any thread's queue) is empty, should it be stopped until it has something to do again, or is it okay to leave it continuously running? Do concurrent threads take up a lot of processing power when they aren't doing anything other than watching their queue?
Should the two threads' queues be combined? Since the watcher thread is continuously running a single method, I guess the worker thread would be able to just pull tasks from the single queue that the watcher thread puts in.
I don't think it'll matter since I'm not multiprocessing, but is this setup affected by Python's GIL (which I believe still exists in 3.4) in any way?
Should the watcher thread be running continuously like that? From what I understand, and please correct me if I'm wrong, asyncio is supposed to be used for event-based multithreading, which seems relevant to what I'm trying to do.
The main thread is basically always just waiting for the user to press a key to access a different part of the menu. This seems like a situation asyncio would be perfect for, but, again, I'm not sure.
Thanks!
When the worker thread's queue (or, more generally, any thread's queue) is empty, should it be stopped until it has something to do again, or is it okay to leave it continuously running? Do concurrent threads take up a lot of processing power when they aren't doing anything other than watching their queue?
You should just use a blocking call to queue.get(). That leaves the thread blocked waiting on a lock inside the queue, which releases the GIL, so no processing power (or at least a very minimal amount) will be used. Don't use non-blocking gets in a while loop, since that's going to require a lot more CPU wakeups.
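A minimal sketch of the blocking-get worker loop. The STOP sentinel is an assumption of this example (one common way to tell the worker to exit), not something from the question's code.

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to exit

def worker(tasks, results):
    while True:
        task = tasks.get()        # blocks (no busy-waiting) until something arrives
        if task is STOP:
            break
        results.append(task * 2)  # stand-in for the real work

tasks = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(tasks, results), daemon=True)
t.start()
for n in (1, 2, 3):
    tasks.put(n)
tasks.put(STOP)
t.join()
print(results)  # → [2, 4, 6]
```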
Should the two threads' queues be combined? Since the watcher thread is continuously running a single method, I guess the worker thread would be able to just pull tasks from the single queue that the watcher thread puts in.
If all the watcher is doing is pulling things off a queue and immediately putting them into another queue, where they get consumed by a single worker, it sounds like unnecessary overhead - you may as well just consume them directly in the worker. It's not exactly clear to me if that's the case, though - is the watcher consuming from a queue, or just putting items into one? If it is consuming from a queue, who is putting stuff into it?
I don't think it'll matter since I'm not multiprocessing, but is this setup affected by Python's GIL (which I believe still exists in 3.4) in any way?
Yes, this is affected by the GIL. Only one of your threads can run Python bytecode at a time, so you won't get true parallelism, except when threads are blocked on I/O (which releases the GIL). If your worker thread is doing CPU-bound activities, you should seriously consider running it in a separate process via multiprocessing, if possible.
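A minimal sketch of moving CPU-bound work into processes with multiprocessing.Pool; sum_of_squares is a hypothetical stand-in for whatever the worker actually computes.

```python
import multiprocessing

def sum_of_squares(n):
    """Deliberately CPU-bound pure-Python work; a thread running this holds the GIL."""
    return sum(i * i for i in range(n))

def main():
    jobs = [200_000] * 4
    # Each job runs in a separate worker process with its own interpreter (and GIL),
    # so the jobs can run on several cores at the same time.
    with multiprocessing.Pool() as pool:
        results = pool.map(sum_of_squares, jobs)
    print(results)

if __name__ == "__main__":  # needed on platforms that spawn rather than fork
    main()
```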
Should the watcher thread be running continuously like that? From what I understand, and please correct me if I'm wrong, asyncio is supposed to be used for event-based multithreading, which seems relevant to what I'm trying to do.
It's hard to say, because I don't know exactly what "running continuously" means. What is it doing continuously? If it spends most of its time sleeping or blocking on a queue, it's fine - both of those things release the GIL. If it's constantly doing actual work, that will require the GIL, and therefore degrade the performance of the other threads in your app (assuming they're trying to do work at the same time). asyncio is designed for programs that are I/O-bound, and can therefore be run in a single thread, using asynchronous I/O. It sounds like your program may be a good fit for that depending on what your worker is doing.
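A minimal sketch of the same watcher/worker shape written with asyncio, assuming Python 3.7+ for asyncio.run (the question mentions 3.4, which predates it). The coroutine names are hypothetical and asyncio.sleep stands in for whatever the watcher really waits on. Both coroutines share one thread; every await on the queue or on sleep hands control back to the event loop.

```python
import asyncio

async def watcher(tasks):
    """Stand-in for the watcher thread: produce a few tasks, then a None sentinel."""
    for n in range(3):
        await asyncio.sleep(0.01)   # e.g. waiting for a timer or a network event
        await tasks.put(f"task {n}")
    await tasks.put(None)

async def worker(tasks, done):
    """Stand-in for the worker thread: await tasks until the sentinel arrives."""
    while True:
        task = await tasks.get()    # suspends this coroutine; the loop runs others
        if task is None:
            break
        done.append(task)

async def main():
    tasks = asyncio.Queue()
    done = []
    await asyncio.gather(watcher(tasks), worker(tasks, done))
    return done

print(asyncio.run(main()))  # → ['task 0', 'task 1', 'task 2']
```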
The main thread is basically always just waiting for the user to press a key to access a different part of the menu. This seems like a situation asyncio would be perfect for, but, again, I'm not sure.
Any program where you're mostly waiting for I/O is potentially a good fit for asyncio - but only if you can find a library that makes curses (or whatever other GUI library you eventually choose) play nicely with it. Most GUI frameworks come with their own event loop, which will conflict with asyncio's. You would need to use a library that can make the GUI's event loop play nicely with asyncio's event loop. You'd also need to make sure that you can find asyncio-compatible versions of any other synchronous-I/O based library your application uses (e.g. a database driver).
That said, you're not likely to see any kind of performance improvement by switching from your thread-based program to something asyncio-based. It'll likely perform about the same. Since you're only dealing with 3 threads, the overhead of context switching between them isn't very significant, so switching from that to a single-threaded, asynchronous I/O approach isn't going to make a very big difference. asyncio will help you avoid thread synchronization complexity (if that's an issue with your app - it's not clear that it is), and at least theoretically, would scale better if your app potentially needed lots of threads, but it doesn't seem like that's the case. I think for you, it's basically down to which style you prefer to code in (assuming you can find all the asyncio-compatible libraries you need).

Invoking a method in a thread

My code spawns a number of threads to manage communications with a number of I/O boards. Generally the threads receive events from the boards and update external data sources as necessary. The threads (1 or more) are invoked as:
phThreadDict[devId] = ifkit(self, phDevId, phIpAddr, phIpPort, phSerial)
phThreadDict[devId].start()
This works fine. However, in some cases I also need the thread to send a message to the boards. The thread contains a method that does the work and I call that method, from the main thread, as: (this example turns on a digital output)
phThreadDict[devId].writeDigitalOutput(digitalOut, True)
This is the method contained in the thread:
def writeDigitalOutput(self, index, state):
    interfaceKit.setOutputState(index, state)
threading.enumerate() produces:
{134997634: <ifkit(Thread-1, started daemon)>, 554878244: <ifkit(Thread-3, started daemon)>, 407897606: <tempsensor(Thread-4, started daemon)>}
and the instance is:
<ifkit(Thread-3, started daemon)>
This works fine if I have only one thread. But, if I have multiple threads, only one is used - the choice appears to be made at random when the program starts.
I suspect that storing the thread identifier in the dict is the problem, but still, it works with one thread.
Instead of storing your threads in a "simple" associative array, maybe you should instantiate a thread pool beforehand (you can find an example implementation at http://code.activestate.com/recipes/577187-python-thread-pool/, or directly use the following library: http://pypi.python.org/pypi/threadpool).
Also instantiate a "watchdog": each of your threads will hold a reference to this watchdog, so when your threads need to do their callback they'll send the info back to this watchdog. (Beware of deadlocks; see http://dabeaz.blogspot.fr/2009/11/python-thread-deadlock-avoidance_20.html.)
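A rough sketch of the watchdog idea using only the standard library (queue.Queue and threading.Thread rather than the threadpool package). Watchdog, BoardWorker, send and report are hypothetical names, and the board I/O is faked. Each worker holds a reference to one shared watchdog and reports back through a thread-safe queue; the main thread hands a command to a specific worker through that worker's own queue, so the command is handled in that worker's thread.

```python
import queue
import threading

class Watchdog:
    """Hypothetical central collector; every worker holds a reference to it."""
    def __init__(self):
        self.events = queue.Queue()  # thread-safe, so no extra lock is needed

    def report(self, dev_id, message):
        # put() on an unbounded Queue never blocks, so a worker cannot
        # deadlock waiting on the watchdog.
        self.events.put((dev_id, message))

class BoardWorker(threading.Thread):
    """Hypothetical per-board thread that owns its own command queue."""
    def __init__(self, dev_id, watchdog):
        super().__init__(daemon=True)
        self.dev_id = dev_id
        self.watchdog = watchdog
        self.commands = queue.Queue()

    def send(self, command):
        # Runs in the caller's thread; it only enqueues the command.
        self.commands.put(command)

    def run(self):
        # Runs in this worker's own thread.
        while True:
            command = self.commands.get()
            if command is None:      # sentinel: shut down
                break
            # A real worker would talk to its own board here.
            self.watchdog.report(self.dev_id, f"handled {command!r}")

watchdog = Watchdog()
workers = {dev_id: BoardWorker(dev_id, watchdog) for dev_id in ("board-1", "board-2")}
for w in workers.values():
    w.start()
workers["board-2"].send(("digital_out", 0, True))
print(watchdog.events.get(timeout=5))
```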

What's the pythonic way to deal with worker processes that must coordinate their tasks?

I'm currently learning Python (from a Java background), and I have a question about something I would have used threads for in Java.
My program will use workers to read from some web-service some data periodically. Each worker will call on the web-service at various times periodically.
From what I have read, it's preferable to use the multiprocessing module and set up the workers as independent processes that get on with their data-gathering tasks. On Java I would have done something conceptually similar, but using threads. While it appears I can use threads in Python, I'll lose out on multi-cpu utilisation.
Here's the guts of my question: The web-service is throttled, viz., the workers must not call on it more than x times per second. What is the best way for the workers to check on whether they may request data?
I'm confused as to whether this should be achieved using:
Pipes as a way to communicate to some other 'managing object', which monitors the total calls per second.
Something along the lines of mmap, to share some data/value between the processes that describes whether they may call the web-service.
A Manager() object that monitors the calls per seconds and informs workers if they have permission to make their calls.
Of course, I guess this may come down to how I keep track of the calls per second. I suppose one option would be for the workers to call a function on some other object, which makes the call to the web-service and records the current number of calls/sec. Another option would be for the function that calls the web-service to live within each worker, and for them to message a managing object every time they make a call to the web-service.
Thoughts welcome!
Delegate the retrieval to a separate process which queues the requests until it is their turn.
I think that you'll find that the multiprocessing module will provide you with some fairly familiar constructs.
You might find that multiprocessing.Queue is useful for connecting your worker processes back to a managing process that could provide monitoring or throttling.
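A minimal sketch of the throttling logic such a managing process could own, assuming a sliding one-second window. SlidingWindowLimiter, try_acquire and wait_for_slot are hypothetical names; the clock and sleep functions are injectable so the logic can be checked without real waiting. In a full setup, workers would send their requests over a multiprocessing.Queue and only the managing process would call wait_for_slot() before each web-service call, so only one process ever has to count calls.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls, period=1.0, clock=time.monotonic):
        if max_calls < 1:
            raise ValueError("max_calls must be at least 1")
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of recent calls, oldest first

    def _prune(self, now):
        # Forget calls that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def try_acquire(self):
        """Record a call and return True if one is allowed right now, else return False."""
        now = self.clock()
        self._prune(now)
        if len(self.calls) < self.max_calls:
            self.calls.append(now)
            return True
        return False

    def wait_for_slot(self, sleep=time.sleep):
        """Block until a call is allowed, then record it."""
        while not self.try_acquire():
            # The oldest recorded call leaves the window at calls[0] + period.
            sleep(max(0.0, self.calls[0] + self.period - self.clock()))
```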
Not really an answer to your question, but an alternative approach to your problem: You could get rid of synchronization issues when doing requests event driven, e.g. by using the Python async module or Twisted. You wouldn't benefit from multiple CPUs/cores, but in context of network communication that's usually negligible.

A multi-part/threaded downloader via python?

I've seen a few threaded downloaders online, and even a few multi-part downloaders (HTTP).
I haven't seen them together as a class/function.
If any of you have a class/function lying around, that I can just drop into any of my applications where I need to grab multiple files, I'd be much obliged.
If there is a library/framework (or a program's back-end) that does this, please direct me towards it.
Threadpool by Christopher Arndt may be what you're looking for. I've used this "easy to use object-oriented thread pool framework" for the exact purpose you describe and it works great. See the usage examples at the bottom of the linked page. And it really is easy to use: just define three functions (one of which is an optional exception handler in place of the default handler) and you are on your way.
from http://www.chrisarndt.de/projects/threadpool/:
Object-oriented, reusable design
Provides callback mechanism to process results as they are returned from the worker threads.
WorkRequest objects wrap the tasks assigned to the worker threads and allow for easy passing of arbitrary data to the callbacks.
The use of the Queue class solves most locking issues.
All worker threads are daemonic, so they exit when the main program exits, no need for joining.
Threads start running as soon as you create them. No need to start or stop them. You can increase or decrease the pool size at any time, superfluous threads will just exit when they finish their current task.
You don't need to keep a reference to a thread after you have assigned the last task to it. You just tell it: "don't come back looking for work, when you're done!"
Threads don't eat up cycles while waiting to be assigned a task, they just block when the task queue is empty (though they wake up every few seconds to check whether they are dismissed).
Also available at http://pypi.python.org/pypi/threadpool, easy_install, or as a subversion checkout (see project homepage).
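For comparison, a rough standard-library sketch of a multi-part download (this is not threadpool's API; it uses concurrent.futures.ThreadPoolExecutor instead). It assumes the server honours HTTP Range requests and that you already know the file size, for example from a HEAD request's Content-Length header. split_ranges, fetch_range and download are hypothetical helper names.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def split_ranges(total_size, parts):
    """Split bytes [0, total_size) into at most `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts < 1:
        return []
    step = -(-total_size // parts)  # ceiling division
    return [(start, min(start + step, total_size) - 1)
            for start in range(0, total_size, step)]

def fetch_range(url, start, end):
    """Fetch one byte range; HTTP 206 means the server honoured the Range header."""
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(request) as response:
        if response.status != 206:
            raise RuntimeError(f"server ignored Range header (status {response.status})")
        return start, response.read()

def download(url, total_size, parts=4):
    """Download the parts concurrently, then stitch them back together in order."""
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [pool.submit(fetch_range, url, start, end)
                   for start, end in split_ranges(total_size, parts)]
        chunks = sorted(future.result() for future in futures)
    return b"".join(data for _, data in chunks)
```

Sorting by each chunk's start offset restores the original byte order regardless of which thread finished first.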
