I have an application (actually a plugin for another application) that manages threads to communicate with external sensor devices. The external sensors send events to the application, but the application may also send actions to the sensors. There are several types of devices, and each has unique qualities (temperature, pressure, etc.) that require special coding. All communication with the sensor devices is over IP.
In the application, I create a thread for each instance of a sensor. This is an example of the code:
self.phThreadDict[phDevId] = tempsensor(self, phDevId, phIpAddr, phIpPort, phSerial, self.triggerDict)
self.phThreadDict[phDevId].start()
In each thread I set up callback handlers for events sent by the sensor and then go into a loop at the end:
while not self.shutdown:
    self.plugin.sleep(0.5)
The threads then handle incoming events and make calls into the main thread, or the actual program that spawned the main thread. All of this works quite well.
But, at times I also need to send requests to a specific sensor. Methods are defined in each thread for that purpose and I call those methods from the main thread. For example:
self.phThreadDict[dev.id].writeDigitalOutput(textLine, lcdMessage)
This also works, but I believe the code is actually executed in the main thread rather than in the thread specific to the sensor.
My question is: What options do I have for passing work to the specific target thread and having the thread execute the work and then return success or fail?
Expanding a bit on Thomas Orozco's spot-on comments,
self.phThreadDict[dev.id].writeDigitalOutput(textLine, lcdMessage)
is executed in whichever thread runs it. If you run it from the main thread, then the main thread will do all of it. If from some other thread, then that thread will run it.
In addition to a Queue per thread, for the threads to receive descriptions of work items to process, you also want a single Queue for threads to put results on (you can also use another Queue per thread for this, but that's overkill).
The main thread will pull results off the latter Queue. Note that you can - and it's very common to do so - put tuples on Queues. So, for example, on the talk-back-to-the-main-thread Queue threads will likely put tuples of the form:
(result, my_thread_id, original_work_description)
That's enough to figure out which thread returned what result in response to which work item. Maybe you don't need all of that. Maybe you need more than that. Can't guess ;-)
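A minimal sketch of that arrangement, assuming Python 3 (where the module is named queue rather than Queue). The worker function, the "dev1"/"dev2" ids, and the doubling "work" are made up for illustration; only the shape of the tuples follows the text above.

```python
import queue
import threading

def worker(thread_id, work_q, result_q):
    """Process work items until a None sentinel arrives."""
    while True:
        item = work_q.get()              # blocks until work is available
        if item is None:                 # sentinel: shut down
            break
        result = item * 2                # stand-in for the real sensor request
        # Report back as (result, my_thread_id, original_work_description).
        result_q.put((result, thread_id, item))

result_q = queue.Queue()                 # single shared talk-back queue
work_qs = {}                             # one work queue per thread
threads = []
for tid in ("dev1", "dev2"):
    work_qs[tid] = queue.Queue()
    t = threading.Thread(target=worker, args=(tid, work_qs[tid], result_q))
    t.start()
    threads.append(t)

work_qs["dev1"].put(10)
work_qs["dev2"].put(21)

# The main thread pulls results off the shared queue.
results = sorted(result_q.get(timeout=5) for _ in range(2))

for q in work_qs.values():
    q.put(None)
for t in threads:
    t.join()
```

The thread id in each tuple is what lets the main thread match a result to the worker that produced it, since results arrive in whatever order the workers finish.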
Indeed, this is executing code in the main thread.
Use queues, that's what they're meant for (task synchronization and message passing between threads).
Use one queue per sensor manager thread.
Your sensor manager threads should be getting items from the queue instead of sleeping (this is a blocking call).
Your "main" thread should be putting items in the queue instead of running functions (this is generally a non-blocking call).
All you need to do is define a message format that lets the main thread tell the manager threads what functions to execute and what arguments to use.
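One possible message format, as a rough sketch: a (method_name, args) tuple that the sensor thread looks up on itself with getattr. SensorThread is a hypothetical stand-in for the question's tempsensor class, and its writeDigitalOutput only echoes its arguments instead of talking to a device. This assumes Python 3 (queue instead of Queue).

```python
import queue
import threading

class SensorThread(threading.Thread):
    """Hypothetical stand-in for the question's per-sensor thread class."""

    def __init__(self, dev_id, results):
        super().__init__(daemon=True)
        self.dev_id = dev_id
        self.requests = queue.Queue()    # one request queue per sensor thread
        self.results = results           # shared queue back to the main thread

    def writeDigitalOutput(self, index, state):
        # The real device call would go here; this just echoes the arguments.
        return "output %d set to %s" % (index, state)

    def run(self):
        while True:
            msg = self.requests.get()    # blocks; replaces the sleep(0.5) loop
            if msg is None:              # sentinel asks the thread to exit
                break
            method_name, args = msg
            try:
                value = getattr(self, method_name)(*args)
                self.results.put((self.dev_id, True, value))
            except Exception as exc:     # report failure instead of dying
                self.results.put((self.dev_id, False, exc))

results = queue.Queue()
sensor = SensorThread("dev42", results)
sensor.start()

# Main thread: enqueue the request instead of calling the method directly.
sensor.requests.put(("writeDigitalOutput", (3, True)))
reply = results.get(timeout=5)           # (dev_id, success flag, value or error)

sensor.requests.put(None)
sensor.join()
```

With this layout the method body runs inside the sensor's own thread, and the success-or-fail answer comes back as the second element of the reply tuple.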
I'm using the Python threading library. Works fine (subject to the Global Interpreter Lock, of course).
Now I have a conundrum. I have two separate sources of concurrency: either two Queues, or a Queue and a Condition. How can I wait for the first one that is ready? (They have to be separate objects since they are owned by different modular parts of my application.)
Windows has the WaitForMultipleObjects function; is there something similar for Python concurrency primitives?
I don't know of an existing function that does what you're asking about. However, there is threading.enumerate(), which returns a list of all Thread objects that are currently alive (daemon threads included), no matter the source. Once you have that list you could iterate over it looking for the condition you want. To mark a thread as a daemon, call thread.setDaemon(True) before the thread is started.
I can't say for sure that this is your answer. I don't have as much experience as you apparently do, but I looked this up in a book I have, The Python Standard Library by Example by Doug Hellmann. He has 23 pages on managing concurrent operations in the section on threading, and enumerate seemed to be something that would help.
You could create a new synchronization object (queue, condition, etc.), let's call it the ready_event, and one thread for each sync object you want to watch. Each thread waits for its sync object to be ready; when it is, the thread signals that via the ready_event. After you have created and started the threads, you can wait on that ready_event.
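A small sketch of the watcher-thread idea, using a queue as the shared "ready" object so the waiter also learns which source fired. The names watch, q_a, q_b, and ready are made up for illustration, and this assumes Python 3.

```python
import queue
import threading

def watch(name, source, ready):
    """Block on one source queue and forward each item to the shared queue."""
    while True:
        item = source.get()
        if item is None:                 # sentinel: stop watching
            break
        ready.put((name, item))

q_a = queue.Queue()                      # owned by one part of the application
q_b = queue.Queue()                      # owned by another part
ready = queue.Queue()                    # the single object the waiter blocks on

watchers = [
    threading.Thread(target=watch, args=(name, q, ready), daemon=True)
    for name, q in (("a", q_a), ("b", q_b))
]
for w in watchers:
    w.start()

q_b.put("hello")
first = ready.get(timeout=5)             # returns as soon as either source has an item

for q in (q_a, q_b):
    q.put(None)
for w in watchers:
    w.join()
```

Note that each watcher consumes the item from its source, so the item has to travel along on the shared queue. For a Condition, the watcher would instead wait on it and put a token on the shared queue once woken.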
My code spawns a number of threads to manage communications with a number of I/O boards. Generally the threads receive events from the boards and update external data sources as necessary. The threads (1 or more) are invoked as:
phThreadDict[devId] = ifkit(self, phDevId, phIpAddr, phIpPort, phSerial)
phThreadDict[devId].start()
This works fine. However, in some cases I also need the thread to send a message to the boards. The thread contains a method that does the work and I call that method, from the main thread, as: (this example turns on a digital output)
phThreadDict[devId].writeDigitalOutput(digitalOut, True)
This is the method contained in the thread:
def writeDigitalOutput(self, index, state):
    interfaceKit.setOutputState(index, state)
threading.enumerate() produces:
{134997634: <ifkit(Thread-1, started daemon)>, 554878244: <ifkit(Thread-3, started daemon)>, 407897606: <tempsensor(Thread-4, started daemon)>}
and the instance is:
<ifkit(Thread-3, started daemon)>
This works fine if I have only one thread. But, if I have multiple threads, only one is used - the choice appears to be made at random when the program starts.
I suspect that storing the thread identifier in the dict is the problem, but still, it works with one thread.
Instead of storing your threads in a "simple" associative array, maybe you should instantiate a thread pool beforehand (you can find an example implementation here: http://code.activestate.com/recipes/577187-python-thread-pool/ or directly use the following library: http://pypi.python.org/pypi/threadpool).
Also instantiate a "watchdog". Each of your threads will hold a reference to this watchdog, so when a thread needs to do its callback it sends the info back to the watchdog (beware of deadlocks; see http://dabeaz.blogspot.fr/2009/11/python-thread-deadlock-avoidance_20.html).
The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the subscriptions in JSON form through the RabbitMQ message broker.
I have tried several strategies, and the best so far is the following, which is still not fully working:
Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.
I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses on channel [B], and the consumer waits for requests on channel [A] and sends responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so only one task is executed per consumer at a time.
What I need in the end:
the consumer [A] subscribes his tasks (around 5k each time) to the cluster
the broker dispatches N messages/requests for each consumer, where N is the number of concurrent tasks it can handle
when a single task is finished, the consumer replies to the broker/producer with the result
the producer receives the replies, updates the computation status and, in the end, prints some reports
Restrictions:
If another user submits work, all of his tasks will be queued after the previous user (I guess this is automatically true from the queue system, but I haven't thought about the implications on a threaded environment)
Tasks must be submitted in order, but the order in which the replies arrive is not important
UPDATE
I have studied a bit further, and my actual problem seems to be that I use a simple function as the callback to pika's SelectConnection.channel.basic_consume() function. My last (unimplemented) idea is to pass a threading function instead of a regular one, so the callback would not block and the consumer can keep listening.
As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this depending on what your callback does.
If your callback is IO-bound (doing lots of networking or disk IO) you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one piece of Python code is ever running at a time in a single Python process. This means that if you are doing lots of computation in Python code, these solutions will likely not be much faster than what you already have.
Another option would be to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful when doing parallel work. You could implement this either by using a Queue, with the parent process being the consumer and farming out work to its children, or by simply starting up multiple processes which each consume on their own. I would suggest, unless your application is highly concurrent (1000s of workers), simply starting multiple workers, each of which consumes from its own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than the request simply being lost.
A last option, if you control the producer and it is also written in Python, is to use a task library like celery to abstract the task/queue workings for you. I have used celery for several large projects and have found it to be very well written. It will also handle the multiple consumer issues for you with the appropriate configuration.
Your setup sounds good to me. And you are right, you can simply set the callback to start a thread and chain that to a separate callback when the thread finishes to queue the response back over Channel B.
Basically, your consumers should have a queue of their own (of size N, the amount of parallelism they support). When a request comes in via Channel A, the callback should store the request in the queue shared between the main thread running Pika and the worker threads in the thread pool. As soon as it is queued, Pika should respond with an ACK, and a worker thread would wake up and start processing.
Once the worker is done with its work, it would queue the result on a separate result queue and issue a callback to the main thread to send it back to the producer.
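A rough sketch of that hand-off with the broker left out entirely: on_request is a hypothetical stand-in for the consume callback, compute stands in for the slow task, and the JSON fields are invented. It assumes Python 3 and uses concurrent.futures for the thread pool. No Pika calls are shown, so the ACK and the publish on Channel B would still have to be wired in.

```python
import json
import queue
from concurrent.futures import ThreadPoolExecutor

N = 4                                    # assumed to match the prefetch count
pool = ThreadPoolExecutor(max_workers=N)
replies = queue.Queue()                  # results waiting to be sent on Channel B

def compute(task):
    """Stand-in for the real, slow computation."""
    return {"id": task["id"], "result": task["x"] ** 2}

def on_request(body):
    """Hypothetical consume callback: returns at once, work runs in the pool."""
    task = json.loads(body)
    future = pool.submit(compute, task)
    # When the worker finishes, queue the encoded reply for the main thread.
    future.add_done_callback(lambda f: replies.put(json.dumps(f.result())))

# Simulate three incoming requests.
for i in range(3):
    on_request(json.dumps({"id": i, "x": i + 1}))

pool.shutdown(wait=True)

# The main thread drains the reply queue and would publish each item.
published = sorted(json.loads(replies.get(timeout=5))["result"] for _ in range(3))
```

Funneling replies through a queue keeps all broker traffic in one thread, which is the safer assumption unless the client library documents its connection as thread-safe.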
You should take care and make sure that the worker threads are not interfering with each other if they are using any shared resources, but that's a separate topic.
Being inexperienced in threading, my setup would run multiple consumer processes (the number of which is basically your prefetch count). Each would connect to the two queues and they would process jobs happily, unaware of each other's existence.
In my program I have a bunch of threads running and I'm trying
to interrupt the main thread to get it to do something asynchronously.
So I set up a handler and send the main process a SIGUSR1 - see the code
below:
def SigUSR1Handler(signum, frame):
    self._logger.debug('Received SIGUSR1')
    return
signal.signal(signal.SIGUSR1, SigUSR1Handler)
[signal.signal(signal.SIGUSR1, signal.SIG_IGN)]
In the above case, all the threads and the main process stop - from a C
point of view this was unexpected - I want the threads to continue as they
were before the signal. If I put the SIG_IGN in instead, everything continues
fine.
Can somebody tell me how to do this? Maybe I have to do something with the 'frame'
manually to get back to where it was..just a guess though
thanks in advance,
Thanks for your help on this.
To explain a bit more, I have thread instances writing string information to
a socket which is also output to a file. These threads run their own timers so they
independently write their outputs to the socket. When the program runs I also see
their output on stdout but it all stops as soon as I see the debug line from the signal.
I need the threads to constantly send this info but I need the main program to
take a command so it also starts doing something else (in parallel) for a while.
I thought I'd just be able to send a signal from the command line to trigger this.
Mixing signals and threads is always a little precarious. What you describe should not happen, however. Python only handles signals in the main thread. If the OS delivered the signal to another thread, that thread may be briefly interrupted (when it's performing, say, a system call) but it won't execute the signal handler. The main thread will be asked to execute the signal handler at the next opportunity.
What are your threads (including the main thread) actually doing when you send the signal? How do you notice that they all 'stop'? Is it a brief pause (easily explained by the fact that the main thread will need to acquire the GIL before handling the signal) or does the process break down entirely?
I'll sort-of answer my own question:
In my first attempt at this I was using time.sleep(run_time) in the main
thread to control how long the threads ran until they were stopped. By adding
debug I could see that the sleep loop seemed to be exiting as soon as the
signal handler returned so everything was shutting down normally but early!
I've replaced the sleep with a while loop and that doesn't jump out after
the signal handler returns so my threads keep running. So it solves the
problem but I'm still a bit puzzled about sleep()'s behaviour.
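On the sleep() puzzle: in Python 2, and in Python 3 before 3.5, time.sleep() could return early when a signal handler ran in the main thread. Since Python 3.5 (PEP 475) it goes back to sleep for the remaining time unless the handler raises an exception. A deadline loop like the sketch below behaves the same either way; sleep_full is a made-up helper name, and time.monotonic() needs Python 3.3 or later.

```python
import time

def sleep_full(seconds):
    """Sleep for at least `seconds`, even if individual sleeps are cut short."""
    deadline = time.monotonic() + seconds
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        time.sleep(remaining)            # may return early on older Pythons

start = time.monotonic()
sleep_full(0.2)
elapsed = time.monotonic() - start
```

The loop recomputes the remaining time after every wake-up, so an interrupted sleep only costs one extra iteration.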
You should probably use a threading.Condition variable instead of sending signals. Have your main thread check it every loop and perform its special operation if it's been set.
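A minimal sketch of the check-it-every-loop approach. It uses threading.Event rather than a Condition, swapped in here because a simple flag is all this pattern needs. The Timer stands in for whatever would request the special operation, and the sleep stands in for the main thread's regular work.

```python
import threading
import time

command = threading.Event()              # set by whoever wants the main loop's attention
handled = []

def main_loop(iterations):
    for i in range(iterations):
        if command.is_set():
            command.clear()
            handled.append(i)            # stand-in for the "special operation"
        time.sleep(0.01)                 # stand-in for the regular per-loop work

# Something else (another thread, a socket listener, ...) sets the flag later.
threading.Timer(0.05, command.set).start()
main_loop(30)
```

The other threads are never interrupted here; they keep running while the main loop notices the flag on its next pass.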
If you insist on using signals, you'll want to move to using subprocess instead of threads, as your problem is likely due to the GIL.
Watch this presentation by David Beazley.
http://blip.tv/file/2232410
It also explains some quirky behavior related to threads and signals (Python specific, not the general quirkiness of the subject :-) ).
http://pyprocessing.berlios.de/ Pyprocessing is a neat library that makes it easier to work with separate processes in Python.
I have a wxPython application (http://www.OpenSTV.org) that counts ballots using methods that have multiple rounds. I'd like to do two things:
(1) For a large number of ballots, this can be a bit slow, so I'd like to show the user a progress dialog so he doesn't think the application is frozen.
(2) I'd like to allow the user to break ties manually, and this requires the counting code to show a dialog window.
To achieve (1), I create a thread to run the counting code, and this allows me to present a nice progress dialog to the user.
The problem with this, however, is that the counting code is not the main thread, and only the main thread in wxPython can process window events.
I suppose I could create a thread to run the progress dialog instead, but this seems awkward. Is there a better way of accomplishing both (1) and (2)?
Use Queue to communicate and synchronize among threads, with each thread "owning" and exclusively interacting with a resource that's not handy to share.
In GUI toolkits where only the main thread can really handle the GUI, the main thread should play along -- set up and start the threads doing the actual work, then do nothing but GUI work, using Queues to communicate to and from the other threads.
For (1), when your counting thread has an update, it should put it to the Queue where the main thread is waiting; when your main thread gets a suitable message on that Queue, it updates the progress dialog.
For (2), the counting thread sends the "have the user break a tie" request, main thread gets it and responds appropriately, and sends back the resolution on a separate Queue.
So in general, there are two kinds of communications: those that don't require a response, and those that do. For the former kind, just put the notification on the appropriate queue and simply proceed -- it will be acted on in due course. For the latter kind, my favorite idiom is to put on the appropriate queue a pair (request, response_queue). If otherwise identical requests differ in that some need a response and others don't, queueing (request, None) when no response is needed (and (request, q) where q's a Queue when a response IS needed) is a nice, easy, and general idiom, too.
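A small sketch of the (request, response_queue) idiom, assuming Python 3 (queue instead of Queue). The message kinds "progress", "break_tie", and "done", and the gui_loop function, are made up for illustration; no wx code is involved.

```python
import queue
import threading

requests = queue.Queue()                 # counting thread -> main thread

def counting_thread(log):
    # Notification: no reply needed, so the response queue is None.
    requests.put((("progress", 50), None))
    # Request that needs an answer: attach a private reply queue and wait on it.
    reply_q = queue.Queue()
    requests.put((("break_tie", ["Alice", "Bob"]), reply_q))
    log.append(reply_q.get(timeout=5))
    requests.put((("done", None), None))

def gui_loop():
    """Stand-in for the main (GUI) thread's handling of incoming messages."""
    seen = []
    while True:
        (kind, payload), reply_q = requests.get(timeout=5)
        seen.append(kind)
        if kind == "break_tie":
            reply_q.put(payload[0])      # a real GUI would ask the user here
        elif kind == "done":
            return seen

log = []
t = threading.Thread(target=counting_thread, args=(log,))
t.start()
seen = gui_loop()
t.join()
```

In a real GUI the main thread cannot sit in a blocking get() like this; it would more likely poll the queue with get_nowait() from a timer or idle handler so the event loop stays responsive.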
There are several ways to call into the main wxPython thread from a worker thread. The simplest is wx.CallAfter(), which always executes the function passed to it in the main thread. You can also use wx.PostEvent(); there's an example of this in the demo (labeled: Threads). There are also several more complicated but more customizable ways, which are discussed in the last chapter of wxPython in Action.