twisted reactor invocations on separate thread

I am working on an application that fetches data over TCP using the Twisted API.
Our process is a listener application that keeps listening for events and does the following:
It processes each event notification and builds a dictionary to send to a third-party application.
To complete the dictionary, it calls out through the Twisted API to fetch some additional data.
I cannot make the Twisted calls on the main thread, because after a single execution the reactor stops and the main thread makes no further progress.
What I want is:
For each event notification, spawn a new thread that makes a Twisted call over TCP to get the data.
Join the new thread from the main thread to wait for its completion.
Get the results, merge them into the half-built dictionary, and send it to the third-party app.
Let's say I receive eventObj1 in the main thread.
Processing involves several steps, for example step1, step2, step3, step4, and then sending to the third party.
Assume step4 involves fetching data over TCP, and we have to wait until the results are available before we can finish the dictionary and send it to the third party.
So when eventObj1 arrives, I queue it for fetching over TCP. While doing so I call reactor.start() (the reactor is started in the main thread); after some time I get the data, the callback is invoked, the dict is built for event1, and it is sent to the third party.
But I can't queue up any more events for data fetching until I call reactor.stop(), because until reactor.stop() is called the main thread can't go back to process eventObj2.
So I think what I need is to start the reactor in a separate thread, keep queuing up events from the main thread, and stop the reactor before the main program exits.

You don't need any threads. You just want the reactor to do multiple things, which is actually the whole point of having a reactor. See this question for an explanation: Twisted reactor starting multiple times in a single program?

Related

Interruptable Sleep?

I am currently building a python app which should trigger functions at given timestamps entered by the user (not entered in chronological order).
I ran into a problem: I don't want my program to busy-wait, checking whether a new timestamp has been entered that must be added to the timer queue, but I also don't want to create a separate thread for every new timestamp whose sole purpose is to wait for that timestamp.
What I thought of is putting it all together in one thread and doing something like an interruptable sleep, but I can't think of another way besides this:
while timer_not_depleted:
    sleep(1)
    if something_happened:
        break
which is essentially busy-waiting.
So any suggestions on realizing an interruptable sleep?

Your intuition of using threads is correct. The following master-worker construction can work:
The master thread spawns a worker thread that waits for "jobs";
The two threads share a Queue - whenever a new job needs to be scheduled, the master thread puts a job specification into the queue;
Meanwhile, the worker thread does the following:
Maintain a separate list of future jobs to run and keep track of how long to keep sleeping until the next job runs;
Continue listening to new jobs by calling Queue.get(block=True, timeout=<time-to-next-job>);
In this case, if no new jobs are scheduled until the timeout, Queue.get will raise Empty exception - and at this point the worker thread should run the scheduled function and get back to polling. If a new job is scheduled in the meantime, Queue.get returns the new job, such that you can update the timeout value and then get back to waiting.
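The master-worker construction described above can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the names worker, jobs, STOP and fired are hypothetical, jobs are assumed to be (run_at, func) tuples using time.monotonic() timestamps, and a heap stands in for the "list of future jobs".

```python
import heapq
import queue
import threading
import time

def worker(jobs, stop_token):
    """Run each (run_at, func) job at its time; wake early when a new job arrives."""
    pending = []   # min-heap of (run_at, sequence, func)
    seq = 0        # tie-breaker so the heap never has to compare functions
    while True:
        # Sleep until the earliest job is due, or indefinitely if nothing is pending.
        timeout = max(0.0, pending[0][0] - time.monotonic()) if pending else None
        try:
            item = jobs.get(block=True, timeout=timeout)
        except queue.Empty:
            # Timed out without a new job: the earliest pending job is due.
            _, _, func = heapq.heappop(pending)
            func()
            continue
        if item is stop_token:
            return
        run_at, func = item
        heapq.heappush(pending, (run_at, seq, func))
        seq += 1

# Demo: jobs are submitted out of chronological order.
jobs = queue.Queue()
STOP = object()
fired = []
thread = threading.Thread(target=worker, args=(jobs, STOP))
thread.start()
now = time.monotonic()
jobs.put((now + 0.2, lambda: fired.append("late")))
jobs.put((now + 0.1, lambda: fired.append("early")))
time.sleep(0.5)
jobs.put(STOP)
thread.join()
```

The worker never polls: it is either blocked in Queue.get or running a job, and putting a new job on the queue is what interrupts the sleep.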

I'd like to suggest select.
Call it with a timeout equal to the delay until the nearest event (a heap queue is a good data structure for maintaining a queue of future timestamps) and pass it a socket (as an item in the rlist argument) on which your program listens for updates from the user.
The select call returns when the socket has incoming data or when the timeout has occurred.
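A rough sketch of the select approach, under several assumptions that are not in the answer: a socket.socketpair() stands in for the socket that carries user updates, each update is a text line of the form "delay name", and the hypothetical run_timers function returns once no timers remain (a real program would keep listening).

```python
import heapq
import select
import socket
import time

def run_timers(sock, timers):
    """Fire (deadline, name) timers from a min-heap; sock delivers "delay name" lines."""
    fired, buf = [], b""
    while timers:
        timeout = max(0.0, timers[0][0] - time.monotonic())
        readable, _, _ = select.select([sock], [], [], timeout)
        if readable:
            data = sock.recv(4096)
            if not data:          # peer closed the socket: treat as shutdown
                break
            buf += data
            *lines, buf = buf.split(b"\n")   # keep any partial line for later
            for line in lines:
                delay, name = line.decode().split(maxsplit=1)
                heapq.heappush(timers, (time.monotonic() + float(delay), name))
        # Fire everything that is due, whether select timed out or was interrupted.
        now = time.monotonic()
        while timers and timers[0][0] <= now:
            fired.append(heapq.heappop(timers)[1])
    return fired

user_side, timer_side = socket.socketpair()
timers = [(time.monotonic() + 0.3, "scheduled-first")]
user_side.sendall(b"0.1 added-later\n")   # arrives while the loop is "sleeping"
order = run_timers(timer_side, timers)
user_side.close()
timer_side.close()
```

The incoming line interrupts the wait, the new timer is pushed onto the heap, and the next select call uses the shorter timeout.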

Python message passing to unique threads

I have an application (actually a plugin for another application) that manages threads to communicate with external sensor devices. The external sensors send events to the application, but the application may also send actions to the sensors. There are several types of devices and each has unique qualities (temperature, pressure, etc.) that require special coding. All communication with the sensor devices is over IP.
In the application, I create a thread for each instance of a sensor. This is an example of the code:
self.phThreadDict[phDevId] = tempsensor(self, phDevId, phIpAddr, phIpPort, phSerial, self.triggerDict)
self.phThreadDict[phDevId].start()
In each thread I set up callback handlers for events sent by the sensor and then go into a loop at the end:
while not self.shutdown:
    self.plugin.sleep(0.5)
The threads then handle incoming events and make calls into the main thread, or the actual program that spawned the main thread. All of this works quite well.
But, at times I also need to send requests to a specific sensor. Methods are defined in each thread for that purpose and I call those methods from the main thread. For example:
self.phThreadDict[dev.id].writeDigitalOutput(textLine, lcdMessage)
This also works, but I believe the code is actually executed in the main thread rather than in the thread specific to the sensor.
My question is: What options do I have for passing work to the specific target thread and having the thread execute the work and then return success or fail?

Expanding a bit on Thomas Orozco's spot-on comments,
self.phThreadDict[dev.id].writeDigitalOutput(textLine, lcdMessage)
is executed in whichever thread runs it. If you run it from the main thread, then the main thread will do all of it. If from some other thread, then that thread will run it.
In addition to a Queue per thread, for the threads to receive descriptions of work items to process, you also want a single Queue for threads to put results on (you can also use another Queue per thread for this, but that's overkill).
The main thread will pull results off the latter Queue. Note that you can - and it's very common to do so - put tuples on Queues. So, for example, on the talk-back-to-the-main-thread Queue threads will likely put tuples of the form:
(result, my_thread_id, original_work_description)
That's enough to figure out which thread returned what result in response to which work item. Maybe you don't need all of that. Maybe you need more than that. Can't guess ;-)
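As an illustration of the queue layout described here, the following sketch gives each thread its own inbox queue and shares one results queue. Everything in it is hypothetical: SensorThread and write_digital_output are stand-ins for the real sensor classes, the work item format (method_name, args) is just one possible message format, and None is used as a shutdown sentinel.

```python
import queue
import threading

class SensorThread(threading.Thread):
    """Hypothetical sensor thread that executes requests taken from its own inbox."""

    def __init__(self, dev_id, results):
        super().__init__(name=dev_id)
        self.dev_id = dev_id
        self.inbox = queue.Queue()   # work items for this thread only
        self.results = results       # single queue shared by all threads

    def write_digital_output(self, text):
        # Placeholder for real device I/O; reports which thread executed it.
        return "%s wrote %r" % (threading.current_thread().name, text)

    def run(self):
        while True:
            work = self.inbox.get()      # blocks until the main thread sends work
            if work is None:             # sentinel: shut down
                break
            method_name, args = work
            try:
                outcome = ("ok", getattr(self, method_name)(*args))
            except Exception as exc:
                outcome = ("fail", exc)
            # (result, my_thread_id, original_work_description)
            self.results.put((outcome, self.dev_id, work))

results = queue.Queue()
threads = {dev_id: SensorThread(dev_id, results) for dev_id in ("temp1", "press1")}
for t in threads.values():
    t.start()

# The main thread sends a message instead of calling the method directly.
threads["temp1"].inbox.put(("write_digital_output", ("hello",)))
reply = results.get(timeout=5)

for t in threads.values():
    t.inbox.put(None)
    t.join()
```

The reply shows that the method ran in the sensor's own thread, and the tuple carries enough information for the main thread to match the result to the request.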

Indeed, this is executing code in the main thread.
Use queues, that's what they're meant for (task synchronization and message passing between threads).
Use one queue per sensor manager thread.
Your sensor manager threads should be getting items from the queue instead of sleeping (this is a blocking call).
Your "main" thread should be putting items in the queue instead of running functions (this is generally a non-blocking call).
All you need to do is define a message format that lets the main thread tell the manager threads what functions to execute and what arguments to use.

Non-blocking server in Twisted

I am building an application that needs to run a TCP server on a thread other than the main. When trying to run the following code:
reactor.listenTCP(ServerConfiguration.tcpport, TcpCommandFactory())
reactor.run()
I get the following error
exceptions.ValueError: signal only works in main thread
Can I run the twisted servers on threads other than the main one?

Twisted can run in any thread - but only one thread at a time. If you want to run in the non-main thread, simply do reactor.run(installSignalHandlers=False). However, you cannot use a reactor on the non-main thread to spawn subprocesses, because their termination will never be detected. (This is a limitation of UNIX, really, not of Twisted.)

What's the best pattern to design an asynchronous RPC application using Python, Pika and AMQP?

The producer module of my application is run by users who want to submit work to be done on a small cluster. It sends the subscriptions in JSON form through the RabbitMQ message broker.
I have tried several strategies, and the best so far is the following, which is still not fully working:
Each cluster machine runs a consumer module, which subscribes itself to the AMQP queue and issues a prefetch_count to tell the broker how many tasks it can run at once.
I was able to make it work using SelectConnection from the Pika AMQP library. Both consumer and producer start two channels, one connected to each queue. The producer sends requests on channel [A] and waits for responses on channel [B], and the consumer waits for requests on channel [A] and sends responses on channel [B]. It seems, however, that when the consumer runs the callback that calculates the response, it blocks, so only one task is executed per consumer at a time.
What I need in the end:
the producer [A] submits its tasks (around 5k each time) to the cluster
the broker dispatches N messages/requests to each consumer, where N is the number of concurrent tasks it can handle
when a single task is finished, the consumer replies to the broker/producer with the result
the producer receives the replies, updates the computation status and, in the end, prints some reports
Restrictions:
If another user submits work, all of his tasks will be queued after the previous user (I guess this is automatically true from the queue system, but I haven't thought about the implications on a threaded environment)
Tasks have an order to be submitted, but the order they are replied is not important
UPDATE
I have studied a bit further and my actual problem seems to be that I use a simple function as the callback for pika's SelectConnection.channel.basic_consume() function. My latest (unimplemented) idea is to pass a threaded function, instead of a regular one, so the callback would not block and the consumer can keep listening.

As you have noticed, your process blocks when it runs a callback. There are several ways to deal with this depending on what your callback does.
If your callback is IO-bound (doing lots of networking or disk IO) you can use either threads or a greenlet-based solution, such as gevent, eventlet, or greenhouse. Keep in mind, though, that Python is limited by the GIL (Global Interpreter Lock), which means that only one thread executes Python bytecode at any given time within a single Python process. This means that if you are doing lots of computation in Python code, these solutions will likely not be much faster than what you already have.
Another option would be to implement your consumer as multiple processes using multiprocessing. I have found multiprocessing to be very useful when doing parallel work. You could implement this by either using a Queue, having the parent process being the consumer and farming out work to its children, or by simply starting up multiple processes which each consume on their own. I would suggest, unless your application is highly concurrent (1000s of workers), to simply start multiple workers, each of which consumes from their own connection. This way, you can use the acknowledgement feature of AMQP, so if a consumer dies while still processing a task, the message is sent back to the queue automatically and will be picked up by another worker, rather than simply losing the request.
A last option, if you control the producer and it is also written in Python, is to use a task library like celery to abstract the task/queue workings for you. I have used celery for several large projects and have found it to be very well written. It will also handle the multiple consumer issues for you with the appropriate configuration.

Your setup sounds good to me. And you are right, you can simply set the callback to start a thread and chain that to a separate callback when the thread finishes to queue the response back over Channel B.
Basically, your consumers should have a queue of their own (of size N, the amount of parallelism they support). When a request comes in via Channel A, the consumer should store it in the queue shared between the main thread running Pika and the worker threads in the thread pool. As soon as it is queued, pika should respond with an ACK, and a worker thread would wake up and start processing.
Once the worker is done with its work, it would put the result on a separate result queue and issue a callback to the main thread to send it back to the producer.
You should take care and make sure that the worker threads are not interfering with each other if they are using any shared resources, but that's a separate topic.
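The hand-off between the consume callback and a pool of workers could look roughly like the sketch below. It deliberately leaves Pika out: on_request is a hypothetical stand-in for the consume callback, compute is a placeholder task, N is an assumed parallelism matching the prefetch count, and done plays the role of the separate result queue that the main thread would drain to publish replies on Channel B.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

N = 4                                    # assumed parallelism (the prefetch count)
pool = ThreadPoolExecutor(max_workers=N)
done = queue.Queue()                     # results waiting to be sent back

def compute(body):
    # Placeholder for the slow task that currently blocks the callback.
    return body.upper()

def on_request(body):
    """Stand-in for the consume callback: returns at once instead of blocking."""
    future = pool.submit(compute, body)
    # When the worker finishes, hand the result over for the main thread to send.
    future.add_done_callback(lambda f: done.put((body, f.result())))

# Demo: three requests arrive; the callback returns immediately for each.
for msg in ("a", "b", "c"):
    on_request(msg)
pool.shutdown(wait=True)                 # wait for the workers to finish
replies = sorted(done.get_nowait() for _ in range(3))
```

Keeping the sending side in the main thread means the worker threads never touch the connection, which sidesteps one of the shared-resource concerns mentioned above.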

Being inexperienced in threading, I would run multiple consumer processes (as many as your prefetch count, basically). Each would connect to the two queues, and they would process jobs happily, unaware of each other's existence.

Parent Thread exiting before Child Threads [python]

I'm using Python in a webapp (CGI for testing, FastCGI for production) that needs to send an occasional email (when a user registers or something else important happens). Since communicating with an SMTP server takes a long time, I'd like to spawn a thread for the mail function so that the rest of the app can finish up the request without waiting for the email to finish sending.
I tried using thread.start_new(func, (args)), but the parent returns and exits before the sending is complete, thereby killing the sending thread before it does anything useful. Is there any way to keep the process alive long enough for the child thread to finish?

Take a look at the Thread.join() method in the threading module. It blocks the calling thread until the child thread has returned, which prevents the caller from exiting before it should.
Update:
To avoid making your main thread unresponsive to new requests you can use a while loop.
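import threading
import time

# active_count() includes the main thread, so compare against 1, not 0
while threading.active_count() > 1:
    # ... look for new requests to handle ...
    time.sleep(0.1)
# or try joining your threads with a timeout
#for thread in my_threads:
#    thread.join(0.1)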
Update 2:
It also looks like thread.start_new(func, args) is obsolete. It was updated to thread.start_new_thread(function, args[, kwargs]). (In Python 3 the low-level thread module is named _thread.) You can also create threads with the higher-level threading module (this is the module that provides active_count() in the previous code block):
import threading
my_thread = threading.Thread(target=func, args=(), kwargs={})
my_thread.daemon = False  # non-daemon (the default): the interpreter waits for this thread before exiting
my_thread.start()

You might want to use threading.enumerate, if you have multiple workers and want to see which one(s) are still running.
Other alternatives include using threading.Event: the main thread sets the event and starts the worker thread off. The worker thread clears the event when it finishes work, and the main thread checks whether the event is set or cleared to figure out if it can exit.
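A small sketch of the threading.Event idea follows. Note that it uses the opposite (and arguably more common) convention from the one described: the event starts cleared and the worker sets it when the work is finished, which lets the main thread block in Event.wait() instead of polling. The names send_mail and outbox are hypothetical, and time.sleep stands in for the slow SMTP conversation.

```python
import threading
import time

def send_mail(done, outbox):
    time.sleep(0.1)            # stands in for the slow SMTP conversation
    outbox.append("sent")
    done.set()                 # signal that the work is finished

done = threading.Event()
outbox = []
threading.Thread(target=send_mail, args=(done, outbox)).start()

# ... the rest of the request handling would go here ...

# Before exiting, wait (with an upper bound) for the worker to signal completion.
finished = done.wait(timeout=5)
```

Event.wait() returns True as soon as the event is set, or False if the timeout expires first, so the main thread can decide whether it is safe to exit.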
