Python - while loop inside twisted main loop? - python

I need to know if its possible to run a while loop inside a twisted websocket main loop
the while loop im referring to is the lib you see in this question: shout-python segmentation fault how can I fix this?
all I need it to do is send the new title once it updates, though I can handle that part. Its that while self.running: in the play() function. If you can help I'll surely appreciate it.

For Twisted's single-threaded, cooperative multitasking system to operate at its best, it's important that any particular piece of code running in the reactor thread not run for too long without giving control back to the reactor. As long as any one piece of code is running in that thread, no other code is running in that thread. In a single-threaded, cooperative multitasking system that means other events aren't being serviced.
Depending on your application, it may be fine for a single piece of code to run without giving up control for many milliseconds, many seconds, perhaps even minutes. It's entirely dependent on what events your application is responsible for handling and what level of responsiveness you want to get from it. When writing general purpose library code for a system like this, most people assume that it's only okay to run code for a single task for a handful of milliseconds or so before giving up control - to err on the side of being suitable for use in more applications rather than fewer (although people rarely consider the exact time limit, mostly operations are separated into "pretty quick" and everything else).
What's almost always unacceptable is to run a single piece of code indefinitely without giving control back to the reactor. The loop in the answer you linked to is effectively infinite and so it will hold control for an arbitrarily long period of time (perhaps for most of the runtime of the program). There are few if any applications that can tolerate this since the result is that other events will never be handled. If it's tolerable for your application to be unable to respond to any events while it spends its entire run time working on a single task then you may not need a multitasking system at all (ie, you may not need Twisted, you may be able to just use a while loop).
The loop in that answer is basically a "process some data as quickly as possible" loop. There are a few options for implementations of this kind of work in ways that are more multitasking-friendly.
One way is a literal translation of the loop into a pattern that's friendly to the reactor. You can do this with a generator:
from twisted.internet.task import cooperate
class Audio(object):
def play(self):
# ... setup stuff ...
while self.running:
# ... one chunk of work ...
yield
def main():
...
cooperate(Audio().play())
cooperate takes an iterator and iterates over it - but not all at once. It iterates it a few times and then gives up control to the reactor. Then it iterates it a few more times and then gives up control again. This continues until the iterator is exhausted (or the reactor is stopped).
Another slightly less literal translation is based on LoopingCall which takes over responsibility for the looping construct, leaving you only to supply the body of the loop:
from twisted.internet.task import LoopingCall
class Audio(object):
def play(self):
# ... setup stuff ...
LoopingCall(self._play_iteration).start(0)
def _play_iteration(self):
# ... one chunk of work
This gives you control over the rate at which the loop iterates. The 0 passed to start in this example means "as fast as possible" (wait 0 seconds between iterations) - while remaining cooperative with the rest of the system. If you wanted one iteration per second, you would pass 1, etc.
Another less literal option is to use a data flow abstraction - for example, Twisted's native producer/consumer system or the newer tubes library - to set up multitasking-friendly data processing pipelines that are further abstraction from the specific "read, process" loop in the linked answer.

Related

Is this multi-threaded function asynchronous

I'm afraid I'm still a bit confused (despite checking other threads) whether:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
My initial guess is no to both and that proper asynchronous code should be able to run in one thread - however it can be improved by adding threads for example like so:
So I constructed this toy example:
from threading import *
from queue import Queue
import time
def do_something_with_io_lag(in_work):
out = in_work
# Imagine we do some work that involves sending
# something over the internet and processing the output
# once it arrives
time.sleep(0.5) # simulate IO lag
print("Hello, bee number: ",
str(current_thread().name).replace("Thread-",""))
class WorkerBee(Thread):
def __init__(self, q):
Thread.__init__(self)
self.q = q
def run(self):
while True:
# Get some work from the queue
work_todo = self.q.get()
# This function will simiulate I/O lag
do_something_with_io_lag(work_todo)
# Remove task from the queue
self.q.task_done()
if __name__ == '__main__':
def time_me(nmbr):
number_of_worker_bees = nmbr
worktodo = ['some input for work'] * 50
# Create a queue
q = Queue()
# Fill with work
[q.put(onework) for onework in worktodo]
# Launch processes
for _ in range(number_of_worker_bees):
t = WorkerBee(q)
t.start()
# Block until queue is empty
q.join()
# Run this code in serial mode (just one worker)
%time time_me(nmbr=1)
# Wall time: 25 s
# Basically 50 requests * 0.5 seconds IO lag
# For me everything gets processed by bee number: 59
# Run this code using multi-tasking (launch 50 workers)
%time time_me(nmbr=50)
# Wall time: 507 ms
# Basically the 0.5 second IO lag + 0.07 seconds it took to launch them
# Now everything gets processed by different bees
Is it asynchronous?
To me this code does not seem asynchronous because it is Figure 3 in my example diagram. The I/O call blocks the thread (although we don't feel it because they are blocked in parallel).
However, if this is the case I am confused why requests-futures is considered asynchronous since it is a wrapper around ThreadPoolExecutor:
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
data = future.result()
Can this function on just one thread?
Especially when compared to asyncio, which means it can run single-threaded
There are only two ways to have a program on a single processor do
“more than one thing at a time.” Multi-threaded programming is the
simplest and most popular way to do it, but there is another very
different technique, that lets you have nearly all the advantages of
multi-threading, without actually using multiple threads. It’s really
only practical if your program is largely I/O bound. If your program
is processor bound, then pre-emptive scheduled threads are probably
what you really need. Network servers are rarely processor bound,
however.
First of all, one note: concurrent.futures.Future is not the same as asyncio.Future. Basically it's just an abstraction - an object, that allows you to refer to job result (or exception, which is also a result) in your program after you assigned a job, but before it is completed. It's similar to assigning common function's result to some variable.
Multithreading: Regarding your example, when using multiple threads you can say that your code is "asynchronous" as several operations are performed in different threads at the same time without waiting for each other to complete, and you can see it in the timing results. And you're right, your function due to sleep is blocking, it blocks the worker thread for the specified amount of time, but when you use several threads those threads are blocked in parallel. So if you would have one job with sleep and the other one without and run multiple threads, the one without sleep would perform calculations while the other would sleep. When you use single thread, the jobs are performed in in a serial manner one after the other, so when one job sleeps the other jobs wait for it, actually they just don't exist until it's their turn. All this is pretty much proven by your time tests. The thing happened with print has to do with "thread safety", i.e. print uses standard output, which is a single shared resource. So when your multiple threads tried to print at the same time the switching happened inside and you got your strange output. (This also show "asynchronicity" of your multithreaded example.) To prevent such errors there are locking mechanisms, e.g. locks, semaphores, etc.
Asyncio: To better understand the purpose note the "IO" part, it's not 'async computation', but 'async input/output'. When talking about asyncio you usually don't think about threads at first. Asyncio is about event loop and generators (coroutines). The event loop is the arbiter, that governs the execution of coroutines (and their callbacks), that were registered to the loop. Coroutines are implemented as generators, i.e. functions that allow to perform some actions iteratively, saving state at each iteration and 'returning', and on the next call continuing with the saved state. So basically the event loop is while True: loop, that calls all coroutines/generators, assigned to it, one after another, and they provide result or no-result on each such call - this provides possibility for "asynchronicity". (A simplification, as there's scheduling mechanisms, that optimize this behavior.) The event loop in this situation can run in single thread and if coroutines are non-blocking it will give you true "asynchronicity", but if they are blocking then it's basically a linear execution.
You can achieve the same thing with explicit multithreading, but threads are costly - they require memory to be assigned, switching them takes time, etc. On the other hand asyncio API allows you to abstract from actual implementation and just consider your jobs to be performed asynchronously. It's implementation may be different, it includes calling the OS API and the OS decides what to do, e.g. DMA, additional threads, some specific microcontroller use, etc. The thing is it works well for IO due to lower level mechanisms, hardware stuff. On the other hand, performing computation will require explicit breaking of computation algorithm into pieces to use as asyncio coroutine, so a separate thread might be a better decision, as you can launch the whole computation as one there. (I'm not talking about algorithms that are special to parallel computing). But asyncio event loop might be explicitly set to use separate threads for coroutines, so this will be asyncio with multithreading.
Regarding your example, if you'll implement your function with sleep as asyncio coroutine, shedule and run 50 of them single threaded, you'll get time similar to the first time test, i.e. around 25s, as it is blocking. If you will change it to something like yield from [asyncio.sleep][3](0.5) (which is a coroutine itself), shedule and run 50 of them single threaded, it will be called asynchronously. So while one coroutine will sleep the other will be started, and so on. The jobs will complete in time similar to your second multithreaded test, i.e. close to 0.5s. If you will add print here you'll get good output as it will be used by single thread in serial manner, but the output might be in different order then the order of coroutine assignment to the loop, as coroutines could be run in different order. If you will use multiple threads, then the result will obviously be close to the last one anyway.
Simplification: The difference in multythreading and asyncio is in blocking/non-blocking, so basicly blocking multithreading will somewhat come close to non-blocking asyncio, but there're a lot of differences.
Multithreading for computations (i.e. CPU bound code)
Asyncio for input/output (i.e. I/O bound code)
Regarding your original statement:
all asynchronous code is multi-threaded
all multi-threaded functions are asynchronous
I hope that I was able to show, that:
asynchronous code might be both single threaded and multi-threaded
all multi-threaded functions could be called "asynchronous"
I think the main confusion comes from the meaning of asynchronous. From the Free Online Dictionary of Computing, "A process [...] whose execution can proceed independently" is asynchronous. Now, apply that to what your bees do:
Retrieve an item from the queue. Only one at a time can do that, while the order in which they get an item is undefined. I wouldn't call that asynchronous.
Sleep. Each bee does so independently of all others, i.e. the sleep duration runs on all, otherwise the time wouldn't go down with multiple bees. I'd call that asynchronous.
Call print(). While the calls are independent, at some point the data is funneled into the same output target, and at that point a sequence is enforced. I wouldn't call that asynchronous. Note however that the two arguments to print() and also the trailing newline are handled independently, which is why they can be interleaved.
Lastly, the call to q.join(). Here of course the calling thread is blocked until the queue is empty, so some kind of synchronization is enforced and wanted. I don't see why this "seems to break" for you.

Use twisted with custom main loop [duplicate]

I have an existing program that has its own main loop, and does computations based on input it receives - let's say from the user, to make it simple. I want to now do the computations remotely instead of locally, and I decided to implement the RPCs in Twisted.
Ideally I just want to change one of my functions, say doComputation(), to make a call to twisted to perform the RPC, get the results, and return. The rest of the program should stay the same. How can I accomplish this, though? Twisted hijacks the main loop when I call reactor.run(). I also read that you don't really have threads in twisted, that all the tasks run in sequence, so it seems I can't just create a LoopingCall and run my main loop that way.
You have a couple of different options, depending on what sort of main loop your existing program has.
If it's a mainloop from a GUI library, Twisted may already have support for it. In that case, you can just go ahead and use it.
You could also write your own reactor. There isn't a lot of great documentation for this, but you can look at the way that qtreactor implements a reactor plugin externally to Twisted.
You can also write a minimal reactor using threadedselectreactor. The documentation for this is also sparse, but the wxpython reactor is implemented using it. Personally I wouldn't recommend this approach as it is difficult to test and may result in confusing race conditions, but it does have the advantage of letting you leverage almost all of Twisted's default networking code with only a thin layer of wrapping.
If you are really sure that you don't want your doComputation to be asynchronous, and you want your program to block while waiting for Twisted to answer, do the following:
start Twisted in another thread before your main loop starts up, with something like twistedThread = Thread(target=reactor.run); twistedThread.start()
instantiate an object to do your RPC communication (let's say, RPCDoer) in your own main loop's thread, so that you have a reference to it. Make sure to actually kick off its Twisted logic with reactor.callFromThread so you don't need to wrap all of its Twisted API calls.
Implement RPCDoer.doRPC to return a Deferred, using only Twisted API calls (i.e. don't call into your existing application code, so you don't need to worry about thread safety for your application objects; pass doRPC all the information that it needs as arguments).
You can now implement doComputation like this:
def doComputation(self):
rpcResult = blockingCallFromThread(reactor, self.myRPCDoer.doRPC)
return self.computeSomethingFrom(rpcResult)
Remember to call reactor.callFromThread(reactor.stop); twistedThread.join() from your main-loop's shutdown procedure, otherwise you may see some confusing tracebacks or log messages on exit.
Finally, one option that you should really consider, especially in the long term: dump your existing main loop, and figure out a way to just use Twisted's. In my experience this is the right answer for 9 out of 10 askers of questions like this. I'm not saying that this is always the way to go - there are plenty of cases where you really need to keep your own main loop, or where it's just way too much effort to get rid of the existing loop. But, maintaining your own loop is work too. Keep in mind that the Twisted loop has been extensively tested by millions of users and used in a huge variety of environments. If your loop is also extremely mature, that may not be a big deal, but if you're writing a small, new program, the difference in reliability may be significant.
It seems like the correct and very simple answer here is a LoopingCall:
http://www.saltycrane.com/blog/2008/10/running-functions-periodically-using-twisteds-loopingcall/
from datetime import datetime
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
def doComputation():
print "Custom fn run at", datetime.now()
lc = LoopingCall(doComputation)
lc.start(0.1) # run your own loop 10 times a second
# put your other twisted here
reactor.run()

How to run a program and also execute code when data is received from the network?

I've written a little program in Python that basically does the following:
Gets a hotword as input from the user. If it matches the set keyword it continues.
After entering the correct hotword the user is asked to enter a command.
After the command is read the progam checks a command file to see if there is a command that matches that input
If a matching command is found, execute whatever that command says.
I'd like to add the ability to execute commands over a network as follows (and learn to use Twisted on the way):
Client #1 enters a command targeted at client #2.
The command gets sent to a server which routes it to client #2.
Client #2 receives the command and executes it if it's valid.
Note: Entering commands locally (using the code below) and remotely should be possible.
After some thinking I couldn't come up with any other way to implement this other than:
Have the above program run as process #1 (the program that runs locally as I've written at the beginning).
A Twisted client will be run as process #2 and receive the commands from remote clients. Whenever a command is received, the Twisted client will initialize a thread that'll parse the command, check for its validity and execute it if it's valid.
Since I don't have that much experience with threads and none with network programming, I couldn't think of any other scheme that makes sense to me.
Is the scheme stated above overly complicated?
I would appreciate some insight before trying to implement it this way.
The code for the python program (without the client) is:
The main (which is the start() method):
class Controller:
def __init__(self,listener, executor):
self.listener = listener
self.executor = executor
def start(self):
while True:
text = self.listener.listen_for_hotword()
if self.executor.is_hotword(text):
text = self.listener.listen_for_command()
if self.executor.has_matching_command(text):
self.executor.execute_command(text)
else:
tts.say("No command found. Please try again")
The Listener (gets input from the user):
class TextListener(Listener):
def listen_for_hotword(self):
text = raw_input("Hotword: ")
text =' '.join(text.split()).lower()
return text
def listen_for_command(self):
text = raw_input("What would you like me to do: ")
text = ' '.join(text.split()).lower()
return text
The executor (the class that executes the given command):
class Executor:
#TODO: Define default path
def __init__(self,parser, audio_path='../Misc/audio.wav'):
self.command_parser = parser
self.audio_path = audio_path
def is_hotword(self,hotword):
return self.command_parser.is_hotword(hotword)
def has_matching_command(self,command):
return self.command_parser.has_matching_command(command)
def execute_command(self,command):
val = self.command_parser.getCommand(command)
print val
val = os.system(val) #So we don't see the return value of the command
The command file parser:
class KeyNotFoundException(Exception):
pass
class YAMLParser:
THRESHOLD = 0.6
def __init__(self,path='Configurations/commands.yaml'):
with open(path,'r') as f:
self.parsed_yaml = yaml.load(f)
def getCommand(self,key):
try:
matching_command = self.find_matching_command(key)
return self.parsed_yaml["Commands"][matching_command]
except KeyError:
raise KeyNotFoundException("No key matching {}".format(key))
def has_matching_command(self,key):
try:
for command in self.parsed_yaml["Commands"]:
if jellyfish.jaro_distance(command,key) >=self.THRESHOLD:
return True
except KeyError:
return False
def find_matching_command(self,key):
for command in self.parsed_yaml["Commands"]:
if jellyfish.jaro_distance(command,key) >=0.5:
return command
def is_hotword(self,hotword):
return jellyfish.jaro_distance(self.parsed_yaml["Hotword"],hotword)>=self.THRESHOLD
Example configuration file:
Commands:
echo : echo hello
Hotword: start
I'm finding it extremely difficult to follow the background in your questions, but I'll take a stab at the questions themselves.
How to run a program and also execute code when data is received from the network?
As you noted in your question, the typical way to write a "walk and chew-gum" style app is to design your code in a threaded or an event-loop style.
Given you talking about threading and Twisted (which is event-loop style) I'm worried that you may be thinking about mixing the two.
I view them as fundamentally different styles of programming (each with places they excel) and that mixing them is generally a path to hell.
Let me give you some background to explain
Background: Threading vs Event programming
Threading
How to think of the concept:
I have multiple things I need to do at the same time, and I want my operating system to figure how and when to run those separate tasks.
Pluses:
'The' way to let one program use multiple processor cores at the same time
In the posix world the only way to let one process run on multiple CPU cores at the same time is via threads (with the typical ideal number of threads being no more then the cores in a given machine)
Easier to start with
The same code that you were running inline can be tossed off into a thread usually without needing a redesign (without GIL some locking would be required but more on that later)
Much easier to use with tasks that will eat all the CPU you can give at them
I.E. in most cases math is way easier to deal with using threading solutions then using event/async frameworks
Minuses:
Python has a special problem with threads
In CPython the global interpreter lock(GIL) can negate threads ability to multitask (making threads nearly useless). Avoiding the GIL is messy and can undo all the ease of use of working in threads
As you add threads (and locks) things get complicated fast, see this SO: Threading Best Practices
Rarely optimal for IO/user-interacting tasks
While threads are very good at dealing with small numbers of tasks that want to use lots of CPU (Ideally one thread per core), they are far less optimal at dealing with large counts of tasks that spend most of their time waiting.
Best use:
Computationally expensive things.
If you have big chucks of math that you want to run concurrently, its very unlikely that your going to be able to schedule the CPU utilization more intelligently then the operation system.
( Given CPythons GIL problem, threading shouldn't manually be used for math, instead a library that internally threads (like NumPy) should be used )
Event-loop/Asynchronous programming
How to think of the concept:
I have multiple things I need to do at the same time, but I (the programer) want direct control/direct-implimentation over how and when my sub-tasks are run
How you should be thinking about your code:
Think of all your sub-tasks in one big intertwined whole, your mind should always have the thought of "will this code run fast enough that it doesn't goof up the other sub-tasks I'm managing"
Pluses:
Extraordinarily efficient with network/IO/UI connections, including large counts of connections
Event-loop style programs are one of the key technologies that solved the c10k problem. Frameworks like Twisted can literally handle tens-of-thousands of connections in one python process running on a small machine.
Predictable (small) increase in complexity as other connections/tasks are added
While there is a fairly steep learning curve (particularly in twisted), once the basics are understood new connection types can be added to projects with a minimal increase in the overall complexity. Moving from a program that offers a keyboard interface to one that offers keyboard/telnet/web/ssl/ssh connections may just be a few lines of glue code per-interface (... this is highly variable by framework, but regardless of the framework event-loops complexity is more predictable then threads as connection counts increase)
Minuses:
Harder to get started.
Event/async programming requires a different style of design from the first line of code (though you'll find that style of design is somewhat portable from language to language)
One event-loop, one core
While event-loops can let you handle a spectacular number of IO connections at the same time, they in-and-of-themselves can't run on multiple cores at the same time. The conventional way to deal with this is to write programs in such a way that multiple instances of the program can be run at the same time, one for each core (this technique bypasses the GIL problem)
Multiplexing high CPU tasks can be difficult
Event programing requires cutting high CPU tasks into pieces such that each piece takes an (ideally predictably) small amount of CPU, otherwise the event system ceases to multitask whenever the high CPU task is run.
Best use:
IO based things
TL; DR - Given your question, look at twisted and -not- threads
While your application doesn't seem to be exclusively IO based, none of it seems to be CPU based (it looks like your currently playing audio via a system call, system spins off an independent process every time its called, so its work doesn't burn your processes CPU - though system blocks, so its a no-no in twisted - you have to use different calls in twisted).
Your question also doesn't suggest your concerned about maxing out multiple cores.
Therefor, given you specifically talked about Twisted, and an event-loop solution seems to be the best match for your application, I would recommend looking at Twisted and -not- threads.
Given the 'Best use' listed above you might be tempted to think that mixing twisted and threads is the way-to-got, but when doing that if you do anything even slightly wrong you will disable the advantages of both the event-loop (you'll block) and threading (GIL won't let the threads multitask) and have something super complex that provides you no advantage.
Is the scheme stated above overly complicated? I would appreciate some insight before trying to implement it this way.
The 'scheme' you gave is:
After some thinking I couldn't come up with any other way to implement this other than:
Have the above program run as process #1 (the program that runs locally as I've written at the beginning).
A Twisted client will be run as process #2 and receive the commands from remote clients. Whenever a command is received, the Twisted client will initialize a thread that'll parse the command, check for its validity and execute it if it's valid.
Since I don't have that much experience with threads and none with network programming, I couldn't think of any other scheme that makes sense to me.
In answer to "Is the scheme ... overly complicated", I would say almost certainly yes because your talking about twisted and threads. (see tr; dr above)
Given my certainly incomplete (and confused) understanding of what you want to build, I would imagine a twisted solution for you would look like:
Carefully study the krondo Twisted Introduction (it really requires tracing the example code line-for-line, but if you do the work its an AWESOME learning tool for twisted - and event programing in general)
From the ground-up you rewrite your 'hotword' thingy in twisted using what you learned in the krondo guide - starting out just providing whichever interface you currently have (keyboard?)
Add other communication interfaces to that code (telnet, web, etc) which would let you access the processing code you wrote for the keyboard(?) interface.
If, as you state in your question, you really need a server, you could write a second twisted program to provide that (you'll see examples of all that in the krondo guide). Though I'm guessing when you understand twisted's library support you'll realize you don't have to build any extra servers, that you can just include whichever protocols you need in your base code.

time.sleep that allows parent application to still evaluate?

I've run into situations as of late when writing scripts for both Maya and Houdini where I need to wait for aspects of the GUI to update before I can call the rest of my Python code. I was thinking calling time.sleep in both situations would have fixed my problem, but it seems that time.sleep just holds up the parent application as well. This means my script evaluates the exact same regardless of whether or not the sleep is in there, it just pauses part way through.
I have a thought to run my script in a separate thread in Python to see if that will free up the application to still run during the sleep, but I haven't had time to test this yet.
Thought I would ask in the meantime if anybody knows of some other solution to this scenario.
Maya - or more precisely Maya Python - is not really multithreaded (Python itself has a dodgy kind of multithreading because all threads fight for the dread global interpreter lock, but that's not your problem here). You can run threaded code just fine in Maya using the threading module; try:
import time
import threading
def test():
for n in range (0, 10):
print "hello"
time.sleep(1)
t = threading.Thread(target = test)
t.start()
That will print 'hello' to your listener 10 times at one second intervals without shutting down interactivity.
Unfortunately, many parts of maya - including most notably ALL user created UI and most kinds of scene manipulation - can only be run from the "main" thread - the one that owns the maya UI. So, you could not do a script to change the contents of a text box in a window using the technique above (to make it worse, you'll get misleading error messages - code that works when you run it from the listener but errors when you call it from the thread and politely returns completely wrong error codes). You can do things like network communication, writing to a file, or long calculations in a separate thread no problem - but UI work and many common scene tasks will fail if you try to do them from a thread.
Maya has a partial workaround for this in the maya.utils module. You can use the functions executeDeferred and executeInMainThreadWithResult. These will wait for an idle time to run (which means, for example, that they won't run if you're playing back an animation) and then fire as if you'd done them in the main thread. The example from the maya docs give the idea:
import maya.utils import maya.cmds
def doSphere( radius ):
maya.cmds.sphere( radius=radius )
maya.utils.executeInMainThreadWithResult( doSphere, 5.0 )
This gets you most of what you want but you need to think carefully about how to break up your task into threading-friendly chunks. And, of course, running threaded programs is always harder than the single-threaded alternative, you need to design the code so that things wont break if another thread messes with a variable while you're working. Good parallel programming is a whole big kettle of fish, although boils down to a couple of basic ideas:
1) establish exclusive control over objects (for short operations) using RLocks when needed
2) put shared data into safe containers, like Queue in #dylan's example
3) be really clear about what objects are shareable (they should be few!) and which aren't
Here's decent (long) overview.
As for Houdini, i don't know for sure but this article makes it sound like similar issues arise there.
A better solution, rather than sleep, is a while loop. Set up a while loop to check a shared value (or even a thread-safe structure like a Queue). The parent processes that your waiting on can do their work (or children, it's not important who spawns what) and when they finish their work, they send a true/false/0/1/whatever to the Queue/variable letting the other processes know that they may continue.

Use my own main loop in twisted

I have an existing program that has its own main loop, and does computations based on input it receives - let's say from the user, to make it simple. I want to now do the computations remotely instead of locally, and I decided to implement the RPCs in Twisted.
Ideally I just want to change one of my functions, say doComputation(), to make a call to twisted to perform the RPC, get the results, and return. The rest of the program should stay the same. How can I accomplish this, though? Twisted hijacks the main loop when I call reactor.run(). I also read that you don't really have threads in twisted, that all the tasks run in sequence, so it seems I can't just create a LoopingCall and run my main loop that way.
You have a couple of different options, depending on what sort of main loop your existing program has.
If it's a mainloop from a GUI library, Twisted may already have support for it. In that case, you can just go ahead and use it.
You could also write your own reactor. There isn't a lot of great documentation for this, but you can look at the way that qtreactor implements a reactor plugin externally to Twisted.
You can also write a minimal reactor using threadedselectreactor. The documentation for this is also sparse, but the wxpython reactor is implemented using it. Personally I wouldn't recommend this approach as it is difficult to test and may result in confusing race conditions, but it does have the advantage of letting you leverage almost all of Twisted's default networking code with only a thin layer of wrapping.
If you are really sure that you don't want your doComputation to be asynchronous, and you want your program to block while waiting for Twisted to answer, do the following:
start Twisted in another thread before your main loop starts up, with something like twistedThread = Thread(target=reactor.run); twistedThread.start()
instantiate an object to do your RPC communication (let's say, RPCDoer) in your own main loop's thread, so that you have a reference to it. Make sure to actually kick off its Twisted logic with reactor.callFromThread so you don't need to wrap all of its Twisted API calls.
Implement RPCDoer.doRPC to return a Deferred, using only Twisted API calls (i.e. don't call into your existing application code, so you don't need to worry about thread safety for your application objects; pass doRPC all the information that it needs as arguments).
You can now implement doComputation like this:
def doComputation(self):
rpcResult = blockingCallFromThread(reactor, self.myRPCDoer.doRPC)
return self.computeSomethingFrom(rpcResult)
Remember to call reactor.callFromThread(reactor.stop); twistedThread.join() from your main-loop's shutdown procedure, otherwise you may see some confusing tracebacks or log messages on exit.
Finally, one option that you should really consider, especially in the long term: dump your existing main loop, and figure out a way to just use Twisted's. In my experience this is the right answer for 9 out of 10 askers of questions like this. I'm not saying that this is always the way to go - there are plenty of cases where you really need to keep your own main loop, or where it's just way too much effort to get rid of the existing loop. But, maintaining your own loop is work too. Keep in mind that the Twisted loop has been extensively tested by millions of users and used in a huge variety of environments. If your loop is also extremely mature, that may not be a big deal, but if you're writing a small, new program, the difference in reliability may be significant.
It seems like the correct and very simple answer here is a LoopingCall:
http://www.saltycrane.com/blog/2008/10/running-functions-periodically-using-twisteds-loopingcall/
from datetime import datetime
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
def doComputation():
print "Custom fn run at", datetime.now()
lc = LoopingCall(doComputation)
lc.start(0.1) # run your own loop 10 times a second
# put your other twisted here
reactor.run()

Categories