For example, if one application does from twisted.internet import reactor, and another application does the same, are those reactors the same?
I am asking because Deluge, an application that uses twisted, looks like it uses the reactor to connect their UI (gtk) to the rest of the application being driven by twisted (I am trying to understand the source). For example, when the UI is closed it simply calls reactor.stop().
Is that all there is to it? It just seems kind of magic to me. What if I wanted to run another application that uses twisted?
Yes, every module in Python is always global, or, to put it better, a singleton: when you do from twisted.internet import reactor, Python's import mechanism first checks sys.modules['twisted.internet.reactor'], and, if that exists, returns said value; only if it doesn't exist (i.e., the first time a module is imported) is the module actually loaded for the first time (and stashed into an entry in sys.modules for possible future imports).
There is nothing especially magical in the Singleton design pattern, though it can sometimes prove limiting when you desperately need more than one of those thingies for which the architecture has decreed "there can be only one". Twisted's docs acknowledge that:
New application code should prefer to
pass and accept the reactor as a
parameter where it is needed, rather
than relying on being able to import
this module to get a reference. This
simplifies unit testing and may make
it easier to one day support multiple
reactors (as a performance
enhancement), though this is not
currently possible.
The best way to make it possible, if it's crucial to your app, is to contribute to the Twisted project, either labor (coding the subtle mechanisms needed to support multiple reactors, that is, multiple event loops, within a single app) or funding (money will enable sustaining somebody with a stipend in order to perform this work).
Otherwise, use separate processes (e.g. with the multiprocessing module of the standard library) with no more than one reactor each.
The reactor is indeed global. It takes care of the event loop, and you register handlers to consume events. If you want to use several applications with the same reactor, you can use the twistd daemon. http://twistedmatrix.com/documents/current/core/howto/application.html
Related
I have an existing program that has its own main loop, and does computations based on input it receives - let's say from the user, to make it simple. I want to now do the computations remotely instead of locally, and I decided to implement the RPCs in Twisted.
Ideally I just want to change one of my functions, say doComputation(), to make a call to twisted to perform the RPC, get the results, and return. The rest of the program should stay the same. How can I accomplish this, though? Twisted hijacks the main loop when I call reactor.run(). I also read that you don't really have threads in twisted, that all the tasks run in sequence, so it seems I can't just create a LoopingCall and run my main loop that way.
You have a couple of different options, depending on what sort of main loop your existing program has.
If it's a mainloop from a GUI library, Twisted may already have support for it. In that case, you can just go ahead and use it.
You could also write your own reactor. There isn't a lot of great documentation for this, but you can look at the way that qtreactor implements a reactor plugin externally to Twisted.
You can also write a minimal reactor using threadedselectreactor. The documentation for this is also sparse, but the wxpython reactor is implemented using it. Personally I wouldn't recommend this approach as it is difficult to test and may result in confusing race conditions, but it does have the advantage of letting you leverage almost all of Twisted's default networking code with only a thin layer of wrapping.
If you are really sure that you don't want your doComputation to be asynchronous, and you want your program to block while waiting for Twisted to answer, do the following:
start Twisted in another thread before your main loop starts up, with something like twistedThread = Thread(target=reactor.run); twistedThread.start()
instantiate an object to do your RPC communication (let's say, RPCDoer) in your own main loop's thread, so that you have a reference to it. Make sure to actually kick off its Twisted logic with reactor.callFromThread so you don't need to wrap all of its Twisted API calls.
Implement RPCDoer.doRPC to return a Deferred, using only Twisted API calls (i.e. don't call into your existing application code, so you don't need to worry about thread safety for your application objects; pass doRPC all the information that it needs as arguments).
You can now implement doComputation like this:
def doComputation(self):
rpcResult = blockingCallFromThread(reactor, self.myRPCDoer.doRPC)
return self.computeSomethingFrom(rpcResult)
Remember to call reactor.callFromThread(reactor.stop); twistedThread.join() from your main-loop's shutdown procedure, otherwise you may see some confusing tracebacks or log messages on exit.
Finally, one option that you should really consider, especially in the long term: dump your existing main loop, and figure out a way to just use Twisted's. In my experience this is the right answer for 9 out of 10 askers of questions like this. I'm not saying that this is always the way to go - there are plenty of cases where you really need to keep your own main loop, or where it's just way too much effort to get rid of the existing loop. But, maintaining your own loop is work too. Keep in mind that the Twisted loop has been extensively tested by millions of users and used in a huge variety of environments. If your loop is also extremely mature, that may not be a big deal, but if you're writing a small, new program, the difference in reliability may be significant.
It seems like the correct and very simple answer here is a LoopingCall:
http://www.saltycrane.com/blog/2008/10/running-functions-periodically-using-twisteds-loopingcall/
from datetime import datetime
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
def doComputation():
print "Custom fn run at", datetime.now()
lc = LoopingCall(doComputation)
lc.start(0.1) # run your own loop 10 times a second
# put your other twisted here
reactor.run()
After doing Gevent/Eventlet monkey patching - can I assume that whenever DB driver (eg redis-py, pymongo) uses IO through standard library (eg socket) it will be asynchronous?
So using eventlets monkey patching is enough to make eg: redis-py non blocking in eventlet application?
From what I know it should be enough if I take care about connection usage (eg to use different connection for each greenlet). But I want to be sure.
If you known what else is required, or how to use DB drivers correctly with Gevent/Eventlet please type it also.
You can assume it will be magically patched if all of the following are true.
You're sure of the I/O is built on top of standard Python sockets or other things that eventlet/gevent monkeypatches. No files, no native (C) socket objects, etc.
You pass aggressive=True to patch_all (or patch_select), or you're sure the library doesn't use select or anything similar.
The driver doesn't use any (implicit) internal threads. (If the driver does use threads internally, patch_thread may work, but it may not.)
If you're not sure, it's pretty easy to test—probably easier than reading through the code and trying to work it out. Have one greenlet that just does something like this:
while True:
print("running")
gevent.sleep(0.1)
Then have another that runs a slow query against the database. If it's monkeypatched, the looping greenlet will keep printing "running" 10 times/second; if not, the looping greenlet will not get to run while the program is blocked on the query.
So, what do you do if your driver blocks?
The easiest solution is to use a truly concurrent threadpool for DB queries. The idea is that you fire off each query (or batch) as a threadpool job and greenlet-block your gevent on the completion of that job. (For really simple cases, where you don't need many concurrent queries, you can just spawn a threading.Thread for each one instead, but usually you can't get away with that.)
If the driver does significant CPU work (e.g., you're using something that runs an in-process cache, or even an entire in-process DBMS like sqlite), you want this threadpool to actually be implemented on top of processes, because otherwise the GIL may prevent your greenlets from running. Otherwise (especially if you care about Windows), you probably want to use OS threads. (However, this means you can't patch_threads(); if you need to do that, use processes.)
If you're using eventlet, and you want to use threads, there's a built-in simple solution called tpool that may be sufficient. If you're using gevent, or you need to use processes, this won't work. Unfortunately, blocking a greenlet (without blocking the whole event loop) on a real threading object is a bit different between eventlet and gevent, and not documented very well, but the tpool source should give you the idea. Beyond that part, the rest is just using concurrent.futures (see futures on pypi if you need this in 2.x or 3.1) to execute the tasks on a ThreadPoolExecutor or ProcessPoolExecutor. (Or, if you prefer, you can go right to threading or multiprocessing instead of using futures.)
Can you explain why I should use OS threads on Windows?
The quick summary is: If you stick to threads, you can pretty much just write cross-platform code, but if you go to processes, you're effectively writing code for two different platforms.
First, read the Programming guidelines for the multiprocessing module (both the "All platforms" section and the "Windows" section). Fortunately, a DB wrapper shouldn't run into most of this. You only need to deal with processes via the ProcessPoolExecutor. And, whether you wrap things up at the cursor-op level or the query level, all your arguments and return values are going to be simple types that can be pickled. Still, it's something you have to be careful about, which otherwise wouldn't be an issue.
Meanwhile, Windows has very low overhead for its intra-process synchronization objects, but very high overhead for its inter-process ones. (It also has very fast thread creation and very slow process creation, but that's not important if you're using a pool.) So, how do you deal with that? I had a lot of fun creating OS threads to wait on the cross-process sync objects and signal the greenlets, but your definition of fun may vary.
Finally, tpool can be adapted trivially to a ppool for Unix, but it takes more work on Windows (and you'll have to understand Windows to do that work).
abarnert's answer is correct and very comprehensive. I just want to add that there is no "aggressive" patching in eventlet, probably gevent feature. Also if library uses select that is not a problem, because eventlet can monkey patch that too.
Indeed, in most cases eventlet.monkey_patch() is all you need. Of course, it must be done before creating any sockets.
If you still have any issues, feel free to open issue or write to eventlet mailing list or G+ community. All relevant links can be found at http://eventlet.net/
I have an existing program that has its own main loop, and does computations based on input it receives - let's say from the user, to make it simple. I want to now do the computations remotely instead of locally, and I decided to implement the RPCs in Twisted.
Ideally I just want to change one of my functions, say doComputation(), to make a call to twisted to perform the RPC, get the results, and return. The rest of the program should stay the same. How can I accomplish this, though? Twisted hijacks the main loop when I call reactor.run(). I also read that you don't really have threads in twisted, that all the tasks run in sequence, so it seems I can't just create a LoopingCall and run my main loop that way.
You have a couple of different options, depending on what sort of main loop your existing program has.
If it's a mainloop from a GUI library, Twisted may already have support for it. In that case, you can just go ahead and use it.
You could also write your own reactor. There isn't a lot of great documentation for this, but you can look at the way that qtreactor implements a reactor plugin externally to Twisted.
You can also write a minimal reactor using threadedselectreactor. The documentation for this is also sparse, but the wxpython reactor is implemented using it. Personally I wouldn't recommend this approach as it is difficult to test and may result in confusing race conditions, but it does have the advantage of letting you leverage almost all of Twisted's default networking code with only a thin layer of wrapping.
If you are really sure that you don't want your doComputation to be asynchronous, and you want your program to block while waiting for Twisted to answer, do the following:
start Twisted in another thread before your main loop starts up, with something like twistedThread = Thread(target=reactor.run); twistedThread.start()
instantiate an object to do your RPC communication (let's say, RPCDoer) in your own main loop's thread, so that you have a reference to it. Make sure to actually kick off its Twisted logic with reactor.callFromThread so you don't need to wrap all of its Twisted API calls.
Implement RPCDoer.doRPC to return a Deferred, using only Twisted API calls (i.e. don't call into your existing application code, so you don't need to worry about thread safety for your application objects; pass doRPC all the information that it needs as arguments).
You can now implement doComputation like this:
def doComputation(self):
rpcResult = blockingCallFromThread(reactor, self.myRPCDoer.doRPC)
return self.computeSomethingFrom(rpcResult)
Remember to call reactor.callFromThread(reactor.stop); twistedThread.join() from your main-loop's shutdown procedure, otherwise you may see some confusing tracebacks or log messages on exit.
Finally, one option that you should really consider, especially in the long term: dump your existing main loop, and figure out a way to just use Twisted's. In my experience this is the right answer for 9 out of 10 askers of questions like this. I'm not saying that this is always the way to go - there are plenty of cases where you really need to keep your own main loop, or where it's just way too much effort to get rid of the existing loop. But, maintaining your own loop is work too. Keep in mind that the Twisted loop has been extensively tested by millions of users and used in a huge variety of environments. If your loop is also extremely mature, that may not be a big deal, but if you're writing a small, new program, the difference in reliability may be significant.
It seems like the correct and very simple answer here is a LoopingCall:
http://www.saltycrane.com/blog/2008/10/running-functions-periodically-using-twisteds-loopingcall/
from datetime import datetime
from twisted.internet.task import LoopingCall
from twisted.internet import reactor
def doComputation():
print "Custom fn run at", datetime.now()
lc = LoopingCall(doComputation)
lc.start(0.1) # run your own loop 10 times a second
# put your other twisted here
reactor.run()
I'm trying to wrap my head around what is happening in this recipe, because I'm planning on implementing a wx/twisted app similar to this (ie. wx and twisted running in separate threads). I understand that both twisted and wx event-loops need to be accessed in a thread-safe manner (ie. reactor.callFromThread, wx.PostEvent, etc). What I am questioning is the thread-safety of passing in instance methods of objects instantiated in one thread (in the case of this recipe, the GUI thread) as deferred callBack and errBack methods for a reactor running in a separate thread. Is that a good idea?
There is a wxreactor available in twisted, but googling reveals that there have been numerous problems with it since it was introduced to the library. Even the person who initially came up with the wxreactor technique, advocates running wx and twisted in separate threads.
I haven't been able to find any other examples of this technique, but I'd love to see some.
I wouldn't say that it's a "good idea". You should just run the reactor and the GUI in the same thread with wxreactor.
The timer-driven event-loop starving approach described by Mr. Schroeder is the worst possible fail-safe way to implement event-loop integration. If you use wxreactor (not wxsupport) Twisted now uses an approach where multiplexing is shunted off to a thread internally so that nothing needs to use a timer. Better yet would be for wxpython to expose wxSocket and have someone base a reactor on it.
However, if you're set on using a separate thread to communicate with Twisted, the one thing to keep in mind is that while you can use objects that originate from any thread you like as the value to pass to Deferred.callback, you must call Deferred.callback only in the reactor thread itself. Deferreds are not threadsafe; thanks to some debugging utilities, not even the Deferred class is threadsafe, so you need to be very careful when you are using them to never leave the Twisted main thread. i.e. when you have a result in the UI thread, use reactor.callFromThread(myDeferred.callback, myresult).
The sole act of passing instance methods between threads is safe as long as you properly synchronize eventual destruction of those instances (threads share memory so it really doesn't matter which one did the allocation/initialization of a bit of it).
The overall thread safety depends on what those methods actually do, so just make them "play nice" and you should be ok.
I'm building a program that has a class used locally, but I want the same class to be used the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic rpc client/server using the examples from twisted, but I'm having trouble with the client.
c = ClientCreator(reactor, Greeter)
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call, when the data is received I'm calling reactor.stop(), but if I make any more calls the reactor won't restart. Is there something else I should be using for this? maybe a different twisted module or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses fuse and can combine multiple local folders into the fuse mount point. The file access is already handled within a class, so I want to have servers that give me network access to the same class. After continuing to search, I suspect pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (simply based on reading their home page right now) but I'm open to any suggestions.
I could achieve similar results by using an nfs mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to bee an nfs server with a number of nfs mounts equal to the number of computers in the network.
Conclusion:
I have decided to use rpyc as it gave me exactly what I was looking for. A server that keeps an instance of a class that I can manipulate as if it was local. If anyone is interested I put my project up on Launchpad (http://launchpad.net/dstorage).
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just ClientCreator(...).connectTCP(...) each time.
If you self.transport.loseConnection() in your Protocol you won't be leaving open connections.
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.send(output)
data = s.recv(size)
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is
that when I call reactor.run() it
seems to go into a loop that just
watches for network activity. How do I
continue running the rest of my
program while it uses the network? if
I can get past that, then I can
probably work through the
synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop which is an event reactor. It will not only wait for entwork events, but anything else you have scheduled to happen. With Twisted you will need to structure the rest of your application in a way to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.