I'm building a program with a class that is used locally, but I want the same class to be usable in the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic RPC client/server using the examples from Twisted, but I'm having trouble with the client.
from twisted.internet import reactor
from twisted.internet.protocol import ClientCreator

c = ClientCreator(reactor, Greeter)   # Greeter is my Protocol subclass
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call: when the data is received I call reactor.stop(), but if I make any more calls the reactor won't restart. Is there something else I should be using for this? Maybe a different Twisted module, or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses FUSE and can combine multiple local folders into the FUSE mount point. The file access is already handled within a class, so I want servers that give me network access to the same class. After continuing to search, I suspect Pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (based simply on reading its home page just now), but I'm open to any suggestions.
I could achieve similar results by using an NFS mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to be an NFS server, with a number of NFS mounts equal to the number of computers in the network.
Conclusion:
I have decided to use RPyC, as it gave me exactly what I was looking for: a server that keeps an instance of a class that I can manipulate as if it were local. If anyone is interested, I put my project up on Launchpad (http://launchpad.net/dstorage).
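For anyone curious about the shape this takes, here is a rough sketch of what an RPyC service looks like, based on RPyC's documented service API; this is not the dstorage code, and the class, method, and port below are made up:

import rpyc
from rpyc.utils.server import ThreadedServer

class StorageService(rpyc.Service):
    # Hypothetical stand-in for the file-access class; methods prefixed with
    # "exposed_" are the ones remote clients can call.
    def exposed_read(self, path):
        with open(path, "rb") as f:
            return f.read()

if __name__ == "__main__":
    ThreadedServer(StorageService, port=18861).start()   # 18861 is an arbitrary port

A client then connects with rpyc.connect(host, 18861) and calls conn.root.read(path) synchronously, as if the object were local.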
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just call ClientCreator(...).connectTCP(...) each time.
If you call self.transport.loseConnection() in your Protocol, you won't be leaving open connections.
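A rough sketch of that pattern; the Greeter body, host, and port below are placeholders, not the asker's actual protocol:

from twisted.internet import reactor
from twisted.internet.protocol import ClientCreator, Protocol

class Greeter(Protocol):
    def connectionMade(self):
        # Hypothetical request; substitute whatever your protocol actually sends.
        self.transport.write(b"STATUS\r\n")

    def dataReceived(self, data):
        # ... handle the reply here ...
        self.transport.loseConnection()   # close this connection; the reactor keeps running

def make_call():
    # Call this as often as you like, and never call reactor.stop() between calls.
    return ClientCreator(reactor, Greeter).connectTCP("localhost", 8000)   # hypothetical host/port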
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.sendall(output)     # sendall() ensures the whole request is sent; output is your serialized request
data = s.recv(size)   # size is the maximum number of bytes to read in one call
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
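For example, a receive loop along these lines keeps reading until the server closes the connection (a rough sketch; 4096 is an arbitrary buffer size, and the loop would go before the close() call above):

# Sketch: read until the peer closes the connection (recv() returns an empty string).
chunks = []
while True:
    chunk = s.recv(4096)
    if not chunk:
        break
    chunks.append(chunk)
data = "".join(chunks)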
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
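For the first option above, a minimal DeferredSemaphore sketch; make_call and the request list are placeholders for whatever returns a Deferred in your code:

from twisted.internet import defer

sem = defer.DeferredSemaphore(tokens=1)   # tokens=1 serializes the calls completely

def make_call(request):
    # Placeholder: return a Deferred for one remote invocation
    # (e.g. a ClientCreator(...).connectTCP(...) call).
    return defer.succeed(request)

requests = ["status", "read", "write"]                  # hypothetical pending calls
deferreds = [sem.run(make_call, r) for r in requests]   # each call waits for the semaphore
d = defer.gatherResults(deferreds)                      # fires with all results, in order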
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is that when I call reactor.run() it seems to go into a loop that just watches for network activity. How do I continue running the rest of my program while it uses the network? If I can get past that, then I can probably work through the synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop, the event reactor, which waits not only for network events but for anything else you have scheduled to happen. With Twisted you will need to structure the rest of your application to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.
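For example, here is a rough sketch of a program where the reactor drives both the network and the rest of the application; do_periodic_work and blocking_work are placeholders for your own functions:

from twisted.internet import reactor, task
from twisted.internet.threads import deferToThread

def blocking_work():
    # Placeholder: anything synchronous you don't want to block the reactor with.
    return "done"

def do_periodic_work():
    # Placeholder: non-network work the reactor schedules for you.
    pass

task.LoopingCall(do_periodic_work).start(5.0)             # runs every five seconds inside the loop
reactor.callWhenRunning(deferToThread, blocking_work)     # hand blocking code to a thread pool
reactor.run()                                             # runs until reactor.stop() at process exit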
Related
I've developed a set of audio streaming servers, all of them using Twisted, and they are in Python, of course. They work, but a problem keeps troubling me: when I find bugs in a running server, or I want to add something to it, I need to stop it and start it again. Unlike HTTP servers, which you can restart whenever you like, that isn't acceptable for audio streaming servers: once I restart my streaming server, my users are disconnected.
I did try to set up a manhole (an SSH service for Twisted servers; you can log in and type Python code in the console), connect to the console, and reload Python modules on the fly. It works sometimes, but it is hard to control. You never know how many instances of the old class are still alive in the server, some of them might be hard to reach, and the class relationships can be very complex. Also, while it works in some situations, sometimes you really do need to restart the server: for example, if you are running the server with the selector reactor and want to run it with the epoll reactor instead, you have to restart it. Another example is when memory usage grows too high.
To build such a system, an idea came into my head: is it possible to hand over the connections and data from one process to another? For example:
We have a server named Broadcasting; the running instance is at rev.123, and we want to replace it with rev.124.
Broadcasting rev.123 is running...
Start up Broadcasting rev.124...
Broadcasting rev.124 is on standby
Hand over connections from the rev.123 instance to the rev.124 instance
Stop the Broadcasting rev.123 instance
Is this possible? I don't know whether the lifetime of a socket handle is bound to the process that created it; I thought sockets created by a process are closed when that process is killed, but I'm not sure. If it is possible, are there any guidelines or articles on designing this kind of hot code swapping mechanism? And is there anything for Twisted that already does what I want?
Thanks.
I gave a talk about this at PyCon 2004. There's also some effort to add more functionality to help with this to Twisted itself.
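On the socket-lifetime point: an open socket is not permanently tied to the process that created it; its file descriptor can be handed to another process over a Unix domain socket using SCM_RIGHTS. A rough sketch of that underlying OS mechanism, assuming Python 3.9+ for socket.send_fds/recv_fds (this is not the approach from the talk, just an illustration):

import socket

def hand_over(channel, conn):
    # channel: an AF_UNIX socket connected to the new process
    # conn: the client connection we want the new process to take over
    socket.send_fds(channel, [b"fd"], [conn.fileno()])

def take_over(channel):
    # Receive one descriptor and wrap it; it is the same OS-level connection.
    msg, fds, flags, addr = socket.recv_fds(channel, 16, 1)
    return socket.socket(fileno=fds[0])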
We have several data-centres located in several countries (Japan, Hong Kong, Singapore etc.).
We run applications on multiple hosts at each of these locations - probably around 50-100 hosts in total.
I'm working on a Python script that queries the status of each application, sends various triggers to them, and retrieves other things from them during runtime. This script could conceivably query a central server, which would then send the request to an agent running on each host.
One of the requirements is that the script is as responsive as possible - e.g. if I query the status of applications on all hosts in all locations, I would like the result within 1-3 seconds, as opposed to 20-30 seconds.
Hence, querying each host sequentially would be too slow, particularly considering the WAN hops we'd need to make.
We can assume that the query on each host itself is fairly trivial (e.g. is process running or not).
I'm fairly new to concurrent programming or asynchronous programming, so would value any input at all here. What is the "best" approach to tackling this problem?
Use a multi-threaded or multi-process approach - e.g. spawn a new thread for each host, send them all out, then wait for replies?
Use asyncore, twisted, tornado - any comments on which if any are suitable here? (I get the impression that asyncore isn't that popular. Tornado might be fun to try, but not sure how it could be used here?)
Use some kind of message queue (e.g. Kombu/RabbitMQ)?
Use celery, somehow? Would it be responsive enough for the response times we want (e.g. under 3 seconds, as above)?
Cheers,
Victor
Use gevent.
How?
from gevent import monkey; monkey.patch_socket()  # Must be the first line of your code, so anything socket-based now works asynchronously.
import gevent

def query_server(server_ip):
    # do_something with server_ip and sockets
    pass

server_ips = [....]

jobs = [gevent.spawn(query_server, server_ip) for server_ip in server_ips]
gevent.joinall(jobs)
print [job.value for job in jobs]   # each Greenlet stores its return value in .value
Why bother?
All your code will run in a single process and a single thread. This means you won't have to bother with locks, semaphores and message passing.
Your task seems to be mostly network-bound. gevent will let you do network-bound work asynchronously, which means your code won't busy-wait on network connections; instead, the OS notifies it when data is received.
It's a personal preference, but I think gevent is the easiest asynchronous library to use for one-off work (for example, you don't have to start a reactor, à la Twisted).
Will it work?
The response-time will be the response time of your slowest server.
If using gevent doesn't do it, then you'll have to fix your network.
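If you need to bound the total wait for the 1-3 second target, gevent.joinall also accepts a timeout. A small sketch continuing the snippet above; hosts that miss the deadline simply have no value yet:

gevent.joinall(jobs, timeout=3)                                  # wait at most 3 seconds in total
results = dict(zip(server_ips, (job.value for job in jobs)))     # unfinished jobs still have value None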
Use multiprocessing.Pool, especially the map() or map_async() methods.
Write a function that takes a single argument (e.g. the hostname, or a list/tuple of the hostname and other data). Let that function query a host and return the relevant data.
Now compile a list of input values (hostnames) and use multiprocessing.Pool.map() or multiprocessing.Pool.map_async() to execute the function for each of them in parallel. The async variant will start returning data sooner, but there is a limit to the amount of work you can do in a callback.
This will automatically use as many cores as your machine has to process the functions in parallel.
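A rough sketch of that shape; check_host and the host list are placeholders, and processes=20 is an arbitrary pool size:

import multiprocessing

def check_host(hostname):
    # Placeholder query: replace this with your real status check
    # (e.g. an agent call or a socket probe against hostname).
    status = "ok"
    return hostname, status

if __name__ == "__main__":
    hosts = ["host-%d.example.com" % i for i in range(50)]   # hypothetical host list
    pool = multiprocessing.Pool(processes=20)                # workers running in parallel
    results = dict(pool.map(check_host, hosts))              # blocks until every host answers
    pool.close()
    pool.join()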
If there are network delays, however, there is not much the Python program can do about that.
I need to write a proxy-like program in Python; the workflow is very similar to a web proxy. The program sits between the client and the server, intercepts requests sent by the client, processes them, and then sends them on to the original server. The protocol is, of course, a private protocol over TCP.
To minimize the effort, I want to use Twisted to handle receiving the requests (the part that acts as a server) and resending them (the part that acts as a client).
To maximize performance, I want to use multiprocessing (threading has the GIL limit) to separate the program into three processes. The first process runs Twisted to receive requests, puts each request on a queue, and returns success immediately to the original client. The second process takes requests from that queue, processes them further, and puts them on another queue. The third process takes requests from the second queue and sends them to the original server.
I am a newcomer to Twisted. I know it is event driven, and I have also heard that it's better not to mix Twisted with threading or multiprocessing. So I don't know whether this approach is appropriate, or whether there is a more elegant way using just Twisted?
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
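A minimal spawnProcess sketch, to show the shape; worker.py is a hypothetical child script, so adjust the command line to your setup:

import sys
from twisted.internet import protocol, reactor

class WorkerProtocol(protocol.ProcessProtocol):
    def __init__(self):
        self.output = []

    def outReceived(self, data):
        # Runs in the reactor thread whenever the child writes to stdout,
        # so it slots straight into your normal Twisted event handling.
        self.output.append(data)

    def processEnded(self, reason):
        print("worker finished with %d chunks of output" % len(self.output))
        reactor.stop()

reactor.spawnProcess(WorkerProtocol(), sys.executable,
                     [sys.executable, "worker.py"], env=None)
reactor.run()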
ampoule is the first thing I think of when reading your question.
It is a simple process-pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function; it's very easy to use.
You can also try something like the cooperative multitasking technique described at http://us.pycon.org/2010/conference/schedule/event/73/. It's similar to the technique Glyph mentioned and worth a try.
You can try to use ZeroMQ with Twisted but it's really hard and experimental for now :)
I am currently researching the Twisted framework as a way of implementing a network-based backup application, and I would like to achieve something that I cannot find any examples of on the net.
I plan to implement the system using the Perspective Broker, but I will also need a way of transferring binary files from the client to the server. I would like to be able to call a method on the PB, and then use some sort of UID to send the file over a separate data channel.
The reason for having these two separate communication channels is down to the fact that I would like to make the client multi-threaded (one thread scanning the directory tree, while another thread transfers the changed files to the server).
Is this possible with Twisted? I have read that having multiple threads calling methods on a reactor is bad news, so is this architecture doomed to failure?
I would appreciate any pointers in the right direction, as I mentioned I am still researching the possibilities - but I plan to use Django for this project, so Python is a must.
The reason for having these two separate communication channels is down to the fact that I would like to make the client multi-threaded (one thread scanning the directory tree, while another thread transfers the changed files to the server).
This reasoning doesn't follow. You can use a single protocol running over a single socket just fine, even if you have a thread wandering the filesystem looking for work to do.
There may be other reasons to want to send file data differently than you send metadata or other structured data between the client and server, though. However, the main one that comes to mind is that you might not want to force commands to wait for files to be completed, and this issue is relieved by PB's FilePager class.
The main thing to remember if you're going to have threads in a Twisted-using application is that whenever you want to invoke a Twisted API from any thread except the thread in which the reactor is running, you must use reactor.callFromThread (or an API built solely on that method, such as twisted.internet.threads.blockingCallFromThread).
callFromThread sends some work (in the form of an object to call) to the reactor thread where the reactor will arrange to call it "soon". Any other Twisted API you invoke from the wrong thread will have undefined results.
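A minimal sketch of that rule; the paths and the send_file body are placeholders for your own scanner and transfer logic:

from twisted.internet import reactor

def send_file(path):
    # Runs in the reactor thread; safe to use PB or transports here.
    print("scheduling transfer of %s" % path)

def scan_directories():
    # Runs in a worker thread: no Twisted APIs allowed here except callFromThread.
    for path in ["/backup/a.txt", "/backup/b.txt"]:   # hypothetical paths
        reactor.callFromThread(send_file, path)

reactor.callInThread(scan_directories)   # run the scanner in the reactor's thread pool
reactor.run()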