I am currently researching the Twisted framework as a way of implementing a network-based backup application, and I would like to achieve something that I cannot find any examples of on the net.
I plan to implement the system using the Perspective Broker, but I will also need a way of transferring binary files from the client to the server. I would like to be able to call a method on the PB, and then use some sort of UID to send the file over a separate data channel.
The reason for having these two separate communication channels is down to the fact that I would like to make the client multi-threaded (one thread scanning the directory tree, while another thread transfers the changed files to the server).
Is this possible with Twisted? I have read that having multiple threads calling methods on a reactor is bad news, so is this architecture doomed to failure?
I would appreciate any pointers in the right direction, as I mentioned I am still researching the possibilities - but I plan to use Django for this project, so Python is a must.
The reason for having these two separate communication channels is down to the fact that I would like to make the client multi-threaded (one thread scanning the directory tree, while another thread transfers the changed files to the server).
This reasoning doesn't follow. You can use a single protocol running over a single socket just fine, even if you have a thread wandering the filesystem looking for work to do.
There may be other reasons to want to send file data differently than you send metadata or other structured data between the client and server, though. However, the main one that comes to mind is that you might not want to force commands to wait for files to be completed, and this issue is relieved by PB's FilePager class.
The main thing to remember if you're going to have threads in a Twisted-using application is that whenever you want to invoke a Twisted API from any thread ''except'' the thread in which the reactor is running you must use reactor.callFromThread (or an API built solely on that method, such as twisted.internet.threads.blockingCallFromThread).
callFromThread sends some work (in the form of an object to call) to the reactor thread where the reactor will arrange to call it "soon". Any other Twisted API you invoke from the wrong thread will have undefined results.
Related
I'm currently in the process of programming a server which can let clients interact with a piece of hardware. For the interested readers it's a device which monitors the wavelength of a set of lasers concurrently (and controls the lasers). The server should be able to broadcast the wavelengths (a list of floats) on a regular basis and let the clients change the settings of the device through dll calls.
My initial idea was to write a custom protocol to handle the communication, but after thinking about how to handle TCP fragmentation and data encoding I bumped into Twisted, and it looks like most of the work is already done if I use perspective broker to share the data and call server methods directly from the clients. This solution might be a bit overkill, but for me it appeared obvious, what do you think?
My main concern arrose when I thought about the clients. Basically I need two types of clients, one which just displays the wavelengths (this should be straight forward) and a second which can change the device settings and get feedback when it's changed. My idea was to create a single client capable of both, but thinking about combining it with our previous system got me thinking... The second client should be controlled from an already rather complex python framework which controls a lot of independant hardware with relatively strict timing requirements, and the settings of the wavelengthmeter should then be called within this sequential code. Now the thing is, how do I mix this with the Twisted client? As I understand Twisted is not threadsafe, so I can't simply spawn a new thread running the reactor and then inteact with it from my main thread, can I?
Any suggestions for writing this server/client framework through different means than Twisted are very welcome!
Thanks
You can start the reactor in a dedicated thread, and then issue calls to it with blockingCallFromThread from your existing "sequential" code.
Also, I'd recommend AMP for the protocol rather than PB, since AMP is more amenable to heterogeneous environments (see amp-protocol.net for independent protocol information), and it sounds like you have a substantial amount of other technology you might want to integrate with this system.
Have you tried zeromq?
It's a library that simplifies working with sockets. It can operate over TCP and implements several topologies, such as publisher/subscriber (for broadcasting data, such as your laser readings) and request/response (that you can use for you control scheme).
There are bindings for several languages and the site is full of examples. Also, it's amazingly fast.
Good stuff.
I need to write a proxy like program in Python, the work flow is very similar to a web proxy. The program sits in between the client and the server, incept requests sent by the client to the server, process the request, then send it to the original server. Of course the protocol used is a private protocol uses TCP.
To minimize the effort, I want to use Python Twisted to handle the request receiving (the part acts as a server) and resending (the part acts as a client).
To maximum the performance, I want to use python multiprocessing (threading has the GIL limit) to separate the program into three parts (processes). The first process runs Twisted to receive requests, put the request in a queue, and return success immediately to the original client. The second process take request from the queue, process the request further and put it to another queue. The 3rd process take request from the 2nd queue and send it to the original server.
I was a new comer to Python Twisted, I know it is event driven, I also heard it's better to not mix Twisted with threading or multiprocessing. So I don't know whether this way is appropriate or is there a more elegant way by just using Twisted?
Twisted has its own event-driven way of running subprocesses which is (in my humble, but correct, opinion) better than the multiprocessing module. The core API is spawnProcess, but tools like ampoule provide higher-level wrappers over it.
If you use spawnProcess, you will be able to handle output from subprocesses in the same way you'd handle any other event in Twisted; if you use multiprocessing, you'll need to develop your own queue-based way of getting output from a subprocess into the Twisted mainloop somehow, since the normal callFromThread API that a thread might use won't work from another process. Depending on how you call it, it will either try to pickle the reactor, or just use a different non-working reactor in the subprocess; either way it will lose your call forever.
ampoule is the first thing I think when reading your question.
It is a simple process pool implementation which uses the AMP protocol to communicate. You can use the deferToAMPProcess function, it's very easy to use.
You can try something like Cooperative Multitasking technique as it's described there http://us.pycon.org/2010/conference/schedule/event/73/ . It's simillar to technique as Glyph menitioned and it's worth a try.
You can try to use ZeroMQ with Twisted but it's really hard and experimental for now :)
I need to run a server side script like Python "forever" (or as long as possible without loosing state), so they can keep sockets open and asynchronously react to events like data received. For example if I use Twisted for socket communication.
How would I manage something like this?
Am I confused? or are there are better ways to implement asynchronous socket communication?
After starting the script once via Apache server, how do I stop it running?
If you are using twisted then it has a whole infrastructure for starting and stopping daemons.
http://twistedmatrix.com/projects/core/documentation/howto/application.html
How would I manage something like this?
Twisted works well for this, read the link above
Am I confused? or are there are better ways to implement asynchronous socket communication?
Twisted is very good at asynchronous socket communications. It is hard on the brain until you get the hang of it though!
After starting the script once via Apache server, how do I stop it running?
The twisted tools assume command line access, so you'd have to write a cgi wrapper for starting / stopping them if I understand what you want to do.
You can just write an script that is continuously in a while block waiting for the connection to happen and waits for a signal to close it.
http://docs.python.org/library/signal.html
Then to stop it you just need to run another script that sends that signal to him.
You can use a ‘double fork’ to run your code in a new background process unbound to the old one. See eg this recipe with more explanatory comments than you could possibly want.
I wouldn't recommend this as a primary way of running background tasks for a web site. If your Python is embedded in an Apache process, for example, you'll be forking more than you want. Better to invoke the daemon separately (just under a similar low-privilege user).
After starting the script once via Apache server, how do I stop it running?
You have your second fork write the process number (pid) of the daemon process to a file, and then read the pid from that file and send it a terminate signal (os.kill(pid, signal.SIG_TERM)).
Am I confused?
That's the question! I'm assuming you are trying to have a background process that responds on a different port to the web interface for some sort of unusual net service. If you merely talking about responding to normal web requests you shoudn't be doing this, you should rely on Apache to handle your sockets and service one request at a time.
I think Comet is what you're looking for. Make sure to take a look at Tornado too.
You may want to look at FastCGI, it sounds exactly like what you are looking for, but I'm not sure if it's under current development. It uses a CGI daemon and a special apache module to communicate with it. Since the daemon is long running, you don't have the fork/exec cost. But as a cost of managing your own resources (no automagic cleanup on every request)
One reason why this style of FastCGI isn't used much anymore is there are ways to embed interpreters into the Apache binary and have them run in server. I'm not familiar with mod_python, but i know mod_perl has configuration to allow long running processes. Be careful here, since a long running process in the server can cause resource leaks.
A general question is: what do you want to do? Why do you need this second process, but yet somehow controlled by apache? Why can'ty ou just build a daemon that talks to apache, why does it have to be controlled by apache?
My question is: which python framework should I use to build my server?
Notes:
This server talks HTTP with it's clients: GET and POST (via pyAMF)
Clients "submit" "tasks" for processing and, then, sometime later, retrieve the associated "task_result"
submit and retrieve might be separated by days - different HTTP connections
The "task" is a lump of XML describing a problem to be solved, and a "task_result" is a lump of XML describing an answer.
When a server gets a "task", it queues it for processing
The server manages this queue and, when tasks get to the top, organises that they are processed.
the processing is performed by a long running (15 mins?) external program (via subprocess) which is feed the task XML and which produces a "task_result" lump of XML which the server picks up and stores (for later Client retrieval).
it serves a couple of basic HTML pages showing the Queue and processing status (admin purposes only)
I've experimented with twisted.web, using SQLite as the database and threads to handle the long running processes.
But I can't help feeling that I'm missing a simpler solution. Am I? If you were faced with this, what technology mix would you use?
I'd recommend using an existing message queue. There are many to choose from (see below), and they vary in complexity and robustness.
Also, avoid threads: let your processing tasks run in a different process (why do they have to run in the webserver?)
By using an existing message queue, you only need to worry about producing messages (in your webserver) and consuming them (in your long running tasks). As your system grows you'll be able to scale up by just adding webservers and consumers, and worry less about your queuing infrastructure.
Some popular python implementations of message queues:
http://code.google.com/p/stomper/
http://code.google.com/p/pyactivemq/
http://xph.us/software/beanstalkd/
I'd suggest the following. (Since it's what we're doing.)
A simple WSGI server (wsgiref or werkzeug). The HTTP requests coming in will naturally form a queue. No further queueing needed. You get a request, you spawn the subprocess as a child and wait for it to finish. A simple list of children is about all you need.
I used a modification of the main "serve forever" loop in wsgiref to periodically poll all of the children to see how they're doing.
A simple SQLite database can track request status. Even this may be overkill because your XML inputs and results can just lay around in the file system.
That's it. Queueing and threads don't really enter into it. A single long-running external process is too complex to coordinate. It's simplest if each request is a separate, stand-alone, child process.
If you get immense bursts of requests, you might want a simple governor to prevent creating thousands of children. The governor could be a simple queue, built using a list with append() and pop(). Every request goes in, but only requests that fit will in some "max number of children" limit are taken out.
My reaction is to suggest Twisted, but you've already looked at this. Still, I stick by my answer. Without knowing you personal pain-points, I can at least share some things that helped me reduce almost all of the deferred-madness that arises when you have several dependent, blocking actions you need to perform for a client.
Inline callbacks (lightly documented here: http://twistedmatrix.com/documents/8.2.0/api/twisted.internet.defer.html) provide a means to make long chains of deferreds much more readable (to the point of looking like straight-line code). There is an excellent example of the complexity reduction this affords here: http://blog.mekk.waw.pl/archives/14-Twisted-inlineCallbacks-and-deferredGenerator.html
You don't always have to get your bulk processing to integrate nicely with Twisted. Sometimes it is easier to break a large piece of your program off into a stand-alone, easily testable/tweakable/implementable command line tool and have Twisted invoke this tool in another process. Twisted's ProcessProtocol provides a fairly flexible way of launching and interacting with external helper programs. Furthermore, if you suddenly decide you want to cloudify your application, it is not all that big of a deal to use a ProcessProtocol to simply run your bulk processing on a remote server (random EC2 instances perhaps) via ssh, assuming you have the keys setup already.
You can have a look at celery
It seems any python web framework will suit your needs. I work with a similar system on a daily basis and I can tell you, your solution with threads and SQLite for queue storage is about as simple as you're going to get.
Assuming order doesn't matter in your queue, then threads should be acceptable. It's important to make sure you don't create race conditions with your queues or, for example, have two of the same job type running simultaneously. If this is the case, I'd suggest a single threaded application to do the items in the queue one by one.
I'm building a program that has a class used locally, but I want the same class to be used the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic rpc client/server using the examples from twisted, but I'm having trouble with the client.
c = ClientCreator(reactor, Greeter)
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call, when the data is received I'm calling reactor.stop(), but if I make any more calls the reactor won't restart. Is there something else I should be using for this? maybe a different twisted module or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses fuse and can combine multiple local folders into the fuse mount point. The file access is already handled within a class, so I want to have servers that give me network access to the same class. After continuing to search, I suspect pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (simply based on reading their home page right now) but I'm open to any suggestions.
I could achieve similar results by using an nfs mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to bee an nfs server with a number of nfs mounts equal to the number of computers in the network.
Conclusion:
I have decided to use rpyc as it gave me exactly what I was looking for. A server that keeps an instance of a class that I can manipulate as if it was local. If anyone is interested I put my project up on Launchpad (http://launchpad.net/dstorage).
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just ClientCreator(...).connectTCP(...) each time.
If you self.transport.loseConnection() in your Protocol you won't be leaving open connections.
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.send(output)
data = s.recv(size)
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is
that when I call reactor.run() it
seems to go into a loop that just
watches for network activity. How do I
continue running the rest of my
program while it uses the network? if
I can get past that, then I can
probably work through the
synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop which is an event reactor. It will not only wait for entwork events, but anything else you have scheduled to happen. With Twisted you will need to structure the rest of your application in a way to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.