I've used Twisted to make an SSH server similar to the one shown here. My question is: is it possible to use multithreading to run multiple commands simultaneously? I tried making a do_ function that started a thread, but that didn't accomplish what I was after. Should I make multiple client connections instead?
Perhaps you should look at adding Twisted subprocess spawning support to your code?
http://twistedmatrix.com/documents/current/core/howto/process.html
... it's a bit hard to be more specific as you haven't included any code for us to comment on.
I have 4 systems, where one is used as a master and the other 3 as slaves. I want to execute a set of functions on the client systems and fetch back the results. To achieve this I previously used Parallel Python; unfortunately, as soon as the job_server is created, it internally executes the given function using all systems. I want to assign a particular function to be executed on an individual client machine, and I have no idea how to proceed with the coding. Is there any framework in Python that allows users to do that?
There is the multiprocessing module: http://docs.python.org/library/multiprocessing.html
You can find examples there that do exactly what you want (see Remote Managers).
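As a rough sketch of that remote-manager pattern (the class names, port, and authkey here are illustrative, not from the docs): each slave exposes an object, and the master connects to a specific slave and calls its methods, so you decide which machine runs which function.

# slave.py -- run on each slave; exposes a Worker object over the network
from multiprocessing.managers import BaseManager

class Worker:
    def run_task(self, x):
        return x * x  # stand-in for the real work

class TaskManager(BaseManager):
    pass

worker = Worker()
TaskManager.register("get_worker", callable=lambda: worker)

if __name__ == "__main__":
    TaskManager(address=("", 50000), authkey=b"secret").get_server().serve_forever()

# master.py -- connect to one particular slave and call its method
from multiprocessing.managers import BaseManager

class TaskManager(BaseManager):
    pass

TaskManager.register("get_worker")
mgr = TaskManager(address=("slave_1", 50000), authkey=b"secret")
mgr.connect()
worker = mgr.get_worker()   # proxy to the Worker living on slave_1
print(worker.run_task(7))   # executes remotely, prints 49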
As a side note: I soon stumbled upon limitations in the multiprocessing module and now use Pyro as middleware. It's very simple and powerful, but requires a bit more work to set up a basic framework.
There was a talk at a PyCon in the past few years about a tool that managed remote processes via SSH, which sounded really cool and which I meant to check out. I've forgotten what the tool was, but you might find it by looking at the PyCon talks from the past few years.
The Python Parallel Processing wiki page has a big list of tools that might help you.
I don't quite understand what you have now and what you're trying to do. Pyro is easy to use, pretty fast, and well documented for letting multiple Python processes (on multiple systems) interact with each other. Take a look at it. You'll need to ensure that something is running on all of them so they're listening for commands, but that's not too hard.
In Pythomnic you can have multiple processes on multiple servers exchanging synchronous calls like this:
result = pmnc("other_process").module.method(...) # invokes remote method
So you can either have 3 distinctly named slaves and pick the call target manually:
pmnc("slave_1").foo.bar() # calls bar() in foo.py on slave_1
or have the slaves identically named and have the framework pick any available one (if they are interchangeable, of course):
pmnc("slave").foo.bar() # calls bar() in foo.py on any slave
Check out RPyC.
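For example, a minimal RPyC sketch (the service name, port, and task body are illustrative): each slave runs a service, and the master connects to whichever slave it wants.

# on each slave: expose a service
import rpyc
from rpyc.utils.server import ThreadedServer

class SlaveService(rpyc.Service):
    def exposed_run_task(self, x):
        return x * x  # stand-in for the real work

if __name__ == "__main__":
    ThreadedServer(SlaveService, port=18861).start()

# on the master: pick a specific slave and call it
conn = rpyc.connect("slave_1", 18861)
print(conn.root.run_task(7))  # executes on slave_1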
I'm making a Python script that needs to do 3 things simultaneously.

What is a good way to achieve this? Due to what I've heard about the GIL, I'm not so keen on using threads anymore.

2 of the things the script needs to do will be heavily active; they will have lots of work to do. The third thing needs to report to the user over a socket when asked (so it will be like a tiny server) about the status of the other 2 processes.

Now my question is: what would be a good way to achieve this? I don't want to have three different scripts, and due to the GIL I think that with threads I won't get much performance and may make things worse.

Is there a fork() for Python like in C, so that my script can fork 2 processes to do their jobs while the main process reports to the user? And how can the forked processes communicate with the main process?

Later edit: to be more precise, one thread should get email from an IMAP server and store it in a database, another thread should get messages from the db that need to be sent and then send them, and the main thread should be a tiny HTTP server that accepts a single URL and shows the status of those two threads in JSON format. So are threads OK? Will the work be done simultaneously, or will there be performance issues due to the GIL?
I think you could use the multiprocessing package, which has an API similar to the threading package and will let you get better performance from multiple cores.

To see the performance gain of using multiprocessing instead of threading, check this link comparing the average run time of the same program using multiprocessing vs. threading.
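A minimal sketch of how close the two APIs are, using two illustrative worker functions and a manager dict for the shared status:

import multiprocessing

def fetch_mail(status):
    status["fetched"] = 10   # stand-in for the IMAP-to-database work

def send_mail(status):
    status["sent"] = 5       # stand-in for the sending work

if __name__ == "__main__":
    manager = multiprocessing.Manager()
    status = manager.dict()  # shared state the main process can report on
    workers = [multiprocessing.Process(target=fetch_mail, args=(status,)),
               multiprocessing.Process(target=send_mail, args=(status,))]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(dict(status))      # the main process could serve this as JSON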
The GIL is really only something to care about if you want true parallelism, that is, spreading the load over several cores/processors. If that is the case, and it kinda sounds like it from your description, use multiprocessing.
If you just need to do three things "simultaneously" in that way that you need to wait in the background for things to happen, then threads are just fine. That's what threads are for in the first place. 8-I)
I wonder what the best way is to handle parallel SSH connections in Python.

I need to open several SSH connections to keep in the background and to feed them commands in an interactive or timed-batch way.

Is it possible to do this with the paramiko library? It would be nice not to spawn a different SSH process for each connection.
Thanks.
It might be worth checking out what options are available in Twisted. For example, the Twisted.Conch page reports:
http://twistedmatrix.com/users/z3p/files/conch-talk.html
Unlike OpenSSH, the Conch server does not fork a process for each incoming connection. Instead, it uses the Twisted reactor to multiplex the connections.
Yes, you can do this with paramiko.
If you're connecting to one server, you can run multiple channels through a single connection. If you're connecting to multiple servers, you can start multiple connections in separate threads. No need to manage multiple processes, although you could substitute the multiprocessing module for the threading module and have the same effect.
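A minimal sketch of the multiple-servers case, one connection per thread (the host names and credentials are placeholders):

import threading
import paramiko

def run(host, cmd, results):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username="user", password="secret")  # placeholder creds
    stdin, stdout, stderr = client.exec_command(cmd)
    results[host] = stdout.read()   # collect the command's output
    client.close()

results = {}
threads = [threading.Thread(target=run, args=(h, "uptime", results))
           for h in ("host1", "host2", "host3")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)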
I haven't looked into Twisted Conch in a while, but it looks like it's getting updates again, which is nice. I couldn't give you a good feature comparison between the two, but I find paramiko is easier to get going. It takes a little more effort to get into Twisted, but it could be well worth it if you're doing other network programming.
You can simply use subprocess.Popen for that purpose, without any problems.
However, you might want to simply install cronjobs on the remote machines. :-)
Reading the paramiko API docs, it looks like it is possible to open one SSH connection and multiplex as many SSH tunnels on top of it as desired. Common SSH clients (OpenSSH) often do things like this automatically behind the scenes if there is already a connection open.
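In paramiko that multiplexing looks something like this: several exec channels opened over one transport (the host and credentials are placeholders):

import paramiko

client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("host1", username="user", password="secret")  # placeholder creds

transport = client.get_transport()
for cmd in ("uptime", "df -h", "who"):
    chan = transport.open_session()  # a new channel on the same TCP connection
    chan.exec_command(cmd)
    print(chan.recv(4096))           # read some of the command's output
    chan.close()

client.close()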
I've tried clusterssh, and I don't like the multiwindow model. Too confusing in the common case when everything works.
I've tried pssh, and it has a few problems with quotation escaping and password prompting.
The best I've used is dsh:
Description: dancer's shell, or distributed shell
Executes specified command on a group of computers using remote shell
methods such as rsh or ssh.
.
dsh can parallelise job submission using several algorithms, such as using
a fan-out method or opening as many connections as possible, or
using a window of connections at one time.
It also supports "interactive mode" for interactive maintenance of
remote hosts.
.
This tool is handy for administration of PC clusters, and multiple hosts.
It's very flexible in scheduling and topology: you can request something close to a calling tree if need be. But the default is a simple topology of one command node to many leaf nodes.
http://www.netfort.gr.jp/~dancer/software/dsh.html
This might not be relevant to your question, but there are tools like pssh, clusterssh, etc. that can spawn connections in parallel. You can couple Expect with pssh to control them too.
I need to run a server-side script like Python "forever" (or as long as possible without losing state), so it can keep sockets open and asynchronously react to events like data being received, for example if I use Twisted for socket communication.
How would I manage something like this?
Am I confused? Or are there better ways to implement asynchronous socket communication?
After starting the script once via Apache server, how do I stop it running?
If you are using Twisted, then it has a whole infrastructure for starting and stopping daemons.
http://twistedmatrix.com/projects/core/documentation/howto/application.html
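For instance, a minimal .tac file (the echo service here is just a placeholder) that twistd can run as a daemon; start it with "twistd -y echo.tac" and stop it by killing the pid that twistd writes to twistd.pid:

# echo.tac
from twisted.application import internet, service
from twisted.internet import protocol

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        self.transport.write(data)  # echo the bytes straight back

factory = protocol.Factory()
factory.protocol = Echo

application = service.Application("echo")  # the object twistd looks for
internet.TCPServer(8007, factory).setServiceParent(application)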
How would I manage something like this?
Twisted works well for this, read the link above
Am I confused? Or are there better ways to implement asynchronous socket communication?
Twisted is very good at asynchronous socket communications. It is hard on the brain until you get the hang of it though!
After starting the script once via Apache server, how do I stop it running?
The twisted tools assume command line access, so you'd have to write a cgi wrapper for starting / stopping them if I understand what you want to do.
You can just write a script that sits in a while loop waiting for connections and also waits for a signal telling it to shut down.
http://docs.python.org/library/signal.html
Then to stop it you just need to run another script that sends that signal to it.
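A sketch of that pattern (the port and response are illustrative): the loop checks a flag that a SIGTERM handler flips, so a plain kill -TERM <pid> (or os.kill(pid, signal.SIGTERM) from another script) shuts it down cleanly.

import signal
import socket

running = True

def handle_term(signum, frame):
    global running
    running = False          # ask the main loop to exit

signal.signal(signal.SIGTERM, handle_term)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.settimeout(1.0)       # wake up periodically so the flag gets checked
server.bind(("", 8000))
server.listen(5)

while running:
    try:
        conn, addr = server.accept()
    except socket.timeout:
        continue             # no client yet; loop around and re-check the flag
    conn.sendall(b"hello\n")
    conn.close()

server.close()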
You can use a ‘double fork’ to run your code in a new background process unbound from the old one. See e.g. this recipe, which has more explanatory comments than you could possibly want.
I wouldn't recommend this as a primary way of running background tasks for a web site. If your Python is embedded in an Apache process, for example, you'll be forking more than you want. Better to invoke the daemon separately (just under a similar low-privilege user).
After starting the script once via Apache server, how do I stop it running?
You have your second fork write the process number (pid) of the daemon process to a file, and then read the pid from that file and send the process a terminate signal (os.kill(pid, signal.SIGTERM)).
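In code, with a hypothetical pidfile path:

import os
import signal

# in the daemon, after the second fork:
with open("/tmp/mydaemon.pid", "w") as f:
    f.write(str(os.getpid()))

# in the control script, to stop it:
with open("/tmp/mydaemon.pid") as f:
    pid = int(f.read())
os.kill(pid, signal.SIGTERM)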
Am I confused?
That's the question! I'm assuming you are trying to have a background process that responds on a different port from the web interface for some sort of unusual net service. If you're merely talking about responding to normal web requests, you shouldn't be doing this: you should rely on Apache to handle your sockets and service one request at a time.
I think Comet is what you're looking for. Make sure to take a look at Tornado too.
You may want to look at FastCGI; it sounds exactly like what you are looking for, but I'm not sure if it's under current development. It uses a CGI daemon and a special Apache module to communicate with it. Since the daemon is long-running, you don't have the fork/exec cost, but at the cost of managing your own resources (no automagic cleanup on every request).

One reason this style of FastCGI isn't used much anymore is that there are ways to embed interpreters into the Apache binary and have them run in the server. I'm not familiar with mod_python, but I know mod_perl has configuration to allow long-running processes. Be careful here, since a long-running process in the server can cause resource leaks.

A general question is: what do you want to do? Why do you need this second process, yet somehow controlled by Apache? Why can't you just build a daemon that talks to Apache? Why does it have to be controlled by Apache?
I'm building a program that has a class used locally, but I want the same class to be used the same way over the network. This means I need to be able to make synchronous calls to any of its public methods. The class reads and writes files, so I think XML-RPC is too much overhead. I created a basic rpc client/server using the examples from twisted, but I'm having trouble with the client.
c = ClientCreator(reactor, Greeter)
c.connectTCP(self.host, self.port).addCallback(request)
reactor.run()
This works for a single call: when the data is received I'm calling reactor.stop(), but if I make any more calls the reactor won't restart. Is there something else I should be using for this? Maybe a different Twisted module or another framework?
(I'm not including the details of how the protocol works, because the main point is that I only get one call out of this.)
Addendum & Clarification:
I shared a google doc with notes on what I'm doing. http://docs.google.com/Doc?id=ddv9rsfd_37ftshgpgz
I have a version written that uses FUSE and can combine multiple local folders into the FUSE mount point. The file access is already handled within a class, so I want to have servers that give me network access to the same class. After continuing to search, I suspect Pyro (http://pyro.sourceforge.net/) might be what I'm really looking for (simply based on reading their home page right now), but I'm open to any suggestions.

I could achieve similar results by using an NFS mount and combining it with my local folder, but I want all of the peers to have access to the same combined filesystem, so that would require every computer to be an NFS server with a number of NFS mounts equal to the number of computers in the network.
Conclusion:
I have decided to use RPyC, as it gave me exactly what I was looking for: a server that keeps an instance of a class that I can manipulate as if it were local. If anyone is interested, I put my project up on Launchpad (http://launchpad.net/dstorage).
If you're even considering Pyro, check out RPyC first, and re-consider XML-RPC.
Regarding Twisted: try leaving the reactor up instead of stopping it, and just ClientCreator(...).connectTCP(...) each time.
If you call self.transport.loseConnection() in your Protocol, you won't be leaving open connections.
For a synchronous client, Twisted probably isn't the right option. Instead, you might want to use the socket module directly.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((self.host, self.port))
s.send(output)        # output: the request bytes to send
data = s.recv(size)   # size: the maximum number of bytes to read at once
s.close()
The recv() call might need to be repeated until you get an empty string, but this shows the basics.
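That receive loop would look something like this, replacing the single recv() above:

chunks = []
while True:
    chunk = s.recv(4096)
    if not chunk:            # empty result means the peer closed the connection
        break
    chunks.append(chunk)
data = b"".join(chunks)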
Alternatively, you can rearrange your entire program to support asynchronous calls...
Why do you feel that it needs to be synchronous?
If you want to ensure that only one of these is happening at a time, invoke all of the calls through a DeferredSemaphore so you can rate limit the actual invocations (to any arbitrary value).
If you want to be able to run multiple streams of these at different times, but don't care about concurrency limits, then you should at least separate reactor startup and teardown from the invocations (the reactor should run throughout the entire lifetime of the process).
If you just can't figure out how to express your application's logic in a reactor pattern, you can use deferToThread and write a chunk of purely synchronous code -- although I would guess this would not be necessary.
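A minimal sketch of deferToThread, wrapping an illustrative blocking function:

from twisted.internet import reactor, threads

def blocking_work(path):
    with open(path) as f:    # ordinary synchronous, blocking code
        return f.read()

d = threads.deferToThread(blocking_work, "somefile.txt")  # runs in a thread pool
d.addCallback(lambda data: print(len(data)))
d.addBoth(lambda _: reactor.stop())
reactor.run()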
If you are using Twisted you should probably know that:
You will not be making synchronous calls to any network service
The reactor can only ever be run once, so do not stop it (by calling reactor.stop()) until your application is ready to exit.
I hope this answers your question. I personally believe that Twisted is exactly the correct solution for your use case, but that you need to work around your synchronicity issue.
Addendum & Clarification:
Part of what I don't understand is that when I call reactor.run() it seems to go into a loop that just watches for network activity. How do I continue running the rest of my program while it uses the network? If I can get past that, then I can probably work through the synchronicity issue.
That is exactly what reactor.run() does. It runs a main loop, which is an event reactor. It will not only wait for network events, but also run anything else you have scheduled. With Twisted you will need to structure the rest of your application to deal with its asynchronous nature. Perhaps if we knew what kind of application it is, we could advise.
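For example, non-network work can be scheduled into that same loop (the interval and task are illustrative):

from twisted.internet import reactor

def periodic():
    print("doing the rest of the program's work")
    reactor.callLater(5, periodic)   # re-schedule ourselves

reactor.callLater(5, periodic)
reactor.run()                        # runs network I/O and this task together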