I'm trying to use Twisted to create a cluster of computers, each running one program on a piece of a larger dataset.
My "servers" receive a chunk of data from the client and run command x on it.
My "client" connects to multiple servers giving them each a chunk of data and telling them what parameters to run command x with.
My question is: is there a way to set up the reactor loop to connect to many servers:
reactor.connectTCP('localhost', PORT, BlastFactory())
reactor.run()
or do I have to swap client and server in my paradigm?
Just call connectTCP multiple times.
The trick, of course, is that reactor.run() blocks "forever" (the entire run-time of your program) so you don't want to call that multiple times.
You have several options; you can set up a timed call to make future connections, or you can start new connections from events on your connection (like connectionLost or clientConnectionFailed).
Or, at the simplest, you can just set up multiple connection attempts before reactor.run() kicks off the whole show, like this:
for host in hosts:
    reactor.connectTCP(host, PORT, BlastFactory())
reactor.run()
Related
I am building a socket server with Python.
This server
receives data from client
does something here (approximately it takes about 10 sec in maximum depending on input data)
sends back data after some works done above
This system works fine as long as the client doesn't fire requests off in quick succession. For example, say the server takes 5 seconds to process each request and the client sends one every 10 seconds: no problem. The trouble is that the client sometimes sends multiple requests at a time, which causes delays, because currently the client cannot send data until the server is ready to receive it, which means the server is not doing any work. Below is what I want to build.
a) a queue at the socket server whose main task is to hold incoming data, so that the client can send data to the server even when the server is busy
b) a thread at the socket server (here I'm a bit confused between concurrency and parallelism; the work is computation-focused rather than system calls) so that the server can do the work 'simultaneously'
c) sending data back to the client socket
My questions are as follows.
Is Queue what I need to use in order to achieve a)?
Is it a thread, or something else, that I need to use in order to achieve b)?
Thanks in advance
Best
Gee
Yeah, something like this could work.
First, you'll need a thread to receive and send data. If you have a limited number of clients, you can create a thread per client, but that's not an option for a more or less robust system. To be able to serve multiple clients in a single thread, the sockets should be nonblocking; otherwise one long transmission would block the others. Nonblocking code has a more sophisticated structure built around select, so I would advise spending some time reading about it.
Then you'll need a thread to do the math. Or several threads/processes if "the math" is taking long to execute.
Last but not least, these socket threads and the "math" thread should use two queues to exchange data. Simple lists are enough, but make sure they are synchronized: guard them with mutexes (locks). This is another vast topic that is worth reading about.
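The socket-thread/math-thread split can be sketched with the standard library's thread-safe Queue, which does the locking for you (the doubling "work" and the None sentinel are just illustrative):

```python
import queue
import threading

jobs = queue.Queue()      # socket thread -> math thread
results = queue.Queue()   # math thread -> socket thread

def math_worker():
    # Runs in its own thread; queue.Queue handles the locking for us.
    while True:
        client_id, data = jobs.get()
        if data is None:                      # sentinel to shut the worker down
            break
        results.put((client_id, data * 2))    # stand-in for the real work

t = threading.Thread(target=math_worker, daemon=True)
t.start()

# The socket thread would enqueue incoming requests like this:
jobs.put(("client-1", 21))
print(results.get())   # -> ('client-1', 42)
jobs.put(("client-1", None))
t.join()
```

The socket thread stays responsive because puts and gets never block for long; the math thread can take as long as it needs.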
I'd like to create a python socket (or SocketServer) that, once connected to a single device, maintains an open connection in order for regular checks to be made to see if any data has been sent. The socket will only listen for one connection.
E.g.:
def get_data(conn):
    response = 'back atcha'
    data = conn.recv(1024)
    print 'get_data:', data
    if data:
        conn.send(response)

s = open_socket()
conn, addr = s.accept()

while True:
    print 'running'
    time.sleep(1)
    get_data(conn)
    # do other stuff
Once the server socket is bound and the connection has been accepted, the socket blocks when running a .recv until either the connecting client sends some data or closes its socket. As I am waiting for irregular data (could be seconds, could be a day), and the program needs to perform other tasks in the meantime, this blocking is a problem.
I don't want the client to close its socket, as it may need to send (or receive) data at any time to (from) the server. Is the only solution to run this in a separate thread, or is there a simple way to setup the client/server sockets to maintain the connection forever (and is this safe? It'll be running on a VLAN) while not blocking when no data has been received?
You're looking for non-blocking I/O, also called asynchronous I/O. Using a separate thread which blocks on this is very inefficient but it's pretty straightforward.
For a Python asynchronous I/O framework I highly recommend Twisted. Also check out asyncore which comes with the standard library.
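For the single-connection case above, a minimal non-blocking sketch uses select with a zero timeout, so the periodic check returns immediately whether or not data has arrived (socket.socketpair stands in for the real client/server connection here):

```python
import select
import socket
import time

# A connected pair of sockets, standing in for the real client and server.
server_side, client_side = socket.socketpair()

def get_data(conn):
    # select with a 0 timeout returns immediately, so the main loop never blocks.
    ready, _, _ = select.select([conn], [], [], 0)
    if ready:
        data = conn.recv(1024)
        if data:
            conn.send(b'back atcha')
        return data
    return None

print(get_data(server_side))      # no data yet -> None
client_side.send(b'hello')
time.sleep(0.1)                   # give the data time to arrive
print(get_data(server_side))      # -> b'hello'
print(client_side.recv(1024))     # -> b'back atcha'
```

The main loop can then call get_data once per iteration and carry on with its other tasks in between.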
I want to create a python network application that can run on multiple ports (ex: TCP:1234, TCP:5678, etc).
So I have, let's say, N sockets, each listening for a client connection. I programmed a simple network application that listens on a range of ports, but when I run it, it gets stuck at the listening phase of the first socket!
How can I make my single Python program, when run, listen on N ports, each waiting for a client to connect, with all sockets running and listening at the same time?
Socket/Process #1: Listening on TCP Port 5000
Socket/Process #2: Listening on TCP Port 5001
Socket/Process #3: Listening on TCP Port 5002
...
Socket/Process #N: Listening on TCP Port 6000
Appreciate any ideas.
#!/usr/bin/env python
import socket

def getPortList():
    ports = []
    nPort = int(raw_input("# how many ports you want? "))
    for i in range(0, nPort):
        ports.append(int(raw_input("Enter port number: ")))
    return ports

def myTCPSocket(port=5000):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("", int(port)))
    print("\nWaiting for connections!\n")
    s.listen(5)
    (clientsock, clientaddr) = s.accept()
    print(clientaddr)
    data = "start"
    while len(data):
        clientsock.send("\nWelcome to Echo Server\n")
        data = clientsock.recv(1024)
        print("Data sent is: ", data)
        clientsock.send(data)
        if data == "exit\r\n":
            clientsock.close()
            break

plst = getPortList()
for item in plst:
    myTCPSocket(item)
Listening on multiple sockets is really no different from listening on a single socket.
You already need to handle the listener socket and all client connection sockets somehow. You can do this by:
Writing a loop around select.select (or poll, kqueue, epoll, etc.).
Using the standard-library reactor asyncore.
Using a third-party reactor or proactor like Twisted.
Using OS-specific functionality (e.g., using a Cocoa runloop and server via PyObjC).
Creating a thread for each new connection.
Creating a subprocess for each new connection.
Almost all of these schemes will also work for dealing with multiple listeners. The simplest thing to do is to combine the two into one (e.g., a single select loop that handles all of the listeners and all of the client sockets, or a separate thread for each listener and each client socket).
For performance or debugging reasons, you might want to instead use a two-tier hybrid approach (e.g., a thread for each listener, each with a select loop for all of its client sockets, or a process for each listener, each with a thread for each client socket). But if you don't have any good reason to do that, don't add the complexity.
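A minimal sketch of that combined single-select scheme (with OS-assigned ports, and the client connections made in-process just to exercise the loop) might look like:

```python
import select
import socket

def make_listener(port=0):
    """Create one listening socket; port 0 lets the OS pick a free port."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", port))
    s.listen(5)
    return s

def serve_once(listeners, clients):
    """One pass of the event loop: accept new connections, echo client data."""
    readable, _, _ = select.select(listeners + clients, [], [], 1.0)
    for sock in readable:
        if sock in listeners:
            conn, _ = sock.accept()
            clients.append(conn)
        else:
            data = sock.recv(1024)
            if data:
                sock.sendall(data)        # echo it back
            else:
                clients.remove(sock)      # EOF: drop the connection
                sock.close()

# Two listeners, one loop -- no threads, no processes.
listeners = [make_listener(), make_listener()]
clients = []
for lst in listeners:
    c = socket.create_connection(lst.getsockname())
    serve_once(listeners, clients)        # accepts the connection
    c.sendall(b"ping")
    serve_once(listeners, clients)        # echoes the data
    print(c.recv(1024))                   # -> b'ping'
```

A real server would just call serve_once in a `while True` loop; the point is that listeners and client sockets live in the same select set.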
http://pastebin.com/QebZMKz3 shows a simple single-select implementation. Here's the output:
$ ./multiserve.py 22222 22223 &
(('127.0.0.1', 22222), ' listening')
(('127.0.0.1', 22223), ' listening')
$ echo 'abc' | nc localhost 22222
(('127.0.0.1', 22222), ' <- ', ('127.0.0.1', 64633))
(('127.0.0.1', 64633), ' <- ', 'abc\n')
(('127.0.0.1', 64633), ' EOF')
If you think you'll never actually need to handle two simultaneous clients… well, you're probably wrong, but… You can use most of the above techniques, and it may be marginally simpler. For example, you can select on the listeners, and then do the accept and client-socket communication synchronously before returning to the loop. Or you can create a process or thread for each listener but handle the accept and client-socket communication synchronously within each. And so on.
http://pastebin.com/wLVLT49i shows a simple example that seems to be what you were trying to do. Since it uses a process for each socket (via os.fork), it does allow simultaneous connections on different ports; since it doesn't do anything asynchronously within each process, it doesn't allow simultaneous connections to the same port. (And of course it's POSIX-specific because it uses fork.)
If you're looking to learn how to write asynchronous network servers in the first place, I'd suggest you do two different implementations: select and threads. They're conceptually fundamental, and relatively easy to code.
First, for select, you have to get your head around the idea of an event loop—the events are each new incoming connection, each incoming network packet on an existing connection, even each time a pipe you were writing to gets unclogged. The tricky bit here is that, as with any event loop, you need to handle each event and return without blocking, and without spending too much CPU time. For example, for an echo server, you can't just do a write on the other sockets, because some of them might be busy. So instead, you have to stick the output in a write buffer for each socket, and they'll get it in some future run through the event loop, when they're ready.
Meanwhile, for threads, a separate thread for each connection seems like it makes everything trivial, but what happens when you need to echo a message from one thread to another? You either need some form of inter-thread communication, or shared data with inter-thread synchronization. So, you might have a Queue for writes on each socket, so any other socket's thread can just push a message onto the queue.
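A minimal sketch of that per-socket write queue (with socket.socketpair standing in for a real connection, and a None sentinel to stop the writer):

```python
import queue
import socket
import threading

def writer(sock, outbox):
    """Dedicated writer for one connection: drains its queue onto the socket."""
    while True:
        msg = outbox.get()
        if msg is None:       # sentinel: shut this writer down
            break
        sock.sendall(msg)

# A connected pair of sockets, standing in for two clients of an echo server.
a, b = socket.socketpair()
outbox_b = queue.Queue()
threading.Thread(target=writer, args=(b, outbox_b), daemon=True).start()

# Any thread (here, the main one acting as socket a's reader) can now
# "echo" a message to connection b just by queueing it:
outbox_b.put(b"hello from a")
print(a.recv(1024))   # -> b'hello from a'
outbox_b.put(None)
```

The Queue is the synchronization point: no thread ever writes to a socket it doesn't own, so no extra locking is needed.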
Neither of these will be as good as what a well-tuned reactor or proactor can do, but it's worth learning the basics—especially since you're going to face both the blocking issue (from select) and the communication issue (from threads) with any solution, and they'll be much more mysterious and harder to debug when you're working at a higher level.
I have a website which sends heavy processing tasks out to a worker server. Right now there is only one worker server, but in the future more will be added. These jobs are quite time-consuming (5 minutes to 1 hour). The idea is to have a configuration where simply building a new worker server suffices to increase the capacity of the whole system, without needing extra configuration in the webserver parts.
Currently, I've done a basic implementation using python-zeromq, with the PUSH/PULL architecture.
Every time there's a new job request, the webserver creates a socket, connects to one of the workers and sends the job (no reply needed; this is a fire-and-forget type of job):
context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.connect("tcp://IP:5000")
socket.send(msg)
And on the worker side this is running all the time:
context = zmq.Context()
socket = context.socket(zmq.PULL)
# bind to the port on its own IP
socket.bind("tcp://IP:5000")
print("Listening for messages...")
while True:
    msg = socket.recv()
    # <do something>
Now I've looked more into this, and I think this is not quite the right way of doing it, since adding a new worker server would require adding its IP to the webserver script, connecting to all of them, and so on.
I would rather have the webserver keep one persistent socket open (instead of creating one every time), and have the workers connect to the webserver instead. Sort of like here:
https://github.com/taotetek/blog_examples/blob/master/python_multiprocessing_with_zeromq/workqueue_example.py
In short, as opposed to the above, the webserver's socket binds to its own IP and the workers connect to it. I suppose jobs are then handed out round-robin.
However, what I'm worried about is what happens if the webserver gets restarted (something that happens quite often) or goes offline for a while. Using zeromq, will all the worker connections hang? Somehow become invalid? If the webserver goes down, will the current queue disappear?
In the current setup, things seem to run somewhat OK, but I'm not 100% sure what's the right (and not too complex) way of doing this.
From the ZeroMQ Guide:
Components can come and go dynamically and ØMQ will automatically reconnect.
If the underlying tcp connection is broken, ZeroMQ will repeatedly try to reconnect, sending your message once the connection succeeds.
Note that PAIR sockets are an exception. They don't automatically reconnect. (See the zmq_socket docs.)
Binding on the server might work. Are you sure you won't ever need more than one web server, though? I'd consider putting a broker between your server(s) and workers.
Either way, I think persistent sockets are the way to go.
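A minimal sketch of the bind-on-webserver arrangement with pyzmq (the address and port are assumptions, and the worker would normally run in a separate process on another machine):

```python
import zmq

context = zmq.Context()

# Webserver side: bind one persistent PUSH socket and keep it for the
# lifetime of the process. ZeroMQ fans jobs out round-robin to whichever
# workers are connected.
sender = context.socket(zmq.PUSH)
sender.bind("tcp://127.0.0.1:5590")

# Worker side (normally a separate process/machine): connect to the webserver.
worker = context.socket(zmq.PULL)
worker.connect("tcp://127.0.0.1:5590")

sender.send(b"job-1")    # queued by ZeroMQ until a worker is available
print(worker.recv())
```

If a worker goes away and comes back, its PULL socket reconnects automatically; messages sitting in the sender's in-memory queue, however, are lost if the webserver process itself dies, which is one argument for a broker.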
I've been struggling along with sockets, making OK progress, but I keep running into problems, and feeling like I must be doing something wrong for things to be this hard.
There are plenty of tutorials out there that implement a TCP client and server, usually where:
The server runs in an infinite loop, listening for and echoing back data to clients.
The client connects to the server, sends a message, receives the same thing back, and then quits.
That I can handle. However, no one seems to go into the details of what you should and shouldn't be doing with sequential communication between the same two machines/processes.
I'm after the general sequence of function calls for doing multiple messages, but for the sake of asking a real question, here are some constraints:
Each event will be a single message client->server, and a single string response.
The messages are pretty short, say 100 characters max.
The events occur relatively slowly, max of say, 1 every 5 seconds, but usually less than half that speed.
and some specific questions:
Should the server be closing the connection after its response, or trying to hang on to the connection until the next communication?
Likewise, should the client close the connection after it receives the response, or try to reuse the connection?
Does a closed connection (either through close() or through some error) mean the end of the communication, or the end of the life of the entire object?
Can I reuse the object by connecting again?
Can I do so on the same port of the server?
Or do I have to reinstantiate another socket object with a fresh call to socket.socket()?
What should I be doing to avoid getting 'address in use' errors?
If a recv() times out, is the socket reusable, or should I throw it away? Again, can I start a new connection with the same socket object, or do I need a whole new socket?
If you know that you will communicate between the two processes soon again, there is no need for closing the connection. If your server has to deal with other connections as well, you want to make it multithreaded, though.
The same. You know that both have to do the same thing, right?
You have to create a new socket on the client, and you cannot reuse the socket on the server side either: you have to use the new socket returned by the next (clientsocket, address) = serversocket.accept() call. You can use the same port, though. (Think of webservers: they always accept connections on the same port, from thousands of clients.)
In both cases (closing or not closing), you should however use a message terminator, for example a \n. Then you read from the socket until you reach the terminator. This usage is so common that Python has a construct for it: socket.makefile and file.readline.
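A quick sketch of that newline-terminated framing with makefile/readline (a socketpair stands in for the client/server connection):

```python
import socket

# A connected pair of sockets, standing in for client and server ends.
client, server = socket.socketpair()

server_file = server.makefile("rb")   # buffered, line-oriented reads

client.sendall(b"first message\n")
client.sendall(b"second message\n")

# readline() blocks until a full '\n'-terminated message has arrived,
# regardless of how TCP split or merged the packets.
print(server_file.readline())   # -> b'first message\n'
print(server_file.readline())   # -> b'second message\n'
```

This is why the terminator matters: TCP is a byte stream with no message boundaries of its own, so the framing has to come from your protocol.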
UPDATE:
Post the code. Probably you have not closed the connection correctly.
You can call recv() again.
UPDATE 2:
You should never assume that the connection is reliable, but include mechanisms to reconnect in case of errors. Therefore it is ok to try to use the same connection even if there are longer gaps.
As for errors you get: if you need specific help for your code, you should post small (but complete) examples.