maintaining socket connection, irregular data frequency - python

I'd like to create a python socket (or SocketServer) that, once connected to a single device, maintains an open connection in order for regular checks to be made to see if any data has been sent. The socket will only listen for one connection.
E.g.:
import time

def get_data(conn):
    response = 'back atcha'
    data = conn.recv(1024)  # blocks until the client sends data or closes
    print 'get_data:', data
    if data:
        conn.send(response)

s = open_socket()  # helper not shown in the question
conn, addr = s.accept()
while True:
    print 'running'
    time.sleep(1)
    get_data(conn)
    # do other stuff
Once the server socket is bound and the connection has been accepted, the socket blocks when running a .recv until either the connecting client sends some data or closes its socket. As I am waiting for irregular data (could be seconds, could be a day), and the program needs to perform other tasks in the meantime, this blocking is a problem.
I don't want the client to close its socket, as it may need to send (or receive) data at any time to (from) the server. Is the only solution to run this in a separate thread, or is there a simple way to set up the client/server sockets to maintain the connection forever (and is this safe? It'll be running on a VLAN) while not blocking when no data has been received?

You're looking for non-blocking I/O, often loosely called asynchronous I/O. Using a separate thread which blocks on recv is less efficient, and you have to pass data between threads safely, but it's pretty straightforward.
For a Python asynchronous I/O framework I highly recommend Twisted. Also check out asyncore, which comes with the standard library (note that it was deprecated in Python 3.6 and removed in 3.12 in favor of asyncio).
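As a rough illustration of the non-blocking idea without any framework, here is a minimal sketch that uses select with a zero timeout to ask whether the connection has data before calling recv. The function name poll_for_data and its return convention are assumptions made for this example, and socketpair() only stands in for the accepted connection and the remote device.

```python
import select
import socket


def poll_for_data(conn, response=b"back atcha", bufsize=1024):
    """Return received bytes, b"" if the peer closed, or None if nothing is waiting."""
    # A timeout of 0 makes select() return immediately instead of blocking.
    readable, _, _ = select.select([conn], [], [], 0)
    if not readable:
        return None
    data = conn.recv(bufsize)  # will not block: select reported data or EOF
    if data:
        conn.sendall(response)
    return data


if __name__ == "__main__":
    # socketpair() stands in for the accepted connection and the device.
    server_side, device = socket.socketpair()
    print(poll_for_data(server_side))  # nothing has been sent yet
    device.sendall(b"hello")
    print(poll_for_data(server_side))
    print(device.recv(1024))
    device.close()
    server_side.close()
```

The main loop can call poll_for_data(conn) once per iteration and carry on with its other work whenever it returns None. An empty bytes result means the client closed the connection.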

Related

Can you write to a python socket that is blocked on recvfrom

So, suppose in one thread the UDP socket is hanging on the recvfrom() call, waiting for incoming messages.
However, in another thread you would like to write through the socket to another address.
Is this possible? If not, what are my alternatives?

socket server responding multiple requests simultaneously from a client socket

I am building a socket server with Python.
This server:
1) receives data from the client
2) does something with it (this takes about 10 seconds at most, depending on the input data)
3) sends data back once the work above is done
This system works fine as long as the client does not send requests back to back. For example, say the server takes 5 seconds to process data and the client sends data every 10 seconds. The problem, however, is that the client sometimes sends multiple requests at once, which causes a delay. Currently, the client cannot send data unless the server is ready to receive it, which means the server is not doing any work at that moment. Below is what I want to build.
a) build a queue at the socket server that holds incoming data, so that the client can send data even when the server is busy
b) make a thread at the socket server so that it can do the work 'simultaneously' (here I'm a bit confused about concurrency versus parallelism; the work is computation rather than system calls)
c) send the data back to the client socket
My questions are as follows.
Is it Queue that I need to use in order to achieve a) ?
Is it thread or something else that I need to use in order to achieve b)?
Thanks in advance
Best
Gee
Yeah something like this could work.
First, you'll need a thread to receive and send data. If you have a limited number of clients, you can create a thread per client, but that is not an option for a more or less robust system. To serve multiple clients in a single thread, the sockets should be nonblocking; otherwise one long transmission would block the others. Nonblocking code has a more sophisticated structure built around select, so I would advise spending some time reading about it.
Then you'll need a thread to do the math, or several threads/processes if "the math" takes long to execute.
Last but not least, the socket threads and the "math" thread should use two queues to exchange data. Simple lists are enough, but make sure access to them is synchronized: guard them with mutexes or locks. This is another vast topic that is worth reading about.
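The two-queue arrangement above might look roughly like this. It is a sketch only: it swaps the hand-locked lists for the standard library's queue.Queue, which already does its own locking, and slow_computation, the (client_id, payload) job tuple, and the None sentinel are all assumptions made for illustration.

```python
import queue
import threading


def slow_computation(payload):
    # Placeholder for the real work (hypothetical).
    return payload.upper()


def worker(jobs, results):
    """Pull jobs off `jobs` until a None sentinel arrives."""
    while True:
        job = jobs.get()  # blocks until the socket thread queues something
        if job is None:
            break
        client_id, payload = job
        results.put((client_id, slow_computation(payload)))


jobs = queue.Queue()
results = queue.Queue()
thread = threading.Thread(target=worker, args=(jobs, results), daemon=True)
thread.start()

# The socket thread would do this as data arrives, without waiting for the worker.
for client_id, text in enumerate(["abc", "def", "ghi"]):
    jobs.put((client_id, text))
jobs.put(None)  # tell the worker to stop
thread.join()

# The socket thread would drain this queue and send each result back.
while not results.empty():
    print(results.get())
```

Because the socket thread only ever calls put on one queue and get on the other, it never waits on the computation itself.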

Wait for signal to start generating data from another process in python

I have two independent processes in python: producer and consumer. Producer generates files and consumer does some processing on files.
To test both applications, I find myself constantly starting two programs, and producer has a delay function, etc. which is a pain.
What is the quickest way to implement some kind of signaling machinery in python so that the consumer says "GO!" and the producer starts doing its work?
This way I can have producer running all the time.
One simple way to do this (you didn't mention what platform(s) you care about, so I'll assume you want something cross-platform) is a UDP socket with a known port. If the consumer just listens on localhost port 12345, it can block on a sock.recvfrom call every time it needs to wait, and the producer can do a sock.sendto to notify it.
For example, here's a consumer:
#!/usr/bin/env python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(('127.0.0.1', 12345))
while True:
    dummy, addr = sock.recvfrom(1024)
    # work on files until done
And a producer:
#!/usr/bin/env python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# `things` is a placeholder for whatever the producer iterates over
for thing in things:
    # produce files for consumer
    sock.sendto(b'X', ('127.0.0.1', 12345))
Other things to consider:
On Unix, there are advantages to using Unix sockets instead of UDP sockets.
On Windows, there are advantages to using named pipes instead of sockets.
You may want to make the consumer a daemon (with or without a built-in "service" tool to start and stop it) on Unix or a Windows service on Windows. (You can even merge the service tool into the producer, so, e.g., the default behavior is to start a consumer if one isn't there, then shut it down when it's done, but it can also be used to launch a long-running consumer in the background.)
You can extend this pretty easily to send more than just an empty notification—e.g., send a different message if you're done producing.
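To make that last point concrete, here is a small sketch in which the notification carries a meaning: one message says "go ahead" and another says "finished". The GO and DONE constants and the helper names are invented for this example, and both ends run in one process only so that the snippet is self-contained.

```python
import socket

GO = b"GO"
DONE = b"DONE"


def make_listener(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def wait_for_messages(sock):
    """Yield each notification until DONE arrives."""
    while True:
        message, _addr = sock.recvfrom(1024)
        if message == DONE:
            return
        yield message


def demo():
    listener = make_listener()
    listener.settimeout(5.0)  # guard so the demo cannot hang forever
    address = listener.getsockname()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for _ in range(3):
        sender.sendto(GO, address)
    sender.sendto(DONE, address)

    for message in wait_for_messages(listener):
        print("notified:", message)

    sender.close()
    listener.close()


demo()
```

In a real setup the listening side would bind to a fixed, agreed port such as 12345, as in the answer, rather than an OS-chosen one.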

Multi Port Network Application

I want to create a python network application that can run on multiple ports (ex: TCP:1234, TCP:5678, etc).
So I have, let's say, n sockets, each listening for a client connection. I wrote a simple network application that listens on a range of ports, but when I run it, it gets stuck at the listening phase of the first socket!
How can I make a single python program listen on N ports, each waiting for a client to connect, with all sockets running and listening at the same time?
Socket/Process #1: Listening on TCP Port 5000
Socket/Process #2: Listening on TCP Port 5001
Socket/Process #3: Listening on TCP Port 5002
...
Socket/Process #N: Listening on TCP Port 6000
Appreciate any ideas.
#!/usr/bin/env python
import socket

def getPortList():
    ports=[]
    nPort=int(raw_input("# how many ports you want? "))
    j = 0
    for i in range(0,nPort):
        ports.append(int(raw_input("Enter port number: ")))
        j+=1
    return ports

def myTCPSocket(port=5000):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR,1)
    s.bind(("", int(port)))
    print ("\nWaiting for connections!\n")
    s.listen(5)
    # accept() blocks here, so the loop below never reaches the next port
    (clientsock, clientaddr) = s.accept()
    print(clientaddr)
    data = "start"
    while len(data):
        clientsock.send("\nWelcome to Echo Server\n")
        data = clientsock.recv(1024)
        print ("Data sent is: ", data)
        clientsock.send(data)
        if data == "exit\r\n":
            clientsock.close()

plst = getPortList()
for item in plst:
    myTCPSocket(item)
Listening on multiple sockets is really no different from listening on a single socket.
You already need to handle the listener socket and all client connection sockets somehow. You can do this by:
Writing a loop around select.select (or poll, kqueue, epoll, etc.).
Using the standard-library reactor asyncore.
Using a third-party reactor or proactor like Twisted.
Using OS-specific functionality (e.g., using a Cocoa runloop and server via PyObjC).
Creating a thread for each new connection.
Creating a subprocess for each new connection.
Almost all of these schemes will also work for dealing with multiple listeners. The simplest thing to do is to combine the two into one (e.g., a single select loop that handles all of their listeners and all of their client sockets, or a separate thread for each listener and client socket).
For performance or debugging reasons, you might want to instead use a two-tier hybrid approach (e.g., a thread for each listener, each with a select loop for all of its client sockets, or a process for each listener, each with a thread for each client socket). But if you don't have any good reason to do that, don't add the complexity.
http://pastebin.com/QebZMKz3 shows a simple single-select implementation. Here's the output:
$ ./multiserve.py 22222 22223 &
(('127.0.0.1', 22222), ' listening')
(('127.0.0.1', 22223), ' listening')
$ echo 'abc' | nc localhost 22222
(('127.0.0.1', 22222), ' <- ', ('127.0.0.1', 64633))
(('127.0.0.1', 64633), ' <- ', 'abc\n')
(('127.0.0.1', 64633), ' EOF')
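For readers who cannot open the link, the following is a minimal sketch of the same single-select idea. It is not the pastebin code; the names make_listener, serve_once and run are made up for this example, and it only reads from clients without echoing anything back.

```python
import select
import socket


def make_listener(port, host="127.0.0.1"):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(5)
    return s


def serve_once(listeners, clients, timeout=None):
    """Run one pass of the event loop; return a list of (peer, data) received."""
    received = []
    # One select call watches every listener and every client socket.
    readable, _, _ = select.select(listeners + clients, [], [], timeout)
    for sock in readable:
        if sock in listeners:
            conn, _addr = sock.accept()  # a new connection on one of the ports
            clients.append(conn)
        else:
            data = sock.recv(1024)
            if data:
                received.append((sock.getpeername(), data))
            else:  # EOF: the client closed its end
                clients.remove(sock)
                sock.close()
    return received


def run(ports):
    """Serve forever on all the given ports (never returns)."""
    listeners = [make_listener(p) for p in ports]
    clients = []
    while True:
        for peer, data in serve_once(listeners, clients):
            print(peer, "<-", data)
```

Calling run([5000, 5001, 5002]) would listen on all three ports at once, since no single accept or recv call is allowed to block the loop.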
If you think you'll never actually need to handle two simultaneous clients… well, you're probably wrong, but… You can use most of the above techniques, and it may be marginally simpler. For example, you can select on the listeners, and then do the accept and client-socket communication synchronously before returning to the loop. Or you can create a process or thread for each listener but handle the accept and client-socket communication synchronously within each. And so on.
http://pastebin.com/wLVLT49i shows a simple example that seems to be what you were trying to do. Since it uses a process for each socket (via os.fork), it does allow simultaneous connections on different ports; since it doesn't do anything asynchronously within each process, it doesn't allow simultaneous connections to the same port. (And of course it's POSIX-specific because it uses fork.)
If you're looking to learn how to write asynchronous network servers in the first place, I'd suggest you do two different implementations: select and threads. They're conceptually fundamental, and relatively easy to code.
First, for select, you have to get your head around the idea of an event loop: the events are each new incoming connection, each incoming network packet on an existing connection, even each time a pipe you were writing to gets unclogged. The tricky bit here is that, as with any event loop, you need to handle each event and return without blocking, and without spending too much CPU time. For example, for an echo server, you can't just do a write on the other sockets, because some of them might be busy. So instead, you have to stick the output in a write buffer for each socket, and they'll get it in some future run through the event loop, when they're ready.
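A sketch of that write-buffer idea, under the assumption that incoming data should be relayed to every other connected client: pump and the outbuf dictionary are names invented for this example, and a real server would also put the sockets in non-blocking mode and handle errors.

```python
import select
import socket


def pump(clients, outbuf, timeout=None):
    """One event-loop pass: read what is ready, flush what can be written.

    `outbuf` maps each socket to the bytes still waiting to be sent to it.
    """
    # Only ask about writability for sockets that have something pending.
    want_write = [s for s in clients if outbuf.get(s)]
    readable, writable, _ = select.select(clients, want_write, [], timeout)
    for sock in readable:
        data = sock.recv(1024)
        if not data:  # EOF: forget this client and its pending output
            clients.remove(sock)
            outbuf.pop(sock, None)
            sock.close()
            continue
        # Queue the data for every other client instead of sending right now.
        for other in clients:
            if other is not sock:
                outbuf[other] = outbuf.get(other, b"") + data
    for sock in writable:
        if outbuf.get(sock):
            sent = sock.send(outbuf[sock])  # may accept only part of the buffer
            outbuf[sock] = outbuf[sock][sent:]
```

Data read in one pass is only queued. It goes out in a later pass, once select reports the destination socket as writable.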
Meanwhile, for threads, a separate thread for each connection seems like it makes everything trivial, but what happens when you need to echo a message from one thread to another? You either need some form of inter-thread communication, or shared data with inter-thread synchronization. So, you might have a Queue for writes on each socket, so any other socket's thread can just push a message onto the queue.
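The per-socket Queue could be sketched like this, with one reader thread and one writer thread per connection. The function names, the outboxes dictionary keyed by a connection id, and the None sentinel are assumptions for illustration, and the dictionary is assumed not to change while readers are running.

```python
import queue
import socket
import threading


def writer(sock, outbox):
    """Send everything placed on `outbox` until a None sentinel arrives."""
    while True:
        message = outbox.get()  # blocks until some thread queues a message
        if message is None:
            break
        sock.sendall(message)


def reader(sock, my_id, outboxes):
    """Forward whatever this connection receives to every other outbox."""
    while True:
        data = sock.recv(1024)
        if not data:  # EOF: the peer closed the connection
            break
        for other_id, outbox in outboxes.items():
            if other_id != my_id:
                outbox.put(data)  # thread-safe hand-off; no explicit lock needed
```

Each connection gets a queue.Queue in outboxes, a writer thread draining it, and a reader thread feeding the other connections' queues, so no thread ever writes to a socket it does not own.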
Neither of these will be as good as what a well-tuned reactor or proactor can do, but it's worth learning the basics, especially since you're going to face both the blocking issue (from select) and the communication issue (from threads) with any solution, and they'll be much more mysterious and harder to debug when you're working at a higher level.

What is the right ZMQ architecture for a webserver sending fire-and-forget tasks to a bunch of webservers?

I have a website which sends out heavy processing tasks to a worker server. Right now there is only one worker server, but more will be added in the future. These jobs are quite time-consuming (each takes 5 minutes to 1 hour). The idea is to have a configuration where just building a new worker server suffices to increase the capacity of the whole system, without extra configuration on the webserver side.
Currently, I've done a basic implementation using python-zeromq, with the PUSH/PULL architecture.
Every time there's a new job request, the webserver creates a socket, connects to one of the workers and sends the job (no reply needed, this is a fire-and-forget type of job):
import zmq

context = zmq.Context()
socket = context.socket(zmq.PUSH)
socket.connect("tcp://IP:5000")
socket.send(msg)
And on the worker side this is running all the time:
import zmq

context = zmq.Context()
socket = context.socket(zmq.PULL)
# bind to port on its own IP
socket.bind("tcp://IP:5000")
print("Listening for messages...")
while True:
    msg = socket.recv()
    # <do something>
Now I looked more into this, and I think this is not quite the right way of doing it, since adding a new worker server would require adding its IP to the webserver script, connecting to both of them, etc.
I would prefer the webserver to keep a persistent socket open (and not create one every time), and have the workers connect to the webserver instead. Sort of like here:
https://github.com/taotetek/blog_examples/blob/master/python_multiprocessing_with_zeromq/workqueue_example.py
In short, as opposed to what is above, the webserver's socket binds to its own IP, and the workers connect to it. I suppose jobs are then sent round-robin style.
However, what I'm worried about is what happens if the webserver gets restarted (something that happens quite often) or goes offline for a while. Using zeromq, will all worker connections hang? Will they somehow become invalid? If the webserver goes down, will the current queue disappear?
In the current setup, things seem to run somewhat OK, but I'm not 100% sure what's the right (and not too complex) way of doing this.
From the ZeroMQ Guide:
Components can come and go dynamically and ØMQ will automatically reconnect.
If the underlying TCP connection is broken, ZeroMQ will repeatedly try to reconnect, sending your message once the connection succeeds.
Note that PAIR sockets are an exception. They don't automatically reconnect. (See the zmq_socket docs.)
Binding on the server might work. Are you sure you won't ever need more than one web server, though? I'd consider putting a broker between your server(s) and workers.
Either way, I think persistent sockets are the way to go.
