I'm programming a Python WebSocket chat server. I made a working server using the select() function to listen to clients, but when I connect more than 512 clients on Windows or 1024 clients on Linux, my server crashes. After some research I found that this is a system limit and that I need to use poll() or epoll() to handle more connections.
This is the part of the code using select() that I need to rewrite using epoll() or poll():
from select import select

rList, wList, xList = select(listeners, writers, listeners, interval)

for ready in wList:
    function1()
for ready in rList:
    function2()
for failed in xList:
    function3()
How can I do the same thing using epoll() or poll()? It still needs to call these functions above.
I believe it should be

rList, wList, xList = select.select(listeners, writers, listeners, interval)

You can change it to

poll = select.poll()
poll.register(eachconnection)   # register every socket you want to watch
poll.poll(5000)                 # wait up to 5000 ms for registered sockets to become ready
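To keep the behaviour of the original loop, here is a minimal sketch (my assumptions: listeners and writers are distinct lists of socket objects, interval is in seconds, and fd_to_sock is an illustrative helper map, since poll() reports bare file descriptors rather than socket objects):

import select

poller = select.poll()
fd_to_sock = {}

# Watch listeners for readability/errors and writers for writability.
for sock in listeners:
    poller.register(sock, select.POLLIN | select.POLLERR | select.POLLHUP)
    fd_to_sock[sock.fileno()] = sock
for sock in writers:
    poller.register(sock, select.POLLOUT)
    fd_to_sock[sock.fileno()] = sock

# poll() takes its timeout in milliseconds, select() in seconds.
for fd, event in poller.poll(int(interval * 1000)):
    sock = fd_to_sock[fd]
    if event & select.POLLOUT:
        function1()                                  # was wList
    if event & select.POLLIN:
        function2()                                  # was rList
    if event & (select.POLLERR | select.POLLHUP):
        function3()                                  # was xList

select.epoll() works the same way (register(), then poll()), it just scales better with very large numbers of descriptors; note that epoll's timeout argument is in seconds, not milliseconds.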
I'm trying to use the ZeroMQ Poller() functionality with two sockets in python:
import zmq
# Prepare our context and sockets
context = zmq.Context()
receiver = context.socket(zmq.DEALER)
receiver.connect("ipc:///tmp/interface-transducer")
subscriber = context.socket(zmq.SUB)
subscriber.bind("ipc:///tmp/fast-service")
subscriber.setsockopt(zmq.SUBSCRIBE, b"10001")
# Initialize poll set
poller = zmq.Poller()
poller.register(receiver, zmq.POLLIN)
poller.register(subscriber, zmq.POLLIN)
# Process messages from both sockets
while True:
    try:
        socks = dict(poller.poll())
    except KeyboardInterrupt:
        break

    if receiver in socks:
        message = receiver.recv()
        print("RECEIVER OK\n")

    if subscriber in socks:
        message = subscriber.recv()
        print("SUBSCRIBER OK\n")
And then the server that sends messages as a ROUTER is described as:
import time
import zmq

def main():
    context = zmq.Context()
    router = context.socket(zmq.ROUTER)
    router.bind("ipc:///tmp/interface-transducer")
    while True:
        identity = b'electrode-service'
        b_identity = identity
        router.send_multipart([b_identity, b'[1,2]'])
        print("Sent")
        time.sleep(1)

if __name__ == "__main__":
    main()
But when I run these two processes, it does not work as expected: the poller script does not print anything. What could be the problem with such an implementation?
Q : "What could be the problem of such implementation?"
such implementation is prone to deadlock & fails due to using exclusively the blocking-forms of .poll() & .recv() methods
such implementation is not self-defending enough in cases, where multiple peers get connected into AccessPoints, that implement round-robin incoming/outgoing traffic mappings
such implementation is awfully wrong in standing self-blinded in calling just a single .recv() in cases, where the .send_multipart() is strikingly warning, there will be multi-part message-handling needed
ipc:// Transport Class is prone to hide O/S related user-level code restrictions ( placed by the operating system on the format and length of a pathname and effective user-rights to R/W/X there )
ipc:// Transport Class .connect()-method's use is order-dependent for cases the target-address has not yet been created by O/S services ( a successful .bind() needs to happen first )
last but not least, any next attempt to .bind() onto the same ipc:// Transport Class target will silently destroy your intended ROUTER-access to the messaging/signalling-plane infrastructure & your implementation has spent zero-efforts to self-protect and self-diagnose errors that might silently appear "behind the curtains"
Shouldn't ZeroMQ deal with deadlocks automatically? I tried using the mspoller example from the ZeroMQ guide. If I can't use .poll() and .recv() simultaneously, how should I use the ZMQ Poller structure? – hao123
No, the ZeroMQ zen-of-zero is focused on performance and low latency, so kindly consider all due care for preventing blocking to be in your own hands (as needed and where needed; the core library will never take a single step more than is required to achieve almost linearly scalable performance).
No, use both the .poll() and .recv() methods freely, but complete the design so it fits a non-blocking fashion: .poll( 0 ), plus active detection and handling of multi-part messages (again, best in a non-blocking fashion, using the zmq.NOBLOCK flag where appropriate). Self-blocking code gets out of control.
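As a rough, hedged sketch of that non-blocking shape (reusing the receiver and subscriber sockets from the question; the 10 ms timeout is just an illustrative choice, 0 would be a pure non-blocking check):

import zmq

poller = zmq.Poller()
poller.register(receiver, zmq.POLLIN)
poller.register(subscriber, zmq.POLLIN)

while True:
    # Short, finite timeout in milliseconds instead of blocking forever.
    socks = dict(poller.poll(10))

    if socks.get(receiver, 0) & zmq.POLLIN:
        # ROUTER -> DEALER traffic arrives as a multi-part message.
        frames = receiver.recv_multipart(flags=zmq.NOBLOCK)
        print("RECEIVER OK", frames)

    if socks.get(subscriber, 0) & zmq.POLLIN:
        frames = subscriber.recv_multipart(flags=zmq.NOBLOCK)
        print("SUBSCRIBER OK", frames)

    # ...do any other useful, non-blocking work here...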
I'm implementing a socket-client which opens several sockets at the same time. Any socket may have data at a different time and I want to execute code when any socket has data and is readable.
I'm not sure how to implement this. I was looking at select.select, but it seems to wait for all the sockets to be readable.
I'd like to avoid using multiprocessing to handle the data on the sockets; I'd like the reading to stay serial, reading from each socket only when it has data available.
How do I wait for any socket to be readable?
# pseudo code
sockets = [sock1, sock2, sock3]
while True:
    if len(sockets) == 0:
        break
    for sock in list(sockets):          # iterate over a copy so we can remove safely
        if sock.has_data():
            do_stuff(sock)
            sockets.remove(sock)
    sleep(0.1)
You can use select.select for your problem:
sockets = [sock1, sock2, sock3]
while sockets:
    rlist, _, _ = select.select(sockets, [], [])
    for sock in rlist:
        do_stuff(sock)
        sockets.remove(sock)
If you are on POSIX, take a look at select.poll:
import socket
import select
p = select.poll()
s1 = socket.socket()
s2 = socket.socket()
# call connect on sockets here...
p.register(s1, select.POLLIN)
p.register(s2, select.POLLIN)
p.poll()
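p.poll() blocks until at least one registered descriptor is ready and returns (fd, eventmask) pairs rather than socket objects, so in practice you keep a small fd-to-socket map. A minimal sketch of that loop, building on the snippet above (do_stuff is the hypothetical handler from the question):

fd_to_sock = {s1.fileno(): s1, s2.fileno(): s2}

while fd_to_sock:
    for fd, event in p.poll():                  # optionally pass a timeout in milliseconds
        sock = fd_to_sock[fd]
        if event & (select.POLLERR | select.POLLHUP):
            # Error or hang-up: stop watching this socket.
            p.unregister(sock)
            del fd_to_sock[fd]
            sock.close()
        elif event & select.POLLIN:
            do_stuff(sock)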
If you're using Python 3.4 or newer, there is the selectors module in the standard library. It will use the "best" I/O multiplexing implementation that your system offers (select, poll, kqueue, ...). There's a simple echo server example at the end of the documentation page: https://docs.python.org/3/library/selectors.html
There's a backport of this for older Python versions available as well.
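For the "wait until any socket is readable" case from the question, a minimal selectors sketch (assuming sock1/sock2/sock3 are already connected and do_stuff is the handler from the pseudocode above) might look like this:

import selectors

sel = selectors.DefaultSelector()       # picks epoll/kqueue/poll/select for you

for sock in (sock1, sock2, sock3):
    sock.setblocking(False)
    sel.register(sock, selectors.EVENT_READ)

while sel.get_map():                    # loop until every socket has been handled
    for key, events in sel.select():
        sock = key.fileobj
        do_stuff(sock)
        sel.unregister(sock)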
I want to read from a socket when data is available and in the same thread I want to read items from a message queue like this:
while True:
    ready = select.select([some_socket, some_messagequeue], [], [])[0]
    if some_socket in ready:
        read_and_handle_data_from_socket()
    if some_messagequeue in ready:
        read_and_handle_data_from_messagequeue()
In other words: I want to abort select() as soon as the thread receives a message via some process-internal messaging system.
From what I have read so far, there are two approaches: selecting on the message queue itself, or creating an os.pipe() just to abort the select(), but I haven't found a nice implementation yet.
Approach 1: There seem to be two Queue implementations: multiprocessing.Queue and queue.Queue (Python 3). While multiprocessing.Queue has a _reader member that can be used with select(), only queue.Queue allows arbitrary data structures to be queued without having to mess with pickling.
Question: Is there a way to use select() on a queue.Queue as well?
Approach 2 would look like this:

import os, queue, select, threading, time

r, w = os.pipe()
some_socket = 67  # FD to some other socket
q = queue.Queue()

def read_fd():
    while True:
        ready = select.select([r, some_socket], [], [])[0]
        if r in ready:
            os.read(r, 100)
            print('handle task: ', q.get())
        if some_socket in ready:
            print('socket has data')

threading.Thread(target=read_fd, daemon=True).start()

while True:
    q.put('some task')
    os.write(w, b'x')
    print('scheduled task')
    time.sleep(1)
And this works - but in my eyes this code is quite cumbersome and not very pythonic. Question: is there a nicer way to just send 'signals' through an os.pipe (or any other implementation)?
Approach 3..N: Question: how would you solve this?
I know libraries like ZeroMQ, but since I'm working on an embedded project I'd prefer a solution that comes with the native Python (3.3) distribution. And I think there should be a solution as short as the first example - after all, I just want to abort the select() if something happens on the message queue.
Approach 3: have two threads. Thread 1 waits on select(); thread 2 waits on the message queue. A mutex keeps them from firing simultaneously. Why have threads if you aren't going to use them?
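A rough sketch of that two-thread layout, under the question's setup (some_socket from above; handle_socket_data and handle_task are hypothetical handlers; the lock only keeps the two handlers from running at the same time):

import queue
import select
import threading

q = queue.Queue()
handler_lock = threading.Lock()

def socket_worker(sock):
    while True:
        ready, _, _ = select.select([sock], [], [])
        if ready:
            with handler_lock:
                handle_socket_data(sock)    # hypothetical handler

def queue_worker():
    while True:
        task = q.get()                      # blocks until a task is queued
        with handler_lock:
            handle_task(task)               # hypothetical handler

threading.Thread(target=socket_worker, args=(some_socket,), daemon=True).start()
threading.Thread(target=queue_worker, daemon=True).start()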
You can create a pipe pair of file descriptors, signal queue push via writing to the write end of it, and wait for queue activity in the same select on the read end of the pipe.
On Linux specifically, there is also the eventfd(2) system call, which can be used for the same purpose instead of pipe(2) (this might be useful).
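On Python 3.10+ (Linux only) that eventfd route looks roughly like the sketch below; on Python 3.3, which the question targets, the os.pipe() version above is the way to go.

import os
import queue
import select

efd = os.eventfd(0)                     # Linux-only, Python 3.10+
q = queue.Queue()

def schedule(task):
    q.put(task)
    os.eventfd_write(efd, 1)            # bumps the counter and wakes the select() below

def wait_loop(some_socket):
    while True:
        ready = select.select([efd, some_socket], [], [])[0]
        if efd in ready:
            os.eventfd_read(efd)        # reset the counter
            while not q.empty():
                print('handle task:', q.get())
        if some_socket in ready:
            print('socket has data')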
I'm designing a Python program that'll talk to two other processes at the same time through sockets. One of the processes is a C daemon, so that socket will be alive all the time - no problem there. The other process is a PHP web page, so that socket isn't established all the time. Most of the time, that socket is listen()ing on a port.
If both sockets were alive all the time, a simple select() call could be used to monitor input from both. But in my situation this is not possible. How can I achieve this easily?
Thanks,
You can use select() in this case, even in a single-threaded single-process program with only blocking sockets. Here's how you would accept incoming connections with select():
daemonSocket = socket.socket()
...
phpListenSocket = socket.socket()
phpListenSocket.bind(...)
phpListenSocket.listen(...)
phpSocket = None

while True:
    rlist = ...
    rready, wready, eready = select(rlist, [], [])
    if phpListenSocket in rready:
        phpSocket, remoteAddr = phpListenSocket.accept()
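Fleshing that out a little, a hedged sketch of the full loop might look like the following (it assumes from select import select as in the snippet above, that daemonSocket is already connected, and that handle_daemon/handle_php are placeholder handlers):

while True:
    rlist = [daemonSocket, phpListenSocket]
    if phpSocket is not None:
        rlist.append(phpSocket)

    rready, wready, eready = select(rlist, [], [])

    if phpListenSocket in rready:
        phpSocket, remoteAddr = phpListenSocket.accept()

    if daemonSocket in rready:
        handle_daemon(daemonSocket.recv(4096))      # placeholder handler

    if phpSocket is not None and phpSocket in rready:
        data = phpSocket.recv(4096)
        if data:
            handle_php(data)                        # placeholder handler
        else:                                       # PHP side closed the connection
            phpSocket.close()
            phpSocket = None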
Here is source code for a multithreaded server and client in Python.
In the code, the client and server close the connection after the job is finished.
I want to keep the connections alive and send more data over the same connections, to avoid the overhead of closing and opening sockets every time.
The following code is from: http://www.devshed.com/c/a/Python/Basic-Threading-in-Python/1/
import pickle
import socket
import threading

# We'll pickle a list of numbers:
someList = [ 1, 2, 7, 9, 0 ]
pickledList = pickle.dumps ( someList )

# Our thread class:
class ClientThread ( threading.Thread ):

    # Override Thread's __init__ method to accept the parameters needed:
    def __init__ ( self, channel, details ):
        self.channel = channel
        self.details = details
        threading.Thread.__init__ ( self )

    def run ( self ):
        print 'Received connection:', self.details [ 0 ]
        self.channel.send ( pickledList )
        for x in xrange ( 10 ):
            print self.channel.recv ( 1024 )
        self.channel.close()
        print 'Closed connection:', self.details [ 0 ]

# Set up the server:
server = socket.socket ( socket.AF_INET, socket.SOCK_STREAM )
server.bind ( ( '', 2727 ) )
server.listen ( 5 )

# Have the server serve "forever":
while True:
    channel, details = server.accept()
    ClientThread ( channel, details ).start()
import pickle
import socket
import threading

# Here's our thread:
class ConnectionThread ( threading.Thread ):

    def run ( self ):
        # Connect to the server:
        client = socket.socket ( socket.AF_INET, socket.SOCK_STREAM )
        client.connect ( ( 'localhost', 2727 ) )

        # Retrieve and unpickle the list object:
        print pickle.loads ( client.recv ( 1024 ) )

        # Send some messages:
        for x in xrange ( 10 ):
            client.send ( 'Hey. ' + str ( x ) + '\n' )

        # Close the connection
        client.close()

# Let's spawn a few threads:
for x in xrange ( 5 ):
    ConnectionThread().start()
Spawning a new thread for every connection is a really bad design choice.
What happens if you get hit by a lot of connections?
In fact, using threads to wait on network IO is not worth it. Your program gets really complex and you get absolutely no benefit, since waiting for the network in threads won't make the data arrive any faster. You only lose by using threads in this case.
The following text is from python documentation:
There are only two ways to have a program on a single processor do "more than one thing at a time." Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It's really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however.
And if it is a processor-bound server, you can always hand the processor-heavy part off to another process or thread. Continuing:
If your operating system supports the select system call in its I/O library (and nearly all do), then you can use it to juggle multiple communication channels at once; doing other work while your I/O is taking place in the "background." Although this strategy can seem strange and complex, especially at first, it is in many ways easier to understand and control than multi-threaded programming.
So instead of using threads, use non-blocking input/output: collect the sockets in a list and use an event loop with select.select to know which socket has data to read. Do that in a single thread.
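A minimal sketch of that idea, adapted to the pickled-list example above (Python 3 syntax, unlike the quoted tutorial; a single thread, and connections stay open until the client closes them):

import select
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('', 2727))
server.listen(5)

sockets = [server]

while True:
    rlist, _, _ = select.select(sockets, [], [])
    for sock in rlist:
        if sock is server:
            channel, details = server.accept()
            sockets.append(channel)             # keep the connection around
        else:
            data = sock.recv(1024)
            if data:
                print('received:', data)        # handle the request / send a reply here
            else:                               # empty read: the client disconnected
                sockets.remove(sock)
                sock.close()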
You could choose a Python asynchronous networking framework like Twisted to do that for you. That will save you a lot of headaches. Twisted's code has been improved for years and covers some corner cases you'll take time to master.
EDIT: Any existing async IO library (like Twisted) is Python code. You could have written it yourself, but it has already been written for you. I don't see why you wouldn't use one of those libraries instead of writing your own, probably worse, code, since you are a beginner. Networking IO is hard to get right.
I'm not sure I understand the question, but don't call close() if you don't want to close the connection...
For an example of a client that keeps a TCP connection open and uses a familiar protocol, look at the source of the telnetlib module. (Sorry, someone else will have to answer your threading questions.)
An example of a server that keeps a TCP connection open is in the source for the SocketServer module (any standard Python installation includes the source).
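For reference, a hedged sketch of such a handler that keeps each TCP connection open until the client closes it (Python 3 socketserver; the module is named SocketServer on Python 2, and the echo reply is just a stand-in for real work):

import socketserver

class KeepAliveHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # handle() is called once per connection; looping here keeps the socket open.
        while True:
            data = self.request.recv(1024)
            if not data:                        # client closed the connection
                break
            self.request.sendall(data)          # echo back as a stand-in for real work

if __name__ == '__main__':
    with socketserver.ThreadingTCPServer(('', 2727), KeepAliveHandler) as srv:
        srv.serve_forever()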