How to kill a socket in unit tests for reconnect test - python

I'm trying to test some code that reconnects to a server after a disconnect. This works perfectly fine outside the tests, but it fails to acknowledge that the socket has disconnected when running the tests.
I'm using a gevent StreamServer to mock a real listening server:
import socket  # needed for socket.SHUT_RDWR in murder()

import gevent
import gevent.server
from gevent import queue


class TestServer(gevent.server.StreamServer):

    def __init__(self, *args, **kwargs):
        super(TestServer, self).__init__(*args, **kwargs)
        self.sockets = {}

    def handle(self, sock, address):
        self.sockets[address] = (sock, queue.Queue())
        sock.sendall('testing the connection\r\n')
        gevent.spawn(self.recv, address)

    def recv(self, address):
        sock = self.sockets[address][0]
        data_queue = self.sockets[address][1]
        print 'Connection accepted %s:%d' % address
        try:
            for data in sock.recv(1024):
                data_queue.put(data)
        except:
            pass

    def murder(self):
        self.stop()
        for sock in self.sockets.iteritems():
            print sock
            sock[1][0].shutdown(socket.SHUT_RDWR)
            sock[1][0].close()
        self.sockets = {}


def run_server():
    test_server = TestServer(('127.0.0.1', 10666))
    test_server.start()
    return test_server
And my test looks like this:
def test_can_reconnect(self):
    test_server = run_server()
    client_config = {'host': '127.0.0.1', 'port': 10666}
    client = Connection('test client', client_config, get_config())
    client.connect()
    assert client.socket_connected

    test_server.murder()
    #time.sleep(4)  # tried sleeping. no dice.
    assert not client.socket_connected
    assert client.server_disconnect

    test_server = run_server()
    client.reconnect()
    assert client.socket_connected
It fails at assert not client.socket_connected.
I check for "not data" during recv; if the read comes back empty (or None), I set some variables so that other code can decide whether or not to reconnect (don't reconnect if it was a user_disconnect, and so on). This behavior works and has always worked for me in the past; I've just never tried to write a test for it until now. Is there something odd with socket connections and local function scopes or something? It's like the connection still exists even after stopping the server.
The code I'm trying to test is open: https://github.com/kyleterry/tenyks.git
If you run the tests, you will see the one I'm trying to fix fail.
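For reference, the disconnect-detection pattern I'm describing looks roughly like this (a simplified sketch, not the exact code from the repo; handle_data is a stand-in for whatever processes the received bytes):

def recv_loop(self):
    while True:
        data = self.socket.recv(1024)
        if not data:
            # empty read: the server closed the connection
            self.socket_connected = False
            if not self.user_disconnect:
                self.server_disconnect = True
            break
        self.handle_data(data)  # hypothetical helper for the received bytes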

Trying to run a unit test with a real socket is a tough row to hoe. It will be tricky because only one set of tests can run at a time (the server port will be in use), and it will be slow because the sockets have to be set up and torn down. To top it off, if this is really a unit test you don't want to test the socket, just the code that uses the socket.
If you mock the socket calls you can throw exceptions at will from the mocked code and ensure that the code using the socket does the right thing. You don't need a real socket to verify that the class under test behaves correctly; you can fake it if you wrap the socket calls in an object. Pass a reference to that socket object when constructing your class and you're ready to go.
My suggestion is to wrap the socket calls in a class that supports sendall, recv, and the other methods you call on the socket. Then you can swap the real socket class for a TestReconnectSocket (or whatever) and run your tests.
Take a look at mox, a Python mocking framework.
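For example, a hand-rolled fake along those lines might look something like this (a sketch only: the class name, the canned data, and how it gets injected into the code under test are all assumptions, and a library like mox or mock could replace the hand-written class):

class TestReconnectSocket(object):
    """Hypothetical fake for the socket wrapper described above."""

    def __init__(self, script):
        # `script` is a list of canned recv() results; an empty string
        # simulates the server closing the connection.
        self._script = iter(script)
        self.sent = []

    def sendall(self, data):
        self.sent.append(data)

    def recv(self, bufsize):
        return next(self._script)

    def shutdown(self, how):
        pass

    def close(self):
        pass

# In a test, hand the fake to the class under test instead of a real socket,
# e.g. fake = TestReconnectSocket(['testing the connection\r\n', '']), and
# pass `fake` wherever the class under test expects its socket wrapper.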

Vague response, but my immediate reaction would be that your recv() call is blocking and keeping the socket alive - have you tried making the socket non-blocking, and catching the error on close instead?

One thing to keep in mind when testing sockets like this is that operating systems don't like to reopen a socket soon after it has been in use. You can set a socket option to tell the OS to go ahead and reuse it anyway. Right after you create the socket, set this option:
mysocket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
Hopefully this will fix your issue. You may have to do it on both the server and client side, depending on which one is giving you problems.
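For a plain listening socket, that means placing the call between socket() and bind(), roughly like this (illustrative only; the address and backlog are placeholders):

import socket

server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # before bind()
server_sock.bind(('127.0.0.1', 10666))
server_sock.listen(5)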

You are calling shutdown(socket.SHUT_RDWR), so this doesn't look like a problem with recv blocking.
However, you are using gevent.socket.socket.recv, so check your gevent version: versions earlier than 0.13.0 have an issue where recv() blocks if the underlying file descriptor is closed.
You may still need gevent.sleep() to yield cooperatively and give the client an opportunity to exit the recv() call.
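In the test itself that could be as simple as yielding to the hub after killing the server; a sketch against the test from the question (gevent.sleep(0) just forces a greenlet switch, bump it up if the client needs more time):

import gevent

test_server.murder()
gevent.sleep(0)  # cooperative yield so the client greenlet can notice the closed socket
assert not client.socket_connected
assert client.server_disconnect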

Related

connection to two RabbitMQ servers

I'm using Python with pika, and I have the following two similar use cases:
Connect to RabbitMQ server A and server B (at different IP addrs with different credentials), listen on exchange A1 on server A; when a message arrives, process it and send to an exchange on server B
Open an HTTP listener and connect to RabbitMQ server B; when a specific HTTP request arrives, process it and send to an exchange on server B
Alas, in both these cases, using my usual techniques, by the time I get to sending to server B the connection throws ConnectionClosed or ChannelClosed.
I assume this is the cause: while waiting on the incoming messages, the connection to server B (its "driver") is starved of CPU cycles and never gets a chance to service its connection socket, so it can't respond to heartbeats from server B, and server B shuts down the connection.
But I can't noodle out the fix. My current workaround is lame: I catch the ConnectionClosed, reopen a connection to server B, and retry sending my message.
But what is the "right" way to do this? I've considered these, but don't really feel I have all the parts to solve this:
Don't just sit forever in server A's basic_consume (my usual pattern), but rather use a timeout, and when I catch the timeout somehow "service" heartbeats on server B's driver before returning to a "consume with timeout"... but how do I do that? How do I "let server B's connection driver service its heartbeats"?
I know the socket library's select() call can wait for messages on several sockets at once and then service the socket that has packets waiting. So maybe this is what pika's SelectConnection is for? a) I'm not sure, this is just a hunch. b) Even if that's right, while I can find examples of how to create this connection, I can't find examples of how to use it to solve my multi-connection case.
Set up the two server connections in different processes and use Python interprocess queues to get the processed message from one process to the next. The concept is "two different RabbitMQ connections in two different processes should be able to independently service their heartbeats". Except... I think this has a fatal flaw: the process with "server B" is instead going to be "stuck" waiting on the interprocess queue, and the same "starvation" is going to happen.
I've checked StackOverflow and Googled this for an hour last night: I can't for the life of me find a blog post or sample code for this.
Any input? Thanks a million!
I managed to work it out, basing my solution on the documentation and an answer in the pika-python Google group.
First of all, your assumption is correct: the client process that's connected to server B and responsible for publishing cannot reply to heartbeats if it's already blocking on something else, like waiting for a message from server A or blocking on an internal communication queue.
The crux of the solution is that the publisher should run as a separate thread and use BlockingConnection.process_data_events to service heartbeats and such. It looks like that method is supposed to be called in a loop that checks if the publisher still needs to run:
def run(self):
    while self.is_running:
        # Block at most 1 second before returning and re-checking
        self.connection.process_data_events(time_limit=1)
Proof of concept
Since proving the full solution requires having two separate RabbitMQ instances running, I have put together a Git repo with an appropriate docker-compose.yml, the application code and comments to test this solution.
https://github.com/karls/rabbitmq-two-connections
Solution outline
Below is a sketch of the solution, minus imports and such. Some notable things:
Publisher runs as a separate thread
The only "work" that the publisher does is servicing heartbeats and such, via Connection.process_data_events
The publisher registers a callback whenever the consumer wants to publish a message, using Connection.add_callback_threadsafe
The consumer takes the publisher as a constructor argument so it can publish the messages it receives, but it can work via any other mechanism as long as you have a reference to an instance of Publisher
The code is taken from the linked Git repo, which is why certain details are hardcoded, e.g. the queue name. It will work with any RabbitMQ setup (direct-to-queue, topic exchange, fanout, etc.).
class Publisher(threading.Thread):
    def __init__(
        self,
        connection_params: ConnectionParameters,
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.daemon = True
        self.is_running = True
        self.name = "Publisher"
        self.queue = "downstream_queue"
        self.connection = BlockingConnection(connection_params)
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=self.queue, auto_delete=True)
        self.channel.confirm_delivery()

    def run(self):
        while self.is_running:
            self.connection.process_data_events(time_limit=1)

    def _publish(self, message):
        logger.info("Calling '_publish'")
        self.channel.basic_publish("", self.queue, body=message.encode())

    def publish(self, message):
        logger.info("Calling 'publish'")
        self.connection.add_callback_threadsafe(lambda: self._publish(message))

    def stop(self):
        logger.info("Stopping...")
        self.is_running = False
        # Call .process_data_events one more time to block
        # and allow the while-loop in .run() to break.
        # Otherwise the connection might be closed too early.
        self.connection.process_data_events(time_limit=1)
        if self.connection.is_open:
            self.connection.close()
            logger.info("Connection closed")
        logger.info("Stopped")


class Consumer:
    def __init__(
        self,
        connection_params: ConnectionParameters,
        publisher: Optional["Publisher"] = None,
    ):
        self.publisher = publisher
        self.queue = "upstream_queue"
        self.connection = BlockingConnection(connection_params)
        self.channel = self.connection.channel()
        self.channel.queue_declare(queue=self.queue, auto_delete=True)
        self.channel.basic_qos(prefetch_count=1)

    def start(self):
        self.channel.basic_consume(
            queue=self.queue, on_message_callback=self.on_message
        )
        try:
            self.channel.start_consuming()
        except KeyboardInterrupt:
            logger.info("Warm shutdown requested...")
        except Exception:
            traceback.print_exception(*sys.exc_info())
        finally:
            self.stop()

    def on_message(self, _channel: Channel, m, _properties, body):
        try:
            message = body.decode()
            logger.info(f"Got: {message!r}")
            if self.publisher:
                self.publisher.publish(message)
            else:
                logger.info(f"No publisher provided, printing message: {message!r}")
            self.channel.basic_ack(delivery_tag=m.delivery_tag)
        except Exception:
            traceback.print_exception(*sys.exc_info())
            self.channel.basic_nack(delivery_tag=m.delivery_tag, requeue=False)

    def stop(self):
        logger.info("Stopping consuming...")
        if self.connection.is_open:
            logger.info("Closing connection...")
            self.connection.close()
        if self.publisher:
            self.publisher.stop()
        logger.info("Stopped")
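Wiring the two classes together might look roughly like this (a hypothetical sketch; the host names are placeholders and the linked repo contains the real entry point):

from pika import ConnectionParameters

upstream_params = ConnectionParameters(host="rabbitmq-a")
downstream_params = ConnectionParameters(host="rabbitmq-b")

publisher = Publisher(downstream_params)
publisher.start()  # Publisher.run() now services heartbeats in its own thread

consumer = Consumer(upstream_params, publisher=publisher)
consumer.start()   # blocks in start_consuming(); Consumer.stop() also stops the publisher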

python3.5: asyncio, How to wait for "transport.write(data)" to finish or to return an error?

I'm writing a TCP client in Python 3.5 using asyncio.
After reading How to detect write failure in asyncio?, which talks about the high-level streams API, I've tried to implement this using the low-level protocol API.
import asyncio


class _ClientProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport


class Client:
    def __init__(self, loop=None):
        self.protocol = _ClientProtocol()
        if loop is None:
            loop = asyncio.get_event_loop()
        self.loop = loop
        loop.run_until_complete(self._connect())

    async def _connect(self):
        await self.loop.create_connection(
            lambda: self.protocol,
            '127.0.0.1',
            8080,
        )
        # based on https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/#bug-3-closing-time
        self.protocol.transport.set_write_buffer_limits(0)

    def write(self, data):
        self.protocol.transport.write(data)

    def wait_all_data_have_been_written_or_throw(self):
        pass


client = Client()
client.write(b"some bytes")
client.wait_all_data_have_been_written_or_throw()
As per the Python documentation, I know write is non-blocking, and I would like wait_all_data_have_been_written_or_throw to tell me whether all the data has been written, or whether something bad happened in the middle (like a lost connection, but I assume there are more things that can go wrong, and that the underlying socket already raises exceptions for them?).
Does the standard library provide a way to do this?
The question is mainly about TCP socket functionality, not the asyncio implementation itself.
Let's look on the following code:
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((host, port))
s.send(b'data')
A successful send() call only means the data was transferred into the kernel-space buffer for the socket, nothing more.
The data has not been sent over the wire, has not been received by the peer and, obviously, has not been processed by the receiver.
The actual sending is performed asynchronously by the operating system kernel; user code has no control over it.
That's why wait_all_data_have_been_written_or_throw() doesn't make much sense: a write that completes without an error doesn't mean the peer received the data, only that the data was moved from the user-space buffer to the kernel-space one.
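The closest thing the standard library offers is write-buffer flow control rather than delivery confirmation. With the high-level streams API, await writer.drain() waits until the transport's buffer has drained below its high-water mark and raises if the connection has already failed; a rough sketch (host, port and data are placeholders):

import asyncio

async def send(host, port, data):
    reader, writer = await asyncio.open_connection(host, port)
    try:
        writer.write(data)
        # drain() only confirms the data left asyncio's buffer, not that the
        # peer received or processed it; it raises (e.g. ConnectionResetError)
        # if the connection is already broken.
        await writer.drain()
    finally:
        writer.close()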

How to set a connection timeout in Tornado?

In my Tornado app, some clients disconnect from the server in certain situations, but my current code doesn't detect that a client has disconnected. I currently use ping/pong to find out whether a client is disconnected.
Here is my ping/pong code:
from threading import Timer

from tornado import websocket
from tornado.websocket import WebSocketClosedError


class SocketHandler(websocket.WebSocketHandler):

    def __init__(self, application, request, **kwargs):
        # some code here
        Timer(5.0, self.do_ping).start()

    def do_ping(self):
        try:
            self.ping_counter += 1
            self.ping("")
            if self.ping_counter > 2:
                self.close()
            Timer(60, self.do_ping).start()
        except WebSocketClosedError:
            pass

    def on_pong(self, data):
        self.ping_counter = 0
Now I want to set SO_RCVTIMEO in Tornado instead of using the ping/pong method,
something like this:
sock.setsockopt(socket.SO_RCVTIMEO)
Is it possible to set SO_RCVTIMEO in Tornado to close client connections after a specific timeout?
SO_RCVTIMEO does not do anything in an asynchronous framework like Tornado. You probably want to wrap your reads in tornado.gen.with_timeout. You'll still need to use pings to test the connection and make sure it is still working; if the connection is idle there are few guarantees about how long it will take for the system to notice. (TCP keepalives are a possibility, but these are not configurable on all platforms and generally use very long timeouts).
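For example, with a client-side connection obtained via tornado.websocket.websocket_connect (a sketch of the general technique only; the names and the timeout value are placeholders):

from datetime import timedelta

from tornado import gen

@gen.coroutine
def read_or_close(conn, seconds=30):
    # conn is assumed to come from tornado.websocket.websocket_connect().
    try:
        msg = yield gen.with_timeout(timedelta(seconds=seconds),
                                     conn.read_message())
        raise gen.Return(msg)
    except gen.TimeoutError:
        # Nothing arrived within the deadline: treat the peer as gone.
        conn.close()
        raise gen.Return(None)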

Twisted - How can I tell the reactor to dispose a Protocol object after using adoptStreamConnection in a subprocess?

I'm trying to pass a TCP connection to a Twisted subprocess with adoptStreamConnection, but I can't figure out how to get the Protocol disposed in the main process after doing that.
My desired flow looks like this:
Finish writing any data the Protocol transport has waiting
When we know the write buffer is empty send the AMP message to transfer the socket to the subprocess
Dispose the Protocol instance in the main process
I tried doing nothing, loseConnection, abortConnection, and monkey patching _closeSocket out and using loseConnection. See the code here:
import weakref

from twisted.internet import reactor
from twisted.internet.endpoints import TCP4ServerEndpoint
from twisted.python.sendmsg import getsockfam
from twisted.internet.protocol import Factory, Protocol
import twisted.internet.abstract


class EchoProtocol(Protocol):
    def dataReceived(self, data):
        self.transport.write(data)


class EchoFactory(Factory):
    protocol = EchoProtocol


class TransferProtocol(Protocol):
    def dataReceived(self, data):
        self.transport.write('main process still listening!: %s' % (data))

    def connectionMade(self):
        self.transport.write('this message should make it to the subprocess\n')

        # attempt 1: do nothing
        # everything works fine in the adopt (including receiving the written
        # message), but the old protocol still exists (though isn't doing anything)

        # attempt 2: try calling loseConnection
        # we lose the connection before the adopt opens the socket
        # (presumably a TCP disconnect message was sent)
        #
        # self.transport.loseConnection()

        # attempt 3: try calling abortConnection
        # result is the same as loseConnection
        #
        # self.transport.abortConnection()

        # attempt 4: try monkey patching the socket close out and calling loseConnection
        # result: same as doing nothing -- adopt works (including receiving the
        # written message), old protocol still exists
        #
        # def ignored(*args, **kwargs):
        #     print 'ignored :D'
        #
        # self.transport._closeSocket = ignored
        # self.transport.loseConnection()

        reactor.callLater(0, adopt, self.transport.fileno())


class ServerFactory(Factory):
    def buildProtocol(self, addr):
        p = TransferProtocol()
        self.ref = weakref.ref(p)
        return p


f = ServerFactory()


def adopt(fileno):
    print "does old protocol still exist?: %r" % (f.ref())
    reactor.adoptStreamConnection(fileno, getsockfam(fileno), EchoFactory())


port = 1337
endpoint = TCP4ServerEndpoint(reactor, port)
d = endpoint.listen(f)
reactor.run()
In all cases the Protocol object still exists in the main process after the socket has been transferred. How can I clean this up?
Thanks in advance.
Neither loseConnection nor abortConnection tell the reactor to "forget" about a connection; they close the connection, which is very different; they tell the peer that the connection has gone away.
You want to call self.transport.stopReading() and self.transport.stopWriting() to remove the references to it from the reactor.
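Applied to the example above, that could look something like this (a sketch under the question's setup, not tested):

class TransferProtocol(Protocol):
    def connectionMade(self):
        self.transport.write('this message should make it to the subprocess\n')
        fd = self.transport.fileno()
        # Detach the descriptor from the reactor without sending a FIN/RST
        # to the peer, then hand it to the adopting factory.
        self.transport.stopReading()
        self.transport.stopWriting()
        reactor.callLater(0, adopt, fd)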
Also, it's not valid to use a weakref to test for the remaining existence of an object unless you call gc.collect() first.
As far as making sure that all the data has been sent: the only reliable way to do that is to have an application-level acknowledgement of the data that you've sent. This is why protocols that need a handshake that involves changing protocols - say, for example, STARTTLS - have a specific handshake where the initiator says "I'm going to switch" (and then stops sending), then the peer says "OK, you can switch now". Another way to handle that in this case would be to hand the data you'd like to write to the subprocess via some other channel, instead of passing it to transport.write.

How to properly make unit tests cleanup a socket

I've been working with some sockets lately, and while writing some unit test cases with a listening socket I repeatedly get error: [Errno 98] Address already in use.
This is some example code that shows the error.
import unittest
import socket


class TestUnit(unittest.TestCase):

    def setUp(self):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.bind((socket.gethostname(), 10000))
        self.socket.listen(10)
        self.addCleanup(self.clean)

    def test_nothing(self):
        self.assertEqual(False, False)

    def test_something(self):
        self.assertEqual(True, True)

    def clean(self):
        self.socket.close()
It seems to occur when one of the tests throws an exception. Without an exception it works as expected. But that kind of makes the tests useless, since all tests after the first one that throws an exception fail with the same error.
self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
should help
Basically, a closed socket is not immediately freed by the network stack, so if you try to bind the same address again right away (even in the simple case where you have a single listening socket and you close it and restart the application), you will see the same error. SO_REUSEADDR allows binding the address again.
However, if the old socket is still in the TIME_WAIT state and you try to reuse the exact same source/destination pair, it can still fail.
You should also read the man page for this socket option to understand its limitations.
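Concretely, in the setUp from the question that could look like this (an untested sketch):

import socket
import unittest


class TestUnit(unittest.TestCase):
    def setUp(self):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Allow re-binding the port even if the previous test run left the
        # old socket in TIME_WAIT.
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.bind((socket.gethostname(), 10000))
        self.socket.listen(10)
        self.addCleanup(self.socket.close)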
SO_REUSEADDR on SO
