Writing a blocking wrapper around twisted's IRC client - python

I'm trying to write a dead-simple interface for an IRC library, like so:
import re
import simpleirc
connection = simpleirc.Connect('irc.freenode.net', 6667)
channel = connection.join('foo')
find_command = re.compile(r'google ([a-z]+)').findall
for msg in channel:
    for t in find_command(msg):
        channel.say("http://google.com/search?q=%s" % t)
Working from their example, I'm running into trouble (code is a bit lengthy, so I pasted it here). Since the call to channel.__next__ needs to return when the callback <IRCClient instance>.privmsg is called, there doesn't seem to be a clean option. Using exceptions or threads seems like the wrong thing here. Is there a simpler (blocking?) way of using Twisted that would make this possible?

In general, if you're trying to use Twisted in a "blocking" way, you're going to run into a lot of difficulties, because that's neither the way it's intended to be used, nor the way in which most people use it.
Going with the flow is generally a lot easier, and in this case, that means embracing callbacks. The callback-style solution to your question would look something like this:
import re
from twisted.internet import reactor, protocol
from twisted.words.protocols import irc
find_command = re.compile(r'google ([a-z]+)').findall
class Googler(irc.IRCClient):
    def privmsg(self, user, channel, message):
        for text in find_command(message):
            self.say(channel, "http://google.com/search?q=%s" % (text,))
def connect():
    cc = protocol.ClientCreator(reactor, Googler)
    return cc.connectTCP(host, port)
def run(proto):
    proto.join(channel)
def main():
    d = connect()
    d.addCallback(run)
    reactor.run()
This isn't absolutely required (but I strongly suggest you consider trying it). One alternative is inlineCallbacks:
import re
from twisted.internet import reactor, protocol, defer
from twisted.words.protocols import irc
find_command = re.compile(r'google ([a-z]+)').findall
class Googler(irc.IRCClient):
    def privmsg(self, user, channel, message):
        for text in find_command(message):
            self.say(channel, "http://google.com/search?q=%s" % (text,))
@defer.inlineCallbacks
def run():
    cc = protocol.ClientCreator(reactor, Googler)
    proto = yield cc.connectTCP(host, port)
    proto.join(channel)
def main():
    run()
    reactor.run()
Notice no more addCallbacks. It's been replaced by yield in a decorated generator function. This could get even closer to what you asked for if you had a version of Googler with a different API (the one above should work with IRCClient from Twisted as it is written - though I didn't test it). It would be entirely possible for Googler.join to return a Channel object of some sort, and for that Channel object to be iterable like this:
@defer.inlineCallbacks
def run():
    cc = protocol.ClientCreator(reactor, Googler)
    proto = yield cc.connectTCP(host, port)
    channel = proto.join(channel)
    for msg in channel:
        msg = yield msg
        for text in find_command(msg):
            channel.say("http://google.com/search?q=%s" % (text,))
It's only a matter of implementing this API on top of the ones already present. Of course, the yield expressions are still there, and I don't know how much this will upset you. ;)
It's possible to go still further away from callbacks and make the context switches necessary for asynchronous operation to work completely invisible. This is bad for the same reason it would be bad for sidewalks outside your house to be littered with invisible bear traps. However, it's possible. Using something like corotwine, itself based on a third-party coroutine library for CPython, you can have the implementation of Channel do the context switching itself, rather than requiring the calling application code to do it. The result might look something like:
from corotwine import protocol
def run():
    proto = Googler()
    transport = protocol.gConnectTCP(host, port)
    proto.makeConnection(transport)
    channel = proto.join(channel)
    for msg in channel:
        for text in find_command(msg):
            channel.say("http://google.com/search?q=%s" % (text,))
with an implementation of Channel that might look something like:
from corotwine import defer
class Channel(object):
    def __init__(self, ircClient, name):
        self.ircClient = ircClient
        self.name = name
    def __iter__(self):
        while True:
            d = self.ircClient.getNextMessage(self.name)
            message = defer.blockOn(d)
            yield message
This in turn depends on a new Googler method, getNextMessage, which is a straightforward feature addition based on existing IRCClient callbacks:
from twisted.internet import defer
class Googler(irc.IRCClient):
    def connectionMade(self):
        irc.IRCClient.connectionMade(self)
        self._nextMessages = {}
    def getNextMessage(self, channel):
        if channel not in self._nextMessages:
            self._nextMessages[channel] = defer.DeferredQueue()
        return self._nextMessages[channel].get()
    def privmsg(self, user, channel, message):
        if channel not in self._nextMessages:
            self._nextMessages[channel] = defer.DeferredQueue()
        self._nextMessages[channel].put(message)
To run this, you create a new greenlet for the run function and switch to it, and then start the reactor.
from greenlet import greenlet
def main():
    greenlet(run).switch()
    reactor.run()
When run gets to its first asynchronous operation, it switches back to the reactor greenlet (which is the "main" greenlet in this case, but it doesn't really matter) to let the asynchronous operation complete. When it completes, corotwine turns the callback into a greenlet switch back into run. So run is granted the illusion of running straight through, like a "normal" synchronous program. Keep in mind that it is just an illusion, though.
So, it's possible to get as far away from the callback-oriented style that is most commonly used with Twisted as you want. It's not necessarily a good idea, though.
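For comparison, here is a minimal sketch of the same queue-per-channel idea using the standard library's asyncio rather than Twisted. It is not part of Twisted or corotwine: the Channel class, its privmsg and say methods, and the None "closed" sentinel are all hypothetical stand-ins, and the network layer is replaced by direct calls. The suspension points stay visible (async for), much like the yield expressions in the inlineCallbacks version.

```python
import asyncio
import re

find_command = re.compile(r'google ([a-z]+)').findall


class Channel:
    """Hypothetical channel: an async iterator fed by a message callback."""

    def __init__(self, name):
        self.name = name
        self._queue = asyncio.Queue()
        self.sent = []  # stands in for lines written to the network

    def privmsg(self, message):
        # Would be called by the protocol layer when a message arrives.
        self._queue.put_nowait(message)

    def say(self, text):
        self.sent.append(text)

    def __aiter__(self):
        return self

    async def __anext__(self):
        message = await self._queue.get()
        if message is None:  # assumed sentinel meaning "channel closed"
            raise StopAsyncIteration
        return message


async def run(channel):
    # Reads like a blocking loop, but yields to the event loop on each step.
    async for msg in channel:
        for text in find_command(msg):
            channel.say("http://google.com/search?q=%s" % (text,))


async def demo():
    channel = Channel('foo')
    task = asyncio.create_task(run(channel))
    channel.privmsg('google twisted')
    channel.privmsg('nothing to see')
    channel.privmsg(None)
    await task
    return channel.sent
```

Calling asyncio.run(demo()) drives the loop until the sentinel arrives and returns the list of replies the channel would have sent.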

Related

Subscribers receive messages slowly

I have a pyzmq Publisher which sends around 1000 messages per second. I am trying to start around 10 Subscribers in an asyncio event loop.
It works, but about 2.5 times slower than a single Subscriber running on its own.
What could possibly be wrong with the code?
import asyncio
import zmq
import json
from zmq.backend.cython.constants import NOBLOCK
from zmq.asyncio import Context, Poller
from loop_ import Loop
class Client:
    REQUEST_TIMEOUT = 35000
    SERVER_ENDPOINT = "tcp://localhost:6666"
    def __init__(self, id_):
        self.id = id_
    def get_task(self):
        return asyncio.create_task(self.client_coroutine())
    async def client_coroutine(self):
        context = Context.instance()
        socket = context.socket(zmq.SUB)
        socket.connect(self.SERVER_ENDPOINT)
        socket.setsockopt(zmq.SUBSCRIBE, b'4')
        poller = Poller()
        poller.register(socket, zmq.POLLIN)
        while True:
            event = dict(await poller.poll(self.REQUEST_TIMEOUT))
            if event.get(socket) == zmq.POLLIN:
                reply = await socket.recv_multipart(flags=NOBLOCK)
                if not reply:
                    break
                else:
                    print(eval(json.loads(reply[1].decode('utf-8'))))
            else:
                print("No response from server, retrying...")
                socket.setsockopt(zmq.LINGER, 0)
                socket.close()
                poller.unregister(socket)
async def tasks():
    _tasks = [Client(id_).get_task() for id_ in range(10)]
    done, pending = await asyncio.wait(_tasks, return_when=asyncio.FIRST_EXCEPTION)
loop = asyncio.get_event_loop()
loop.run_until_complete(tasks())
Q : What could possibly be wrong with the code?
Given that the code uses the same localhost (as seen from the address), suspect number one is the workload itself: with 10x more work to process, the subscribers will always put more stress on the localhost's O/S and CPU, won't they?
Next comes the choice of the transport class. Given that all the SUBs are co-located on the same localhost as the PUB, all the L3-stack-based TCP/IP protocol work goes to waste. To compare the relative costs (the add-on effect of using the tcp:// transport class for this single-host messaging), run the very same test with the inproc:// transport class (which requires the PUB and the SUBs to share one Context inside a single process), where none of the TCP/IP-stack processing takes place.
Last, but not least, my code will never mix different event loops (I have been using ZeroMQ since v2.11, so someone may consider me a bit old-fashioned for not relying on the async-decorated capabilities available in recent py3.6+).
My code uses an explicit, non-blocking, zero-wait test for the presence of a message on each socket instance, as in aSocketINSTANCE.poll( 0, zmq.POLLIN ) (pyzmq takes the timeout first, then the event flags), rather than any "externally" added decoration, which may report the same thing, but via some additional event handling that is expensive and outside my code's domain of control. All real-time, low-latency use cases strive for the lowest possible latency and overhead, so in my projects explicit control will always win over any "modern" syntax-sugar-sweetened tricks.
Anyway, enjoy the Zen-of-Zero

Mixing Synchronous and A-sync code in Python

I'm trying to convert a synchronous, callback-based flow in Python code to an asynchronous flow using asyncio.
Basically the code interacts a lot with TCP/UNIX sockets. It reads data from the sockets, manipulates it to make decisions and writes stuff back to the other side. This is going on over multiple sockets at once and data is shared between the contexts to make decisions sometimes.
EDIT :: The code currently is mostly based on registering a callback to a central entity for a specific socket, and having that entity run the callback when the relevant socket is readable (something like "call this function when that socket has data to be read"). Once the callback is called - a bunch of stuff happens, and eventually a new callback is registered for when new data is available. The central entity runs a select over all sockets registered to figure out which callbacks should be called.
I'm trying to do this without refactoring my entire code and making this as seamless as possible to the programmer - so I was trying to think about it like so - all code should run the same way as it does today - but whenever the current code does a socket.recv() to get new data - the process would yield execution to other tasks. When the read returns, it should go back to handling the data from the same point using the new data it got.
To do this, I wrote a new class called AsyncSocket - which interacts with the IO streams of asyncIO and placed the Async/await statements almost solely in there - thinking that I would implement the recv method in my class to make it look like a "regular IO socket" to the rest of my code.
So far - this is my understanding of what A-sync programming should allow.
Now to the problem :
My code waits for clients to connect. When one does, each client's context is allowed to read from and write to its own connection.
I've simplified the flow to the following to clarify the problem:
import asyncio
class AsyncSocket():
    def __init__(self,reader,writer):
        self.reader = reader
        self.writer = writer
    def recv(self,numBytes):
        print("called recv!")
        data = self.read_mitigator(numBytes)
        return data
    async def read_mitigator(self,numBytes):
        print("Awaiting of AsyncSocket.reader.read")
        data = await self.reader.read(numBytes)
        print("Done Awaiting of AsyncSocket.reader.read data is %s " % data)
        return data
def mit2(aSock):
    return mit3(aSock)
def mit3(aSock):
    return aSock.recv(100)
async def echo_server(reader, writer):
    print ("New Connection!")
    aSock = AsyncSocket(reader,writer) # create a new A-sync socket class and pass it on to the regular code
    while True:
        data = await some_func(aSock) # this would eventually read from the socket
        print ("Data read is %s" % (data))
        if not data:
            break
        writer.write(data) # echo everything back
async def main(host, port):
    server = await asyncio.start_server(echo_server, host, port)
    await server.serve_forever()
asyncio.run(main('127.0.0.1', 5000))
mit2() and mit3() are synchronous functions that do stuff with the data on the way back before returning to the main client's loop - but here I'm just using them as empty functions.
The problem starts when I play with the implementation of some_func().
A pass through implementation (edit: kind-of-works) - but still has issues :
def some_func(aSock):
    try:
        return (mit2(aSock)) # works
    except:
        print("Error!!!!")
While an implementation which reads the data and does something with it - like adding a suffix before returning, throws an error:
def some_func(aSock):
    try:
        return (mit2(aSock) + "something") # doesn't work
    except:
        print("Error!!!!")
The error (as far as I understand it) means it's not really doing what it should:
New Connection!
called recv!
/Users/user/scripts/asyncServer.py:36: RuntimeWarning: coroutine 'AsyncSocket.read_mitigator' was never awaited
return (mit2(aSock) + "something") # doesn't work
RuntimeWarning: Enable tracemalloc to get the object allocation traceback
Error!!!!
Data read is None
And the echo server obviously doesn't work.
Obviously my code looks more like option #2 with a lot more stuff in some_func(),mit2() and mit3() - but I can't get this to work. I'm fairly new in using asyncio/async/await - so what (rather basic concept I guess) am I missing?
This code won't work as envisioned:
def recv(self,numBytes):
    print("called recv!")
    data = self.read_mitigator(numBytes)
    return data
async def read_mitigator(self,numBytes):
    ...
You cannot call an async function from a sync function and get the result. Calling it only creates a coroutine object; you must await it, which ensures that you return to the event loop in case the data is not yet ready. This mismatch between async and sync code is sometimes referred to as the issue of function color.
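To make the difference concrete, here is a minimal sketch. The names read_data, sync_caller, and async_caller are hypothetical, and asyncio.sleep(0) merely stands in for real socket I/O. A plain call hands back a coroutine object, which is why adding a suffix to it fails in the question; only an await inside another coroutine produces the bytes.

```python
import asyncio


async def read_data():
    await asyncio.sleep(0)  # stands in for real socket I/O
    return b"payload"


def sync_caller():
    # A plain call does not run the body; it only builds a coroutine object.
    coro = read_data()
    is_coroutine = asyncio.iscoroutine(coro)
    coro.close()  # discard it cleanly to avoid the "never awaited" warning
    return is_coroutine


async def async_caller():
    # Awaiting suspends this coroutine until the data is actually available.
    data = await read_data()
    return data + b"-something"
```

Here sync_caller() only ever sees the coroutine object, while asyncio.run(async_caller()) returns the processed bytes.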
Since your code is already using non-blocking sockets and an event loop, a good approach to porting it to asyncio might be to first switch to the asyncio event loop. You can use event loop methods like sock_recv to request data:
import asyncio
def start():
    loop = asyncio.get_event_loop()
    sock = make_socket() # make sure it's non-blocking
    future_data = loop.sock_recv(sock, 1024)
    future_data.add_done_callback(continue_read)
    # return to the event loop - when some data is ready
    # continue_read will be invoked
def continue_read(future):
    data = future.result()
    print('got', data)
    # ... do something with data, e.g. process it
    # and call sock_sendall with the response
# pass the function itself; call_soon(start()) would call it immediately
asyncio.get_event_loop().call_soon(start)
asyncio.get_event_loop().run_forever()
Once you have the program working in that mode, you can start moving to coroutines, which allow the code to look like sync code, but work in exactly the same way:
async def start():
    loop = asyncio.get_event_loop()
    sock = make_socket() # make sure it's non-blocking
    data = await loop.sock_recv(sock, 1024)
    # data is available "immediately", meaning the coroutine gets
    # automatically suspended when awaiting data that is not yet
    # ready, and automatically re-scheduled when the data is ready
    print('got', data)
asyncio.run(start())
The next step can be eliminating make_socket and switching to asyncio streams.
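As a rough sketch of where that ends up, the echo server from the question could look like this on asyncio streams, assuming every function on the path to the read is allowed to become a coroutine. The names mit2, mit3, and some_func are kept from the question; demo and the ephemeral port 0 are additions made only so the example can exercise itself.

```python
import asyncio


async def mit3(reader):
    return await reader.read(100)


async def mit2(reader):
    # Every layer between the caller and the read must itself be awaited.
    return await mit3(reader)


async def some_func(reader):
    data = await mit2(reader)
    # Real bytes are available here, so adding a suffix now works.
    return data + b"something" if data else data


async def echo_server(reader, writer):
    while True:
        data = await some_func(reader)
        if not data:  # empty bytes means the peer closed the connection
            break
        writer.write(data)
        await writer.drain()
    writer.close()


async def demo():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(echo_server, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    expected = b"hello" + b"something"
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.readexactly(len(expected))
    writer.close()
    await writer.wait_closed()
    server.close()
    return reply
```

The cost is the one described above: the async "color" spreads up through mit3, mit2, and some_func, so this is a refactoring of the call chain rather than a drop-in replacement for recv.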

Correct use of coroutine in Tornado web server

I'm trying to convert a simple synchronous server to an asynchronous version. The server receives POST requests and retrieves the response from an external web service (Amazon SQS). Here's the synchronous code:
def post(self):
    zoom_level = self.get_argument('zoom_level')
    neLat = self.get_argument('neLat')
    neLon = self.get_argument('neLon')
    swLat = self.get_argument('swLat')
    swLon = self.get_argument('swLon')
    data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
    self._send_parking_spots_request(data)
    #....other stuff
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    self._sqs_send_queue.write(msg)
Reading Tornado documentation and some threads here I ended with this code using coroutines:
def post(self):
    zoom_level = self.get_argument('zoom_level')
    neLat = self.get_argument('neLat')
    neLon = self.get_argument('neLon')
    swLat = self.get_argument('swLat')
    swLon = self.get_argument('swLon')
    data = self._create_request_message(zoom_level, neLat, neLon, swLat, swLon)
    self._send_parking_spots_request(data)
    self.finish()
@gen.coroutine
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    yield gen.Task(write_msg, self._sqs_send_queue, msg)
def write_msg(queue, msg, callback=None):
    queue.write(msg)
Comparing the performance using siege, I find that the second version is even worse than the original one, so there's probably something about coroutines and Tornado asynchronous programming that I didn't understand at all.
Could you please help me with this?
Edit: self._sqs_send_queue is a queue object retrieved from the boto interface, and queue.write(msg) returns the message that has been written to the queue.
Tornado relies on you converting all your I/O to be non-blocking. Simply sticking the same code you were using before inside of a gen.Task will not improve performance at all, because the I/O itself is still going to block the event loop. Additionally, you need to make your post method a coroutine, and call _send_parking_spots_request using yield for the code to behave properly. So, a "correct" solution would look something like this:
@gen.coroutine
def post(self):
    ...
    yield self._send_parking_spots_request(data) # wait (without blocking the event loop) until the method is done
    self.finish()
@gen.coroutine
def _send_parking_spots_request(self, data):
    msg = Message()
    msg.set_body(json.dumps(data))
    yield gen.Task(write_msg, self._sqs_send_queue, msg)
def write_msg(queue, msg, callback=None):
    # No yield here: a yield would turn write_msg into a generator that gen.Task never runs.
    queue.write(msg, callback=callback) # This has to do non-blocking I/O.
In this example, queue.write would need to be some API that sends your request using non-blocking I/O, and executes callback when a response is received. Without knowing exactly what queue in your original example is, I can't specify exactly how that can be implemented in your case.
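The effect of a blocking call on an event loop can be observed directly. The sketch below uses the standard library's asyncio rather than Tornado, and every name in it (blocking_write, heartbeat, measure) is hypothetical; time.sleep stands in for a synchronous queue.write. It also shows a different workaround from the one described above: handing the blocking call to a thread pool with run_in_executor, which keeps the loop responsive without making the I/O itself non-blocking.

```python
import asyncio
import time


def blocking_write(msg):
    time.sleep(0.2)  # stands in for a synchronous network call
    return msg


async def heartbeat(ticks):
    # Records a tick every 10 ms, but only while the event loop is free.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def measure(offload):
    """Return how many heartbeat ticks occurred during one write."""
    loop = asyncio.get_running_loop()
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    await asyncio.sleep(0)  # let the heartbeat record its first tick
    before = len(ticks)
    if offload:
        # The call runs in a worker thread; the loop keeps servicing tasks.
        await loop.run_in_executor(None, blocking_write, "msg")
    else:
        # The call runs on the loop's own thread; nothing else can run.
        blocking_write("msg")
    during = len(ticks) - before
    beat.cancel()
    return during
```

With offload=False the heartbeat records no ticks while the write is in progress, which is the situation of a synchronous queue.write wrapped in a coroutine. With offload=True the ticks continue.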
Edit: Assuming you're using boto, you may want to check out bototornado, which implements the exact same API I described above:
def write(self, message, callback=None):
    """
    Add a single message to the queue.
    :type message: Message
    :param message: The message to be written to the queue
    :rtype: :class:`boto.sqs.message.Message`
    :return: The :class:`boto.sqs.message.Message` object that was written.

Writing a synchronous test suite for an async tornado web socket server

I am trying to design a test suite for my tornado web socket server.
I am using a client to do this - connect to a server through a websocket, send a request and expect a certain response.
I am using python's unittest to run my tests, so I cannot (and do not want to really) enforce the sequence in which the tests are running.
This is how my base test class (after which all test cases inherit) is organized. (The logging and certain parts, irrelevant here are stripped).
class BaseTest(tornado.testing.AsyncTestCase):
    ws_delay = .05
    @classmethod
    def setUpClass(cls):
        cls.setup_connection()
        return
    @classmethod
    def setup_connection(cls):
        # start websocket threads
        t1 = threading.Thread(target=cls.start_web_socket_handler)
        t1.start()
        # websocket opening delay
        time.sleep(cls.ws_delay)
        # this method initiates the tornado.ioloop, sets up the connection
        cls.websocket.connect('localhost', 3333)
        return
    @classmethod
    def start_web_socket_handler(cls):
        # starts tornado.websocket.WebSocketHandler
        cls.websocket = WebSocketHandler()
        cls.websocket.start()
The scheme I came up with is to have this base class which inits the connection once for all tests (although this does not have to be the case - I am happy to set up and tear down the connection for each test case if it solves my problems). What is important that I do not want to have multiple connections open at the same time.
The simple test case looks like that.
class ATest(BaseTest):
    @classmethod
    def setUpClass(cls):
        super(ATest, cls).setUpClass()
    @classmethod
    def tearDownClass(cls):
        super(ATest, cls).tearDownClass()
    def test_a(self):
        saved_stdout = sys.stdout
        try:
            out = StringIO()
            sys.stdout = out
            message_sent = self.websocket.write_message(
                str({'opcode': 'a_message'})
            )
            output = out.getvalue().strip()
            # the code below is useless
            while (output is None or not len(output)):
                self.log.debug("%s waiting for response." % str(inspect.stack()[0][3]))
                output = out.getvalue().strip()
            self.assertIn(
                'a_response', output,
                "Server didn't send source not a_response. Instead sent: %s" % output
            )
        finally:
            sys.stdout = saved_stdout
It works fine most of the time, yet it is not fully deterministic (and therefore reliable). Since the websocket communication is performed async, and the unittest executes test synchronously, the server responses (which are received on the same websocket) get mixed up with the requests and the tests fail occasionally.
I know it should be callback based, but this won't solve the response mixing issue. Unless, all the tests are artifically sequenced in a series of callbacks (as in start test_2 inside a test_1_callback).
Tornado offers a testing library to help with synchronous testing, but I cannot seem to get it working with websockets (the tornado.ioloop has its own thread which you cannot block).
I cannot find a python websocket synchronous client library which would work with tornado server and be RFC 6455 compliant. Pypi's websocket-client fails to meet the second demand.
My questions are:
Is there a reliable python synchronous websocket client library that meets the demands described above?
If not, what is the best way to organize a test suite like this (the tests cannot really be run in parallel)?
As far as I understand, since we're working with one websocket, the IOStreams for test cases cannot be separated, and therefore there is no way of determining to which request the response is coming (I have multiple tests for requests of the same type with different parameters). Am I wrong in this ?
Have you looked at the websocket unit tests included with tornado? They show you how you can do this:
from tornado.testing import AsyncHTTPTestCase, gen_test
from tornado.web import Application
from tornado.websocket import WebSocketHandler, websocket_connect
class MyHandler(WebSocketHandler):
    """ This is the server code you're testing."""
    def on_message(self, message):
        # Put whatever response you want in here.
        self.write_message("a_response\n")
class WebSocketTest(AsyncHTTPTestCase):
    def get_app(self):
        return Application([
            ('/', MyHandler),
        ])
    @gen_test
    def test_a(self):
        # Tornado versions before 5.0 also accepted io_loop=self.io_loop here.
        ws = yield websocket_connect(
            'ws://localhost:%d/' % self.get_http_port())
        ws.write_message(str({'opcode': 'a_message'}))
        response = yield ws.read_message()
        self.assertIn(
            'a_response', response,
            "Server didn't send source not a_response. Instead sent: %s" % response
        )
The gen_test decorator allows you to run asynchronous testcases as coroutines, which, when run inside tornado's ioloop, effectively makes them behave synchronously for testing purposes.

Wait for specific server response command code in IRC

I've made an IRC bot in Python, and for a while now I've been trying to figure out a way to wait for an IRC command and return the message to a calling function. I refuse to use an external library for various reasons, including that I'm trying to learn to make these things from scratch. Also, I've been sifting through documentation for existing ones and they're way too comprehensive. I'm trying to make a simple one.
For example:
def who(bot, nick):
    bot.send('WHO %s' % nick)
    response = ResponseWaiter('352') # 352 - RPL_WHOREPLY
    return response.msg
This would return an object of my Message class, which parses IRC messages, to the calling function:
def check_host(bot, nick, host):
    who_info = who(bot, nick)
    if who_info.host == host:
        return True
    return False
I have looked at the reactor pattern, observer pattern, and have tried implementing a hundred different event system designs for this to no avail. I'm completely lost.
Please either provide a solution or point me in the right direction. There's got to be a simple way to do this.
So what I've done is grab messages from my generator (a bot method) inside the bot's who method. The generator looks like this:
def msg_generator(self):
    ''' Provides messages until bot dies '''
    while self.alive:
        for msg in self.irc.recv(self.buffer).split(('\r\n').encode()):
            if len(msg) > 3:
                try:
                    yield Message(msg.decode())
                except Exception as e:
                    self.log('%s %s\n' % (except_str, str(e)))
And now the bot's who method looks like this:
def who(self, nick):
    self.send('WHO %s' % nick)
    for msg in self.msg_generator():
        if msg.command == '352':
            return msg
However, it's now taking control of the messages, so I need some way of relinquishing the messages I'm not using for the who method to their appropriate handlers.
My bot generally handles all messages with this:
def handle(self):
    for msg in self.msg_generator():
        self.log('◀ %s' % (msg))
        SpaghettiHandler(self, msg)
So any message that my SpaghettiHandler would be handling is not handled while the bot's who method uses the generator to receive messages.
It's working.. and works fast enough that it's hard to lose a message. But if my bot were to be taking many commands at the same time, this could become a problem. I'm pretty sure I'll find a solution in this direction, but I didn't create this as the answer because I'm not sure it's a good way, even when I have it set to relinquish messages that don't pertain to the listener.
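One way to sketch the "relinquish" idea, assuming a single shared message iterator, is a small helper that consumes messages until the wanted command code arrives and hands everything else to the normal handler on the way. The names below are hypothetical: Message is a namedtuple standing in for the real parser class, and handle_other plays the role of SpaghettiHandler.

```python
from collections import namedtuple

# Hypothetical stand-in for the bot's real Message class.
Message = namedtuple('Message', 'command params')


def wait_for(messages, command, handle_other):
    """Return the first message with the given command code.

    Every message seen before it is passed to handle_other, so the
    regular handlers are not starved while the caller waits.
    """
    for msg in messages:
        if msg.command == command:
            return msg
        handle_other(msg)
    return None  # the stream ended without a matching reply


# Usage: a PRIVMSG arrives before the WHO reply and is still handled.
handled = []
stream = iter([
    Message('PRIVMSG', 'hi'),
    Message('352', 'whoinfo'),
    Message('PING', 'x'),
])
reply = wait_for(stream, '352', handled.append)
```

This keeps the blocking style of who while still dispatching unrelated traffic. It does not address several callers waiting at the same time, which is the case the post worries about.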
