python gRPC client disconnect while server streaming response

Good day,
This is my first time posting so forgive me if I do something wrong with the post.
I am trying to get a subscription type service running, which works fine up until the client disconnects. Depending on the timing, this works fine or blocks indefinitely.
def Subscribe(self, request, context):
    words = ["Please", "help", "me", "solve", "my", "problem", "!"]
    while context.is_active():
        try:
            for word in words:
                event = fr_pb2.Event(word=word)
                if not context.is_active():
                    break
                yield event
                print (event)
        except Exception as ex:
            print(ex)
            context.cancel()
    print("Subscribe ended")
I am new to gRPC, so there are possibly a few things I'm doing wrong, but my main issue is that if the client disconnects just before or while the yield occurs, the code hangs indefinitely. I've tried a few things to get out of this situation, but they only work sometimes. A timeout set on the client side does count down, but the yield doesn't end when the countdown hits 0. The callback happens fine, but context.cancel and context.abort do not seem to help here either.
Is there anything I can do to prevent the yield from hanging, or to set a timeout of some sort so that the yield eventually ends? Any help/advice is greatly appreciated.

If anyone else comes across this issue, there isn't really a problem here. I erroneously thought that this was blocking, since none of the code I put in to print progress was printing.
In actuality, when the client disconnects and the server tries to yield, an exception is thrown... it just isn't a general "Exception", "SystemExit", or "SystemError". I'm not exactly sure what the exception type is for this, but the code does exit properly if you do whatever cleanup you need in a "finally".

You can catch it with the GeneratorExit exception:
def Subscribe(self, request, context):
    try:
        words = ["Please", "help", "me", "solve", "my", "problem", "!"]
        while context.is_active():
            for word in words:
                event = fr_pb2.Event(word=word)
                if not context.is_active():
                    break
                yield event
                print (event)
    except GeneratorExit:
        print("Client disconnected before function finished!")
    finally:
        print("Subscribe ended")
When the client disconnects in the middle of a streaming gRPC call, GeneratorExit is raised on the server side at the yield. A plain "except Exception" cannot catch it, because GeneratorExit derives from BaseException rather than Exception.
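The behaviour can be reproduced without gRPC at all. The following is a minimal sketch using a plain generator; `stream_words` and `log` are hypothetical names, and `gen.close()` stands in for whatever the server does when the stream is abandoned. Whenever a suspended generator is closed or finalized, Python raises GeneratorExit at the yield where it was paused.

```python
def stream_words(words, log):
    """Stand-in for a server-streaming handler; `log` records what ran."""
    try:
        for word in words:
            yield word
    except Exception:
        # Not reached on close(): GeneratorExit is not an Exception subclass.
        log.append("caught Exception")
    except GeneratorExit:
        log.append("caught GeneratorExit")
        raise  # re-raise so the generator finishes closing normally
    finally:
        log.append("cleanup")


log = []
gen = stream_words(["Please", "help"], log)
next(gen)    # advance to the first yield, as if one message had been sent
gen.close()  # the consumer abandons the stream
print(log)
print(issubclass(GeneratorExit, Exception))
```

The "finally" block runs in every case, which is why cleanup placed there works even when the exception type is not known.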

Related

How to restart a coroutine after a websocket stream stops receiving data?

I'm writing an asyncio application to monitor prices of crypto markets and trade/order events, but for an unknown reason some streams stop receiving data after a few hours. I'm not familiar with the asyncio package and I would appreciate help in finding a solution.
Basically, the code below establishes websocket connections with a crypto exchange to listen to streams for three symbols (ETH/USD, BTC/USD, BNB/USD) on two wallets (spot, future), plus trade and order events from two accounts (user1, user2). The application uses the ccxtpro library. The public method watch_ohlcv gets price streams, while the private methods watchMyTrades and watchOrders get new order and trade events at account level.
The problem is that one or several streams are interrupted after a few hours, and the response object comes back empty or None. I would like to detect and restart these streams after they stop working. How can I do that?
# tasks.py
@app.task(bind=True, name='Start websocket loops')
def start_ws_loops(self):
    ws_loops()
# methods.py
def ws_loops():
    async def method_loop(client, exid, wallet, method, private, args):
        exchange = Exchange.objects.get(exid=exid)
        if private:
            account = args['account']
        else:
            symbol = args['symbol']
        while True:
            try:
                if private:
                    response = await getattr(client, method)()
                    if method == 'watchMyTrades':
                        do_stuff(response)
                    elif method == 'watchOrders':
                        do_stuff(response)
                else:
                    response = await getattr(client, method)(**args)
                    if method == 'watch_ohlcv':
                        do_stuff(response)
                # await asyncio.sleep(3)
            except Exception as e:
                print(str(e))
                break
        await client.close()
    async def clients_loop(loop, dic):
        exid = dic['exid']
        wallet = dic['wallet']
        method = dic['method']
        private = dic['private']
        args = dic['args']
        exchange = Exchange.objects.get(exid=exid)
        parameters = {'enableRateLimit': True, 'asyncio_loop': loop, 'newUpdates': True}
        if private:
            log.info('Initialize private instance')
            account = args['account']
            client = exchange.get_ccxt_client_pro(parameters, wallet=wallet, account=account)
        else:
            log.info('Initialize public instance')
            client = exchange.get_ccxt_client_pro(parameters, wallet=wallet)
        mloop = method_loop(client, exid, wallet, method, private, args)
        await gather(mloop)
        await client.close()
    async def main(loop):
        lst = []
        private = ['watchMyTrades', 'watchOrders']
        public = ['watch_ohlcv']
        for exid in ['binance']:
            for wallet in ['spot', 'future']:
                # Private
                for method in private:
                    for account in ['user1', 'user2']:
                        lst.append(dict(exid=exid,
                                        wallet=wallet,
                                        method=method,
                                        private=True,
                                        args=dict(account=account)
                                        ))
                # Public
                for method in public:
                    for symbol in ['ETH/USD', 'BTC/USD', 'BNB/USD']:
                        lst.append(dict(exid=exid,
                                        wallet=wallet,
                                        method=method,
                                        private=False,
                                        args=dict(symbol=symbol,
                                                  timeframe='5m',
                                                  limit=1
                                                  )
                                        ))
        loops = [clients_loop(loop, dic) for dic in lst]
        await gather(*loops)
    loop = asyncio.new_event_loop()
    loop.run_until_complete(main(loop))

Let me share my experience, since I am dealing with the same problem.
In theory, CCXT streams are not expected to stall after running for some time.
Unfortunately, practice and theory differ, and error 1006 happens quite often. I am using Binance, OKX, Bitmex and BTSE (BTSE is not supported by CCXT), and my code runs on an AWS server, so I should not have any connection issues. Binance and OKX are the worst as far as error 1006 is concerned. Honestly, after researching it on Google, all I have understood is that 1006 is a NetworkError, and I know CCXT tries to resubscribe to the channel automatically. None of the other explanations I found online convinced me. If somebody could give me more info about this error, I would appreciate it.
In any case, every time an exception is raised, I put it in an exception_list as a dictionary containing info such as the time in milliseconds, method, exchange, description, etc. The exception_list is then passed to a handle_exception method. If the list contains two 1006 exceptions within X time, handle_exception reports that we are out of sync with market data and trading must stop. I cancel all my limit orders and emit a beep (calling for human intervention).
As for your second question:
restart these streams after they stops working, how can I do that
Remember that you are Running Tasks Concurrently:
If return_exceptions is False (default), the first raised exception is
immediately propagated to the task that awaits on gather(). Other
awaitables in the aws sequence won’t be cancelled and will continue to
run.
Here you can find info about restarting an individual task in a gather().
In your case, since you are using a single exchange (Binance) and unsubscribe is not implemented in CCXT, you will have to close the connection and restart all the tasks. You can still use the example in the link above to automate it. If you are using more than one exchange, you can design your code so that you close and restart only the exchange that failed.
Another option would be to define the tasks with more granularity in main, so that every task is related to a single, well-defined exchange/user/method/symbol and every task subscribes to a single channel. This results in more verbose and less elegant code, but it helps you catch the exception and restart only one specific coroutine.
I am obviously assuming that after error 1006 the channel status is unsubscribed.
Final thought:
never leave a robot unattended.
Professional market makers with a team of engineers working in London do not go to the pub while their algos (usually co-located within the exchange) execute thousands of trades.
I hope this helps you or, at least, points you in the right direction for handling exceptions and restarting tasks.
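The "one task per stream, restart only the one that failed" idea can be sketched with plain asyncio. This is an illustration under stated assumptions, not CCXT code: `supervise`, `flaky_stream` and `steady_stream` are hypothetical names, and the streams are simulated. A real stream coroutine would also need to close and recreate its exchange client before retrying.

```python
import asyncio


async def supervise(name, coro_factory, *, max_restarts=3, delay=0.0):
    """Run coro_factory() and start it again whenever it raises."""
    restarts = 0
    while True:
        try:
            return await coro_factory()
        except asyncio.CancelledError:
            raise  # never swallow cancellation
        except Exception as exc:
            if restarts >= max_restarts:
                raise  # give up and let gather() see the failure
            restarts += 1
            print(f"{name} failed ({exc!r}); restart {restarts}/{max_restarts}")
            await asyncio.sleep(delay)


attempts = {"flaky": 0}


async def flaky_stream():
    # Simulated stream that drops twice before completing.
    attempts["flaky"] += 1
    if attempts["flaky"] < 3:
        raise ConnectionError("1006")
    return "done"


async def steady_stream():
    await asyncio.sleep(0)
    return "ok"


async def main():
    # Each stream gets its own supervisor, so one failure restarts one stream.
    return await asyncio.gather(
        supervise("flaky", flaky_stream),
        supervise("steady", steady_stream),
    )


results = asyncio.run(main())
print(results)
```

Because each supervisor handles its own exceptions, gather() only sees a failure once a stream has exhausted its restarts, which is a natural place to stop trading and call for a human.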

You need to use callbacks.
For example:
ws = self.ws = await websockets.connect(END_POINTS, compression=None) # step 1
await self.ws.send(SEND_YOUR_SUBSCRIPTION_MESSAGES) # step 2
while True:
    response = await self.ws.recv()
    if response:
        await handler(response)
In the last line, await handler(response), you are sending the response to handler().
This handler() is the callback: it is the function that actually consumes the data you receive from the exchange server.
In handler(), you check whether the response is the data you want (bid/ask price, etc.) or whether an exception such as ConnectionClosedError was thrown, in which case you restart the websocket by repeating STEP 1 and STEP 2 from within your handler.
So basically, in the callback method you either process the data
or restart the websocket and pass the handler to it again to receive the responses.
Hope this helps. I could not share the complete code, as I would need to clean out sensitive business logic first.
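A variation on the same idea is to do the reconnect in the receive loop rather than inside the handler, and to treat a silent stream as a failure by wrapping recv() in asyncio.wait_for. The sketch below is hedged: `FakeSocket`, `listen`, `stall_timeout` and `max_reconnects` are hypothetical names, and the socket is simulated so the example runs on its own. With the real websockets library you would catch websockets.exceptions.ConnectionClosed instead of ConnectionError.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for a websocket: yields messages, then drops."""

    def __init__(self, messages):
        self._messages = list(messages)
        self.closed = False

    async def send(self, payload):
        pass  # a real socket would transmit the subscription here

    async def recv(self):
        if self._messages:
            return self._messages.pop(0)
        raise ConnectionError("connection closed")

    async def close(self):
        self.closed = True


async def listen(connect, subscription, handler, *, stall_timeout=5.0, max_reconnects=1):
    reconnects = 0
    while True:
        ws = await connect()         # step 1: open the connection
        await ws.send(subscription)  # step 2: subscribe
        try:
            while True:
                # No message within stall_timeout seconds counts as a failure.
                response = await asyncio.wait_for(ws.recv(), timeout=stall_timeout)
                if response:
                    await handler(response)
        except (ConnectionError, asyncio.TimeoutError):
            reconnects += 1
            if reconnects > max_reconnects:
                raise
        finally:
            await ws.close()


all_sockets = [FakeSocket(["a", "b"]), FakeSocket(["c"])]
pool = iter(all_sockets)
received = []


async def connect():
    return next(pool)


async def handler(response):
    received.append(response)


async def main():
    try:
        await listen(connect, "subscribe", handler)
    except ConnectionError:
        print("gave up after receiving", received)


asyncio.run(main())
```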

What's the difference between websocket and flask-streaming in terms of use cases?

I'm developing a browser/server (B/S) Kafka monitoring tool. The program listens to a Kafka topic and continuously outputs new messages from that topic. What is the best approach for continuously sending those messages to the browser?
The program uses Flask, so currently I'm using stream_with_context to send new messages to the browser. This works for now, but I wonder whether this is the right use case for stream_with_context, since it is mostly used for downloads and video streaming. Or should I use a websocket instead?
@read_controller.route('/v1/listenkafka/<string:kafkaId>', methods=['GET'])
def start_stream(kafkaId):
    try:
        mykafka_json = eval(my_storage.get(kafkaId))
        mykafka = kafkaserver(ip=mykafka_json['ip'], id=kafkaId, port=mykafka_json['port'])
        return Response(stream_with_context(mykafka.consume_topic(mykafka_json['topic'])))
    except Exception as e:
        print(f"{e}")
        return jsonify(f"{e}"), 400
#The generator listens to kafka and feeds the stream
def consume_topic(self, topic, groupid='test-consumer-group'):
    consumer = KafkaConsumer(topic,
                             group_id=groupid,
                             bootstrap_servers=[f"{self.ip}:{self.port}"])
    print(f"Topic: {topic}@{self.ip}:{self.port} starts streaming at {datetime.now()}")
    try:
        for messages in consumer:
            mykafka_json = eval(my_storage.get(self.id))
            print(mykafka_json)
            if mykafka_json['flag']:
                my_storage.delete(self.id)
                return
            else:
                message = {'topic':messages.topic,
                           'partition':messages.partition,
                           'offset':messages.offset,
                           'key':messages.key,
                           'value':messages.value}
                print (message['value'])
                yield message['value']
    except StopIteration as e:
        #TODO:: handle return
        print(e)
    finally:
        print(f"Topic-{topic} finish at {datetime.now()}")
So, should I use stream_with_context in this scenario, or should I switch to a websocket?
Thanks

OK, now I understand.
stream_with_context actually returns ALL content from the beginning every time the front end makes a request.
So it is a tool for downloading, not for continuously pushing new data from server to client.
In the end I chose Flask-SocketIO. It is a better choice than a plain websocket, but you need to study the samples to understand how it works... The docs miss some details...

How to use Tornado.gen.coroutine in TCP Server?

I wrote a TCP server with Tornado.
Here is the code:
#! /usr/bin/env python
#coding=utf-8
from tornado.tcpserver import TCPServer
from tornado.ioloop import IOLoop
from tornado.gen import *
class TcpConnection(object):
    def __init__(self,stream,address):
        self._stream=stream
        self._address=address
        self._stream.set_close_callback(self.on_close)
        self.send_messages()
    def send_messages(self):
        self.send_message(b'hello \n')
        print("next")
        self.read_message()
        self.send_message(b'world \n')
        self.read_message()
    def read_message(self):
        self._stream.read_until(b'\n',self.handle_message)
    def handle_message(self,data):
        print(data)
    def send_message(self,data):
        self._stream.write(data)
    def on_close(self):
        print("the monitored %d has left",self._address)
class MonitorServer(TCPServer):
    def handle_stream(self,stream,address):
        print("new connection",address,stream)
        conn = TcpConnection(stream,address)
if __name__=='__main__':
    print('server start .....')
    server=MonitorServer()
    server.listen(20000)
    IOLoop.instance().start()
I get the error assert self._read_callback is None, "Already reading". I guess the error occurs because several reads from the socket are issued at the same time. I then changed send_messages to use tornado.gen.coroutine. Here is the code:
@gen.coroutine
def send_messages(self):
    yield self.send_message(b'hello \n')
    response1 = yield self.read_message()
    print(response1)
    yield self.send_message(b'world \n')
    print((yield self.read_message()))
But there are other errors. The code seems to stop after yield self.send_message(b'hello \n'), and the code that follows does not seem to execute.
What should I do about it? If you're aware of any Tornado TCPServer (not HTTP!) code that uses tornado.gen.coroutine, please tell me. I would appreciate any links!

send_messages() calls send_message() and read_message() with yield, but these methods are not coroutines, so this will raise an exception.
The reason you're not seeing the exception is that you called send_messages() without yielding it, so the exception has nowhere to go (the garbage collector should eventually notice and print the exception, but that can take a long time). Whenever you call a coroutine, you should either use yield to wait for it to finish, or IOLoop.current().spawn_callback() to run the coroutine in the "background" (this tells Tornado that you do not intend to yield the coroutine, so it will print the exception as soon as it occurs). Also, whenever you override a method you should read the documentation to see whether coroutines are allowed (when you override TCPServer.handle_stream() you can make it a coroutine, but __init__() may not be a coroutine).
Once the exception is getting logged, the next step is to fix it. You can either make send_message() and read_message() coroutines (getting rid of the handle_message() callback in the process), or you can use tornado.gen.Task() to call callback-style code from a coroutine. I generally recommend using coroutines everywhere.

Scheduling a coroutine for a later loop iteration

I borrowed this code of a simple chat:
import tornado.ioloop
import tornado.web
import tornado.websocket
import tornado.gen
clients = []
class IndexHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(request):
        request.render("index.html")
class WebSocketChatHandler(tornado.websocket.WebSocketHandler):
    def open(self, *args):
        print("open", "WebSocketChatHandler")
        clients.append(self)
    def check_origin(self, origin):
        return True
    @tornado.gen.coroutine
    def on_message(self, message):
        for client in clients:
            client.write_message(message)
        @tornado.gen.coroutine
        def myroutine(m):
            print "mensaje: "
            c = (yield 123123123)
            print ("mensaje", m, c)
        yield myroutine(message)
    def on_close(self):
        clients.remove(self)
app = tornado.web.Application([(r'/chat', WebSocketChatHandler), (r'/', IndexHandler)])
app.listen(8888)
tornado.ioloop.IOLoop.instance().start()
The chat application works well (i.e. I see the echoes using a websocket client), and I modified it a bit to test some custom code.
Just for testing purposes, I wanted to insert a presumably heavy function call that I wanted to make asynchronous.
The actual intention here is that myroutine will start a game engine as a parallel task.
Perhaps I am missing something, but the intention in my code is to schedule the coroutine in two parts. That is: the coroutine should print "mensaje: ", then yield the value 123123123 (actually an immediate value, which I expected to be wrapped into an already-resolved future with the value as its result), thus rescheduling itself for the next iteration, and in that later iteration print the tuple ("mensaje", m, c).
My issue is that the function is never rescheduled (i.e. only "mensaje: " is printed to the console).
What am I doing wrong? This is my first attempt at Tornado (and async programming in general). How can I tell the Tornado loop something like "dude, this is my coroutine, and these are the arguments for it; please start it in parallel by scheduling it in the next loop iteration"?

There are two problems going on. First, you can't yield every kind of object from a coroutine; you must yield a Future or another special yieldable object. So when your coroutine yields 123123123, Tornado throws a "bad yield" exception. Second, Tornado's websocket code isn't built to catch exceptions from "on_message" if "on_message" is a coroutine, so the exception passes silently. See the warning at the bottom of the coroutine documentation.
The solution for you is to yield a valid object from "myroutine". If you just want to yield for a moment, yield "gen.moment":
print "one"
yield gen.moment
print "two"
If you want "myroutine" to run in parallel and not block "on_message", just call it without yielding:
myroutine(message)
But! Calling a coroutine this way means no one is listening to see if it throws an exception. Make sure you catch and log all exceptions within "myroutine", since otherwise they will pass silently.

Python 2.7: Thread hanging, no idea how to debug.

I made a script to download wallpapers as a learning exercise to better familiarize myself with Python/threading. Everything works well unless there is an exception when requesting a URL. This is the function where I hit the exception (it is not a method of the same class, if that matters).
def open_url(url):
    """Opens URL and returns html"""
    try:
        response = urllib2.urlopen(url)
        link = response.geturl()
        html = response.read()
        response.close()
        return(html)
    except urllib2.URLError, e:
        if hasattr(e, 'reason'):
            logging.debug('failed to reach a server.')
            logging.debug('Reason: %s', e.reason)
            logging.debug(url)
            return None
        elif hasattr(e, 'code'):
            logging.debug('The server couldn\'t fulfill the request.')
            logging.debug('Code: %s', e.code)
            logging.debug(url)
            return None
        else:
            logging.debug('Shit fucked up2')
            return None
At the end of my script:
main_thread = threading.currentThread()
for thread in threading.enumerate():
    if thread is main_thread: continue
    while thread.isAlive():
        thread.join(2)
    break
From my current understanding (which may be wrong), if the thread has not completed its task within 2 seconds of reaching this point, it should time out. Instead, it gets stuck in that last while loop. If I take the loop out, the script just hangs once it is done executing.
Also, I decided it was time to man up and leave Notepad++ for a real IDE with debugging tools, so I downloaded Wing. I'm a big fan of Wing, but the script doesn't hang there... What do you all use to write Python?

There is no thread interruption in Python and no way to cancel a thread. It can only finish execution by itself. The join method only waits for 2 seconds or until the thread terminates, whichever comes first; it does not kill anything. You need to implement a timeout mechanism in the thread itself.
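A minimal sketch of such a cooperative mechanism, written for Python 3 (where isAlive() is spelled is_alive()); `worker`, `stop_event` and `results` are hypothetical names. The thread checks a threading.Event between units of work and ends on its own once the event is set. For the blocking network call itself, urlopen accepts a timeout argument, so a stuck request cannot hold the thread forever.

```python
import threading


def worker(stop_event, results):
    """Do small units of work, checking between them whether to stop."""
    while not stop_event.is_set():
        results.append("tick")
        # wait() doubles as an interruptible sleep: it returns early once set.
        stop_event.wait(0.01)


stop_event = threading.Event()
results = []
thread = threading.Thread(target=worker, args=(stop_event, results))
thread.start()

thread.join(0.05)                  # gives up waiting; the thread keeps running
still_running = thread.is_alive()

stop_event.set()                   # ask the thread to finish on its own
thread.join()                      # now this returns promptly
print(still_running, thread.is_alive())
```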

I hit the books and figured out enough to correct the issue I was having. I was able to completely remove the code near the end of my script. I corrected the issue by spawning the thread pool differently.
for i in range(queue.qsize()):
    td = ThreadDownload(queue)
    td.start()
queue.join()
I also was not using a try: around queue.get() during the thread's execution.
try:
    img_url = self.queue.get()
    ...
except Queue.Empty:
    ...
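The pattern can be sketched in Python 3 as follows (the module is named queue there rather than Queue). `download_worker`, `jobs` and `done` are hypothetical names, and the "download" is a placeholder. Note that the Empty exception is only raised by a non-blocking get (get_nowait(), or get() with block=False or a timeout); a plain blocking get() waits forever on an empty queue.

```python
import queue
import threading


def download_worker(jobs, done):
    while True:
        try:
            # get_nowait() raises queue.Empty instead of blocking forever.
            url = jobs.get_nowait()
        except queue.Empty:
            return  # nothing left: let the thread end on its own
        try:
            done.append(url.upper())  # placeholder for the real download
        finally:
            jobs.task_done()  # always balance the get so join() can return


jobs = queue.Queue()
for url in ["a.jpg", "b.jpg", "c.jpg"]:
    jobs.put(url)

done = []
threads = [threading.Thread(target=download_worker, args=(jobs, done)) for _ in range(2)]
for t in threads:
    t.start()

jobs.join()  # blocks until every queued item has been marked done
for t in threads:
    t.join()
print(sorted(done))
```

Calling task_done() in a finally block matters: if a download raises and the item is never marked done, queue.join() blocks forever, which looks exactly like a hung script.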
