I'm trying to connect to more than one server at the same time. I am currently using loop.create_connection but it freezes up at the first non-responding server.
gsock = loop.create_connection(lambda: opensock(sid), server, port)
transport, protocol = loop.run_until_complete(gsock)
I tried threading this, but it created problems with the sid value being shared, as well as various errors such as RuntimeError: Event loop is running and RuntimeError: Event loop stopped before Future completed. Also, according to my variables (though they were getting mixed up), the protocol's connection_made() method gets executed even when transport, protocol = loop.run_until_complete(gsock) throws an exception.
I don't understand much about the asyncio module, so please be as thorough as possible. I don't think I need reader/writer variables, as the reading should be done automatically and trigger the data_received() method.
Thank you.
You can connect to many servers at the same time by scheduling all the coroutines concurrently, rather than using loop.run_until_complete to make each connection individually. One way to do that is to use asyncio.gather to schedule them all and wait for each to finish:
import asyncio

# define opensock somewhere

@asyncio.coroutine
def connect_serv(server, port):
    try:
        # sid comes from your surrounding code, as in your question
        transport, protocol = yield from loop.create_connection(lambda: opensock(sid), server, port)
    except Exception:
        print("Connection to {}:{} failed".format(server, port))

loop = asyncio.get_event_loop()

loop.run_until_complete(
    asyncio.gather(
        connect_serv('1.2.3.4', 3333),
        connect_serv('2.3.4.5', 5555),
        connect_serv('google.com', 80),
    ))

loop.run_forever()
This will kick off all three coroutines listed in the call to gather concurrently, so that if one of them hangs the others won't be affected; they can carry on with their work while that connection hangs. Then, once all of them have completed, loop.run_forever() gets executed, which allows your program to continue running until you stop the loop or kill the program.
The reader/writer variables you mentioned would only be relevant if you used asyncio.open_connection to connect to the servers, rather than create_connection. It uses the Stream API, which is a higher-level API than the protocol/transport-based API that create_connection uses. It's really up to you to decide which you prefer to use. There are examples of both in the asyncio docs, if you want to see a comparison.
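For comparison, here is a minimal sketch of what the Stream-API equivalent could look like, in the same yield from style as above; the host, port, payload, and read size are placeholders and error handling is omitted:

@asyncio.coroutine
def stream_client(server, port):
    reader, writer = yield from asyncio.open_connection(server, port)
    writer.write(b"hello")
    # With streams you read explicitly, instead of waiting for a
    # data_received() callback on a protocol.
    data = yield from reader.read(100)
    writer.close()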
I'm working on an application that uses LevelDB and that uses multiple long-lived processes for different tasks.
Since LevelDB allows only a single process to maintain a database connection, all our database access is funneled through a special database process.
To access the database from another process we use a BaseProxy. But since we are using asyncio, our proxy shouldn't block on these APIs that call into the db process, which then eventually reads from the db. Therefore we implement the APIs on the proxy using an executor.
loop = asyncio.get_event_loop()
return await loop.run_in_executor(
    thread_pool_executor,
    self._callmethod,
    method_name,
    args,
)
And while that works just fine, I wonder if there's a better alternative to wrapping the _callmethod call of the BaseProxy in a ThreadPoolExecutor.
The way I understand it, the BaseProxy calling into the DB process is a textbook example of waiting on IO, so using a thread for this seems unnecessarily wasteful.
In a perfect world, I'd assume an async _acallmethod to exist on the BaseProxy but unfortunately that API does not exist.
So, my question basically boils down to: When working with BaseProxy is there a more efficient alternative to running these cross process calls in a ThreadPoolExecutor?
Unfortunately, the multiprocessing library is not suited to conversion to asyncio; what you have is the best you can do if you must use BaseProxy to handle your IPC (inter-process communication).
While it is true that the library uses blocking I/O here, you can't easily reach in and re-work the blocking parts to use non-blocking primitives instead. If you were to insist on going this route, you'd have to patch or rewrite the internal implementation details of that library, but being internal implementation details, these can differ from Python point release to point release, making any patching fragile and prone to break with minor Python upgrades. The _callmethod method is part of a deep hierarchy of abstractions involving threads, socket or pipe connections, and serializers; see multiprocessing/connection.py and multiprocessing/managers.py.
So your options here are to stick with your current approach (using a threadpool executor to shove BaseProxy._callmethod() to another thread) or to implement your own IPC solution using asyncio primitives. Your central database-access process would act as a server for your other processes to connect to as a client, either using sockets or named pipes, using an agreed-upon serialisation scheme for client requests and server responses. This is what multiprocessing implements for you, but you'd implement your own (simpler) version, using asyncio streams and whatever serialisation scheme best suits your application patterns (e.g. pickle, JSON, protobuffers, or something else entirely).
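For illustration, here is a minimal sketch of what the server side of such a hand-rolled scheme could look like using asyncio streams and pickle with length-prefixed frames; the names (handle_client, serve), the address, and the framing are all made up for the example, and a real version would need error handling and a matching client:

import asyncio
import pickle
import struct
from functools import partial

async def handle_client(db, reader, writer):
    # Length-prefixed pickle frames: 4 bytes of size, then the payload.
    size, = struct.unpack("!I", await reader.readexactly(4))
    method_name, args = pickle.loads(await reader.readexactly(size))
    # Only this process touches the database, satisfying LevelDB's
    # single-process requirement.
    result = getattr(db, method_name)(*args)
    payload = pickle.dumps(result)
    writer.write(struct.pack("!I", len(payload)) + payload)
    await writer.drain()
    writer.close()

async def serve(db):
    server = await asyncio.start_server(partial(handle_client, db), "127.0.0.1", 8888)
    async with server:
        await server.serve_forever()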
A thread pool is what you want. aioprocessing provides async versions of some multiprocessing functionality, but it does so using threads, as you have proposed. I suggest opening an issue against Python, if there isn't one already, for exposing true async multiprocessing.
https://github.com/dano/aioprocessing
In most cases, this library makes blocking calls to multiprocessing methods asynchronous by executing the call in a ThreadPoolExecutor
Assuming you have Python and the database running on the same system (i.e. you are not looking to make any network calls async), you have two options.
What you are already doing (run in executor). This blocks the db thread, but the main thread remains free to do other stuff. It is not purely non-blocking, but it is quite an acceptable solution for I/O-blocking cases, with the small overhead of maintaining a thread.
For a truly non-blocking solution (one that can run in a single thread without blocking), you need (1) native support for async callbacks from the DB for each fetch call, and (2) to wrap that in a custom event loop implementation. Here you subclass the base loop and override methods to integrate your db callbacks. For example, you could create a loop that implements a pipe server: the db writes to the pipe and Python polls the pipe. See the implementation of the Proactor event loop in the asyncio code base. Note: I have never implemented any custom event loop.
I am not familiar with leveldb, but for a key-value store it is not clear whether such a fetch callback and purely non-blocking implementation would bring any significant benefit. If you are issuing multiple fetches inside an iterator and that is your main problem, you can make the loop async (with each fetch still blocking) and improve your performance. Below is dummy code that demonstrates this.
import asyncio
import random
import time

async def talk_to_db(d):
    """
    Blocking db iteration. sleep is the fetch function.
    """
    for k, v in d.items():
        time.sleep(1)
        yield f"{k}:{v}"

async def talk_to_db_async(d):
    """
    Real non-blocking db iteration. fetch (sleep) is native async here.
    """
    for k, v in d.items():
        await asyncio.sleep(1)
        yield f"{k}:{v}"

async def talk_to_db_async_loop(d):
    """
    Semi-non-blocking db iteration. fetch is blocking, but the
    loop is not.
    """
    for k, v in d.items():
        time.sleep(1)
        yield f"{k}:{v}"
        await asyncio.sleep(0)

async def db_call_wrapper(db):
    async for row in talk_to_db(db):
        print(row)

async def db_call_wrapper_async(db):
    async for row in talk_to_db_async(db):
        print(row)

async def db_call_wrapper_async_loop(db):
    async for row in talk_to_db_async_loop(db):
        print(row)

async def func(i):
    await asyncio.sleep(5)
    print(f"done with {i}")

database = {i: random.randint(1, 20) for i in range(20)}

async def main():
    db_coro = db_call_wrapper(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

async def main_async():
    db_coro = db_call_wrapper_async(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

async def main_async_loop():
    db_coro = db_call_wrapper_async_loop(database)
    coros = [func(i) for i in range(20)]
    coros.append(db_coro)
    await asyncio.gather(*coros)

# run the blocking db iteration
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# run the non-blocking db iteration
loop = asyncio.get_event_loop()
loop.run_until_complete(main_async())

# run the non-blocking (loop only) db iteration
loop = asyncio.get_event_loop()
loop.run_until_complete(main_async_loop())
This is something you can try. Otherwise, I would say your current method is quite efficient. I do not think BaseProxy can give you an async acall API; it does not know how to handle the callback from your db.
I have a python server that is available through a websocket endpoint.
During serving a connection, it also communicates with some backend services. This communication is asynchronous and may trigger the send() method of the websocket.
When a single client is served, it seems to work fine. However, when multiple clients are served in parallel, some of the routines that handle the connections occasionally get stuck. More precisely, they seem to block in the recv() method.
The actual code is somewhat complex and the issue is slightly more complicated than I have described; nevertheless, I provide a minimal skeleton of code that sketches the way in which I use the websockets:
class MinimalConversation(object):
    def __init__(self, ws, worker_sck, messages, should_continue_conversation, should_continue_listen):
        self.ws = ws
        self.messages = messages
        self.worker_sck = worker_sck
        self.should_continue_conversation = should_continue_conversation
        self.should_continue_listen = should_continue_listen

    async def run_conversation(self):
        serving_future = asyncio.ensure_future(self.serve_connection())
        listening_future = asyncio.ensure_future(self.handle_worker())
        await asyncio.wait([serving_future, listening_future], return_when=asyncio.ALL_COMPLETED)

    async def serve_connection(self):
        while self.should_continue_conversation():
            await self.ws.recv()
            logger.debug("Message received")
            self.sleep_randomly(10, 5)
            await self.worker_sck.send(b"Dummy")

    async def handle_worker(self):
        while self.should_continue_listen():
            self.sleep_randomly(50, 40)
            await self.worker_sck.recv()
            await self.ws.send(self.messages.pop())

    def sleep_randomly(self, mean, dev):
        delta = random.randint(1, dev) / 1000
        if random.random() < .5:
            delta *= -1
        time.sleep(mean / 1000 + delta)
Obviously, in the real code I do not sleep for random intervals and don't use a given list of messages, but this sketches the way I handle the websockets. In the real setting, some errors may occur that are sent over the websocket too, so parallel send()s may occur in theory, but I have never encountered such a situation.
The code is run from a handler function which is passed as a parameter to websockets.serve(); it initializes the MinimalConversation object and calls the run_conversation() method.
My questions are:
Is there something fundamentally wrong with such usage of the websockets?
Are concurrent calls of the send() methods dangerous?
Can you suggest some good practices regarding usage of websockets and asyncio?
Thank you.
The recv function yields back only when a message is received, and it seems that there are two connections awaiting messages from each other, so there might be a situation similar to a deadlock, where each is waiting for the other's message and can't send anything. Maybe you should try to rethink the overall algorithm to be safer against this.
And, of course, try adding more debug output and see what really happens.
are concurrent calls of the send() methods dangerous?
If by concurrent you mean in the same thread but in independently scheduled coroutines, then parallel send is just fine. But be careful with "parallel" recv on the same connection, because the order of coroutine scheduling might be far from obvious, and it is what decides which call to recv will get a message first.
Can you suggest some good practices regarding usage of websockets and asyncio?
In my experience, the easiest way is to create a dedicated task for each incoming connection which repeatedly calls recv on the connection until the connection is closed. You can store the connection somewhere and delete it in a finally block; then it can be used from other coroutines to send something.
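For example, a minimal sketch of that pattern; the connections set and handle_message are illustrative names, not part of the websockets library:

connections = set()

async def reader_task(ws):
    connections.add(ws)
    try:
        # websockets supports async iteration: this loops until the
        # connection is closed.
        async for message in ws:
            await handle_message(ws, message)
    finally:
        # Remove the connection so other coroutines stop sending to it.
        connections.discard(ws)

async def broadcast(text):
    # Any other coroutine can send on the stored connections.
    for ws in connections:
        await ws.send(text)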
When using time.sleep(1) before sendMessage, the whole process stops (even the other connections).
def handleConnected(self):
    print self.address, 'connected'
    for client in clients:
        time.sleep(1)
        client.sendMessage(self.address[0] + u' - connected')
Server: https://github.com/dpallot/simple-websocket-server
How to solve it?
The server that you are using is a synchronous, "select" type server. These servers use a single process and a single thread; they achieve concurrency through the use of the select() function to efficiently wait for I/O on multiple socket connections.
The advantage of select servers is that they can easily scale to a very large number of clients. The disadvantage is that when the server invokes an application handler (the handleConnected(), handleMessage() and handleClose() methods for this server), the server blocks on it, meaning that while a handler is running the server is suspended, because both the handlers and the server run on the same thread. The only way for the server to be responsive in this type of architecture is to code the handlers in such a way that they do what they need to do quickly and return control back to the server.
Your handleConnected handler function is not a good match for this type of server, because it is a long running function. This function will run for several seconds (as many seconds as there are clients), so during all that time the server is going to be blocked.
You can maybe work around the limitations in this server by creating a background thread for your long running task. That way your handler can return back to the server after launching the thread. The server will then regain control and go back to work, while the background thread does that loop with the one second sleeps inside. The only problem you have to consider is that now you have sort of a home-grown multithreaded server, so you will not be able to scale as easily.
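A rough sketch of that workaround, reusing the clients list from your code; note that sendMessage is now called from a different thread than the server's, which is exactly the home-grown multithreading caveat just mentioned:

import threading
import time

def handleConnected(self):
    def greet(address):
        for client in clients:
            time.sleep(1)
            client.sendMessage(address[0] + u' - connected')
    # Hand the slow loop to a background thread and return immediately,
    # so the select() loop regains control.
    threading.Thread(target=greet, args=(self.address,)).start()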
Another option for you to consider is to use a different server architecture. A coroutine based server will support your handler function as you coded it, for example. The two servers that I recommend in this category are eventlet and gevent. The eventlet server comes with native WebSocket support. For gevent you have to install an extension called gevent-websocket.
Good luck!
You are suspending the thread with sleep, and the server you are using seems to use select to handle requests, not threads. So no other request can be handled while your handler sleeps.
So you can't use time.sleep.
Why do you need to sleep? Can you solve it some other way?
Maybe you can use something like threading.Timer()
from threading import Timer

def sendHello(client):
    client.sendMessage("hello, world")

for client in clients:
    # Pass client as an argument so each timer sends to its own client,
    # rather than closing over the loop variable.
    t = Timer(1.0, sendHello, args=(client,))
    t.start()  # after 1 second, "hello, world" will be sent
This is off the top of my head. You would also need a way to cancel each timer, so I guess you would need to save each t in a list and cancel them when done.
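For example, continuing the hypothetical snippet above:

timers = []
for client in clients:
    t = Timer(1.0, sendHello, args=(client,))
    timers.append(t)
    t.start()

# Later, when you want to abort any greetings still pending:
for t in timers:
    t.cancel()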
I'm using SocketServer.ThreadingMixIn, pretty much as in the docs.
Other than having extracted the clients to run in their own script, I've also redefined the handle method, as I want the connection to the client to stay alive and receive more messages:
def handle(self):
    try:
        while True:
            data = self.request.recv(1024)
            if not data:
                break  # Quits the thread if the client was disconnected
            else:
                cur_thread = threading.current_thread()
                print(cur_thread.name)
                self.request.send(data)
    except:
        pass
The problem is that even when I try to terminate the server with server.shutdown() or with a KeyboardInterrupt, it remains blocked in handle as long as a client keeps its socket open.
So how can I effectively stop the server even if there are still connected clients?
The best solution I found was to use SocketServer.ForkingMixIn instead of SocketServer.ThreadingMixIn.
This way the daemon actually works, even though using processes instead of threads was not exactly what I wanted.
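For reference, a minimal sketch of the swap, assuming Python 2's SocketServer module name as in the question (it is socketserver on Python 3); the address and the handler class name are placeholders for whatever your setup already uses:

import SocketServer

class ForkingEchoServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
    pass

# EchoRequestHandler stands in for your existing handler class.
server = ForkingEchoServer(('localhost', 9999), EchoRequestHandler)
server.serve_forever()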
I want to create a RabbitMQ receiver/consumer in Python and am not sure how to check for messages. I am trying to do this in my own loop, not using the callbacks in pika.
If I understand things correctly, in the Java client I can use basicGet() to check whether any messages are available, without blocking. I don't mind blocking while getting messages, but I don't want to block until there is a message.
I haven't found any clear examples and haven't yet figured out the corresponding call in pika.
If you want to do it synchronously then you will need to look at the pika BlockingConnection:
The BlockingConnection creates a layer on top of Pika's asynchronous core providing methods that will block until their expected response has returned. Due to the asynchronous nature of the Basic.Deliver and Basic.Return calls from RabbitMQ to your application, you are still required to implement continuation-passing style asynchronous methods if you'd like to receive messages from RabbitMQ using basic_consume or if you want to be notified of a delivery failure when using basic_publish.
More info and an example here
https://pika.readthedocs.org/en/0.9.12/connecting.html#blockingconnection
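Since you mentioned checking for a message without blocking, basic_get on a BlockingConnection channel is also worth a look; a sketch (the queue name "test" is illustrative, and newer pika versions spell the flag auto_ack instead of no_ack):

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()

# basic_get polls once and returns immediately; (None, None, None)
# means the queue was empty.
method, properties, body = channel.basic_get(queue="test", no_ack=False)
if method is not None:
    print("Got: %r" % body)
    channel.basic_ack(method.delivery_tag)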
You can also periodically check the queue size, using the example from this answer: Get Queue Size in Pika (AMQP Python).
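The gist of that linked answer, as a sketch reusing the channel from the snippet above: a passive queue_declare changes nothing but returns the queue's current state, including its message count.

# Passive declare: raises if the queue doesn't exist, otherwise just
# reports its state without touching it.
queue_state = channel.queue_declare(queue="test", passive=True)
print("Messages waiting: %d" % queue_state.method.message_count)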
The queue-processing loop can be done iteratively with the help of process_data_events():
import pika

# A stubborn callback that still wants to be in the code.
def mq_callback(ch, method, properties, body):
    print(" Received: %r" % body)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
queue_state = channel.queue_declare(queue="test")

# Configure a callback.
channel.basic_consume(mq_callback, queue="test")

try:
    # My own loop here:
    while True:
        # Do other processing

        # Process message queue events, returning as soon as possible.
        # Issues mq_callback() when applicable.
        connection.process_data_events(time_limit=0)
finally:
    connection.close()