The goal
I want to make regular HTTP requests to a REST API. The requests happen at an interval ranging from a few seconds up to multiple hours, depending on the user input. I want to keep a single connection alive and close it in a smart way.
The questions
Is there an existing library that provides features similar to what I wrote with the class SessionOneToN?
Would it be possible to use a single session of type aiohttp.ClientSession that runs forever and is never closed?
What I have tried
What I have tried does work, but I wonder if there is an established library that can achieve the goals.
My code
My application uses the asyncio event loop module and the aiohttp HTTP module.
My application opens a single session of type aiohttp.ClientSession and shares it between multiple asynchronous callers.
The callers can make their requests concurrently and asynchronously.
Once all callers have received their responses and no request is in flight, the aiohttp.ClientSession is closed. A new aiohttp.ClientSession is opened as necessary when a caller makes a new request.
import aiohttp
import asyncio
from contextlib import asynccontextmanager
import sys

url = 'http://stackoverflow.com/questions/64305548/sharing-an-aiohttp-clientsession-between-multiple-asynchronous-callers'


# The callers look like so
async def request():
    async with session() as s:
        async with s.get(url) as response:
            return await response.text()


# The session manager looks like so:
class SessionOneToN:
    """Manage one session that is reused by multiple clients, instantiate a new
    session object from the given session_class as required. Provide a context
    manager for accessing the session. The session_class returns a context
    manager when instantiated; this context manager is referred to as the
    internal context manager, and is entered when the first client enters the
    context manager returned by get_context, and is exited when the last
    client exits it."""

    def __init__(self, session_class):
        self.n = 0
        self.session_class = session_class
        self.context = None
        self.aenter_result = None

    async def plus(self):
        self.n += 1
        if self.n == 1:
            # self.context: The internal context manager
            self.context = context = self.session_class()
            # self.aenter_result: The result from entering the internal context
            # manager
            self.aenter_result = await context.__aenter__()
        assert self.aenter_result is not None
        return self.aenter_result

    def minus(self):
        self.n -= 1
        if self.n == 0:
            return True

    def cleanup(self):
        self.context = None
        self.aenter_result = None

    @asynccontextmanager
    async def get_context(self):
        try:
            aenter_result = await self.plus()
            yield aenter_result
        except:
            if self.minus():
                if not await self.context.__aexit__(*sys.exc_info()):
                    self.cleanup()
                    raise
                self.cleanup()
        else:
            if self.minus():
                await self.context.__aexit__(None, None, None)
                self.cleanup()


class SessionOneToNaiohttp:
    def __init__(self):
        self.sotn = SessionOneToN(aiohttp.ClientSession)

    def get_context(self):
        return self.sotn.get_context()


sotnaiohttp = SessionOneToNaiohttp()


def session():
    return sotnaiohttp.get_context()


response_text = asyncio.run(request())
print(response_text[0:10])
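For illustration, here is a minimal usage sketch of my own (not part of the original code) showing several concurrent callers sharing one ClientSession through the session() helper above:

# Hypothetical usage sketch: three concurrent callers share one ClientSession.
# The session is opened when the first caller enters session() and closed when
# the last caller leaves, per the SessionOneToN reference counting above.
async def main():
    results = await asyncio.gather(request(), request(), request())
    for text in results:
        print(text[0:10])

asyncio.run(main())

All three requests reuse a single connection pool; the session only exists for as long as at least one caller is inside its context manager.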
Related
I created a simple Django Channels consumer that should connect to an external source, retrieve data and send it to the client. So, the user opens the page > the consumer connects to the external service and gets the data > the data is sent to the websocket.
Here is my code:
import json
from channels.generic.websocket import WebsocketConsumer, AsyncConsumer, AsyncJsonWebsocketConsumer
from binance.client import Client
from binance.websockets import BinanceSocketManager
import time
import asyncio

client = Client('', '')
trades = client.get_recent_trades(symbol='BNBBTC')
bm = BinanceSocketManager(client)


class EchoConsumer(AsyncJsonWebsocketConsumer):
    async def connect(self):
        await self.accept()
        await self.send_json('test')
        bm.start_trade_socket('BNBBTC', self.process_message)
        bm.start()

    def process_message(self, message):
        JSON1 = json.dumps(message)
        JSON2 = json.loads(JSON1)

        # define variables
        Rate = JSON2['p']
        Quantity = JSON2['q']
        Symbol = JSON2['s']
        Order = JSON2['m']

        asyncio.create_task(self.send_json(Rate))
        print(Rate)
This code works when I open one page; if I try to open a new window with a new account, though, it will throw the following error:
File "C:\Users\User\Desktop\Heroku\github\master\myapp\consumers.py", line 54, in connect
bm.start()
File "C:\Users\User\lib\threading.py", line 843, in start
raise RuntimeError("threads can only be started once")
threads can only be started once
I'm new to Channels, so this is a noob question, but how can I fix this problem? What I wanted to do was: user opens the page and gets the data, another user opens the page and gets the data. Is there no way to do that? Or am I simply misunderstanding how Django Channels and websockets work?
Do you really need a secondary thread?
class EchoConsumer(AsyncJsonWebsocketConsumer):

    symbol = ''

    async def connect(self):
        self.symbol = 'BNBBTC'
        # or, more probably, retrieve the value for "symbol" from query_string
        # so the client can specify which symbol they're interested in:
        # socket = new WebSocket("ws://.../?symbol=BNBBTC");
        await self.accept()

    async def process_message(self, message):
        # PSEUDO-CODE BELOW !
        if self.symbol == message['symbol']:
            await self.send({
                'type': 'websocket.send',
                'text': json.dumps(message),
            })
For extra flexibility, you might also accept a list of symbols from the client instead:
//HTML
socket = new WebSocket("ws://.../?symbols=XXX,YYY,ZZZ");
then (in the consumer):
class EchoConsumer(AsyncJsonWebsocketConsumer):

    symbols = []

    async def connect(self):
        # here we need to parse "?symbols=XXX,YYY,ZZZ" ...
        # the code below has been stolen from another project of mine and should be suitably adapted
        params = urllib.parse.parse_qs(self.scope.get('query_string', b'').decode('utf-8'))
        try:
            self.symbols = json.loads(params.get('symbols', ['[]'])[0])
        except:
            self.symbols = []

    def process_message(self, message):
        if message['symbol'] in self.symbols:
            ...
I'm no Django developer, but if I understand correctly, connect is called once per client, and bm.start references the same thread, most likely created in bm.start_trade_socket (or somewhere else in connect). So the first call to bm.start starts that thread, and any later call tries to start it again, which produces the error.
Per the threading docs, start() starts the thread's activity and should be called at most once per thread object; it raises a RuntimeError if called more than once on the same thread. You have made bm a global BinanceSocketManager, so every consumer instance shares the same thread object.
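To illustrate the point, here is a tiny standalone snippet of my own (not from the question): even after the first run has finished, the same Thread object cannot be started again.

import threading

t = threading.Thread(target=lambda: None)
t.start()
t.join()
t.start()  # second call raises RuntimeError: threads can only be started once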
Please refer to the code below; it may help you:
from channels.generic.websocket import WebsocketConsumer, AsyncConsumer, AsyncJsonWebsocketConsumer
from binance.client import Client
import json
from binance.websockets import BinanceSocketManager
import time
import asyncio


class EchoConsumer(AsyncJsonWebsocketConsumer):
    client = Client('', '')
    trades = client.get_recent_trades(symbol='BNBBTC')
    bm = BinanceSocketManager(client)

    async def connect(self):
        await self.accept()
        await self.send_json('test')
        self.bm.start_trade_socket('BNBBTC', self.process_message)
        self.bm.start()

    def process_message(self, message):
        JSON1 = json.dumps(message)
        JSON2 = json.loads(JSON1)

        # define variables
        Rate = JSON2['p']
        Quantity = JSON2['q']
        Symbol = JSON2['s']
        Order = JSON2['m']

        asyncio.create_task(self.send_json(Rate))
        print(Rate)
So I'm trying to port some code to work with asyncio.
My problem is the @property methods, which, if not yet cached, need a network request to complete.
While I can await them without a problem in most contexts, using such a property in the __str__ method fails, which is somewhat expected.
What is the most pythonic way to solve this, other than caching the value on instance creation or just not using asyncio?
Here is an example illustrating the issue:
import requests
from typing import Union


class Example(object):
    _username: Union[str, None] = None

    @property
    def username(self):
        if not self._username:
            r = requests.get('https://api.example.com/getMe')
            self._username = r.json()['username']
        return self._username

    def __str__(self):
        return f"Example(username={self.username!r})"


e = Example()
str(e)
So far I could come up with the following.
I could substitute @property with @async_property (PyPI, GitHub, Docs), and requests with httpx (PyPI, GitHub, Docs).
However, it is still not working.
My attempt at porting it to asyncio:
import httpx
from typing import Union
from async_property import async_property


class Example(object):
    _username: Union[str, None] = None

    @async_property
    async def username(self):
        if not self._username:
            async with httpx.AsyncClient() as client:
                r = await client.get('https://www.example.org/')
                self._username = r.json()['username']
        return self._username

    async def __str__(self):
        return f"Example(username={await self.username!r})"


e = Example()
str(e)
Fails with TypeError: __str__ returned non-string (type coroutine).
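One possible workaround (a sketch of my own, not from the original question) is to keep __str__ synchronous and have it format only the cached value, while an explicit, hypothetically named fetch_username() coroutine populates the cache beforehand:

import asyncio
import httpx


class Example(object):
    _username = None

    async def fetch_username(self):
        # Hypothetical helper: populate the cache explicitly before formatting.
        if self._username is None:
            async with httpx.AsyncClient() as client:
                r = await client.get('https://www.example.org/')
                self._username = r.json()['username']
        return self._username

    def __str__(self):
        # Synchronous and side-effect free: formats only what is already cached.
        return f"Example(username={self._username!r})"


async def main():
    e = Example()
    await e.fetch_username()
    print(str(e))

asyncio.run(main())

This gives up the lazy property but keeps str() cheap and synchronous, which is what the data model expects.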
I'm following this Route_Guide sample.
The sample in question fires off and reads messages without replying to a specific message; replying to a specific message is what I'm trying to achieve.
Here's what I have so far:
import grpc
...

channel = grpc.insecure_channel(conn_str)
try:
    grpc.channel_ready_future(channel).result(timeout=5)
except grpc.FutureTimeoutError:
    sys.exit('Error connecting to server')
else:
    stub = MyService_pb2_grpc.MyServiceStub(channel)
    print('Connected to gRPC server.')
    this_is_just_read_maybe(stub)


def this_is_just_read_maybe(stub):
    responses = stub.MyEventStream(stream())
    for response in responses:
        print(f'Received message: {response}')
        if response.something:
            # okay, now what? how do i send a message here?
            ...


def stream():
    yield my_start_stream_msg
    # this is fine, i receive this server-side
    # but i can't check for incoming messages here
I don't seem to have a read() or write() on the stub; everything seems to be implemented with iterators.
How do I send a message from this_is_just_read_maybe(stub)?
Is that even the right approach?
My Proto is a bidirectional stream:
service MyService {
  rpc MyEventStream (stream StreamingMessage) returns (stream StreamingMessage) {}
}
What you're trying to do is perfectly possible and will probably involve writing your own request iterator object that can be given responses as they arrive rather than using a simple generator as your request iterator. Perhaps something like
class MySmarterRequestIterator(object):

    def __init__(self):
        self._lock = threading.Lock()
        self._responses_so_far = []

    def __iter__(self):
        return self

    def _next(self):
        # some logic that depends upon what responses have been seen
        # before returning the next request message
        return <your message value>

    def __next__(self):  # Python 3
        return self._next()

    def next(self):  # Python 2
        return self._next()

    def add_response(self, response):
        with self._lock:
            self._responses_so_far.append(response)
that you then use like:
my_smarter_request_iterator = MySmarterRequestIterator()
responses = stub.MyEventStream(my_smarter_request_iterator)
for response in responses:
    my_smarter_request_iterator.add_response(response)
There will probably be locking and blocking in your _next implementation to handle the situation of gRPC Python asking your object for the next request that it wants to send and your responding (in effect) "wait, hold on, I don't know what request I want to send until after I've seen how the next response turned out".
Instead of writing a custom iterator, you can also use a blocking queue to implement send- and receive-like behaviour for the client stub:
import queue
...
send_queue = queue.SimpleQueue() # or Queue if using Python before 3.7
my_event_stream = stub.MyEventStream(iter(send_queue.get, None))
# send
send_queue.put(StreamingMessage())
# receive
response = next(my_event_stream) # type: StreamingMessage
This makes use of the sentinel form of iter, which converts a regular function into an iterator that stops when it reaches a sentinel value (in this case None).
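As a rough usage sketch (my own illustration, reusing the StreamingMessage type, my_start_stream_msg and the stub from the question), the read loop and the sends can then live in the same function:

import queue

# Hypothetical send/receive loop built on the queue-based iterator above.
def talk(stub):
    send_queue = queue.SimpleQueue()
    responses = stub.MyEventStream(iter(send_queue.get, None))

    send_queue.put(my_start_stream_msg)          # initial request
    for response in responses:                   # blocks until the next response arrives
        if response.something:
            send_queue.put(StreamingMessage())   # reply to this specific response
        else:
            send_queue.put(None)                 # sentinel: close our side of the stream
            break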
I have a web crawler and an HTTP interface for it.
The crawler receives grouped URLs as a dictionary, and I need to return the result in the same format as JSON. But I ran into large memory usage that is not returned to the operating system. How can I implement this without the large memory usage?
Code:
#!/usr/bin/env python
# coding=utf-8

import collections

import tornado.web
import tornado.ioloop
import tornado.queues
import tornado.httpclient


class ResponseError(Exception):
    pass


class Crawler(object):
    client = tornado.httpclient.AsyncHTTPClient()

    def __init__(self, groups, concurrency=10, retries=3, validators=None):
        self.groups = groups
        self.concurrency = concurrency
        self.retries = retries
        self.validators = validators or []

        self.requests = tornado.queues.Queue()
        self.responses = collections.defaultdict(list)

    async def worker(self):
        while True:
            await self.consume()

    async def validate(self, response):
        for validator in self.validators:
            validator(response)

    async def save(self, response):
        self.responses[response.request.group].append(response.body.decode('utf-8'))

    async def consume(self):
        async for request in self.requests:
            try:
                response = await self.client.fetch(request, raise_error=False)
                await self.validate(response)
                await self.save(response)
            except ResponseError:
                if request.retries < self.retries:
                    request.retries += 1
                    await self.requests.put(request)
            finally:
                self.requests.task_done()

    async def produce(self):
        for group, urls in self.groups.items():
            for url in urls:
                request = tornado.httpclient.HTTPRequest(url)
                request.group = group
                request.retries = 0
                await self.requests.put(request)

    async def fetch(self):
        await self.produce()
        for __ in range(self.concurrency):
            tornado.ioloop.IOLoop.current().spawn_callback(self.worker)
        await self.requests.join()


class MainHandler(tornado.web.RequestHandler):
    async def get(self):
        urls = []
        with open('urls') as f:  # mock
            for line in f:
                urls.append(line.strip())

        crawler = Crawler({'default': urls})
        await crawler.fetch()

        self.write(crawler.responses)


if __name__ == '__main__':
    app = tornado.web.Application(
        (tornado.web.url(r'/', MainHandler),), debug=True
    )
    app.listen(8000)
    tornado.ioloop.IOLoop.current().start()
It looks to me like most of the memory usage is devoted to self.responses. Since you seem to be ordering responses by "group" before writing them to a file, I can understand why you do it this way. One idea is to store them in a database (MySQL or MongoDB or whatever) with "group" as a column or field value in the database record.
The database might be the final destination of your data, or else it might be a temporary place to store the data until crawler.fetch completes. Then, query all the data from the database, ordered by "group", and write it to the file.
This doesn't solve the problem, it just means that the database process is responsible for most of your memory usage, instead of the Python process. This may be preferable for you, however.
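As a rough illustration of that idea (my own sketch; sqlite3 and the table layout are just stand-ins for whatever database you choose), save could write each response to the database keyed by group instead of appending it to self.responses:

import sqlite3

# Hypothetical variant of Crawler.save that persists responses rather than
# keeping them all in memory; the table and column names are made up here.
class DbCrawler(Crawler):
    def __init__(self, *args, db_path='responses.db', **kwargs):
        super().__init__(*args, **kwargs)
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            'CREATE TABLE IF NOT EXISTS responses ("group" TEXT, body TEXT)')

    async def save(self, response):
        # Note: sqlite3 calls block the IOLoop; fine for a sketch, not for production.
        self.db.execute(
            'INSERT INTO responses ("group", body) VALUES (?, ?)',
            (response.request.group, response.body.decode('utf-8')))
        self.db.commit()

The handler could then SELECT the rows ordered by "group" and stream them out, rather than holding everything in self.responses.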
I need to call a Celery task for each gRPC request and return the result.
In the default gRPC implementation, each request is processed in a separate thread from a thread pool.
In my case, the server is supposed to process ~400 requests in batch mode per second. So one request may have to wait 1 second for the result due to the batch processing, which means the size of the threadpool must be larger than 400 to avoid blocking.
Can this be done asynchronously?
Thanks a lot.
class EventReporting(ss_pb2.BetaEventReportingServicer, ss_pb2.BetaDeviceMgtServicer):
    def ReportEvent(self, request, context):
        res = tasks.add.delay(1, 2)
        result = res.get()  # <-- here I have to block
        return ss_pb2.GeneralReply(message='Hello, %s!' % result.message)
As noted by @Michael in a comment, as of version 1.32, gRPC now supports asyncio in its Python API. If you're using an earlier version, you can still use the asyncio API via the experimental API: from grpc.experimental import aio. An asyncio hello world example has also been added to the gRPC repo. The following code is a copy of the example server:
import logging
import asyncio
from grpc import aio

import helloworld_pb2
import helloworld_pb2_grpc


class Greeter(helloworld_pb2_grpc.GreeterServicer):

    async def SayHello(self, request, context):
        return helloworld_pb2.HelloReply(message='Hello, %s!' % request.name)


async def serve():
    server = aio.server()
    helloworld_pb2_grpc.add_GreeterServicer_to_server(Greeter(), server)
    listen_addr = '[::]:50051'
    server.add_insecure_port(listen_addr)
    logging.info("Starting server on %s", listen_addr)
    await server.start()
    await server.wait_for_termination()


if __name__ == '__main__':
    logging.basicConfig(level=logging.INFO)
    asyncio.run(serve())
See my other answer for how to implement the client.
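For reference, here is a minimal asyncio client sketch of my own that mirrors the example server above; it assumes the same generated helloworld_pb2 / helloworld_pb2_grpc modules and the default port:

import asyncio
import logging

from grpc import aio

import helloworld_pb2
import helloworld_pb2_grpc


async def run():
    # aio.insecure_channel is an async context manager; the channel is closed on exit.
    async with aio.insecure_channel('localhost:50051') as channel:
        stub = helloworld_pb2_grpc.GreeterStub(channel)
        response = await stub.SayHello(helloworld_pb2.HelloRequest(name='you'))
        print('Greeter client received: ' + response.message)


if __name__ == '__main__':
    logging.basicConfig()
    asyncio.run(run())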
It can be done asynchronously if your call to res.get can be done asynchronously (if it is defined with the async keyword).
While grpc.server says it requires a futures.ThreadPoolExecutor, it will actually work with any futures.Executor that calls the behaviors submitted to it on some thread other than the one on which they were passed. Were you to pass to grpc.server a futures.Executor implemented by you that only used one thread to carry out four hundred (or more) concurrent calls to EventReporting.ReportEvent, your server should avoid the kind of blocking that you describe.
In my opinion, this is a good and simple implementation of an async gRPC server, similar to an aiohttp-based HTTP server.
import asyncio
from concurrent import futures
import functools
import inspect
import threading

import grpc
from grpc import _server


def _loop_mgr(loop: asyncio.AbstractEventLoop):
    asyncio.set_event_loop(loop)
    loop.run_forever()

    # If we reach here, the loop was stopped.
    # We should gather any remaining tasks and finish them.
    pending = asyncio.Task.all_tasks(loop=loop)
    if pending:
        loop.run_until_complete(asyncio.gather(*pending))


class AsyncioExecutor(futures.Executor):

    def __init__(self, *, loop=None):
        super().__init__()
        self._shutdown = False
        self._loop = loop or asyncio.get_event_loop()
        self._thread = threading.Thread(target=_loop_mgr, args=(self._loop,),
                                        daemon=True)
        self._thread.start()

    def submit(self, fn, *args, **kwargs):
        if self._shutdown:
            raise RuntimeError('Cannot schedule new futures after shutdown')

        if not self._loop.is_running():
            raise RuntimeError("Loop must be started before any function can "
                               "be submitted")

        if inspect.iscoroutinefunction(fn):
            coro = fn(*args, **kwargs)
            return asyncio.run_coroutine_threadsafe(coro, self._loop)
        else:
            func = functools.partial(fn, *args, **kwargs)
            return self._loop.run_in_executor(None, func)

    def shutdown(self, wait=True):
        self._loop.stop()
        self._shutdown = True
        if wait:
            self._thread.join()

# --------------------------------------------------------------------------- #


async def _call_behavior(rpc_event, state, behavior, argument, request_deserializer):
    context = _server._Context(rpc_event, state, request_deserializer)
    try:
        return await behavior(argument, context), True
    except Exception as e:  # pylint: disable=broad-except
        with state.condition:
            if e not in state.rpc_errors:
                details = 'Exception calling application: {}'.format(e)
                _server.logging.exception(details)
                _server._abort(state, rpc_event.operation_call,
                               _server.cygrpc.StatusCode.unknown, _server._common.encode(details))
        return None, False


async def _take_response_from_response_iterator(rpc_event, state, response_iterator):
    try:
        return await response_iterator.__anext__(), True
    except StopAsyncIteration:
        return None, True
    except Exception as e:  # pylint: disable=broad-except
        with state.condition:
            if e not in state.rpc_errors:
                details = 'Exception iterating responses: {}'.format(e)
                _server.logging.exception(details)
                _server._abort(state, rpc_event.operation_call,
                               _server.cygrpc.StatusCode.unknown, _server._common.encode(details))
        return None, False


async def _unary_response_in_pool(rpc_event, state, behavior, argument_thunk,
                                  request_deserializer, response_serializer):
    argument = argument_thunk()
    if argument is not None:
        response, proceed = await _call_behavior(rpc_event, state, behavior,
                                                 argument, request_deserializer)
        if proceed:
            serialized_response = _server._serialize_response(
                rpc_event, state, response, response_serializer)
            if serialized_response is not None:
                _server._status(rpc_event, state, serialized_response)


async def _stream_response_in_pool(rpc_event, state, behavior, argument_thunk,
                                   request_deserializer, response_serializer):
    argument = argument_thunk()
    if argument is not None:
        # Notice this calls the normal `_call_behavior` not the awaitable version.
        response_iterator, proceed = _server._call_behavior(
            rpc_event, state, behavior, argument, request_deserializer)
        if proceed:
            while True:
                response, proceed = await _take_response_from_response_iterator(
                    rpc_event, state, response_iterator)
                if proceed:
                    if response is None:
                        _server._status(rpc_event, state, None)
                        break
                    else:
                        serialized_response = _server._serialize_response(
                            rpc_event, state, response, response_serializer)
                        print(response)
                        if serialized_response is not None:
                            print("Serialized Correctly")
                            proceed = _server._send_response(rpc_event, state,
                                                             serialized_response)
                            if not proceed:
                                break
                        else:
                            break
                else:
                    break


_server._unary_response_in_pool = _unary_response_in_pool
_server._stream_response_in_pool = _stream_response_in_pool


if __name__ == '__main__':
    server = grpc.server(AsyncioExecutor())
    # Add Servicer and Start Server Here
link to original:
https://gist.github.com/seglberg/0b4487b57b4fd425c56ad72aba9971be