I'm working on a Discord bot that mainly processes images. It works so far, but when multiple images are sent at once I experience a lot of blocking and inconsistency.
It goes like this:
User uploads an image > bot places the 'eyes' emoji on the message > bot processes the image > bot responds with the result.
However, sometimes it handles multiple images at once (placing the eyes emoji on the first few images), but usually it only puts the emoji on the first image and, after finishing that one, processes the next 2-3 images, and so on.
The step that takes the most time is the OCR reading the image.
Here is some abstract code:
main.py
@client.event
async def on_message(message):
    ...
    if len(message.attachments) > 0:
        await message_service.handle_image(message)
    ...
message_service.py
async def handle_image(self, message):
    supported_attachments = filter_out_unsupported(message.attachments)
    images = []
    await message.reply(f"{random_greeting()} {message.author.mention}, I'm processing your upload(s), please wait a moment; this could take up to 30 seconds.")
    await message.add_reaction('👀')
    async with aiohttp.ClientSession() as session:  # one session reused for all downloads
        for a in supported_attachments:
            async with session.get(a.url) as res:  # fetch the attachment by its URL
                if res.status == 200:
                    buffer = io.BytesIO(await res.read())
                    arr = np.asarray(bytearray(buffer.read()), dtype=np.uint8)
                    images.append(cv2.imdecode(arr, -1))
    for image in images:
        result = await self.image_handler.handle_image(image, message.author)
        await message.remove_reaction('👀', message.author)
        if result is None:
            await message.reply(f"{message.author.mention} I can't process your image. It's incorrect, unclear, or I'm just not smart enough... :(")
            await message.add_reaction('❌')
        else:
            await message.reply(result)
image_handler.py
async def handle_image(self, image, author):
    try:
        if image is None:
            return None
        governor_id = self.__get_governor_id_by_discord_id(author.id)
        if governor_id is None:  # check before str(), otherwise None would become the string 'None'
            return f"{author.mention} there was no account registered under your discord id, please register using this format: `$register <governor_id> <in game name>`, for example: `$register ... ...`. After that, repost the screenshot.\nFor now, multiple accounts are not supported."
        governor_id = str(governor_id)
        # This part is most likely the bottleneck !!
        read_result = self.reader.read_image_task(image)
        if self.__no_values_are_found(...):
            return None
        return self.sheets_client.update_player_row_in_sheets(...)
    except Exception:  # treat any failure as an unreadable image
        return None

def __no_values_are_found(self, *args):
    return all(v is None for v in args)

def __get_governor_id_by_discord_id(self, id):
    return self.sheets_client.get_governor_id_by_discord_id(id)
I'm new to Python and Discord bots in general, but is there a clean way to handle this?
I was thinking about threading but can't seem to find many solutions within this context, which makes me believe I am missing something or doing something inefficiently.
There is actually a clean way: you can create your own to_thread decorator and decorate your blocking functions (they cannot be coroutines; they must be normal, synchronous functions):
import asyncio
from functools import partial, wraps

def to_thread(func):
    @wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_event_loop()
        callback = partial(func, *args, **kwargs)
        # if using Python 3.9+, use `await asyncio.to_thread(callback)` instead
        return await loop.run_in_executor(None, callback)
    return wrapper
# usage
@to_thread
def handle_image(self, image, author):  # notice how it's *not* an async function
    ...

# calling
await handle_image(...)  # not a coroutine itself, yet awaited (because of the wrapper)
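Applied back to the question, a minimal sketch could look like this (the read_image wrapper name is mine; read_image_task is the OCR call identified as the bottleneck):

@to_thread
def read_image(self, image):
    # hypothetical wrapper: runs the blocking OCR call in a worker thread
    return self.reader.read_image_task(image)

# inside image_handler.handle_image:
read_result = await self.read_image(image)

This way one image's OCR no longer blocks the event loop, so the bot can keep adding reactions and downloading other uploads while the heavy work runs.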
Related
I have two async functions that both need to run constantly and one of them just hogs all of the CPU.
The first function handles receiving websocket messages from a client.
async def handle_message(self, ws):
    """Handles a message from the websocket."""
    logger.info('awaiting message')
    while True:
        msg = await ws.receive()
        logger.debug('received message: %s', msg)
        jmsg = json.loads(msg['text'])
        logger.info('received message: {}'.format(jmsg))
        param = jmsg['parameter']
        val = jmsg['data']['value']
        logger.info('setting parameter {} to {}'.format(param, val))
        self.camera.change_parameter(param, val)
The second function grabs images from a camera and sends them to the frontend client. This is the one that won't give the other coroutine any time.
async def send_image(self, ws):
    """Sends an image to the websocket."""
    for im in self.camera:
        await asyncio.sleep(1000)
        h, w = im.shape[:2]
        resized = cv2.resize(im, (w // 4, h // 4))
        await ws.send_bytes(image_to_bytes(resized))
I'm executing these coroutines using asyncio.gather(). The decorator is from FastAPI and Backend() is my class that contains the two async coroutines.
@app.websocket('/ws')
async def websocket_endpoint(websocket: WebSocket):
    """Handle a WebSocket connection."""
    backend = Backend()
    logger.info('Started backend.')
    await websocket.accept()
    try:
        aws = [backend.send_image(websocket), backend.handle_message(websocket)]
        done, pending = await asyncio.gather(*aws)
    except WebSocketDisconnect:
        await websocket.close()
Both of these coroutines run fine separately, but if I try to run them together, send_image() never gives any time to handle_message(), and so none of the messages are ever received (or at least that's what I think is going on).
I thought this was exactly what asyncio was meant to solve, but I'm probably using it wrong. I thought about using multiprocessing, but I'm pretty sure FastAPI expects awaitables here. I also read about using the return values from gather(), but I didn't really understand that part. Something about canceling the pending tasks and adding them back to the event loop.
Can anyone show me the correct (and preferably modern pythonic) way to make these async coroutines run concurrently?
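For reference, the done/pending pair mentioned above belongs to asyncio.wait rather than gather; a minimal sketch of that pattern, reusing the names from the question, might look like:

tasks = [
    asyncio.ensure_future(backend.send_image(websocket)),
    asyncio.ensure_future(backend.handle_message(websocket)),
]
# asyncio.wait returns as soon as the first task finishes (or raises)
done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
for task in pending:
    task.cancel()  # stop whichever coroutine is still running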
Every 10 minutes, do the following tasks:
- generate a list of image URLs to download
- (if a previous download has not finished, cancel it)
- download the images concurrently
I'm relatively new to coroutines.
Can I structure the above with coroutines?
I think a coroutine is essentially a sequential flow, so I'm having trouble reasoning about it.
Actually, come to think of it, would the following work?
async def generate_urls():
    await asyncio.sleep(10)
    result = _generate_urls()
    return result

async def download_image(url):
    # download a single image
    image = await _download_image()
    return image

async def main():
    while True:
        urls = await generate_urls()
        for url in urls:
            download_task = asyncio.create_task(download_image(url))
            await download_task

asyncio.run(main())
Your current code is quite close. Below are some modifications to align it more closely with your original spec:
import asyncio

def generate_urls():
    return _generate_urls()  # no need to sleep in the URL generation function

async def download_image(url):
    image = await _download_image()
    return image

async def main():
    while True:
        # create_task schedules the downloads to run concurrently
        tasks = [asyncio.create_task(download_image(url)) for url in generate_urls()]
        await asyncio.sleep(10)  # sleep while the downloads run
        for t in tasks:  # after 10 seconds, check if any downloads are still running
            if not t.done():
                t.cancel()  # cancel any task that is not complete
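Note that cancel() only requests cancellation. If you want to be certain the old batch has actually wound down before the next one starts, you can await the cancelled tasks as well (a sketch, not part of the code above):

# reap the old batch; return_exceptions=True swallows the
# CancelledError raised inside each cancelled task
await asyncio.gather(*tasks, return_exceptions=True)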
I was trying to send/edit/delete messages and do all the things webhooks can't (and I don't want to use webhooks in this specific case).
I also wanted to see how to import another module that loops in itself and blocks the entire script once I call its run function.
I don't know how to run this in a separate process or thread. I mean, yes, I could start a process or thread with the run function.
But how am I supposed to call the send_msg function afterwards? (Or edit_msg and delete_msg; I didn't write them here since they will be similar.)
the bot.py:
import discord

def run(token):
    client = discord.Client()

    @client.event
    async def on_ready():
        print("Bot is ready.")

    @client.event
    async def on_message(message):
        if message.author == client.user:
            return
        return  # for now just trying to get the bot to send anything with the send_msg function from another process

    client.run(token)
def send_msg(client, channel_id, text):
    channel = client.get_channel(channel_id)
    if channel is None:
        return None
    try:
        msg = await channel.send(text)
    except Exception as e:
        print("Could not send:", repr(e))
        return None
    else:
        return msg
I want to be able to use it like this:
- import bot.py into the main script
- start the process/thread with the run function from the main script
- call send_msg from the main script when I have a message to send (edit and delete are similar, so I only have to understand this one)
- get the return value from send_msg into the main script and continue on my way without being blocked (having to wait while sending/editing/deleting is okay, but I don't want to be blocked by the bot's async loop forever)
So, is there any way to run the entire module in a separate process or thread? Or is there another solution? Or should I somehow make POST requests myself instead of using discord.py (which I don't know how to do because of Gateways)?
I don't have an exact solution to your problem, but I would start by putting it all into one class. Right now you aren't using the same client in the send_msg function as the one you declared in the run function. Also remember to add async before defining the send_msg method.
import discord

class DiscordBot:
    def __init__(self):
        self.client = discord.Client()

    def run(self, token):
        @self.client.event
        async def on_ready():
            print("Bot is ready.")

        @self.client.event
        async def on_message(message):
            if message.author == self.client.user:
                return
            return  # for now just trying to get the bot to send anything with the send_msg function from another process

        self.client.run(token)

    async def send_msg(self, channel_id, text):
        channel = self.client.get_channel(channel_id)
        if channel is None:
            return None
        try:
            msg = await channel.send(text)
        except Exception as e:
            print("Could not send:", repr(e))
            return None
        else:
            return msg
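As for calling send_msg from the main script while the bot runs in its own thread, one option (my sketch, not part of the answer above) is asyncio.run_coroutine_threadsafe, which submits a coroutine to the bot's event loop and returns a concurrent.futures.Future you can block on:

import asyncio
import threading

bot = DiscordBot()
# run the blocking client.run() in a background thread
threading.Thread(target=bot.run, args=(TOKEN,), daemon=True).start()

# later, from the main thread (TOKEN and CHANNEL_ID are placeholders):
future = asyncio.run_coroutine_threadsafe(
    bot.send_msg(CHANNEL_ID, "hello"), bot.client.loop
)
msg = future.result(timeout=30)  # blocks only until the send completes

In practice you may need to wait until on_ready has fired before the bot's loop is ready to accept work.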
I'm having some issues using a throttler for the Telegram API.
The problem is basically that if the number of requests goes over my throttler's limit, the messages get sent in random order once the period passes.
Here's the code for the throttler I'm using (found on some GitHub repo):
import asyncio
import time
from collections import deque

class Throttler:
    def __init__(self, rate_limit, period=1.0, retry_interval=0.01):
        self.rate_limit = rate_limit
        self.period = period
        self.retry_interval = retry_interval
        self._task_logs = deque()

    def flush(self):
        now = time.time()
        while self._task_logs:
            if now - self._task_logs[0] > self.period:
                self._task_logs.popleft()
            else:
                break

    async def acquire(self):
        while True:
            self.flush()
            if len(self._task_logs) < self.rate_limit:
                break
            await asyncio.sleep(self.retry_interval)
        self._task_logs.append(time.time())

    async def __aenter__(self):
        await self.acquire()

    async def __aexit__(self, exc_type, exc, tb):
        pass
I use it as follows:

throttler = Throttler(rate_limit=30, period=10)

async with throttler:
    await sendmessage(message)
I found that the best way around this was to use a different algorithm for the throttler.
The throttler above will always deliver messages in random order because, after an initial burst, messages get stuck in the queue, and once the period has passed, asyncio releases them all at once.
The best way around this is what's called a leaky bucket algorithm. I used the following answer to implement a LeakyBucket myself: https://stackoverflow.com/a/45502319/7055234
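For illustration, a condensed leaky-bucket throttler along those lines (my own sketch, not the linked answer verbatim) could look like the code below. Requests drain at a steady rate instead of being released in bursts, and the lock keeps waiters in order:

import asyncio
import time

class LeakyBucket:
    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._level = 0.0
        self._last = time.monotonic()
        self._lock = asyncio.Lock()

    def _leak(self):
        # the bucket drains continuously at self.rate
        now = time.monotonic()
        self._level = max(0.0, self._level - (now - self._last) * self.rate)
        self._last = now

    async def acquire(self):
        async with self._lock:  # serialize waiters so send order is preserved
            self._leak()
            if self._level + 1 > self.capacity:
                # wait just long enough for one unit to drain
                await asyncio.sleep((self._level + 1 - self.capacity) / self.rate)
                self._leak()
            self._level += 1

    async def __aenter__(self):
        await self.acquire()

    async def __aexit__(self, exc_type, exc, tb):
        pass

Usage is the same as before; 30 messages per 10 seconds corresponds to a rate of 3 per second, e.g. async with LeakyBucket(rate_per_sec=3, capacity=30): await sendmessage(message).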
I'm making a small application that attempts to find company website URLs by searching for their names via Bing. It takes in a big list of company names, uses the Bing Search API to obtain the first URL for each, and saves those URLs back into the list.
I'm having a problem with aiohttp's ClientSession.get() method; specifically, it fails silently and I can't figure out why.
Here's how I'm initializing the script. Keep an eye out for worker.perform_mission():
async def _execute(workers, *, loop=None):
    if not loop:
        loop = asyncio.get_event_loop()
    [asyncio.ensure_future(i.perform_mission(verbose=True), loop=loop) for i in workers]

def main():
    filepth = 'c:\\SOME\\FILE\\PATH.xlsx'
    cache = pd.read_excel(filepth)
    # CHANGE THE NUMBER IN range(<here>) TO ADD MORE WORKERS.
    workers = (Worker(cache) for i in range(1))
    loop = asyncio.get_event_loop()
    loop.run_until_complete(_execute(workers, loop=loop))
    ...<MORE STUFF>...
The worker.perform_mission() method does the following (scroll to the bottom and look at _split_up_request_like_they_do_in_the_docs()):
class Worker(object):
    def __init__(self, shared_cache):
        ...<MORE STUFF>...

    async def perform_mission(self, verbose=False):
        while not self.mission_complete:
            if not self.company_name:
                await self.find_company_name()
                if verbose:
                    print('Obtained Company Name')
            if self.company_name and not self.website:
                print('Company Name populated but no website found yet.')
                data = await self.call_bing()  # <<<<< THIS IS SILENTLY FAILING.
                if self.website and ok_to_set_website(self.shared_cache, self):
                    await self.try_set_results(data)
                    self.mission_complete = True
                else:
                    print('{} worker failed at setting website.'.format(self.company_name))
                    pass
            else:
                print('{} worker failed at obtaining data from Bing.'.format(self.company_name))
                pass

    async def call_bing(self):
        async with aiohttp.ClientSession() as sesh:
            sesh.headers = self.headers
            sesh.params = self.params
            return await self._split_up_request_like_they_do_in_the_docs(sesh)

    async def _split_up_request_like_they_do_in_the_docs(self, session):
        print('_bing_request() successfully called.')  # <<< THIS CATCHES
        async with session.get(self.search_url) as resp:
            print('Session.get() successfully called.')  # <<< THIS DOES NOT.
            return await resp.json()
And finally my output is:
Obtained Company Name
Company Name populated but no website found yet.
_bing_request() successfully called.
Process finished with exit code 0
Can anyone help me figure out why print('Session.get() successfully called.') isn't triggering? Or maybe help me ask this question better?
Take a look at this part:
async def _execute(workers, *, loop=None):
    # ...
    [asyncio.ensure_future(i.perform_mission(verbose=True), loop=loop) for i in workers]
You create a bunch of tasks, but you never wait for those tasks to finish. That means _execute itself completes right after the tasks are created, long before they are finished. And since you only run the event loop until _execute is done, the loop will stop shortly after it starts.
To fix this, use asyncio.gather to wait until multiple awaitables have finished:
async def _execute(workers, *, loop=None):
    # ...
    tasks = [asyncio.ensure_future(i.perform_mission(verbose=True), loop=loop) for i in workers]
    await asyncio.gather(*tasks)
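On Python 3.7+ the same fix can also be written without managing the loop explicitly; a sketch, assuming main() can hand control to asyncio.run:

async def _execute(workers):
    # gather wraps the coroutines in tasks and waits for all of them
    await asyncio.gather(*(w.perform_mission(verbose=True) for w in workers))

def main():
    ...
    asyncio.run(_execute(workers))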