I'm currently building a Discord bot that uploads a file to Google Drive when a command is used. However, the command methods are asynchronous while the files().create() method is synchronous, and calling it simply causes the bot to get stuck.
@bot.command(pass_context=True)
@commands.has_role(name='Archivist')
async def archivechannel(ctx, channel: discord.Channel, filename):
    await bot.say("Archiving....")
    try:
        with open("{}.txt".format(filename), "w") as openfile:
            lines = []
            async for message in bot.logs_from(channel, limit=500, reverse=True):
                if not (message.author.bot or message.content.startswith("]")):
                    print("<{}> {}#{}: {}".format(message.timestamp, message.author.name, message.author.discriminator, message.content))
                    lines.append("<{}> {}#{}: {}\n".format(message.timestamp, message.author.name, message.author.discriminator, message.content))
            openfile.writelines(lines)
        await bot.say("Archive Complete!")
    except IOError:
        await bot.say("Error: IOException")
    await bot.say("Uploading....")
    metadata = {'name': "{}.txt".format(filename), 'mimetype': 'application/vnd.google.apps.document', 'parents': folderID}
    media = MediaFileUpload('{}.txt'.format(filename), mimetype='text/plain')
    res = service.files().create(body=metadata, media_body=media).execute()
    print(res)
The line causing the problem is:
res = service.files().create(body=metadata, media_body=media).execute()
The bot just gets stuck after saying "Uploading...." and doesn't upload anything.
Does anyone know how I can fix this?
Edit: Neither using a ThreadPoolExecutor nor the default executor has worked, nor has setting up a synchronous function that runs the create and execute methods, taking in the metadata and media parameters.
Edit 2: After doing some more screwing around, it turns out the problem is now in the following line:
media = MediaFileUpload('{}.txt'.format(filename), mimetype='text/plain')
However, based on my testing, Patrick's answer is correct for the question I asked, so I have marked the question as answered.
You can run your blocking operation in another thread, while your asynchronous code waits for it to complete without blocking the event loop.
We'll create a new ThreadPoolExecutor, then use run_in_executor to run the task in it.
from concurrent.futures import ThreadPoolExecutor

def upload_file(metadata, media):
    return service.files().create(body=metadata, media_body=media).execute()

@bot.command(pass_context=True)
@commands.has_role(name='Archivist')
async def archivechannel(ctx, channel: discord.Channel, filename):
    await bot.say("Archiving....")
    try:
        with open("{}.txt".format(filename), "w") as openfile:
            lines = []
            async for message in bot.logs_from(channel, limit=500, reverse=True):
                if not (message.author.bot or message.content.startswith("]")):
                    print("<{}> {}#{}: {}".format(message.timestamp, message.author.name, message.author.discriminator, message.content))
                    lines.append("<{}> {}#{}: {}\n".format(message.timestamp, message.author.name, message.author.discriminator, message.content))
            openfile.writelines(lines)
        await bot.say("Archive Complete!")
    except IOError:
        await bot.say("Error: IOException")
    await bot.say("Uploading....")
    metadata = {'name': "{}.txt".format(filename), 'mimetype': 'application/vnd.google.apps.document', 'parents': folderID}
    media = MediaFileUpload('{}.txt'.format(filename), mimetype='text/plain')
    with ThreadPoolExecutor() as pool:
        res = await bot.loop.run_in_executor(
            pool, upload_file, metadata, media
        )
    print(res)
You may also be able to use the default executor by removing the context manager and passing None instead of pool. I'm having trouble finding information about the default executor though, so you may want to experiment.
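For reference, a minimal sketch of that default-executor variant (untested; it assumes the same upload_file helper as above and that the running loop is available as bot.loop):

    # Passing None selects the event loop's default executor (a ThreadPoolExecutor).
    res = await bot.loop.run_in_executor(None, upload_file, metadata, media)
    print(res)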
Related
I'm trying to use Google Cloud's text-to-speech API. The problem I'm running into is that periodically the API returns a status of "500 Internal Server Error." The correct logic for these errors is usually to just retry the call. Unfortunately, I can't get any of Google Cloud's retry logic to work. As soon as I hit the exception my script exits.
My API function:
async def get_audio_from_google(input_text: str, output_file: str):
    """
    Convert the provided text to audio using the Google text-to-speech API.

    Args:
        input_text: Text to convert to speech.
        output_file: File path to write. File extension will be added automatically.

    Returns: Writes the audio file to disk. Does not return a result.
    """
    client = texttospeech.TextToSpeechAsyncClient()
    # Create and configure the Synthesis object.
    synthesis_input = texttospeech.SynthesisInput()
    synthesis_input.text = input_text
    voice_parameters = texttospeech.VoiceSelectionParams()
    voice_parameters.language_code = VOICE_ENCODING
    voice_parameters.name = VOICE
    audio_parameters = texttospeech.AudioConfig()
    if AUDIO_FORMAT == AudioFormat.MP3:
        audio_parameters.audio_encoding = texttospeech.AudioEncoding.MP3
    elif AUDIO_FORMAT == AudioFormat.OPUS:
        audio_parameters.audio_encoding = texttospeech.AudioEncoding.OGG_OPUS
    else:
        print("Invalid audio format specified")
        sys.exit(1)
    logging.info(f"Synthesizing speech for {output_file}")
    # Build our request.
    request = texttospeech.SynthesizeSpeechRequest()
    request.input = synthesis_input
    request.voice = voice_parameters
    request.audio_config = audio_parameters
    # Get audio.
    # Configure when to retry on error.
    retry_object = retry.Retry(initial=5, timeout=90)
    response = await client.synthesize_speech(request=request, retry=retry_object)
    with open(f"{output_file}.{AUDIO_FORMAT}", "wb") as out:
        # Write the response to the output file.
        out.write(response.audio_content)
    logging.info(f'Audio content written to file "{output_file}.{AUDIO_FORMAT}"')
TextToSpeechAsyncClient's synthesize_speech method accepts an instance of Retry, which is part of the Google Core API and can be used as a decorator or passed to some methods. Unfortunately, I can't seem to get the retry logic to work. By default it should retry on any error classed as transient, which includes Internal Server Error (error 500):
if_transient_error = if_exception_type(
    exceptions.InternalServerError,
    exceptions.TooManyRequests,
    exceptions.ServiceUnavailable,
    requests.exceptions.ConnectionError,
    requests.exceptions.ChunkedEncodingError,
    auth_exceptions.TransportError,
)
I've tried both passing retry to synthesize_speech and using it as a decorator for get_audio_from_google. In either case, as soon as my script gets an error response from the server, it exits.
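For clarity, the decorator variant I tried looked roughly like this (same Retry parameters as above, applied directly to the coroutine function):

    @retry.Retry(initial=5, timeout=90)
    async def get_audio_from_google(input_text: str, output_file: str):
        ...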
How I'm calling get_audio_from_google:
def process_audio(text: List[str]):
    """
    Process text asynchronously in segments and output a bunch of audio files ready for stitching.

    Args:
        text (List[str]): List of text snippets to process.
    """
    async def gather_with_concurrency(max_tasks: int, *coroutines):
        """
        Run tasks in parallel with a limit.
        https://stackoverflow.com/questions/48483348/how-to-limit-concurrency-with-python-asyncio

        Args:
            max_tasks: Maximum number of tasks to run at once.
            *coroutines: All async tasks that should be run.
        """
        semaphore = asyncio.Semaphore(max_tasks)

        async def sem_coro(coro):
            async with semaphore:
                return await coro

        return await asyncio.gather(*(sem_coro(c) for c in coroutines))

    async def main():
        snippet_counter = 1
        subtasks = []
        for text_snippet in text:
            snippet_filename = str(snippet_counter)
            snippet_counter += 1
            subtasks.append(
                get_audio_from_google(input_text=text_snippet, output_file=snippet_filename)
            )
        await gather_with_concurrency(2, *subtasks)

    logging.info("Starting audio processing tasks…")
    # Begin execution.
    asyncio.run(main())
There is a FastAPI service that receives an archive with files and a URL to send the result to. Since speech recognition is a time-consuming process, right now I send a request, wait for it to be processed and the result returned, and only then can I send the next request. Instead, the service should accept a request for processing, immediately return 200 to say the process has started, and send the result to the URL after processing. More requests may arrive during processing, so I need to store them somewhere, write them to a queue, and take requests from that queue. Of course, there are tools such as Kafka and RabbitMQ, but I wanted to do without them. There is an idea to use asyncio.Queue, but I have no idea how to implement it.
@app.post("/uprecognize", tags=["Upload and recognize"], status_code=status.HTTP_200_OK)
async def upload_recognize(
    url_for_request: str,
    background_tasks: BackgroundTasks,
    file: UploadFile = File(...),
):
    logger = logging.getLogger(__name__)
    full_name = split_filename(file)
    if not is_archive_file(file):
        logger.error("File must be RAR or ZIP format")
        return JSONResponse(content={'msg': 'File must be RAR or ZIP format'}, status_code=status.HTTP_400_BAD_REQUEST)
    else:
        start = time.time()
        await save_file_to_uploads(file, full_name)
        end = time.time()
        if not os.path.exists(UPLOADED_FILES_PATH + '/' + os.path.splitext(full_name)[0]):
            os.mkdir(UPLOADED_FILES_PATH + '/' + os.path.splitext(full_name)[0])
        if os.path.exists(UPLOADED_FILES_PATH + '/' + full_name) and rarfile.is_rarfile(UPLOADED_FILES_PATH + '/' + full_name):
            unrar_files(UPLOADED_FILES_PATH + '/' + full_name)
        elif os.path.exists(UPLOADED_FILES_PATH + '/' + full_name) and zipfile.is_zipfile(UPLOADED_FILES_PATH + '/' + full_name):
            unzip_files(UPLOADED_FILES_PATH + '/' + full_name)
        else:
            logger.error("File not found")
            return JSONResponse(content={'msg': 'File not found'}, status_code=status.HTTP_404_NOT_FOUND)
        background_tasks.add_task(recognition_wav, full_name, logger, model, url_for_request)
        return JSONResponse(content={'msg': 'Start recognition'},
                            status_code=status.HTTP_200_OK,
                            background=background_tasks)
If I've understood your question, you are just looking for a queuing mechanism for receiving, buffering, and executing tasks, which you can build with asyncio.Queue and slot your own tasks into.
This accepts and buffers actions in inq, taking one at a time and processing it via procq. It is a runnable snippet, and the output should be self-explanatory.
Your i will no doubt be something larger; however, since asyncio is single threaded, shared variables are simple to handle, and you might hand around a file name or a dictionary key into a collection of objects.
import asyncio

async def receive(inq, procq):
    while True:
        new = await inq.get()
        print(f"receive and hand off {new}")
        await procq.put(new)

async def process(procq):
    while True:
        proc = await procq.get()
        print(f"process {proc}")
        await asyncio.sleep(1)  # process the action

async def create(inq):
    for i in range(10):
        print(f"create {i}")
        await inq.put(i)  # some pretend actions
    pass  # exit, would accept other sources in a real program

async def main():
    inq = asyncio.Queue(5)  # only hold up to five
    procq = asyncio.Queue(1)  # only process one at a time
    await asyncio.gather(
        process(procq),
        receive(inq, procq),
        create(inq)
    )

if __name__ == "__main__":
    asyncio.run(main())
I'd like to embed some async code in my Python project to make the HTTP request part async. For example, I read params from Kafka, use these params to generate some URLs, and put the URLs into a list. If the length of the list is greater than 1000, I send the list to aiohttp to batch-get the responses.
I cannot change the whole project from sync to async, so I can only change the HTTP request part.
the code example is:
async def async_request(url):
    async with aiohttp.ClientSession() as client:
        resp = await client.get(url)
        result = await resp.json()
        return result

async def do_batch_request(url_list, result):
    task_list = []
    for url in url_list:
        task = asyncio.create_task(async_request(url))
        task_list.append(task)
    batch_response = await asyncio.gather(*task_list)
    result.extend(batch_response)

def batch_request(url_list):
    batch_response = []
    asyncio.run(do_batch_request(url_list, batch_response))
    return batch_response

url_list = []
for msg in kafka_consumer:
    url = msg['url']
    url_list.append(url)
    if len(url_list) >= 1000:
        batch_response = batch_request(url_list)
        parse(batch_response)
        ....
As we know, asyncio.run will create an event loop to run the async function and then close the event loop. My problem is: will my method influence the performance of the async code? And do you have a better way for my situation?
There's no serious problem with your approach, and you'll get a speed benefit from asyncio. The only possible issue is that if you later want to do something async elsewhere in the code, you won't be able to do it concurrently with batch_request.
There's not much to do about that if you don't want to change the whole project from sync to async, but if in the future you want to run batch_request in parallel with something, keep in mind that you can run it in a thread and wait for its result asynchronously.
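For example, a minimal sketch of that thread-based variant (untested; it assumes Python 3.9+ for asyncio.to_thread and reuses batch_request and parse from the code above):

import asyncio

async def handle_batch(url_list):
    # Run the blocking batch_request (which internally calls asyncio.run)
    # in a worker thread, so the current event loop stays free for other work.
    batch_response = await asyncio.to_thread(batch_request, url_list)
    parse(batch_response)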
This downloads updated FASTA files (protein sequences) from a database. I've gotten it to work faster using asyncio than with requests, but I'm not convinced the downloads are actually happening asynchronously.
import os
import aiohttp
import aiofiles
import asyncio

folder = '~/base/fastas/proteomes/'

upos = {'UP000005640': 'Human_Homo_sapien',
        'UP000002254': 'Dog_Boxer_Canis_Lupus_familiaris',
        'UP000002311': 'Yeast_Saccharomyces_cerevisiae',
        'UP000000589': 'Mouse_Mus_musculus',
        'UP000006718': 'Monkey_Rhesus_macaque_Macaca_mulatta',
        'UP000009130': 'Monkey_Cynomolgus_Macaca_fascicularis',
        'UP000002494': 'Rat_Rattus_norvegicus',
        'UP000000625': 'Escherichia_coli',
        }

# https://www.uniprot.org/uniprot/?query=proteome:UP000005640&format=fasta Example link
startline = r'https://www.uniprot.org/uniprot/?query=proteome:'
endline = r'&format=fasta&include=False'  # include is true to include isoforms, make false for only canonical sequences

async def fetch(session, link, folderlocation, name):
    async with session.get(link, timeout=0) as response:
        try:
            file = await aiofiles.open(folderlocation, mode='w')
            await file.write(await response.text())
            await file.close()
            print(name, 'ended')
        except FileNotFoundError:
            loc = ''.join((r'/'.join((folderlocation.split('/')[:-1])), '/'))
            command = ' '.join(('mkdir -p', loc))
            os.system(command)
            file = await aiofiles.open(folderlocation, mode='w')
            await file.write(await response.text())
            await file.close()
            print(name, 'ended')

async def rfunc():
    async with aiohttp.ClientSession() as session:
        for upo, name in upos.items():
            print(name, 'started')
            link = ''.join((startline, upo, endline))
            folderlocation = ''.join((folder, name, '.fasta'))
            await fetch(session, link, folderlocation, name)

loop = asyncio.get_event_loop()
loop.run_until_complete(rfunc())
My output from running this:
In [5]: runfile('~/base/Fasta Proteome Updater.py')
Human_Homo_sapien started
Human_Homo_sapien ended
Dog_Boxer_Canis_Lupus_familiaris started
Dog_Boxer_Canis_Lupus_familiaris ended
Yeast_Saccharomyces_cerevisiae started
Yeast_Saccharomyces_cerevisiae ended
Mouse_Mus_musculus started
Mouse_Mus_musculus ended
Monkey_Rhesus_macaque_Macaca_mulatta started
Monkey_Rhesus_macaque_Macaca_mulatta ended
Monkey_Cynomolgus_Macaca_fascicularis started
Monkey_Cynomolgus_Macaca_fascicularis ended
Rat_Rattus_norvegicus started
Rat_Rattus_norvegicus ended
Escherichia_coli started
Escherichia_coli ended
The printed output seems to signify that the downloads are happening one at a time. Is there something wrong here?
You are looping over the items to download and waiting (await) for each one to finish. To make them all happen at the same time, you need to schedule all downloads for execution at once, e.g. using gather.
Then your code could look like this:
async def rfunc():
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[
                fetch(
                    session,
                    ''.join((startline, upo, endline)),
                    ''.join((folder, name, '.fasta')),
                    name,
                ) for upo, name in upos.items()
            ]
        )

loop = asyncio.get_event_loop()
loop.run_until_complete(rfunc())
I'm trying to make requests with headless Chrome using pyppeteer, but I keep getting "OSError: [Errno 24] Too many open files" after a certain number of requests. I checked the open resources of the Python process with lsof and found that with every new request there's a new line like the following
python3 14840 root 11r FIFO 0,8 0t0 64208510 pipe
Can someone tell me what resources aren't being closed? The code that's producing this error is below
def search(self, search_path):
    async def main(url):
        browser = await launch(args=['--no-sandbox'], handleSIGINT=False, handleSIGTERM=False, handleSIGHUP=False)
        page = await browser.newPage()
        await page.setJavaScriptEnabled(False)
        try:
            response = await page.goto(url, options={"timeout": 50000})
        except pyppeteer.errors.TimeoutError:
            pass
        src = await page.content()
        await browser.close()
        return src

    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    url = "https://www.example.com" + search_path
    val = asyncio.get_event_loop().run_until_complete(main(url))
    loop.close()
EDIT
I managed to close the open pipes by calling
browser.process.communicate()
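For context, roughly where that call sits in the coroutine above (a sketch; browser is the object returned by launch):

        src = await page.content()
        await browser.close()
        # Reap the Chromium child process so its stdout/stderr pipes are released;
        # without this, each request was leaking a pipe file descriptor.
        browser.process.communicate()
        return src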