Flask upload file, pass file to celery task - python

I am uploading a file using a Flask REST API. Because the file is large, I want Celery to handle the upload on the server. Below is the code.
Flask REST API
@app.route('/upload', methods=['GET', 'POST'])
def upload():
    file = request.files.get("file")
    if not file:
        return "some_error_msg"
    elif file.filename == "":
        return "some_error_msg"
    if file:
        filename = secure_filename(file.filename)
        result = task_upload.apply_async(args=(filename, ABC), queue="upload")
        return "some_task_id"
Celery task
@celery_app.task(bind=True)
def task_upload(self, filename: str, contents: Any) -> bool:
    status = False
    try:
        status = save_file(filename, contents)
    except Exception as e:
        print(f"Exception: {e}")
    return status
Save method
def save_file(filename: str, contents: Any) -> bool:
    file: Path = MEDIA_DIRPATH / filename
    status: bool = False
    # method-1: contents is a Flask FileStorage object
    if contents:
        contents.save(file)
        status = True
    # method-2: contents is request.stream (an IOBytes-like object)
    with open(file, "ab") as fp:
        chunk_size = 4091
        while True:
            chunk = contents.read(chunk_size)  # read the next chunk
            if not chunk:
                break
            fp.write(chunk)
        status = True
    return status
I am getting errors with both methods.
For method-1, where I pass the file variable (a FileStorage object), the error is:
exc_info=(<class 'kombu.exceptions.EncodeError'>, EncodeError(TypeError('Object of type FileStorage is not JSON serializable'))
For method-2, where I pass request.stream, the error is:
<gunicorn.http.body.Body object at some number>
TypeError: Object of type Body is not JSON serializable
How can I pass the file (ABC) to the Celery task?
I would prefer method-1, but either will do. Please suggest.

You can use any of the following approaches: gevent, multiprocessing, multithreading, or the nginx upload module. For my use case a thread was the better fit. This is the pseudo-code structure.
from threading import Event, Thread

class UploadWorker(Thread):
    """Create an upload worker background thread."""

    def __init__(self, name: str, daemon: bool, filename: str, contents: str, read_length: int) -> None:
        """Initialize the defaults."""
        self.filename: str = filename
        self.contents: str = contents
        self.read_length: int = read_length
        self._kill: Event = Event()
        super().__init__(name=name, daemon=daemon)

    def run(self) -> None:
        """Run the target function."""
        print(f"Thread is_set: {self._kill.is_set()=}")
        while not self._kill.is_set():
            save_file(self.filename, self.contents, self.read_length)
            # Better to copy the function body directly here instead of calling the function

    def kill(self) -> None:
        """Revoke or abort the running thread."""
        self._kill.set()
Then create the background worker object and start it:
upload_worker = UploadWorker("uploader", True, filename, contents, read_length)
upload_worker.start()
To kill or cancel it, use:
upload_worker.kill()
Reference: Is there any way to kill a Thread?
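If you would rather keep Celery, the usual fix for both serialization errors is to stop passing the FileStorage/stream object entirely: persist the upload inside the request handler and give the task only JSON-serializable arguments, such as the filename and a path. A minimal sketch, assuming the MEDIA_DIRPATH, secure_filename, and task_upload names from the question:
@app.route("/upload", methods=["POST"])
def upload():
    file = request.files.get("file")
    if not file or file.filename == "":
        return "some_error_msg"
    filename = secure_filename(file.filename)
    tmp_path = MEDIA_DIRPATH / (filename + ".part")
    file.save(tmp_path)  # quick local write inside the request
    # only JSON-serializable arguments cross the broker
    result = task_upload.apply_async(args=(filename, str(tmp_path)), queue="upload")
    return result.id
The task then opens the path itself (for example, re-reads or renames the temporary file), so nothing non-serializable ever reaches the broker.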

Related

Handle files in media group using aiogram

I need a sustainable way to handle files wrapped in a media group.
My function handle_files waits for media files. When a user uploads a media file, it goes through a series of checks. If it passes all of them (size restriction, format restriction), the media file is downloaded and processed.
It looks like this:
async def handle_files(message: types.Message, state: FSMContext):
    user_data = await state.get_data()
    locale = user_data['locale']
    list_of_files = user_data['list_of_files']
    try:
        file = message.document
        file_name = file['file_name']
    except Exception as e:
        await message.answer('Error while downloading file')
        return None
    file_name = unidecode(file_name)
    file_size = file['file_size']
    if file_size >= 20971520:
        await message.answer('File is too big')
        return None
    invalid_format, formats = check_invalid_format(file_name, function)
    if invalid_format:
        await message.answer(file_name + ' has unsupported format. Supported formats: ' + ', '.join(formats))
        return None
    output_folder = os.path.join('temp', str(message.from_user.id))
    if not os.path.exists(output_folder):
        os.makedirs(output_folder, exist_ok=True)
    file_path = os.path.join(output_folder, file_name)
    await file.download(destination_file=file_path)
    list_of_files.append(file_path)
    await state.update_data(list_of_files=list_of_files)
    await message.answer('Added files: {}'.format('\n'.join(list_of_files)))
Working with separate files is fine: after downloading, the user gets the list of added files one by one.
But when the user uploads files as a media group, one file overrides another, so only one file is appended to list_of_files, which prevents me from processing both files.
I tried to solve the problem by fetching user_data one more time:
...
await file.download(destination_file=file_path)
user_data = await state.get_data()
list_of_files = user_data['list_of_files']
list_of_files.append(file_path)
await state.update_data(list_of_files=list_of_files)
...
It solved one part of my problem, but this solution is not elegant and, presumably, not quite sustainable: the message is duplicated.
I need that, after uploading a media group, the user gets one message containing the list of all files from that media group.
I suppose the problem is linked to asyncio. I've already spent a lot of time on it, but the solution hasn't been found. Looking for your help.
Check this link.
There you can find a middleware that returns the list of messages from a media group; you can then use a for loop to handle them.
import asyncio
from typing import List, Union

from aiogram import Bot, Dispatcher, executor, types
from aiogram.dispatcher.handler import CancelHandler
from aiogram.dispatcher.middlewares import BaseMiddleware

bot = Bot(token="TOKEN_HERE")  # Place your token here
dp = Dispatcher(bot)


class AlbumMiddleware(BaseMiddleware):
    """This middleware is for capturing media groups."""

    album_data: dict = {}

    def __init__(self, latency: Union[int, float] = 0.01):
        """
        You can provide custom latency to make sure
        albums are handled properly in highload.
        """
        self.latency = latency
        super().__init__()

    async def on_process_message(self, message: types.Message, data: dict):
        if not message.media_group_id:
            return

        try:
            self.album_data[message.media_group_id].append(message)
            raise CancelHandler()  # Tell aiogram to cancel handler for this group element
        except KeyError:
            self.album_data[message.media_group_id] = [message]
            await asyncio.sleep(self.latency)

            message.conf["is_last"] = True
            data["album"] = self.album_data[message.media_group_id]

    async def on_post_process_message(self, message: types.Message, result: dict, data: dict):
        """Clean up after handling our album."""
        if message.media_group_id and message.conf.get("is_last"):
            del self.album_data[message.media_group_id]


@dp.message_handler(is_media_group=True, content_types=types.ContentType.ANY)
async def handle_albums(message: types.Message, album: List[types.Message]):
    """This handler will receive a complete album of any type."""
    media_group = types.MediaGroup()
    for obj in album:
        if obj.photo:
            file_id = obj.photo[-1].file_id
        else:
            file_id = obj[obj.content_type].file_id

        try:
            # We can also add a caption to each file by specifying `"caption": "text"`
            media_group.attach({"media": file_id, "type": obj.content_type})
        except ValueError:
            return await message.answer("This type of album is not supported by aiogram.")

    await message.answer_media_group(media_group)


if __name__ == "__main__":
    dp.middleware.setup(AlbumMiddleware())
    executor.start_polling(dp, skip_updates=True)
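If the goal is one "Added files" message per media group, the same album handler pattern can download every document and reply once; a rough sketch, assuming aiogram 2.x and the temp-folder layout from the question (handle_document_albums is a hypothetical name):
import os

@dp.message_handler(is_media_group=True, content_types=types.ContentType.DOCUMENT)
async def handle_document_albums(message: types.Message, album: List[types.Message]):
    """Download every document in the album, then answer once."""
    added = []
    for obj in album:
        file_name = obj.document.file_name
        output_folder = os.path.join('temp', str(message.from_user.id))
        os.makedirs(output_folder, exist_ok=True)
        file_path = os.path.join(output_folder, file_name)
        await obj.document.download(destination_file=file_path)
        added.append(file_path)
    await message.answer('Added files: {}'.format('\n'.join(added)))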

How to make query parameter take two types of inputs in fastapi?

I have a function like this,
async def predict(static: str = Form(...), file: UploadFile = File(...)):
    return something
I have two parameters here: static is a string, and file is the buffer of the uploaded file.
Now, is there a way to assign multiple types to one parameter? That is, I want the file parameter to take either an uploaded file or just a string.
Use a Union type annotation:
from fastapi import FastAPI, UploadFile, Form, File
from typing import Union

app = FastAPI()

@app.post(path="/")
async def predict(
    static: str = Form(...),
    file: Union[UploadFile, str] = File(...)
):
    return {
        "received_static": static,
        "received_file": file if isinstance(file, str) else file.filename
    }
AFAIK, the OpenAPI schema doesn't support this configuration, since it doesn't allow multiple types for a single parameter. So it is better to define multiple parameters and handle them separately; IMHO, that is also the cleaner way:
from fastapi import FastAPI, UploadFile, Form, File

app = FastAPI()

@app.post(path="/")
async def predict(
    static: str = Form(...),
    file: UploadFile = File(None),  # making optional
    file_str: str = Form(None)  # making optional
):
    return {
        "received_static": static,
        "received_file": file.filename if file else None,
        "received_file_str": file_str if file_str else ''
    }
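A quick way to exercise both variants of the second endpoint (a sketch using FastAPI's TestClient; the field names match the code above):
from fastapi.testclient import TestClient

client = TestClient(app)

# send an actual file
r1 = client.post("/", data={"static": "abc"}, files={"file": ("a.txt", b"hello")})

# send a plain string instead
r2 = client.post("/", data={"static": "abc", "file_str": "hello"})

print(r1.json(), r2.json())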

How to return a file in flask from sqlalchemy database

I'm trying to create an app that will let me store files in a database and retrieve them (I know that in most cases it's better not to store files in the database itself, but in this instance that's exactly what I want to do). I can get the file (a jpg image) stored in the database with:
class File(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    data = db.Column(db.LargeBinary, nullable=False)

    def __init__(self, data):
        self.data = data

@app.route("/file/add", methods=["POST"])
def add_file():
    data = request.files.get("data")
    record = File(data.read())
    db.session.add(record)
    db.session.commit()
Now how do I return the file?
@app.route("/file/get", methods=["GET"])
def get_file():
    returned_file = db.session.query(File.data).first()
    return  # What goes here?
Some things I've tried (many of which I didn't expect to work, but the error messages are helpful):
return returned_file
gets me: TypeError: The view function did not return a valid response. The return type must be a string, dict, tuple, Response instance, or WSGI callable, but it was a File.
return jsonify(returned_file)
gets me: TypeError: Object of type File is not JSON serializable
return send_file(returned_file, attachment_filename="Test.jpg")
gets me: AttributeError: 'File' object has no attribute 'read'
Aha, I got it! I needed to send it as a buffered stream.
import io
return send_file(io.BytesIO(returned_file.data))
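One caveat: because a BytesIO has no filename, Flask cannot guess a mimetype, so it is safer to pass it explicitly; on Flask 2.x the attachment_filename argument is also renamed download_name. A sketch along those lines:
import io

return send_file(
    io.BytesIO(returned_file.data),
    mimetype="image/jpeg",     # the stored file is a jpg in this example
    download_name="Test.jpg",  # attachment_filename on Flask < 2.0
)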

Accessing other form fields in a custom Django upload handler

I've written a custom Django file upload handler for my current project. It's a proof of concept that computes a hash of an uploaded file without storing the file on disk; if I can get it to work, I can get on to the real purpose of my work.
Essentially, here's what I have so far, which works fine with one major exception:
from django.core.files.uploadhandler import *
from hashlib import sha256

from myproject.upload.files import MyProjectUploadedFile


class MyProjectUploadHandler(FileUploadHandler):
    def __init__(self, *args, **kwargs):
        super(MyProjectUploadHandler, self).__init__(*args, **kwargs)

    def handle_raw_input(self, input_data, META, content_length, boundary,
                         encoding=None):
        self.activated = True

    def new_file(self, *args, **kwargs):
        super(MyProjectUploadHandler, self).new_file(*args, **kwargs)
        self.digester = sha256()
        raise StopFutureHandlers()

    def receive_data_chunk(self, raw_data, start):
        self.digester.update(raw_data)

    def file_complete(self, file_size):
        return MyProjectUploadedFile(self.digester.hexdigest())
The custom upload handler works great: the hash is accurate, nothing is written to disk, and it uses only 64 KB of memory at any one time.
The only problem is that I need to access another field from the POST request before processing the file: a salt entered by the user. My form looks like this:
<form id="myForm" method="POST" enctype="multipart/form-data" action="/upload/">
  <fieldset>
    <input name="salt" type="text" placeholder="Salt">
    <input name="uploadfile" type="file">
    <input type="submit">
  </fieldset>
</form>
The "salt" POST variable is only made available to me after the request has been processed and the file has been uploaded, which doesn't work for my use case. I can't seem to find a way to access this variable in any way, shape, or form in my upload handler.
Is there a way for me to access each multipart variable as it comes across instead of just accessing the filess which are uploaded?
My solution didn't come easy, but here it is:
import base64

# Imports needed by this snippet (names from the Django 1.x/2.x source)
from django.core.files.uploadhandler import FileUploadHandler, SkipFile, StopUpload
from django.http.multipartparser import (
    FIELD, FILE, ChunkIter, LazyStream, MultiPartParserError, Parser, exhaust,
)
from django.utils.datastructures import MultiValueDict
from django.utils.encoding import force_text
from django.utils.text import unescape_entities


class IntelligentUploadHandler(FileUploadHandler):
    """
    An upload handler which overrides the default multipart parser to allow
    simultaneous parsing of fields and files... intelligently. Subclass this
    for real and true awesomeness.
    """
    def __init__(self, *args, **kwargs):
        super(IntelligentUploadHandler, self).__init__(*args, **kwargs)

    def field_parsed(self, field_name, field_value):
        """
        A callback method triggered when a non-file field has been parsed
        successfully by the parser. Use this to listen for new fields being
        parsed.
        """
        pass

    def handle_raw_input(self, input_data, META, content_length, boundary,
                         encoding=None):
        """
        Parse the raw input from the HTTP request and split items into fields
        and files, executing callback methods as necessary.
        Shamelessly adapted and borrowed from django.http.multipartparser.MultiPartParser.
        """
        # following suit from the source class, this is imported here to avoid
        # a potential circular import
        from django.http import QueryDict

        # create return values
        self.POST = QueryDict('', mutable=True)
        self.FILES = MultiValueDict()

        # initialize the parser and stream
        stream = LazyStream(ChunkIter(input_data, self.chunk_size))
        # whether or not to signal a file-completion at the beginning of the loop.
        old_field_name = None
        counter = 0

        try:
            for item_type, meta_data, field_stream in Parser(stream, boundary):
                if old_field_name:
                    # we run this test at the beginning of the next loop since
                    # we cannot be sure a file is complete until we hit the next
                    # boundary/part of the multipart content.
                    file_obj = self.file_complete(counter)
                    if file_obj:
                        # if we return a file object, add it to the files dict
                        self.FILES.appendlist(force_text(old_field_name, encoding,
                                                         errors='replace'), file_obj)
                    # wipe it out to prevent havoc
                    old_field_name = None

                try:
                    disposition = meta_data['content-disposition'][1]
                    field_name = disposition['name'].strip()
                except (KeyError, IndexError, AttributeError):
                    continue

                transfer_encoding = meta_data.get('content-transfer-encoding')
                if transfer_encoding is not None:
                    transfer_encoding = transfer_encoding[0].strip()
                field_name = force_text(field_name, encoding, errors='replace')

                if item_type == FIELD:
                    # this is a POST field
                    if transfer_encoding == "base64":
                        raw_data = field_stream.read()
                        try:
                            data = str(raw_data).decode('base64')
                        except Exception:
                            data = raw_data
                    else:
                        data = field_stream.read()

                    self.POST.appendlist(field_name, force_text(data, encoding,
                                                                errors='replace'))
                    # trigger listener
                    self.field_parsed(field_name, self.POST.get(field_name))
                elif item_type == FILE:
                    # this is a file
                    file_name = disposition.get('filename')
                    if not file_name:
                        continue

                    # transform the file name
                    file_name = force_text(file_name, encoding, errors='replace')
                    file_name = self.IE_sanitize(unescape_entities(file_name))

                    content_type = meta_data.get('content-type', ('',))[0].strip()
                    try:
                        charset = meta_data.get('content-type', (0, {}))[1].get('charset', None)
                    except Exception:
                        charset = None

                    try:
                        file_content_length = int(meta_data.get('content-length')[0])
                    except (IndexError, TypeError, ValueError):
                        file_content_length = None

                    counter = 0
                    # now, do the important file stuff
                    try:
                        # alert on the new file
                        self.new_file(field_name, file_name, content_type,
                                      file_content_length, charset)
                        # chubber-chunk it
                        for chunk in field_stream:
                            if transfer_encoding == "base64":
                                # base 64 decode it if need be
                                over_bytes = len(chunk) % 4
                                if over_bytes:
                                    over_chunk = field_stream.read(4 - over_bytes)
                                    chunk += over_chunk

                                try:
                                    chunk = base64.b64decode(chunk)
                                except Exception as e:
                                    # since this is only a chunk, any error is an unfixable error
                                    raise MultiPartParserError("Could not decode base64 data: %r" % e)

                            chunk_length = len(chunk)
                            self.receive_data_chunk(chunk, counter)
                            counter += chunk_length
                        # ... and we're done
                    except SkipFile:
                        # just eat the rest
                        exhaust(field_stream)
                    else:
                        # handle file upload completions on next iteration
                        old_field_name = field_name
        except StopUpload as e:
            # if we get a request to stop the upload, exhaust it if no con reset
            if not e.connection_reset:
                exhaust(input_data)
        else:
            # make sure that the request data is all fed
            exhaust(input_data)

        # signal the upload has been completed
        self.upload_complete()
        return self.POST, self.FILES

    def IE_sanitize(self, filename):
        """Cleanup filename from Internet Explorer full paths."""
        return filename and filename[filename.rfind("\\") + 1:].strip()
Essentially, by subclassing this class you get a more... intelligent upload handler: fields are announced to subclasses through the field_parsed method, which is what I needed for my purposes.
I've reported this as a feature request to the Django team; hopefully this functionality becomes part of the regular toolbox in Django, rather than monkey-patching the source code as done above.
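For instance, a subclass that captures the salt before the file arrives might look like this (a hypothetical sketch; SaltAwareUploadHandler is not part of the answer above):
class SaltAwareUploadHandler(IntelligentUploadHandler):
    """Hypothetical subclass: remember the 'salt' field as soon as it is parsed."""

    def __init__(self, *args, **kwargs):
        super(SaltAwareUploadHandler, self).__init__(*args, **kwargs)
        self.salt = None

    def field_parsed(self, field_name, field_value):
        # multipart parts arrive in form order, so with the salt input placed
        # before the file input, self.salt is set before new_file() fires
        if field_name == "salt":
            self.salt = field_value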
Based on the code for FileUploadHandler (line 62 at https://github.com/django/django/blob/master/django/core/files/uploadhandler.py), it looks like the request object is passed into the handler and stored as self.request.
In that case you should be able to access the salt at any point in your upload handler by doing
salt = self.request.POST.get('salt')
unless I'm misunderstanding your question.

flask+ftplib basic application

from flask import Flask
from ftplib import FTP

app = Flask(__name__)

@app.route("/")
def hello():
    address = "someserver"
    ftp = FTP(address)
    ftp.login()
    return ftp.retrlines("LIST")

if __name__ == "__main__":
    app.run()
...this gives me the following output:
226-Options: -l 226 1 matches total
The question is: why doesn't this print the output of retrlines, and how do I make it do so?
The documentation for the ftplib.FTP class says that retrlines takes an optional callback; if no callback is provided, "the default callback prints the line to sys.stdout." In other words, retrlines does not return the retrieved data: it passes each line, as it is received, to the callable, and its return value is just the server's final response line (the 226 status you are seeing). This leaves you with a couple of options:
Pass in a callable that stores the results of being called multiple times:
def fetchlines(line=None):
    if line is not None:
        # As long as we are called with a line,
        # store the line in the array we added to this function
        fetchlines.lines.append(line)
    else:
        # When we are called without a line,
        # we are retrieving the lines.
        # Truncate the array after copying it
        # so we can re-use this function.
        lines = fetchlines.lines[:]
        fetchlines.lines = []
        return lines

fetchlines.lines = []

@app.route("/")
def hello():
    ftp = FTP("someaddress")
    ftp.login()
    ftp.dir(fetchlines)
    lines = fetchlines()
    return "<br>".join(lines)
Replace sys.stdout with a file-like object (from cStringIO for example) and then simply read the file afterwards:
import sys
from cStringIO import StringIO

# Save a reference to the real stdout
STANDARD_OUT = sys.stdout

@app.route("/")
def hello():
    ftp = FTP("someaddress")
    ftp.login()
    # Point stdout at a file-like object rather than the terminal;
    # rebinding the sys attribute (not a local name) is what redirects it
    file_like = StringIO()
    sys.stdout = file_like
    ftp.dir()
    # lines in this case will be a string, not a list
    lines = file_like.getvalue()
    sys.stdout = STANDARD_OUT
    file_like.close()
    return lines
Neither of these techniques will hold up well under a lot of load, or even under any real concurrency. There are ways to solve for that, but I'll leave that for another day.
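A simpler per-request variant of the callback approach, which sidesteps the shared state that makes the versions above fragile under concurrency, is to collect lines into a local list (a sketch using only documented ftplib calls):
from flask import Flask
from ftplib import FTP

app = Flask(__name__)

@app.route("/")
def hello():
    ftp = FTP("someaddress")
    ftp.login()
    lines = []
    # retrlines passes each received line to the callback;
    # list.append simply collects them
    ftp.retrlines("LIST", lines.append)
    ftp.quit()
    return "<br>".join(lines)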
