How can I upload files through aiohttp using response from get request?

How can I upload files through aiohttp using response from get request? - python

To start off, I am writing an async wrapper for the WordPress REST API. I have a Wordpress site hosted on Bluehost. I am working with the endpoint for media (image) uploads. I have successfully managed to upload an image but there are 2 changes I would like to make. The second change is what I really want, but out of curiosity, I would like to know how to implement change 1 too. I'll provide the code first and then some details.
Working code
async def upload_local_pic2(self, local_url, date, title):
url = f'{self.base_url}/wp-json/wp/v2/media'
with aiohttp.MultipartWriter() as mpwriter:
json = {'title': title, 'status':'publish'}
mpwriter.append_json(json)
with open(local_url, 'rb') as f:
print(f)
payload = mpwriter.append(f)
async with self.session.post(url, data=payload) as response:
x = await response.read()
print(x)
Change 1
The first change is uploading using aiofiles.open() instead of just using open() as I expect to be processing lots of files. The following code does not work.
async def upload_local_pic(self, local_url, date, title):
url = f'{self.base_url}/wp-json/wp/v2/media'
with aiohttp.MultipartWriter() as mpwriter:
json = {'title': title, 'status':'publish'}
mpwriter.append_json(json)
async with aiofiles.open(local_url, 'rb') as f:
print(f)
payload = mpwriter.append(f)
async with self.session.post(url, data=payload) as response:
x = await response.read()
print(x)
Change 2
My other change is that I would like to have another function that can upload the files directly to the WordPress server without downloading them locally. So instead of getting a local picture, I want to pass in the url of an image online. The following code also does not work.
async def upload_pic(self, image_url, date, title):
url = f'{self.base_url}/wp-json/wp/v2/media'
with aiohttp.MultipartWriter() as mpwriter:
json = {'title':title, 'status':'publish'}
mpwriter.append_json(json)
async with self.session.get(image_url) as image_response:
image_content = image_response.content
print(image_content)
payload = mpwriter.append(image_content)
async with self.session.post(url, data = payload) as response:
x = await response.read()
print(x)
Details/Debugging
I'm trying to figure out why each one won't work. I think the key is the calls to print(image_content)
and print(f) that show what exactly I am inputting to mpwriter.append
In the example that works where I just use the standard Python open() function, I am apparently passing in <_io.BufferedReader name='/redactedfilepath/index.jpeg'>
In the change 1 example with aiofile, I am passing in <aiofiles.threadpool.binary.AsyncBufferedReader object at 0x7fb803122250>
Wordpress will return this html:
b'<head><title>Not Acceptable!</title></head><body><h1>Not Acceptable!</h1><p>An appropriate representation of the requested resource could not be found on this server. This error was generated by Mod_Security.</p></body></html>'
And finally, in change 2 where I try to pass in what the get request to the url gives me I get
<StreamReader 292 bytes>. The response returned by WordPress is the same as above with Mod Security.
Any idea how I can make these examples work? It seems like they are all some type of io reader but I guess the underlying aiohttp code treats them differently.
Also this shouldn't really matter, but this is the url I am passing into the change 2 example.

Ok, so I figured out both changes.
For the first change when trying to read a file with aiofiles, I need to just read the whole file instead of passing in the file handler. Also, I need to set the content disposition manually.
async def upload_local_pic(self, local_url, date, title):
url = f'{self.base_url}/wp-json/wp/v2/media'
with aiohttp.MultipartWriter() as mpwriter:
json = {'status':'publish'}
mpwriter.append_json(json)
async with aiofiles.open(local_url, mode='rb') as f:
contents = await f.read()
payload = mpwriter.append(contents)
payload.set_content_disposition('attachment', filename= title+'.jpg')
async with self.session.post(url, data=payload) as response:
x = await response.read()
print(x)
For the second change, it's a similar concept with just uploading a file directly from the URL. Instead of passing in the handler that will read the content, I need to read the entire content first. I also need to set the content-disposition manually.
async def upload_pic(self, image_url, date, title):
url = f'{self.base_url}/wp-json/wp/v2/media'
with aiohttp.MultipartWriter() as mpwriter:
json = {'status':'publish'}
mpwriter.append_json(json)
async with self.session.get(image_url) as image_response:
image_content = await image_response.read()
payload = mpwriter.append(image_content)
payload.set_content_disposition('attachment', filename=title+'.jpg')
async with self.session.post(url, data = payload) as response:
x = await response.read()
print(x)

I will answer only to the title of the post (and not the questions that are in between).
The following code should give a short example of how to upload a file from URL#1 to URL#2 (without the need to download the file to the local machine and only then do the upload).
I will give two examples here:
Read all the content of the file into the memory (not downloading). This is of-course not so good when working with huge files...
Read and send the file in chunks (so we won't read all the file content at once).
Example #1: Reading all file content AT ONCE and uploading
import asyncio
import aiohttp
async def http_upload_from_url(src, dst):
async with aiohttp.ClientSession() as session:
src_resp = await session.get(src)
#print(src_resp)
dst_resp = await session.post(dst, data=src_resp.content)
#print(dst_resp)
try:
asyncio.run(http_upload_from_url(SRC_URL, DST_URL))
except Exception as e:
print(e)
Example #2: Reading file content IN CHUNKS and uploading
import asyncio
import aiohttp
async def url_sender(url=None, chunk_size=65536):
async with aiohttp.ClientSession() as session:
resp = await session.get(url)
#print(resp)
async for chunk in resp.content.iter_chunked(chunk_size):
#print(f"send chunk with size {len(chunk)}")
yield chunk
async def chunked_http_upload_from_url(src, dst):
async with aiohttp.ClientSession() as session:
resp = await session.post(dst, data=url_sender(src))
#print(resp)
#print(await resp.text())
try:
asyncio.run(chunked_http_upload_from_url(SRC_URL, DST_URL))
except Exception as e:
print(e)
Some notes:
You need to define SRC_URL and DST_URL.
I've only added the prints for debug (in-case you don't get a [200 OK] response).

Related

Upload multipart-form image as data in FastAPI

I am trying to write endpoint for image uploading, it will accept just image in base64 format. I have read how to get file from request in FastAPI and implement that in this way:
server side
#router.post("/v1/installation/{some_identifier}/image")
#db.create_connection
async def upload_installation_image(some_identifier: int, data: UploadFile = File(...)):
...
content = await data.read()
image_uuid = await save_image(installation.uuid, content, data.content_type)
return {"image_uuid": image_uuid}
And its works fine, but in this case I need to send data in this way:
client side
def test_upload_image(...):
...
response = session.post(
f"/v1/installation/{relink_serial}/image", files={"file": ("test_file_name.png", data, "image/png")}
)
...
But I want to be able to upload image like this:
client side
def test_upload_image(...):
...
response = session.post(
f"/v1/installation/{installation_uuid}/image", data=data, headers={"content-type": "image/png"}
)
I have read a lot of articles and other questions, but all of them suggest to use UploadFile and send data as json with file or body parameter.
Is this even possible, to load just 1 file like in second test I show?

I have find solution. Instead of using predefined FastAPI types, I can directly get all data from request object, so my code looks like this:
#router.post("/v1/installation/{some_identifier}/image")
#db.create_connection
async def upload_installation_image(some_identifier: int, request: Request):
...
content = await request.body()
image_uuid = await save_image(installation.uuid, content, content_type)
return {"image_uuid": image_uuid}
Thats exactly what I am looking for, because in my case client app have send to me whole file in body.

Python-Aiohttp/Asyncio API request returning ContentTypeError - JSON with unexpected mimetype, but not always

I am attempting to make an API request, pull down specific chunks of the response and ultimately save it into a file for later processing. I also first want to mention that the script works full, until I begin to pull larger sets of data.
When I open the params to a larger date range, I receive:
ContentTypeError(
aiohttp.client_exceptions.ContentTypeError: 0, message='Attempt to decode JSON with unexpected mimetype: text/html'
async def get_dataset(session, url):
async with session.get(url=url, headers=headers, params=params) as resp:
dataset = await resp.json()
return dataset['time_entries']
async def main():
tasks = []
async with aiohttp.ClientSession() as session:
for page in range(1, total_pages):
url = "https://api.harvestapp.com/v2/time_entries?page=" + str(page)
tasks.append(asyncio.ensure_future(get_dataset(session, url)))
dataset = await asyncio.gather(*tasks)
If I keep my params small enough, then it works without issue. But too large of a date range and the error pops up, and anything past the snippet I shared above does not run
More for reference:
url_address = "https://api.harvestapp.com/v2/time_entries/"
headers = {
"Content-Type": 'application/json',
"Authorization": authToken,
"Harvest-Account-ID": accountID
}
params = {
"from": StartDate,
"to": EndDate
}
Any ideas on what would cause this to work on certain data sizes but fail on larger sets? I am assuming the JSON is becoming malformed at some point, but I am unsure of how to examine that and/or prevent it from happening, since I am able to pull multiple pages from the API and successfully appending on the smaller data pulls.

OP: Thank you to the others who gave answers. I discovered the issue and implemented a solution. A friend pointed out that aiohttp can return that error message if the response is of an error page instead of the expected json content i.e. a html page giving a 429 HTTP too many requests. I looked up the API limits and found they do have it set to 100 requests per 15 seconds.
My solution was to implement the asyncio-throttle module which allowed me to directly limit the requests and time period. You can find this on the devs GitHub
Here is my updated code with the implementation, very simple! For my instance I needed to limit my requests to 100 per 15 seconds which you can see below as well.
async def get_dataset(session, url, throttler):
while True:
async with throttler:
async with session.get(url=url, headers=headers, params=params) as resp:
dataset = await resp.json()
return dataset['time_entries']
async def main():
tasks = []
throttler = Throttler(rate_limit=100, period=15)
async with aiohttp.ClientSession() as session:
try:
for page in range(1, total_pages):
url = "https://api.harvestapp.com/v2/time_entries?page=" + str(page)
tasks.append(asyncio.ensure_future(get_dataset(session, url, throttler)))
dataset = await asyncio.gather(*tasks)

FastAPI UploadFile is slow compared to Flask

I have created an endpoint, as shown below:
#app.post("/report/upload")
def create_upload_files(files: UploadFile = File(...)):
try:
with open(files.filename,'wb+') as wf:
wf.write(file.file.read())
wf.close()
except Exception as e:
return {"error": e.__str__()}
It is launched with uvicorn:
../venv/bin/uvicorn test_upload:app --host=0.0.0.0 --port=5000 --reload
I am performing some tests of uploading a file of around 100 MB using Python requests, and takes around 128 seconds:
f = open(sys.argv[1],"rb").read()
hex_convert = binascii.hexlify(f)
items = {"files": hex_convert.decode()}
start = time.time()
r = requests.post("http://192.168.0.90:5000/report/upload",files=items)
end = time.time() - start
print(end)
I tested the same upload script with an API endpoint using Flask and takes around 0.5 seconds:
from flask import Flask, render_template, request
app = Flask(__name__)
#app.route('/uploader', methods = ['GET', 'POST'])
def upload_file():
if request.method == 'POST':
f = request.files['file']
f.save(f.filename)
return 'file uploaded successfully'
if __name__ == '__main__':
app.run(host="192.168.0.90",port=9000)
Is there anything I am doing wrong?

You can write the file(s) using synchronous writing , after defining the endpoint with def, as shown in this answer, or using asynchronous writing (utilising aiofiles), after defining the endpoint with async def; UploadFile methods are async methods, and thus, you need to await them. Example is given below. For more details on def vs async def and how they may affect your API's performance (depending on the tasks performed inside the endpoints), please have a look at this answer.
Upload Single File
app.py
from fastapi import File, UploadFile
import aiofiles
#app.post("/upload")
async def upload(file: UploadFile = File(...)):
try:
contents = await file.read()
async with aiofiles.open(file.filename, 'wb') as f:
await f.write(contents)
except Exception:
return {"message": "There was an error uploading the file"}
finally:
await file.close()
return {"message": f"Successfuly uploaded {file.filename}"}
Read the File in chunks
Or, you can use async in the chunked manner, to avoid loading the entire file into memory. If, for example, you have 8GB of RAM, you can’t load a 50GB file (not to mention that the available RAM will always be less than the total amount that is installed, as the native OS and other applications running on your machine will use some of the RAM). Hence, in that case, you should rather load the file into memory in chunks and process the data one chunk at a time. This method, however, may take longer to complete, depending on the chunk size you choose; below, that is 1024 * 1024 bytes (= 1MB). You can adjust the chunk size as desired.
from fastapi import File, UploadFile
import aiofiles
#app.post("/upload")
async def upload(file: UploadFile = File(...)):
try:
async with aiofiles.open(file.filename, 'wb') as f:
while contents := await file.read(1024 * 1024):
await f.write(contents)
except Exception:
return {"message": "There was an error uploading the file"}
finally:
await file.close()
return {"message": f"Successfuly uploaded {file.filename}"}
Alternatively, you could use shutil.copyfileobj(), which is used to copy the contents of a file-like object to another file-like object (see this answer as well). By default the data is read in chunks with the default buffer (chunk) size being 1MB (i.e., 1024 * 1024 bytes) for Windows and 64KB for other platforms (see source code here). You can specify the buffer size by passing the optional length parameter. Note: If negative length value is passed, the entire contents of the file will be read—see f.read() documentation as well, which .copyfileobj() uses under the hood. The source code of .copyfileobj() can be found here—there isn't really anything that different from the previous approach in reading/writing the file contents. However, .copyfileobj() uses blocking I/O operations behind the scenes, and this would result in blocking the entire server (if used inside an async def endpoint). Thus, to avoid that , you could use Starlette's run_in_threadpool() to run all the needed functions in a separate thread (that is then awaited) to ensure that the main thread (where coroutines are run) does not get blocked. The same exact function is used by FastAPI internally when you call the async methods of the UploadFile object, i.e., .write(), .read(), .close(), etc.—see source code here. Example:
from fastapi import File, UploadFile
from fastapi.concurrency import run_in_threadpool
import shutil
#app.post("/upload")
async def upload(file: UploadFile = File(...)):
try:
f = await run_in_threadpool(open, file.filename, 'wb')
await run_in_threadpool(shutil.copyfileobj, file.file, f)
except Exception:
return {"message": "There was an error uploading the file"}
finally:
if 'f' in locals(): await run_in_threadpool(f.close)
await file.close()
return {"message": f"Successfuly uploaded {file.filename}"}
test.py
import requests
url = 'http://127.0.0.1:8000/upload'
file = {'file': open('images/1.png', 'rb')}
resp = requests.post(url=url, files=file)
print(resp.json())
Upload Multiple Files
app.py
from fastapi import File, UploadFile
import aiofiles
#app.post("/upload")
async def upload(files: List[UploadFile] = File(...)):
for file in files:
try:
contents = await file.read()
async with aiofiles.open(file.filename, 'wb') as f:
await f.write(contents)
except Exception:
return {"message": "There was an error uploading the file(s)"}
finally:
await file.close()
return {"message": f"Successfuly uploaded {[file.filename for file in files]}"}
Read the Files in chunks
To read the file(s) in chunks instead, see the approaches described earlier in this answer.
test.py
import requests
url = 'http://127.0.0.1:8000/upload'
files = [('files', open('images/1.png', 'rb')), ('files', open('images/2.png', 'rb'))]
resp = requests.post(url=url, files=files)
print(resp.json())
Update
Digging into the source code, it seems that the latest versions of Starlette (which FastAPI uses underneath) use a SpooledTemporaryFile (for UploadFile data structure) with max_size attribute set to 1MB (1024 * 1024 bytes) - see here - in contrast to older versions where max_size was set to the default value, i.e., 0 bytes, such as the one here.
The above means, in the past, data used to be fully loaded into memory no matter the size of file (which could lead to issues when a file couldn't fit into RAM), whereas, in the latest version, data is spooled in memory until the file size exceeds max_size (i.e., 1MB), at which point the contents are written to disk; more specifically, to the OS's temporary directory (Note: this also means that the maximum size of file you can upload is bound by the storage available to the system's temporary directory.. If enough storage (for your needs) is available on your system, there's nothing to worry about; otherwise, please have a look at this answer on how to change the default temporary directory). Thus, the process of writing the file multiple times—that is, initially loading the data into RAM, then, if the data exceeds 1MB in size, writing the file to temporary directory, then reading the file from temporary directory (using file.read()) and finally, writing the file to a permanent directory—is what makes uploading file slow compared to using Flask framework, as OP noted in their question (though, the difference in time is not that big, but just a few seconds, depending on the size of file).
Solution
The solution (if one needs to upload files quite larger than 1MB and uploading time is important to them) would be to access the request body as a stream. As per Starlette documentation, if you access .stream(), then the byte chunks are provided without storing the entire body to memory (and later to temporary directory, if the body contains file data that exceeds 1MB). Example is given below, where time of uploading is recorded on client side, and which ends up being the same as when using Flask framework with the example given in OP's question.
app.py
from fastapi import Request
import aiofiles
#app.post('/upload')
async def upload(request: Request):
try:
filename = request.headers['filename']
async with aiofiles.open(filename, 'wb') as f:
async for chunk in request.stream():
await f.write(chunk)
except Exception:
return {"message": "There was an error uploading the file"}
return {"message": f"Successfuly uploaded {filename}"}
In case your application does not require saving the file to disk, and all you need is the file to be loaded directly into memory, you can just use the below (make sure your RAM has enough space available to accommodate the accumulated data):
from fastapi import Request
#app.post('/upload')
async def upload(request: Request):
body = b''
try:
filename = request.headers['filename']
async for chunk in request.stream():
body += chunk
except Exception:
return {"message": "There was an error uploading the file"}
#print(body.decode())
return {"message": f"Successfuly uploaded {filename}"}
test.py
import requests
import time
with open("images/1.png", "rb") as f:
data = f.read()
url = 'http://127.0.0.1:8000/upload'
headers = {'filename': '1.png'}
start = time.time()
resp = requests.post(url=url, data=data, headers=headers)
end = time.time() - start
print(f'Elapsed time is {end} seconds.', '\n')
print(resp.json())
For more details and code examples (on uploading multiple Files and Form/JSON data) using the approach above, please have a look at this answer.

Does aiofile write and read in background avoid blocking the executing thread?

I have recently worked with python and I am unsure about asyncio. The program requests a url, then parses a tag for each page and finally writes this to a local file. The program uses the aiofile library to write the tags into the file. I read that aiofile allows one to create an asynchronous file and use its methods like coroutines. Does this mean that while I write my tags into the local file in the background, I can continue to execute other tasks (like request other urls and parse those have been fetched) without having to wait while all tags are written in the local file?
Here is part of the code:
async def fetch():
async def parse():
async def write_one(file: IO, url: str, **kwargs) -> None:
"""Write the found HREFs from `url` to `file`."""
res = await parse(url=url, **kwargs)
if not res:
return None
async with aiofiles.open(file, "a") as f:
for p in res:
await f.write(f"{url}\t{p}\n")
logger.info("Wrote results for source URL: %s", url)

Unable to get the right file name using multipart headers

I'm working on a kind of personal cloud project which allows to upload and download files (with a google like search feature). The backend is written in python (using aiohttp) while I'm using a react.js website as a client. When a file is uploaded, the backend stores it on the filesystem and renames it with its sha256 hash. The original name and a few other metadata are stored next to it (description, etc). When the user downloads the file, I'm serving it using multipart and I want the user to get it with the original name and not the hash, indeed my-cool-image.png is more user friendly than a14e0414-b84c-4d7b-b0d4-49619b9edd8a. But I'm not able to do it (whatever I try, the download file is called with the hash).
Here is my code:
async def download(self, request):
if not request.username:
raise exceptions.Unauthorized("A valid token is needed")
data = await request.post()
hash = data["hash"]
file_path = storage.get_file(hash)
dotfile_path = storage.get_file("." + hash)
if not os.path.exists(file_path) or not os.path.exists(dotfile_path):
raise exceptions.NotFound("file <{}> does not exist".format(hash))
with open(dotfile_path) as dotfile:
dotfile_content = json.load(dotfile)
name = dotfile_content["name"]
headers = {
"Content-Type": "application/octet-stream; charset=binary",
"Content-Disposition": "attachment; filename*=UTF-8''{}".format(
urllib.parse.quote(name, safe="")
),
}
return web.Response(body=self._file_sender(file_path), headers=headers)
Here is what it looks like (according to the browser):
Seems right, yet it's not working.
One thing I want to make clear though: sometimes I get a warning (on the client side) saying Resource interpreted as Document but transferred with MIME type application/octet-stream. I don't know the MIME type of the files since they are provided by the users, but I tried to use image/png (I did my test with a png image stored on the server). The file didn't get downloaded (it was displayed in the browser, which is not something that I want) and the file name was still its hash so it didn't help with my issue.
Here is the entire source codes of the backend:
https://git.io/nexmind-node
And of the frontend:
https://git.io/nexmind-client
EDIT:
I received a first answer by Julien Castiaux so I tried to implement it, even though it looks better, it doesn't solve my issue (I still have the exact same behaviour):
async def download(self, request):
if not request.username:
raise exceptions.Unauthorized("A valid token is needed")
data = await request.post()
hash = data["hash"]
file_path = storage.get_file(hash)
dotfile_path = storage.get_file("." + hash)
if not os.path.exists(file_path) or not os.path.exists(dotfile_path):
raise exceptions.NotFound("file <{}> does not exist".format(hash))
with open(dotfile_path) as dotfile:
dotfile_content = json.load(dotfile)
name = dotfile_content["name"]
response = web.StreamResponse()
response.headers['Content-Type'] = 'application/octet-stream'
response.headers['Content-Disposition'] = "attachment; filename*=UTF-8''{}".format(
urllib.parse.quote(name, safe="") # replace with the filename
)
response.enable_chunked_encoding()
await response.prepare(request)
with open(file_path, 'rb') as fd: # replace with the path
for chunk in iter(lambda: fd.read(1024), b""):
await response.write(chunk)
await response.write_eof()
return response

From aiohttp3 documentation
StreamResponse is intended for streaming data, while Response contains HTTP BODY as an attribute and sends own content as single piece with the correct Content-Length HTTP header.
You rather use aiohttp.web.StreamResponse as you are sending (potentially very large) files. Using StreamResponse, you have full control of the outgoing http response stream : headers manipulation (including the filename) and chunked encoding.
from aiohttp import web
import urllib.parse
async def download(req):
resp = web.StreamResponse()
resp.headers['Content-Type'] = 'application/octet-stream'
resp.headers['Content-Disposition'] = "attachment; filename*=UTF-8''{}".format(
urllib.parse.quote(filename, safe="") # replace with the filename
)
resp.enable_chunked_encoding()
await resp.prepare(req)
with open(path_to_the_file, 'rb') as fd: # replace with the path
for chunk in iter(lambda: fd.read(1024), b""):
await resp.write(chunk)
await resp.write_eof()
return resp
app = web.Application()
app.add_routes([web.get('/', download)])
web.run_app(app)
Hope it helps !

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How can I upload files through aiohttp using response from get request? - python

Related

Upload multipart-form image as data in FastAPI

Python-Aiohttp/Asyncio API request returning ContentTypeError - JSON with unexpected mimetype, but not always

FastAPI UploadFile is slow compared to Flask

Does aiofile write and read in background avoid blocking the executing thread?

Unable to get the right file name using multipart headers

Categories

Resources