I would like to know the best practice for accessing the underlying file object when uploading a file with FastAPI.
When uploading a file with FastAPI, the object we receive is a starlette.datastructures.UploadFile.
We can access its file attribute, which is a tempfile.SpooledTemporaryFile.
We can then access that object's private _file attribute and pass it to libraries that expect a file-like object.
Below is an example of two routes using python-docx and pdftotext:
@router.post("/open-docx")
async def open_docx(upload_file: UploadFile = File(...)):
    mydoc = docx.Document(upload_file.file._file)
    # do something

@router.post("/open-pdf")
async def open_pdf(upload_file: UploadFile = File(...)):
    mypdf = pdftotext.PDF(upload_file.file._file)
    # do something
However, I don't like the idea of accessing the private _file attribute. Is there a better way to pass the file object without saving it first?
Note: to reproduce the example above, put the following code in launch.py and run uvicorn launch:app --reload:
import docx
import pdftotext
from fastapi import FastAPI, File, UploadFile
app = FastAPI()
@app.post("/open-docx")
async def open_docx(upload_file: UploadFile = File(...)):
    mydoc = docx.Document(upload_file.file._file)
    # do something
    return {"firstparagraph": mydoc.paragraphs[0].text}

@app.post("/open-pdf")
async def open_pdf(upload_file: UploadFile = File(...)):
    mypdf = pdftotext.PDF(upload_file.file._file)
    # do something
    return {"firstpage": mypdf[0]}
See documentation for UploadFile here:
https://fastapi.tiangolo.com/tutorial/request-files/#uploadfile
It says it has an attribute named
file: A SpooledTemporaryFile (a file-like object).
This is the actual Python file that you can pass directly to
other functions or libraries that expect a "file-like" object.
So you don't need to access the private ._file attribute. Just pass upload_file.file directly.
Note that this is untested. Per the docs you can use the SpooledTemporaryFile object without accessing its underlying wrapper:
The returned object is a file-like object whose _file attribute is either an io.BytesIO or io.TextIOWrapper object (depending on whether binary or text mode was specified) or a true file object, depending on whether rollover() has been called. This file-like object can be used in a with statement, just like a normal file.
https://docs.python.org/3/library/tempfile.html#tempfile.SpooledTemporaryFile (Emphasis added).
Thus it would seem you can just do:
@router.post("/open-docx")
async def open_docx(upload_file: UploadFile = File(...)):
    with upload_file.file as f:
        mydoc = docx.Document(f)
Avoid the with if you don't want to close the file.
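For instance, a minimal sketch of the same route without the with block, equally untested:

@router.post("/open-docx")
async def open_docx(upload_file: UploadFile = File(...)):
    # Pass the public .file attribute directly; no with block,
    # so the SpooledTemporaryFile is not closed here.
    mydoc = docx.Document(upload_file.file)
    return {"firstparagraph": mydoc.paragraphs[0].text}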
Edit: Seekable files
SpooledTemporaryFile objects are, sadly, not properly seekable. See this question. It boils down to the fact that they roll over to disk after a certain size, at which point seeking becomes harder. Accessing the _file attribute is not safe because of this rollover. Thus you either need to save them to disk, or read them explicitly into a virtual (RAM) file, as sketched below.
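For example, a minimal, untested sketch that copies the upload into an io.BytesIO object, which is fully seekable:

import io

@router.post("/open-pdf")
async def open_pdf(upload_file: UploadFile = File(...)):
    # Read the whole upload into a seekable in-memory file.
    # Fine for small files; large files will eat RAM.
    data = await upload_file.read()
    mypdf = pdftotext.PDF(io.BytesIO(data))
    return {"firstpage": mypdf[0]}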
If you are on Linux, a workaround might be to save the files to /dev/shm or /tmp (the former is in RAM, the latter often a ramdisk) and let the OS handle swapping massive files to disk.
Related
In Python 2 I have this in my test method:
mock_file = MagicMock(spec=file)
I'm moving to Python 3, and I can't figure out how to do a similar mock. I've tried:
from io import IOBase
mock_file = MagicMock(spec=IOBase)
mock_file = create_autospec(IOBase)
What am I missing?
IOBase does not implement crucial file methods such as read and write and is therefore usually unsuitable as a spec to create a mocked file object with. Depending on whether you want to mock a raw stream, a binary file or a text file, you can use RawIOBase, BufferedIOBase or TextIOBase as a spec instead:
from io import BufferedIOBase
mock_file = MagicMock(spec=BufferedIOBase)
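As a short, hypothetical usage sketch (the canned data and assertions are made up for illustration):

from io import BufferedIOBase
from unittest.mock import MagicMock

mock_file = MagicMock(spec=BufferedIOBase)
mock_file.read.return_value = b"some bytes"

# Code under test sees canned data from read(); attributes that
# BufferedIOBase does not define raise AttributeError, as with a real file.
assert mock_file.read() == b"some bytes"
mock_file.read.assert_called_once()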
I'd like to pass an instance of my class to write() and have it written to a file. The underlying data is simply a bytearray.
mine = MyClass()
with open('Test.txt', 'wb') as f:
    f.write(mine)
I tried implementing __bytes__ to no avail. I'm aware of the buffer protocol but I believe it can only be implemented via the C API (though I did see talk of delegation to an underlying object that implemented the protocol).
No, you can't, there are no special methods you can implement that'll make your Python class support the buffer protocol.
Yes, the io.RawIOBase.write() and io.BufferedIOBase.write() methods document that they accept a bytes-like object, but the buffer protocol needed to make something bytes-like is a C-level protocol only. There is an open Python issue to add Python hooks but no progress has been made on this.
The __bytes__ special method is only called if you passed an object to the bytes() callable; .write() does not do this.
So, when writing to a file, only actual bytes-like objects are accepted, everything else must be converted to such an object first. I'd stick with:
with open('Test.txt', 'wb') as f:
    f.write(bytes(mine))
which will call the MyClass.__bytes__() method, provided it is defined, or provide a method on your class that causes it to write itself to a file object:
with open('Test.txt', 'wb') as f:
    mine.dump(f)
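As a rough sketch of both options (MyClass and its bytearray payload are stand-ins from the question; dump is a hypothetical method name):

class MyClass:
    def __init__(self):
        # The underlying data is simply a bytearray, as in the question.
        self.data = bytearray(b'\x00\x01\x02')

    def __bytes__(self):
        # Invoked by bytes(mine); returns an immutable copy of the data.
        return bytes(self.data)

    def dump(self, f):
        # bytearray is itself bytes-like, so write() accepts it directly.
        f.write(self.data)

mine = MyClass()
with open('Test.txt', 'wb') as f:
    f.write(bytes(mine))  # via __bytes__
    mine.dump(f)          # via the helper method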
I am trying to unit test a method that reads the lines from a file and processes them.
with open(file_name, 'r') as file_list:
    for line in file_list:
        # Do stuff
I tried several ways described in other questions, but none of them seems to work for this case. I don't quite understand how Python uses the file object as an iterable over the lines; does it internally use file_list.readlines()?
This way didn't work:
with mock.patch('[module_name].open') as mocked_open:  # also tried with __builtin__ instead of module_name
    mocked_open.return_value = 'line1\nline2'
I got an
AttributeError: __exit__
Maybe because the with statement uses this special method to close the file?
This code makes file_list a MagicMock. How do I store data on this MagicMock to iterate over it?
with mock.patch("__builtin__.open", mock.mock_open(read_data="data")) as mock_file:
The return value of mock_open (until Python 3.7.1) doesn't provide a working __iter__ method, which may make it unsuitable for testing code that iterates over an open file object.
Instead, I recommend refactoring your code to take an already opened file-like object. That is, instead of
def some_method(file_name):
    with open(file_name, 'r') as file_list:
        for line in file_list:
            # Do stuff
...
some_method(file_name)
write it as
def some_method(file_obj):
    for line in file_obj:
        # Do stuff
...
with open(file_name, 'r') as file_obj:
    some_method(file_obj)
This turns a function that has to perform IO into a pure(r) function that simply iterates over any file-like object. To test it, you don't need to mock open or hit the file system in any way; just create a StringIO object to use as the argument:
def test_it(self):
    f = StringIO.StringIO("line1\nline2\n")
    some_method(f)
(If you still feel the need to write and test a wrapper like
def some_wrapper(file_name):
    with open(file_name, 'r') as file_obj:
        some_method(file_obj)
note that you don't need the mocked open to do anything in particular. You test some_method separately, so the only thing you need to do to test some_wrapper is verify that the return value of open is passed to some_method. open, in this case, can be a plain old mock with no special behavior.)
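A rough sketch of such a wrapper test, using the Python 2 spelling of the builtins module to match the question (mymodule is a hypothetical module holding some_wrapper and some_method):

import mock

@mock.patch('mymodule.some_method')
@mock.patch('__builtin__.open')
def test_wrapper(self, mocked_open, mocked_some_method):
    some_wrapper('some_file.txt')
    # The wrapper should pass the opened file object straight through.
    mocked_open.assert_called_once_with('some_file.txt', 'r')
    mocked_some_method.assert_called_once_with(
        mocked_open.return_value.__enter__.return_value)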
I'm working with python-gnupg to decrypt a file and the decrypted file content is very large so loading the entire contents into memory is not feasible.
I would like to short-circuit the write method in order to manipulate the decrypted contents as they are written.
Here are some failed attempts:
import gnupg
from StringIO import StringIO
# works but not feasible due to memory limitations
decrypted_data = gpg_client.decrypt_file(decrypted_data)
# works but no access to the buffer write method
gpg_client.decrypt_file(decrypted_data, output=buffer())
# fails with TypeError: coercing to Unicode: need string or buffer, instance found
class TestBuffer:
    def __init__(self):
        self.buffer = StringIO()

    def write(self, data):
        print('writing')
        self.buffer.write(data)

gpg_client.decrypt_file(decrypted_data, output=TestBuffer())
Can anyone think of any other ideas that would allow me to create a file-like str or buffer object to output the data to?
You can implement a subclass of one of the classes in the io module described in I/O Base Classes, presumably io.BufferedIOBase. The standard library contains an example of something quite similar in the form of the zipfile.ZipExtFile class. At least this way, you won't have to implement complex functions like readline yourself.
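For instance, a minimal, untested sketch of a write-only stream that manipulates decrypted chunks as they arrive (process_chunk is a hypothetical per-chunk handler, and whether decrypt_file accepts a file-like output depends on your python-gnupg version):

import io

class ProcessingWriter(io.RawIOBase):
    # Write-only stream: each chunk is handled as it is written.
    def writable(self):
        return True

    def write(self, data):
        process_chunk(data)  # hypothetical hook for your manipulation
        return len(data)

gpg_client.decrypt_file(decrypted_data, output=ProcessingWriter())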
I have written a small web application, and with each request I should open and read a JSON file. I am using pickledb for this purpose.
What concerns me about it is that the library passes open() as a parameter to the json.load() function. So it got me thinking...
When I write code like this:
with open("filename.json", "rb") as json_data:
my_data = json.load(json_data)
or
json_data = open("filename.json", "rb")
my_data = json.load(json_data)
json_data.close()
I am pretty sure that the file handle is being closed.
But when I open it this way:
my_data = json.load(open("filename.json", "rb"))
The docs say that json.load() is expecting a .read()-supporting file-like object containing a JSON document.
So the question is, will this handle stay open and eat more memory with each request? Who is responsible for closing the handle in that case?
The file's close method will be called when the object is destroyed, since json.load expects only a read method on the input object. What happens then depends on the garbage collection implementation. You can read more in Is explicitly closing files important?
Generally speaking, it's good practice to take care of closing the file.
I tried to fake a file-like object with read() and close() methods and stick it into json.load(). I then observed that close() is not called upon leaving the context. Hence, I would recommend closing the file object explicitly. In any case, the docs say the loading method expects a read() method, but do not say it expects a close() method on the object.
In test.json:
{ "test":0 }
In test.py:
import json

class myf:
    def __init__(self):
        self.f = None

    @staticmethod
    def open(path, mode):
        obj = myf()
        obj.f = open(path, mode)
        return obj

    def read(self):
        print("READING")
        return self.f.read()

    def close(self):
        print("CLOSING")
        return self.f.close()

def mytest():
    s = json.load(myf.open("test.json", "r"))
    print(s)

mytest()
print("DONE")
print("DONE")
Output:
$> python test.py
READING
{u'test': 0}
DONE
$>