I am trying to read a file from the form data of a Python (Flask) request. All I want to do is read the incoming file from the request body, parse its contents and return them as a JSON body. I see many examples out there like: if 'filename' in request.files:, however this never works for me. I know that the file does in fact live within the ImmutableMultiDict type. Here is my code example:
if 'my_file.xls' in request.files:
    # do something
else:
    # return error
if 'file' in request.files:
This is looking for the field name 'file' which corresponds to the name attribute you set in the form:
<input type='file' name='file'>
You then need to do something like this to assign the FileStorage object to the variable mem:
mem = request.files['file']
See my recent answer for more details of how and why.
You can then access the filename itself with:
mem.filename # should give back 'my_file.xls'
To actually read the stream data:
mem.read()
The official flask docs have further info on this, and how to save to disk with secure_filename() etc. Probably worth a read.
All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body.
If you actually want to read the contents of that Excel file, then you'll need to use a library that can parse the format, such as xlrd. This answer demonstrates how to open a workbook, passing it as a stream. Note that they have used fileobj as the variable name, instead of mem.
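To make the field-name point concrete, here is a minimal sketch that does not import Flask. The mapping `files` stands in for request.files, and the helper name `upload_to_json` and its return shape are hypothetical, chosen only for illustration. It shows that the key is the form field name, while the client's file name lives on the object's filename attribute.

```python
import json
from io import BytesIO
from types import SimpleNamespace


def upload_to_json(files, field_name="file"):
    """Return (json_body, status) for the upload stored under field_name.

    `files` is any mapping of form field name -> object exposing
    .filename and .read(), which is how request.files is used above.
    """
    if field_name not in files:
        return json.dumps({"error": "no upload in field %r" % field_name}), 400
    upload = files[field_name]
    data = upload.read()  # reads the whole stream into memory
    return json.dumps({"filename": upload.filename, "size": len(data)}), 200


# Stand-in for request.files: the key is the <input name="file"> attribute.
fake_files = {
    "file": SimpleNamespace(filename="my_file.xls", read=BytesIO(b"abc").read)
}

ok_body, ok_status = upload_to_json(fake_files)
# Looking the upload up by its file name fails, as in the question.
missing_body, missing_status = upload_to_json(fake_files, "my_file.xls")
```

In a real view you would pass request.files and return the body through your framework's JSON response helper.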
I have designed a webpage that allows the user to upload a zip file. What I want to do is store this zip file directly into my sqlite database as a large binary object, then be able to read this binary object as a zipfile using the zipfile package. Unfortunately this doesn't work because attempting to pass the file as a binary string in io.BytesIO into zipfile.ZipFile gives the error detailed in the title.
For my MWE, I exclude the database to better demonstrate my issue.
import io
import zipfile

from flask import Blueprint, request

views = Blueprint('views', __name__)

@views.route("/upload", methods=["GET", "POST"])
def upload():
    # Assume that file in request is a zip file (checked already)
    f = request.files['file']
    zip_content = f.read()
    # Store in database
    # ...
    # at some point retrieve the file from database
    archive = zipfile.ZipFile(io.BytesIO(zip_content))
    return ""
I have searched for days on end for a fix to this issue without success. I have even printed out zip_content and the contents of io.BytesIO(zip_content) after applying .read(), and they are exactly the same string.
What am I doing wrong?
Solved. Using f.read() only gets the name of the zip file. I needed to use f.getvalue() instead to get the full file contents.
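For anyone debugging something similar: one thing worth ruling out (not confirmed as the cause in this thread) is that the upload stream was already consumed before the read. A file-like object returns the remaining bytes from its current position, so a second read() yields an empty result unless you seek(0) first. The sketch below uses only the standard library, with io.BytesIO standing in for the uploaded file; `make_zip_bytes` is a hypothetical helper.

```python
import io
import zipfile


def make_zip_bytes():
    """Build a tiny zip archive in memory and return its raw bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buf.getvalue()


# Stand-in for the uploaded file object (an assumption for this sketch).
upload = io.BytesIO(make_zip_bytes())

first = upload.read()    # full contents, position is now at the end
second = upload.read()   # nothing left to read from this position
upload.seek(0)           # rewind to the start
third = upload.read()    # full contents again

# The bytes from the first read open fine as an archive.
archive = zipfile.ZipFile(io.BytesIO(first))
names = archive.namelist()
```

If earlier code (a validity check, for example) has already read the stream, rewinding before the second read avoids storing an empty blob.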
I'm trying to build an AWS lambda function that accepts a file upload and then parses it in memory. The file is an xlsx file, and the content comes in to the lambda function looking like this in the body key of the event:
Beginning:
----------------------------300017151060007960655534
Content-Disposition: form-data; name="tag_list"; filename="test-list.xlsx"
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
PK
�y�N docProps/PK
And the end of the string looks like this:
[Content_Types].xmlPK�;
----------------------------475068475306850797919587--
If I do a head/tail of the actual file on my computer, it appears that the file starts at the PK and ends at the xmlPK�;. I've attempted to slice this section out and create a BytesIO object or a SpooledTemporaryFile, but none of these options work. They all give me something like invalid seek position, or bad zip file errors.
My goal is to load this xlsx file into memory and then parse it using openpyxl.
My current function looks something like this. I keep trying to format it differently; sometimes I decode it, sometimes not.
def lambda_handler(event, context):
    file_index = event['body'].index('PK')
    file_string = event['body'][file_index:]
    file_end = file_string.index(';')
    file = file_string[:file_end].encode('utf-8')
I then try to pass the file string into BytesIO or a SpooledTemporaryFile, but they all give me errors...
Note, I do NOT want to use S3.
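A possible alternative to slicing on 'PK' and ';' is to parse the multipart body with the standard library's email package, which splits on the boundary for you. The sketch below assumes an API Gateway style event with 'body', 'headers' and 'isBase64Encoded' keys; the helper names are hypothetical. If the body arrives as text rather than base64, binary content may already be damaged before your code runs, so treat that branch as a guess.

```python
import base64
import email
import io
import zipfile


def body_as_bytes(event):
    """Return the request body as bytes (the event shape is an assumption)."""
    body = event["body"]
    if event.get("isBase64Encoded"):
        return base64.b64decode(body)
    # Assumption: a non-base64 body is plain text; binary uploads that
    # arrive this way may already be damaged.
    return body.encode("utf-8")


def content_type_of(event):
    """Look up the Content-Type header regardless of key casing."""
    for key, value in event.get("headers", {}).items():
        if key.lower() == "content-type":
            return value
    raise KeyError("content-type")


def extract_part(body, content_type, field_name):
    """Return the raw bytes of the form-data part called field_name."""
    # The email parser needs the Content-Type header (it carries the
    # boundary) in front of the multipart body.
    header = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n"
    message = email.message_from_bytes(header + body)
    for part in message.walk():
        if part.get_param("name", header="content-disposition") == field_name:
            return part.get_payload(decode=True)
    raise KeyError(field_name)


# Build a sample event around a small in-memory zip (xlsx files are zips).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/app.xml", "<x/>")
sample_payload = buf.getvalue()

boundary = "sampleboundary123"
raw_body = (
    ("--%s\r\n" % boundary).encode("ascii")
    + b'Content-Disposition: form-data; name="tag_list"; filename="test-list.xlsx"\r\n'
    + b"Content-Type: application/octet-stream\r\n\r\n"
    + sample_payload
    + ("\r\n--%s--\r\n" % boundary).encode("ascii")
)
sample_event = {
    "headers": {"content-type": "multipart/form-data; boundary=%s" % boundary},
    "body": base64.b64encode(raw_body).decode("ascii"),
    "isBase64Encoded": True,
}

extracted = extract_part(
    body_as_bytes(sample_event), content_type_of(sample_event), "tag_list"
)
```

The extracted bytes can then be wrapped in io.BytesIO and handed to a parser that accepts file-like objects, such as openpyxl.load_workbook.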
I'm trying to detect whether an optional file has been uploaded or not when a cgi form has been submitted.
I've read here that I should be doing something like:
myfile = form["myfile"]
if myfile.file:
    # It's an uploaded file; save it to disk
    file = open('path_to_file','wb')
    file.write(myfile.file.read())
    file.close()
But this is not working for me. The file is always being written, whether it's been uploaded or not.
While with any other fields I can always use a default value to check it:
field = cgi.escape(data.getfirst('field','null'))
I can't find an equivalent approach for files in the documentation. Any help there?
Thanks.
I tested with Firefox, and uploading a form with an empty file input results in the following post contents:
-----------------------------135438447855682763402193870
Content-Disposition: form-data; name="foo"; filename=""
Content-Type: application/octet-stream
-----------------------------135438447855682763402193870--
Thus a zero-length field is uploaded. However, no filename is set, so test the value of the filename attribute instead.
If a field represents an uploaded file, accessing the value via the value attribute or the
getvalue() method reads the entire file in memory as a string. This may not be what you want.
You can test for an uploaded file by testing either the filename attribute or the file attribute.
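As a rough sketch of that filename test: the helper below accepts any object with filename and file attributes, which is the shape of the form field described above. The name `save_if_uploaded` is made up for this example, and SimpleNamespace objects stand in for real form fields so the snippet runs without a web server.

```python
import io
import os
import shutil
import tempfile
from types import SimpleNamespace


def save_if_uploaded(field, dest_path):
    """Write field.file to dest_path only if the browser sent a filename.

    Returns True when a file was written, False for an empty file input.
    """
    if not getattr(field, "filename", ""):
        return False
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(field.file, out)  # copies in chunks
    return True


# Stand-ins for form["myfile"]: an empty input and a real upload.
empty_field = SimpleNamespace(filename="", file=io.BytesIO(b""))
real_field = SimpleNamespace(filename="report.txt", file=io.BytesIO(b"data"))

workdir = tempfile.mkdtemp()
empty_path = os.path.join(workdir, "empty.bin")
real_path = os.path.join(workdir, "real.bin")

wrote_empty = save_if_uploaded(empty_field, empty_path)
wrote_real = save_if_uploaded(real_field, real_path)
```

With this check, an empty file input no longer produces a zero-length file on disk.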
I am using Bottle for uploading rather large files. The idea is that when the file is uploaded, the web app runs (and forgets) a system command with the uploaded file path as an argument. Apart from starting the system command with the correct file path as an argument, I do not need to save the file, but I need to be certain that the file will be available until the process completes the processing.
I use the exact code described here:
http://bottlepy.org/docs/dev/tutorial.html#post-form-data-and-file-uploads
My questions are:
Does Bottle store uploaded files in memory or in a specific place on disk (or perhaps, like Flask, a bit of both)?
Will the uploaded file be directly available to other tools without .read() and then manually saving the bytes to a specified file on disk?
What would be the best way to start the system command with the file as an argument? Is it possible to just pass the path to an existing file directly?
Ok, let's break this down.
The full code is:
HTML:
<form action="/upload" method="post" enctype="multipart/form-data">
<input type="text" name="name" />
<input type="file" name="data" />
</form>
PYTHON CODE:
from bottle import route, request

@route('/upload', method='POST')
def do_upload():
    name = request.forms.name
    data = request.files.data
    if name and data and data.file:
        raw = data.file.read() # This is dangerous for big files
        filename = data.filename
        return "Hello %s! You uploaded %s (%d bytes)." % (name, filename, len(raw))
    return "You missed a field."
(From the docs you linked.)
First, we pull the values of the name and data fields from the HTML form and assign them to the variables name and data. That's pretty straightforward. Next, we assign data.file.read() to the variable raw, which reads the whole uploaded file into raw. The entire file is then in memory, which is why the docs put "This is dangerous for big files" as a comment next to that line.
If you wanted to save the file out to disk instead, you could do so (but be careful) using something like:
with open(filename,'w') as open_file:
    open_file.write(data.file.read())
As for your other questions:
1."What would be the best way to start the system command with the file as an argument? Is it possible to just pass the path to an existing file directly?"
You should see the subprocess module, specifically Popen: http://docs.python.org/2/library/subprocess.html#popen-constructor
2."Will the uploaded file be directly available to other tools without .read() and then manually saving the bytes to a specified file on disk?"
Yes, you can pass the file data around without saving it to disk, but watch memory consumption. If these "tools" are not written in Python, you will probably need pipes or subprocesses to pass the data to them.
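If the external tool needs a real path, one option is to copy the upload stream to a named temporary file in chunks and pass that path to subprocess. This is only a sketch under assumptions: `spool_to_disk` and `run_on_file` are hypothetical helpers, io.BytesIO stands in for data.file, and the "system command" here is just the Python interpreter printing the file size.

```python
import io
import os
import shutil
import subprocess
import sys
import tempfile


def spool_to_disk(stream, suffix=""):
    """Copy a file-like upload to a named temp file in chunks; return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(stream, tmp)  # never holds the whole file in memory
        return tmp.name


def run_on_file(command, path):
    """Run command with path appended as its last argument and wait for it."""
    return subprocess.run(
        command + [path], capture_output=True, text=True, check=True
    )


# Stand-in for data.file from the upload handler.
fake_upload = io.BytesIO(b"x" * 1000)
path = spool_to_disk(fake_upload, suffix=".bin")

# Placeholder command: prints the size of the file it was given.
size_command = [
    sys.executable,
    "-c",
    "import os, sys; print(os.path.getsize(sys.argv[1]))",
]
try:
    result = run_on_file(size_command, path)
finally:
    os.remove(path)  # safe here because run_on_file has already finished
```

For a true fire-and-forget start you could use subprocess.Popen instead of waiting, but then the temp file has to be cleaned up by the child process or by a later job, since deleting it right away would pull it out from under the command.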
with open(filename,'w') as open_file:
    open_file.write(data.file.read())
does not work. You can use this instead:
data = request.files.data
data.save(Path,overwrite=True)
The file will be handled by the routine you use. That means your read handles the connection (the file should not be there, according to wsgi spec)
with open(filename, "wb") as file:
    Data = data.file.read()
    if isinstance(Data, bytes):
        file.write(Data)
    elif isinstance(Data, str):
        file.write(Data.encode("utf-8"))
Easy :D
I need to validate the contents of an uploaded XML file in my Form clean method, but I'm unable to open the file for validation. It seems that, in the clean method, the file hasn't yet been moved from memory (or the temporary directory) to the destination directory.
For example the following code doesn't work because the file hasn't been moved to that destination yet. It's still in memory (or the temporary directory):
xml_file = cleaned_data.get('xml_file')
xml_file_absolute = '%(1)s%(2)s' % {'1': settings.MEDIA_ROOT, '2': xml_file}
xml_size = str(os.path.getsize(xml_file_absolute))
When I look at the "cleaned_data" variable it shows this:
{'xml_file': <InMemoryUploadedFile: texting.nzb (application/octet-stream)>}
cleaned_data.get('xml_file') only returns "texting.nzb" as a string.
Is there another way to access the file in memory (or the temporary directory)?
Again, this is in my Form's clean method that's tied into the default administration view. I've been told time and time again that all validation should be handled in a Form, not the view. Correct?
I'm assuming that you've bound your form to the files using:
my_form = MyFormClass(request.POST, request.FILES)
If you have, once the form has been validated, you can access the file content itself using the request.FILES dictionary:
if my_form.is_valid():
    data = request.FILES['myfile'].read()
The request.FILES['myfile'] object is an UploadedFile object, so it supports file-like read/write operations.
If you need to access the file contents from within the form's clean method (or any method of the cleaning machinery), you are doing it right. cleaned_data.get('xml_file') returns an UploadedFile object. The __str__ method of that object just returns the file name, which is why you see only the name when you print it. However, you can get access to the entire contents:
xml_file = my_form.cleaned_data.get('xml_file')
print(xml_file.read())
This section of the docs has some great examples: http://docs.djangoproject.com/en/dev/topics/http/file-uploads/
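For the XML check itself, here is a framework-free sketch of what the body of a clean method might do once it has the uploaded file object. It only assumes the object supports read() and seek(); `check_xml_upload` is a hypothetical helper, and in a real form you would raise your framework's validation error instead of ValueError. io.BytesIO stands in for the in-memory upload.

```python
import io
import xml.etree.ElementTree as ET


def check_xml_upload(uploaded):
    """Parse an uploaded file-like object as XML.

    Returns (root_tag, size_in_bytes); raises ValueError if the
    content is not well-formed XML.
    """
    data = uploaded.read()
    uploaded.seek(0)  # rewind so later code can read the file again
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        raise ValueError("not well-formed XML: %s" % exc) from exc
    return root.tag, len(data)


# Stand-ins for cleaned_data.get('xml_file').
good_upload = io.BytesIO(b"<nzb><file/></nzb>")
bad_upload = io.BytesIO(b"<nzb><file></nzb>")

good_result = check_xml_upload(good_upload)

try:
    check_xml_upload(bad_upload)
    bad_error = None
except ValueError as exc:
    bad_error = str(exc)
```

Because the size comes from the bytes that were read, there is no need to build a path under MEDIA_ROOT or call os.path.getsize before the file has been saved.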