I'm trying to detect whether an optional file has been uploaded or not when a cgi form has been submitted.
I've read here that I should be doing something like:
myfile = form["myfile"]
if myfile.file:
    # It's an uploaded file; save it to disk
    file = open('path_to_file','wb')
    file.write(myfile.file.read())
    file.close()
But this is not working for me. The file is always being written, whether it's been uploaded or not.
With any other field I can always use a default value to check for it:
field = cgi.escape(data.getfirst('field','null'))
I can't find the equivalent approach for files in the documentation. Any help there?
Thanks.
I tested with Firefox, and uploading a form with an empty file input results in the following post contents:
-----------------------------135438447855682763402193870
Content-Disposition: form-data; name="foo"; filename=""
Content-Type: application/octet-stream
-----------------------------135438447855682763402193870--
Thus a zero-length field is uploaded. However, no filename is set, so test the value of the filename attribute instead.
If a field represents an uploaded file, accessing the value via the value attribute or the
getvalue() method reads the entire file in memory as a string. This may not be what you want.
You can test for an uploaded file by testing either the filename attribute or the file attribute.
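A minimal sketch of the filename test described above. The helper name has_uploaded_file and the two stand-in objects are illustrative assumptions; in a real script the argument would be the form["myfile"] item.

```python
from types import SimpleNamespace


def has_uploaded_file(field):
    """Return True only if the browser actually sent a file for this field."""
    # An empty file input still produces a field, but its filename is "".
    return bool(getattr(field, "filename", None))


# Hypothetical stand-ins for form items, for illustration only.
empty_input = SimpleNamespace(filename="", file=object())
real_upload = SimpleNamespace(filename="report.pdf", file=object())
```

Checking filename rather than file avoids the problem in the question, because the file attribute is set even when the input was left empty.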
I am trying to read-in a file from a Python request, form data. All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body. I see many examples out there like: if 'filename' in request.files:, however this never works for me. I know that the file does in fact live within the ImmutableMultiDict type. Here is my working code example:
if 'my_file.xls' in request.files:
    # do something
else:
    # return error
if 'file' in request.files:
This is looking for the field name 'file' which corresponds to the name attribute you set in the form:
<input type='file' name='file'>
You then need to do something like this to assign the FileStorage object to the variable mem:
mem = request.files['file']
See my recent answer for more details of how and why.
You can then access the filename itself with:
mem.filename # should give back 'my_file.xls'
To actually read the stream data:
mem.read()
The official flask docs have further info on this, and how to save to disk with secure_filename() etc. Probably worth a read.
All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body.
If you actually want to read the contents of that Excel file, then you'll need a library that supports the format, such as xlrd. This answer demonstrates how to open a workbook, passing it as a stream. Note that they have used fileobj as the variable name, instead of mem.
I'm trying to build an AWS lambda function that accepts a file upload and then parses it in memory. The file is an xlsx file, and the content comes in to the lambda function looking like this in the body key of the event:
Beginning:
----------------------------300017151060007960655534
Content-Disposition: form-data; name="tag_list"; filename="test-list.xlsx"
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
PK
�y�N docProps/PK
And the end of the string looks like this:
[Content_Types].xmlPK�;
----------------------------475068475306850797919587--
If I do a head/tail of the actual file on my computer, it appears that the file starts at the PK and ends at the xmlPK�;. I've attempted to slice this section out and create a BytesIO object or a SpooledTemporaryFile, but none of these options work. They all give me something like invalid seek position, or bad zip file errors.
My goal is to load this xlsx file into memory and then parse it using openpyxl.
My current function looks something like this. I keep trying to format it differently; sometimes I decode it, sometimes not.
def lambda_handler(event, context):
    file_index = event['body'].index('PK')
    file_string = event['body'][file_index:]
    file_end = file_string.index(';')
    file = file_string[:file_end].encode('utf-8')
I then try to pass the file string into BytesIO or a SpooledTemporaryFile, but they all give me errors...
Note, I do NOT want to use S3.
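One possible direction, offered as a sketch rather than a confirmed fix for this setup: instead of slicing the string by hand, the standard library's email parser can split a multipart/form-data body when it is given the request's Content-Type header (which carries the boundary). This assumes the body is available as the original bytes; if the event flags it as base64-encoded (isBase64Encoded in API Gateway proxy events), decode it first. The function name extract_upload is an illustrative assumption.

```python
from email.parser import BytesParser
from email.policy import default


def extract_upload(body_bytes, content_type):
    """Return (filename, file_bytes) for the first file part, or (None, None)."""
    # Prepend the Content-Type header so the parser knows the boundary.
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body_bytes
    message = BytesParser(policy=default).parsebytes(raw)
    for part in message.iter_parts():
        filename = part.get_filename()
        if filename:
            # decode=True returns the payload as bytes.
            return filename, part.get_payload(decode=True)
    return None, None
```

The returned bytes could then be wrapped in io.BytesIO and handed to openpyxl.load_workbook, which accepts a file-like object, so nothing needs to be written to S3 or disk.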
I am using Bottle for uploading rather large files. The idea is that when the file is uploaded, the web app runs (and forgets) a system command with the uploaded file path as an argument. Apart from starting the system command with the correct file path as an argument, I do not need to save the file, but I need to be certain that the file will be available until the process completes the processing.
I use the exact code described here:
http://bottlepy.org/docs/dev/tutorial.html#post-form-data-and-file-uploads
My questions are:
Does Bottle store uploaded files in memory or in a specific place on disk (or perhaps, like Flask, a bit of both)?
Will the uploaded file be directly available to other tools without .read() and then manually saving the bytes to a specified file on disk?
What would be the best way to start the system command with the file as an argument? Is it possible to just pass the path to an existing file directly?
Ok, let's break this down.
The full code is:
HTML:
<form action="/upload" method="post" enctype="multipart/form-data">
<input type="text" name="name" />
<input type="file" name="data" />
</form>
PYTHON CODE:
from bottle import route, request

@route('/upload', method='POST')
def do_upload():
    name = request.forms.name
    data = request.files.data
    if name and data and data.file:
        raw = data.file.read() # This is dangerous for big files
        filename = data.filename
        return "Hello %s! You uploaded %s (%d bytes)." % (name, filename, len(raw))
    return "You missed a field."
(From the docs you provided)
So, first of all, we pull the information from the name and data fields in the HTML form and assign them to the variables name and data. That's pretty straightforward. Next, however, we assign data.file.read() to the variable raw. This reads the whole uploaded file into raw, so the entire file is in memory, which is why they put "This is dangerous for big files" as a comment next to the line.
This being said, if you wanted to save the file out to disk, you could do so (but be careful) using something like:
with open(filename,'w') as open_file:
    open_file.write(data.file.read())
As for your other questions:
1. "What would be the best way to start the system command with the file as an argument? Is it possible to just pass the path to an existing file directly?"
You should look at the subprocess module, specifically Popen: http://docs.python.org/2/library/subprocess.html#popen-constructor
2. "Will the uploaded file be directly available to other tools without .read() and then manually saving the bytes to a specified file on disk?"
Yes, you can pass the file data around without saving it to disk, but be warned that memory consumption is something to watch. If these "tools" are not written in Python, you may need pipes or subprocesses to pass the data to them.
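As a rough sketch of the Popen suggestion, assuming the external tool needs a real path on disk: write the upload to a named temporary file, then start the command with that path as its last argument. The function name start_processing and the command list are hypothetical.

```python
import subprocess
import sys
import tempfile


def start_processing(upload_bytes, command):
    """Write the upload to a temp file and start `command` with its path."""
    # delete=False keeps the file on disk after the handle is closed, so the
    # external process can still open it; remove it once the process is done.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".upload") as tmp:
        tmp.write(upload_bytes)
        path = tmp.name
    # Passing a list (no shell) avoids quoting problems with the file path.
    process = subprocess.Popen(command + [path])
    return process, path
```

Popen returns immediately, which matches the "run and forget" requirement; the caller still has to decide when it is safe to delete the temporary file, for example after process.wait() or process.poll() reports completion.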
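with open(filename,'w') as open_file:
    open_file.write(data.file.read())
This doesn't work: on Python 3 the upload is read as bytes, but the file is opened in text mode ('w').
You can use this instead: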
data = request.files.data
data.save(Path,overwrite=True)
The file will be handled by the routine you use. That means your read handles the connection (the file should not be there, according to wsgi spec)
with open(filename, "wb") as file:
    Data = data.file.read()
    if isinstance(Data, bytes):
        file.write(Data)
    elif isinstance(Data, str):
        file.write(Data.encode("utf-8"))
Easy :D
I am trying to upload a zip file from a web2py form and then read the contents:
form = FORM(TABLE(
TR(TD('Upload File:', INPUT(_type='file',
_name='myfile',
id='myfile',
requires=IS_NOT_EMPTY()))),
TR(TD(INPUT(_type='submit',_value='Submit')))
))
if form.accepts(request.vars):
    data=StringIO.StringIO(request.vars.myfile)
    import zipfile
    zfile=zipfile.ZipFile(data)
For some reason this code does not work: it complains that the file is not a zip file, although the uploaded file is a zip file.
I am new to web2py. How can the data be represented as a zip file?
web2py form field uploads are already cgi.FieldStorage objects. You can get the raw uploaded bytes using:
data = request.vars.myfile.value
For a file-like object StringIO is not needed, use:
filelike = request.vars.myfile.file
zip = zipfile.ZipFile(filelike)
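To illustrate that zipfile.ZipFile accepts any seekable binary file object, here is a small self-contained sketch. The in-memory archive stands in for request.vars.myfile.file, and the helper name list_zip_members is illustrative.

```python
import io
import zipfile


def list_zip_members(filelike):
    """Return the member names of a zip archive read from a file object."""
    with zipfile.ZipFile(filelike) as archive:
        return archive.namelist()


# Build a tiny archive in memory to stand in for the uploaded file object.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
buffer.seek(0)

print(list_zip_members(buffer))
```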
HTTP uploads aren't just raw binary; they are multipart/form-data encoded. Write request.vars.myfile out to disk and you'll see something like:
------------------BlahBlahBoundary
Content-Disposition: type="file"; name="myfile"
Content-Type: application/octet-stream
<binary data>
------------------BlahBlahBoundary--
The naive solution for this is to use cgi.FieldStorage(). The example I provide uses wsgi.input, which comes from the WSGI environ (under mod_wsgi in my case).
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
raw_file = cStringIO.StringIO(form['myfile'].file.read())
Two things to point out here:
1. Always use cStringIO if you have it; it'll be faster than StringIO.
2. If you allow uploads like this, you're streaming the file into RAM, so however big the file is, that is how much RAM you'll be using. This does NOT scale. I had to write my own custom MIME stream parser to stream files to disk through Python to avoid this. But if you're learning, or this is a proof of concept, you should be fine.
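On the memory point, a hedged sketch of one way to avoid a single large .read(): copy the upload's file object to disk in fixed-size chunks with shutil.copyfileobj. This only addresses how the data is read once a file-like object exists; it is not the custom MIME stream parser mentioned above. The name save_stream and the chunk size are assumptions.

```python
import os
import shutil


def save_stream(source, destination_path, chunk_size=64 * 1024):
    """Copy a binary file-like object to disk chunk by chunk."""
    with open(destination_path, "wb") as target:
        # Only chunk_size bytes are held in memory at a time.
        shutil.copyfileobj(source, target, chunk_size)
    return os.path.getsize(destination_path)
```

With this approach, peak memory use depends on the chunk size rather than on the size of the uploaded file.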
I need to validate the contents of an uploaded XML file in my Form clean method, but I'm unable to open the file for validation. It seems that, in the clean method, the file hasn't yet been moved from memory (or the temporary directory) to the destination directory.
For example the following code doesn't work because the file hasn't been moved to that destination yet. It's still in memory (or the temporary directory):
xml_file = cleaned_data.get('xml_file')
xml_file_absolute = '%(1)s%(2)s' % {'1': settings.MEDIA_ROOT, '2': xml_file}
xml_size = str(os.path.getsize(xml_file_absolute))
When I look at the "cleaned_data" variable it shows this:
{'xml_file': <InMemoryUploadedFile: texting.nzb (application/octet-stream)>}
cleaned_data.get('xml_file') only returns "texting.nzb" as a string.
Is there another way to access the file in memory (or the temporary directory)?
Again, this is in my Form's clean method that's tied into the default administration view. I've been told time and time again that all validation should be handled in a Form, not the view. Correct?
I'm assuming that you've bound your form to the files using:
my_form = MyFormClass(request.POST, request.FILES)
If you have, once the form has been validated, you can access the file content itself using the request.FILES dictionary:
if my_form.is_valid():
    data = request.FILES['myfile'].read()
The request.FILES['myfile'] object is an UploadedFile object, so it supports file-like read/write operations.
If you need to access the file contents from within the form's clean method (or any method of the cleaning machinery), you are doing it right. cleaned_data.get('xml_file') returns an UploadedFile object. The __str__ method of that object just prints out the string, which is why you see only the file name. However, you can get access to the entire contents:
xml_file = myform.cleaned_data.get('xml_file')
print(xml_file.read())
This section of the docs has some great examples: http://docs.djangoproject.com/en/dev/topics/http/file-uploads/
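Since the uploaded file can be read like any file object, the XML check itself does not need a path on disk. A minimal sketch, assuming a well-formedness check is enough; the helper name is_well_formed_xml is hypothetical, and io.BytesIO stands in for the uploaded file in the usage lines.

```python
import io
import xml.etree.ElementTree as ET


def is_well_formed_xml(uploaded):
    """Return True if the binary file-like object contains parseable XML."""
    try:
        ET.parse(uploaded)
    except ET.ParseError:
        return False
    finally:
        # Rewind so later code (such as saving the file) can read it again.
        uploaded.seek(0)
    return True


print(is_well_formed_xml(io.BytesIO(b"<root><item/></root>")))
print(is_well_formed_xml(io.BytesIO(b"<root><item></root>")))
```

Inside a form's clean method, a False result would typically be turned into a validation error instead of being printed.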