Bottle file upload and process - python

I am using Bottle for uploading rather large files. The idea is that when the file is uploaded, the web app runs (fire and forget) a system command with the uploaded file path as an argument. Apart from starting the system command with the correct file path as an argument, I do not need to save the file, but I need to be certain that the file remains available until that process finishes processing it.
I use the exact code described here:
http://bottlepy.org/docs/dev/tutorial.html#post-form-data-and-file-uploads
My questions are:
Does Bottle store the uploaded file in memory or in a specific place on disk (or, like Flask, a bit of both)?
Will the uploaded file be directly available to other tools without calling .read() and then manually saving the bytes to a file on disk?
What would be the best way to start the system command with the file as an argument? Is it possible to just pass the path to an existing file directly?

Ok, let's break this down.
The full code is:
HTML:
<form action="/upload" method="post" enctype="multipart/form-data">
<input type="text" name="name" />
<input type="file" name="data" />
</form>
PYTHON CODE:
from bottle import route, request

@route('/upload', method='POST')
def do_upload():
    name = request.forms.name
    data = request.files.data
    if name and data and data.file:
        raw = data.file.read()  # This is dangerous for big files
        filename = data.filename
        return "Hello %s! You uploaded %s (%d bytes)." % (name, filename, len(raw))
    return "You missed a field."
(From the docs you linked.)
First of all, we pull the name and data fields out of the HTML form and assign them to the variables name and data. That's pretty straightforward. Next, we assign data.file.read() to the variable raw, which reads the entire uploaded file into raw. In other words, the whole file ends up in memory, which is why the docs put "This is dangerous for big files" as a comment next to that line.
That said, if you wanted to save the file out to disk, you could do so (but be careful with large files) using something like:
with open(filename, 'wb') as open_file:
    open_file.write(data.file.read())
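If the upload may be large, here is a minimal sketch that avoids holding the whole file in memory by copying it to disk in chunks; it assumes data is the Bottle file field from the code above, and the destination path is just a placeholder:
import shutil

# Hypothetical destination; pick a real location for your app.
destination = "/tmp/uploaded_file"

# Stream the upload to disk in 64 KiB chunks instead of one big read().
with open(destination, "wb") as out_file:
    shutil.copyfileobj(data.file, out_file, 64 * 1024)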
As for your other questions:
1."What would be the best way to start the system command with the file as an argument? Is it possible to just pass the path to an existing file directly?"
Have a look at the subprocess module, specifically Popen: http://docs.python.org/2/library/subprocess.html#popen-constructor (a sketch follows after these two points).
2."Will the uploaded file be directly available to other tools without .read() and then manually saving the bytes to a specified file on disk?"
Yes, you can pass the file data around without saving it to disk, but be warned that memory consumption is something to watch. If these "tools" are not in Python, you will probably be dealing with pipes or subprocesses to get the data to them.
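For point 1, a minimal fire-and-forget sketch using subprocess.Popen; it assumes the upload has already been written to saved_path (for example with the chunked-copy snippet above), and /usr/bin/mytool is a placeholder for your external command:
import subprocess

# saved_path is a real file on disk, so it remains available to the
# external tool even after the web request has returned.
proc = subprocess.Popen(["/usr/bin/mytool", saved_path])

# Fire and forget: we do not call proc.wait(), so the request can return
# immediately while the tool keeps running.
Since Popen takes its arguments as a list of strings, passing the path of an existing file directly is exactly how you would do it.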

with open(filename, 'w') as open_file:
    open_file.write(data.file.read())
doesn't work; you can use Bottle's FileUpload.save() instead:
data = request.files.data
data.save(path, overwrite=True)
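For context, a sketch of a complete route using save() might look like the following; the /tmp/uploads directory is an assumption (it must already exist), and FileUpload.save() is available in recent Bottle versions:
from bottle import route, request

@route('/upload', method='POST')
def upload_and_save():
    data = request.files.data
    if data and data.file:
        # save() streams the upload to disk in chunks, so large files
        # are not read into memory all at once.
        data.save('/tmp/uploads', overwrite=True)
        return "Saved %s" % data.filename
    return "You missed a field."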

The file is handled by whatever routine you use: your read() consumes it from the request. According to the WSGI spec, you should not assume it persists anywhere on disk afterwards.

with open(filename, "wb") as file:
Data = data.file.read()
if type(Data) == bytes: file.write(Data)
elif type(Data) == str: file.write(Data.encode("utf-8"))
Easy :D

Related

Read-in Files from Flask request module

I am trying to read in a file submitted as form data in a Flask request. All I want to do is read the incoming file from the request body, parse its contents and return them as a JSON body. I see many examples out there like if 'filename' in request.files:, however this never works for me. I know that the file does in fact live within the ImmutableMultiDict type. Here is my current code example:
if 'my_file.xls' in request.files:
    # do something
else:
    # return error
if 'file' in request.files:
This is looking for the field name 'file' which corresponds to the name attribute you set in the form:
<input type='file' name='file'>
You then need to do something like this to assign the FileStorage object to the variable mem:
mem = request.files['file']
See my recent answer for more details of how and why.
You can then access the filename itself with:
mem.filename # should give back 'my_file.xls'
To actually read the stream data:
mem.read()
The official flask docs have further info on this, and how to save to disk with secure_filename() etc. Probably worth a read.
All I want to do is read-in the incoming file in the request body, parse the contents and return the contents as a json body.
If you actually want to read the contents of that Excel file, then you'll need to use a library that supports the format, such as xlrd. This answer demonstrates how to open a workbook, passing it the data as a stream. Note that it uses fileobj as the variable name instead of mem.
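A minimal sketch of that idea, assuming mem is the FileStorage object from above and that the xlrd package is installed:
import xlrd

# mem is the werkzeug FileStorage pulled out of request.files['file'] above.
workbook = xlrd.open_workbook(file_contents=mem.read())
sheet = workbook.sheet_by_index(0)

# Turn the first sheet into a list of row lists, ready to serialise as JSON.
rows = [sheet.row_values(i) for i in range(sheet.nrows)]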

Uploading and Downloading Files with Flask

I'm trying to write a really simple web app with PythonAnywhere and Flask that lets the user upload a text file, generates a CSV file, then lets the user download the CSV file. It doesn't have to be fancy, it only has to work. I have already written the program that generates the CSV from a txt file on the drive.
Right now, my function opens the file on the drive with:
with open(INPUTFILE, "r") as fname:
and writes the csv with:
with open(OUTPUTFILE, 'w') as fname:
with INPUTFILE and OUTPUTFILE being filename strings.
Would it be better for me to handle the files as objects, returned by the flask/html somehow?
I don't know how to do this. How should I structure this program? How many HTML templates do I need? I would prefer to work on the files without saving them anywhere, but if I have to save them to the PythonAnywhere directory, I could. How can I do that?
PythonAnywhere dev here. This is a good question about Flask and web development in general rather than specific to our system, so I'll try to give a generic answer without anything specific to us :-)
There are a few things that I'd need to know to give a definitive answer to your question, so I'll start by listing the assumptions I'm making -- leave me a comment if I'm wrong with any of them and I'll update the answer appropriately.
I'm assuming that the files you're uploading aren't huge and can fit into a reasonable amount of memory -- let's say, smaller than a megabyte.
I'm assuming that the program that you've already written to generate the CSV from the text file is in Python, and that it has (or, perhaps more likely, could be easily changed to have) a function that takes a string containing the contents of the text file, and returns the contents that need to be written into the CSV.
If both of those are the case, then the best way to structure your Flask app would be to handle everything inside Flask. A code sample is worth a thousand words, so here's a simple one I put together that allows the user to upload a text file, runs it through a function called transform (which is where the function from your conversion program would slot in -- mine just replaces = with , throughout the file), and sends the results back to the browser. There's a live version of this app on PythonAnywhere here.
from flask import Flask, make_response, request

app = Flask(__name__)

def transform(text_file_contents):
    return text_file_contents.replace("=", ",")

@app.route('/')
def form():
    return """
        <html>
            <body>
                <h1>Transform a file demo</h1>

                <form action="/transform" method="post" enctype="multipart/form-data">
                    <input type="file" name="data_file" />
                    <input type="submit" />
                </form>
            </body>
        </html>
    """

@app.route('/transform', methods=["POST"])
def transform_view():
    request_file = request.files['data_file']
    if not request_file:
        return "No file"

    file_contents = request_file.stream.read().decode("utf-8")

    result = transform(file_contents)

    response = make_response(result)
    response.headers["Content-Disposition"] = "attachment; filename=result.csv"
    return response
Regarding your other questions:
Templates: I didn't use a template for this example, because I wanted it all to fit into a single piece of code. If I were doing it properly then I'd put the stuff that's generated by the form view into a template, but that's all.
Can you do it by writing to files -- yes you can: the uploaded file can be saved with the save(filename) method of the same file object whose stream property I'm using above (a sketch follows below). But if your files are pretty small (as per my assumption above), then it probably makes more sense to process them in memory like the code above does.
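A minimal sketch of that save-to-disk variant, building on the app above; the uploads directory and route name are assumptions, and secure_filename is the werkzeug helper the Flask docs recommend:
import os
from werkzeug.utils import secure_filename

UPLOAD_DIR = "uploads"  # assumed to exist next to the app

@app.route('/upload', methods=["POST"])
def upload_view():
    request_file = request.files['data_file']
    if not request_file:
        return "No file"

    # Sanitise the client-supplied name before using it as a path.
    filename = secure_filename(request_file.filename)
    request_file.save(os.path.join(UPLOAD_DIR, filename))
    return "Saved %s" % filename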
I hope that all helps, and if you have any questions then just leave a comment.
Better to add
response.headers["Cache-Control"] = "must-revalidate"
response.headers["Pragma"] = "must-revalidate"
response.headers["Content-type"] = "application/csv"
If you don't add the Content-Type, Firefox 48.0 reported it as HTML and opened the Save dialog once for HTML and then again for CSV. If you don't add Cache-Control, your result may get cached, and if you serve active content that is not what you want. Using must-revalidate with no max-age effectively behaves like no-cache - see here and here for an explanation.

Difference between opening a file and using a module

Below, I print out the HTML for a form defined externally. Is there a difference in the way each string is retrieved and used in foo.py, other than the syntax? If so, in what circumstances would one method be preferred over the other? For example, would I be better off defining a number of HTML snippets in a module as strings and accessing them that way, as opposed to keeping them in separate .html files and using open over and over?
mod.py
form = """\
<form type="POST" action="test.py">
Enter something:<input type="text" name="somethign">
</form>
"""
form.html
<form type="POST" action="test.py">
Enter something:<input type="text" name="something">
</form>
foo.py
import mod
print mod.form
with open('form.html', 'r') as form:
    print form.read()
Having .html files is better. Sure, you will have some overhead opening a file, reading its content and then closing it, but you'll have many advantages:
.html file can be edited by any person who knows HTML syntax.
.html files can be edited without restarting your program, it is very useful for services.
You can eliminate the open/read/close overhead by introducing some caching technique (see the sketch after this list).
It's a lot easier for designers to edit discrete HTML files than to deal with HTML embedded in code.
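A minimal sketch of that caching idea, in the same Python 2 style as foo.py above; the _template_cache dict and load_template name are illustrative, not part of the original post:
_template_cache = {}

def load_template(path):
    # Read each .html file from disk only once, then serve it from memory.
    if path not in _template_cache:
        with open(path, 'r') as template_file:
            _template_cache[path] = template_file.read()
    return _template_cache[path]

print load_template('form.html')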

Web2Py - Upload a file and read the content as Zip file

I am trying to upload a zip file from a Web2Py form and then read its contents:
form = FORM(TABLE(
    TR(TD('Upload File:', INPUT(_type='file',
                                _name='myfile',
                                id='myfile',
                                requires=IS_NOT_EMPTY()))),
    TR(TD(INPUT(_type='submit', _value='Submit')))
))
if form.accepts(request.vars):
    data = StringIO.StringIO(request.vars.myfile)
    import zipfile
    zfile = zipfile.ZipFile(data)
For some reason this code does not work and complains that the file is not a zip file, although the uploaded file is a zip file.
I am new to Web2Py. How can the data be represented as zip-file?
web2py form field uploads are already cgi.FieldStorage objects; you can get the raw uploaded bytes using:
data = request.vars.myfile.value
For a file-like object StringIO is not needed, use:
filelike = request.vars.myfile.file
zfile = zipfile.ZipFile(filelike)
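For completeness, a short sketch of what you might do with the archive once it is open; the member loop is just an example:
# zfile is the ZipFile object created above.
for member in zfile.namelist():
    contents = zfile.read(member)  # raw bytes of each archived file
    # ... process contents here ...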
HTTP uploads aren't just raw binary; they're multipart/form-data encoded. Write request.vars.myfile out to disk and you'll see: it will look something like
------------------BlahBlahBoundary
Content-Disposition: type="file"; name="myfile"
Content-Type: application/octet-stream
<binary data>
------------------BlahBlahBoundary--
The naive solution for this is to use cgi.FieldStorage(); the example I provide uses wsgi.input, which comes from the WSGI environment.
form = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
raw_file = cStringIO.StringIO(form['myfile'].file.read())
Two things to point out here:
Always use cStringIO if you have it; it'll be faster than StringIO.
If you allow uploads like this, you're reading the whole file into RAM, so however big the file is, that's how much RAM you'll be using - this does NOT scale. I had to write my own custom MIME stream parser to stream files to disk through Python to avoid this. But if you're learning or this is a proof of concept, you should be fine.

python upload - where are tmp/FILES?

I'm running Python 2.4 under CGI and I'm trying to upload to a cloud service using a Python API. In PHP, the $_FILES array contains a "tmp_name" element, which is where the file lives until you move it where you want it. What's the equivalent in Python?
If I do this:
fileitem = form['file']
then fileitem.filename is the name of the file.
If I print fileitem, it simply contains the file name and what looks to be the file contents.
I am trying to stream things, and the PHP API I'm using requires the tmp location.
The file is a real file, but cgi.FieldStorage unlinked it as soon as it was created, so it exists only as long as it is kept open and no longer has a real path on the file system.
You can, however, change this...
You can extend the cgi.FieldStorage and replace the make_file method to place the file wherever you want:
import os
import cgi
class MyFieldStorage(cgi.FieldStorage):
    def make_file(self, binary=None):
        return open(os.path.join('/tmp', self.filename), 'wb')
You must also keep in mind that the FieldStorage object only creates a real file if it receives more than 1000 bytes (otherwise it is a cStringIO.StringIO).
EDIT: The cgi module actually makes the file with the tempfile module, so check that out if you want lots of gooey details.
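A minimal sketch of how the subclass might be used from a CGI script; the "file" field name is just an example:
import os
import sys

print "Content-Type: text/plain"
print

# Parse the request with the subclass so large uploads land in /tmp
# under their original filenames instead of being unlinked.
form = MyFieldStorage(fp=sys.stdin, environ=os.environ, keep_blank_values=True)

fileitem = form['file']
print "Uploaded file is at %s" % os.path.join('/tmp', fileitem.filename)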
Here's a code snippet taken from my site:
h = open("user_uploaded_file", "wb")
while 1:
data = form["file"].file.read(4096)
if not data:
break
h.write(data)
h.close()
Hope this helps.
