So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.
At first it seemed that it just stores the whole file in memory. And I thought hm, that should be easy to test - a big file should clog up the memory!.. And it didn't. Still, when I request the file, it's a string, not an iterator, file object or anything.
I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!
Here's the code I've used:
import cgi
from wsgiref.simple_server import make_server

def app(environ,start_response):
    start_response('200 OK',[('Content-Type','text/html')])
    output = """
<form action="" method="post" enctype="multipart/form-data">
<input type="file" name="failas" />
<input type="submit" value="Varom" />
</form>
"""
    fs = cgi.FieldStorage(fp=environ['wsgi.input'],environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    return output

if __name__ == '__main__' :
    httpd = make_server('',8000,app)
    print 'Serving'
    httpd.serve_forever()
Thanks in advance! :)
Inspecting the cgi module description, there is a paragraph discussing how to handle file uploads.
If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
Regarding your example, getfirst() is just a version of getvalue().
try replacing
f = fs.getfirst('failas')
with
f = fs['failas'].file
This will return a file-like object that is readable "at leisure".
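As a rough illustration of reading "at leisure", the sketch below copies an upload to disk in fixed-size chunks instead of pulling the whole value into a string. It is written for Python 3; the helper name save_upload and the 64 KiB chunk size are my own choices, and an io.BytesIO object stands in for fs['failas'].file.

```python
import io
import os
import shutil
import tempfile

def save_upload(fileobj, dest_path, chunk_size=64 * 1024):
    """Copy a file-like object to dest_path in chunks; return the bytes written."""
    with open(dest_path, "wb") as out:
        shutil.copyfileobj(fileobj, out, chunk_size)
        return out.tell()

# Stand-in for fs['failas'].file (hypothetical upload data).
upload = io.BytesIO(b"x" * 200000)

# Reserve a destination path, then stream the upload into it.
fd, path = tempfile.mkstemp()
os.close(fd)
written = save_upload(upload, path)
os.remove(path)
```

Only one chunk is held in memory at a time, so memory use stays flat regardless of the upload size.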
The best way is NOT to read the file into memory at all (or even one line at a time, as gimel suggested).
You can subclass FieldStorage and override its make_file method. make_file is called when the field being parsed is a file upload.
For your reference, default make_file looks like this:
def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it. The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method. If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.
    """
    import tempfile
    return tempfile.TemporaryFile("w+b")
Rather than creating a temporary file, create a permanent file wherever you want.
Using the answer by @hasanatkazmi (in a Twisted app), I got something like:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-
import sys
import cgi
import tempfile

class PredictableStorage(cgi.FieldStorage):
    def __init__(self, *args, **kwargs):
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            file = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = file.name
            return file
        return open(self.path, 'w+b')
Be warned, that the file is not always created by the cgi module. According to these cgi.py lines it will only be created if the content exceeds 1000 bytes:
if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')
So, you have to check if the file was actually created with a query to a custom class' path field like so:
if file_field.path:
    # Using an already created file...
else:
    # Creating a temporary named file to store the content.
    import tempfile
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
        # You can save the 'f.name' field for later usage.
If Content-Length is also set for the field, which seems to be rare, the file should also be created by cgi.
That's it. This way you can store the file predictably, decreasing the memory usage footprint of your app.
I have a form with an input tag and a submit button:
<input type="file" name="filename" size="25">
I have a python file that handles the post:
def post(self):
The file that I'm receiving in the form is an .xml file, in the python post function I want to send that 'foo.xml' to another function that is going to validate it (using minixsv)
My question is how do I retrieve the file? I tried:
form = cgi.FieldStorage()
inputfile = form.getvalue('filename')
but this puts the content into inputfile; I don't have a 'foo.xml' file per se that I can pass to the minixsv function, which requires an .xml file, not the text...
Update I found a function that accepts text instead of an input file, thanks anyway
Oftentimes, there's also a function to extract XML from a string. For example, minidom has parseString, and lxml etree.XML.
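For instance, a minimal sketch of parsing uploaded XML straight from a string with minidom's parseString could look like this. The XML literal is only a stand-in for what form.getvalue('filename') would return, and the element names are made up for the example.

```python
from xml.dom import minidom

# Stand-in for: content = form.getvalue('filename')
content = "<root><item>foo</item></root>"

# Parse directly from the string; no file on disk is needed.
doc = minidom.parseString(content)
root_tag = doc.documentElement.tagName
item_text = doc.getElementsByTagName("item")[0].firstChild.data
```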
If you have the content, you can make a file-like object with StringIO:
from StringIO import StringIO
content = form.getvalue('filename')
fileh = StringIO(content)
# You can now call fileh.read, or iterate over it
If you must have a file on the disk, use tempfile.mkstemp:
import os
import tempfile
content = form.getvalue('filename')
# mkstemp returns an OS-level file descriptor (an int) and the path
fd, tmpfn = tempfile.mkstemp()
tmpf = os.fdopen(fd, 'wb')
tmpf.write(content)
tmpf.close()
# Now, give tmpfn to the function expecting a filename
os.unlink(tmpfn) # Finally, delete the file
This probably won't be the best answer, but why not consider using StringIO on your inputfile variable, and passing the StringIO object as the file handle to your minixsv function? Alternatively, why not open an actual new file handle for foo.xml, save the contents of inputfile to it (i.e., via open), and then pass foo.xml to your minixsv function?
I'm having problems with the standard Django FileField and tempfile.TemporaryFile. Whenever I try to save a FileField with the TemporaryFile, I get the "Unable to determine the file's size" error.
For example, given a model named Model, a filefield named FileField, and a temporaryfile named TempFile:
Model.FileField.save('foobar', django.core.files.File(TempFile), save=True)
This will give me the aforementioned error. Any thoughts?
I had this problem with tempfile.TemporaryFile. When I switched to tempfile.NamedTemporaryFile it went away. I believe that TemporaryFile just simulates being a file (on some operating system at least), whereas NamedTemporaryFile really is a file.
I was having the same problem and was able to solve it for my case. This is the code that django uses to determine the size of a file:
def _get_size(self):
    if not hasattr(self, '_size'):
        if hasattr(self.file, 'size'):
            self._size = self.file.size
        elif os.path.exists(self.file.name):
            self._size = os.path.getsize(self.file.name)
        else:
            raise AttributeError("Unable to determine the file's size.")
    return self._size
Therefore, django will raise an AttributeError if the file does not exist on disk (or have a size attribute already defined). Since a TemporaryFile has no usable path on disk (on Unix it is unlinked as soon as it is created, so its name attribute is not a real filename), this _get_size method doesn't work. In order to get it to work, I had to do something like this:
import tempfile, os
from django.core.files import File

# Use tempfile.mkstemp, since it will actually create the file on disk.
(temp_filedescriptor, temp_filepath) = tempfile.mkstemp()
# Close the open file using the file descriptor, since file objects
# returned by os.fdopen don't work, either
os.close(temp_filedescriptor)
# Open the file on disk
temp_file = open(temp_filepath, "w+b")
# Do operations on your file here . . .
modelObj.fileField.save("filename.txt", File(temp_file))
temp_file.close()
# Remove the created file from disk.
os.remove(temp_filepath)
Alternatively (and preferably), if you can calculate the size of the temporary file you're creating, you could set a size attribute on the TemporaryFile object directly. Due to the libraries I was using, this was not a possibility for me.
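If the size is not known up front, it can usually be measured from the open file itself. Here is a small Python 3 sketch; measure_size is a hypothetical helper of mine, and whether the resulting number can then be assigned as a size attribute depends on the file object and Django version you use, so treat that last step as an assumption to verify.

```python
import os
import tempfile

def measure_size(fileobj):
    """Return the size in bytes of a seekable file object, restoring its position."""
    position = fileobj.tell()
    fileobj.seek(0, os.SEEK_END)   # jump to the end to learn the total length
    size = fileobj.tell()
    fileobj.seek(position)         # put the cursor back where it was
    return size

with tempfile.TemporaryFile() as tmp:
    tmp.write(b"hello world")
    tmp.seek(0)
    size = measure_size(tmp)
    position_after = tmp.tell()
    # The measured value is what one would hand over as the size
    # (assumption: the wrapper in use accepts a size attribute).
```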
I had this issue on Heroku even with tempfile.NamedTemporaryFile and was quite disappointed ...
I solved it using Steven's tip by setting an arbitrary size manually (yes, it's dirty, but it works for me):
from django.core.files import File
from django.core.files.temp import NamedTemporaryFile
img_temp = NamedTemporaryFile()
# Do your stuffs ...
img_temp.flush()
img_temp.size = 1024
media.thumbnail.save('dummy', File(img_temp))
Thanks !
I know this is a bit old but I've managed to save a base64 file (without having the actual file saved on the disk) by using the ContentFile class provided by Django.
According to the docs:
The ContentFile class inherits from File, but unlike File it operates on string content (bytes also supported), rather than an actual file.
The snippet below receives a base64 string, extracts its data and file extension, and saves it to an ImageField using the ContentFile class:
import base64
import uuid
from django.core.files.base import ContentFile

def convert_b64data(b64data, filename):
    file_format, imgstr = b64data.split(';base64,')
    ext = file_format.split('/')[-1]
    return {
        'obj': base64.b64decode(imgstr),
        'extension': ext,
    }

b64data = request.data['b64file']
filename = str(uuid.uuid4())
file_data = convert_b64data(b64data, filename)
file_path = 'media/{}/{}.{}'.format(
    user.code,
    filename,
    file_data['extension']
)
user.banner.save(file_path, ContentFile(file_data['obj']))
In newer versions of Django (I checked on 3.2), you just may need to wrap the file in a ContentFile.
from django.core.files.base import ContentFile
Model.FileField.save('foobar', ContentFile(file))
https://docs.djangoproject.com/en/3.2/ref/files/file/
I know that it is possible to create a temporary file, and write the data of the file I wish to copy to it. I was just wondering if there was a function like:
create_temporary_copy(file_path)
There isn't one directly, but you can use a combination of tempfile and shutil.copy2 to achieve the same result:
import tempfile, shutil, os

def create_temporary_copy(path):
    temp_dir = tempfile.gettempdir()
    temp_path = os.path.join(temp_dir, 'temp_file_name')
    shutil.copy2(path, temp_path)
    return temp_path
You'll need to deal with removing the temporary file in the caller, though.
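One caveat with a fixed name such as 'temp_file_name' is that two callers would overwrite each other's copy. A possible variation, sketched here for Python 3, lets tempfile.mkstemp pick a unique name and shows the caller-side cleanup; create_unique_temporary_copy is a name I made up for this example.

```python
import os
import shutil
import tempfile

def create_unique_temporary_copy(path):
    """Copy path to a uniquely named temp file and return the new path."""
    suffix = os.path.splitext(path)[1]          # keep the original extension
    fd, temp_path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)                                # only the name is needed; copy2 reopens it
    shutil.copy2(path, temp_path)
    return temp_path

# Usage: build a small source file, copy it, then clean up in the caller.
fd, source = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("example")

copy_path = create_unique_temporary_copy(source)
try:
    with open(copy_path) as f:
        copied = f.read()
finally:
    os.remove(copy_path)
    os.remove(source)
```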
This isn't quite as concise, and I imagine there may be issues with exception safety, (e.g. what happens if 'original_path' doesn't exist, or the temporary_copy object goes out of scope while you have the file open) but this code adds a little RAII to the clean up. The difference here to using NamedTemporaryFile directly is that rather than ending up with a file object, you end up with a file, which is occasionally desirable (e.g. if you plan to call out to other code to read it, or some such.)
import os,shutil,tempfile

class temporary_copy(object):
    def __init__(self,original_path):
        self.original_path = original_path

    def __enter__(self):
        temp_dir = tempfile.gettempdir()
        base_path = os.path.basename(self.original_path)
        self.path = os.path.join(temp_dir,base_path)
        shutil.copy2(self.original_path, self.path)
        return self.path

    def __exit__(self,exc_type, exc_val, exc_tb):
        os.remove(self.path)

In your code you'd write:
with temporary_copy(path) as temporary_path_to_copy:
    ... do stuff with temporary_path_to_copy ...
# Here in the code, the copy should now have been deleted.
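The following is more concise (OP's ask) than the selected answer. Enjoy!
import tempfile, shutil, os

def create_temporary_copy(path):
    # delete=False: otherwise the file would be removed as soon as the
    # tmp object is closed or garbage-collected, i.e. right after we return
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.close()
    shutil.copy2(path, tmp.name)
    return tmp.name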
A variation on @tramdas's answer, accounting for the fact that the file cannot be opened twice on Windows. This version ignores the preservation of the file extension.
import os, shutil, tempfile

def create_temporary_copy(src):
    # create the temporary file in read/write mode (r+);
    # NamedTemporaryFile makes tf.name a real path on every platform
    tf = tempfile.NamedTemporaryFile(mode='r+b', prefix='__', suffix='.tmp')
    # on Windows, we can't open the file again, either manually
    # or indirectly via shutil.copy2, but we *can* copy
    # the file directly using file-like objects, which is what
    # NamedTemporaryFile returns to us.
    # Use `with open` here to automatically close the source file
    with open(src,'rb') as f:
        shutil.copyfileobj(f,tf)
    # display the name of the temporary file for diagnostic purposes
    print 'temp file:',tf.name
    # rewind the temporary file, otherwise things will go
    # tragically wrong on Windows
    tf.seek(0)
    return tf

# make a temporary copy of the file 'foo.txt'
name = None
with create_temporary_copy('foo.txt') as temp:
    name = temp.name
    # prove that it exists
    print 'exists', os.path.isfile(name) # prints True
    # read all lines from the file
    i = 0
    for line in temp:
        print i,line.strip()
        i += 1
    # temp.close() is implicit using `with`
# prove that it has been deleted
print 'exists', os.path.isfile(name) # prints False
A slight variation (in particular I needed the preserve_extension feature for my use case, and I like the "self-cleanup" feature):
import os, shutil, tempfile

def create_temporary_copy(src_file_name, preserve_extension=False):
    '''
    Copies the source file into a temporary file.
    Returns a _TemporaryFileWrapper, whose destructor deletes the temp file
    (i.e. the temp file is deleted when the object goes out of scope).
    '''
    tf_suffix=''
    if preserve_extension:
        _, tf_suffix = os.path.splitext(src_file_name)
    tf = tempfile.NamedTemporaryFile(suffix=tf_suffix)
    shutil.copy2(src_file_name, tf.name)
    return tf
Hello,
I get the following error when generating a zip file. What should I do?
main.py", line 2289, in get
buf=zipf.read(2048)
NameError: global name 'zipf' is not defined
The complete code is as follows:
def addFile(self,zipstream,url,fname):
    # get the contents
    result = urlfetch.fetch(url)
    # store the contents in a stream
    f=StringIO.StringIO(result.content)
    length = result.headers['Content-Length']
    f.seek(0)
    # write the contents to the zip file
    while True:
        buff = f.read(int(length))
        if buff=="":break
        zipstream.writestr(fname,buff)
    return zipstream

def get(self):
    self.response.headers["Cache-Control"] = "public,max-age=%s" % 86400
    start=datetime.datetime.now()-timedelta(days=20)
    count = int(self.request.get('count')) if not self.request.get('count')=='' else 1000
    from google.appengine.api import memcache
    memcache_key = "ads"
    data = memcache.get(memcache_key)
    if data is None:
        a= Ad.all().filter("modified >", start).filter("url IN", ['www.koolbusiness.com']).filter("published =", True).order("-modified").fetch(count)
        memcache.set("ads", a)
    else:
        a = data
    dispatch='templates/kml.html'
    template_values = {'a': a , 'request':self.request,}
    path = os.path.join(os.path.dirname(__file__), dispatch)
    output = template.render(path, template_values)
    self.response.headers['Content-Length'] = len(output)
    zipstream=StringIO.StringIO()
    file = zipfile.ZipFile(zipstream,"w")
    url = 'http://www.koolbusiness.com/list.kml'
    # repeat this for every URL that should be added to the zipfile
    file =self.addFile(file,url,"list.kml")
    # we have finished with the zip so package it up and write the directory
    file.close()
    zipstream.seek(0)
    # create and return the output stream
    self.response.headers['Content-Type'] ='application/zip'
    self.response.headers['Content-Disposition'] = 'attachment; filename="list.kmz"'
    while True:
        buf=zipf.read(2048)
        if buf=="": break
        self.response.out.write(buf)
That is probably zipstream and not zipf. So replace that with zipstream and it might work.
i don't see where you declare zipf?
zipfile? Senthil Kumaran is probably right with zipstream since you seek(0) on zipstream before the while loop to read chunks of the mystery variable.
edit:
Almost certainly the variable is zipstream.
zipfile docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object. The mode parameter should be 'r' to read an existing file, 'w' to truncate and write a new file, or 'a' to append to an existing file. If mode is 'a' and file refers to an existing ZIP file, then additional files are added to it. If file does not refer to a ZIP file, then a new ZIP archive is appended to the file. This is meant for adding a ZIP archive to another file (such as python.exe).
your code:
zipstream=StringIO.StringIO()
creates a file-like object using StringIO, which is essentially a "memory file" (read more in the docs)
file = zipfile.ZipFile(zipstream,"w")
opens the zipfile with the zipstream file-like object in 'w' mode
url = 'http://www.koolbusiness.com/list.kml'
# repeat this for every URL that should be added to the zipfile
file =self.addFile(file,url,"list.kml")
# we have finished with the zip so package it up and write the directory
file.close()
uses the addFile method to retrieve and write the retrieved data to the file-like object and returns it. The variables are slightly confusing because you pass a zipfile to the addFile method which aliases as zipstream (confusing because we are using zipstream as a StringIO file-like object). Anyways, the zipfile is returned, and closed to make sure everything is "written".
It was written to our "memory file", which we now seek to index 0
zipstream.seek(0)
and after doing some header stuff, we finally reach the while loop that will read our "memory-file" in chunks
while True:
    buf=zipstream.read(2048)
    if buf=="": break
    self.response.out.write(buf)
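To see the whole "memory file" flow in isolation, here is a self-contained Python 3 sketch that is independent of App Engine. io.BytesIO replaces StringIO.StringIO, and build_zip and iter_chunks are hypothetical helper names; the KML payload is a placeholder rather than fetched data.

```python
import io
import zipfile

def build_zip(entries):
    """Return an in-memory ZIP (BytesIO) holding {name: bytes} entries, rewound to 0."""
    stream = io.BytesIO()
    # Leaving the with-block closes the archive, which writes the central directory.
    with zipfile.ZipFile(stream, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in entries.items():
            archive.writestr(name, data)
    stream.seek(0)
    return stream

def iter_chunks(stream, size=2048):
    """Yield the stream's contents in blocks of at most size bytes."""
    while True:
        buf = stream.read(size)
        if not buf:
            break
        yield buf

zipstream = build_zip({"list.kml": b"<kml></kml>"})
payload = b"".join(iter_chunks(zipstream))

# Re-open the bytes we "sent" to confirm they form a valid archive.
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
```

In a request handler, each chunk yielded by iter_chunks would be written to the response instead of being joined.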
You need to declare:
global zipf
right after your
def get(self):
line. you are modifying a global variable, and this is the only way python knows what you are doing.
I want to use tempfile.NamedTemporaryFile() to write some contents into it and then open that file. I have written following code:
tf = tempfile.NamedTemporaryFile()
tfName = tf.name
tf.seek(0)
tf.write(contents)
tf.flush()
but I am unable to open this file and see its contents in Notepad or similar application. Is there any way to achieve this? Why can't I do something like:
os.system('start notepad.exe ' + tfName)
at the end.
I don't want to save the file permanently on my system. I just want the contents to be opened as a text in Notepad or similar application and delete the file when I close that application.
This could be one of two reasons:
Firstly, by default the temporary file is deleted as soon as it is closed. To fix this use:
tf = tempfile.NamedTemporaryFile(delete=False)
and then delete the file manually once you've finished viewing it in the other application.
Alternatively, it could be that because the file is still open in Python Windows won't let you open it using another application.
Edit: to answer some questions from the comments:
According to the docs, when using delete=False the file can be removed with:
tf.close()
os.unlink(tf.name)
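Putting both points together, a hedged Python 3 sketch of the full sequence might be: write, close the handle so another program may open the file, use it, then unlink it. Re-opening the file with open() stands in for launching an external viewer here; the subprocess call in the comment is only an illustration.

```python
import os
import tempfile

contents = b"Some data"

tf = tempfile.NamedTemporaryFile(suffix=".txt", delete=False)
try:
    tf.write(contents)
    tf.close()  # release the handle so another program can open the file
    # An external viewer would be launched at this point, for example
    # subprocess.call(['notepad.exe', tf.name]) on Windows (illustrative only).
    with open(tf.name, "rb") as reopened:
        read_back = reopened.read()
finally:
    os.unlink(tf.name)  # manual cleanup, since delete=False was used
```

Waiting for the viewer to exit before unlinking matters; a blocking call such as subprocess.call does that, whereas 'start' returns immediately.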
You can also use it with a context manager so that the file will be closed/deleted when it goes out of scope. It will also be cleaned up if the code in the context manager raises.
import tempfile
with tempfile.NamedTemporaryFile() as temp:
    temp.write('Some data')
    temp.flush()
    # do something interesting with temp before it is destroyed
Here is a useful context manager for this.
(In my opinion, this functionality should be part of the Python standard library.)
# python2 or python3
import contextlib
import os

@contextlib.contextmanager
def temporary_filename(suffix=None):
    """Context that introduces a temporary file.

    Creates a temporary file, yields its name, and upon context exit, deletes it.
    (In contrast, tempfile.NamedTemporaryFile() provides a 'file' object and
    deletes the file as soon as that file object is closed, so the temporary file
    cannot be safely re-opened by another library or process.)

    Args:
      suffix: desired filename extension (e.g. '.mp4').

    Yields:
      The name of the temporary file.
    """
    import tempfile
    try:
        f = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
        tmp_name = f.name
        f.close()
        yield tmp_name
    finally:
        os.unlink(tmp_name)

# Example:
with temporary_filename() as filename:
    os.system('echo Hello >' + filename)
    assert 6 <= os.path.getsize(filename) <= 8 # depending on text EOL
assert not os.path.exists(filename)