Return file from python module - python

Edit: How to return/serve a file from a python controller (back end) over a web server, with the file_name? As suggested by @JV.

You can either pass back a reference to the file itself i.e. the full path to the file. Then you can open the file or otherwise manipulate it.
Or, the more normal case is to pass back the file handle, and, use the standard read/write operations on the file handle.
It is not recommended to pass back the actual data, as files can be arbitrarily large and the program could run out of memory.
In your case, you probably want to return a tuple containing the open file handle, the file name and any other meta data you are interested in.
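As a minimal sketch of that last suggestion (the helper name is illustrative, not from any particular framework), a controller could hand back the open handle plus the metadata the caller usually wants:

```python
import os

def open_for_response(path):
    """Illustrative helper: return an open handle plus metadata,
    rather than the file's whole contents."""
    handle = open(path, "rb")
    meta = {"name": os.path.basename(path), "size": os.path.getsize(path)}
    return handle, meta
```

The caller can then stream from `handle` in chunks and close it when done, so memory use stays bounded regardless of file size.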

Fully supported in CherryPy using
from cherrypy.lib.static import serve_file
As documented in the CherryPy docs - FileDownload:
import glob
import os.path

import cherrypy
from cherrypy.lib.static import serve_file

class Root:
    def index(self, directory="."):
        html = """<html><body><h2>Here are the files in the selected directory:</h2>
        <a href="index?directory=%s">Up</a><br />
        """ % os.path.dirname(os.path.abspath(directory))
        for filename in glob.glob(directory + '/*'):
            absPath = os.path.abspath(filename)
            if os.path.isdir(absPath):
                html += '<a href="index?directory=' + absPath + '">' + os.path.basename(filename) + "</a> <br />"
            else:
                html += '<a href="/download/?filepath=' + absPath + '">' + os.path.basename(filename) + "</a> <br />"
        html += """</body></html>"""
        return html
    index.exposed = True

class Download:
    def index(self, filepath):
        return serve_file(filepath, "application/x-download", "attachment")
    index.exposed = True

if __name__ == '__main__':
    root = Root()
    root.download = Download()
    cherrypy.quickstart(root)

For information on MIME types (which are how downloads happen), start here: Properly Configure Server MIME Types.
For information on CherryPy, look at the attributes of a Response object. You can set the content type of the response. Also, you can use tools.response_headers to set the content type.
And, of course, there's an example of File Download.
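For a rough idea of what an attachment response boils down to, here is a generic sketch (not CherryPy's internals; the helper name is made up) of the headers a download like `serve_file` produces:

```python
import mimetypes
import os

def download_headers(path):
    # Guess the MIME type from the extension, falling back to the generic
    # "force a download" type used in the example above.
    ctype = mimetypes.guess_type(path)[0] or "application/x-download"
    return [
        ("Content-Type", ctype),
        ("Content-Disposition",
         'attachment; filename="%s"' % os.path.basename(path)),
        ("Content-Length", str(os.path.getsize(path))),
    ]
```

The `Content-Disposition: attachment` header is what makes the browser download rather than display the file.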

Related

Take uploaded files on plone and download them via a python script?

I created a document site on Plone from which file uploads can be made. I saw that Plone saves them in the filesystem in the form of a blob; now I need to fetch them with a python script that will process the downloaded pdfs with an OCR. Does anyone have any idea how to do it? Thank you
Not sure how to extract PDFs from BLOB-storage or if it's possible at all, but you can extract them from a running Plone-site (e.g. executing the script via a browser-view):
import os
from Products.CMFCore.utils import getToolByName

def isPdf(search_result):
    """Check mime_type for Plone >= 5.1, otherwise check file-extension."""
    if mimeTypeIsPdf(search_result) or search_result.id.endswith('.pdf'):
        return True
    return False

def mimeTypeIsPdf(search_result):
    """
    Plone 5.1 introduced the mime_type-attribute on files.
    Try to get it; if it doesn't exist, fail silently.
    Return True if mime_type exists and is PDF, otherwise False.
    """
    try:
        mime_type = search_result.mime_type
        if mime_type == 'application/pdf':
            return True
    except AttributeError:
        pass
    return False

def exportPdfFiles(context, export_path):
    """
    Get all PDF-files of the site and write them to export_path on the filesystem.
    Preserve the folder-structure of the site.
    """
    catalog = getToolByName(context, 'portal_catalog')
    search_results = catalog(portal_type='File', Language='all')
    for search_result in search_results:
        # For each PDF-file:
        if isPdf(search_result):
            file_path = export_path + search_result.getPath()
            file_content = search_result.getObject().data
            parent_path = '/'.join(file_path.split('/')[:-1])
            # Create missing directories on the fly:
            if not os.path.exists(parent_path):
                os.makedirs(parent_path)
            # Write PDF:
            with open(file_path, 'w') as fil:
                fil.write(file_content)
            print 'Wrote ' + file_path
    print 'Finished exporting PDF-files to ' + export_path
The example keeps the folder-structure of the Plone-site in the export-directory. If you want them flat in one directory, a handler for duplicate file-names is needed.
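Such a duplicate handler could look like the sketch below (the helper name is mine, and it is written for Python 3, unlike the Plone script above): append a counter to the stem until the name is free in the target directory.

```python
import os

def unique_name(directory, filename):
    """Illustrative duplicate handler for a flat export: append a counter
    to the stem until the name is unused in the target directory."""
    base, ext = os.path.splitext(filename)
    candidate, n = filename, 1
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = "%s-%d%s" % (base, n, ext)
        n += 1
    return candidate
```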

Python Import Statement bug?

**I solved this below**: I think it may be helpful to others in the future, so I'm keeping my question up instead of taking it down. It's a Python-vs-other-languages nested file import issue. However, if anyone understands the intricacies of why this is so in Python, an explanatory answer would be greatly appreciated.
I had my code running fine with a file directory setup like this:
sniffer //folder
-__init__.py
-Sniffer.py
-database.py
I switched it to:
Main
-snifferLaunch.py
-flashy
--sniffer
---Sniffer.py
---database.py
In theory if I change the imports to find the folders it should still run the same way...
I was under the impression that a python file could be imported even if it was nested. For example, `import Sniffer` in snifferLaunch should go through each folder and try to find a Sniffer.py file.
I found this to be false, however; did I misunderstand this? So I tried looking at an example which imports files like this:
import flashy.sniffer.Sniffer as Sniffer
This does import a file I believe. When I run it it traces out an error on launch however:
Traceback (most recent call last):
File "snifferLaunch.py", line 19, in <module>
import flashy.sniffer.Sniffer
File "/Users/tai/Desktop/FlashY/flashy/sniffer/__init__.py", line 110, in <module>
File "/Users/tai/Desktop/FlashY/flashy/sniffer/__init__.py", line 107, in forInFile
File "/Users/tai/Desktop/FlashY/flashy/sniffer/__init__.py", line 98, in runFlashY
File "/Users/tai/Desktop/FlashY/flashy/sniffer/__init__.py", line 89, in db
AttributeError: 'module' object has no attribute 'getDecompiledFiles'
This would normally cause me to go look for a getDecompiledFiles function. The problem is that nowhere in the code is there a getDecompiledFiles function; there is a get_Decompiled_Files function.
My code looks something like this (non essential parts removed). Do you see my bug? I searched the entire project and could not find a getDecompiledFiles function anywhere. I don't know why it is expecting to have an attribute of this...
snifferLaunch:
import flashy.sniffer.Sniffer as Sniffer
import flashy.sniffer.database as database
import flashy.sniffer.cleaner as cleaner

def open_websites(line):
    # Opens a list of websites from the local file "urlIn.txt" and runs the Sniffer on them.
    # It retrieves the swfs from each url and stores them in the local
    # out/"the modified url"/"hashed swf.swf"; the file contains the decompiled swf.
    print("opening websites")
    newSwfFiles = []
    # reads in all of the lines in urlIn.txt
    # for line in urlsToRead:
    if line[0] != "#":
        newLine = cleaner.remove_front(line)
        # note the line[:9] is done to avoid the http// which creates an additional file
        # to go into. The remaining part of the url is still unique.
        outFileDirectory = decSwfsFolder + "/" + newLine
        cleaner.check_or_create_dir(outFileDirectory)
        try:
            newSwfFiles = Sniffer.open_url(line, [])
        except:
            print " Sniffer.openURL failed"
            pass
        # for all of the files there it runs jpex on them. (in the future this will delete
        # the file after jpex runs so we don't run jpex more than necessary)
        for location in newSwfFiles:
            cleaner.check_or_create_dir(outFileDirectory + "/" + location)
            # creates the command for jpex flash decompiler:
            # the command + file to save into + location of the swf to decompile
            newCommand = javaCommand + "/" + newLine + "/" + location + "/ " + swfLoc + "/" + location
            os.system(newCommand)
            print("+++this is the command: " + newCommand + "\n")
            # move the swf into a new swf file for db storage
            oldLocation = swfFolder + location
            newLocation = decSwfsFolder + "/" + newLine + "/" + location + "/" + "theSwf" + "/"
            cleaner.check_or_create_dir(newLocation)
            if os.path.exists(oldLocation):
                # if the file already exists at that location do not move it,
                # simply delete the duplicate
                if os.path.exists(newLocation + "/" + location):
                    os.remove(oldLocation)
                else:
                    shutil.move(swfFolder + location, newLocation)
        if cleanup:
            cleaner.cleanSwf()
    # newSwfFiles has the directory file location of each new added file: "directory/fileHash.swf"

def db():
    database.get_decompiled_files()

def run_flashY(line):
    # Run FlashY, a program that decompiles all of the swfs found at urls defined in urlIn.txt.
    # Each decompiled file will be stored in the PaperG Amazon S3 bucket: decompiled_swfs.
    # run the program for each line

    # open all of the websites in the url file urlIn.txt
    open_websites(line)
    # store the decompiled swfs in the database
    db()
    # remove all files from local storage
    cleaner.clean_out()
    # kill all instances of firefox

def for_in_file():
    # run sniffer for each line in the file
    # for each url, run then kill firefox to prevent firefox buildup
    for line in urlsToRead:
        run_flashY(line)
        cleaner.kill_firefox()

# Main Functionality
if __name__ == '__main__':
    # initialize and run the program on launch
    for_in_file()
The Sniffer File:
import urllib2
from urllib2 import Request, urlopen, URLError, HTTPError
import shutil
import sys
import re
import os
import hashlib
import time
import datetime
from selenium import webdriver
import glob
import thread
import httplib
from collections import defaultdict
import cleaner
a=[];
b=[];
newSwfFiles=[];
theURL='';
curPath = os.path.dirname(os.path.realpath(__file__))
#firebug gets all network data
fireBugPath = curPath +'/firebug-1.12.8b1.xpi';
#netExport exports firebug's http archive (network req/res) in the form of a har file
netExportPath = curPath +'/netExport.xpi';
harLoc = curPath +"/har/";
swfLoc = curPath +"/swfs";
cleanThis=True
#remove har file(s) after reading them out to gather swf files
profile = webdriver.firefox.firefox_profile.FirefoxProfile();
profile.add_extension( fireBugPath);
profile.add_extension(netExportPath);
hashLib = hashlib.md5()
#firefox preferences
profile.set_preference("app.update.enabled", False)
profile.native_events_enabled = True
profile.set_preference("webdriver.log.file", curPath +"webFile.txt")
profile.set_preference("extensions.firebug.DBG_STARTER", True);
profile.set_preference("extensions.firebug.currentVersion", "1.12.8");
profile.set_preference("extensions.firebug.addonBarOpened", True);
profile.set_preference('extensions.firebug.consoles.enableSite', True)
profile.set_preference("extensions.firebug.console.enableSites", True);
profile.set_preference("extensions.firebug.script.enableSites", True);
profile.set_preference("extensions.firebug.net.enableSites", True);
profile.set_preference("extensions.firebug.previousPlacement", 1);
profile.set_preference("extensions.firebug.allPagesActivation", "on");
profile.set_preference("extensions.firebug.onByDefault", True);
profile.set_preference("extensions.firebug.defaultPanelName", "net");
#set net export preferences
profile.set_preference("extensions.firebug.netexport.alwaysEnableAutoExport", True);
profile.set_preference("extensions.firebug.netexport.autoExportToFile", True);
profile.set_preference("extensions.firebug.netexport.saveFiles", True);
profile.set_preference("extensions.firebug.netexport.autoExportToServer", False);
profile.set_preference("extensions.firebug.netexport.Automation", True);
profile.set_preference("extensions.firebug.netexport.showPreview", False);
profile.set_preference("extensions.firebug.netexport.pageLoadedTimeout", 15000);
profile.set_preference("extensions.firebug.netexport.timeout", 10000);
profile.set_preference("extensions.firebug.netexport.defaultLogDir",harLoc);
profile.update_preferences();
browser = webdriver.Firefox(firefox_profile=profile);
def open_url(url, s):
    # open each url, find all of the har files with them and get those files.
    theURL = url
    time.sleep(6)
    # browser = webdriver.Chrome()
    browser.get(url)  # load the url in firefox
    browser.set_page_load_timeout(30)
    time.sleep(3)  # wait for the page to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight/5);")
    time.sleep(1)  # wait for the page to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight/4);")
    time.sleep(1)  # wait for the page to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight/3);")
    time.sleep(1)  # wait for the page to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight/2);")
    time.sleep(1)  # wait for the page to load
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    searchText = ''
    time.sleep(20)  # wait for the page to load
    # print(browser.page_source)
    # close the browser and get all the swfs from the created har file.
    # uses the a & b arrays to find the swf files from generated har files
    get_swfs_from_har()
    # clean out the slashes
    clean_f_slashes()
    # get all files
    get_all_files()
    # ensure that some files were gained
    assert a != []
    assert b != []
    assert newSwfFiles != []
    # if the files (har, swf, out) should be cleaned out, do so.
    # This can be toggled for debugging.
    if cleanThis:
        cleaner.clean_har()
    return newSwfFiles
def remove_non_url(t):
    # remove matched urls that are not actually urls
    a = []
    for b in t:
        if b.lower()[:4] != "http" and b.lower()[:4] != "www.":
            if b[:2] == "//" and len(b) > 10:
                a.append(theURL + "/" + b[2:])
            else:
                while (b.lower()[:4] != "http" or b.lower()[:4] != "www." or b.lower()[:1] != "//") and len(b) > 10:
                    b = b[1:]
                if len(b) > 10:
                    if b[:1] == "//":
                        if b not in a:
                            a.append(theURL + b[2:])
                    else:
                        if b not in a:
                            a.append(b)
        else:
            if b not in a:
                a.append(b)
    return a
def get_swfs_from_har():
    # validate that the files in the har are actual swf files
    files = [f for f in os.listdir(harLoc) if re.match((theURL[7:] + '.*.har'), f)]
    for n in files:
        with open(harLoc + n, "r") as theF:
            textt = theF.read()
        swfObjects = re.findall('\{[^\{]*(?:http:\/\/|https:\/\/|www\.|\/\/)[^}]*\.swf[^}]+', textt.lower())
        # swfObjects = "".join(str(i) for i in swfObjects)
        for obj in swfObjects:
            l = []
            otherL = []
            links = re.findall('(?:http:\/\/|https:\/\/|www\.|\/\/)[^"]+', obj)
            for url in links:
                url = url[:-1]
                ending = url[-6:]
                if ".swf" in ending:
                    l.append(url)
                elif "." not in ending:
                    otherL.append(url)
            for c in l:
                if c not in a and len(c) > 20:
                    a.append(c)
            if len(otherL) > 0:
                theMostLikelyLink = otherL[0]
                b.append(theMostLikelyLink)
                # adds the 1st link after the swf
                otherL.remove(theMostLikelyLink)
            else:
                b.append(None)
def clean_f_slashes():
    # remove unrelated characters from swfs
    for x in a:
        newS = ''
        if ',' in x or ';' in x or '\\' in x:
            for d in x:
                if d != '\\' and d != ',' and d != ';':
                    newS += d
        else:
            newS = x
        if "http" not in newS.lower():
            if "www" in newS:
                newS = "http://" + newS
            else:
                newS = "http://www." + newS
        while newS[:3] != "htt":
            newS = newS[1:]
        a.remove(x)
        if len(newS) > 15:
            a.append(newS)
def get_all_files():
    # get all of the files from the array of valid swfs
    os.chdir(swfLoc)
    for openUrl in a:
        place = a.index(openUrl)
        try:
            req = Request(openUrl)
            response = urlopen(req)
            fData = urllib2.urlopen(openUrl)
            iText = fData.read()
            # get the hex hash of the file
            hashLib.update(iText)
            hashV = hashLib.hexdigest() + ".swf"
            outUrl = get_redirected_url(b[place])
            # check if file already exists; if it does, do not add a duplicate
            theFile = [f for f in os.listdir(swfLoc) if re.match((hashV), f)]
            if hashV not in theFile:
                lFile = open(outUrl + "," + hashV, "w")
                lFile.write(iText)
                lFile.close()
        # except and then ignore invalid urls.
        except:
            pass
    # Remove all files less than 8kb; anything smaller is unlikely to be an
    # advertisement. Most flash ads seen so far are 25kb or larger.
    sFiles = [f for f in os.listdir(swfLoc)]
    for filenames in sFiles:
        sizeF = os.path.getsize(filenames)
        # if the file is smaller, remove it
        if sizeF < 8000:
            cleaner.remove_file(filenames)
        else:
            newSwfFiles.append(filenames)
def x_str(s):
    # check if a unicode expression exists and convert it to a string
    if s is None:
        return ''
    return str(s)

def get_redirected_url(s):
    # get the url that another url will redirect to
    if s is None:
        return ""
    if ".macromedia" in s:
        return ""
    browser.get(s)
    time.sleep(20)
    theredirectedurl = cleaner.removeFront(browser.current_url)
    aUrl = re.findall("[^/]+", theredirectedurl)[0].encode('ascii', 'ignore')
    return aUrl
Interesting... so I actually realized I was going about it wrong.
I still don't know why it was expecting a function that didn't exist but I do have a guess.
I had pulled the __init__.py file to use as the snifferLaunch file. This was due to my original misunderstanding of __init__.py and assuming it was similar to a main in other languages.
I believe the __init__.pyc file was holding an old function that had been outdated. Essentially I believe there was a file that should never have been run, it was outdated and somehow getting called. It was the only file that existed that had that function in it, I overlooked it because I thought it shouldn't be called.
The solution is the following, and the bug was caused by my misuse of __init__.
I changed my import statements:
from flashy.sniffer import Sniffer
import flashy.sniffer.database as database
import flashy.sniffer.cleaner as cleaner
I created new blank __init__.py, and __init__.pyc files in flashy/sniffer/.
This prevented the false expectation of getDecompiledFiles and allowed the code to run. I was getting a "cannot find this file" error because the directory wasn't correctly being identified as a module. Additional information on this would be appreciated if anyone can explain what was going on. I thought you could run a python file without an __init__.py file; however, when it is nested in other folders, it appears it must be opened as a python module.
My file structure looks like this now:
Main
-snifferLaunch.py //with changed import statements
-flashy
--sniffer
---Sniffer.py
---database.py
---__init__.py //blank
---__init__.pyc // blank
It appears to be a Python-vs-other-languages issue. Has anyone else experienced this?
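The package behaviour can be reproduced in isolation. The sketch below (paths are throwaway temp directories, not the project above) builds the same `flashy/sniffer` layout at runtime and shows the import working once both `__init__.py` files exist:

```python
import os
import sys
import tempfile

# Build a tiny package tree: root/flashy/sniffer/Sniffer.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "flashy", "sniffer")
os.makedirs(pkg)

# Without these two markers, Python 2 refuses the import outright
# (Python 3.3+ would treat the directories as namespace packages instead).
open(os.path.join(root, "flashy", "__init__.py"), "w").close()
open(os.path.join(pkg, "__init__.py"), "w").close()

with open(os.path.join(pkg, "Sniffer.py"), "w") as f:
    f.write("def open_url(url, seen):\n    return [url]\n")

sys.path.insert(0, root)
from flashy.sniffer import Sniffer

print(Sniffer.open_url("http://example.com", []))
```

Note that nothing in `flashy/__init__.py` needs to run for the import to work; the files only mark the directories as packages.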

How to Display Image with web.py

I am trying to let a user upload an image, save the image to disk, and then have it display on a webpage, but I can't get the image to display properly. Here is my bin/app.py:
import web

urls = (
    '/hello', 'index'
)
app = web.application(urls, globals())
render = web.template.render('templates/', base="layout")

class index:
    def GET(self):
        return render.hello_form()

    def POST(self):
        form = web.input(greet="Hello", name="Nobody", datafile={})
        greeting = "%s, %s" % (form.greet, form.name)
        filedir = 'absolute/path/to/directory'
        filename = None
        if form.datafile:
            # replaces the windows-style slashes with linux ones.
            filepath = form.datafile.filename.replace('\\', '/')
            # splits the path and chooses the last part (the filename with extension)
            filename = filepath.split('/')[-1]
            # creates the file where the uploaded file should be stored
            fout = open(filedir + '/' + filename, 'w')
            # writes the uploaded file to the newly created file.
            fout.write(form.datafile.file.read())
            # closes the file, upload complete.
            fout.close()
            filename = filedir + "/" + filename
        return render.index(greeting, filename)

if __name__ == "__main__":
    app.run()
and here is templates/index.html:
$def with (greeting, datafile)

$if greeting:
    I just wanted to say <em style="color: green; font-size: 2em;">$greeting</em>
$else:
    <em>Hello</em>, world!
<br>
$if datafile:
    <img src=$datafile alt="your picture">
<br>
Go Back
When I do this, I get a broken link for the image. How do I get the image to display properly? Ideally, I wouldn't have to read from disk to display it, although I'm not sure if that's possible. Also, is there a way to write the file to the relative path, instead of the absolute path?
You can also insert a path to all images in a folder by adding an entry to your URL.
URL = ('/hello', 'Index',
       '/hello/image/(.*)', 'ImageDisplay'
      )
...
class ImageDisplay(object):
    def GET(self, fileName):
        imageBinary = open("../relative/path/from/YourApp/" + fileName, 'rb').read()
        return imageBinary
Note the ../YourApp, not ./YourApp: it looks one directory up from where your program is. Now, in the html, you can use
<img src="/hello/image/$datafile" alt="your picture">
I would recommend using with (or at least try/finally) around the imageBinary = open(...) line.
Let me know if more info is needed. This is my first response.
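Applying that suggestion, the handler might read as follows (a sketch; `image_dir` is an illustrative attribute for testability, not part of web.py):

```python
import os

class ImageDisplay(object):
    image_dir = "static"  # illustrative location for the saved uploads

    def GET(self, fileName):
        # "with" closes the handle even if the read raises.
        with open(os.path.join(self.image_dir, fileName), "rb") as f:
            return f.read()
```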
Sorry to ask a question in a response, but is there a way to use a regular expression, like (.*\.jpg), in place of the (.*) I have in the URL definition?
web.py doesn't automatically serve all of the files from the directory your application is running in — if it did, anyone would be able to read your application's source code. It does, however, have a directory it serves files out of: static.
To answer your other question: yes, there is a way to avoid using an absolute path: give it a relative path!
Here's how your code might look afterwards:
filename = form.datafile.filename.replace('\\', '/').split('/')[-1]
# It might be a good idea to sanitize filename further.

# A with statement ensures that the file will be closed even if an
# exception is thrown.
with open(os.path.join('static', filename), 'wb') as f:
    # shutil.copyfileobj copies the file in chunks, so it will still work
    # if the file is too large to fit into memory
    shutil.copyfileobj(form.datafile.file, f)
Do omit the filename = filedir + "/" + filename line. Your template should not include the absolute path; it needs the static/ prefix, no more, no less:
<img src="static/$datafile" alt="your picture">
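On the "sanitize filename further" note in the comment above, here is a hedged sketch of what that could mean (the helper and its whitelist are illustrative, not web.py's API): keep a conservative character set and refuse names that collapse to nothing.

```python
import re

def sanitize_filename(name):
    # Replace anything outside a conservative whitelist.
    cleaned = re.sub(r'[^A-Za-z0-9._-]', '_', name)
    # Refuse names that are empty or dot/underscore-only after cleaning
    # (catches "", ".", ".." and similar path tricks).
    if not cleaned.strip('._'):
        raise ValueError("unusable filename: %r" % name)
    return cleaned
```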

How does cgi.FieldStorage store files?

So I've been playing around with raw WSGI, cgi.FieldStorage and file uploads. And I just can't understand how it deals with file uploads.
At first it seemed that it just stores the whole file in memory. And I thought hm, that should be easy to test - a big file should clog up the memory!.. And it didn't. Still, when I request the file, it's a string, not an iterator, file object or anything.
I've tried reading the cgi module's source and found some things about temporary files, but it returns a freaking string, not a file(-like) object! So... how does it fscking work?!
Here's the code I've used:
import cgi
from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/html')])
    output = """
    <form action="" method="post" enctype="multipart/form-data">
        <input type="file" name="failas" />
        <input type="submit" value="Varom" />
    </form>
    """
    fs = cgi.FieldStorage(fp=environ['wsgi.input'], environ=environ)
    f = fs.getfirst('failas')
    print type(f)
    return output

if __name__ == '__main__':
    httpd = make_server('', 8000, app)
    print 'Serving'
    httpd.serve_forever()
Thanks in advance! :)
Inspecting the cgi module description, there is a paragraph discussing how to handle file uploads.
If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file in memory as a string. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data at leisure from the file attribute:
fileitem = form["userfile"]
if fileitem.file:
    # It's an uploaded file; count lines
    linecount = 0
    while 1:
        line = fileitem.file.readline()
        if not line: break
        linecount = linecount + 1
Regarding your example, getfirst() is just a version of getvalue().
try replacing
f = fs.getfirst('failas')
with
f = fs['failas'].file
This will return a file-like object that is readable "at leisure".
The best way is NOT to read the file at all (or even each line at a time, as gimel suggested).
You can use inheritance to extend FieldStorage and override its make_file function. make_file is called when the FieldStorage is of type file.
For your reference, default make_file looks like this:
def make_file(self, binary=None):
    """Overridable: return a readable & writable file.

    The file will be used as follows:
    - data is written to it
    - seek(0)
    - data is read from it

    The 'binary' argument is unused -- the file is always opened
    in binary mode.

    This version opens a temporary file for reading and writing,
    and immediately deletes (unlinks) it.  The trick (on Unix!) is
    that the file can still be used, but it can't be opened by
    another process, and it will automatically be deleted when it
    is closed or when the current process terminates.

    If you want a more permanent file, you derive a class which
    overrides this method.  If you want a visible temporary file
    that is nevertheless automatically deleted when the script
    terminates, try defining a __del__ method in a derived class
    which unlinks the temporary files you have created.
    """
    import tempfile
    return tempfile.TemporaryFile("w+b")
Rather than creating a temporary file there, you can permanently create the file wherever you want.
Using the answer by @hasanatkazmi (utilized in a Twisted app), I got something like:
#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# -*- indent: 4 spc -*-

import sys
import cgi
import tempfile

class PredictableStorage(cgi.FieldStorage):
    def __init__(self, *args, **kwargs):
        self.path = kwargs.pop('path', None)
        cgi.FieldStorage.__init__(self, *args, **kwargs)

    def make_file(self, binary=None):
        if not self.path:
            file = tempfile.NamedTemporaryFile("w+b", delete=False)
            self.path = file.name
            return file
        return open(self.path, 'w+b')
Be warned, that the file is not always created by the cgi module. According to these cgi.py lines it will only be created if the content exceeds 1000 bytes:
if self.__file.tell() + len(line) > 1000:
    self.file = self.make_file('')
So, you have to check if the file was actually created with a query to a custom class' path field like so:
if file_field.path:
    # Use the already created file at file_field.path.
    pass
else:
    # Create a temporary named file to store the content.
    import tempfile
    with tempfile.NamedTemporaryFile("w+b", delete=False) as f:
        f.write(file_field.value)
    # You can save the 'f.name' field for later usage.
If the Content-Length is also set for the field, which seems rare, the file should also be created by cgi.
That's it. This way you can store the file predictably, decreasing the memory usage footprint of your app.

Create zip archive for instant download

In a web app I am working on, the user can create a zip archive of a folder full of files. Here here's the code:
files = torrent[0].files
zipfile = z.ZipFile(zipname, 'w')
output = ""
for f in files:
    zipfile.write(settings.PYRAT_TRANSMISSION_DOWNLOAD_DIR + "/" + f.name, f.name)
downloadurl = settings.PYRAT_DOWNLOAD_BASE_URL + "/" + settings.PYRAT_ARCHIVE_DIR + "/" + filename
output = "<a href=\"" + downloadurl + "\">Download " + torrent_name + "</a>"
return HttpResponse(output)
But this has the nasty side effect of a long wait (10+ seconds) while the zip archive is being created. Is it possible to skip this? Instead of saving the archive to a file, is it possible to send it straight to the user?
I do believe that torrentflux provides this exact feature I am talking about: being able to zip GBs of data and have the download start within a second.
Check this Serving dynamically generated ZIP archives in Django
As mandrake says, constructor of HttpResponse accepts iterable objects.
Luckily, ZIP format is such that archive can be created in single pass, central directory record is located at the very end of file:
(Picture from Wikipedia)
And luckily, zipfile indeed doesn't do any seeks as long as you only add files.
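That layout claim is easy to verify with the standard library alone: in an archive written front-to-back, the first local file header (signature `PK\x03\x04`) precedes the end-of-central-directory record (`PK\x05\x06`), which sits at the very end:

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("file1.txt", "contents of file here")
    zf.writestr("file2.txt", "contents of file here")
blob = buf.getvalue()

# Local file headers come first; the central directory is written last.
print(blob.index(b"PK\x03\x04") < blob.rindex(b"PK\x05\x06"))  # True
```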
Here is the code I came up with. Some notes:
I'm using this code for zipping up a bunch of JPEG pictures. There is no point compressing them, I'm using ZIP only as container.
Memory usage is O(size_of_largest_file) not O(size_of_archive). And this is good enough for me: many relatively small files that add up to potentially huge archive
This code doesn't set Content-Length header, so user doesn't get nice progress indication. It should be possible to calculate this in advance if sizes of all files are known.
Serving the ZIP straight to user like this means that resume on downloads won't work.
So, here goes:
import zipfile

class ZipBuffer(object):
    """ A file-like object for zipfile.ZipFile to write into. """

    def __init__(self):
        self.data = []
        self.pos = 0

    def write(self, data):
        self.data.append(data)
        self.pos += len(data)

    def tell(self):
        # zipfile calls this so we need it
        return self.pos

    def flush(self):
        # zipfile calls this so we need it
        pass

    def get_and_clear(self):
        result = self.data
        self.data = []
        return result

def generate_zipped_stream():
    sink = ZipBuffer()
    archive = zipfile.ZipFile(sink, "w")
    for filename in ["file1.txt", "file2.txt"]:
        archive.writestr(filename, "contents of file here")
        for chunk in sink.get_and_clear():
            yield chunk

    archive.close()
    # close() generates some more data, so we yield that too
    for chunk in sink.get_and_clear():
        yield chunk

def my_django_view(request):
    response = HttpResponse(generate_zipped_stream(), mimetype="application/zip")
    response['Content-Disposition'] = 'attachment; filename=archive.zip'
    return response
Here's a simple Django view function which zips up (as an example) any readable files in /tmp and returns the zip file.
from django.http import HttpResponse
import zipfile
import os
from cStringIO import StringIO # caveats for Python 3.0 apply

def somezip(request):
    file = StringIO()
    zf = zipfile.ZipFile(file, mode='w', compression=zipfile.ZIP_DEFLATED)
    for fn in os.listdir("/tmp"):
        path = os.path.join("/tmp", fn)
        if os.path.isfile(path):
            try:
                zf.write(path)
            except IOError:
                pass
    zf.close()
    response = HttpResponse(file.getvalue(), mimetype="application/zip")
    response['Content-Disposition'] = 'attachment; filename=yourfiles.zip'
    return response
Of course this approach will only work if the zip files will conveniently fit into memory - if not, you'll have to use a disk file (which you're trying to avoid). In that case, you just replace the file = StringIO() with file = open('/path/to/yourfiles.zip', 'wb') and replace the file.getvalue() with code to read the contents of the disk file.
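One hedged middle ground for that trade-off is `tempfile.SpooledTemporaryFile`: it behaves like an in-memory buffer while the archive is small and transparently spills to disk past `max_size`, so the surrounding view code stays the same either way. The function name and size limit below are illustrative, not from the answer above:

```python
import os
import tempfile
import zipfile

def build_zip(paths, max_size=10 * 1024 * 1024):
    # Stays in memory up to max_size bytes, then spools to a real temp file.
    buf = tempfile.SpooledTemporaryFile(max_size=max_size)
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            if os.path.isfile(path):
                zf.write(path, arcname=os.path.basename(path))
    buf.seek(0)  # rewind so the response can read from the start
    return buf
```

A Django view could then pass `build_zip(...)` (or its `.read()` output) to the response instead of `file.getvalue()`.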
Does the zip library you are using allow for output to a stream? You could stream directly to the user instead of temporarily writing to a zip file and THEN streaming to the user.
It is possible to pass an iterator to the constructor of a HttpResponse (see docs). That would allow you to create a custom iterator that generates data as it is being requested. However I don't think that will work with a zip (you would have to send partial zip as it is being created).
The proper way, I think, would be to create the files offline, in a separate process. The user could then monitor the progress and then download the file when its ready (possibly by using the iterator method described above). This would be similar what sites like youtube use when you upload a file and wait for it to be processed.
