use django to serve downloading big zip file with some data appended

I have a view snippet like the one below, which gets a zip filename from a request, and I want to append a string, sign, after the end of the zip file:
    @require_GET
    def download(request):
        # ... skip
        response = HttpResponse(readFile(abs_path, sign), content_type='application/zip')
        response['Content-Length'] = os.path.getsize(abs_path) + len(sign)
        response['Content-Disposition'] = 'attachment; filename=%s' % filename
        return response
and the readFile function is as below:
    def readFile(fn, sign, buf_size=1024<<5):
        f = open(fn, "rb")
        logger.debug("started reading %s" % fn)
        while True:
            c = f.read(buf_size)
            if c:
                yield c
            else:
                break
        logger.debug("finished reading %s" % fn)
        f.close()
        yield sign
It works fine under runserver, but fails on big zip files when I use uwsgi + nginx or apache + mod_wsgi.
It seems to time out because reading a big file takes too long.
I don't understand why, even though I use yield, the browser only starts downloading after the whole file has been read. (I can see the browser waiting until the "finished reading %s" log line appears.)
Shouldn't the download start right after the first chunk is read?
Is there a better way to serve a file download when I need to append a dynamic string after the file?

Django doesn't allow streaming responses by default so it buffers the entire response. If it didn't, middlewares couldn't function the way they do right now.
To get the behaviour you are looking for you need to use the StreamingHttpResponse instead.
Usage example from the docs:
    import csv
    # On Python 2, also: from django.utils.six.moves import range
    # (django.utils.six was removed in Django 3.0)
    from django.http import StreamingHttpResponse
    class Echo(object):
        """An object that implements just the write method of the file-like
        interface.
        """
        def write(self, value):
            """Write the value by returning it, instead of storing in a buffer."""
            return value
    def some_streaming_csv_view(request):
        """A view that streams a large CSV file."""
        # Generate a sequence of rows. The range is based on the maximum number of
        # rows that can be handled by a single sheet in most spreadsheet
        # applications.
        rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
        pseudo_buffer = Echo()
        writer = csv.writer(pseudo_buffer)
        response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                         content_type="text/csv")
        response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
        return response

This is a use case for StreamingHttpResponse instead of HttpResponse.
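A minimal sketch of how the generator from the question could be reshaped for streaming, assuming the appended signature is a bytes object. The helper name iter_file_with_suffix and the demo values are hypothetical; in a Django view the generator would be passed to StreamingHttpResponse rather than HttpResponse, with Content-Length set to the file size plus the signature length, as in the question.

```python
import os
import tempfile


def iter_file_with_suffix(path, suffix, chunk_size=32 * 1024):
    """Yield the file at `path` chunk by chunk, then yield `suffix` (bytes)."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
    # The file is already closed here; the signature goes out last.
    yield suffix


# Demo with a throwaway file (hypothetical data, standard library only).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 100000)
demo_path = tmp.name

chunks = list(iter_file_with_suffix(demo_path, b"SIGN"))
total_bytes = sum(len(c) for c in chunks)
expected_length = os.path.getsize(demo_path) + len(b"SIGN")
os.remove(demo_path)
```

Because each chunk is yielded as soon as it is read, a streaming response can start sending before the whole file has been read.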

It's better to use FileResponse, a subclass of StreamingHttpResponse optimized for binary files. It uses wsgi.file_wrapper if provided by the WSGI server; otherwise it streams the file out in small chunks.
    import os
    from django.http import FileResponse
    def download_file(request):
        _file = '/folder/my_file.zip'
        filename = os.path.basename(_file)
        # FileResponse accepts an open binary file directly, so no FileWrapper is needed
        response = FileResponse(open(_file, 'rb'), content_type='application/x-zip-compressed')
        response['Content-Disposition'] = "attachment; filename=%s" % filename
        return response

Related

How to save image which sent via flask send_file

I have this code for the server:
    @app.route('/get', methods=['GET'])
    def get():
        return send_file("token.jpg", attachment_filename=("token.jpg"), mimetype='image/jpg')
and this code for getting the response:
    r = requests.get(url + '/get')
I need to save the file from the response to the hard drive, but I can't use r.files. What do I need to do in this situation?

Assuming the GET request is valid, you can use Python's built-in function open to open a file in binary mode and write the returned content to disk. Example below.
file_content = requests.get('http://yoururl/get')
save_file = open("sample_image.png", "wb")
save_file.write(file_content.content)
save_file.close()
As you can see, to write the image to disk, we use open, and write the returned content to 'sample_image.png'. Since your server-side code seems to be returning only one file, the example above should work for you.

You can set the stream parameter and extract the filename from the HTTP headers. Then the raw data from the undecoded body can be read and saved chunk by chunk.
    import os
    import re
    import requests
    resp = requests.get('http://127.0.0.1:5000/get', stream=True)
    name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
    dest = os.path.join(os.path.expanduser('~'), name)
    with open(dest, 'wb') as fp:
        while True:
            chunk = resp.raw.read(1024)
            if not chunk:
                break
            fp.write(chunk)
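One caveat with the regular expression above: it keeps any quotes around the filename and trusts whatever path the server sends. A hedged alternative using only the standard library is sketched below; email.message.Message parses the header parameters and os.path.basename drops directory parts. The function name and the default value are hypothetical.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="download.bin"):
    """Extract a safe file name from a Content-Disposition header value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # handles quoted and unquoted filename= values
    if not name:
        return default
    # Keep only the last path component so the server cannot choose the directory.
    return os.path.basename(name.replace("\\", "/")) or default
```

The result could then replace name in the os.path.join call, for example filename_from_content_disposition(resp.headers.get('Content-Disposition', '')).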

Delete file when file download is complete on Python x Django [duplicate]

I'm using the following django/python code to stream a file to the browser:
wrapper = FileWrapper(file(path))
response = HttpResponse(wrapper, content_type='text/plain')
response['Content-Length'] = os.path.getsize(path)
return response
Is there a way to delete the file after the response is returned? Using a callback function or something?
I could just make a cron to delete all tmp files, but it would be neater if I could stream files and delete them as well from the same request.

You can use a NamedTemporaryFile:
    import os
    from wsgiref.util import FileWrapper
    from django.core.files.temp import NamedTemporaryFile
    from django.http import HttpResponse
    def send_file(request):
        newfile = NamedTemporaryFile(suffix='.txt')
        # save your data to newfile.name
        wrapper = FileWrapper(newfile)
        # mime_type is assumed to be defined elsewhere
        response = HttpResponse(wrapper, content_type=mime_type)
        response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(newfile.name)
        response['Content-Length'] = os.path.getsize(newfile.name)
        return response
The temporary file is deleted once the newfile object is closed or garbage-collected.
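The deletion behaviour can be checked in isolation with the standard-library tempfile.NamedTemporaryFile, which the Django class mirrors. This is only a small sketch with made-up data; the variable names are hypothetical.

```python
import os
from tempfile import NamedTemporaryFile

tmp = NamedTemporaryFile(suffix=".txt")  # delete=True is the default
tmp.write(b"report data")
tmp.flush()
tmp_path = tmp.name

existed_while_open = os.path.exists(tmp_path)
tmp.close()  # closing the object removes the file from disk
exists_after_close = os.path.exists(tmp_path)
```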

For future reference:
I just had a case in which I couldn't use temp files for downloads.
But I still needed to delete them afterwards, so here is how I did it (I really didn't want to rely on cron jobs or celery or wossnames; it's a very small system and I wanted it to stay that way).
    def plug_cleaning_into_stream(stream, filename):
        closer = getattr(stream, 'close')
        # define a new function that still uses the old one
        def new_closer():
            closer()
            os.remove(filename)
            # any cleaning you need added as well
        # substitute it for the old close() function
        setattr(stream, 'close', new_closer)
and then I just took the stream used for the response and plugged into it.
    def send_file(request, filename):
        with io.open(filename, 'rb') as ready_file:
            plug_cleaning_into_stream(ready_file, filename)
            response = HttpResponse(ready_file.read(), content_type='application/force-download')
            # here all the rest of the headers settings
            # ...
            return response
I know this is quick and dirty, but it works. I doubt it would be productive for a server with thousands of requests a second, but that's not my case here (a few dozen a minute at most).
EDIT: Forgot to mention that I was dealing with very, very big files that could not fit in memory during the download. That is why I am using a BufferedReader (which is what is underneath io.open()).
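Note that ready_file.read() in the snippet above still loads the whole file into memory. A hedged alternative is a generator that yields the file in chunks and removes it in a finally clause; it assumes whatever consumes the iterator either exhausts it or closes it, and the function name and chunk size are hypothetical.

```python
import os
import tempfile


def stream_then_delete(path, chunk_size=64 * 1024):
    """Yield the file in chunks and delete it once iteration ends or is closed."""
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs on normal exhaustion and when the generator is closed early.
        os.remove(path)


def _make_demo_file(payload):
    # Helper for the demo only: writes payload to a throwaway file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    return tmp.name


payload = b"a" * 200000
full_path = _make_demo_file(payload)
streamed = b"".join(stream_then_delete(full_path))
gone_after_full_read = not os.path.exists(full_path)

partial_path = _make_demo_file(payload)
gen = stream_then_delete(partial_path)
next(gen)    # client reads one chunk...
gen.close()  # ...then the download is aborted
gone_after_abort = not os.path.exists(partial_path)
```

If the generator is never started at all, the finally clause does not run, so a periodic cleanup is still a reasonable safety net.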

Mostly, we use periodic cron jobs for this.
Django already has one cron job to clean up lost sessions. And you're already running it, right?
See http://docs.djangoproject.com/en/dev/topics/http/sessions/#clearing-the-session-table
You want another command just like this one, in your application, that cleans up old files.
See this http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
Also, you may not really be sending this file from Django. Sometimes you can get better performance by creating the file in a directory used by Apache and redirecting to a URL so the file can be served by Apache for you. Sometimes this is faster. It doesn't handle the cleanup any better, however.
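The core of such a cleanup command could look like the sketch below, which removes files older than a given age from one directory. The function name, the one-hour threshold, and the demo file names are assumptions for illustration; wiring it into a management command is left out.

```python
import os
import tempfile
import time


def remove_stale_files(directory, max_age_seconds):
    """Delete regular files in `directory` last modified before the cutoff."""
    cutoff = time.time() - max_age_seconds
    removed = []
    with os.scandir(directory) as it:
        entries = list(it)  # snapshot first, then delete
    for entry in entries:
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return removed


# Demo: one file backdated by two hours, one fresh file.
demo_dir = tempfile.mkdtemp()
old_path = os.path.join(demo_dir, "old.tmp")
new_path = os.path.join(demo_dir, "new.tmp")
for p in (old_path, new_path):
    with open(p, "wb") as f:
        f.write(b"data")
two_hours_ago = time.time() - 2 * 3600
os.utime(old_path, (two_hours_ago, two_hours_ago))

removed = remove_stale_files(demo_dir, max_age_seconds=3600)
remaining = sorted(os.listdir(demo_dir))
```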

One way would be to add a view to delete this file and call it from the client side using an asynchronous call (XMLHttpRequest). A variant of this would involve reporting back from the client on success so that the server can mark this file for deletion and have a periodic job clean it up.

This is just using the regular Python approach (very simple example):
    import os
    from django.http import HttpResponse
    # something generates a file at filepath
    # read the whole file into memory
    with open(filepath, "rb") as fid:
        filedata = fid.read()
    # remove the file (os.remove avoids spawning a shell for "rm")
    os.remove(filepath)
    # make response
    response = HttpResponse(filedata, content_type="text/plain")
    return response

Python 3.7, Django 2.2.5
    from tempfile import NamedTemporaryFile
    from django.http import HttpResponse
    with NamedTemporaryFile(suffix='.csv', mode='r+', encoding='utf8') as f:
        f.write('\uFEFF')  # BOM
        f.write('sth you want')
        # ref: https://docs.python.org/3/library/tempfile.html#examples
        f.seek(0)
        data = f.read()
    response = HttpResponse(data, content_type="text/plain")
    response['Content-Disposition'] = 'inline; filename=export.csv'

Best way to create a download link for a file in Flask?

In my project, when a user clicks a link, an AJAX request sends the information required to create a CSV. The CSV takes a long time to generate and so I want to be able to include a download link for the generated CSV in the AJAX response. Is this possible?
Most of the answers I've seen return the CSV in the following way:
    return Response(
        csv,
        mimetype="text/csv",
        headers={"Content-disposition":
                 "attachment; filename=myplot.csv"})
However, I don't think this is compatible with the AJAX response I'm sending with:
return render_json(200, {'data': params})
Ideally, I'd like to be able to send the download link in the params dict. But I'm also not sure if this is secure. How is this problem typically solved?

I think one solution may be the futures library (pip install futures on Python 2; it is part of the standard library on Python 3). The first endpoint can queue up the task and then send the file name back, and then another endpoint can be used to retrieve the file. I also included gzip because it might be a good idea if you are sending larger files. I think more robust solutions use Celery or RabbitMQ or something along those lines. However, this is a simple solution that should accomplish what you are asking for.
    from flask import Flask, jsonify, Response
    from uuid import uuid4
    from concurrent.futures import ThreadPoolExecutor
    import time
    import os
    import gzip
    app = Flask(__name__)
    # Global variables used by the thread executor, and the thread executor itself
    NUM_THREADS = 5
    EXECUTOR = ThreadPoolExecutor(NUM_THREADS)
    OUTPUT_DIR = os.path.dirname(os.path.abspath(__file__))
    # this is your long running processing function
    # takes in your arguments from the /queue-task endpoint
    def a_long_running_task(*args):
        time_to_wait, output_file_name = int(args[0][0]), args[0][1]
        output_string = 'sleeping for {0} seconds. File: {1}'.format(time_to_wait, output_file_name)
        print(output_string)
        time.sleep(time_to_wait)
        filename = os.path.join(OUTPUT_DIR, output_file_name)
        # here we are writing to a gzipped file to save space and decrease size of file to be sent on network
        with gzip.open(filename, 'wb') as f:
            # binary mode needs bytes, so encode the text first
            f.write(output_string.encode('utf-8'))
        print('finished writing {0} after {1} seconds'.format(output_file_name, time_to_wait))
    # This is a route that starts the task and then gives them the file name for reference
    @app.route('/queue-task/<wait>')
    def queue_task(wait):
        output_file_name = str(uuid4()) + '.csv'
        EXECUTOR.submit(a_long_running_task, [wait, output_file_name])
        return jsonify({'filename': output_file_name})
    # this takes the file name and returns if exists, otherwise notifies it is not yet done
    @app.route('/getfile/<name>')
    def get_output_file(name):
        file_name = os.path.join(OUTPUT_DIR, name)
        if not os.path.isfile(file_name):
            return jsonify({"message": "still processing"})
        # read without gzip.open to keep it compressed
        with open(file_name, 'rb') as f:
            resp = Response(f.read())
        # set headers to tell encoding and to send as an attachment
        resp.headers["Content-Encoding"] = 'gzip'
        resp.headers["Content-Disposition"] = "attachment; filename={0}".format(name)
        resp.headers["Content-type"] = "text/csv"
        return resp
    if __name__ == '__main__':
        app.run()
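Checking whether the output file exists can report "done" while the file is still being written. One possible refinement, sketched here without Flask, is to keep the Future returned by submit and ask it for the job state. The names submit_job, job_status, JOBS and the status strings are hypothetical, and the in-memory dict assumes a single server process.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor

EXECUTOR = ThreadPoolExecutor(max_workers=2)
JOBS = {}  # job id -> Future (in-memory only; lost on restart)


def submit_job(func, *args):
    """Queue func(*args) and return an id the client can poll with."""
    job_id = str(uuid.uuid4())
    JOBS[job_id] = EXECUTOR.submit(func, *args)
    return job_id


def job_status(job_id):
    """Return 'unknown', 'processing', 'failed' or 'done' for a job id."""
    future = JOBS.get(job_id)
    if future is None:
        return "unknown"
    if not future.done():
        return "processing"
    return "failed" if future.exception() else "done"


# Demo: the task blocks on an event so the 'processing' state is observable.
gate = threading.Event()


def slow_task(value):
    gate.wait(timeout=5)
    return value * 2


demo_id = submit_job(slow_task, 21)
status_before = job_status(demo_id)
gate.set()
demo_result = JOBS[demo_id].result(timeout=5)
status_after = job_status(demo_id)
EXECUTOR.shutdown()
```

In the Flask app above, /queue-task would return the job id and /getfile would serve the file only once job_status reports "done".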

How to save file data from POST variable and load it back to response in Python Django?

I have the following problem: I am using Python 2.6 / Django 1.3 and I need to accept a POST variable with key 'f', which contains binary data. After that, I need to save the data to a file.
POST
T$topX$objectsX$versionY$archiverО©ҐR$0О©ҐО©ҐО©Ґull_=<---------------------- content of file -------------------->О©ҐО©Ґ_NSKeyedArchive(258:=CО©ҐО©Ґ
Code
    from django.core.files.storage import default_storage
    from django.core.files.base import ContentFile
    def save(request):
        upload_file = request.POST['f']
        save_path = default_storage.save('%s%s' % (save_dir, filename),
                                         ContentFile(upload_file))
When I open the saved file with
    nano /tmp/myfile.zip
it shows data like
T^#^#^#$^#^#^#t^#^#^#o^#^#^#p^#^#^#X^#^#^#$^#^#^#o^#^#^#b^#^#^#j^#^#^#e^#^#^#c^#^#^#t^#^#^#s^#^#^#X^#^#^#$^#^#^#v^#^#^#e^#^#^#r^#^#^#s^#^#^#i^#^#$
Once that is done, I read the saved file back:
    def read(request):
        user_file = default_storage.open(file_path).read()
        file_name = get_filename(file_path)
        response = HttpResponse(user_file, content_type = 'text/plain',
                                mimetype = 'application/force-download')
        response['Content-Disposition'] = 'attachment; filename=%s' % file_name
        response['Content-Length'] = default_storage.size(file_path)
        return response
When I write
    print user_file
it prints the correct data, but when I return an HttpResponse, the data differs from the source.

It would probably be easier, and more memory efficient, if you just save the data into a file and, like @keckse said, let a browser stream it. Django is very inefficient at streaming data. It all depends on the size of the data. If you want to stream it with Django anyway, it can be done like this:
    from django.http import HttpResponse
    import os.path
    import mimetypes
    def stream(request, document, type=None):
        doc = Document.objects.get(pk=document)
        # open in binary mode so the bytes are sent unchanged
        fsock = open(doc.file.path, "rb")
        file_name = os.path.basename(doc.file.path)
        # guess_type() returns a (type, encoding) tuple; type is None when unknown
        mime_type = mimetypes.guess_type(file_name)[0] or 'application/octet-stream'
        # mimetype= is the Django 1.3 keyword; later versions use content_type=
        response = HttpResponse(fsock, mimetype=mime_type)
        response['Content-Disposition'] = 'attachment; filename=' + file_name
        return response
In your case you might want to set the MIME type manually; you can try application/octet-stream too. The main difference is that you pass the "string" from file.read() instead of the handle to the file directly. Please note: if you use read(), you will be loading the whole file into memory.
More on passing iterators to HttpResponse. And I might be wrong, but I think you can drop the content-type.

Create zip archive for instant download

In a web app I am working on, the user can create a zip archive of a folder full of files. Here's the code:
    files = torrent[0].files
    zipfile = z.ZipFile(zipname, 'w')
    output = ""
    for f in files:
        zipfile.write(settings.PYRAT_TRANSMISSION_DOWNLOAD_DIR + "/" + f.name, f.name)
    downloadurl = settings.PYRAT_DOWNLOAD_BASE_URL + "/" + settings.PYRAT_ARCHIVE_DIR + "/" + filename
    output = "Download " + torrent_name + ""
    return HttpResponse(output)
But this has the nasty side effect of a long wait (10+ seconds) while the zip archive is being downloaded. Is it possible to skip this? Instead of saving the archive to a file, is it possible to send it straight to the user?
I do believe that torrentflux provides this exact feature I am talking about: being able to zip GBs of data and download it within a second.

Check this Serving dynamically generated ZIP archives in Django

As mandrake says, the constructor of HttpResponse accepts iterable objects.
Luckily, the ZIP format is such that an archive can be created in a single pass; the central directory record is located at the very end of the file:
(Picture from Wikipedia)
And luckily, zipfile indeed doesn't do any seeks as long as you only add files.
Here is the code I came up with. Some notes:
I'm using this code for zipping up a bunch of JPEG pictures. There is no point in compressing them; I'm using ZIP only as a container.
Memory usage is O(size_of_largest_file), not O(size_of_archive). This is good enough for me: many relatively small files that add up to a potentially huge archive.
This code doesn't set the Content-Length header, so the user doesn't get a nice progress indication. It should be possible to calculate this in advance if the sizes of all files are known.
Serving the ZIP straight to the user like this means that resuming downloads won't work.
So, here goes:
    import zipfile
    from django.http import HttpResponse
    class ZipBuffer(object):
        """ A file-like object for zipfile.ZipFile to write into. """
        def __init__(self):
            self.data = []
            self.pos = 0
        def write(self, data):
            self.data.append(data)
            self.pos += len(data)
        def tell(self):
            # zipfile calls this so we need it
            return self.pos
        def flush(self):
            # zipfile calls this so we need it
            pass
        def get_and_clear(self):
            result = self.data
            self.data = []
            return result
    def generate_zipped_stream():
        sink = ZipBuffer()
        archive = zipfile.ZipFile(sink, "w")
        for filename in ["file1.txt", "file2.txt"]:
            archive.writestr(filename, "contents of file here")
            for chunk in sink.get_and_clear():
                yield chunk
        archive.close()
        # close() generates some more data, so we yield that too
        for chunk in sink.get_and_clear():
            yield chunk
    def my_django_view(request):
        # content_type replaces the old mimetype keyword, which newer Django versions no longer accept
        response = HttpResponse(generate_zipped_stream(), content_type="application/zip")
        response['Content-Disposition'] = 'attachment; filename=archive.zip'
        return response
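To check that chunks produced this way really add up to a valid archive, the same idea can be exercised without Django. The sketch below is a Python 3 variant with hypothetical names (ChunkSink, stream_zip) and made-up member data; it joins the streamed chunks and reopens them with zipfile.

```python
import io
import zipfile


class ChunkSink(object):
    """Write-only, non-seekable sink that collects what ZipFile writes."""

    def __init__(self):
        self.chunks = []
        self.pos = 0

    def write(self, data):
        self.chunks.append(bytes(data))
        self.pos += len(data)
        return len(data)

    def tell(self):
        return self.pos

    def flush(self):
        pass

    def drain(self):
        chunks, self.chunks = self.chunks, []
        return chunks


def stream_zip(members):
    """Yield archive bytes for an iterable of (name, bytes) pairs."""
    sink = ChunkSink()
    with zipfile.ZipFile(sink, "w", zipfile.ZIP_STORED) as archive:
        for name, data in members:
            archive.writestr(name, data)
            for chunk in sink.drain():
                yield chunk
    # Leaving the with block wrote the central directory; send that too.
    for chunk in sink.drain():
        yield chunk


members = [("file1.txt", b"first file"), ("file2.txt", b"second file")]
archive_bytes = b"".join(stream_zip(members))

with zipfile.ZipFile(io.BytesIO(archive_bytes)) as check:
    names = check.namelist()
    first = check.read("file1.txt")
```

In a view, stream_zip(...) would be handed to a streaming response instead of being joined in memory.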

Here's a simple Django view function which zips up (as an example) any readable files in /tmp and returns the zip file.
    from django.http import HttpResponse
    import zipfile
    import os
    from cStringIO import StringIO # caveats for Python 3.0 apply
    def somezip(request):
        file = StringIO()
        zf = zipfile.ZipFile(file, mode='w', compression=zipfile.ZIP_DEFLATED)
        for fn in os.listdir("/tmp"):
            path = os.path.join("/tmp", fn)
            if os.path.isfile(path):
                try:
                    zf.write(path)
                except IOError:
                    pass
        zf.close()
        # content_type replaces the old mimetype keyword, which newer Django versions no longer accept
        response = HttpResponse(file.getvalue(), content_type="application/zip")
        response['Content-Disposition'] = 'attachment; filename=yourfiles.zip'
        return response
Of course this approach will only work if the zip files will conveniently fit into memory - if not, you'll have to use a disk file (which you're trying to avoid). In that case, you just replace the file = StringIO() with file = open('/path/to/yourfiles.zip', 'wb') and replace the file.getvalue() with code to read the contents of the disk file.
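On Python 3, cStringIO no longer exists and io.BytesIO plays the same role. A minimal sketch of the in-memory variant follows; the function name and the sample members are hypothetical, and the returned bytes would be what gets passed to HttpResponse.

```python
import io
import zipfile


def build_zip_in_memory(members):
    """Return the bytes of a ZIP archive built from (name, bytes) pairs."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, data in members:
            zf.writestr(name, data)
    # The archive is complete once the with block has closed the ZipFile.
    return buffer.getvalue()


payload = build_zip_in_memory([("notes.txt", b"hello"), ("data.csv", b"a,b\n1,2\n")])

with zipfile.ZipFile(io.BytesIO(payload)) as zf:
    listed = zf.namelist()
    notes = zf.read("notes.txt")
```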

Does the zip library you are using allow for output to a stream? You could stream directly to the user instead of temporarily writing to a zip file and THEN streaming it to the user.

It is possible to pass an iterator to the constructor of an HttpResponse (see docs). That would allow you to create a custom iterator that generates data as it is being requested. However, I don't think that will work with a zip (you would have to send a partial zip as it is being created).
The proper way, I think, would be to create the files offline, in a separate process. The user could then monitor the progress and download the file when it's ready (possibly by using the iterator method described above). This would be similar to what sites like YouTube do when you upload a file and wait for it to be processed.
