I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data)
tempname = str(uuid4()) + '.csv'
data['text'].to_csv('samo/static/temp/{}'.format(tempname))
return file
#projects.route('/getcsv/<file>')
def getcsv(file):
#after_this_request
def cleanup(response):
os.remove('samo/static/temp/' + file)
return response
return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows, other platforms can mark a file deleted and keep it around until it not being accessed. However, it may still be useful to only delete the file once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that's it's not being read when you delete it later. In case the file is too big to read into memory, use a generator to serve it then delete it.
#app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
path = os.path.join(current_app.instance_path, filename)
def generate():
with open(path) as f:
yield from f
os.remove(path)
r = current_app.response_class(generate(), mimetype='text/csv')
r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
return r
Related
I am trying to convert json to csv and download a file from my flask application. The function does not work correctly, I always get the same csv, even if I delete the json file. Why?
button:
Download
My method:
#app.route("/download/<file_id>")
def get_csv(file_id):
try:
file_id = f"{file_id}"
filename_jsonl = f"{file_id}.jsonl"
filename_csv = f"{file_id}.csv"
file_id = ''
with open(filename_jsonl, 'r') as f:
for line in f.read():
file_id += line
file_id = [json.loads(item + '\n}') for item in file_id.split('}\n')[0:-1]]
with open(filename_csv, 'a') as f:
writer = csv.DictWriter(f, file_id[0].keys(), delimiter=";")
writer.writeheader()
for profile in file_id:
writer.writerow(profile)
return send_from_directory(directory='', filename=filename_csv, as_attachment=True)
except FileNotFoundError:
abort(404)
The problem you are having is that the first generated file has been cached.
Official documentation says that send_from_directory() send a file from a given directory with send_file(). send_file() sets the cache_timeout option.
You must configure this option to disable caching, like this:
return send_from_directory(directory='', filename=filename_csv, as_attachment=True, cache_timeout=0)
#app.route('/download')
def download():
return send_from_directory('static', 'files/cheat_sheet.pdf')
Note: First parameter give it the directory name like static if your file is inside static (the file only could be in the project directory),
and for the second parameter write the right path of the file. The file will be automatically downloaded, if the route got download.
I'm using the following django/python code to stream a file to the browser:
wrapper = FileWrapper(file(path))
response = HttpResponse(wrapper, content_type='text/plain')
response['Content-Length'] = os.path.getsize(path)
return response
Is there a way to delete the file after the reponse is returned? Using a callback function or something?
I could just make a cron to delete all tmp files, but it would be neater if I could stream files and delete them as well from the same request.
You can use a NamedTemporaryFile:
from django.core.files.temp import NamedTemporaryFile
def send_file(request):
newfile = NamedTemporaryFile(suffix='.txt')
# save your data to newfile.name
wrapper = FileWrapper(newfile)
response = HttpResponse(wrapper, content_type=mime_type)
response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(modelfile.name)
response['Content-Length'] = os.path.getsize(modelfile.name)
return response
temporary file should be deleted once the newfile object is evicted.
For future references:
I just had the case in which I couldn't use temp files for downloads.
But I still needed to delete them after it; so here is how I did it (I really didn't want to rely on cron jobs or celery or wossnames, its a very small system and I wanted it to stay that way).
def plug_cleaning_into_stream(stream, filename):
try:
closer = getattr(stream, 'close')
#define a new function that still uses the old one
def new_closer():
closer()
os.remove(filename)
#any cleaning you need added as well
#substitute it to the old close() function
setattr(stream, 'close', new_closer)
except:
raise
and then I just took the stream used for the response and plugged into it.
def send_file(request, filename):
with io.open(filename, 'rb') as ready_file:
plug_cleaning_into_stream(ready_file, filename)
response = HttpResponse(ready_file.read(), content_type='application/force-download')
# here all the rest of the heards settings
# ...
return response
I know this is quick and dirty but it works. I doubt it would be productive for a server with thousands of requests a second, but that's not my case here (max a few dozens a minute).
EDIT: Forgot to precise that I was dealing with very very big files that could not fit in memory during the download. So that is why I am using a BufferedReader (which is what is underneath io.open())
Mostly, we use periodic cron jobs for this.
Django already has one cron job to clean up lost sessions. And you're already running it, right?
See http://docs.djangoproject.com/en/dev/topics/http/sessions/#clearing-the-session-table
You want another command just like this one, in your application, that cleans up old files.
See this http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
Also, you may not really be sending this file from Django. Sometimes you can get better performance by creating the file in a directory used by Apache and redirecting to a URL so the file can be served by Apache for you. Sometimes this is faster. It doesn't handle the cleanup any better, however.
One way would be to add a view to delete this file and call it from the client side using an asynchronous call (XMLHttpRequest). A variant of this would involve reporting back from the client on success so that the server can mark this file for deletion and have a periodic job clean it up.
This is just using the regular python approach (very simple example):
# something generates a file at filepath
from subprocess import Popen
# open file
with open(filepath, "rb") as fid:
filedata = fid.read()
# remove the file
p = Popen("rm %s" % filepath, shell=True)
# make response
response = HttpResponse(filedata, content-type="text/plain")
return response
Python 3.7 , Django 2.2.5
from tempfile import NamedTemporaryFile
from django.http import HttpResponse
with NamedTemporaryFile(suffix='.csv', mode='r+', encoding='utf8') as f:
f.write('\uFEFF') # BOM
f.write('sth you want')
# ref: https://docs.python.org/3/library/tempfile.html#examples
f.seek(0)
data=f.read()
response = HttpResponse(data, content_type="text/plain")
response['Content-Disposition'] = 'inline; filename=export.csv'
I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data)
tempname = str(uuid4()) + '.csv'
data['text'].to_csv('samo/static/temp/{}'.format(tempname))
return file
#projects.route('/getcsv/<file>')
def getcsv(file):
#after_this_request
def cleanup(response):
os.remove('samo/static/temp/' + file)
return response
return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows, other platforms can mark a file deleted and keep it around until it not being accessed. However, it may still be useful to only delete the file once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that's it's not being read when you delete it later. In case the file is too big to read into memory, use a generator to serve it then delete it.
#app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
path = os.path.join(current_app.instance_path, filename)
def generate():
with open(path) as f:
yield from f
os.remove(path)
r = current_app.response_class(generate(), mimetype='text/csv')
r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
return r
Am using Bottle to create an upload API. The script below is able to upload a file to a directory but got two issues which I need to address. One is how can I avoid loading the whole file to memory the other is how to set a maximum size for upload file?
Is it possible to continuously read the file and dump what has been read to file till the upload is complete? the upload.save(file_path, overwrite=False, chunk_size=1024) function seems to load the whole file into memory. In the tutorial, they have pointed out that using .read() is dangerous.
from bottle import Bottle, request, run, response, route, default_app, static_file
app = Bottle()
#route('/upload', method='POST')
def upload_file():
function_name = sys._getframe().f_code.co_name
try:
upload = request.files.get("upload_file")
if not upload:
return "Nothing to upload"
else:
#Get file_name and the extension
file_name, ext = os.path.splitext(upload.filename)
if ext in ('.exe', '.msi', '.py'):
return "File extension not allowed."
#Determine folder to save the upload
save_folder = "/tmp/{folder}".format(folder='external_files')
if not os.path.exists(save_folder):
os.makedirs(save_folder)
#Determine file_path
file_path = "{path}/{time_now}_{file}".\
format(path=save_folder, file=upload.filename, timestamp=time_now)
#Save the upload to file in chunks
upload.save(file_path, overwrite=False, chunk_size=1024)
return "File successfully saved {0}{1} to '{2}'.".format(file_name, ext, save_folder)
except KeyboardInterrupt:
logger.info('%s: ' %(function_name), "Someone pressed CNRL + C")
except:
logger.error('%s: ' %(function_name), exc_info=True)
print("Exception occurred111. Location: %s" %(function_name))
finally:
pass
if __name__ == '__main__':
run(host="localhost", port=8080, reloader=True, debug=True)
else:
application = default_app()
I also tried doing a file.write but same case. File is getting read to memory and it hangs the machine.
file_to_write = open("%s" %(output_file_path), "wb")
while True:
datachunk = upload.file.read(1024)
if not datachunk:
break
file_to_write.write(datachunk)
Related to this, I've seen the property MEMFILE_MAX where several SO posts claim one could set the maximum file upload size. I've tried setting it but it seems not to have any effect as all files no matter the size are going through.
Note that I want to be able to receive office document which could be plain with their extensions or zipped with a password.
Using Python3.4 and bottle 0.12.7
Basically, you want to call upload.read(1024) in a loop. Something like this (untested):
with open(file_path, 'wb') as dest:
chunk = upload.read(1024)
while chunk:
dest.write(chunk)
chunk = upload.read(1024)
(Do not call open on upload; it's already open for you.)
This SO answer includes more example sof how to read a large file without "slurping" it.
In a web app I am working on, the user can create a zip archive of a folder full of files. Here here's the code:
files = torrent[0].files
zipfile = z.ZipFile(zipname, 'w')
output = ""
for f in files:
zipfile.write(settings.PYRAT_TRANSMISSION_DOWNLOAD_DIR + "/" + f.name, f.name)
downloadurl = settings.PYRAT_DOWNLOAD_BASE_URL + "/" + settings.PYRAT_ARCHIVE_DIR + "/" + filename
output = "Download " + torrent_name + ""
return HttpResponse(output)
But this has the nasty side effect of a long wait (10+ seconds) while the zip archive is being downloaded. Is it possible to skip this? Instead of saving the archive to a file, is it possible to send it straight to the user?
I do beleive that torrentflux provides this excat feature I am talking about. Being able to zip GBs of data and download it within a second.
Check this Serving dynamically generated ZIP archives in Django
As mandrake says, constructor of HttpResponse accepts iterable objects.
Luckily, ZIP format is such that archive can be created in single pass, central directory record is located at the very end of file:
(Picture from Wikipedia)
And luckily, zipfile indeed doesn't do any seeks as long as you only add files.
Here is the code I came up with. Some notes:
I'm using this code for zipping up a bunch of JPEG pictures. There is no point compressing them, I'm using ZIP only as container.
Memory usage is O(size_of_largest_file) not O(size_of_archive). And this is good enough for me: many relatively small files that add up to potentially huge archive
This code doesn't set Content-Length header, so user doesn't get nice progress indication. It should be possible to calculate this in advance if sizes of all files are known.
Serving the ZIP straight to user like this means that resume on downloads won't work.
So, here goes:
import zipfile
class ZipBuffer(object):
""" A file-like object for zipfile.ZipFile to write into. """
def __init__(self):
self.data = []
self.pos = 0
def write(self, data):
self.data.append(data)
self.pos += len(data)
def tell(self):
# zipfile calls this so we need it
return self.pos
def flush(self):
# zipfile calls this so we need it
pass
def get_and_clear(self):
result = self.data
self.data = []
return result
def generate_zipped_stream():
sink = ZipBuffer()
archive = zipfile.ZipFile(sink, "w")
for filename in ["file1.txt", "file2.txt"]:
archive.writestr(filename, "contents of file here")
for chunk in sink.get_and_clear():
yield chunk
archive.close()
# close() generates some more data, so we yield that too
for chunk in sink.get_and_clear():
yield chunk
def my_django_view(request):
response = HttpResponse(generate_zipped_stream(), mimetype="application/zip")
response['Content-Disposition'] = 'attachment; filename=archive.zip'
return response
Here's a simple Django view function which zips up (as an example) any readable files in /tmp and returns the zip file.
from django.http import HttpResponse
import zipfile
import os
from cStringIO import StringIO # caveats for Python 3.0 apply
def somezip(request):
file = StringIO()
zf = zipfile.ZipFile(file, mode='w', compression=zipfile.ZIP_DEFLATED)
for fn in os.listdir("/tmp"):
path = os.path.join("/tmp", fn)
if os.path.isfile(path):
try:
zf.write(path)
except IOError:
pass
zf.close()
response = HttpResponse(file.getvalue(), mimetype="application/zip")
response['Content-Disposition'] = 'attachment; filename=yourfiles.zip'
return response
Of course this approach will only work if the zip files will conveniently fit into memory - if not, you'll have to use a disk file (which you're trying to avoid). In that case, you just replace the file = StringIO() with file = open('/path/to/yourfiles.zip', 'wb') and replace the file.getvalue() with code to read the contents of the disk file.
Does the zip library you are using allow for output to a stream. You could stream directly to the user instead of temporarily writing to a zip file THEN streaming to the user.
It is possible to pass an iterator to the constructor of a HttpResponse (see docs). That would allow you to create a custom iterator that generates data as it is being requested. However I don't think that will work with a zip (you would have to send partial zip as it is being created).
The proper way, I think, would be to create the files offline, in a separate process. The user could then monitor the progress and then download the file when its ready (possibly by using the iterator method described above). This would be similar what sites like youtube use when you upload a file and wait for it to be processed.