Am using Bottle to create an upload API. The script below is able to upload a file to a directory but got two issues which I need to address. One is how can I avoid loading the whole file to memory the other is how to set a maximum size for upload file?
Is it possible to continuously read the file and dump what has been read to file till the upload is complete? the upload.save(file_path, overwrite=False, chunk_size=1024) function seems to load the whole file into memory. In the tutorial, they have pointed out that using .read() is dangerous.
from bottle import Bottle, request, run, response, route, default_app, static_file
app = Bottle()
#route('/upload', method='POST')
def upload_file():
function_name = sys._getframe().f_code.co_name
try:
upload = request.files.get("upload_file")
if not upload:
return "Nothing to upload"
else:
#Get file_name and the extension
file_name, ext = os.path.splitext(upload.filename)
if ext in ('.exe', '.msi', '.py'):
return "File extension not allowed."
#Determine folder to save the upload
save_folder = "/tmp/{folder}".format(folder='external_files')
if not os.path.exists(save_folder):
os.makedirs(save_folder)
#Determine file_path
file_path = "{path}/{time_now}_{file}".\
format(path=save_folder, file=upload.filename, timestamp=time_now)
#Save the upload to file in chunks
upload.save(file_path, overwrite=False, chunk_size=1024)
return "File successfully saved {0}{1} to '{2}'.".format(file_name, ext, save_folder)
except KeyboardInterrupt:
logger.info('%s: ' %(function_name), "Someone pressed CNRL + C")
except:
logger.error('%s: ' %(function_name), exc_info=True)
print("Exception occurred111. Location: %s" %(function_name))
finally:
pass
if __name__ == '__main__':
run(host="localhost", port=8080, reloader=True, debug=True)
else:
application = default_app()
I also tried doing a file.write but same case. File is getting read to memory and it hangs the machine.
file_to_write = open("%s" %(output_file_path), "wb")
while True:
datachunk = upload.file.read(1024)
if not datachunk:
break
file_to_write.write(datachunk)
Related to this, I've seen the property MEMFILE_MAX where several SO posts claim one could set the maximum file upload size. I've tried setting it but it seems not to have any effect as all files no matter the size are going through.
Note that I want to be able to receive office document which could be plain with their extensions or zipped with a password.
Using Python3.4 and bottle 0.12.7
Basically, you want to call upload.read(1024) in a loop. Something like this (untested):
with open(file_path, 'wb') as dest:
chunk = upload.read(1024)
while chunk:
dest.write(chunk)
chunk = upload.read(1024)
(Do not call open on upload; it's already open for you.)
This SO answer includes more example sof how to read a large file without "slurping" it.
Related
I'm using this to connect to Azure File Share and upload a file. I would like to chose what extension file will have, but I can't. I got an error shown below. If I remove .txt everything works fine. Is there a way to specify file extension while uploading it?
Error:
Exception: ResourceNotFoundError: The specified parent path does not exist.
Code:
def main(blobin: func.InputStream):
file_client = ShareFileClient.from_connection_string(conn_str="<con_string>",
share_name="data-storage",
file_path="outgoing/file.txt")
f = open('/home/temp.txt', 'w+')
f.write(blobin.read().decode('utf-8'))
f.close()
# Operation on file here
f = open('/home/temp.txt', 'rb')
string_to_upload = f.read()
f.close()
file_client.upload_file(string_to_upload)
I believe the reason you're getting this error is because outgoing folder doesn't exist in your file service share. I took your code and ran it with and without extension and in both situation I got the same error.
Then I created a folder and tried to upload the file and I was able to successfully do so.
Here's the final code I used:
from azure.storage.fileshare import ShareFileClient, ShareDirectoryClient
conn_string = "DefaultEndpointsProtocol=https;AccountName=myaccountname;AccountKey=myaccountkey;EndpointSuffix=core.windows.net"
share_directory_client = ShareDirectoryClient.from_connection_string(conn_str=conn_string,
share_name="data-storage",
directory_path="outgoing")
file_client = ShareFileClient.from_connection_string(conn_str=conn_string,
share_name="data-storage",
file_path="outgoing/file.txt")
# Create folder first.
# This operation will fail if the directory already exists.
print "creating directory..."
share_directory_client.create_directory()
print "directory created successfully..."
# Operation on file here
f = open('D:\\temp\\test.txt', 'rb')
string_to_upload = f.read()
f.close()
#Upload file
print "uploading file..."
file_client.upload_file(string_to_upload)
print "file uploaded successfully..."
I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data)
tempname = str(uuid4()) + '.csv'
data['text'].to_csv('samo/static/temp/{}'.format(tempname))
return file
#projects.route('/getcsv/<file>')
def getcsv(file):
#after_this_request
def cleanup(response):
os.remove('samo/static/temp/' + file)
return response
return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows, other platforms can mark a file deleted and keep it around until it not being accessed. However, it may still be useful to only delete the file once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that's it's not being read when you delete it later. In case the file is too big to read into memory, use a generator to serve it then delete it.
#app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
path = os.path.join(current_app.instance_path, filename)
def generate():
with open(path) as f:
yield from f
os.remove(path)
r = current_app.response_class(generate(), mimetype='text/csv')
r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
return r
In my project, when a user clicks a link, an AJAX request sends the information required to create a CSV. The CSV takes a long time to generate and so I want to be able to include a download link for the generated CSV in the AJAX response. Is this possible?
Most of the answers I've seen return the CSV in the following way:
return Response(
csv,
mimetype="text/csv",
headers={"Content-disposition":
"attachment; filename=myplot.csv"})
However, I don't think this is compatible with the AJAX response I'm sending with:
return render_json(200, {'data': params})
Ideally, I'd like to be able to send the download link in the params dict. But I'm also not sure if this is secure. How is this problem typically solved?
I think one solution may the futures library (pip install futures). The first endpoint can queue up the task and then send the file name back, and then another endpoint can be used to retrieve the file. I also included gzip because it might be a good idea if you are sending larger files. I think more robust solutions use Celery or Rabbit MQ or something along those lines. However, this is a simple solution that should accomplish what you are asking for.
from flask import Flask, jsonify, Response
from uuid import uuid4
from concurrent.futures import ThreadPoolExecutor
import time
import os
import gzip
app = Flask(__name__)
# Global variables used by the thread executor, and the thread executor itself
NUM_THREADS = 5
EXECUTOR = ThreadPoolExecutor(NUM_THREADS)
OUTPUT_DIR = os.path.dirname(os.path.abspath(__file__))
# this is your long running processing function
# takes in your arguments from the /queue-task endpoint
def a_long_running_task(*args):
time_to_wait, output_file_name = int(args[0][0]), args[0][1]
output_string = 'sleeping for {0} seconds. File: {1}'.format(time_to_wait, output_file_name)
print(output_string)
time.sleep(time_to_wait)
filename = os.path.join(OUTPUT_DIR, output_file_name)
# here we are writing to a gzipped file to save space and decrease size of file to be sent on network
with gzip.open(filename, 'wb') as f:
f.write(output_string)
print('finished writing {0} after {1} seconds'.format(output_file_name, time_to_wait))
# This is a route that starts the task and then gives them the file name for reference
#app.route('/queue-task/<wait>')
def queue_task(wait):
output_file_name = str(uuid4()) + '.csv'
EXECUTOR.submit(a_long_running_task, [wait, output_file_name])
return jsonify({'filename': output_file_name})
# this takes the file name and returns if exists, otherwise notifies it is not yet done
#app.route('/getfile/<name>')
def get_output_file(name):
file_name = os.path.join(OUTPUT_DIR, name)
if not os.path.isfile(file_name):
return jsonify({"message": "still processing"})
# read without gzip.open to keep it compressed
with open(file_name, 'rb') as f:
resp = Response(f.read())
# set headers to tell encoding and to send as an attachment
resp.headers["Content-Encoding"] = 'gzip'
resp.headers["Content-Disposition"] = "attachment; filename={0}".format(name)
resp.headers["Content-type"] = "text/csv"
return resp
if __name__ == '__main__':
app.run()
I have a Flask view that generates data and saves it as a CSV file with Pandas, then displays the data. A second view serves the generated file. I want to remove the file after it is downloaded. My current code raises a permission error, maybe because after_request deletes the file before it is served with send_from_directory. How can I delete a file after serving it?
def process_data(data)
tempname = str(uuid4()) + '.csv'
data['text'].to_csv('samo/static/temp/{}'.format(tempname))
return file
#projects.route('/getcsv/<file>')
def getcsv(file):
#after_this_request
def cleanup(response):
os.remove('samo/static/temp/' + file)
return response
return send_from_directory(directory=cwd + '/samo/static/temp/', filename=file, as_attachment=True)
after_request runs after the view returns but before the response is sent. Sending a file may use a streaming response; if you delete it before it's read fully you can run into errors.
This is mostly an issue on Windows, other platforms can mark a file deleted and keep it around until it not being accessed. However, it may still be useful to only delete the file once you're sure it's been sent, regardless of platform.
Read the file into memory and serve it, so that's it's not being read when you delete it later. In case the file is too big to read into memory, use a generator to serve it then delete it.
#app.route('/download_and_remove/<filename>')
def download_and_remove(filename):
path = os.path.join(current_app.instance_path, filename)
def generate():
with open(path) as f:
yield from f
os.remove(path)
r = current_app.response_class(generate(), mimetype='text/csv')
r.headers.set('Content-Disposition', 'attachment', filename='data.csv')
return r
I am downloading a binary file via ftp, and it works:
target = open(my_file, mode='wb')
ftp.retrbinary('RETR ' + my_file, target.write)
target.close()
However, when I am trying to improve my code, using a context manager, it creates a zero length file, and fails to download the contents:
with open(my_file, mode='wb') as target:
ftp.retrbinary('RETR ' + my_file, target.write)
What is wrong with my attempt to use a context manager?
I would say that nothing is wrong with your attempt to use a context manager.
I used your exact code (filling in a site and file name) to download a file from a public ftp site (below). Give it a try.
You likely changed something else (that you haven't shown us), when you changed your code to use a context manager.
import ftplib
def main():
ftp = ftplib.FTP("speedtest.tele2.net", user='anonymous', passwd='anonymous')
my_file = "5MB.zip"
with open(my_file, mode='wb') as target:
ftp.retrbinary('RETR ' + my_file, target.write)
if __name__ == '__main__':
main()