I have this code on the server:
@app.route('/get', methods=['GET'])
def get():
    return send_file("token.jpg", attachment_filename="token.jpg", mimetype='image/jpg')
and this code on the client for getting the response:
r = requests.get(url + '/get')
I need to save the file from the response to the hard drive, but I can't use r.files. What do I need to do in this situation?
Assuming the GET request is valid, you can use Python's built-in function open to open a file in binary mode and write the returned content to disk. Example below.
file_content = requests.get('http://yoururl/get')
save_file = open("sample_image.png", "wb")
save_file.write(file_content.content)
save_file.close()
As you can see, to write the image to disk, we use open, and write the returned content to 'sample_image.png'. Since your server-side code seems to be returning only one file, the example above should work for you.
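If you prefer, a with block does the same thing and also closes the file for you even if the write fails; a minimal sketch of the same download:

import requests

r = requests.get('http://yoururl/get')
with open('sample_image.png', 'wb') as save_file:
    save_file.write(r.content)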
You can set the stream parameter and extract the filename from the HTTP headers. Then the raw data from the undecoded body can be read and saved chunk by chunk.
import os
import re
import requests
resp = requests.get('http://127.0.0.1:5000/get', stream=True)
name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
dest = os.path.join(os.path.expanduser('~'), name)
with open(dest, 'wb') as fp:
    while True:
        chunk = resp.raw.read(1024)
        if not chunk:
            break
        fp.write(chunk)
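One caveat worth noting: if the server quotes the filename, e.g. filename="token.jpg", the regex above captures the quotes as part of the name. A slightly more tolerant pattern (my own variant) might be:

name = re.findall('filename="?([^";]+)"?', resp.headers['Content-Disposition'])[0]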
If I have a URL that, when submitted in a web browser, pops up a dialog box to save a zip file, how would I go about catching and downloading this zip file in Python?
As far as I can tell, the proper way to do this is:
import requests, zipfile, StringIO
r = requests.get(zip_file_url, stream=True)
z = zipfile.ZipFile(StringIO.StringIO(r.content))
z.extractall()
Of course, you'd want to check that the GET was successful with r.ok.
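For instance, a minimal sketch of that check:

r = requests.get(zip_file_url, stream=True)
if not r.ok:
    r.raise_for_status()  # raises an HTTPError for 4xx/5xx responses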
For Python 3+, substitute the io module for the StringIO module and use BytesIO instead of StringIO; the Python release notes mention this change.
import requests, zipfile, io
r = requests.get(zip_file_url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("/path/to/destination_directory")
Most people recommend using requests if it is available, and the requests documentation recommends this for downloading and saving raw data from a URL:
import requests

def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
Since the question asks about downloading and saving the zip file, I haven't gone into details regarding reading the zip file. See one of the many answers below for possibilities.
If for some reason you don't have access to requests, you can use urllib.request instead. It may not be quite as robust as the above.
import urllib.request

def download_url(url, save_path):
    with urllib.request.urlopen(url) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
Finally, if you are still using Python 2, you can use urllib2.urlopen.
import urllib2
from contextlib import closing

def download_url(url, save_path):
    with closing(urllib2.urlopen(url)) as dl_file:
        with open(save_path, 'wb') as out_file:
            out_file.write(dl_file.read())
With the help of this blog post, I've got it working with just requests. The point of the weird stream thing is that we don't need to call content on large requests, which would require the whole response to be processed at once, clogging the memory. Streaming avoids this by iterating through the data one chunk at a time.
import requests

url = 'https://www2.census.gov/geo/tiger/GENZ2017/shp/cb_2017_02_tract_500k.zip'
response = requests.get(url, stream=True)
with open('alaska.zip', 'wb') as f:
    for chunk in response.iter_content(chunk_size=512):
        if chunk:  # filter out keep-alive chunks
            f.write(chunk)
Here's what I got to work in Python 3:
import zipfile, urllib.request, shutil

url = 'http://www....myzipfile.zip'
file_name = 'myzip.zip'

with urllib.request.urlopen(url) as response, open(file_name, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

# open the zip only after the download has been flushed and closed
with zipfile.ZipFile(file_name) as zf:
    zf.extractall()
Super lightweight solution to save a .zip file to a location on disk (using Python 3.9):
import requests
url = r'https://linktofile'
output = r'C:\pathtofolder\downloaded_file.zip'
r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)
Either use urllib2.urlopen, or you could try using the excellent Requests module and avoid urllib2 headaches:
import requests
results = requests.get('url')
#pass results.content onto secondary processing...
I came here searching for how to save a .bzip2 file. Let me paste the code for others who might come looking for this.
url = "http://api.mywebsite.com"
filename = "swateek.tar.gz"
response = requests.get(url, headers=headers, auth=('myusername', 'mypassword'), timeout=50)
if response.status_code == 200:
with open(filename, 'wb') as f:
f.write(response.content)
I just wanted to save the file as is.
Thanks to @yoavram for the solution above. My URL path linked to a zipped folder, and I encountered a BadZipFile error (file is not a zip file). Strangely, if I retried several times it would retrieve the URL and unzip it all of a sudden, so I amended the solution a little bit, using the is_zipfile method:
import io
import zipfile
import requests

r = requests.get(url, stream=True)
check = zipfile.is_zipfile(io.BytesIO(r.content))
while not check:
    r = requests.get(url, stream=True)
    check = zipfile.is_zipfile(io.BytesIO(r.content))
else:
    z = zipfile.ZipFile(io.BytesIO(r.content))
    z.extractall()
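Note that the loop above will retry forever if the URL never returns a valid zip file. A bounded variant (my own sketch, with a hypothetical max_retries cap, reusing the url from above) could look like:

import io
import zipfile
import requests

max_retries = 5  # hypothetical retry limit
for _ in range(max_retries):
    r = requests.get(url)
    if zipfile.is_zipfile(io.BytesIO(r.content)):
        zipfile.ZipFile(io.BytesIO(r.content)).extractall()
        break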
Use the requests, zipfile and io Python packages.
The BytesIO class in particular is used to keep the downloaded file in memory rather than saving it to the drive.
import requests
from zipfile import ZipFile
from io import BytesIO
r = requests.get(zip_file_url)
z = ZipFile(BytesIO(r.content))
file = z.extract(a_file_to_extract, path_to_save)
with open(file) as f:
    print(f.read())
I am trying to download a raw file from GitHub and then run it as a .sql file.
import snowflake.connector
from codecs import open
import logging
import requests
from os import getcwd
import os
import sys
#logging
logging.basicConfig(
    filename='C:/Users/abc/Documents/Test.log',
    level=logging.INFO
)
url = "https://github.com/raw/abc/master/file_name?token=Anvn3lJXDks5ciVaPwA%3D%3D"
directory = getcwd()
filename = os.path.join(getcwd(),'VIEWS.SQL')
r = requests.get(url)
filename.decode("utf-8")
f = open(filename,'w')
f.write(str(r.content))
with open(filename,'r') as theFile, open(filename,'w') as outFile:
    data = theFile.read().split('\n')
    data = theFile.read().replace('\n','')
    data = theFile.read().replace("b'","")
    data = theFile.read()
    outFile.write(data)
However, I get this error:
syntax error line 1 at position 0 unexpected 'b'
My converted SQL file has a b at the beginning and a bunch of \n newline characters in the file. Also, the entire output file is wrapped in single quotes 'text'. Can anyone help me get rid of these? It looks like replace isn't working.
OS: Windows
Python Version: 3.7.0
You introduced a b'.. prefix by converting the response.content bytes value to a string with str():
>>> import requests
>>> r = requests.get("https://github.com/raw/abc/master/file_name?token=Anvn3lJXDks5ciVaPwA%3D%3D")
>>> r.content
b'Not Found'
>>> str(r.content)
"b'Not Found'"
Of course, the specific dummy URL you gave in your question produces a 404 Not Found response, hence the Not Found content of the response body:
>>> r.status_code
404
so the contents in this demonstration are not actually all that useful. However, even for your real URL you probably want to test for a 200 status code before moving to write the data to a file!
What is going wrong in the above is that str(bytesvalue) converts a bytes object to its representation. You'd normally want to decode a bytes value with a text codec, using the bytes.decode() method. But because you are writing the data to a file here, you should instead just open the file in binary mode and write the bytes object without decoding:
r = requests.get(url)
if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)
The 'wb' mode opens the file for writing in binary mode. Writing binary content to a binary file is the most efficient; decoding it first then writing to a text file requires that it is encoded again. Better to avoid doing double work.
As a side note: there is no need to join a local filename with getcwd(); relative paths always end up in the current working directory, and otherwise it's better to use os.path.abspath(filename).
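For example (illustrative only; the output depends on your current working directory):

>>> import os
>>> os.path.abspath('VIEWS.SQL')  # same result as os.path.join(os.getcwd(), 'VIEWS.SQL')
'C:\\Users\\abc\\Documents\\VIEWS.SQL'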
You could also trust that GitHub sets the correct character set in the Content-Type headers and have response decode the value to str for you in the form of the response.text attribute:
r = requests.get(url)
if r.status_code == 200:
    with open(filename, 'w') as f:
        f.write(r.text)
but again, that's really doing extra work for nothing, first decoding the binary content from the request, then encoding again when writing to a text file.
Finally, for larger file responses it is better to stream the data and copy it directly to a file. The shutil.copyfileobj() function can take a raw response fileobject directly, provided you enable transparent transport decompression:
import shutil

r = requests.get(url, stream=True)
if r.status_code == 200:
    with open(filename, 'wb') as f:
        # enable transparent transport decompression handling
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
Depending on your version of Python/OS, it could be as simple as reading and writing the file in binary mode and making the replaces operate on bytes (if they're still needed):
with open(filename, 'wb') as outFile:
    outFile.write(r.content)  # write the raw bytes rather than str(r.content)
with open(filename, 'rb') as theFile:
    data = theFile.read()
data = data.replace(b'\n', b'').replace(b"b'", b"")
with open(filename, 'wb') as outFile:
    outFile.write(data)
It would help to have a copy of the file and the line the error is occurring on.
I have a local Python file that decodes binary files. It first opens the file as binary, reads it into a buffer and then interprets it. Reading it is simply:
with open(filepath, 'rb') as f:
    buff = f.read()
    read_all(buff)
This works fine locally. Now I'd like to set up an Azure Python job where I can send the file, approx. 100 kB, over an HTTP POST and then read the interpreted metadata, which my original Python script does well.
I've first removed the read function so that I'll now work with the buffer only.
In my Azure Python job I have the following, triggered by an HttpRequest:
my_data = reader.read_file(req.get_body())
To test my sending, I've tried the following in Python:
import requests

url = 'http://localhost:7071/api/HttpTrigger'
files = {'file': open('test', 'rb')}
with open('test', 'rb') as f:
    buff = f.read()

r = requests.post(url, files=files)  # try using files
r = requests.post(url, data=buff)    # try using data
I've also tried, in Postman, adding the file to the body as binary and setting the headers to application/octet-stream.
None of this sends the binary file the same way the original f.read() read it, so I'm getting a wrong interpretation of the binary file.
What is file.read doing differently from how I'm sending the file over as an HTTP body message?
Printing out the first line read by the local Python script gives:
b'\n\n\xfe\xfe\x00\x00\x00\x00\\\x18,A\x18\x00\x00\x00(\x00\x00\x00\x1f\x00\x00\
whereas printing it out from req.get_body() shows me
b'\n\n\xef\xbf\xbd\xef\xbf\xbd\x00\x00\x00\x00\\\x18,A\x18\x00\x00\x00(\x00\x00\x00\x1f\x00\
So something is clearly wrong. Any idea why this could be different?
Thanks
EDIT:
I've implemented a similar function in Flask and it works well. The Flask code simply grabs the file from the POST, with no encoding/decoding:
if request.method == 'POST':
    f = request.files['file']
    # f.save(secure_filename(f.filename))
    my_data = reader.read_file(f.read())
Why is the Azure Function different?
You can try decoding with UTF-16 and then doing the further processing in your code.
Here is the code for that:
with open(path_to_file, 'rb') as f:
    contents = f.read()
    contents = contents.rstrip(b"\n").decode("utf-16")
Basically, after calling req.get_body(), perform the operation below:
contents = contents.rstrip(b"\n").decode("utf-16")
See if it gives you the same output as you receive in the local Python file.
Hope it helps.
In my project, when a user clicks a link, an AJAX request sends the information required to create a CSV. The CSV takes a long time to generate and so I want to be able to include a download link for the generated CSV in the AJAX response. Is this possible?
Most of the answers I've seen return the CSV in the following way:
return Response(
    csv,
    mimetype="text/csv",
    headers={"Content-disposition":
             "attachment; filename=myplot.csv"})
However, I don't think this is compatible with the AJAX response I'm sending with:
return render_json(200, {'data': params})
Ideally, I'd like to be able to send the download link in the params dict. But I'm also not sure if this is secure. How is this problem typically solved?
I think one solution may be the futures library (pip install futures). The first endpoint can queue up the task and then send the file name back, and another endpoint can be used to retrieve the file. I also included gzip because it might be a good idea if you are sending larger files. I think more robust solutions use Celery or RabbitMQ or something along those lines. However, this is a simple solution that should accomplish what you are asking for.
from flask import Flask, jsonify, Response
from uuid import uuid4
from concurrent.futures import ThreadPoolExecutor
import time
import os
import gzip

app = Flask(__name__)

# Global variables used by the thread executor, and the thread executor itself
NUM_THREADS = 5
EXECUTOR = ThreadPoolExecutor(NUM_THREADS)
OUTPUT_DIR = os.path.dirname(os.path.abspath(__file__))

# this is your long-running processing function;
# it takes in your arguments from the /queue-task endpoint
def a_long_running_task(*args):
    time_to_wait, output_file_name = int(args[0][0]), args[0][1]
    output_string = 'sleeping for {0} seconds. File: {1}'.format(time_to_wait, output_file_name)
    print(output_string)
    time.sleep(time_to_wait)
    filename = os.path.join(OUTPUT_DIR, output_file_name)
    # here we are writing to a gzipped file to save space and decrease the size of the file sent over the network
    with gzip.open(filename, 'wb') as f:
        f.write(output_string.encode('utf-8'))  # gzip in 'wb' mode expects bytes
    print('finished writing {0} after {1} seconds'.format(output_file_name, time_to_wait))

# This is a route that starts the task and then gives the client the file name for reference
@app.route('/queue-task/<wait>')
def queue_task(wait):
    output_file_name = str(uuid4()) + '.csv'
    EXECUTOR.submit(a_long_running_task, [wait, output_file_name])
    return jsonify({'filename': output_file_name})

# this takes the file name and returns the file if it exists, otherwise notifies that it is not yet done
@app.route('/getfile/<name>')
def get_output_file(name):
    file_name = os.path.join(OUTPUT_DIR, name)
    if not os.path.isfile(file_name):
        return jsonify({"message": "still processing"})
    # read without gzip.open to keep it compressed
    with open(file_name, 'rb') as f:
        resp = Response(f.read())
    # set headers to tell the encoding and to send as an attachment
    resp.headers["Content-Encoding"] = 'gzip'
    resp.headers["Content-Disposition"] = "attachment; filename={0}".format(name)
    resp.headers["Content-type"] = "text/csv"
    return resp

if __name__ == '__main__':
    app.run()
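For reference, a minimal client-side sketch of how the two endpoints above might be used, assuming the app runs locally on Flask's default port 5000:

import time
import requests

# queue a task that sleeps for 10 seconds, and remember the generated file name
resp = requests.get('http://localhost:5000/queue-task/10')
filename = resp.json()['filename']

# poll until the background task has written the file
while True:
    resp = requests.get('http://localhost:5000/getfile/' + filename)
    if 'Content-Disposition' in resp.headers:  # the file is ready
        break
    time.sleep(1)

# requests transparently decompresses the gzip body, so this writes the raw CSV
with open(filename, 'wb') as f:
    f.write(resp.content)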
I have a views snippet like the one below, which gets a zip filename from a request, and I want to append a string sign after the end of the zip file:
@require_GET
def download(request):
    ... skip
    response = HttpResponse(readFile(abs_path, sign), content_type='application/zip')
    response['Content-Length'] = os.path.getsize(abs_path) + len(sign)
    response['Content-Disposition'] = 'attachment; filename=%s' % filename
    return response
and the readFile function is as follows:
def readFile(fn, sign, buf_size=1024 << 5):
    f = open(fn, "rb")
    logger.debug("started reading %s" % fn)
    while True:
        c = f.read(buf_size)
        if c:
            yield c
        else:
            break
    logger.debug("finished reading %s" % fn)
    f.close()
    yield sign
It works fine in runserver mode, but fails on big zip files when I use uwsgi + nginx or apache + mod_wsgi.
It seems to time out because reading a big file takes too long.
What I don't understand is why the browser only starts to download after the whole file has been read, even though I use yield. (I can see the browser wait until the finished reading %s log line appears.)
Shouldn't it start to download right after the first chunk is read?
Is there a better way to serve a file download when I need to append a dynamic string after the file?
Django doesn't allow streaming responses by default, so it buffers the entire response. If it didn't, middleware couldn't function the way it does right now.
To get the behaviour you are looking for, you need to use StreamingHttpResponse instead.
Usage example from the docs:
import csv

from django.utils.six.moves import range
from django.http import StreamingHttpResponse

class Echo(object):
    """An object that implements just the write method of the file-like
    interface.
    """
    def write(self, value):
        """Write the value by returning it, instead of storing in a buffer."""
        return value

def some_streaming_csv_view(request):
    """A view that streams a large CSV file."""
    # Generate a sequence of rows. The range is based on the maximum number of
    # rows that can be handled by a single sheet in most spreadsheet
    # applications.
    rows = (["Row {}".format(idx), str(idx)] for idx in range(65536))
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    response = StreamingHttpResponse((writer.writerow(row) for row in rows),
                                     content_type="text/csv")
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
    return response
This is a use case for StreamingHttpResponse instead of HttpResponse.
It's better to use FileResponse, a subclass of StreamingHttpResponse optimized for binary files. It uses wsgi.file_wrapper if provided by the WSGI server, and otherwise streams the file out in small chunks.
import os
from django.http import FileResponse

def download_file(request):
    _file = '/folder/my_file.zip'
    filename = os.path.basename(_file)
    # FileResponse accepts the open file object directly, so no FileWrapper is needed;
    # open the full path and expose only the basename in the header
    response = FileResponse(open(_file, 'rb'), content_type='application/x-zip-compressed')
    response['Content-Disposition'] = "attachment; filename=%s" % filename
    return response
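As a side note, in Django 2.0 and later FileResponse can set the Content-Disposition header for you, so the view above could be shortened to something like this sketch:

from django.http import FileResponse

def download_file(request):
    return FileResponse(open('/folder/my_file.zip', 'rb'),
                        as_attachment=True, filename='my_file.zip')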