The following URL downloads an Excel spreadsheet:
http://www.bocsar.nsw.gov.au/Documents/RCS-Annual/bluemountainslga.xlsx
From Flask code I want to call that URL and save the spreadsheet to a folder. So far I have:
r = requests.get('http://www.bocsar.nsw.gov.au/Documents/RCS-Annual/bluemountainslga.xlsx')
But I need help moving the spreadsheet into a downloads folder inside the project. The folder structure is:
App
 +static
 +templates
 main.py
 +downloads
  |__ move file here
This is a minimal example of something I just got to work:
import requests
import shutil

def download(url):
    filename = url.split("/")[-1]
    path = "downloads/" + filename
    r = requests.get(url, stream=True)
    if r.status_code == 200:
        with open(path, 'wb') as f:
            r.raw.decode_content = True
            shutil.copyfileobj(r.raw, f)
    else:
        r.raise_for_status()

download('http://www.bocsar.nsw.gov.au/Documents/RCS-Annual/bluemountainslga.xlsx')
It reads the raw data from the request object and writes it to a file at the path you want.
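Since the original goal was to call this from Flask, here is a minimal sketch of how the download function above might be wired into a route. The route name /fetch and the use of app.root_path are assumptions for illustration, not part of the original code; the path is built relative to the app so the file lands in the project's downloads folder regardless of the working directory.

import os
import shutil
import requests
from flask import Flask

app = Flask(__name__)

@app.route('/fetch')
def fetch():
    url = 'http://www.bocsar.nsw.gov.au/Documents/RCS-Annual/bluemountainslga.xlsx'
    filename = url.split('/')[-1]
    # Assumed layout: a "downloads" folder next to main.py, as in the question.
    path = os.path.join(app.root_path, 'downloads', filename)
    r = requests.get(url, stream=True)
    r.raise_for_status()  # surface 4xx/5xx instead of writing an error page to disk
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)
    return f'Saved {filename}'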
Related
What is the best way to save files to a folder with their native extension? The idea is that files are downloaded from several URLs in turn and stored in three folders, depending on the status code, and these files all have different extensions.
import requests

def save_file(link):
    filename = link.split('/')[-1]
    print(filename)
    # proxies = {
    #     'https': 'http://5.135.240.70:8080'
    # }
    data = requests.get('https://ipinfo.io/json')
    print(data.text)
    r = requests.get(link, allow_redirects=True)
    print(r.status_code)
    while True:
        if():
            if(r.status_code == 200):
                with open('\\Users\\user\\Desktop\\good\\gp.txt', 'wb') as f:
                    f.write(r.content)
            if(r.status_code != 200):
                open(r'\Users\user\Desktop\bad\gp.zip', 'wb').write(r.content)
        break
    open(r'\Users\user\Desktop\general\gp.zip', 'wb').write(r.content)
link1 ='://...........................txt'
link2 ='://..............................jpeg'
link3 ='://..............................php'
link4 ='://........................rules'
In this form, it is more suitable for downloading one specific file. Maybe it can be done through glob or os. I am grateful for any suggestions and help.
I am interested in this particular part of the code:
while True:
    if():
        if(r.status_code == 200):
            with open('\\Users\\user\\Desktop\\good\\gp.txt', 'wb') as f:
                f.write(r.content)
        if(r.status_code != 200):
            open(r'\Users\user\Desktop\bad\gp.zip', 'wb').write(r.content)
    break
open(r'\Users\user\Desktop\general\gp.zip', 'wb').write(r.content)
As you suspected, the os module can come in handy in this case.
You can use the following syntax to get the file name and extension:
file_name, file_ext = os.path.splitext(os.path.basename(link))
What is happening is that you first get the full file name using basename, e.g. if your URL is https://www.binarydrtyefense.com/banlist.txt, basename returns banlist.txt. Then splitext takes banlist.txt and produces the tuple ('banlist', '.txt').
Later in your code you can use this to build the file name, e.g.:
download_name = f'gp{file_ext}'
download_dir = 'good' if r.status_code == 200 else 'bad'
download_path = os.path.join(r'\Users\user\Desktop', download_dir, download_name)
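Putting those pieces together, a minimal sketch of the whole routine might look like the following. The good and bad folder names and the status-code routing follow the discussion above; unlike the snippet above, this sketch keeps the file's own name rather than the fixed name gp, the original "general" folder is left out for brevity, and the URL in the links list is purely hypothetical.

import os
import requests

BASE_DIR = r'\Users\user\Desktop'

def save_file(link):
    # Get ('banlist', '.txt')-style parts from the last segment of the URL.
    file_name, file_ext = os.path.splitext(os.path.basename(link))
    r = requests.get(link, allow_redirects=True)
    # Route the file into a folder based on the status code.
    download_dir = 'good' if r.status_code == 200 else 'bad'
    download_path = os.path.join(BASE_DIR, download_dir, f'{file_name}{file_ext}')
    with open(download_path, 'wb') as f:
        f.write(r.content)
    return download_path

links = ['https://example.com/banlist.txt']  # hypothetical URL for illustration
for link in links:
    print(save_file(link))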
import json
import requests

def download_file(url):
    r = requests.get(url)
    filename = url.split('/')[-1]
    with open(filename, 'wb') as f:
        f.write(r.content)

api_url = 'https://api.fda.gov/download.json'
r = requests.get(api_url)
files = [file['file'] for file in json.loads(r.text)['results']['drug']['event']['partitions']]

count = 1
for file in files:
    download_file(file)
    print(f"{count}/{len(files)} downloaded!")
    count += 1
This is the other code:
import urllib.request, json

with urllib.request.urlopen("https://api.fda.gov/drug/label.json") as url:
    data = json.loads(url.read().decode())
    print(data)
The first piece of code just downloads the files. I am wondering if there is a way to not have to download any of the 1000+ files and just display them, so the code can be used locally, while the second one prints the JSON in the terminal.
requests.get() and urllib.request.urlopen() both "download" the full response of the URL they are given.
If you do not want to "save" the file to disk, then remove the code that calls f.write()
More specifically,
import json
import requests

api_url = 'https://api.fda.gov/download.json'
r = requests.get(api_url)
files = [file['file'] for file in r.json()['results']['drug']['event']['partitions']]

total_files = len(files)
count = 0
for file in files:
    print(requests.get(file).content)
    print(f"{count+1}/{total_files} downloaded!")
    count += 1
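If the goal is to look at the data rather than dump raw bytes, each response can also be parsed in memory without writing anything to disk. A minimal sketch, assuming each partition URL returns plain JSON; if the partitions turn out to be compressed archives, they would need to be decompressed before parsing.

import requests

api_url = 'https://api.fda.gov/download.json'
partitions = requests.get(api_url).json()['results']['drug']['event']['partitions']

for part in partitions[:3]:  # only the first few, purely for illustration
    resp = requests.get(part['file'])
    resp.raise_for_status()
    data = resp.json()  # parsed in memory; nothing is saved to disk
    print(part['file'], '->', type(data))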
I have this code for the server:
@app.route('/get', methods=['GET'])
def get():
    return send_file("token.jpg", attachment_filename="token.jpg", mimetype='image/jpg')
and this code for getting the response:
r = requests.get(url + '/get')
I need to save the file from the response to the hard drive, but I can't use r.files. What do I need to do in this situation?
Assuming the GET request is valid, you can use Python's built-in function open to open a file in binary mode and write the returned content to disk. Example below:
file_content = requests.get('http://yoururl/get')
save_file = open("sample_image.png", "wb")
save_file.write(file_content.content)
save_file.close()
As you can see, to write the image to disk, we use open, and write the returned content to 'sample_image.png'. Since your server-side code seems to be returning only one file, the example above should work for you.
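For reference, the same write can be expressed with a context manager so the file handle is closed automatically even if an error occurs; this is just an equivalent form of the snippet above, not a different technique.

import requests

response = requests.get('http://yoururl/get')
with open("sample_image.png", "wb") as save_file:
    save_file.write(response.content)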
You can set the stream parameter and extract the filename from the HTTP headers. Then the raw data from the undecoded body can be read and saved chunk by chunk.
import os
import re
import requests

resp = requests.get('http://127.0.0.1:5000/get', stream=True)
name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
dest = os.path.join(os.path.expanduser('~'), name)

with open(dest, 'wb') as fp:
    while True:
        chunk = resp.raw.read(1024)
        if not chunk:
            break
        fp.write(chunk)
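An equivalent sketch using requests' iter_content helper, which does the chunked reading for you; the 1024-byte chunk size is arbitrary.

import os
import re
import requests

resp = requests.get('http://127.0.0.1:5000/get', stream=True)
name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
dest = os.path.join(os.path.expanduser('~'), name)

with open(dest, 'wb') as fp:
    # iter_content yields the body in byte chunks until the stream is exhausted.
    for chunk in resp.iter_content(chunk_size=1024):
        if chunk:
            fp.write(chunk)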
I am scraping a website, which is accessible from this link, using Beautiful Soup. The idea is to download every href that contains the string .pdf using requests.get.
The code below demonstrates the procedure and is working as intended:
filename = 'new_name.pdf'
url_to_download_pdf = 'https://bradscholars.brad.ac.uk/https://www.brad.ac.uk/library/additional-help/bradford-scholars-faqs/digital_preservation_policy.pdf'

with open(filename, 'wb') as f:
    f.write(requests.get(url_to_download_pdf).content)
However, there are instances where a URL such as the one given above (i.e., the variable url_to_download_pdf) directs to a "Page not found" page. As a result, an unusable and unreadable PDF is downloaded.
Opening the file with a PDF reader in Windows gives a warning.
Is there any way to avoid accessing and downloading an invalid PDF file?
Instead of directly accessing the contents of the file with
f.write(requests.get(url_to_download_pdf).content)
you can first check the status of the request and only save to file if it is a valid request:
filename = 'new_name.pdf'
url_to_download_pdf = 'https://bradscholars.brad.ac.uk/https://www.brad.ac.uk/library/additional-help/bradford-scholars-faqs/digital_preservation_policy.pdf'
response = requests.get(url_to_download_pdf)

if response.status_code != 404:
    with open(filename, 'wb') as f:
        f.write(response.content)
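A status-code check alone may not catch every case, since some servers answer a bad link with an HTML error page and status 200. A stricter sketch that also looks at the Content-Type header; the application/pdf comparison assumes this server labels PDFs correctly.

import requests

filename = 'new_name.pdf'
url_to_download_pdf = 'https://bradscholars.brad.ac.uk/https://www.brad.ac.uk/library/additional-help/bradford-scholars-faqs/digital_preservation_policy.pdf'

response = requests.get(url_to_download_pdf)
response.raise_for_status()  # raises for 4xx/5xx responses, including 404

content_type = response.headers.get('Content-Type', '')
if 'application/pdf' in content_type:
    with open(filename, 'wb') as f:
        f.write(response.content)
else:
    print(f"Skipping: server returned {content_type!r} instead of a PDF")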
You have to validate that the file you request actually exists. If the file exists, the response code of the request will be 200. Here is an example of how to do that:
filename = 'new_name.pdf'
url_to_download_pdf = 'https://bradscholars.brad.ac.uk/https://www.brad.ac.uk/library/additional-help/bradford-scholars-faqs/digital_preservation_policy.pdf'

with open(filename, 'wb') as f:
    response = requests.get(url_to_download_pdf)
    if response.status_code == 200:
        f.write(response.content)
    else:
        print("Error, the file doesn't exist")
Thanks for the suggestions. As per @Nicolas, do the save as PDF only if the response returns 200:
if response.status_code == 200:
In the previous version, an empty file was created regardless of the response, because with open(filename, 'wb') as f: ran before the status_code check.
To mitigate this, with open(filename, 'wb') as f: should run only if the status check passes.
The complete code then is as below:
import requests

filename = 'new_name.pdf'
url_to_download_pdf = 'https://bradscholars.brad.ac.uk/https://www.brad.ac.uk/library/additional-help/bradford-scholars-faqs/digital_preservation_policy.pdf'
my_req = requests.get(url_to_download_pdf)

if my_req.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(my_req.content)
I have a long list of .json files that I need to download to my computer. I want to download them as .json files (so no parsing or anything like that at this point).
I have some code that works for small files, but it is pretty buggy. Also it doesn't handle multiple links well.
Appreciate any advice to fix up this code:
import os
filename = 'test.json'
path = "C:/Users//Master"
fullpath = os.path.join(path, filename)
import urllib2
url = 'https://www.premierlife.com/secure/index.json'
response = urllib2.urlopen(url)
webContent = response.read()
f = open(fullpath, 'w')
f.write(webContent)
f.close
It's creating a blank file because the f.close at the end should be f.close().
I took your code and made it into a little function, then called it in a loop that goes through a .txt file called "list_of_urls.txt" with one URL per line (you can change the delimiter in the split function if you want to format it differently).
def save_json(url):
    import os
    filename = url.replace('/', '').replace(':', '')
    # this replaces / and : in urls
    path = "C:/Users/Master"
    fullpath = os.path.join(path, filename)
    import urllib2
    response = urllib2.urlopen(url)
    webContent = response.read()
    f = open(fullpath, 'w')
    f.write(webContent)
    f.close()
And then the loop:
f = open('list_of_urls.txt')
p = f.read()
url_list = p.split('\n')  # here's where \n is the line break delimiter that can be changed

for url in url_list:
    save_json(url)
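The code above targets Python 2 (urllib2). A minimal sketch of the same approach on Python 3, where urllib2 became urllib.request, with the response written in binary mode so the JSON bytes are saved unchanged; the path and the file-name scheme are the same assumptions as above.

import os
import urllib.request

def save_json(url):
    # Derive a file-system-safe name from the URL, as in the original function.
    filename = url.replace('/', '').replace(':', '')
    fullpath = os.path.join("C:/Users/Master", filename)
    with urllib.request.urlopen(url) as response:
        web_content = response.read()  # bytes
    with open(fullpath, 'wb') as f:    # binary mode, no decoding needed
        f.write(web_content)

with open('list_of_urls.txt') as f:
    url_list = f.read().split('\n')

for url in url_list:
    if url.strip():  # skip blank lines
        save_json(url)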