Download files from a website using python - python

I am new to Python and I have a requirement to download multiple csv-files from a website authenticated using username and password.
I wrote the below piece of code to download a single file but unfortunately the contents in the downloaded file are not same as in the original file.
Could you please let me know what I am doing wrong here and how to achieve this.
import requests
import shutil
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url="https:xxxxxxxxxxxxxxxxxxxx.aspx/20-02-2019 124316CampaignExport.csv"
r = requests.get(url, auth=('username', 'Password'),
verify=False,stream=True)
r.raw.decode_content = True
with open("D:/20-02-2019 124316CampaignExport.csv", 'wb') as f:
shutil.copyfileobj(r.raw, f)

The following code worked for me (only indenting the last line):
import requests
import shutil
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url="linkToDownload"
r = requests.get(url, auth=('username', 'Password'),
verify=False,stream=True)
r.raw.decode_content = True
with open("filename", 'wb') as f:
shutil.copyfileobj(r.raw, f)
This means the problem is stemming from your URL or authentication rather than the python code itself.
Your URL has a space in it, which is likely causing an error. I can't confirm for sure as I don't have your URL. If you have write-access to it, try renaming it with a "_" insetead of a space.

Related

Not able to download a CSV file from a website

Maybe someone knows what's a problem with downloading from the site below... I run this code in Jupiter, and nothing happens.
import requests
import os
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
f.write(response.content)
I've also tried this code and it works fine on my side, assuming folder and file were defined correctly. Alternatively, you can try using pandas which can read a CSV file from URL. So the code would become:
import pandas as pd
import csv
url = '{Some CSV target}'
df = pd.read_csv(url)
df.to_csv('{absoulte path to CSV}', sep=',', index=False, quoting=csv.QUOTE_ALL)
Its has started working only after declaring proxy.
Im practicing on my work laptop. Maybe the local net is blocking my requests.
I hope it can be helpful for someone.
Thanks to everyone for your help!
import requests
import os
os.environ['HTTP_PROXY'] = 'your proxy'
url = 'http://www.football-data.co.uk/mmz4281/1920/E0.csv'
response = requests.get(url)
with open(os.path.join("C://DownloadLocation", "file.csv"), 'wb') as f:
f.write(response.content)

Download an SSRS report in Python using requests

I am trying to use requests to download an SSRS report. The following code will download an empty Excel file:
url = 'http://MY REPORT URL HERE/ReportServer?/REPORT NAME HERE&rs:Format=EXCELOPENXML'
s = requests.Session()
s.post(url, data={'_username': 'username, '_password': 'password'})
r = s.get(url)
output_file = r'C:\Saved Reports\File.xlsx'
downloaded_file = open(output_file, 'wb')
for chunk in r.iter_content(100000):
downloaded_file.write(chunk)
I have successfully used requests_ntlm to complete this task, but I am wondering why the above code is not working as intended. The Excel file turns out to be empty; I feel it is due to an issue with logging in and passing those cookies to the GET request.
I was able to get this to work, but for pdfs. I found the solution here
Here's a piece of my code snippet:
import requests
from requests_ntlm import HttpNtlmAuth
session = requests.Session()
session.auth = HttpNtlmAuth(domain+uid,pwd)
response = session.get(reporturl,stream=True)
print response.status_code
with open(outputlocation+mdcProp+'.pdf','wb') as pdf:
for chunk in response.iter_content(chunk_size=1024):
if chunk:
pdf.write(chunk)
session.close()

Python - Retrieve and use a cookie to download a file

Trying to download the following file:
https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf
I first need to sign into the following site before doing so:
https://urs.earthdata.nasa.gov
After reviewing my browser's web console, I believe it's using a cookie to allow me to download the file. How can I do this using python? I find out how to retrieve the cookies:
import os, requests
username = 'user'
password = 'pwd'
url = 'https://urs.earthdata.nasa.gov'
r = requests.get(url, auth=(username,password))
cookies = r.cookies
How can I then use this to download the HDF file? I've tried the following but always receive 401 error.
url2 = "https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf"
r2 = requests.get(url2, cookies=r.cookies)
Have you tried a simple basic authentification :
from requests.auth import HTTPBasicAuth
url2='https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf'
requests.get(url2, auth=HTTPBasicAuth('user', 'pass'))
or read this example
To download a file using the Requests library with the browser cookies, you can use the next function:
import browser_cookie3
import requests
import shutil
import os
cj = browser_cookie3.brave()
def download_file(url, root_des_path='./'):
local_filename = url.split('/')[-1]
local_filename = os.path.join(root_des_path, local_filename)
# r = requests.get(link, cookies=cj)
with requests.get(url, cookies=cj, stream=True) as r:
with open(local_filename, 'wb') as f:
shutil.copyfileobj(r.raw, f)
return local_filename
a = download_file(link)
In this example, cj is the cookies of Brave browser ( you can use ffox or chrome). then, these cj are passed to Requests to download the file.
Note, you need to get "browser_cookie3" library
pip install browser-cookie3

Downloading file from redirection link python

I am trying to make a simple program that will help with the confusing part of rooting.
I need to download the file from tiny.cc/latestmagisk
I am using this python code
import request
url = tiny.cc/latestmagisk
r = request.get(url)
r.content
The content it returns is the usual 403 Forbidden for nginx
I need this to work with the shortened URL is there anyway to make that happen?
its's not necessary to import request lib
all you need to do is import ssl, urllib and pass ssl._create_unverified_context() as context to the server while you're sendig a request!
your code should be look like this:
import ssl, urllib
certcontext = ssl._create_unverified_context()
f = open('image.jpg','wb') #creating placeholder
#creating image from url and saving it as `image.jpg`!
f.write(urllib.urlopen("https://i.stack.imgur.com/IKh7E.png", context=certcontext).read())
f.close()
note: it will save the image as image.jpg file ..
Contrary to the other answer, you really should use requests for this as requests has better support for redirects.
For getting a page through a redirect from requests:
r=requests.get(url, allow_redirects=True)
For downloading files through redirects:
r = requests.get(url, allow_redirects=True, stream=True)
with open(filename, 'wb') as f:
for chunk in r.iter_content(chunk_size=1024):
if chunk: f.write(chunk)
However, in this case, either tiny.cc or XDA does not allow a simple requests.get; the 403 forbidden is likely due to the User-Agent or other intrinsic header as this method works well with bit.ly and other shortlink generators. You may need to fake headers.

Download a binary file using Python requests module

I need to download a file from an external source, I am using Basic authentication to login to the URL
import requests
response = requests.get('<external url', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print (html)
data = requests.get('<API URL to download the attachment>', auth=('<username>', '<password>'), stream=True)
print (data.content)
I am getting below output
<url to download the binary data>
\x00\x00\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\xcb\x00\x00\x1e\x00\x1e\x00\xbe\x07\x00\x00.\xcf\x05\x00\x00\x00'
I am expecting the URL to download the word document within the same session.
Working solution
import requests
import shutil
response = requests.get('<url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print (html)
data = requests.get('<url>', auth=('<username>', '<password>'), stream=True)
with open("C:/myfile.docx", 'wb') as f:
data.raw.decode_content = True
shutil.copyfileobj(data.raw, f)
I am able to download the file as it is.
When you want to download a file directly you can use shutil.copyfileobj():
https://docs.python.org/2/library/shutil.html#shutil.copyfileobj
You already are passing stream=True to requests which is what you need to get a file-like object back. Just pass that as the source to copyfileobj().

Categories