I want to download multiple files using Python, so I used the wget package. But I need to attach cookies to my requests. How can I do that?
You can use the requests module in Python to achieve what you have in mind:
import requests

urls = ['url1', 'url2', 'url3']
filenames = ['filename1', 'filename2', 'filename3']
cookies = dict(foo='bar')

for url, filename in zip(urls, filenames):
    r = requests.get(url, cookies=cookies)
    with open(filename, 'wb') as file:
        file.write(r.content)
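If the cookies come from a login rather than a hand-built dict, a requests.Session keeps them across calls automatically. A minimal sketch, assuming a hypothetical login endpoint at example.com:

import requests

# hypothetical login endpoint; a Session stores cookies the server sets
# and re-sends them automatically on every later request
session = requests.Session()
session.post('https://example.com/login', data={'user': 'me', 'pass': 'secret'})

urls = ['url1', 'url2', 'url3']
filenames = ['filename1', 'filename2', 'filename3']
for url, filename in zip(urls, filenames):
    r = session.get(url)  # login cookies are attached automatically
    with open(filename, 'wb') as file:
        file.write(r.content)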
I'm trying to download this video: https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4
I tried the following, but it doesn't work:
link = "https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4"
urllib.request.urlretrieve(link, 'video.mp4')
I'm getting:
urllib.error.HTTPError: HTTP Error 403: Forbidden
Is there another way to download an mp4 file without using urllib?
I have no problem downloading it with the requests module:
import requests
url = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4'
response = requests.get(url)
with open('video.mp4', 'wb') as f: # use `"b"` to open in `bytes mode`
    f.write(response.content) # use `.content` to get `bytes`
It was a small file (~10 MB), but for bigger files you may want to download in chunks:
import requests
url = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4'
response = requests.get(url, stream=True)
with open('video.mp4', 'wb') as f:
    for chunk in response.iter_content(10_000):  # 10_000 bytes per chunk
        if chunk:
            #print('.', end='')  # every dot will mean 10_000 bytes
            f.write(chunk)
The documentation shows Streaming Requests, but for text data.
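For completeness, a small sketch of that text-oriented streaming using iter_lines() (the URL is a stand-in):

import requests

# stand-in URL for a large text resource
response = requests.get('https://example.com/large-log.txt', stream=True)
for line in response.iter_lines(decode_unicode=True):
    if line:  # filter out keep-alive empty lines
        print(line)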
url is a string, so you can use string functions to get the element after the last /:
filename = url.split('/')[-1]
Or you can try os.path.
At least it works on Linux, maybe because Linux also uses / in local paths.
import os
head, tail = os.path.split(url)
# head: 'https://www.learningcontainer.com/wp-content/uploads/2020/05'
# tail: 'sample-mp4-file.mp4'
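A platform-independent alternative is to let urllib.parse split the URL by its own rules; a small sketch:

import posixpath
from urllib.parse import urlparse

url = 'https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4'
# urlparse works on URL syntax, not on the local OS path separator
path = urlparse(url).path
filename = posixpath.basename(path)  # 'sample-mp4-file.mp4'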
I'm trying to automate the download of docs via Selenium.
I'm using requests.get() to download the file after extracting the url from the website:
import requests
import time
url = 'https://www.schroders.com/hkrprewrite/retail/en/attach.aspx?fileid=e47b0366c44e4f33b04c20b8b6878aa7.pdf'
myfile = requests.get(url)
open('/Users/hemanthj/Downloads/AB Test/' + "A-Acc-USD" + '.pdf', 'wb').write(myfile.content)
time.sleep(3)
The file is downloaded but is corrupted when I try to open it. The file size is only a few KB at most.
I tried adding the header info from this thread too, but no luck:
Corrupted PDF file after requests.get() with Python
What within the headers makes the download work? Any solutions?
The problem was an incorrect URL: it loaded HTML instead of a PDF.
Looking through the site, I found the URL that you were looking for.
Try this code and then open the document with a PDF reader program.
import requests
import pathlib
def load_pdf_from(url: str, filename: pathlib.Path) -> None:
    response: requests.Response = requests.get(url, stream=True)
    if response.status_code == 200:
        with open(filename, 'wb') as pdf_file:
            for chunk in response.iter_content(chunk_size=1024):
                pdf_file.write(chunk)
    else:
        print(f"Failed to load pdf: {url}")

url: str = 'https://www.schroders.com/hkrprewrite/retail/en/attachment2.aspx?fileid=e47b0366c44e4f33b04c20b8b6878aa7.pdf'
target_filename: pathlib.Path = pathlib.Path.cwd().joinpath('loaded_pdf.pdf')
load_pdf_from(url, target_filename)
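To catch this kind of mistake early (HTML served where a PDF was expected), you can check the Content-Type header before writing the file; a minimal sketch:

import requests

response = requests.get(url, stream=True)  # url as defined above
content_type = response.headers.get('Content-Type', '')
if 'pdf' not in content_type.lower():
    # an error or login page usually reports text/html here
    print(f"Expected a PDF but got {content_type!r} - check the URL")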
I am new to Python, and I have a requirement to download multiple CSV files from a website that authenticates with a username and password.
I wrote the piece of code below to download a single file, but unfortunately the contents of the downloaded file are not the same as in the original file.
Could you please let me know what I am doing wrong here and how to achieve this?
import requests
import shutil
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url="https:xxxxxxxxxxxxxxxxxxxx.aspx/20-02-2019 124316CampaignExport.csv"
r = requests.get(url, auth=('username', 'Password'), verify=False, stream=True)
r.raw.decode_content = True
with open("D:/20-02-2019 124316CampaignExport.csv", 'wb') as f:
    shutil.copyfileobj(r.raw, f)
The following code worked for me (only indenting the last line):
import requests
import shutil
import urllib3
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
url="linkToDownload"
r = requests.get(url, auth=('username', 'Password'), verify=False, stream=True)
r.raw.decode_content = True
with open("filename", 'wb') as f:
    shutil.copyfileobj(r.raw, f)
This means the problem stems from your URL or authentication rather than the Python code itself.
Your URL has a space in it, which is likely causing the error. I can't confirm for sure as I don't have your URL. If you have write access to it, try renaming it with a "_" instead of a space.
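If you can't rename the file on the server, another option is to percent-encode the space in the URL before requesting it; a sketch with a hypothetical URL:

from urllib.parse import quote

raw_url = "https://example.com/20-02-2019 124316CampaignExport.csv"  # hypothetical URL with a space
# quote() percent-encodes unsafe characters; safe=':/' preserves the scheme and slashes
encoded_url = quote(raw_url, safe=':/')
# 'https://example.com/20-02-2019%20124316CampaignExport.csv'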
Trying to download the following file:
https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf
I first need to sign into the following site before doing so:
https://urs.earthdata.nasa.gov
After reviewing my browser's web console, I believe it's using a cookie to allow me to download the file. How can I do this using Python? I found out how to retrieve the cookies:
import os, requests
username = 'user'
password = 'pwd'
url = 'https://urs.earthdata.nasa.gov'
r = requests.get(url, auth=(username,password))
cookies = r.cookies
How can I then use this to download the HDF file? I've tried the following but always receive a 401 error:
url2 = "https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf"
r2 = requests.get(url2, cookies=r.cookies)
Have you tried simple basic authentication:
from requests.auth import HTTPBasicAuth
url2 = 'https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf'
requests.get(url2, auth=HTTPBasicAuth('user', 'pass'))
or read this example
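Another pattern often suggested for this kind of redirect-based login is a requests.Session: request the file once, follow the redirect to the login host, authenticate there, and let the session keep the resulting cookies. A sketch under that assumption:

import requests

username = 'user'
password = 'pwd'
url2 = 'https://e4ftl01.cr.usgs.gov/MOLA/MYD14A2.006/2017.10.24/MYD14A2.A2017297.h19v01.006.2017310142443.hdf'

with requests.Session() as session:
    # the first request is redirected to the login host; requests drops auth
    # on cross-host redirects, so authenticate against the final URL explicitly
    r1 = session.get(url2)
    r2 = session.get(r1.url, auth=(username, password))
    with open('MYD14A2.A2017297.h19v01.006.2017310142443.hdf', 'wb') as f:
        f.write(r2.content)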
To download a file using the requests library with your browser's cookies, you can use the following function:
import browser_cookie3
import requests
import shutil
import os
cj = browser_cookie3.brave()

def download_file(url, root_des_path='./'):
    local_filename = url.split('/')[-1]
    local_filename = os.path.join(root_des_path, local_filename)
    with requests.get(url, cookies=cj, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)
    return local_filename

link = '<file url>'  # URL of the file to download
a = download_file(link)
In this example, cj holds the cookies of the Brave browser (you can use Firefox or Chrome instead); these cookies are then passed to requests to download the file.
Note that you need to install the browser_cookie3 library:
pip install browser-cookie3
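The library exposes analogous loaders for the other browsers, e.g.:

import browser_cookie3

cj = browser_cookie3.firefox()  # or browser_cookie3.chrome()
# browser_cookie3.load() attempts every supported browser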
I need to download a file from an external source, and I am using basic authentication to log in to the URL:
import requests
response = requests.get('<external url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print (html)
data = requests.get('<API URL to download the attachment>', auth=('<username>', '<password>'), stream=True)
print (data.content)
I am getting the output below:
<url to download the binary data>
\x00\x00\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\xcb\x00\x00\x1e\x00\x1e\x00\xbe\x07\x00\x00.\xcf\x05\x00\x00\x00'
I am expecting the URL to download the Word document within the same session.
Working solution
import requests
import shutil
response = requests.get('<url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print (html)
data = requests.get('<url>', auth=('<username>', '<password>'), stream=True)
with open("C:/myfile.docx", 'wb') as f:
data.raw.decode_content = True
shutil.copyfileobj(data.raw, f)
I am able to download the file as it is.
When you want to download a file directly, you can use shutil.copyfileobj():
https://docs.python.org/2/library/shutil.html#shutil.copyfileobj
You are already passing stream=True to requests, which is what you need to get a file-like object back. Just pass that as the source to copyfileobj().
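Putting those pieces together, a minimal sketch of the stream-to-file pattern (placeholders as in the question):

import requests
import shutil

# stream=True exposes the raw urllib3 response as a file-like object
with requests.get('<API URL to download the attachment>',
                  auth=('<username>', '<password>'), stream=True) as data:
    data.raw.decode_content = True  # undo gzip/deflate transfer encoding
    with open('myfile.docx', 'wb') as f:
        shutil.copyfileobj(data.raw, f)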