Python - download a file from URL

I would like to know how to download a file from a specific URL in Python without knowing the file type or name, simply as if downloading it by opening it in a browser.
URL example:
https://sourceforge.net/projects/portableapps/files/PortableApps.com%20Platform/PortableApps.com_Platform_Setup_19.0.paf.exe/download?use_mirror=deac-fra&use_mirror=deac-fra&r=

Try this:
import requests
URL = "http://www.example.com"
with open ("Filename", "wb") as f:
f.write(requests.get(URL).content)
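If the file name really isn't known up front, one approach is to let requests follow the redirects and then derive a name from the Content-Disposition header or from the final URL. A minimal sketch, with simplified header parsing and a placeholder fallback name:

import os
import re
import requests

url = "https://sourceforge.net/projects/portableapps/files/PortableApps.com%20Platform/PortableApps.com_Platform_Setup_19.0.paf.exe/download"

response = requests.get(url, allow_redirects=True)
response.raise_for_status()

# Try the Content-Disposition header first, e.g. 'attachment; filename="setup.exe"'
disposition = response.headers.get("Content-Disposition", "")
match = re.search(r'filename="?([^";]+)"?', disposition)
if match:
    filename = match.group(1)
else:
    # Fall back to the last path segment of the final (redirected) URL
    filename = os.path.basename(response.url.split("?")[0]) or "downloaded_file"

with open(filename, "wb") as f:
    f.write(response.content)
print("Saved as", filename)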

Related

Get attached PDF file from HTTP request

I would like to download a file like this: https://www.bbs.unibo.it/conferma/?var=FormScaricaBrochure&brochureid=61305 with Python.
The problem is that it is not directly a link to the file; I only get the file id via the query string.
I tried this code:
import requests
remote_url = "https://www.bbs.unibo.it/conferma/"
r = requests.get(remote_url, params = {"var":"FormScaricaBrochure", "brochureid": 61305})
But only the HTML is returned. How can I get the attached PDF?
You can use this example to download the file using only the brochureid:
import requests
url = "https://www.bbs.unibo.it/wp-content/themes/bbs/brochure-download.php?post_id={brochureid}&presentazione=true"
brochureid = 61305
with open("file.pdf", "wb") as f_out:
f_out.write(requests.get(url.format(brochureid=brochureid)).content)
This downloads the PDF to file.pdf.
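If the endpoint ever returns an HTML error page instead of the PDF (for example, for an unknown brochureid), a quick Content-Type check before writing avoids saving garbage as file.pdf. A small sketch along those lines:

import requests

brochureid = 61305
url = f"https://www.bbs.unibo.it/wp-content/themes/bbs/brochure-download.php?post_id={brochureid}&presentazione=true"

r = requests.get(url)
r.raise_for_status()

# Only write the file if the server actually returned a PDF
if "pdf" in r.headers.get("Content-Type", "").lower():
    with open("file.pdf", "wb") as f_out:
        f_out.write(r.content)
else:
    print("Unexpected content type:", r.headers.get("Content-Type"))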

How to download a zip file using Python scraping?

I want to download the zip file from this link. I tried various methods but I couldn't do it.
url = "https://www.cms.gov/apps/ama/license.asp?file=http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Medicare_National_HCPCS_Aggregate_CY2017.zip"
# downloading with requests
# import the requests library
import requests
# download the file contents in binary format
r = requests.get(url)
# open method to open a file on your system and write the contents
with open("minemaster1.zip", "wb") as code:
code.write(r.content)
# downloading with urllib
# import the urllib library
import urllib
# Copy a network object to a local file
urllib.request.urlretrieve(url, "minemaster.zip")
Can anybody help me resolve this issue?
They're using an accept/decline mechanism, so you'll need to add these parameters to the URL:
url = 'http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Medicare_National_HCPCS_Aggregate_CY2017.zip?agree=yes&next=Accept'
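Putting that together with requests, a minimal sketch; the agree/next values are taken from the URL above and may change if CMS reworks the license page:

import requests

# Direct download URL with the accept parameters appended
url = ("http://download.cms.gov/Research-Statistics-Data-and-Systems/"
       "Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/"
       "Medicare_National_HCPCS_Aggregate_CY2017.zip?agree=yes&next=Accept")

r = requests.get(url)
r.raise_for_status()

with open("minemaster1.zip", "wb") as f:
    f.write(r.content)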

urllib.request.urlretrieve returns a corrupt file (How to handle this kind of URL?)

I want to download about 1000 pdf files from a web page.
Then I encountered this awkward pdf url format.
Both requests.get() and urllib.request.urlretrieve() don't work for me.
Usual pdf url looks like :
https://webpage.com/this_file.pdf
But this url is like :
https://gongu.copyright.or.kr/gongu/wrt/cmmn/wrtFileDownload.do?wrtSn=9000001&fileSn=1&wrtFileTy=01
So it doesn't have .pdf in the URL, and if you click on it you can download it, but using Python's urllib you get a corrupt file.
At first I thought it was being redirected to some other URL,
so I used the requests.get(url, allow_redirects=True) option,
but the result is the same URL as before.
filename = './novel/pdf1.pdf'
url = 'https://gongu.copyright.or.kr/gongu/wrt/cmmn/wrtFileDownload.do?wrtSn=9031938&fileSn=1&wrtFileTy=01'
urllib.request.urlretrieve(url, filename)
This code downloads a corrupt PDF file.
I solved it using the content field of the response object.
filename = './novel1/pdf1.pdf'
url = ...
response = requests.get(url)
with open(filename, 'wb') as f:
    f.write(response.content)
Referred to this Q&A: Download and save PDF file with Python requests module
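Since the goal is roughly 1000 files, it may help to reuse one requests.Session and loop over the wrtSn values. A sketch under those assumptions; the id list and output directory are placeholders:

import os
import requests

wrt_sns = [9000001, 9031938]  # placeholder list of document ids
out_dir = "./novels"
os.makedirs(out_dir, exist_ok=True)

with requests.Session() as session:
    for wrt_sn in wrt_sns:
        url = ("https://gongu.copyright.or.kr/gongu/wrt/cmmn/wrtFileDownload.do"
               f"?wrtSn={wrt_sn}&fileSn=1&wrtFileTy=01")
        response = session.get(url)
        response.raise_for_status()
        # Name each file after its wrtSn so downloads don't overwrite each other
        filename = os.path.join(out_dir, f"{wrt_sn}.pdf")
        with open(filename, "wb") as f:
            f.write(response.content)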

Run a URL to download a file with Python

I'm working on a program that downloads data from a series of URLs, like this:
https://server/api/getsensordetails.xmlid=sensorID&username=user&password=password
The program goes through a list of IDs (about 2500), building and running each URL. I tried to do it using the following code:
import webbrowser
webbrowser.open(url)
but this code opens the URL in the browser and asks me to confirm whether I want to download. I need it to simply download the files without opening a browser, and certainly without having to confirm.
Thanks for everything.
You can use the Requests library.
import requests
print('Beginning file download with requests')
url = 'http://PathToFile.jpg'
r = requests.get(url)
with open('pathOfFileToReceiveDownload.jpg', 'wb') as f:
    f.write(r.content)
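For the ~2500 IDs, a sketch of the looped version; the id list, parameter names, and credentials are placeholders inferred from the URL pattern in the question:

import requests

sensor_ids = ["1001", "1002"]  # placeholder for the ~2500 ids
base_url = "https://server/api/getsensordetails.xml"

with requests.Session() as session:
    for sensor_id in sensor_ids:
        params = {"id": sensor_id, "username": "user", "password": "password"}
        r = session.get(base_url, params=params)
        r.raise_for_status()
        # Save one file per sensor id
        with open(f"sensor_{sensor_id}.xml", "wb") as f:
            f.write(r.content)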

Download a binary file using Python requests module

I need to download a file from an external source. I am using Basic authentication to log in to the URL.
import requests
response = requests.get('<external url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print (html)
data = requests.get('<API URL to download the attachment>', auth=('<username>', '<password>'), stream=True)
print (data.content)
I am getting the output below:
<url to download the binary data>
\x00\x00\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0f\xcb\x00\x00\x1e\x00\x1e\x00\xbe\x07\x00\x00.\xcf\x05\x00\x00\x00'
I am expecting to be able to download the Word document from that URL within the same session.
Working solution
import requests
import shutil
response = requests.get('<url>', auth=('<username>', '<password>'))
data = response.json()
html = data['list'][0]['attachments'][0]['url']
print (html)
data = requests.get('<url>', auth=('<username>', '<password>'), stream=True)
with open("C:/myfile.docx", 'wb') as f:
data.raw.decode_content = True
shutil.copyfileobj(data.raw, f)
I am able to download the file as it is.
When you want to download a file directly you can use shutil.copyfileobj():
https://docs.python.org/2/library/shutil.html#shutil.copyfileobj
You are already passing stream=True to requests, which is what you need to get a file-like object back. Just pass response.raw as the source to copyfileobj().
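An equivalent pattern, if you would rather not touch response.raw, is to stream the body in chunks with iter_content(); a sketch with placeholder URL and credentials:

import requests

url = "<API URL to download the attachment>"

with requests.get(url, auth=("<username>", "<password>"), stream=True) as response:
    response.raise_for_status()
    with open("myfile.docx", "wb") as f:
        # Write the body in 8 KiB chunks instead of loading it all into memory
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)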