Downloading a whole folder of files from URL - python

I'm writing a program/script in Python 3. I know how to download single files from a URL, but I need to download a whole folder, unzip the files, and merge the text files.
Is it possible to download all files FROM HERE to a new folder on my computer with Python? I'm using urllib to download single files; can anyone give an example of how to download the whole folder from the link above?

Install bs4 and requests, then you can use code like this:
import bs4
import requests

url = "http://bossa.pl/pub/metastock/ofe/sesjaofe/"
r = requests.get(url)
data = bs4.BeautifulSoup(r.text, "html.parser")

# every <a> in the directory listing points to one file
for l in data.find_all("a"):
    r = requests.get(url + l["href"])
    print(r.status_code)
Then you have to save the data of each request to a file in your directory, for example as sketched below.
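A minimal sketch of that saving step, assuming the directory listing's file links end in ".zip" (adjust the filter and the target folder to your case):

import os
import bs4
import requests

url = "http://bossa.pl/pub/metastock/ofe/sesjaofe/"
out_dir = "sesjaofe"                      # assumed local target folder
os.makedirs(out_dir, exist_ok=True)

r = requests.get(url)
data = bs4.BeautifulSoup(r.text, "html.parser")
for a in data.find_all("a"):
    href = a.get("href", "")
    if not href.endswith(".zip"):         # skip parent-directory and sort links
        continue
    file_resp = requests.get(url + href)
    with open(os.path.join(out_dir, href), "wb") as f:
        f.write(file_resp.content)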

Related

How can I extract the Azure IP ranges JSON file programmatically through Python?

I want to download the ipranges.json (which is updated weekly) from https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
I have this Python code, which keeps running forever.
import wget
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
response = wget.download(URL, "ips.json")
print(response)
How can I download the JSON file in Python?
Because https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519 is a link that triggers the download via JavaScript, you end up downloading the page itself, not the file.
If you check the downloaded file, you will see the page's HTML source rather than the JSON.
The actual file URL changes over time, so we have to scrape it in a generic way.
For convenience, I will not use wget. The two libraries used here are requests, to request the page and download the file, and BeautifulSoup, to parse the HTML.
# pip install requests
# pip install bs4
import requests
from bs4 import BeautifulSoup
# request page
URL = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
page = requests.get(URL)
# parse HTML to get the real link
soup = BeautifulSoup(page.content, "html.parser")
link = soup.find('a', {'data-bi-containername':'download retry'})['href']
# download
file_download = requests.get(link)
# save as azure_ips.json
with open("azure_ips.json", "wb") as f:
    f.write(file_download.content)

How to write a Python script to download file from the link that starts downloading automatically when clicked?

I have a link: https://www.cmegroup.com/CmeWS/exp/voiProductDetailsViewExport.ctl?media=xls&tradeDate=20220114&reportType=F&productId=425
If you copy this URL and paste it into a browser, an .xls file will start downloading.
How do I write a Python script to download the .xls file programmatically, so that I can run a .py file to download it?
If it were a link pointing to the file directly, e.g. "http://google.com/favicon.ico", it would be straightforward, something like:
import requests
url = 'http://google.com/favicon.ico'
r = requests.get(url)
open('google.ico', 'wb').write(r.content)
but since my link is not a direct link to the file, this solution doesn't work.
Please help to write the Python script to download the .xls file from the following link https://www.cmegroup.com/CmeWS/exp/voiProductDetailsViewExport.ctl?media=xls&tradeDate=20220114&reportType=F&productId=425
You can get the current date from the datetime module, format it with strftime, and build the URL like this:
from datetime import date
import requests

todayStr = date.today().strftime('%Y%m%d')   # e.g. '20220114'
url = "https://www.cmegroup.com/CmeWS/exp/voiProductDetailsViewExport.ctl?media=xls&tradeDate=" + todayStr + "&reportType=F&productId=425"
r = requests.get(url)
with open(todayStr + '.xls', 'wb') as f:
    f.write(r.content)

Run URL to download a file with python

I'm working on a program that downloads data from a series of URLs, like this:
https://server/api/getsensordetails.xmlid=sesnsorID&username=user&password=password
The program goes through a list of about 2,500 IDs, requesting each URL. I tried to do it using the following code:
import webbrowser
webbrowser.open(url)
but this code opens the URL in the browser and asks me to confirm the download. I need it to simply download the files without opening a browser and without having to confirm anything.
Thanks for everything.
You can use the Requests library.
import requests

print('Beginning file download with requests')

url = 'http://PathToFile.jpg'
r = requests.get(url)
with open('pathOfFileToReceiveDownload.jpg', 'wb') as f:
    f.write(r.content)
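Since the question loops over roughly 2,500 sensor IDs, a minimal sketch of that loop might look like the following. The base URL, credentials, and ID list are placeholders taken from the question, not tested values:

import requests

base = "https://server/api/getsensordetails.xml"      # placeholder host from the question
sensor_ids = ["sensorID1", "sensorID2"]                # your ~2500 IDs go here

for sensor_id in sensor_ids:
    params = {"id": sensor_id, "username": "user", "password": "password"}
    r = requests.get(base, params=params)
    with open("sensor_{}.xml".format(sensor_id), "wb") as f:
        f.write(r.content)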

Download files using Python 3.4 from Google Patents

I would like to download (using Python 3.4) all (.zip) files on the Google Patent Bulk Download Page http://www.google.com/googlebooks/uspto-patents-grants-text.html
(I am aware that this amounts to a large amount of data.) I would like to save all files for one year in directories [year], so 1976 for all the (weekly) files in 1976. I would like to save them to the directory that my Python script is in.
I've tried using the urllib.request package, but I only got far enough to retrieve the page's HTML text, not to "click" on a file to download it.
import urllib.request
url = 'http://www.google.com/googlebooks/uspto-patents-grants-text.html'
savename = 'google_patent_urltext'
urllib.request.urlretrieve(url, savename)
Thank you very much for help.
As I understand it, you are looking for a way to simulate left-clicking a file link so that it downloads automatically. If so, you can use Selenium.
something like:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile

profile = FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.download.dir", 'D:\\')  # choose the folder to download to
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", 'application/octet-stream')

driver = webdriver.Firefox(firefox_profile=profile)
driver.get('https://www.google.com/googlebooks/uspto-patents-grants-text.html#2015')
filename = driver.find_element_by_xpath('//a[contains(text(),"ipg150106.zip")]')  # use a loop to list all zip files
filename.click()
Update: the 'application/octet-stream' MIME type should be used instead of 'application/zip'. Now it should work :)
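To extend that into the loop mentioned in the comment above, a rough sketch could click every zip link on the page (assuming all zip anchors contain ".zip" in their href, which is not verified against the current page):

# click every link whose href contains ".zip"; each click saves one file to browser.download.dir
for link in driver.find_elements_by_xpath('//a[contains(@href, ".zip")]'):
    link.click()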
The HTML you are downloading is the page of links. You need to parse the HTML to find all the download links. You could use a library like Beautiful Soup to do this.
However, the page is very regularly structured, so you could use a regular expression to get all the download links:
import re
import urllib.request

url = 'http://www.google.com/googlebooks/uspto-patents-grants-text.html'
html = urllib.request.urlopen(url).read().decode('utf-8')   # decode bytes so the string pattern matches
links = re.findall('<a href="(.*?)">', html)
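Building on that, here is a rough sketch of saving each zip into a per-year directory, as the question asks. It assumes the extracted links are absolute URLs that contain a four-digit year somewhere in the path, which you should verify against the actual page:

import os
import re
import urllib.request

url = 'http://www.google.com/googlebooks/uspto-patents-grants-text.html'
html = urllib.request.urlopen(url).read().decode('utf-8')

for link in re.findall('<a href="(.*?)">', html):
    if not link.endswith('.zip'):
        continue                                   # only keep the zip download links
    match = re.search(r'(19|20)\d\d', link)        # guess the year from the URL
    year = match.group(0) if match else 'unknown'
    os.makedirs(year, exist_ok=True)
    filename = os.path.join(year, link.rsplit('/', 1)[-1])
    urllib.request.urlretrieve(link, filename)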

Downloading a pdf from link but server redirects to homepage

I am trying to download a PDF from a webpage using urllib. I used the source link that downloads the file in the browser, but that same link fails to download the file in Python; instead, what downloads is a redirect to the main page.
import os
import urllib
os.chdir(r'/Users/file')
url = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
urllib.urlretrieve(url, "downloaded_file")
Please try downloading the file manually from the link provided or from the redirected site; the link on the main page is called 'sectionals'.
Your help is much appreciated.
It is because the given link redirects you instead of serving the "raw" PDF file directly. Examining the response headers via Firebug, I was able to find the filename sectionals/2014/2607RAND.pdf; as it is relative to the current .aspx file, the required URI (in your case, the value of the url variable) should be switched to http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf
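A minimal sketch of fetching that direct URL, using Python 3's urllib.request with the path found above:

import urllib.request

url = "http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf"
urllib.request.urlretrieve(url, "2607RAND.pdf")   # saves the PDF next to the script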
In Python 3:
import urllib.request
import shutil
local_filename, headers = urllib.request.urlretrieve('http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414')
shutil.move(local_filename, 'ret.pdf')
shutil is used because urlretrieve saves to a temporary folder (in my case, that's on another partition, so os.rename would give me an error).
