Handling a Redirecting Auto-Download Link in Python

I have a URL which, upon loading, automatically downloads a CSV to your machine.
I am trying to do this automatically in Python and control what the downloaded file is named. Here is my current code:
import os
import sys
import urllib.request

URL = "https://www.google.com/url?q=https%3A%2F%2Fbasketballmonster.com%2FDaily.aspx%3Fv%3D2%26exportcsv%3DXnZZUZaDa0E296JhVEGWbs8HRGOXsEkeJKs2towTT%2Fw%3D&sa=D&sntz=1&usg=AFQjCNHYm9T_QIZvEJ8qIKfyXQuZb4HPVA"
response = urllib.request.urlopen(URL)
URL2 = response.geturl()
urllib.request.urlretrieve(URL2, "file2.csv")
Clicking that URL in a browser downloads a CSV to disk. However, the file my script downloads contains HTML markup instead of the actual data.
Any ideas on a solution?
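The HTML you are seeing is Google's interstitial redirect page, not the CSV. One approach (a sketch using only the standard library; Google's redirector carries the real destination in the `q` query parameter, and the `User-Agent` header is an assumption in case the site rejects Python's default) is to extract the target URL yourself and fetch it directly:

```python
import urllib.parse
import urllib.request

def extract_target(google_url):
    """Pull the real destination out of a google.com/url redirect link.

    The destination is percent-encoded in the `q` query parameter;
    parse_qs decodes it for us.
    """
    query = urllib.parse.urlparse(google_url).query
    return urllib.parse.parse_qs(query)["q"][0]

if __name__ == "__main__":
    wrapper = ("https://www.google.com/url?q=https%3A%2F%2Fbasketballmonster.com"
               "%2FDaily.aspx%3Fv%3D2%26exportcsv%3D...&sa=D")
    target = extract_target(wrapper)
    # Fetch the real URL directly with a browser-like User-Agent.
    req = urllib.request.Request(target, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp, open("file2.csv", "wb") as f:
        f.write(resp.read())
```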

Related

How to download zip file using Python scraping?

I want to download the zip file at this link. I tried various methods but couldn't do it.
url = "https://www.cms.gov/apps/ama/license.asp?file=http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Medicare_National_HCPCS_Aggregate_CY2017.zip"
# downloading with requests
import requests

# download the file contents in binary format
r = requests.get(url)
# open a file on your system and write the contents
with open("minemaster1.zip", "wb") as code:
    code.write(r.content)

# downloading with urllib
import urllib.request

# copy a network object to a local file
urllib.request.urlretrieve(url, "minemaster.zip")
Can anybody help me resolve this issue?
They're using an accept/decline mechanism, so you'll need to add these parameters to the URL:
url = 'http://download.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/Medicare_National_HCPCS_Aggregate_CY2017.zip?agree=yes&next=Accept'
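Rather than hand-editing the URL, the extra parameters can be appended programmatically. A minimal sketch using only the standard library; the `agree=yes&next=Accept` pair comes from the answer above, and the helper name is ours:

```python
import urllib.parse
import urllib.request

def add_agree_params(url):
    """Append the accept-mechanism parameters to a download URL,
    preserving any query string that is already there."""
    parts = urllib.parse.urlparse(url)
    query = dict(urllib.parse.parse_qsl(parts.query))
    query.update({"agree": "yes", "next": "Accept"})
    return urllib.parse.urlunparse(
        parts._replace(query=urllib.parse.urlencode(query)))

if __name__ == "__main__":
    base = ("http://download.cms.gov/Research-Statistics-Data-and-Systems/"
            "Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/"
            "Downloads/Medicare_National_HCPCS_Aggregate_CY2017.zip")
    urllib.request.urlretrieve(add_agree_params(base), "minemaster.zip")
```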

Run URL to download a file with python

I'm working on a program that downloads data from a series of URLs, like this:
https://server/api/getsensordetails.xmlid=sesnsorID&username=user&password=password
The program goes through a list of IDs (about 2500), builds each URL and requests it. I tried to do it using the following code:
import webbrowser
webbrowser.open(url)
but this code opens the URL in a browser and asks me to confirm the download. I need it to simply download the files, without opening a browser and without having to confirm anything.
Thanks for everything.
You can use the Requests library.
import requests
print('Beginning file download with requests')
url = 'http://PathToFile.jpg'
r = requests.get(url)
with open('pathOfFileToReceiveDownload.jpg', 'wb') as f:
    f.write(r.content)
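For the original problem of ~2500 sensor IDs, the same idea goes in a loop with no browser involved. A sketch using only the standard library; the `build_url` helper and endpoint shape are illustrative, based on the URL pattern in the question (which appears to be missing a `?` before the query string):

```python
import urllib.parse
import urllib.request

def build_url(base, sensor_id, username, password):
    """Build one download URL for a sensor ID (hypothetical endpoint)."""
    query = urllib.parse.urlencode(
        {"id": sensor_id, "username": username, "password": password})
    return "{}?{}".format(base, query)

if __name__ == "__main__":
    base = "https://server/api/getsensordetails.xml"  # placeholder host
    for sensor_id in ["1001", "1002"]:  # replace with the real list of ~2500 IDs
        url = build_url(base, sensor_id, "user", "password")
        # Each response is written straight to disk, no browser needed.
        with urllib.request.urlopen(url) as resp, \
                open("sensor_{}.xml".format(sensor_id), "wb") as f:
            f.write(resp.read())
```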

Can't Download Full File in Python

I was using bs4 (BeautifulSoup) in Python to download a wallpaper from nmgncp.com.
However, the code downloads only a 16KB file, whereas the full image is around 300KB.
Please help me. I have even tried the wget.download method.
PS: I am using Python 3.6 on Windows 10.
Here is my code:
from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os
url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com/'+a
print(newurl)
response = requests.get(newurl)
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
The source of your problem is a protection mechanism: the image URL requires a Referer header, otherwise the server redirects to the HTML page.
Fixed source code:
from bs4 import BeautifulSoup
import requests
import datetime
import time
import re
import wget
import os
url='http://www.nmgncp.com/dark-wallpaper-1920x1080.html'
html=requests.get(url)
soup=BeautifulSoup(html.text,"lxml")
a = soup.findAll('img')[0].get('src')
newurl='http://www.nmgncp.com'+a
print(newurl)
response = requests.get(newurl, headers={'referer': newurl})
if response.status_code == 200:
    with open("C:/Users/KD/Desktop/Python_practice/newwww.jpg", 'wb') as f:
        f.write(response.content)
First of all, http://www.nmgncp.com/dark-wallpaper-1920x1080.html is an HTML document. Second, when you try to download an image by its direct URL (like http://www.nmgncp.com/data/out/95/4351795-dark-wallpaper-1920x1080.jpg), it will also redirect you to an HTML document. This is most probably because the host (nmgncp.com) does not want to provide direct links to its images. It can check whether an image was requested directly by looking at the HTTP Referer header and deciding whether it is valid. So in this case you have to put in some more effort to make the host think you are a valid caller of direct URLs.
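A quick sanity check catches this class of problem early: if the first bytes of the "image" look like an HTML document, the server bounced you to a page instead of the file. A minimal sketch (the JPEG/PNG magic numbers are standard; the helper name is ours):

```python
def looks_like_html(first_bytes):
    """Heuristic: does a downloaded payload start like an HTML page
    rather than an image? Real images begin with a binary magic number
    (JPEG: FF D8 FF, PNG: 89 50 4E 47), not angle brackets."""
    head = first_bytes.lstrip().lower()
    return head.startswith(b"<!doctype") or head.startswith(b"<html")

if __name__ == "__main__":
    with open("newwww.jpg", "rb") as f:
        if looks_like_html(f.read(64)):
            print("Got an HTML page, not an image - check the Referer header")
```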

Downloading a whole folder of files from URL

I'm writing a program/script in Python 3. I know how to download single files from a URL, but I need to download a whole folder, unzip the files and merge the text files.
Is it possible to download all files FROM HERE to a new folder on my computer with Python? I'm using urllib to download single files; can anyone give an example of how to download the whole folder from the link above?
Install bs4 and requests, then you can use code like this:
import bs4
import requests
url = "http://bossa.pl/pub/metastock/ofe/sesjaofe/"
r = requests.get(url)
data = bs4.BeautifulSoup(r.text, "html.parser")
for l in data.find_all("a"):
    r = requests.get(url + l["href"])
    print(r.status_code)
Then you have to save the data from each request into your directory.
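As a sketch of that last step: a directory listing's anchors usually include parent-directory and column-sorting links, so it is worth filtering the hrefs down to actual files before saving each response body (the extension list is a guess at what the listing contains; adjust to taste):

```python
import os

def file_links(hrefs, extensions=(".zip", ".prn", ".txt")):
    """Keep only hrefs that look like downloadable files, skipping
    parent-directory links and column-sorting query links."""
    return [h for h in hrefs
            if h.lower().endswith(extensions) and not h.startswith("..")]

if __name__ == "__main__":
    import bs4
    import requests

    url = "http://bossa.pl/pub/metastock/ofe/sesjaofe/"
    soup = bs4.BeautifulSoup(requests.get(url).text, "html.parser")
    hrefs = [a["href"] for a in soup.find_all("a", href=True)]
    os.makedirs("sesjaofe", exist_ok=True)
    for name in file_links(hrefs):
        r = requests.get(url + name)
        # Save each file under its own name in the new folder.
        with open(os.path.join("sesjaofe", name), "wb") as f:
            f.write(r.content)
```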

Downloading a pdf from link but server redirects to homepage

I am trying to download a PDF from a webpage using urllib. I used the source link that downloads the file in the browser, but the same link fails to download the file in Python: what gets downloaded instead is a redirect to the main page.
import os
import urllib
os.chdir(r'/Users/file')
url = "http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414"
urllib.urlretrieve (url, "downloaded_file")
Please try downloading the file manually from the link provided, or from the redirected site; the link on the main page is called 'sectionals'.
Your help is much appreciated.
It is because the given link redirects before serving the raw PDF file. Examining the response headers via Firebug, I was able to get the relative filename sectionals/2014/2607RAND.pdf, and as it is relative to the current .aspx file, the required URI becomes (in your case, change the url variable to this link): http://www.australianturfclub.com.au/races/sectionals/2014/2607RAND.pdf
In python3:
import urllib.request
import shutil
local_filename, headers = urllib.request.urlretrieve('http://www.australianturfclub.com.au/races/SectionalsMeeting.aspx?meetingId=2414')
shutil.move(local_filename, 'ret.pdf')
The shutil call is there because Python saves the download to a temp folder (in my case that's on another partition, so os.rename would give me an error).
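Since the header gives a path relative to the .aspx page, `urllib.parse.urljoin` can resolve it the same way a browser would, with no hand-editing. A sketch; the relative path comes from the answer above:

```python
import urllib.parse
import urllib.request

def resolve(page_url, relative_path):
    """Resolve a server-supplied relative file path against the page
    that referenced it, dropping the page's filename and query string."""
    return urllib.parse.urljoin(page_url, relative_path)

if __name__ == "__main__":
    page = ("http://www.australianturfclub.com.au/races/"
            "SectionalsMeeting.aspx?meetingId=2414")
    pdf_url = resolve(page, "sectionals/2014/2607RAND.pdf")
    urllib.request.urlretrieve(pdf_url, "ret.pdf")
```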
