When I try this code:
import urllib.request
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
I get the following error:
Traceback (most recent call last):
File "scraper.py", line 26, in <module>
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
But the link works fine in my browser. Why does it work in the browser but not for a request? It works with other pictures from the same site.
The request really does return a 404. If you check your developer console in the browser, it's a 404 there as well, so what you see in the browser is imgur's custom 404 "page" (which is itself an image).
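A minimal sketch to confirm this from Python; the HTTPError is file-like and carries the response headers, so the Content-Type of the 404 body shows whether it is actually an image (an image/* type is only what one would expect here, not verified):
import urllib.request
import urllib.error

try:
    urllib.request.urlopen("https://i.redd.it/53tfh959wnv41.jpg")
except urllib.error.HTTPError as e:
    print(e.code)                         # 404
    print(e.headers.get("Content-Type"))  # e.g. an image/* type for a custom 404 image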
EDIT:
So urlretrieve fails on a 404 status code. If you want to use the contents of the response (even if the status code is 404), you can do the following:
import urllib.request
import urllib.error

try:
    urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
except urllib.error.HTTPError as e:
    # The HTTPError object is file-like, so the 404 body can still be saved.
    with open("error_photo.jpg", 'wb') as fp:
        fp.write(e.read())
Try changing the User-Agent. Note that urlretrieve does not accept a headers argument, so the custom header has to go through an installed opener (or through Request with urlopen).
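A minimal sketch of the opener approach; the User-Agent string below is just a placeholder for whatever you want to send:
import urllib.request

# urlretrieve() uses the globally installed opener, so the header set here is sent with the download.
opener = urllib.request.build_opener()
opener.addheaders = [("User-Agent", "put custom user agent here")]
urllib.request.install_opener(opener)
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")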
Related
With this code I am reading a URL and using the data for filtering, but urllib is not working:
url = "myurl"
response = urllib.request.urlopen(url)
data = json.loads(response.read())
Yesterday it was working well, but now it is giving me this error:
Traceback (most recent call last):
File "vaccine_survey.py", line 22, in <module>
response = urllib.request.urlopen(url)
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
This works for me; here 'myurl' is the URL address:
import json
from urllib.request import Request, urlopen

req = Request('myurl', headers={'User-Agent': 'Mozilla/5.0'})
response = urlopen(req)
data = json.loads(response.read())
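If the 403 persists even with a browser-like User-Agent, a small sketch like this (still with 'myurl' as a placeholder) at least surfaces the status code and any error body the server sends back:
import json
import urllib.error
from urllib.request import Request, urlopen

req = Request('myurl', headers={'User-Agent': 'Mozilla/5.0'})
try:
    with urlopen(req) as response:
        data = json.loads(response.read())
except urllib.error.HTTPError as e:
    # The HTTPError is file-like, so the server's error body can be read too.
    print(e.code, e.reason)
    print(e.read()[:200])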
Problem
I'm using urllib.request.urlopen on the Wall Street Journal and it gives me a 404.
Details
Other sites work fine, and I get the same error if I use https://. I ran this example in the REPL, but the same error happens in calls from my Django server:
>>> from urllib.request import urlopen
>>> urlopen('http://www.wsj.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
For comparison, this is how it works with another site:
>>> urlopen('http://www.cbc.ca')
<http.client.HTTPResponse object at 0x10b0f8c88>
I'm not sure how to debug this. Anyone know what's going on, and how I can fix it?
First import Request like this:
from urllib.request import Request, urlopen
Then pass your URL and headers to Request as below:
url = 'https://www.wsj.com/'
response_obj = urlopen(Request(url, headers={'User-Agent': 'Mozilla/5.0'}))
print(response_obj)
I tested it and it is working now.
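For reuse, the same fix can be wrapped in a small helper; fetch is a hypothetical name and the Mozilla/5.0 User-Agent is just a placeholder:
from urllib.request import Request, urlopen

def fetch(url, user_agent='Mozilla/5.0'):
    # Open the URL with a browser-like User-Agent and return the raw body bytes.
    req = Request(url, headers={'User-Agent': user_agent})
    with urlopen(req) as response:
        return response.read()

html = fetch('https://www.wsj.com/')
print(len(html))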
I get an error when I try to download a video from YouTube with pytube.
from pytube import YouTube
yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')
stream = yt.streams.first()
stream
stream.download()
The error says "Forbidden", but fetching the same URL with e.g. curl seems to work just fine. What could be wrong here?
Traceback (most recent call last):
File "C:/Users/Mr. jarvis/Desktop/youtube/youtube.py", line 7, in <module>
stream.download()
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytube\streams.py", line 217, in download
bytes_remaining = self.filesize
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytube\streams.py", line 164, in filesize
headers = request.get(self.url, headers=True)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\site-packages\pytube\request.py", line 21, in get
response = urlopen(url)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\Mr. jarvis\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Use pytube3 instead; it fixes this error (but is Python 3 only): https://github.com/hbmartin/pytube3
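If you switch, the usage should stay essentially the same; a minimal sketch, assuming pytube3 keeps the pytube import name as its project page describes:
# pip install pytube3
from pytube import YouTube  # pytube3 installs under the same 'pytube' package name

yt = YouTube('https://www.youtube.com/watch?v=9bZkp7q19f0')
stream = yt.streams.first()
stream.download()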
I experienced the same problem. I found an unfinished downloaded file in the project folder. After deleting the unfinished file and running stream.download() again, it worked, so that may be worth checking. (pytube version 12.0.0)
I am getting an error that makes me believe my program is unable to find a website I know exists. The website is:
https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207
My code looks like this:
from urllib import request as u_r

def strip_webite():
    with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
        pass

if __name__ == "__main__":
    strip_webite()
And the error I get is:
File "database_management.py", line 19, in <module>
strip_webite()
File "database_management.py", line 15, in strip_webite
with u_r.urlopen("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207") as f:
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
It looks like Transfermarkt is blocking requests from bots that use the default User-Agent string sent by Python's urllib library, although it doesn't mention anything about that in its robots.txt.
This seems to imply they don't mind us scraping them, but they'd prefer we announce who we are.
To do so with urllib, do the following:
from urllib import request as u_r

def strip_webite():
    request = u_r.Request("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207")
    request.add_header('User-Agent', 'my-cool-app')
    with u_r.urlopen(request) as f:
        pass

if __name__ == "__main__":
    strip_webite()
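Once the request goes through, the response handle can be read and decoded in place of the pass; a minimal sketch, assuming the page is UTF-8 (errors='replace' guards against a different encoding):
from urllib import request as u_r

req = u_r.Request("https://www.transfermarkt.com/marco-reus/verletzungen/spieler/35207")
req.add_header('User-Agent', 'my-cool-app')
with u_r.urlopen(req) as f:
    html = f.read().decode('utf-8', errors='replace')
print(html[:200])  # first part of the page, just to confirm the fetch worked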
Hello fellow Programmers,
Today I wanted to get some JSON data from this website using Python 3.3: http://ladv.de/api/-apikey-redacted-/ausDetail?id=884&wettbewerbe=true&all=true
The official API says that calling this URL returns some JSON data, but if I use the following code to get it (which I found on Stack Overflow, too), it throws an error:
import urllib.request
import json

request = 'http://ladv.de/api/-apikey-redacted-/ausDetail?id=884&wettbewerbe=true&all=true'
response = urllib.request.urlopen(request)
str_response = response.read().decode('utf-8')
obj = json.loads(str_response)
print(obj)
This prints out:
Traceback (most recent call last):
File "D:/ladvclient/testscrape.py", line 5, in <module>
response = urllib.request.urlopen(request)
File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 475, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 513, in error
return self._call_chain(*args)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Where is the bug, and what is the correct code?
Thanks in advance,
forumfresser
The site you're trying to fetch is not available, as seen here:
http://ladv.de/api/-apikey-redacted-/ausDetail?id=884&wettbewerbe=true&all=true
You could also just read the error message yourself:
urllib.error.HTTPError: HTTP Error 404: Not Found
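If you want the script to report that instead of dying on the traceback, a minimal sketch (keeping the redacted API key placeholder) could look like this:
import json
import urllib.error
import urllib.request

url = 'http://ladv.de/api/-apikey-redacted-/ausDetail?id=884&wettbewerbe=true&all=true'
try:
    with urllib.request.urlopen(url) as response:
        print(json.loads(response.read().decode('utf-8')))
except urllib.error.HTTPError as e:
    # A 404 from an API usually points at the URL itself (wrong path or an
    # invalid/expired API key), so print whatever the server says.
    print(e.code, e.reason)
    print(e.read()[:200])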