Hello fellow Programmers,
today I wanted to get some JSON Data from this website using Python 3.3: http://ladv.de/api/-apikey-redacted-/ausDetail?id=884&wettbewerbe=true&all=true
The official API tells me that calling this URL returns some JSON Data. But if I use the following code to get it (which I found on stackoverflow, too), it throws an error:
import urllib.request
import json
request = 'http://ladv.de/api/mmetzger/ausDetail?id=884&wettbewerbe=true&all=true'
response = urllib.request.urlopen(request)
obj = json.load(response)
str_response = response.readall().decode('utf-8')
obj = json.loads(str_response)
print(obj)
prints out
Traceback (most recent call last):
File "D:/ladvclient/testscrape.py", line 5, in <module>
response = urllib.request.urlopen(request)
File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 475, in open
response = meth(req, response)
File "C:\Python33\lib\urllib\request.py", line 587, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python33\lib\urllib\request.py", line 513, in error
return self._call_chain(*args)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
Where is the bug, and what is the correct code?
Thanks in advance,
forumfresser
The site you're trying to fetch is not available, as seen here:
http://ladv.de/api/-apikey-redacted-/ausDetail?id=884&wettbewerbe=true&all=true
You could also just read the error message by yourself:
urllib.error.HTTPError: HTTP Error 404: Not Found
Related
When i try this line:
import urllib.request
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
i get the following error:
Traceback (most recent call last):
File "scraper.py", line 26, in <module>
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
File "/usr/lib/python3.6/urllib/request.py", line 248, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 532, in open
response = meth(req, response)
File "/usr/lib/python3.6/urllib/request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.6/urllib/request.py", line 570, in error
return self._call_chain(*args)
File "/usr/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/lib/python3.6/urllib/request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
But the link works fine in my browser? Why does it work in the browser but not for a request? It works with other pictures from the same site.
The request returns
If you check your developer console, It's a 404:
So what you see is imgur's custom 404 "page" (which is an image).
EDIT:
So urlretrieve fails on 404 status code. If you want to use the contents of the request (even if the statuscode is 404) you can do the following:
try:
urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg")
except Exception as e:
with open("error_photo.jpg", 'wb') as fp:
fp.write(e.read())
Try to change user-agent. You can just add a kwarg:
req = urllib.request.urlretrieve("https://i.redd.it/53tfh959wnv41.jpg", "photo.jpg", headers={"User-Agent": "put custom user agent here"})
Problem
I'm using urllib.request.urlopen on the Wall Street Journal and it gives me a 404.
Details
Other sites work fine. Same error if I use https://. I did this example in REPL but the same error happens in my calls from my Django server:
>>> from urllib.request import urlopen
>>> urlopen('http://www.wsj.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
This is how it should work:
>>> urlopen('http://www.cbc.ca')
<http.client.HTTPResponse object at 0x10b0f8c88>
I'm not sure how to debug this. Anyone know what's going on, and how I can fix it?
first import Request like this:
from urllib.request import **Request**, urlopen
and then pass your url and header to Request like below:
url = 'https://www.wsj.com/'
response_obj = urlopen(Request(url, headers={'User-Agent': 'Mozilla/5.0'}))
print(response_obj)
I tested it now its working
Anyway, I'm trying to write a simple request to the Amazon API using the following code:
ak = "***"
sk = "***"
at = "***"
import bottlenose
amazon = bottlenose.Amazon(ak, sk, at, "DE")
response=amazon.ItemLookup(ItemId="B00KWAO4CI")
print(response.price_and_currency)
It should return an XML object. Instead I get the following result:
Traceback (most recent call last):
File "simpleamazon.py", line 7, in <module>
response=amazon.ItemLookup(ItemId="B00KWAO4CI")
File "/Library/Python/2.7/site-packages/bottlenose/api.py", line 251, in __call__
{'api_url': api_url, 'cache_url': cache_url})
File "/Library/Python/2.7/site-packages/bottlenose/api.py", line 212, in _call_api
return urllib2.urlopen(api_request, timeout=self.Timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
Up until recently I received HTTP Error 400 instead. To my knowledge I haven't changed anything. I've also tried using response groups, but it resulted in the same error(s).
Do you have any leads?
Using Python 3.5.2
I am really an ETL guy trying to learn Python, please help
import urllib2
urls =urllib2.urlopen("url1","url2")
i=0
while i< len(urls):
htmlfile = urllib2.urlopen(urls[i])
htmltext = htmlfile.read()
print htmltext
i+=1
I am getting errors as
Traceback (most recent call last):
File ".\test.py", line 2, in
urls =urllib2.urlopen("url1","url2")
File "c:\python27\Lib\urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "c:\python27\Lib\urllib2.py", line 437, in open
response = meth(req, response)
File "c:\python27\Lib\urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "c:\python27\Lib\urllib2.py", line 475, in error
return self._call_chain(*args)
File "c:\python27\Lib\urllib2.py", line 409, in _call_chain
result = func(*args)
File "c:\python27\Lib\urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 405: Method Not Allowed
Your error is coming from line 2:
urls =urllib2.urlopen("url1","url2")
Whatever url you're trying to access is returning a http error code
HTTP Error 405: Method Not Allowed
Looking at the urllib2 docs, you should only be using 1 url as an argument
https://docs.python.org/2/library/urllib2.html
Open the URL url, which can be either a string or a Request object.
data may be a string specifying additional data to send to the server, or None if no such data is needed. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided.
The 2nd argument you're putting in may be turning the request into a POST, which would explain the Method Not Allowed code.
I have a piece of code that calls facebook API like this:
ID = str(cell.value) #ID comes from an excel spread sheet
data = json.load(urllib2.urlopen('http://graph.facebook.com/' + urllib.quote(ID) +'/comments?summary=true&limit=0'))
Comments_count = int(data.get("summary").get("total_count"))
However, I am getting error on certain URLs.
Traceback (most recent call last):
File "FBS.py", line 50, in <module>
data = json.load(urllib2.urlopen('http://graph.facebook.com/' + urllib.quote(ID) +'/comments?summary=true&limit=0'))
File "C:\Python27\lib\urllib2.py", line 127, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 410, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 523, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 448, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 400: Bad Request
I already tried using Urllib.quote(ID), but I still have the same issue.
Any help is greatly appreciated.
Thanks!!
You need to pass access token to graph api to get this information. I suggest you use a sdk like this to access graph api