I am using Python to fetch content from some URLs. I have a list of URLs, and all of them are fine except one, where I get a 404. I wanted to fetch them like this:
import requests

for url in urls:
    r = requests.get(url)
    try:
        r.raise_for_status()
    except RuntimeError:
        print('error: could not get content from url because of {}'.format(r.status_code))
But now, the exception raised by raise_for_status() is not caught; it just propagates and gets printed. How can I print my own error message when it is raised?
You need to modify your try/except block:
try:
    r = requests.get(url)
    r.raise_for_status()
except requests.exceptions.HTTPError as error:
    print(error)
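Note that moving requests.get(url) inside the try block also lets the same structure handle connection-level failures (requests.exceptions.ConnectionError, timeouts), not just bad HTTP status codes.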
You could also create your own exception class and just raise that:
class MyException(Exception):
    pass

...
...

for url in urls:
    r = requests.get(url)
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as error:
        raise MyException('error: could not get content from url because of {}'.format(r.status_code))
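If you also want to preserve the original traceback, Python 3's raise ... from ... syntax chains the new exception to the HTTPError that triggered it. A minimal sketch of that variant:

for url in urls:
    r = requests.get(url)
    try:
        r.raise_for_status()
    except requests.exceptions.HTTPError as error:
        # chain MyException to the original HTTPError so both show up in the traceback
        raise MyException('error: could not get content from url because of {}'.format(r.status_code)) from error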
Related
I want to access mouser.com using urllib. When I try to fetch the data from the URL, it hangs indefinitely.
Here is the code:
import urllib.error
import urllib.request

try:
    htmls = urllib.request.urlopen("https://www.mouser.com/")
except urllib.error.HTTPError as e:
    print("HTTP ERROR")
except urllib.error.URLError as e:
    print("URL ERROR")
else:
    print(htmls.read().decode("utf-8"))
This piece of code works fine for most URLs, but for some, like Mouser or element14, it doesn't.
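One thing worth trying (an assumption, since the exact cause depends on the site): pass a timeout so the call fails fast instead of hanging, and send a browser-like User-Agent header, since some sites stall or drop requests that arrive with urllib's default one. A minimal sketch:

import urllib.error
import urllib.request

# Assumption: the server is stalling requests that use urllib's default User-Agent
req = urllib.request.Request(
    "https://www.mouser.com/",
    headers={"User-Agent": "Mozilla/5.0"},  # hypothetical browser-like value
)
try:
    # timeout makes the call raise URLError after 10 seconds instead of hanging
    with urllib.request.urlopen(req, timeout=10) as response:
        print(response.read().decode("utf-8"))
except urllib.error.HTTPError as e:
    print("HTTP ERROR", e.code)
except urllib.error.URLError as e:
    print("URL ERROR", e.reason)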
I have a list of 1000 websites to check whether they exist or not, but my code reports every URL that starts with https:// as working. Here is my code:
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

req = Request("http://stackoverflow.com")
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    print('Website is working fine')
You can use the Python requests library.
If you do response = requests.get('http://stackoverflow.com') and then check response.status_code, you should get 200. But if you try a site that is not available, you should get a status_code of 404. You can use status_code in your case.
More on status codes: Link.
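A minimal sketch along those lines, assuming your list is held in a variable called urls (a hypothetical name); it also catches connection-level failures so one dead host doesn't stop the loop:

import requests

urls = ['http://stackoverflow.com', 'http://example.com/no-such-page']

for url in urls:
    try:
        response = requests.get(url, timeout=10)
    except requests.exceptions.RequestException as e:
        # DNS failures, refused connections, timeouts, etc.
        print(url, 'failed to connect:', e)
        continue
    if response.status_code == 200:
        print(url, 'is working fine')
    else:
        print(url, 'returned status code', response.status_code)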
I am trying to get all the 'a' tags which are used for links, and also the 'form' tag. The code that I have written fetches the whole page.
import requests
from requests.exceptions import HTTPError

for url in ['http://www.example.com', 'http://mail.example.com']:
    try:
        response = requests.get(url)
        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # Python 3.6
    except Exception as err:
        print(f'Other error occurred: {err}')  # Python 3.6
    else:
        response.encoding = 'utf-8'  # Optional: requests infers this internally
        print(response.text)
I can use regular expressions to get a specific thing from the page, but I don't know how to get the entire contents of a particular tag.
Thanks
You can use BeautifulSoup to parse the HTML page:
import requests
from requests.exceptions import HTTPError
from bs4 import BeautifulSoup

for url in ['http://www.example.com', 'http://mail.example.com']:
    try:
        response = requests.get(url)
        # If the response was successful, no Exception will be raised
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')  # Python 3.6
    except Exception as err:
        print(f'Other error occurred: {err}')  # Python 3.6
    else:
        response.encoding = 'utf-8'  # Optional: requests infers this internally
        soup = BeautifulSoup(response.text, 'lxml')
        links = soup.find_all('a')
        forms = soup.find_all('form')
To install BeautifulSoup (and the lxml parser used above), use:
pip install beautifulsoup4 lxml
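From there you can pull out the attributes or inner content you need; for example, after the find_all calls above (a small sketch using standard BeautifulSoup Tag methods):

for link in links:
    print(link.get('href'))    # the link target, or None if the attribute is missing
for form in forms:
    print(form.prettify())     # the full markup of each <form> tag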
I want to get the response code from a web server, but sometimes I get code 200 even if the page doesn't exist, and I don't know how to deal with it.
I'm using this code:
import urllib.error
import urllib.request

def checking_url(link):
    try:
        link = urllib.request.urlopen(link)
        response = link.code
    except urllib.error.HTTPError as e:
        response = e.code
    return response
When I'm checking a website like this one:
https://www.wykop.pl/notexistlinkkk/
It still returns code 200 even if the page doesn't exist.
Is there any solution to deal with it?
I found a solution, and am now going to test it with more websites: I had to use http.client.
You are getting response code 200 because the website you are checking has automatic redirection. In the URL you gave, even if you specify a non-existent page, it automatically redirects you to the home page rather than returning a 404 status code. Your code works fine.
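If you want to detect that redirect rather than follow it, one option (a sketch using the requests library, not your original http.client approach) is to disable automatic redirection, or follow it and inspect the redirect history:

import requests

# Option 1: don't follow redirects; a 3xx status here means the page redirected
response = requests.get('https://www.wykop.pl/notexistlinkkk/', allow_redirects=False)
print(response.status_code)

# Option 2: follow redirects but check where you ended up
response = requests.get('https://www.wykop.pl/notexistlinkkk/')
if response.history:  # non-empty when one or more redirects happened
    print('redirected to', response.url)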
import urllib.error
import urllib.request

thisCode = None
try:
    i = urllib.request.urlopen('http://www.google.com')
    thisCode = i.code
except urllib.error.HTTPError as e:
    thisCode = e.code
print(thisCode)
I am getting this error:
requests.exceptions.MissingSchema: Invalid URL 'http:/1525/bg.png': No schema supplied. Perhaps you meant http://http:/1525/bg.png?
I don't really care why the error happened; I want to be able to capture any Invalid URL errors, issue a message, and proceed with the rest of the code.
Below is my code, where I'm trying to use try/except for that specific error, but it's not working...
# load xkcd page
# save comic image on that page
# follow <previous> comic link
# repeat until last comic is reached
import webbrowser, bs4, os, requests

url = 'http://xkcd.com/1526/'
os.makedirs('xkcd', exist_ok=True)

while not url.endswith('#'):  # '#' marks the last page
    # download the page
    print('Downloading page %s...' % (url))
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, "html.parser")

    # find url of the comic image (<div id="comic"><img src="..."></div>)
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find any images')
    else:
        comicUrl = 'http:' + comicElem[0].get('src')
        # download the image
        print('Downloading image... %s' % (comicUrl))
        res = requests.get(comicUrl)
        try:
            res.raise_for_status()
        except requests.exceptions.MissingSchema as err:
            print(err)
            continue
        # save image to folder
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(1000000):
            imageFile.write(chunk)
        imageFile.close()

    # get <previous> button url
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')

print('Done')
What am I doing wrong? (I'm on Python 3.5.)
Thanks a lot in advance...
If you don't care about the error (which I see as bad programming), just use a bare except statement that catches all exceptions.
# download the image
print('Downloading image... %s' % (comicUrl))
try:
    res = requests.get(comicUrl)  # moved inside the try block
    res.raise_for_status()
except:
    continue
But on the other hand, if your except block isn't catching the exception, it's because the exception actually happens outside your try block. Move requests.get into the try block and the exception handling should work (that is, if you still need it).
Try this if you run into this type of issue when using a wrong URL.
Solution:
import requests

correct_url = False
url = 'Ankit Gandhi'  # 'https://gmail.com'
try:
    res = requests.get(url)
    correct_url = True
except requests.exceptions.RequestException:
    print("Please enter a valid URL")

if correct_url:
    """
    Do your operation
    """
    print("Correct URL")
Hope this is helpful.
The reason your try/except block isn't catching the exception is that the error happens at the line
res = requests.get(comicUrl)
which is above the try keyword.
Keeping your code as is, and just moving that line inside the try block, will fix it.
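Concretely, the relevant part of the loop would become (a minimal sketch of just that fix):

try:
    res = requests.get(comicUrl)  # now inside the try block, so MissingSchema is caught
    res.raise_for_status()
except requests.exceptions.MissingSchema as err:
    print(err)
    continue

One caveat worth noting: the bare continue skips the prev-link lookup at the bottom of the loop, so url never changes and the loop would fetch the same page again; you may want to update url before continuing.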