python not catching HTTPError - python

My code is the following:
import json
import urllib2
from urllib2 import HTTPError
def karma_reddit(user):
while True:
try:
url = "https://www.reddit.com/user/" + str(user) + ".json"
data = json.load(urllib2.urlopen(url))
except urllib2.HTTPError as err:
if err == "Too Many Requests":
continue
if err == "Not Found":
print str(user) + " isn't a valid username."
else:
raise
break
I'm trying to get the data from the reddit user profile. However HTTPErrors keep occuring. When trying to catch them using the except statement they keep coming up without the program executing either another iteration of the loop or the print statement. How do I manage to catch the HTTPErrors? I'm pretty new to Python so this might be a rookie mistake. Thanks!

You need to check err.msg for the string, err itself is never equal to either so you always reach the else:raise :
if err.msg == "Too Many Requests":
continue
if err.msg == "Not Found":
print str(user) + " isn't a valid username."
I would recommend using requests and with reddit the error code is actually returned in the json so you can use that:
import requests
def karma_reddit(user):
while True:
data = requests.get("https://www.reddit.com/user/" + str(user) + ".json").json()
if data.get("error") == 429:
print("Too many requests")
elif data.get("error") == 404:
print str(user) + " isn't a valid username."
return data
The fact you are raising all exceptions bar your 429 and 404's means you don't need a try. You should really break on any error and just output a message to the user and limit the amount of requests.

Related

Infinite spamming on messaging

In this loop, sometimes I get infinity spamming in telegram error with code request, that not equal 200 and that there is no records in log_file. Like looping on bot.send_message. And I don't get why.
But the code doesn't break, so I can't get any errors. Seems like "sleep" doesn't work sometimes, but how it can work randomly?
Most of the time, the code works properly
import time
import telebot
import requests
import datetime
import json
import os
from funcs import *
from varias import *
from pathlib import Path
if not os.path.exists('C:/pymon_logs'):
os.makedirs('C:/pymon_logs')
log_file = path("api_log", "a")
while 1:
try:
request = str(requests.get("https://***"))
if request == "<Response [200]>":
time.sleep(5)
elif request != "<Response [200]>":
bot.send_message(chat_id, "API " + str(request)[1:-1])
log_file.write(str(request)[1:-1] + " " + today + '\n')
log_file.close()
time.sleep(120)
except requests.ConnectionError as e:
bot.send_message(chat_id, "API Connection Error")
log_file.write("API Connection Error" + " " + today + '\n')
log_file.close()
time.sleep(120)
except requests.Timeout as e:
bot.send_message(chat_id, "API Timeout Error")
log_file.write("API Timeout Error" + " " + today + '\n')
log_file.close()
time.sleep(120)
except requests.RequestException as e:
bot.send_message(chat_id, "API huy znaet chto za oshibka")
log_file.write("API huy znaet chto za oshibka" + " " + today + '\n')
log_file.close()
time.sleep(120)
except:
pass
First of all, you shouldn't check response code by converting to string. It's not a good practice IMHO.
You should:
request = requests.get("https://***")
and check by:
if r.status_code == 200:
...
And this next line is unnecessary. Replace this code with else:
elif request != "<Response [200]>":
But the main problem with your code, is you shouldn't close the file in an infinite loop. With your code, on the very first exception you are closing the file. If you close the file handle, you cannot write, so you will not be able to see anything in the log file.
And, after your code gets another non-200 response code, you try to write on a closed file, then throw another exception. This way, you skip the sleep(120) part, and throw again and again and again...
TL,DR;
Remove all the
log_file.close()
parts.

urlerror and ssl.CertificateError

I have the following code:
from urllib.request import urlopen
from urllib.error import HTTPError, URLError
from bs4 import BeautifulSoup
# target = "https://www.rolcruise.co.uk/cruise-detail/1158731-hawaii-round-trip-honolulu-2020-05-23"
target = "https://www.rolcruise.co.uk"
try:
html = urlopen(target)
except HTTPError as e:
print("You got a HTTP Error. Something wrong with the path.")
print("Here is the error code: " + str(e.code))
print("Here is the error reason: " + e.reason)
print("Happy for the program to end here"
except URLError as e:
print("You got a URL Error. Something wrong with the URL.")
print("Here is the error reason: " + str(e.reason))
print("Happy for the program to end here")
else:
bs_obj = BeautifulSoup(html, features="lxml")
print(bs_obj)
If I deliberately make a mistake in typing certain parts of the url, the urlerror handling works fine, i.e. if I deliberately type "htps" instead of "https", or "ww" instead of "www", or "u" instead of "uk".
e.g.
target = "https://www.rolcruise.co.u"
However if there is a mistake in the typing of the hostname ("rolcruise") or in the "co" part of url, urlerror does not work and I get an error message that says ssl.CertificateError.
e.g.
target = "https://www.rolcruise.c.uk"
I do not understand why URLError doesn't cover all scenarios where there is a typo somewhere in a url?
Given that it is happening, what is the next move to handle the ssl.CertificateError?
Thanks for your help!
Get ssl into your namespace to start:
import ssl
Then you can catch that kind of exception:
try:
html = urlopen(target)
except HTTPError as e:
print("You got a HTTP Error. Something wrong with the path.")
print("Here is the error code: " + str(e.code))
print("Here is the error reason: " + e.reason)
print("Happy for the program to end here"
except URLError as e:
print("You got a URL Error. Something wrong with the URL.")
print("Here is the error reason: " + str(e.reason))
print("Happy for the program to end here")
except ssl.CertificateError:
# Do your stuff here...
else:
bs_obj = BeautifulSoup(html, features="lxml")
print(bs_obj)

Strange Intermittent posting issue using python requests

I'm using a raspberry pi 3 and python 2.7 with requests to post data to my lamp server. All works great except for intermittent posting errors which are trapped requests.exceptions.ConnectTimeout. BTW, timeout=0.5 sec which is 2.5x the posting time (0.2 sec). See code below.
When the request exception occurs, I check for internet access using CheckConnection(). BTW, this takes only 0.016 sec on pi;so fast compared to other techniques. When False, it doesn't retry posting and logs data locally.
However, I can connect remotely to the Pi using TeamViewer while this is happening! I am posting data to our server with other installations so it is not a cloud server down issue.
After several to many minutes, the issue resolves itself and posting resumes like nothing was wrong.
Any suggestions to how I can change my code is most welcomed either to determine the root cause or fix the issue. Thank you in advance.
******** CODE ************
def PostData(payload,retry_count=3):
url = 'http://xxx.xxx.xxx.xxx/api/data/push/'
try:
response = requests.post(url,params=payload,timeout=0.5)
if response.status_code == 200:
return response.text
response.raise_for_status()
except (requests.exceptions.RequestException, requests.exceptions.ConnectTimeout) as e:
print "Post Error..."
x = CheckConnection()
if x==False:
return "Internet for Posting: " + str(x)
if retry_count >0:
Reason = "Post Settings Retry: " + str(retry_count)
print Reason
#sleeptime = 0.05*2**(3-retry_count)
#time.sleep(sleeptime)
return PostData(payload, retry_count-1)
if retry_count==0:
Reason = "Error! Post settings retry failed. Retry=0. Internet: " + str(x)
return Reason
return None
except Exception as e:
x = CheckConnection()
Reason= "Error! Posting Exception: " + str(e) + "Internet: " + str(x)
print Reason
return None
def CheckConnection(host="8.8.8.8",port=53,timeout=0.5):
try:
socket.setdefaulttimeout(timeout)
socket.socket(socket.AF_INET,socket.SOCK_STREAM).connect((host,port))
return True
except Exception:
return False

How to check if a list of URLs exists

I'm trying to test if a simple list of urls exists, the code works when I'm just testing one url, but when I try add a array of urls, it's breaks.
Any idea what i'm doing wrong?
Single URL Code
import httplib
c = httplib.HTTPConnection('www.example.com')
c.request("HEAD", '')
if c.getresponse().status == 200:
print('web site exists')
Broken Array Code
import httplib
Urls = ['www.google.ie', 'www.msn.com', 'www.fakeniallweb.com', 'www.wikipedia.org', 'www.galwaydxc.com', 'www.foxnews.com', 'www.blizzard.com', 'www.youtube.com']
for x in Urls:
c = httplib.HTTPConnection(x)
c.request("HEAD", '')
if c.getresponse().status == 200:
print('web site exists')
else:
print('web site' + x + 'un-reachable')
#To prevent code from closing
input ()
The problem is not that you do it as an array, it is that one of your urls (www.fakeniallweb.com) has a different problem than your other urls.
I think because the DNS cannot be resolved, you cannot request the HEAD as you do. So you need an additional check other than just checking for response code 200.
Maybe you could do something like this:
try:
c.request("HEAD", '')
if c.getresponse().status == 200:
print('web site exists')
else:
print('website does not exist')
except gaierror as e:
print('Error resolving DNS')
Honestly I suspect you will find other cases where a website returns different status codes. For example a website might return something in the 3xx range for a redirect, or a 403 if you cannot access it. That does not mean the website does not exist.
Hope this helps you on your way!
#Dries De Rydt
Thanks for your help , it was a unresolved dns error causing it to crash out.
I ended up Lib/socket.py
solution
import socket
Urls = ['www.google.ie', 'www.msn.com', 'www.fakeniallweb.com', 'www.wikipedia.org', 'www.galwaydxc.com', 'www.foxnews.com', 'www.blizzard.com', 'www.youtube.com']
for x in Urls:
try:
url = socket.gethostbyname(x)
print x + ' was reachable '
except socket.gaierror, err:
print "cannot resolve hostname: ", x, err
#To prevent code from closing
input ()
Thanks for all the help.

urllib2 exception handling with couchdb

I usually have a hard time nailing down how to handle urllib2 exceptions. So I'm still learning. Here is a scenario that I'd like some advice on.
I have a local couch db database. I want to know if the database exists. ie "127.0.0.1:5984/database". If it does not exist, and I can reach "127.0.0.1:5984", I want to know so I can create the new database.
Here are several cases I'm thinking about:
1) I could get a timeout.
2) my url is wrong in the sense that I fail to reach the database entirely ie I typed 127.0.4.1:5984/database but couchdb is on 127.0.0.1:5984
3) the database path "database" does not exist on the couch database.
So here some code I wrote to handle it:
What I do is test the response. If everything is fine I set db_exists to True. The only time I set db_exists to False is if I get a 404. Everything else just exits the program.
request = urllib2.Request(address)
try:
response = urllib2.urlopen(request)
except urllib2.URLError, e:
if hasattr(e, 'reason'):
print 'Failed to reach database'
print 'Reason: ', e.reason
sys.exit()
elif hasattr(e, 'code'):
if e.code == 404:
db_exists = False
else:
print 'Failed to reach database'
print 'Reason: ' + str(e)
sys.exit()
else:
try:
#I am expecting a json response. So make sure of it.
json.loads(response.read())
except:
print 'Failed to reach database at "' + address + '"'
sys.exit()
else:
db_exists = True
I am following the exception handling scheme layed out in URLlib2 The Missing Manual.
So basically my questions are...
1) Is this a clean, robust way to handle this?
2) is it common practice to sprinkle sys.exit() throughout code.
-Update-
Using couchdb-python:
main(db_url):
database = couchdb.Database(url=db_url)
try:
database.info()
except couchdb.http.ResourceNotFound, err:
print '"' + db_url + '" ' + err.message[0] + ', ' + err.message[1]
return
except couchdb.http.Unauthorized, err:
print err.message[1]
return
except couchdb.http.ServerError, err:
print err.message
return
except socket.error, err:
print str(err)
return
if __name__ == '__main__':
# note that id did not show it here, but db_url comes from an arg.
main(db_url)
I would argue that you're attacking this problem at too low a level. Why not use couchdb-python?
To answer your questions, 1) no it is not an especially clean way to do this. I would at least factor the code in your except block out into a method that extracts error types suitable for your application out of the urrlib2.URLError. For 2), no it is bad practice to call sys.exit() nearly all the time. Raise an appropriate exception. By default this will bubble up and halt the interpreter, just like your sys.exit() but with a traceback. Or, since your Couch client is a library, the exceptions can be handled at the application's discretion. Library code should never exit the interpreter.

Categories