urllib2 exception handling with couchdb - python

I usually have a hard time nailing down how to handle urllib2 exceptions. So I'm still learning. Here is a scenario that I'd like some advice on.
I have a local CouchDB database. I want to know if the database exists, i.e. "127.0.0.1:5984/database". If it does not exist, and I can reach "127.0.0.1:5984", I want to know so I can create the new database.
Here are several cases I'm thinking about:
1) I could get a timeout.
2) My URL is wrong in the sense that I fail to reach the server entirely, i.e. I typed 127.0.4.1:5984/database but CouchDB is on 127.0.0.1:5984.
3) The database path "database" does not exist on the CouchDB server.
So here is some code I wrote to handle it:
What I do is test the response. If everything is fine I set db_exists to True. The only time I set db_exists to False is if I get a 404. Everything else just exits the program.
import json
import sys
import urllib2

request = urllib2.Request(address)
try:
    response = urllib2.urlopen(request)
except urllib2.URLError, e:
    if hasattr(e, 'reason'):
        print 'Failed to reach database'
        print 'Reason: ', e.reason
        sys.exit()
    elif hasattr(e, 'code'):
        if e.code == 404:
            db_exists = False
        else:
            print 'Failed to reach database'
            print 'Reason: ' + str(e)
            sys.exit()
else:
    try:
        # I am expecting a JSON response, so make sure of it.
        json.loads(response.read())
    except:
        print 'Failed to reach database at "' + address + '"'
        sys.exit()
    else:
        db_exists = True
I am following the exception handling scheme laid out in urllib2: The Missing Manual.
So basically my questions are...
1) Is this a clean, robust way to handle this?
2) Is it common practice to sprinkle sys.exit() throughout code?
-Update-
Using couchdb-python:
import couchdb
import socket

def main(db_url):
    database = couchdb.Database(url=db_url)
    try:
        database.info()
    except couchdb.http.ResourceNotFound, err:
        print '"' + db_url + '" ' + err.message[0] + ', ' + err.message[1]
        return
    except couchdb.http.Unauthorized, err:
        print err.message[1]
        return
    except couchdb.http.ServerError, err:
        print err.message
        return
    except socket.error, err:
        print str(err)
        return
if __name__ == '__main__':
    # note that I did not show it here, but db_url comes from an arg.
    main(db_url)
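For the original goal (creating the database when it is missing), couchdb-python can also do the check at the server level. A minimal sketch, assuming the default local server URL and a placeholder database name:
import couchdb

server = couchdb.Server('http://127.0.0.1:5984/')
db_name = 'database'
if db_name in server:            # Server supports membership tests
    db = server[db_name]
else:
    db = server.create(db_name)  # create it if it does not exist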

I would argue that you're attacking this problem at too low a level. Why not use couchdb-python?
To answer your questions: 1) No, it is not an especially clean way to do this. I would at least factor the code in your except block out into a method that extracts error types suitable for your application out of the urllib2.URLError. For 2), no, it is bad practice to call sys.exit() in nearly all cases. Raise an appropriate exception instead. By default this will bubble up and halt the interpreter, just like your sys.exit(), but with a traceback. Or, since your Couch client is a library, the exceptions can be handled at the application's discretion. Library code should never exit the interpreter.
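As a rough illustration of that refactoring, the URLError inspection can move into a helper that raises an application-level exception instead of exiting; a sketch only, where DatabaseUnreachable and check_database are invented names (note that HTTPError is a subclass of URLError, so catching it first also removes the need for the hasattr checks):
import urllib2

class DatabaseUnreachable(Exception):
    """Application-level error: CouchDB could not be reached."""

def check_database(address):
    # Returns True if the database exists, False on a 404; raises
    # DatabaseUnreachable for everything else, so the caller decides
    # whether to abort.
    try:
        urllib2.urlopen(urllib2.Request(address))
    except urllib2.HTTPError, e:
        if e.code == 404:
            return False
        raise DatabaseUnreachable('HTTP %d from %s' % (e.code, address))
    except urllib2.URLError, e:
        raise DatabaseUnreachable('cannot reach %s: %s' % (address, e.reason))
    return True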

Related

Simple way to re-run try block if a specific exception occurs?

I'm using Python 3.7 and Django and trying to figure out how to rerun a try block if a specific exception is thrown. I have
for article in all_articles:
    try:
        self.save_article_stats(article)
    except urllib2.HTTPError as err:
        if err.code == 503:
            print("Got 503 error when looking for stats on " + url)
        else:
            raise
What I would like is, if a 503 error occurs, for the section in the "try" block to be re-run, a maximum of three times. Is there a simple way to do this in Python?
You can turn this into a for loop and break if the try block was successful:
for article in all_articles:
    for __ in range(3):
        try:
            self.save_article_stats(article)
            break
        except urllib2.HTTPError as err:
            if err.code == 503:
                print("Got 503 error when looking for stats on " + url)
            else:
                raise
If the error code is not 503, the error will be re-raised and control flow will exit the for loops.
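If you also want to know when all three attempts failed, Python's for ... else fits naturally here: the else clause runs only when the loop finished without a break. A sketch along the same lines:
for article in all_articles:
    for __ in range(3):
        try:
            self.save_article_stats(article)
            break                 # success, stop retrying
        except urllib2.HTTPError as err:
            if err.code != 503:
                raise             # unexpected error, propagate
            print("Got 503 error when looking for stats on " + url)
    else:
        print("Giving up on " + url + " after 3 attempts")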

urllib request fails when page takes too long to respond

I have a simple function (in Python 3) that takes a URL and attempts to resolve it: printing an error code if there is one (e.g. 404), or resolving one of the shortened URLs to its full URL. My URLs are in one column of a CSV file and the output is saved in the next column. The problem arises when the program encounters a URL where the server takes too long to respond: the program just crashes. Is there a simple way to force urllib to print an error code if the server is taking too long? I looked into Timeout on a function call but that looks a little too complicated as I am just starting out. Any suggestions?
i.e. (COL A) shorturl (COL B) http://deals.ebay.com/500276625
import urllib.request
import urllib.error

def urlparse(urlColumnElem):
    try:
        conn = urllib.request.urlopen(urlColumnElem)
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    else:
        redirect = conn.geturl()
        # check redirect
        if (redirect == urlColumnElem):
            #print ("same: ")
            #print(redirect)
            return (redirect)
        else:
            #print("Not the same url ")
            return (redirect)
EDIT: If anyone gets the http.client.RemoteDisconnected error (like me), see this question/answer: http.client.RemoteDisconnected error while reading/parsing a list of URL's
Have a look at the docs:
urllib.request.urlopen(url, data=None[, timeout])
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used).
You can set a realistic timeout (in seconds) for your process:
conn = urllib.request.urlopen(urlColumnElem, timeout=realistic_timeout_in_seconds)
and in order for your code to stop crashing, move everything inside the try/except block:
import socket
import urllib.request
import urllib.error

def urlparse(urlColumnElem):
    try:
        conn = urllib.request.urlopen(
            urlColumnElem,
            timeout=realistic_timeout_in_seconds
        )
        redirect = conn.geturl()
        # check redirect
        if (redirect == urlColumnElem):
            #print ("same: ")
            #print(redirect)
            return (redirect)
        else:
            #print("Not the same url ")
            return (redirect)
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    except socket.timeout as e:
        return ('Connection timeout')
Now if a timeout occurs, you will catch the exception and the program will not crash.
Good luck :)
First, there is a timeout parameter that can be used to control the time allowed for urlopen. Next, a timeout in urlopen should just throw an exception, more precisely a socket.timeout. If you do not want it to abort the program, you just have to catch it:
import socket
import urllib.request
import urllib.error

def urlparse(urlColumnElem, timeout=5):  # allow 5 seconds by default
    try:
        conn = urllib.request.urlopen(urlColumnElem, timeout=timeout)
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    except socket.timeout:
        return ('Timeout')
    else:
        ...
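Since the URLs live in a CSV column, a small driver around urlparse could look like the following; a sketch only, where urls.csv and resolved.csv are placeholder file names:
import csv

def resolve_all(in_path='urls.csv', out_path='resolved.csv'):
    with open(in_path, newline='') as src, open(out_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # column A holds the short URL; column B gets the result
            writer.writerow([row[0], urlparse(row[0])])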

Error handling in python with ldap

I have this bit of code below; it is part of a Python script I've been working on (piecing it together a block at a time as a learning curve). This bit binds to an LDAP directory so the rest of the script can do the queries.
When successful, it will print the message shown in the block below. When not successful, it will throw an error, or at least I want to control the error.
If I'm not domain bound/on the VPN, it will throw this message:
{'desc': "Can't contact LDAP server"}
If the credentials are incorrect:
Invalid credentials
Nowhere in my script is that error message defined. How can I find where it's fetching what to print for that message, and possibly create or customize it?
(For what it's worth, I am using PyCharm.)
import ldap

try:
    l = ldap.open(server)
    l.simple_bind_s(user, pwd)
    # if connection is successful print:
    print "successfully bound to %s.\n" % server
    l.protocol_version = ldap.VERSION3
except ldap.LDAPError, e:
    print e
Thanks.
You can do something like this to provide a specific message for a specific exception:
try:
    foo = 'hi'
    bar = 'hello'
    # do stuff
except ValueError:
    raise ValueError("invalid credentials: {},{}".format(foo, bar))
So in your example it could become:
except ldap.LDAPError:
    raise ldap.LDAPError("invalid credentials: {},{}".format(user, pwd))
Or if you literally just want to print it:
except ldap.LDAPError:
    print("invalid credentials: {},{}".format(user, pwd))

python not catching HTTPError

My code is the following:
import json
import urllib2
from urllib2 import HTTPError

def karma_reddit(user):
    while True:
        try:
            url = "https://www.reddit.com/user/" + str(user) + ".json"
            data = json.load(urllib2.urlopen(url))
        except urllib2.HTTPError as err:
            if err == "Too Many Requests":
                continue
            if err == "Not Found":
                print str(user) + " isn't a valid username."
            else:
                raise
        break
I'm trying to get the data from the reddit user profile. However, HTTPErrors keep occurring. When trying to catch them using the except statement, they keep coming up without the program executing either another iteration of the loop or the print statement. How do I manage to catch the HTTPErrors? I'm pretty new to Python so this might be a rookie mistake. Thanks!
You need to check err.msg for the string; err itself is never equal to either, so you always reach the else: raise:
if err.msg == "Too Many Requests":
    continue
if err.msg == "Not Found":
    print str(user) + " isn't a valid username."
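Matching the numeric err.code is arguably sturdier than comparing reason phrases; 429 and 404 are the status codes behind those two strings. The whole function sketched that way:
import json
import urllib2

def karma_reddit(user):
    while True:
        try:
            url = "https://www.reddit.com/user/" + str(user) + ".json"
            data = json.load(urllib2.urlopen(url))
        except urllib2.HTTPError as err:
            if err.code == 429:    # Too Many Requests
                continue
            if err.code == 404:    # Not Found
                print str(user) + " isn't a valid username."
            else:
                raise
        break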
I would recommend using requests; with reddit the error code is actually returned in the JSON, so you can use that:
import requests

def karma_reddit(user):
    while True:
        data = requests.get("https://www.reddit.com/user/" + str(user) + ".json").json()
        if data.get("error") == 429:
            print("Too many requests")
        elif data.get("error") == 404:
            print(str(user) + " isn't a valid username.")
        return data
The fact that you are re-raising all exceptions bar your 429s and 404s means you don't need a try. You should really break on any error, just output a message to the user, and limit the number of requests.
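If you do want the retry behaviour from the question, bounding the loop and sleeping between attempts keeps it from hammering the API; a sketch using requests, where the 3-attempt cap and 2-second delay are arbitrary choices:
import time
import requests

def karma_reddit(user, attempts=3, delay=2):
    for _ in range(attempts):
        data = requests.get("https://www.reddit.com/user/" + str(user) + ".json").json()
        if data.get("error") == 429:
            time.sleep(delay)    # back off, then retry
            continue
        if data.get("error") == 404:
            print(str(user) + " isn't a valid username.")
        return data
    print("Giving up after " + str(attempts) + " attempts.")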

check ftplib response code

I have a Python application that's accessing an FTP server. There are several error cases I'd like to catch in a fashion similar to urllib2:
import urllib2

try:
    urllib2.urlopen("http://google.com")
except urllib2.HTTPError, e:
    if e.code == 304:
        pass  # do 304 stuff
    if e.code == 404:
        pass  # do 404 stuff
    else:
        pass
Does a construct like this exist for ftplib.error_perm? I know that it could return a code of 500-599 according to the docs, but I don't see anything in the docs about how to access that value. Did I miss something?
You can access the error response string using <exception_obj>.args[0]. It contains strings like '550 /no-such-dir: No such file or directory'.
To get the error code (only the leading three characters), use <exception_obj>.args[0][:3].
For example:
import ftplib

ftp = ftplib.FTP('ftp.hq.nasa.gov')
ftp.login('anonymous', 'user@example.com')
try:
    ftp.cwd('/no-such-dir')
except ftplib.error_perm as e:
    print('Error {}'.format(e.args[0][:3]))
finally:
    ftp.quit()
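To branch on specific codes the way the urllib2 example does, convert that three-character prefix to an int; a short sketch, where 550 is just one example status:
import ftplib

def cwd_checked(ftp, path):
    try:
        ftp.cwd(path)
    except ftplib.error_perm as e:
        code = int(e.args[0][:3])  # leading three characters are the status
        if code == 550:
            pass  # do "file unavailable" stuff
        else:
            raise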
