urllib request fails when page takes too long to respond - python

I have a simple function (in Python 3) that takes a url and attempts to resolve it: it prints an error code if there is one (e.g. 404), or resolves a shortened url to its full url. My urls are in one column of a csv file and the output is saved in the next column. The problem arises when the program encounters a url where the server takes too long to respond: the program just crashes. Is there a simple way to force urllib to return an error code if the server is taking too long? I looked into Timeout on a function call but that looks a little too complicated as I am just starting out. Any suggestions?
i.e. (COL A) shorturl (COL B) http://deals.ebay.com/500276625
import urllib.request

def urlparse(urlColumnElem):
    try:
        conn = urllib.request.urlopen(urlColumnElem)
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    else:
        redirect = conn.geturl()
        # check redirect
        if (redirect == urlColumnElem):
            # print("same: ")
            # print(redirect)
            return (redirect)
        else:
            # print("Not the same url ")
            return (redirect)
EDIT: if anyone gets the http.client.RemoteDisconnected error (like me), see this question/answer: http.client.RemoteDisconnected error while reading/parsing a list of URLs

Have a look at the docs:
urllib.request.urlopen(url, data=None[, timeout])
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used).
You can set a realistic timeout (in seconds) for your process:
conn = urllib.request.urlopen(urlColumnElem, timeout=realistic_timeout_in_seconds)
and in order for your code to stop crashing, move everything inside the try/except block:
import socket
import urllib.request

def urlparse(urlColumnElem):
    try:
        conn = urllib.request.urlopen(
            urlColumnElem,
            timeout=realistic_timeout_in_seconds
        )
        redirect = conn.geturl()
        # check redirect
        if (redirect == urlColumnElem):
            # print("same: ")
            # print(redirect)
            return (redirect)
        else:
            # print("Not the same url ")
            return (redirect)
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    except socket.timeout as e:
        return ('Connection timeout')
Now if a timeout occurs, you will catch the exception and the program will not crash.
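As an aside, if you would rather not pass timeout at every call site, the "global default timeout setting" the docs mention can be set once with socket.setdefaulttimeout (a minimal sketch; the 10 seconds is an arbitrary choice):
import socket

socket.setdefaulttimeout(10)  # urlopen calls that pass no explicit timeout now give up after 10s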
Good luck :)

First, there is a timeout parameter that can be used to control the time allowed for urlopen. Next, a timeout in urlopen should just throw an exception, more precisely a socket.timeout. If you do not want it to abort the program, you just have to catch it:
import socket

def urlparse(urlColumnElem, timeout=5):  # allow 5 seconds by default
    try:
        conn = urllib.request.urlopen(urlColumnElem, timeout=timeout)
    except urllib.error.HTTPError as e:
        return (e.code)
    except urllib.error.URLError as e:
        return ('URL_Error')
    except socket.timeout:
        return ('Timeout')
    else:
        ...
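To tie this back to the csv file the question describes, a driver loop might look like the sketch below. The file names are made up, and it assumes the elided else branch returns the resolved url (conn.geturl()) as in the earlier answer; str() covers both the integer error codes and the string results:
import csv

# column A holds the short url; the resolved url (or error code) goes in column B
with open('urls.csv', newline='') as infile, \
        open('urls_resolved.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        row.append(str(urlparse(row[0], timeout=5)))
        writer.writerow(row)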

Related

'continue not in loop' error while trying to add an error handling mechanism

I've been trying to add an error handling mechanism to my code. However, when it runs it says 'continue' outside of loop, but looking at the code it should be inside the try block. What's going wrong?
def download_media_item(self, entry):
    try:
        url, path = entry
        # Get the file extension example: ".jpg"
        ext = url[url.rfind('.'):]
        if not os.path.isfile(path + ext):
            r = requests.get(url, headers=headers, timeout=15)
            if r.status_code == 200:
                open(path + ext, 'wb').write(r.content)
                self.user_log.info('File {} downloaded from {}'.format(path, url))
                return True
            elif r.status_code == 443:
                print('------------the server reported a 443 error-----------')
                return False
        else:
            self.user_log.info('File {} already exists. URL: {}'.format(path, url))
            return False
    except requests.ConnectionError:
        print("Received ConnectionError. Retrying...")
        continue
    except requests.exceptions.ReadTimeout:
        print("Received ReadTimeout. Retrying...")
        continue
It seems that what you actually want to do is keep looping until no exception is raised.
In general, you can do this by having an infinite loop that you break from in the event of a successful completion.
Either:
while True:
    try:
        # do stuff
    except requests.ConnectionError:
        # handle error
        continue
    except requests.exceptions.ReadTimeout:
        # handle error
        continue
    break
Or:
while True:
    try:
        # do stuff
    except requests.ConnectionError:
        # handle error
    except requests.exceptions.ReadTimeout:
        # handle error
    else:
        break
However, in this case, the "do stuff" seems to always end by reaching a return statement, so the break is not required, and the following reduced version would suffice:
while True:
    try:
        # do stuff
        return some_value
    except requests.ConnectionError:
        # handle error
    except requests.exceptions.ReadTimeout:
        # handle error
(The single return shown here may refer to alternative control flows, all of which lead to a return, as in your case.)
continue is specifically for immediately moving to the next iteration of a for or while loop; it is not an all-purpose move-to-the-next-statement instruction.
In a try/except statement, anytime you reach the end of a try, except, else, or finally block, execution proceeds with the next complete statement, not the next portion of the try statement.
def download_media_item(self, entry):
    # 1: try statement
    try:
        ...
        # If you get here, execution goes to #2 below, not the
        # except block below
    except requests.ConnectionError:
        print("Received ConnectionError. Retrying...")
        # Execution goes to #2 below, not the except block below
    except requests.exceptions.ReadTimeout:
        print("Received ReadTimeout. Retrying...")
        # Execution goes to #2 below
    # 2: next statement
    ...
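Putting the pattern back into your function, a minimal sketch might look like this (the bounded max_attempts loop is my addition, to avoid retrying forever; headers and self.user_log come from your original code):
def download_media_item(self, entry, max_attempts=5):
    url, path = entry
    ext = url[url.rfind('.'):]  # file extension, e.g. ".jpg"
    if os.path.isfile(path + ext):
        self.user_log.info('File {} already exists. URL: {}'.format(path, url))
        return False
    for attempt in range(max_attempts):
        try:
            r = requests.get(url, headers=headers, timeout=15)
        except requests.ConnectionError:
            print("Received ConnectionError. Retrying...")
            continue
        except requests.exceptions.ReadTimeout:
            print("Received ReadTimeout. Retrying...")
            continue
        if r.status_code == 200:
            with open(path + ext, 'wb') as f:
                f.write(r.content)
            self.user_log.info('File {} downloaded from {}'.format(path, url))
            return True
        print('------------the server reported a {} error-----------'.format(r.status_code))
        return False
    return False  # gave up after max_attempts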

python threading, confirming responses before moving to next line

Recently I have been working to integrate google directory, calendar and classroom to work seamlessly with the existing services that we have.
I need to loop through 1500 objects and make requests in Google to check something. Responses from Google do take a while, hence I want to wait on each request to complete but at the same time run other checks.
def __get_students_of_course(self, course_id, index_in_course_list, page=None):
    print("getting students from gclass ", course_id, "page ", page)
    # self.__check_request_count(10)
    try:
        response = self.class_service.courses().students().list(courseId=course_id,
                                                                pageToken=page).execute()
        # the response must come back before proceeding to the next checks
        course_to_add_to = self.course_list_gsuite[index_in_course_list]
        current_students = course_to_add_to["students"]
        for student in response["students"]:
            current_students.append(student["profile"]["emailAddress"])
        self.course_list_gsuite[index_in_course_list] = course_to_add_to
        try:
            if "nextPageToken" in response:
                self.__get_students_of_course(
                    course_id, index_in_course_list, page=response["nextPageToken"])
            else:
                return
        except Exception as e:
            print(e)
            return
    except Exception as e:
        print(e)
And I run that function from another function
def __check_course_state(self, course):
    course_to_create = {...}
    try:
        g_course = next(
            (g_course for g_course in self.course_list_gsuite if g_course["name"] == course_to_create["name"]), None)
        if g_course != None:
            index_2 = None
            for index_1, class_name in enumerate(self.course_list_gsuite):
                if class_name["name"] == course_to_create["name"]:
                    index_2 = index_1
            self.__get_students_of_course(
                g_course["id"], index_2)  # need to wait here
            students_enrolled_in_g_class = self.course_list_gsuite[index_2]["students"]
            request = requests.post()  # need to wait here
            students_in_iras = request.json()
            students_to_add_in_g_class = []
            for student in students["data"]:
                try:
                    pass
                except Exception as e:
                    print(e)
                students_to_add_in_g_class.append(
                    student["studentId"])
            if len(students_to_add_in_g_class) != 0:
                pass
            else:
                pass
        else:
            pass
    except Exception as e:
        print(e)
I need to do these tasks for 1500 objects. They are not related to each other, so I want to move on to the next object in the loop while the program waits for the other results to come back and finish.
Here is how I tried this with threads:
def create_courses(self):
    # pool = []
    counter = 0
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(
            self.__check_course_state, self.courses[0:5])
The problem is that when I run it like this I get multiple SSL errors and other errors. As far as I understand, while the threads themselves are running, the requests never wait to finish before moving to the next line, hence I have nothing in the request object and it throws errors?
Any ideas on how to approach this?
The SSL error occurs here because I was reusing the HTTP instance from the Google API library: self.class_service was being used to send a request while still waiting on another request. The best way to handle this is to create an instance of the service for every request.
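A minimal sketch of that fix, assuming google-api-python-client and existing credentials (self.creds and the surrounding wiring are illustrative, not from the original post):
from googleapiclient.discovery import build

def __get_students_of_course(self, course_id, index_in_course_list, page=None):
    # build a fresh service per call so threads never share an HTTP instance
    class_service = build('classroom', 'v1', credentials=self.creds)
    response = class_service.courses().students().list(
        courseId=course_id, pageToken=page).execute()
    ...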

Correct way to try/except using Python requests module?

try:
r = requests.get(url, params={'s': thing})
except requests.ConnectionError, e:
print(e)
Is this correct? Is there a better way to structure this? Will this cover all my bases?
Have a look at the Requests exception docs. In short:
In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.
In the event of the rare invalid HTTP response, Requests will raise an HTTPError exception.
If a request times out, a Timeout exception is raised.
If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.
All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
To answer your question, what you show will not cover all of your bases. You'll only catch connection-related errors, not ones that time out.
What to do when you catch the exception is really up to the design of your script/program. Is it acceptable to exit? Can you go on and try again? If the error is catastrophic and you can't go on, then yes, you may abort your program by raising SystemExit (a nice way to both print an error and call sys.exit).
You can either catch the base-class exception, which will handle all cases:
try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:  # This is the correct syntax
    raise SystemExit(e)
Or you can catch them separately and do different things.
try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
except requests.exceptions.RequestException as e:
    # catastrophic error. bail.
    raise SystemExit(e)
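If you do want to retry on a timeout, a minimal sketch of such a loop (get_with_retries and max_attempts are my own illustrative names, not part of the requests API):
import requests

def get_with_retries(url, params=None, max_attempts=3, timeout=5):
    for attempt in range(max_attempts):
        try:
            return requests.get(url, params=params, timeout=timeout)
        except requests.exceptions.Timeout:
            continue  # transient: try again
        except requests.exceptions.RequestException as e:
            raise SystemExit(e)  # catastrophic: bail, as above
    raise SystemExit('no response after {} attempts'.format(max_attempts))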
As Christian pointed out:
If you want http errors (e.g. 401 Unauthorized) to raise exceptions, you can call Response.raise_for_status. That will raise an HTTPError, if the response was an http error.
An example:
try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    raise SystemExit(err)
Will print:
404 Client Error: Not Found for url: http://www.google.com/nothere
One additional suggestion: be explicit. It seems best to go from specific to general down the stack of errors, so the specific ones don't get masked by the general one.
url = 'http://www.google.com/blahblah'

try:
    r = requests.get(url, timeout=3)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print("Http Error:", errh)
except requests.exceptions.ConnectionError as errc:
    print("Error Connecting:", errc)
except requests.exceptions.Timeout as errt:
    print("Timeout Error:", errt)
except requests.exceptions.RequestException as err:
    print("OOps: Something Else", err)
Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah
vs
url = 'http://www.google.com/blahblah'

try:
    r = requests.get(url, timeout=3)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print("OOps: Something Else", err)
except requests.exceptions.HTTPError as errh:
    print("Http Error:", errh)
except requests.exceptions.ConnectionError as errc:
    print("Error Connecting:", errc)
except requests.exceptions.Timeout as errt:
    print("Timeout Error:", errt)
OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah
The exception object also contains the original response as e.response, which can be useful if you need to see the error body in the response from the server. For example:
try:
    r = requests.post('somerestapi.com/post-here', data={'birthday': '9/9/3999'})
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e.response.text)
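Be aware, though, that e.response is only guaranteed to be set on the HTTPError raised by raise_for_status; on the broader RequestException it can be None (e.g. when the connection failed outright), so a guard is prudent (a sketch):
try:
    r = requests.post('somerestapi.com/post-here', data={'birthday': '9/9/3999'})
    r.raise_for_status()
except requests.exceptions.RequestException as e:
    # e.response is None when no response was received at all
    if e.response is not None:
        print(e.response.status_code, e.response.text)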
Here's a generic way to do things which at least means that you don't have to surround each and every requests call with try ... except:
Basic version
import logging
import traceback

import requests

logger = logging.getLogger(__name__)

# see the docs: if you set no timeout the call never times out! A tuple means "max
# connect time" and "max read time"
DEFAULT_REQUESTS_TIMEOUT = (5, 15)  # for example

def log_exception(e, verb, url, kwargs):
    # the reason for making this a separate function will become apparent
    raw_tb = traceback.extract_stack()
    if 'data' in kwargs and len(kwargs['data']) > 500:  # anticipate giant data string
        kwargs['data'] = f'{kwargs["data"][:500]}...'
    msg = f'BaseException raised: {e.__class__.__module__}.{e.__class__.__qualname__}: {e}\n' \
        + f'verb {verb}, url {url}, kwargs {kwargs}\n\n' \
        + 'Stack trace:\n' + ''.join(traceback.format_list(raw_tb[:-2]))
    logger.error(msg)

def requests_call(verb, url, **kwargs):
    response = None
    exception = None
    try:
        if 'timeout' not in kwargs:
            kwargs['timeout'] = DEFAULT_REQUESTS_TIMEOUT
        response = requests.request(verb, url, **kwargs)
    except BaseException as e:
        log_exception(e, verb, url, kwargs)
        exception = e
    return (response, exception)
NB
Be aware of ConnectionError, which is a builtin that has nothing to do with the class requests.ConnectionError*. I assume the latter is more common in this context, but have no real idea...
When examining a non-None returned exception: requests.RequestException, the superclass of all the requests exceptions (including requests.ConnectionError), is not "requests.exceptions.RequestException" according to the docs. Maybe it has changed since the accepted answer.**
Obviously this assumes a logger has been configured. Calling logger.exception in the except block might seem a good idea, but that would only give the stack within this method! Instead, get the trace leading up to the call to this method, then log it (with details of the exception, and of the call which caused the problem).
*I looked at the source code: requests.ConnectionError subclasses the single class requests.RequestException, which subclasses the single class IOError (builtin)
**However at the bottom of this page you find "requests.exceptions.RequestException" at the time of writing (2022-02)... but it links to the above page: confusing.
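A quick sanity check on those class relationships (my addition; these asserts pass on current requests versions as far as I know):
import builtins
import requests

# the top-level names are aliases for the ones in requests.exceptions
assert requests.ConnectionError is requests.exceptions.ConnectionError
assert requests.RequestException is requests.exceptions.RequestException
# ...and requests' ConnectionError is unrelated to the builtin ConnectionError
assert requests.ConnectionError is not builtins.ConnectionError
# the inheritance chain described in the footnote
assert issubclass(requests.ConnectionError, requests.RequestException)
assert issubclass(requests.RequestException, IOError)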
Usage is very simple:
search_response, exception = utilities.requests_call('get',
    f'http://localhost:9200/my_index/_search?q={search_string}')
First you check the response: if it's None, something funny has happened and you will have an exception which has to be acted on in some way, depending on context (and on the exception). In GUI applications (PyQt5) I usually implement a "visual log" to give some output to the user (while also logging to the log file), but messages added there should be non-technical. So something like this might typically follow:
if search_response == None:
    # you might check here for (e.g.) a requests.Timeout, tailoring the message
    # accordingly, as the kind of error anyone might be expected to understand
    msg = f'No response searching on |{search_string}|. See log'
    MainWindow.the().visual_log(msg, log_level=logging.ERROR)
    return

response_json = search_response.json()
if search_response.status_code != 200:  # NB 201 ("created") may be acceptable sometimes...
    msg = f'Bad response searching on |{search_string}|. See log'
    MainWindow.the().visual_log(msg, log_level=logging.ERROR)
    # usually response_json will give full details about the problem
    log_msg = f'search on |{search_string}| bad response\n{json.dumps(response_json, indent=4)}'
    logger.error(log_msg)
    return

# now examine the keys and values in response_json: these may of course
# indicate an error of some kind even though the response returned OK (status 200)...
Given that the stack trace is logged automatically you often need no more than that...
Advanced version when json object returned
(... potentially sparing a great deal of boilerplate!)
To cross the Ts, when a json object is expected to be returned:
If, as above, an exception gives your non-technical user a message "No response", and a non-200 status "Bad response", I suggest that
a missing expected key in the response's JSON structure should give rise to a message "Anomalous response"
an out-of-range or strange value to a message "Unexpected response"
and the presence of a key such as "error" or "errors", with value True or whatever, to a message "Error response"
These may or may not prevent the code from continuing.
... and in fact to my mind it is worth making the process even more generic. These next functions, for me, typically cut down 20 lines of code using the above requests_call to about 3, and make most of your handling and your log messages standardised. More than a handful of requests calls in your project and the code gets a lot nicer and less bloated:
def log_response_error(response_type, call_name, deliverable, verb, url, **kwargs):
    # NB this function can also be used independently
    if response_type == 'No':  # exception was raised (and logged)
        if isinstance(deliverable, requests.Timeout):
            MainWindow.the().visual_log(f'Time out of {call_name} before response received!', logging.ERROR)
            return
    else:
        if isinstance(deliverable, BaseException):
            # NB if response.json() raises an exception we end up here
            log_exception(deliverable, verb, url, kwargs)
        else:
            # if we get here no exception has been raised, so no stack trace has yet been logged.
            # a response has been returned, but is either "Bad" or "Anomalous"
            response_json = deliverable.json()
            raw_tb = traceback.extract_stack()
            if 'data' in kwargs and len(kwargs['data']) > 500:  # anticipate giant data string
                kwargs['data'] = f'{kwargs["data"][:500]}...'
            added_message = ''
            if hasattr(deliverable, 'added_message'):
                added_message = deliverable.added_message + '\n'
                del deliverable.added_message
            call_and_response_details = f'{response_type} response\n{added_message}' \
                + f'verb {verb}, url {url}, kwargs {kwargs}\nresponse:\n{json.dumps(response_json, indent=4)}'
            logger.error(f'{call_and_response_details}\nStack trace: {"".join(traceback.format_list(raw_tb[:-1]))}')
    MainWindow.the().visual_log(f'{response_type} response {call_name}. See log.', logging.ERROR)
def check_keys(req_dict_structure, response_dict_structure, response):
    # so this function is about checking the keys in the returned json object...
    # NB both structures MUST be dicts
    if not isinstance(req_dict_structure, dict):
        response.added_message = f'req_dict_structure not dict: {type(req_dict_structure)}\n'
        return False
    if not isinstance(response_dict_structure, dict):
        response.added_message = f'response_dict_structure not dict: {type(response_dict_structure)}\n'
        return False
    for dict_key in req_dict_structure.keys():
        if dict_key not in response_dict_structure:
            response.added_message = f'key |{dict_key}| missing\n'
            return False
        req_value = req_dict_structure[dict_key]
        response_value = response_dict_structure[dict_key]
        if isinstance(req_value, dict):
            # if the response at this point is a list apply the req_value dict to each element:
            # failure in just one such element leads to "Anomalous response"...
            if isinstance(response_value, list):
                for resp_list_element in response_value:
                    if not check_keys(req_value, resp_list_element, response):
                        return False
            elif not check_keys(req_value, response_value, response):  # any other response value must be a dict (tested in next level of recursion)
                return False
        elif isinstance(req_value, list):
            if not isinstance(response_value, list):  # if the req_value is a list the response must be one
                response.added_message = f'key |{dict_key}| not list: {type(response_value)}\n'
                return False
            # it is OK for the value to be a list, but these must be strings (keys) or dicts
            for req_list_element, resp_list_element in zip(req_value, response_value):
                if isinstance(req_list_element, dict):
                    if not check_keys(req_list_element, resp_list_element, response):
                        return False
                if not isinstance(req_list_element, str):
                    response.added_message = f'req_list_element not string: {type(req_list_element)}\n'
                    return False
                if req_list_element not in response_value:
                    response.added_message = f'key |{req_list_element}| missing from response list\n'
                    return False
        # put None as a dummy value (otherwise something like {'my_key'} will be seen as a set, not a dict)
        elif req_value != None:
            response.added_message = f'required value of key |{dict_key}| must be None (dummy), dict or list: {type(req_value)}\n'
            return False
    return True
def process_json_requests_call(verb, url, **kwargs):
    # "call_name" is a mandatory kwarg
    if 'call_name' not in kwargs:
        raise Exception('kwarg "call_name" not supplied!')
    call_name = kwargs['call_name']
    del kwargs['call_name']
    required_keys = {}
    if 'required_keys' in kwargs:
        required_keys = kwargs['required_keys']
        del kwargs['required_keys']
    acceptable_statuses = [200]
    if 'acceptable_statuses' in kwargs:
        acceptable_statuses = kwargs['acceptable_statuses']
        del kwargs['acceptable_statuses']
    exception_handler = log_response_error
    if 'exception_handler' in kwargs:
        exception_handler = kwargs['exception_handler']
        del kwargs['exception_handler']
    response, exception = requests_call(verb, url, **kwargs)
    if response == None:
        exception_handler('No', call_name, exception, verb, url, **kwargs)
        return (False, exception)
    try:
        response_json = response.json()
    except BaseException as e:
        logger.error(f'response.status_code {response.status_code} but calling json() raised exception')
        # an exception raised at this point can't truthfully lead to a "No response" message... so say "bad"
        exception_handler('Bad', call_name, e, verb, url, **kwargs)
        return (False, response)
    status_ok = response.status_code in acceptable_statuses
    if not status_ok:
        response.added_message = f'status code was {response.status_code}'
        log_response_error('Bad', call_name, response, verb, url, **kwargs)
        return (False, response)
    check_result = check_keys(required_keys, response_json, response)
    if not check_result:
        log_response_error('Anomalous', call_name, response, verb, url, **kwargs)
    return (check_result, response)
Example call (NB with this version, the "deliverable" is either an exception or a response which delivers a json structure):
success, deliverable = utilities.process_json_requests_call('get',
    f'{ES_URL}{INDEX_NAME}/_doc/1',
    call_name=f'checking index {INDEX_NAME}',
    required_keys={'_source': {'status_text': None}})
if not success: return False
# here, we know the deliverable is a response, not an exception
# we also don't need to check for the keys being present:
# the generic code has checked that all expected keys are present
index_status = deliverable.json()['_source']['status_text']
if index_status != 'successfully completed':
    # ... i.e. an example of a 200 response, but an error nonetheless
    msg = f'Error response: ES index {INDEX_NAME} does not seem to have been built OK: cannot search'
    MainWindow.the().visual_log(msg)
    logger.error(f'index |{INDEX_NAME}|: deliverable.json() {json.dumps(deliverable.json(), indent=4)}')
    return False
So the "visual log" message seen by the user in the case of missing key "status_text", for example, would be "Anomalous response checking index XYZ. See log." (and the log would give a more detailed technical message, constructed automatically, including the stack trace but also details of the missing key in question).
NB
mandatory kwarg: call_name; optional kwargs: required_keys, acceptable_statuses, exception_handler.
the required_keys dict can be nested to any depth
finer-grained exception-handling can be accomplished by including a function exception_handler in kwargs (though don't forget that requests_call will have logged the call details, the exception type and __str__, and the stack trace).
in the above I also implement a check on key "data" in any kwargs which may be logged. This is because a bulk operation (e.g. to populate an index in the case of Elasticsearch) can consist of enormous strings. So curtail to the first 500 characters, for example.
PS Yes, I do know about the elasticsearch Python module (a "thin wrapper" around requests). All the above is for illustration purposes.

urllib2 exception handling with couchdb

I usually have a hard time nailing down how to handle urllib2 exceptions, so I'm still learning. Here is a scenario that I'd like some advice on.
I have a local couchdb database. I want to know if the database exists, i.e. "127.0.0.1:5984/database". If it does not exist, and I can reach "127.0.0.1:5984", I want to know so I can create the new database.
Here are several cases I'm thinking about:
1) I could get a timeout.
2) my url is wrong in the sense that I fail to reach the database entirely, i.e. I typed 127.0.4.1:5984/database but couchdb is on 127.0.0.1:5984
3) the database path "database" does not exist on the couch database.
What I do is test the response. If everything is fine I set db_exists to True. The only time I set db_exists to False is if I get a 404. Everything else just exits the program.
So here is some code I wrote to handle it:
request = urllib2.Request(address)
try:
    response = urllib2.urlopen(request)
except urllib2.URLError, e:
    if hasattr(e, 'reason'):
        print 'Failed to reach database'
        print 'Reason: ', e.reason
        sys.exit()
    elif hasattr(e, 'code'):
        if e.code == 404:
            db_exists = False
        else:
            print 'Failed to reach database'
            print 'Reason: ' + str(e)
            sys.exit()
else:
    try:
        # I am expecting a json response. So make sure of it.
        json.loads(response.read())
    except:
        print 'Failed to reach database at "' + address + '"'
        sys.exit()
    else:
        db_exists = True
I am following the exception handling scheme laid out in URLlib2 The Missing Manual.
So basically my questions are...
1) Is this a clean, robust way to handle this?
2) Is it common practice to sprinkle sys.exit() throughout code?
-Update-
Using couchdb-python:
def main(db_url):
    database = couchdb.Database(url=db_url)
    try:
        database.info()
    except couchdb.http.ResourceNotFound, err:
        print '"' + db_url + '" ' + err.message[0] + ', ' + err.message[1]
        return
    except couchdb.http.Unauthorized, err:
        print err.message[1]
        return
    except couchdb.http.ServerError, err:
        print err.message
        return
    except socket.error, err:
        print str(err)
        return

if __name__ == '__main__':
    # note that I did not show it here, but db_url comes from an arg.
    main(db_url)
I would argue that you're attacking this problem at too low a level. Why not use couchdb-python?
To answer your questions: 1) No, it is not an especially clean way to do this. I would at least factor the code in your except block out into a method that extracts error types suitable for your application out of the urllib2.URLError. For 2), no, it is bad practice to call sys.exit() nearly all the time. Raise an appropriate exception. By default this will bubble up and halt the interpreter, just like your sys.exit(), but with a traceback. Or, since your Couch client is a library, the exceptions can be handled at the application's discretion. Library code should never exit the interpreter.
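A minimal sketch of that factoring (the DatabaseUnreachable exception and the check_db_exists name are made up for illustration):
import json
import urllib2

class DatabaseUnreachable(Exception):
    pass

def check_db_exists(address):
    try:
        response = urllib2.urlopen(urllib2.Request(address))
    except urllib2.HTTPError, e:  # subclass of URLError, so catch it first
        if e.code == 404:
            return False
        raise DatabaseUnreachable('HTTP %d from %s' % (e.code, address))
    except urllib2.URLError, e:
        raise DatabaseUnreachable('failed to reach %s: %s' % (address, e.reason))
    try:
        json.loads(response.read())
    except ValueError:
        raise DatabaseUnreachable('non-JSON response from ' + address)
    return True
The caller (the application, not the library) then decides whether a DatabaseUnreachable warrants sys.exit(), a retry, or something else.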

Python urllib2 HTTPBasicAuthHandler

Here is the code:
import urllib2 as URL

def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except:
        return None
It works just fine. The only problem is that when a wrong username or password is used, it takes forever to open the url
feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
It doesn't throw any errors, it just keeps executing the urlopen statement forever.
How can I know if the username/password is incorrect?
I thought of a timeout for the function, but then that would turn every error, and even a slow internet connection, into an authentication error.
It should throw an error, more precisely an urllib2.HTTPError with the code field set to 401; you can see some adapted code below. I left your general try/except structure in place, but really, do not use general except statements; catch only what you expect could happen!
def get_unread_msgs(user, passwd):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom')
        return feed.read()
    except URL.HTTPError, e:
        if e.code == 401:
            print "authorization failed"
        else:
            raise e  # or do something else
    except:  # A general except clause is discouraged, I let it in because you had it already
        return None
I just tested it here, works perfectly
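Regarding the worry that a timeout would be mistaken for an authentication failure: urlopen also accepts a timeout argument, and an expired timeout raises socket.timeout, which can be caught separately from the 401 (a sketch; the 10-second default is an arbitrary choice of mine):
import socket
import urllib2 as URL

def get_unread_msgs(user, passwd, timeout_seconds=10):
    auth = URL.HTTPBasicAuthHandler()
    auth.add_password(
        realm='New mail feed',
        uri='https://mail.google.com',
        user='%s' % user,
        passwd=passwd
    )
    opener = URL.build_opener(auth)
    URL.install_opener(opener)
    try:
        feed = URL.urlopen('https://mail.google.com/mail/feed/atom',
                           timeout=timeout_seconds)
        return feed.read()
    except URL.HTTPError, e:
        if e.code == 401:
            print "authorization failed"  # genuinely wrong credentials
        else:
            raise
    except socket.timeout:
        print "timed out: slow server or network, NOT an authentication error"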
