Scrapy suppress handled errors - python

Relevant Code
def start_requests( self ):
requests = [ Request( url['url'], meta=url['meta'], callback=self.parse, errback=self.handle_error ) for url in self.start_urls if valid_url( url['url'] )]
return requests
def handle_error( self, err ):
# Errors being saved in DB
# So I don't want them displayed in the logs
I've got my own code for saving error codes in DB. I don't want them displayed in the log output. How can I suppress these errors?
Note that I don't want to suppress all errors - just the ones being handled here.

Try to use self.skipped.add, self.failed.add with isinstance condition in your handle_error method.
Here is an example
def on_error(self, failure):
if isinstance(failure.value, HttpError):
response = failure.value.response
if response.status in self.bypass_status_codes:
self.skipped.add(response.url[-3:])
return self.parse(response)
# it assumes there is a response attached to failure
self.failed.add(failure.value.response.url[-3:])
return failure

Answer by #Daniil Mashkin seems to be the most comprehensive solution.
For simple cases, you can add http error codes Spider.handle_httpstatus_list or HTTPERROR_ALLOWED_CODES in Settings.py.
This will send some erroneous answers to your callback function, thus skipping logging as well

Use a simple try-except in your function. As long as you handle the exception by yourself (adding rows to the db, just "pass", ...), twisted won't recognize the error.
e.g.
def handle_error( self, err ):
try:
#do something that raises an exception
#twisted won't log this as long as you handle it yourself
myvar = 14 / 0
except:
pass

Related

Nested function call in async way python

I have an api that returns response of pagination only 10 records at a time. I want to process 10 record (index=0 and limit=10) then next 10(index=10 and limit=10) and so on till it returns empty array.
I want to do it in async way.
I am using the following deps:
yarl==1.6.0
Mako==1.1.3
asyncio==3.4.3
aiohttp==3.6.2
The code is:
loop = asyncio.get_event_loop()
loop.run_until_complete(getData(id, token,0, 10))
logger.info("processed all data")
async def getData(id, token, index, limit):
try:
async with aiohttp.ClientSession() as session:
response = await fetch_data_from_api(session, id, token, index, limit)
if response == []:
logger.info('Fetched all data')
else:
# process data(response)
getData(session, id, limit, limit+10)
except Exception as ex:
raise Exception(ex)
async def fetch_data_from_api(
session, id, token, index, limit
):
try:
url = f"http://localhost:8080/{id}?index={index}&limit={limit}"
async with session.post(
url=url,
headers={"Authorization": token}
) as response:
response.raise_for_status()
response = await response.json()
return json.loads(json.dumps(response))
except Exception as ex:
raise Exception(
f"Exception {ex} occurred"
)
I issue is that it works fine for first time but when i am calling the method getData(session, id, limit, limit+10) again from async def getData(id, token, index, limit). It is not been called.
How can i resolve the issue?
There are a few issues I see in your code.
First, and this is what you talk about, is the getData method.
It is unclear to me a bit by looking at the code, what is that "second" getData. In the function definition your arguments are getData(id, token, index, limit), but when you call it from within the function you call it with getData(session, id, limit, limit+10) where the id is the second parameter. Is that intentional? This looks to me like there is another getData method, or it's a bug.
In case of the first option: (a) you probably need to show us that code as well, as it's important for us to be able to give you better responses, and (b), more importantly, it will not work. Python doesn't support overloading and the getData you are referencing from within the wrapping getData is the same wrapping method.
In case it's the second option: (a) you might have an issue with the function parameters, and (b) - you are missing an await before the getData (i.e. await getData). This is actually probably also relevant in case it's the "first option".
Other than that, your exception handling is redundant. You basically just re raise the exception, so I don't see any point in having the try-catch blocks. Even more, for some reason in the first method, you create an Exception from the base exception class (not to be confused with BaseException). Just don't have the try block.

Handling exception in calling a external api

I'm calling Udemy external api to build a simple REST service for experimental purpose.
https://www.udemy.com/developers/affiliate/
Here is my get_all() courses method.
class Courses(object):
"""
Handles all requests related to courses.
ie; gets the courses-list, courses-detail, coursesreviews-list
"""
def __init__(self, api):
self.api = api
logger.debug("courses initialized")
def get_all(self):
page = 1
per_page = 20
while True:
res = self._get_courses(page, per_page)
if not res['results']:
break
try:
for one in res['results']:
yield one
except Exception as e: -->>>handling exception
print(e)
break
page += 1
def _get_courses_detail(self, page, per_page):
resource = "courses"
params = {'page': page, 'per_page': per_page,
# 'fields[course]': '#all'
}
res = self.api.get(resource, params)
return res
Now, is it reasonable to handle a exception(in get_all() method) assuming that there could some error in the returning data of the api?
Or handling the exception(in get_all) is not needed here and it should be handled by the calling function?
Most of the open source projects that I see don't handle this exception.
I'm sharing the opinion in this answer. So catch the exception as soon as possible and rethrow it if needed to the next layer.
With practice and experience with your code base it becomes quite easy to judge when to add additional context to errors, and where it's most sensible to actually, finally handle the errors.
Catch → Rethrow
Do this where you can usefully add more information that would save a developer having to work through all the layers to understand the problem.
Catch → Handle
Do this where you can make final decisions on what is an appropriate, but different execution flow through the software.
Catch → Error Return

RESTful api design: handle exceptions through nested functions (python, flask)

I would like to improve my coding style with a more robust grasp of try, except and raise in designing API, and less verbose code.
I have nested functions, and when one catches an execption, I am passing the exception to the other one and so on.
But like this, I could propagate multiple checks of a same error.
I am referring to:
[Using try vs if in python
for considering cost of try operation.
How would you handle an error only once across nested functions ?
E.g.
I have a function f(key) doing some operations on key; result is
passed to other functions g(), h()
if result comply with
expected data structure, g() .. h() will manipulate and return
updated result
a decorator will return final result or return the
first error that was met, that is pointing out in which method it was raised (f(),g() or h()).
I am doing something like this:
def f(key):
try:
#do something
return {'data' : 'data_structure'}
except:
return {'error': 'there is an error'}
#application.route('/')
def api_f(key):
data = f(k)
try:
# do something on data
return jsonify(data)
except:
return jsonify({'error':'error in key'})
IMO try/except is the best way to go for this use case. Whenever you want to handle an exceptional case, put in a try/except. If you can’t (or don’t want to) handle the exception in some sane way, let it bubble up to be handled further up the stack. Of course there are various reasons to take different approaches (e.g. you don’t really care about an error and can return something else without disrupting normal operation; you expect “exceptional” cases to happen more often than not; etc.), but here try/except seems to make the most sense:
In your example, it’d be best to leave the try/except out of f() unless you want to…
raise a different error (be careful with this, as this will reset your stack trace):
try:
### Do some stuff
except:
raise CustomError('Bad things')
do some error handling (e.g. logging; cleanup; etc.):
try:
### Do some stuff
except:
logger.exception('Bad things')
cleanup()
### Re-raise the same error
raise
Otherwise, just let the error bubble up.
Subsequent functions (e.g. g(); h()) would operate the same way. In your case, you’d probably want to have some jsonify helper function that jsonifies when possible but also handles non-json data:
def handle_json(data):
try:
return json.dumps(data)
except TypeError, e:
logger.exception('Could not decode json from %s: %s', data, e)
# Could also re-raise the same error
raise CustomJSONError('Bad things')
Then, you would have handler(s) further up the stack to handle either the original error or the custom error, ending with a global handler that can handle any error. In my Flask application, I created custom error classes that my global handler is able to parse and do something with. Of course, the global handler is configured to handle unexpected errors as well.
For instance, I might have a base class for all http errors…
### Not to be raised directly; raise sub-class instances instead
class BaseHTTPError(Exception):
def __init__(self, message=None, payload=None):
Exception.__init__(self)
if message is not None:
self.message = message
else:
self.message = self.default_message
self.payload = payload
def to_dict(self):
"""
Call this in the the error handler to serialize the
error for the json-encoded http response body.
"""
payload = dict(self.payload or ())
payload['message'] = self.message
payload['code'] = self.code
return payload
…which is extended for various http errors:
class NotFoundError(BaseHTTPError):
code = 404
default_message = 'Resource not found'
class BadRequestError(BaseHTTPError):
code = 400
default_message = 'Bad Request'
class NotFoundError(BaseHTTPError):
code = 500
default_message = 'Internal Server Error'
### Whatever other http errors you want
And my global handler looks like this (I am using flask_restful, so this gets defined as a method on my extended flask_restful.Api class):
class RestAPI(flask_restful.Api):
def handle_error(self, e):
code = getattr(e, 'code', 500)
message = getattr(e, 'message', 'Internal Server Error')
to_dict = getattr(e, 'to_dict', None)
if code == 500:
logger.exception(e)
if to_dict:
data = to_dict()
else:
data = {'code': code, 'message': message}
return self.make_response(data, code)
With flask_restful, you may also just define your error classes and pass them as a dictionary to the flask_restful.Api constructor, but I prefer the flexibility of defining my own handler that can add payload data dynamically. flask_restful automatically passes any unhandled errors to handle_error. As such, this is the only place I’ve needed to convert the error to json data because that is what flask_restful needs in order to return an https status and payload to the client. Notice that even if the error type is unknown (e.g. to_dict not defined), I can return a sane http status and payload to the client without having had to convert errors lower down the stack.
Again, there are reasons to convert errors to some useful return value at other places in your app, but for the above, try/except works well.

Python class methods, when to return self?

I'm confused as to when to return self inside a class and when to return a value which may or may not possibly be used to check the method ran correctly.
def api_request(self, data):
#api web request code
return response.text
def connect(self):
#login to api, set some vars defined in __init__
return self
def send_message(self, message):
#send msg code
return self
So above theres a few examples. api_request I know having the text response is a must. But with send_message what should I return?
which is then converted to a dict to check a key exists, else raise error).
Should it return True, the response->dict, or self?
Thanks in advance
Since errors tend to be delivered as exceptions and hence success/fail return values are rarely useful, a lot of object-modifier functions wind up with no return value at all—or more precisely, return None, since you can't return nothing-at-all. (Consider some of Python's built-in objects, like list, where append and extend return None, and dict, where dict.update returns None.)
Still, returning self is convenient for chaining method calls, even if some Pythonistas don't like it. See kindall's answer in Should internal class methods returnvalues or just modify instance variables in python? for example.
Edit to add some examples based on comment:
What you "should" return—or raise an exception, in which case, "what exception"—depends on the problem. Do you want send_message() to wait for a response, validate that response, and verify that it was good? If so, do you want it to raise an error if there is no response, the validation fails, or the response was valid but says "message rejected"? If so, do you want different errors for each failure, etc? One reasonable (for some value of reasonable) method is to capture all failures with a "base" exception, and make each "type" of failure a derivative of that:
class ZorgError(Exception): # catch-all "can't talk via the Zorg-brand XML API"
pass
class ZorgRemoteDown(ZorgError): # connect or send failed, or no response/timeout
pass
class ZorgNuts(ZorgError): # remote response incomprehensible
pass
class ZorgDenied(ZorgError): # remote says "permission denied"
pass
# add more if needed
Now some of your functions might look something like this (note, none of this is tested):
def connect(self):
"""connect to server, log in"""
... # do some prep work
addr = self._addr
try:
self._sock.connect(addr)
except socket.error as err:
if err.errno == errno.ECONNREFUSED: # server is down
raise ZorgRemoteDown(addr) # translate that to our Zorg error
# add more special translation here if needed
raise # some other problem, propagate it
... # do other stuff now that we're connected, including trying to log in
response = self._get_response()
if response == 'login denied' # or whatever that looks like
raise ZorgDenied() # maybe say what exactly was denied, maybe not
# all went well, return None by not returning anything
def send_message(self, msg):
"""encode the message in the way the remote likes, send it, and wait for
a response from the remote."""
response = self._send_and_wait(self._encode(msg))
if response == 'ok':
return
if response == 'permission denied':
raise ZorgDenied()
# don't understand what we got back, so say the remote is crazy
raise ZorgNuts(response)
Then you need some "internal" functions like these:
def _send_and_wait(self, raw_xml):
"""send raw XML to server"""
try:
self._sock.sendall(raw_xml)
except socket.error as err:
if err.errno in (errno.EHOSTDOWN, errno.ENETDOWN) # add more if needed
raise ZorgRemoteDown(self._addr)
raise
return self._get_response()
def _get_response(self):
"""wait for a response, which is supposedly XML-encoded"""
... some code here ...
if we_got_a_timeout_while_waiting:
raise ZorgRemoteDown(self._addr)
try:
return some_xml_decoding_stuff(raw_xml)
except SomeXMLDecodeError:
raise ZorgNuts(raw_xml) # or something else suitable for debug
You might choose not to translate socket.errors at all, and not have all your own errors; perhaps you can squeeze your errors into ValueError and KeyError and so on, for instance.
These choices are what programming is all about!
Generally, objects in python are mutable. You therefore do not return self, as the modifications you make in a method are reflected in the object itself.
To use your example:
api = API() # initialise the API
if api.connect(): # perhaps return a bool, indicating that the connection succeeded
api.send_message() # you now know that this API instance is connected, and can send messages

Python: Getting the error message of an exception

In python 2.6.6, how can I capture the error message of an exception.
IE:
response_dict = {} # contains info to response under a django view.
try:
plan.save()
response_dict.update({'plan_id': plan.id})
except IntegrityError, e: #contains my own custom exception raising with custom messages.
response_dict.update({'error': e})
return HttpResponse(json.dumps(response_dict), mimetype="application/json")
This doesnt seem to work. I get:
IntegrityError('Conflicts are not allowed.',) is not JSON serializable
Pass it through str() first.
response_dict.update({'error': str(e)})
Also note that certain exception classes may have specific attributes that give the exact error.
Everything about str is correct, yet another answer: an Exception instance has message attribute, and you may want to use it (if your customized IntegrityError doesn't do something special):
except IntegrityError, e: #contains my own custom exception raising with custom messages.
response_dict.update({'error': e.message})
You should use unicode instead of string if you are going to translate your application.
BTW, Im case you're using json because of an Ajax request, I suggest you to send errors back with HttpResponseServerError rather than HttpResponse:
from django.http import HttpResponse, HttpResponseServerError
response_dict = {} # contains info to response under a django view.
try:
plan.save()
response_dict.update({'plan_id': plan.id})
except IntegrityError, e: #contains my own custom exception raising with custom messages.
return HttpResponseServerError(unicode(e))
return HttpResponse(json.dumps(response_dict), mimetype="application/json")
and then manage errors in your Ajax procedure.
If you wish I can post some sample code.
Suppose you raise error like this
raise someError("some error message")
and 'e' is catched error instance
str(e) returns:
[ErrorDetail(string='some error message', code='invalid')]
but if you want "some error message" only
e.detail
will gives you that (actually gives you a list of str which includes "some error message")
This works for me:
def getExceptionMessageFromResponse( oResponse ):
#
'''
exception message is burried in the response object,
here is my struggle to get it out
'''
#
l = oResponse.__dict__['context']
#
oLast = l[-1]
#
dLast = oLast.dicts[-1]
#
return dLast.get( 'exception' )

Categories