Because the Twisted getPage function doesn't give me access to headers, I had to write my own getPageWithHeaders function.
import traceback

from twisted.web.client import HTTPClientFactory, _makeGetterFactory

def getPageWithHeaders(url, contextFactory=None, *args, **kwargs):
    try:
        return _makeGetterFactory(url, HTTPClientFactory,
                                  contextFactory=contextFactory,
                                  *args, **kwargs)
    except:
        traceback.print_exc()
This is exactly the same as the normal getPage function, except that I added the try/except block and return the factory object itself instead of factory.deferred.
For some reason, I sometimes get a maximum recursion depth exceeded error here. It happens consistently a few times out of 700, usually on different sites each time. Can anyone shed any light on this? I'm not clear why or how this could be happening, and the Twisted codebase is large enough that I don't even know where to look.
EDIT: Here's the traceback I get, which seems bizarrely incomplete:
Traceback (most recent call last):
  File "C:\keep-alive\utility\background.py", line 70, in getPageWithHeaders
    factory = _makeGetterFactory(url, HTTPClientFactory, timeout=60, contextFactory=context, *args, **kwargs)
  File "c:\Python26\lib\site-packages\twisted\web\client.py", line 449, in _makeGetterFactory
    factory = factoryFactory(url, *args, **kwargs)
  File "c:\Python26\lib\site-packages\twisted\web\client.py", line 248, in __init__
    self.headers = InsensitiveDict(headers)
RuntimeError: maximum recursion depth exceeded
This is the entire traceback, which clearly isn't long enough to have exceeded our max recursion depth. Is there something else I need to do in order to get the full stack? I've never had this problem before; typically when I do something like
def f():
    return f()

try:
    f()
except:
    traceback.print_exc()
then I get the kind of "maximum recursion depth exceeded" stack that you'd expect, with a ton of references to f()
The specific traceback that you're looking at is a bit mystifying. You could try traceback.print_stack rather than traceback.print_exc to get a look at the entire stack above the problematic code, rather than just the stack going back to where the exception is caught.
Without seeing more of your traceback I can't be certain, but you may be running into the problem where Deferreds will raise a recursion limit exception if you chain too many of them together.
If you turn on Deferred debugging (from twisted.internet.defer import setDebugging; setDebugging(True)) you may get more useful tracebacks in some cases, but please be aware that this may also slow down your server quite a bit.
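For illustration, here's a minimal sketch (not your code; the chain length is arbitrary) of how firing an overly long chain of Deferreds could exhaust the stack in older Twisted versions:

from twisted.internet import defer

# build a long chain: each Deferred fires the next one when it fires
first = defer.Deferred()
d = first
for _ in range(5000):
    nxt = defer.Deferred()
    d.chainDeferred(nxt)  # equivalent to d.addCallbacks(nxt.callback, nxt.errback)
    d = nxt

# in older Twisted versions this unwound the chain recursively, adding
# stack frames per link, and could exceed the recursion limit
first.callback(None)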
You should look at the traceback you're getting together with the exception -- that will tell you what function(s) is/are recursing too deeply, "below" _makeGetterFactory. Most likely you'll find that your own getPageWithHeaders is involved in the recursion, exactly because instead of properly returning a deferred it tries to return a factory that's not ready yet. What happens if you do go back to returning the deferred?
The URL opener is likely following an unending series of 301 or 302 redirects.
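If that's the cause, one possible mitigation is to cap the redirect chain (this assumes your Twisted version's HTTPClientFactory accepts these keyword arguments; newer versions do):

# hedged sketch: followRedirect/redirectLimit are passed through to HTTPClientFactory
factory = _makeGetterFactory(url, HTTPClientFactory,
                             contextFactory=contextFactory,
                             followRedirect=True, redirectLimit=10)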
Related
How do you best handle multiple levels of methods in a call hierarchy that raise exceptions, so that if it is a fatal error the program will exit (after displaying an error dialog)?
I'm basically coming from Java. There I would simply declare any methods as throws Exception, re-throw it and catch it somewhere at the top level.
However, Python is different. My Python code basically looks like the below.
EDIT: added much simpler code...
Main entry function (plugin.py):
def main(catalog):
    print "Executing main(catalog)... "

    # instantiate generator
    gen = JpaAnnotatedClassGenerator(options)

    # run generator
    try:
        gen.generate_bar()  # doesn't bubble up
    except ValueError as error:
        Utilities.show_error("Error", error.message, "OK", "", "")
        return

    # ... usually do the real work here if no error
JpaAnnotatedClassGenerator class (engine.py):
class JpaAnnotatedClassGenerator:
    def generate_bar(self):
        self.generate_value_error()

    def generate_value_error(self):
        raise ValueError("generate_value_error() raised an error!")
I'd like the exception to be thrown back to the caller and to propagate up the call chain until it reaches the outermost try/except, which displays an error dialog with the exception's message.
QUESTION:
How is this best done in Python? Do I really have to repeat try-except for every method being called?
BTW: I am using Python 2.6.x and I cannot upgrade due to being bound to MySQL Workbench that provides the interpreter (Python 3 is on their upgrade list).
If you don't catch an exception, it bubbles up the call stack until someone does. If no one catches it, the runtime will get it and die with the exception error message and a full traceback. IOW, you don't have to explicitly catch and re-raise your exception everywhere - which would actually defeat the whole point of having exceptions. Actually, despite being primarily used for errors / unexpected conditions, exceptions are first and foremost a control-flow tool allowing you to break out of the normal execution flow and pass control (and some information) to any arbitrary place up in the call stack.
From this POV your code seems mostly correct (caveat: I didn't bother reading the whole thing, just had a quick look), except (no pun indented) for a couple of points:
First, you should define your own specific exception class(es) instead of using the builtin ValueError (you can inherit from it if it makes sense to you) so you're sure you only catch the exact exceptions you expect (quite a few layers "under" your own code could raise a ValueError that you didn't expect).
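A minimal sketch (the class name is just illustrative):

class GenerationError(ValueError):
    """Raised when the class generator fails."""
    # subclassing ValueError keeps existing `except ValueError` handlers working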
Then, you may (or may not, depending on how your code is used) also want to add a catch-all top-level handler in your main() function so you can properly log (using the logging module) all errors, and possibly free resources and do some cleanup before your process dies.
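A minimal sketch of such a handler (run_generator is a placeholder for whatever main actually does):

import logging
import sys

logger = logging.getLogger(__name__)

def main(catalog):
    try:
        run_generator(catalog)  # hypothetical: the real work goes here
    except Exception:
        logger.exception("unhandled error, aborting")
        # free resources / do cleanup here before the process dies
        sys.exit(1)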
As a side note, you may also want to learn and use proper string formatting, and - at least if performance is an issue - avoid duplicated constant calls like this:
elif AnnotationUtil.is_embeddable_table(table) and AnnotationUtil.is_secondary_table(table):
    # ...
elif AnnotationUtil.is_embeddable_table(table):
    # ...
elif AnnotationUtil.is_secondary_table(table):
    # ...
Given Python's very dynamic nature, neither the compiler nor runtime can safely optimize those repeated calls (the method could have been dynamically redefined between calls), so you have to do it yourself.
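A sketch of doing that by hoisting each call into a local name (the ... stand for the original branch bodies):

# call each predicate once and reuse the results
is_embeddable = AnnotationUtil.is_embeddable_table(table)
is_secondary = AnnotationUtil.is_secondary_table(table)

if is_embeddable and is_secondary:
    ...
elif is_embeddable:
    ...
elif is_secondary:
    ...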
EDIT:
When trying to catch the error in the main() function, exceptions DON'T bubble up, but when I use this pattern one level deeper, bubbling-up seems to work.
You can easily check that it works correctly with a simple MCVE:
def deeply_nested():
    raise ValueError("foo")

def nested():
    return deeply_nested()

def firstline():
    return nested()

def main():
    try:
        firstline()
    except ValueError as e:
        print("got {}".format(e))
    else:
        print("you will not see me")

if __name__ == "__main__":
    main()
It appears the software that supplies the Python env is somehow treating the main plugin file in a wrong way. Looks like I will have to check with the MySQL Workbench guys.
Uhu... Even embedded, the exception mechanism should still work as expected - at least for the part of the call stack that depends on your main function (can't tell what happens higher up the call stack). But given how MySQL treats errors (what about having your data silently truncated?), I wouldn't be especially surprised if they hacked the runtime to silently pass any error in plugin code xD
It is fine for errors to bubble up
Python's exceptions are unchecked, meaning you have no obligation to declare or handle them. Even if you know that something may raise, only catch the error if you intend to do something with it. It is fine to have exception-transparent layers, which gracefully abort as an exception bubbles through them:
def logged_get(map: dict, key: str):
    result = map[key]  # this may raise, but there is no state to corrupt
    # the following is not meaningful if an exception occurred
    # it is fine for it to be skipped by the exception bubbling up
    print(map, '[%s]' % key, '=>', result)
    return result
In this case, logged_get will simply forward any KeyError (and others) that are raised by the lookup.
If an outer caller knows how to handle the error, it can do so.
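For example (config and the fallback value here are made up for illustration):

try:
    port = logged_get(config, "port")
except KeyError:
    port = 8080  # this caller knows a sensible default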
So, just call self.create_collection_embeddable_class_stub the way you do.
It is fine for errors to kill the application
Even if nothing handles an error, the interpreter does. You get a stack trace, showing what went wrong and where. Fatal errors of the kind "only happens if there is a bug" can "safely" bubble up to show what went wrong.
In fact, exiting the interpreter and assertions use this mechanism as well.
>>> assert 2 < 1, "This should never happen"
Traceback (most recent call last):
File "<string>", line 1, in <module>
AssertionError: This should never happen
For many services, you can use this even in deployment - for example, systemd would log that for a Linux system service. Only try to suppress errors for the outside if security is a concern, or if users cannot handle the error.
It is fine to use precise errors
Since exceptions are unchecked, you can use arbitrarily many without overstraining your API. This allows you to use custom errors that signal different levels of problems:
class DBProblem(Exception):
    """Something is wrong about our DB..."""

class DBEntryInconsistent(DBProblem):
    """A single entry is broken"""

class DBInconsistent(DBProblem):
    """The entire DB is foobar!"""
It is generally a good idea not to re-use builtin errors, unless your use case actually matches their meaning. This allows you to handle errors precisely if needed:
try:
    gen.generate_classes(catalog)
except DBEntryInconsistent:
    logger.error("aborting due to corrupted entry")
    sys.exit(1)
except DBInconsistent as err:
    logger.error("aborting due to corrupted DB")
    Utility.inform_db_support(err)
    sys.exit(1)
# do not handle ValueError, KeyError, MemoryError, ...
# they will show up as a stack trace
I'm trying to make a parser using BeautifulSoup and multiprocessing. I have an error:
RecursionError: maximum recursion depth exceeded
My code is:
import bs4, requests, time
from multiprocessing.pool import Pool

html = requests.get('https://www.avito.ru/moskva/avtomobili/bmw/x6?sgtd=5&radius=0')
soup = bs4.BeautifulSoup(html.text, "html.parser")
divList = soup.find_all("div", {'class': 'item_table-header'})

def new_check():
    with Pool() as pool:
        pool.map(get_info, divList)

def get_info(each):
    pass

if __name__ == '__main__':
    new_check()
Why do I get this error, and how can I fix it?
UPDATE:
The full text of the error is:
Traceback (most recent call last):
  File "C:/Users/eugen/PycharmProjects/avito/main.py", line 73, in <module>
    new_check()
  File "C:/Users/eugen/PycharmProjects/avito/main.py", line 67, in new_check
    pool.map(get_info, divList)
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\pool.py", line 424, in _handle_tasks
    put(task)
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "C:\Users\eugen\AppData\Local\Programs\Python\Python36\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
RecursionError: maximum recursion depth exceeded
When you use multiprocessing, everything you pass to a worker has to be pickled.
Unfortunately, many BeautifulSoup trees can't be pickled.
There are a few different reasons for this. Some of them are bugs that have since been fixed, so you could try making sure you have the latest bs4 version, and some are specific to different parsers or tree builders… but there's a good chance nothing like this will help.
But the fundamental problem is that many elements in the tree contain references to the rest of the tree.
Occasionally, this leads to an actual infinite loop, because the circular references are too indirect for pickle's circular-reference detection. But that's usually a bug that gets fixed.
But, even more importantly, even when the loop isn't infinite, it can still drag in more than 1000 elements from all over the rest of the tree, and that's already enough to cause a RecursionError.
And I think the latter is what's happening here. If I take your code and try to pickle divList[0], it fails. (If I bump the recursion limit way up and count the frames, it needs a depth of 23080, which is way, way past the default of 1000.) But if I take that exact same div and parse it separately, it succeeds with no problem.
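You can reproduce that check with something like this (a sketch reusing divList from the question; exact behavior depends on your bs4 version):

import pickle

try:
    pickle.dumps(divList[0])  # element still attached to the full tree
except RecursionError:
    print("pickling the attached div exceeded the recursion limit")

# the same markup parsed on its own pickles without trouble
standalone = bs4.BeautifulSoup(str(divList[0]), "html.parser")
pickle.dumps(standalone)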
So, one possibility is to just do sys.setrecursionlimit(25000). That will solve the problem for this exact page, but a slightly different page might need even more than that. (Plus, it's usually not a great idea to set the recursion limit that high—not so much because of the wasted memory, but because it means actual infinite recursion takes 25x as long, and 25x as much wasted resources, to detect.)
Another trick is to write code that "prunes the tree", eliminating any upward links from the div before/as you pickle it. This is a great solution, except that it might be a lot of work, and requires diving into the internals of how BeautifulSoup works, which I doubt you want to do.
The easiest workaround is a bit clunky, but… you can convert the soup to a string, pass that to the child, and have the child re-parse it:
def new_check():
    divTexts = [str(div) for div in divList]
    with Pool() as pool:
        pool.map(get_info, divTexts)

def get_info(each):
    div = bs4.BeautifulSoup(each, 'html.parser')

if __name__ == '__main__':
    new_check()
The performance cost for doing this is probably not going to matter; the bigger worry is that if you had imperfect HTML, converting to a string and re-parsing it might not be a perfect round trip. So, I'd suggest that you do some tests without multiprocessing first to make sure this doesn't affect the results.
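A quick sanity check along those lines (reusing divList from the question's code):

# verify that str() -> re-parse is a faithful round trip for each div
for div in divList:
    reparsed = bs4.BeautifulSoup(str(div), "html.parser")
    if str(reparsed) != str(div):
        print("round trip changed this div:", div.get("class"))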
Currently trying to construct a program with multiple windows (main screen -> a Landlord / Customer section -> calendar/calculator etc.).
I am very much a beginner at this moment in time. I keep coming across two errors:
Exception in Tkinter callback
Traceback (most recent call last):
  File "C:\Users\HP\AppData\Local\Programs\Python\Python35-32\lib\idlelib\run.py", line 119, in main
    seq, request = rpc.request_queue.get(block=True, timeout=0.05)
  File "C:\Users\HP\AppData\Local\Programs\Python\Python35-32\lib\queue.py", line 172, in get
    raise Empty
queue.Empty
Also, another query: an error I receive a lot is how to define "self" ("self is not defined").
EDIT
My code is very much dysfunctional - I think looking at my code will probably give you a heart attack. When running the code I want there to be 1 screen at a time; currently 3 come up at the start. I'm assuming this is due to me using the wrong inheritance or something.
It was too big to place in here, so you can easily view the code here:
http://textuploader.com/525p5
To be honest, any help will really be appreciated. It's my first time doing something complex in Python, such as a working program with features like a calendar, calculator etc.
Cheers
Ross
File "C:\Users\HP\AppData\Local\Programs\Python\Python35-32\lib\idlelib\run.py",
line 119, in main seq, request = rpc.request_queue.get(block=True, timeout=0.05)
File "C:\Users\HP\AppData\Local\Programs\Python\Python35-32\lib\queue.py", line 172,
in get raise Empty queue.Empty
These two lines:

request = rpc.request_queue.get(block=True, timeout=0.05)
raise Empty

suggest you are trying to get data from an empty queue, and an exception is raised. The proper way to deal with this is to put this code in a try..except block, catch the exception, and deal with it accordingly.
Here is a decent tutorial on this matter.
try:
    # ...
    request = rpc.request_queue.get(block=True, timeout=0.05)
    # ...
except Exception as e:
    # this will catch all exceptions derived from the Exception class
    pass
About your second query, I strongly advise you to post a new question instead of bundling them up.
But I'll try and give some advice: self is used to address an instance of a class; it's the current object in use. You can't 'define' self - you use it in a class implementation to tell Python that your method operates on a specific instance rather than on the global scope.
class demo:
    def __init__(self):
        self.a = 5

    def foo(self):
        self.a = 6

def global_foo():
    print('global_foo')
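Using it, self is bound for you automatically when you call a method on an instance:

d = demo()
d.foo()        # Python passes d as self behind the scenes
print(d.a)     # -> 6
global_foo()   # a plain function; no self involved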
When using multiprocessing.Pool's apply_async(), what happens to breaks in code? This includes, I think, just exceptions, but there may be other things that make the worker functions fail.
import multiprocessing as mp

pool = mp.Pool(mp.cpu_count())
for f in files:
    pool.apply_async(workerfunct, args=(f,), callback=callbackfunct)
As I understand it right now, the process/worker fails (all other processes continue) and anything past a thrown error is not executed, EVEN if I catch the error with try/except.
As an example, usually I'd catch errors, put in a default value and/or print an error message, and the code continues. If my callback function involves writing to file, that's done with default values.
This answerer wrote a little about it:
I suspect the reason you're not seeing anything happen with your example code is because all of your worker function calls are failing. If a worker function fails, callback will never be executed. The failure won't be reported at all unless you try to fetch the result from the AsyncResult object returned by the call to apply_async. However, since you're not saving any of those objects, you'll never know the failures occurred. If I were you, I'd try using pool.apply while you're testing so that you see errors as soon as they occur.
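In other words, keeping the AsyncResult around makes failures visible; a minimal sketch reusing the names from the snippet above:

res = pool.apply_async(workerfunct, args=(f,))
res.get()  # blocks, and re-raises here any exception raised in the worker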
If you're using Python 3.2+, you can use the error_callback keyword argument to handle exceptions raised in workers.
pool.apply_async(workerfunct, args=(f,), callback=callbackfunct, error_callback=handle_error)
handle_error will be called with the exception object as an argument.
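Here's a self-contained sketch of that pattern (all names are illustrative):

import multiprocessing as mp

def worker(x):
    if x == 3:
        raise ValueError("bad value: %d" % x)
    return x * 2

def on_result(result):
    print("got", result)

def on_error(exc):
    # runs in the parent process, with the worker's exception instance
    print("worker failed:", exc)

if __name__ == '__main__':
    pool = mp.Pool(2)
    for i in range(5):
        pool.apply_async(worker, args=(i,), callback=on_result,
                         error_callback=on_error)
    pool.close()
    pool.join()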
If you're not, you have to wrap all your worker functions in a try/except to ensure your callback is executed. (I think you got the impression that this wouldn't work from my answer in that other question, but that's not the case. Sorry!):
def workerfunct(*args):
    try:
        ...  # Stuff
    except Exception as e:
        return e  # Do something here, maybe return e?

pool.apply_async(workerfunct, args=(f,), callback=callbackfunct)
You could also use a wrapper function if you can't/don't want to change the function you actually want to call:
def wrapper(func, *args):
    try:
        return func(*args)
    except Exception as e:
        return e

pool.apply_async(wrapper, args=(workerfunct, f), callback=callbackfunct)
I am new to twisted and having trouble with the following script.
When I run the following:
#!/usr/bin/env python
from twisted.internet import defer
from twisted.web.client import getPage, reactor

def success(results):
    print 'success'

def error(results):
    print 'error'
    return results

def finished(results):
    print 'finished'
    reactor.stop()

tasks = []
d = getPage('thiswontwork').addCallback(success).addErrback(error)
tasks.append(d)
dl = defer.DeferredList(tasks)
dl.addCallback(finished)
reactor.run()
I get the following output:
error
finished
Unhandled error in Deferred:
Unhandled Error
Traceback (most recent call last):
Failure: twisted.internet.error.ConnectionRefusedError: Connection was refused by other side: 61: Connection refused.
My question is why am I getting an unhandled error when I seem to have caught the error with my error callback?
The problem is that in your error def you return results, which, since the function was called as an errback, is a Failure object - and returning a Failure object is one of the two criteria for re-raising the error state. See the following blurb from krondo's twisted intro - part 9:
Now, in synchronous code we can “re-raise” an exception using the raise keyword without any arguments. Doing so raises the original exception we were handling and allows us to take some action on an error without completely handling it. It turns out we can do the same thing in an errback. A deferred will consider a callback/errback to have failed if:
The callback/errback raises any kind of exception, or
The callback/errback returns a Failure object.
Since an errback’s first argument is always a Failure, an errback can “re-raise” the exception by returning its first argument, after performing whatever action it wants to take.
Yup, just tried it. If you change:
def error(results):
    print 'error'
    return results

to

def error(results):
    print 'error'
    return
You won't re-raise the error state, so it won't percolate back up to the reactor and won't cause the traceback that's annoying you.
P.S. I can't recommend krondo's twisted introduction enough! It may be really long, but if you can get through it, you really will be able to produce code in twisted and these sorts of behaviors won't be a mystery.
P.P.S. I see you have a previous SO question (Python DeferredList callback reporting success when deferreds raise error) about Deferreds; that might be the reason you built the code this way. I think you may have a fundamental misunderstanding about the return value / callback value of the functions involved in deferreds (particularly errbacks). Check out part 9 of krondo's (though you might have to back up to part 7 or even earlier to track it); it really should help clear things up.