Passing exception stack traces from C++ to Python (via Cython) - python

Problem
I have a logging and infra system, written in Python, which I can't (and don't want to) modify, and which relies on the tracebacks of the exceptions.
I have C++ code wrapped with Cython. This C++ code can potentially raise exceptions (std::runtime_error). Cython gives the ability to translate exceptions (e.g. with except +). However, and that makes sense of course, the translation only carries over the exception type and message.
In my case, I would want to translate the stack trace in the C++ exception (which I can recover as a string) into a Python traceback. Obviously, the translation is not fully meaningful, but it allows me to "cheat" and continue (or replace) the stack trace in Python with something that represents the stack trace in C++.
Assuming I can parse the C++ exception's stack trace into meaningful values, I wonder what's the best way to achieve that. I have one idea listed below, and I will be happy to get your opinions on it or share alternatives.
I'll add that this whole thing feels very flaky, but I'm both curious about it and want to avoid doing stuff like putting the C++ stack trace in the Python's exception message.
My Idea
After looking around a bit, I found that CPython has a function called _PyTraceback_Add. Specifically, its signature is void _PyTraceback_Add(const char *funcname, const char *filename, int lineno), and it looks like it adds the values (funcname, filename and lineno) to the current trace. Link to implementation: https://github.com/python/cpython/blob/66f77caca39ba39ebe1e4a95dba6d19b20d51951/Python/traceback.c#L257
My thoughts were to use that right when I raise the Python exception in Cython. Then, I could use the exception string from C++ and insert the values into the existing trace.
However, this method seems to be private in CPython and I'm not sure it will be sound to rely on it in production code.
Are there any good alternatives to this solution?
Thanks!

Well, the function seems to use only public APIs, so you can just perform the same steps yourself instead.
With that said, it looks unlikely to change anytime soon.
Also, it seems somebody's already doing exactly what you're doing: see issue #24743.
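For experimenting with the idea before touching the C API, here is a pure-Python sketch of the same effect: it raises an exception from a synthetic frame that reports a chosen function name, file name and line number. This is an analogue, not what _PyTraceback_Add does internally; the helper names raise_with_fake_frame and last_frame, and the C++ location used in the usage note, are hypothetical. It assumes Python 3.8+ for code.replace().

```python
import traceback


def raise_with_fake_frame(exc, funcname, filename, lineno):
    """Raise exc from a synthetic frame reporting the given location.

    Hypothetical helper: the frame is fabricated by compiling a tiny
    snippet under the requested file name.
    """
    # Pad with newlines so the raise statement lands on the requested line.
    source = "\n" * (lineno - 1) + "raise __exc__"
    code = compile(source, filename, "exec")
    # Rename the code object so the frame shows funcname (Python 3.8+).
    code = code.replace(co_name=funcname)
    exec(code, {"__exc__": exc})


def last_frame(exc):
    """Return the innermost traceback entry of a caught exception."""
    return traceback.extract_tb(exc.__traceback__)[-1]
```

Calling raise_with_fake_frame(RuntimeError("boom"), "cpp_func", "engine.cpp", 42) inside a try block and passing the caught exception to last_frame() should show an entry for engine.cpp, line 42, in cpp_func. Calling the helper once per parsed C++ frame would need nesting, which this sketch does not attempt.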

Related

In what circumstances, if any, is it appropriate to raise a FileExistsError manually?

I'm writing a Python function which takes data from an online source and copies it into a local data dump. If there's a file present already on the intended path for said data dump, I'd like my program to stop abruptly, giving a short message to the user - I'm just building a tiny CLI-type program - explaining why what he or she was about to attempt could destroy valuable data.
Would it be appropriate to raise a FileExistsError in the above circumstances? If so, I imagine my code would look something like this:
import os

def make_data_dump():
    if os.path.exists("path/to/dump"):
        raise FileExistsError("Must not overwrite dump at path/to/dump.")
    data = get_data_from_url()
    write_to_path(data, "path/to/dump")
Apologies if this is a silly question, but I couldn't find any guidance on when to raise a FileExistsError manually, only on what to do if one's program raises such an exception unexpectedly - hence my asking if raising said exception manually is ever good practice.
The Python documentation explicitly states that this is allowed:
User code can raise built-in exceptions. This can be used to test an exception handler or to report an error condition “just like” the situation in which the interpreter raises the same exception; but beware that there is nothing to prevent user code from raising an inappropriate error.
However, your code example is wrong for a different reason. The problem with this code is that it uses the LBYL (Look Before You Leap) pattern, which might lead to race conditions. (In the time between checking whether the file exists and writing the file, another process could have created the file, which would then be overwritten.) A better pattern for this type of scenario is EAFP (Easier to Ask for Forgiveness than Permission).
See What is the EAFP principle in Python? for more information and examples.
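As a minimal sketch of the EAFP version of the data-dump case: opening the file in exclusive-creation mode ("x") makes open() itself raise FileExistsError if the path already exists, so there is no gap between the check and the write. The function name write_dump is hypothetical.

```python
import os
import tempfile


def write_dump(data, path):
    """Write data to path, refusing to overwrite an existing file."""
    # Mode "x" creates the file exclusively: open() raises FileExistsError
    # on its own if the path already exists.
    with open(path, "x") as f:
        f.write(data)


def demo():
    """Write once, then show that a second write to the same path is refused."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dump")
        write_dump("first", path)
        try:
            write_dump("second", path)
        except FileExistsError:
            return "refused"
        return "overwritten"
```

With this approach the exception comes from the standard library rather than from a manual raise, and the caller can catch it to print a short message.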
Having said that, I think that for most Python code, manually raising a FileExistsError isn't that common, since you would use the standard Python libraries that already throw this error when necessary. However, one valid reason I can think of is when you would write a wrapper for a low-level function (implemented in another language like C) and would want to translate the error to Python.
A hypothetical code example to demonstrate this:
from errno import EEXIST

def make_data_dump():
    data = get_data_from_url()
    # Assuming write_to_path() is a function in a C library, which returns an error code.
    error = write_to_path(data, "path/to/dump")
    if error == EEXIST:
        raise FileExistsError("Must not overwrite dump at path/to/dump.")
Note that some other built-in exceptions are much more commonly raised manually: for example StopIteration when implementing an iterator, or ValueError when your method gets an argument with the wrong value.
Some exceptions you will rarely use yourself, like a SyntaxError for example.
As per NobbyNobbs's comment above: if programmers raise standard exceptions in their own code, it becomes difficult to work out, during error handling, whether a given exception was raised at the application level or the system level. Therefore, it's a practice best avoided.

Is there a way to get the entire C stack trace of a Python code block from Python?

I was trying to figure out the exact control flow and function calls to better guide me while writing code against CPython for a fairly large and complicated codebase. It feels like this should be easily doable using pdb, but I can't seem to figure it out. Using pdb.set_trace() only reveals the Python files called during the execution. Is this the right way of going about it? Since a fair bit of codegen and dynamic dispatch of function calls is used, I can't precisely find the C++ method definitions of the functions called from the Python code.
It seems like this should be straightforward, but most SO threads don't focus precisely on this, just on the code flow sequence.
I was wondering if pdb.pm() would give me what I need, but it doesn't work unless an exception has occurred.

Why doesn't Python spot errors before execution?

Suppose I have the following code in Python:
a = "WelcomeToTheMachine"
if a == "DarkSideOfTheMoon":
    awersdfvsdvdcvd
print("done!")
Why doesn't this error? How does it even compile? In Java or C#, this would get spotted during compilation.
Python isn't a compiled language, that's why your code doesn't throw compilation errors.
Python is a bytecode-interpreted language. Technically the source code gets "compiled" to bytecode, but then the bytecode is either just-in-time (JIT) compiled, if you are using PyPy or Pyston, or otherwise interpreted one bytecode instruction at a time by the standard runtime.
The workflow is as follows :
Your Python Code -> Compiler -> .pyc file -> Interpreter -> Your Output
What does all this mean when using the standard Python runtime? Essentially, all the heavy work happens at runtime, unlike with C or C++, where the source code in its entirety is analyzed and translated to binary at compile time.
During "compiling", python pretty much only checks your syntax. Since awersdfvsdvdcvd is a valid identifier, no error is raised until that line actually gets executed. Just because you use a name which wasn't defined doesn't mean that it couldn't have been defined elsewhere... e.g.:
globals()['awersdfvsdvdcvd'] = 1
earlier in the file would be enough to suppress the NameError that would occur if the line with the misspelled name was executed.
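A small sketch of that behaviour, using the built-in compile() and exec() on the code from the question (the helper name lookup_error is hypothetical): compilation succeeds because only the syntax is checked, running it succeeds because the branch is never taken, and the NameError appears only once the bare name is actually evaluated.

```python
SOURCE = '''
a = "WelcomeToTheMachine"
if a == "DarkSideOfTheMoon":
    awersdfvsdvdcvd
'''

# Compilation only checks syntax, so the undefined name is accepted here.
code = compile(SOURCE, "<example>", "exec")

# The condition is false, so the name is never looked up and nothing fails.
exec(code, {})


def lookup_error():
    """Evaluate the bare name directly and report which exception it raises."""
    try:
        exec("awersdfvsdvdcvd", {})
    except NameError as exc:
        return type(exc).__name__
    return None
```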
Ok, so can't python just look for globals statements as well? The answer to that is again "no" -- From module "foo", I can add to the globals of module "bar" in similar ways. And python has no way of knowing what modules are or will be imported until it's actually running (I can dynamically import modules at runtime too).
Note that most of the reasons that I'm mentioning for why Python as a language can't give you a warning about these things involve people doing crazy messed up things. There are a number of tools which will warn you about these things (making the assumption that you aren't going to do stupid stuff like that). My favorite is pylint, but just about any python linter should be able to warn you about undefined variables. If you hook a linter up to your editor, most of the time you can catch these bugs before you ever actually run the code.
Because Python is an interpreted language. This means that if Python's interpreter doesn't reach that line, it won't produce any error.
There's nothing to spot: It's not an "error" as far as Python-the-language is concerned. You have a perfectly valid Python program. Python is a dynamic language, and the identifiers you're using get resolved at runtime.
An equivalent program written in C#, Java or C++ would be invalid, and thus the compilation would fail, since in all those languages the use of an undefined identifier is required to produce a diagnostic to the user (i.e. a compile-time error). In Python, it's simply not known whether that identifier is known or not at compile time. So the code is valid. Think of it this way: in Python, having the address of a construction site (a name) doesn't require the construction to have even started yet. What's important is that by the time you use the address (name) as if there was a building there, there better be a building or else an exception is raised :)
In Python, the following happens:
a = "WelcomeToTheMachine" looks up the enclosing context (here: the module context) for the attribute a, and sets the attribute 'a' to the given string object stored in a pool of constants. It also caches the attribute reference so the subsequent accesses to a will be quicker.
if a == "DarkSideOfTheMoon": finds the a in the cache, and executes a binary comparison operator on object a. This ends up in builtins.str.__eq__. The value returned from this operator is used to control the program flow.
awersdfvsdvdcvd is an expression whose value is the result of a lookup of the name 'awersdfvsdvdcvd'. This expression is evaluated. In your case, the name is not found in the enclosing contexts, and the lookup raises the NameError exception.
This exception propagates to the matching exception handler. Since the handler is outside of all the nested code blocks in the current module, the print function never gets a chance to be called. Python's built-in exception handler signals the error to the user. The interpreter (a misnomer!) instance has nothing more to do. Since the Python process doesn't try to do anything else after the interpreter instance is done, it terminates.
There's absolutely nothing that says that the program will cause a runtime error. For example, awersdfvsdvdcvd could be set in an enclosing scope before the module is executed, and then no runtime error would be raised. Python allows fine control over the lifetime of a module, and your code could inject the value for awersdfvsdvdcvd after the module has been compiled, but before it got executed. It takes just a few lines of fairly straightforward code to do that.
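As a rough illustration of injecting the value after compilation but before execution, here is one possible sketch using compile(), types.ModuleType and exec(). The module name and the helper run_with_injected_name are hypothetical, and real code might do this through import hooks instead.

```python
import types

# Source that uses a name it never defines.
MODULE_SOURCE = "result = awersdfvsdvdcvd + 1"


def run_with_injected_name(value):
    """Compile the module source, inject the missing name, then execute it."""
    # Compiling succeeds even though the name is not defined anywhere.
    code = compile(MODULE_SOURCE, "<injected>", "exec")
    module = types.ModuleType("injected_example")  # hypothetical module name
    # Inject the value after compilation, before the module body runs.
    module.__dict__["awersdfvsdvdcvd"] = value
    exec(code, module.__dict__)
    return module.result
```

Calling run_with_injected_name(41) runs the module body without a NameError, because the lookup happens at execution time, when the name already exists in the module namespace.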
This is, in fact, one of the many dynamic programming techniques that get used in Python programs. Their judicious use makes possible the kinds of functionality that C++ will not natively get in decades or ever, and that are very cumbersome in both C# and Java. Of course, Python has a performance cost - nothing is free.
If you like to get such problems highlighted at compilation time, there are tools you can easily integrate in an IDE that would spot this problem. E.g. PyCharm has a built-in static checker, and this error would be highlighted with the red squiggly line as expected.

Is exception handling always expensive?

I've been told time and again that exception handling for operations like determining type is bad form since exceptions are always computationally expensive. Nevertheless, I've seen posts (especially Python-related ones, such as the top reply of this one) that advise using exception handling for exactly that purpose.
I was wondering, then, if throwing and catching exceptions is to be avoided universally, because it is always computationally expensive, or whether some languages, such as Python, handle exceptions better and it is permissible to use exception handling more liberally.
You cannot give general advice such as "exceptions are expensive and therefore they should be avoided" for all programming languages.
As you suspected, in Python, exceptions are used more liberally than in other languages such as C++. Instead of raw performance, Python puts emphasis on code readability. There is an idiom, "It's easier to ask for forgiveness than for permission", meaning: it's easier to just attempt what you want to achieve and catch an exception than to check for compatibility first.
Forgiveness:
try:
    do_something_with(dict["key"])
except (KeyError, TypeError):
    # Oh well, there is no "key" in dict, or it has the wrong type
    pass
Permission:
if hasattr(dict, "__getitem__") and "key" in dict:
    do_something_with(dict["key"])
else:
    # Oh well
    pass
Actually, in Python, iteration with for loops is implemented with exceptions under the hood: the iterator raises a StopIteration exception when the end is reached. So even if you try to avoid exceptions, you will use them all the time anyway.
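To make the StopIteration point concrete, here is roughly what a for loop does, written out by hand with iter() and next(). The function name manual_for is hypothetical, and the real loop is implemented inside the interpreter rather than like this.

```python
def manual_for(iterable, body):
    """Call body(item) for each item, the way a for loop would."""
    iterator = iter(iterable)
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            # The iterator signals exhaustion with an exception.
            break
        body(item)
```

For example, manual_for([1, 2, 3], collected.append) fills a list named collected with the same items a plain for loop would visit.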
I think a lot of it comes down to specific use cases.
In the example you posted, the poster explicitly refers to the "duck-typing" aspect of Python. Essentially, you use the exceptions generated to determine whether a variable has a particular capability or set of capabilities instead of checking manually. Since Python allows a lot of dynamic operations, a class might provide "split" through __getattr__, which makes it hard to check with a standard if statement. So you try to use split, and if that fails, you go to plan B.
In a lot of Python applications, also, we tend not to worry a lot about some of the performance details that might matter in other applications, so any overhead from exceptions is "trivial."
While coding my module tco, I encountered this question. In version 1.0.1alpha, I implemented three versions of the same class. The module is intended for computational purposes; thus I think I can give some answer to your question.
Performing quick operations with the class that worked without exceptions was twice as fast as with the two classes that relied on exceptions. But such a test may be meaningless: once you do any interesting computation between the exceptions, the difference becomes very small. Nobody will seriously care about the time difference between an empty loop and an empty system raising and catching exceptions!
For that reason, I decided to remove the first system when releasing the 1.1 version of my module. Though a little slower, I found that the system relying on exceptions was more robust and I focused on it.
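If you want to measure this yourself rather than rely on anecdotes, a sketch along these lines with the standard timeit module compares a try/except lookup with an explicit check, for both a present and a missing key. The function names are hypothetical and the numbers depend heavily on the interpreter version and machine, so no expected timings are given.

```python
import timeit

DATA = {"key": 1}


def with_exception(d, key):
    """EAFP: attempt the lookup and handle the failure."""
    try:
        return d[key]
    except KeyError:
        return None


def with_check(d, key):
    """LBYL: test for the key before using it."""
    if key in d:
        return d[key]
    return None


def compare(number=100_000):
    """Return total seconds for each strategy, with and without a hit."""
    cases = {
        "try/except, key present": lambda: with_exception(DATA, "key"),
        "try/except, key missing": lambda: with_exception(DATA, "nope"),
        "check, key present": lambda: with_check(DATA, "key"),
        "check, key missing": lambda: with_check(DATA, "nope"),
    }
    return {name: timeit.timeit(fn, number=number) for name, fn in cases.items()}
```

Printing the result of compare() shows how the cost changes depending on how often the exception is actually raised, which is usually the deciding factor.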

How to handle "duck typing" in Python?

I usually want to keep my code as generic as possible. I'm currently writing a simple library and being able to use different types with my library feels extra important this time.
One way to go is to force people to subclass an "interface" class. To me, this feels more like Java than Python and using issubclass in each method doesn't sound very tempting either.
My preferred way is to use the object in good faith, but this will raise some AttributeErrors. I could wrap each dangerous call in a try/except block. This, too, seems kind of cumbersome:
def foo(obj):
    ...
    # it should be able to sleep
    try:
        obj.sleep()
    except AttributeError:
        pass  # handle error
    ...
    # it should be able to wag its tail
    try:
        obj.wag_tail()
    except AttributeError:
        pass  # handle this error as well
Should I just skip the error handling and expect people to only use objects with the required methods? If I do something stupid like [x**2 for x in 1234] I actually get a TypeError and not an AttributeError (ints are not iterable), so there must be some type checking going on somewhere -- what if I want to do the same?
This question will be kind of open ended, but what is the best way to handle the above problem in a clean way? Are there any established best practices? How is the iterable "type checking" above, for example, implemented?
Edit
While AttributeErrors are fine, the TypeErrors raised by native functions usually give more information about how to solve the errors. Take this for example:
>>> ['a', 1].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
I'd like my library to be as helpful as possible.
I'm not a Python pro, but I believe that unless you can try an alternative when the parameter doesn't implement a given method, you shouldn't prevent exceptions from being thrown. Let the caller handle these exceptions. Otherwise, you would be hiding problems from the developers.
As I have read in Clean Code, if you want to search for an item in a collection, don't test your parameters with issubclass (of a list) but prefer to call getattr(l, "__contains__"). This will give someone who is using your code a chance to pass a parameter that isn't a list but which has a __contains__ method defined, and things should work equally well.
So, I think that you should code in an abstract, generic way, imposing as few restrictions as you can. For that, you'll have to make the fewest assumptions possible. However, when you face something that you can't handle, raise an exception and let the programmer know what mistake he made!
If your code requires a particular interface, and the user passes an object without that interface, then nine times out of ten, it's inappropriate to catch the exception. Most of the time, an AttributeError is not only reasonable but expected when it comes to interface mismatches.
Occasionally, it may be appropriate to catch an AttributeError for one of two reasons. Either you want some aspect of the interface to be optional, or you want to throw a more specific exception, perhaps a package-specific exception subclass. Certainly you should never prevent an exception from being thrown if you haven't honestly handled the error and any aftermath.
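As a sketch of the second case, here is one way to turn the AttributeError from a missing method into a more specific, more helpful exception. InterfaceError and put_to_sleep are hypothetical names, not part of any library.

```python
class InterfaceError(TypeError):
    """Hypothetical package-specific error for objects missing a method."""


def put_to_sleep(obj):
    """Call obj.sleep(), reporting a clear error if obj has no such method."""
    try:
        # Only the attribute lookup is guarded, so an AttributeError raised
        # inside sleep() itself is not swallowed or mislabelled.
        sleep = obj.sleep
    except AttributeError:
        raise InterfaceError(
            f"{type(obj).__name__!r} object cannot sleep; "
            "expected an object with a sleep() method"
        ) from None
    return sleep()
```

Subclassing TypeError here is a design choice: callers that already catch TypeError for bad arguments keep working, while callers that care can catch the specific subclass.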
So it seems to me that the answer to this question must be problem- and domain- specific. It's fundamentally a question of whether using a Cow object instead of a Duck object ought to work. If so, and you handle any necessary interface fudging, then that's fine. On the other hand, there's no reason to explicitly check whether someone has passed you a Frog object, unless that will cause a disastrous failure (i.e. something much worse than a stack trace).
That said, it's always a good idea to document your interface -- that's what docstrings (among other things) are for. When you think about it, it's much more efficient to throw a general error for most cases and tell users the right way to do it in the docstring, than to try to foresee every possible error a user might make and create a custom error message.
A final caveat -- it's possible that you're thinking about UI here -- I think that's another story. It's good to check the input that an end user gives you to make sure it isn't malicious or horribly malformed, and provide useful feedback instead of a stack trace. But for libraries or things like that, you really have to trust the programmer using your code to use it intelligently and respectfully, and to understand the errors that Python generates.
If you just want the unimplemented methods to do nothing, you can try something like this, rather than the multi-line try/except construction:
getattr(obj, "sleep", lambda: None)()
However, this isn't necessarily obvious as a function call, so maybe:
hasattr(obj, "sleep") and obj.sleep()
or if you want to be a little more sure before calling something that it can in fact be called:
hasattr(obj, "sleep") and callable(obj.sleep) and obj.sleep()
This "look-before-you-leap" pattern is generally not the preferred way to do it in Python, but it is perfectly readable and fits on a single line.
Another option of course is to abstract the try/except into a separate function.
Good question, and quite open-ended. I believe typical Python style is not to check, either with isinstance or by catching individual exceptions. Certainly, using isinstance is quite bad style, as it defeats the whole point of duck typing (though using isinstance on primitives can be OK; in Python 2, be sure to check for both int and long for integer inputs, and check for basestring, the base class of str and unicode, for strings). If you do check, you should raise a TypeError.
Not checking is generally OK, as it typically raises either a TypeError or AttributeError anyway, which is what you want. (Though it can delay those errors making client code hard to debug).
The reason you see TypeErrors is that primitive code raises it, effectively because it does an isinstance. The for loop is hard-coded to raise a TypeError if something is not iterable.
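If you want the same kind of early, descriptive TypeError in your own library, one option is to call the built-in iter() yourself, which is the same protocol the for loop relies on. A minimal sketch, where squares is a hypothetical function:

```python
def squares(values):
    """Return the squares of the items in values, which must be iterable."""
    try:
        # iter() raises TypeError for objects that do not support iteration.
        iterator = iter(values)
    except TypeError:
        raise TypeError(
            f"squares() expected an iterable, got {type(values).__name__}"
        ) from None
    return [x ** 2 for x in iterator]
```

This checks for a capability rather than a concrete type, so lists, tuples, generators and user-defined iterables are all accepted.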
First of all, the code in your question is not ideal:
try:
    obj.wag_tail()
except AttributeError:
    ...
You don't know whether the AttributeError is from the lookup of wag_tail or whether it happened inside the function. What you are trying to do is:
try:
    f = getattr(obj, 'wag_tail')
except AttributeError:
    ...
else:
    f()
Edit: kindall rightly points out that if you are going to check this, you should also check that f is callable.
In general, this is not Pythonic. Just call and let the exception filter down, and the stack trace is informative enough to fix the problem. I think you should ask yourself whether your rethrown exceptions are useful enough to justify all of this error-checking code.
The case of sorting a list is a great example.
List sorting is very common,
passing unorderable types happens for a significant proportion of those, and
throwing AttributeError in that case is very confusing.
If those three criteria apply to your problem (especially the third), I agree with building a pretty exception rethrower.
You have to balance with the fact that throwing these pretty errors is going to make your code harder to read, which statistically means more bugs in your code. It's a question of balancing the pros and the cons.
If you ever need to check for behaviours (like __real__ and __contains__), don't forget to use the Python abstract base classes found in collections.abc, io, and numbers.
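For instance, the abstract base classes in collections.abc recognise any class that defines the relevant special method, without requiring it to inherit from anything. A short sketch, where Bag is a hypothetical class:

```python
from collections.abc import Container, Iterable


class Bag:
    """Hypothetical class that supports 'in' but nothing else."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items


def describe(obj):
    """Report which of the two behaviours obj supports."""
    return {
        "container": isinstance(obj, Container),
        "iterable": isinstance(obj, Iterable),
    }
```

Here describe(Bag([1, 2])) reports the object as a container but not as an iterable, because Bag defines __contains__ and no __iter__.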
