Python 3: Determine if object supports IO

Some Python methods work on various input sources. For example, the XML element tree parse method takes an object that can either be a string (in which case the API treats it like a filename) or an object that supports the IO interface, like a file object or io.StringIO.
So, obviously the parse method is doing some kind of interface sniffing to figure out which course of action to take. I guess the simplest way to achieve this would be to check whether the input parameter is a string with isinstance(x, str), and if so treat it as a file name, else treat it as an IO object.
But for better error-checking, I would think it would be best to check if x supports the IO interface. What is the standard, idiomatic way to check if an object supports a specified interface?
One way, I suppose, would be to just say:
if "read" in x.__class__.__dict__: # check if object has a read method
But just because x has a "read" method doesn't necessarily mean it supports the IO interface, so I assume I should also check for every method in the IO interface. Is this usually the best way to go about doing this? Or should I just forget about checking the interface, and just let a possible AttributeError get handled further up the stack?

Python strongly encourages duck typing: just assume the object that was passed in is valid and try to use it. This way, your code is as flexible as possible. Of course, if the actions of your code depend on the type of the object that is passed in, you do need some kind of type checking. I suggest keeping this type checking to a minimum, though, and going with isinstance(x, str).
If you pass in an object that neither is a string nor supports an IO interface, this will result in an AttributeError. If this happens, this is a bug in the calling code. This exception shouldn't be handled anywhere -- instead the bug should be fixed!
That said, you could use
isinstance(x, io.IOBase)
to test for the built-in classes supporting the I/O protocol. This would restrict your code to classes that actually derive from io.IOBase though -- a superficial and unnecessary restriction.
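For illustration, here is a minimal sketch of how such a dispatching function might look; the function name parse_source is invented for this example, and the file handling is deliberately simplistic:

def parse_source(source):
    # Treat a str as a filename, anything else as a file-like object.
    if isinstance(source, str):
        with open(source) as f:
            return f.read()
    # Duck typing: if source lacks a read() method, the AttributeError
    # that escapes here points the caller at the real bug.
    return source.read()

Used as parse_source("data.xml") or parse_source(io.StringIO("<root/>")).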

Or should I just forget about checking the interface, and just let a possible AttributeError get handled further up the stack?
The general Pythonic principle seems to be doing whatever you want to do with the object you get and just catching any exception it might cause. This is so-called duck typing. It does not necessarily mean you should let those exceptions slip from your function to the calling code, though. You can handle them in the function itself if it's capable of doing so in a meaningful way.

Yeah, python is all about duck typing, and it's perfectly acceptable to check for a few methods to decide whether an object supports the IO interface. Sometimes it even makes sense to just try calling your methods in a try/except block and catch TypeError or ValueError so you know if it really supports the same interface (but use this sparingly). I'd say use hasattr instead of looking at __class__.__dict__, but otherwise that's the approach I would take.
(In general, I'd check first whether there isn't already a method somewhere in the standard library to handle stuff like this, since it can be error-prone to decide what constitutes the "IO interface" yourself. For example, there are a few handy gems in the types and inspect modules for related interface checking.)
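As an illustration of that hasattr-based approach, here is a hedged sketch; which method names constitute "the IO interface" for your purposes is an assumption you have to make yourself:

def looks_file_like(obj):
    # Heuristic check for a few methods that file-like objects commonly provide.
    return all(hasattr(obj, name) for name in ("read", "readline", "close"))

Passing this check doesn't guarantee the methods behave sensibly; it only catches the most obvious misuse before you commit to using the object.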

Related

Is pickle secure against malicious input after restricting to safe types? [duplicate]

The pickle module documentation says right at the beginning:
Warning:
The pickle module is not intended to be secure against erroneous or
maliciously constructed data. Never unpickle data received from an
untrusted or unauthenticated source.
However, further down under restricting globals it seems to describe a way to make unpickling data safe using a whitelist of allowed objects.
Does this mean that I can safely unpickle untrusted data if I use a RestrictedUnpickler that allows only some "elementary" types, or are there additional security issues that are not addressed by this method? If there are, is there another way to make unpickling safe (obviously at the cost of not being able to unpickle every stream)?
With "elementary types" I mean precisely the following:
bool
str, bytes, bytearray
int, float, complex
tuple, list, dict, set and frozenset
In this answer we're going to explore what exactly the pickle protocol allows an attacker to do. This means we're only going to rely on documented features of the protocol, not implementation details (with a few exceptions). In other words, we'll assume that the source code of the pickle module is correct and bug-free and allows us to do exactly what the documentation says and nothing more.
What does the pickle protocol allow an attacker to do?
Pickle allows classes to customize how their instances are pickled. During the unpickling process, we can:
Call (almost) any class's __setstate__ method (as long as we manage to unpickle an instance of that class).
Invoke arbitrary callables with arbitrary arguments, thanks to the __reduce__ method (as long as we can gain access to the callable somehow).
Invoke (almost) any unpickled object's append, extend and __setitem__ methods, once again thanks to __reduce__.
Access any attribute that Unpickler.find_class allows us to.
Freely create instances of the following types: str, bytes, list, tuple, dict, int, float, bool. This is not documented, but these types are built into the protocol itself and don't go through Unpickler.find_class.
The most useful (from an attacker's perspective) feature here is the ability to invoke callables. If they can access exec or eval, they can make us execute arbitrary code. If they can access os.system or subprocess.Popen they can run arbitrary shell commands. Of course, we can deny them access to these with Unpickler.find_class. But how exactly should we implement our find_class method? Which functions and classes are safe, and which are dangerous?
An attacker's toolbox
Here I'll try to explain some methods an attacker can use to do evil things. Giving an attacker access to any of these functions/classes means you're in danger.
Arbitrary code execution during unpickling:
exec and eval (duh)
os.system, os.popen, subprocess.Popen and all other subprocess functions
types.FunctionType, which allows you to create a function from a code object (which can itself be created with compile or types.CodeType)
typing.get_type_hints. Yes, you read that right. How, you ask? Well, typing.get_type_hints evaluates forward references. So all you need is an object with __annotations__ like {'x': 'os.system("rm -rf /")'} and get_type_hints will run the code for you.
functools.singledispatch. I see you shaking your head in disbelief, but it's true. Single-dispatch functions have a register method, which internally calls typing.get_type_hints.
... and probably a few more
Accessing things without going through Unpickler.find_class:
Just because our find_class method prevents an attacker from accessing something directly doesn't mean there's no indirect way of accessing that thing.
Attribute access: Everything is an object in Python, and objects have lots of attributes. For example, an object's class can be accessed as obj.__class__, a class's parents can be accessed as cls.__bases__, etc.
getattr
operator.attrgetter
object.__getattribute__
Tools.scripts.find_recursionlimit.RecursiveBlowup5.__getattr__
... and many more
Indexing: Lots of things are stored in lists, tuples and dicts - being able to index data structures opens many doors for an attacker.
operator.itemgetter
list.__getitem__, dict.__getitem__, etc
... and almost certainly some more
See Ned Batchelder's Eval is really dangerous to find out how an attacker can use these to gain access to pretty much everything.
Code execution after unpickling:
An attacker doesn't necessarily have to do something dangerous during the unpickling process - they can also try to return a dangerous object and let you call a dangerous function by accident. Maybe you call typing.get_type_hints on the unpickled object, or maybe you expect to unpickle a CuteBunny but instead unpickle a FerociousDragon and get your hand bitten off when you try to .pet() it. Always make sure the unpickled object is of the type you expect, its attributes are of the types you expect, and it doesn't have any attributes you don't expect it to have.
At this point, it should be obvious that there aren't many modules/classes/functions you can trust. When you implement your find_class method, never ever write a blacklist - always write a whitelist, and only include things you're sure can't be abused.
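As a concrete sketch, modeled on the restricting-globals example in the pickle documentation and limited to the "elementary" types from the question:

import builtins
import io
import pickle

SAFE_BUILTINS = {
    "bool", "str", "bytes", "bytearray",
    "int", "float", "complex",
    "tuple", "list", "dict", "set", "frozenset",
}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Whitelist, never blacklist: allow only a handful of harmless builtins.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()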
So what's the answer to the question?
If you really only allow access to bool, str, bytes, bytearray, int, float, complex, tuple, list, dict, set and frozenset then you're most likely safe. But let's be honest - you should probably use JSON instead.
In general, I think most classes are safe - with exceptions like subprocess.Popen, of course. The worst thing an attacker can do is call the class - which generally shouldn't do anything more dangerous than return an instance of that class.
What you really need to be careful about is allowing access to functions (and other non-class callables), and how you handle the unpickled object.
I'd go so far as saying that there is no safe way to use pickle to handle untrusted data.
Even with restricted globals, the dynamic nature of Python is such that a determined hacker still has a chance of finding a way back to the __builtins__ mapping and from there to the Crown Jewels.
See Ned Batchelder's blog posts on circumventing restrictions on eval() that apply in equal measure to pickle.
Remember that pickle is still a stack language and you cannot foresee all possible objects produced from allowing arbitrary calls even to a limited set of globals. The pickle documentation also doesn't mention the EXT* opcodes that allow calling copyreg-installed extensions; you'll have to account for anything installed in that registry as well. All it takes is one vector allowing an object call to be turned into a getattr equivalent for your defences to crumble.
At the very least use a cryptographic signature to your data so you can validate the integrity. You'll limit the risks, but if an attacker ever managed to steal your signing secrets (keys) then they could again slip you a hacked pickle.
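A minimal sketch of that signing step, assuming you keep the key somewhere safer than the source code:

import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # placeholder only

def sign(pickled_bytes):
    # Prepend an HMAC-SHA256 digest (32 bytes) to the pickled payload.
    return hmac.new(SECRET_KEY, pickled_bytes, hashlib.sha256).digest() + pickled_bytes

def verify(signed_bytes):
    digest, payload = signed_bytes[:32], signed_bytes[32:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(digest, expected):
        raise ValueError("bad signature: refusing to unpickle")
    return payload  # only unpickle this if you trust whoever holds the key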
I would instead use an existing innocuous format like JSON and add type annotations; e.g. store data in dictionaries with a type key and convert when loading the data.
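A rough sketch of that JSON-with-a-type-key idea; the Point class and the tag name are invented for the example:

import json

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def dump_obj(obj):
    # Store the data alongside an explicit type tag.
    return json.dumps({"type": "point", "x": obj.x, "y": obj.y})

def load_obj(text):
    data = json.loads(text)
    if data.get("type") == "point":
        return Point(data["x"], data["y"])
    raise ValueError("unknown type tag: %r" % data.get("type"))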
This idea has been discussed also on the mailing list python-ideas when addressing the problem of adding a safe pickle alternative in the standard library. For example here:
To make it safer I would have a restricted unpickler as the default (for load/loads) and force people to override it if they want to loosen restrictions. To be really explicit, I would make load/loads only work with built-in types.
And also here:
I've always wanted a version of pickle.loads() that takes a list of classes that are allowed to be instantiated.
Is the following enough for you: http://docs.python.org/3.4/library/pickle.html#restricting-globals ?
Indeed, it is. Thanks for pointing it out! I've never gotten past the module interface part of the docs. Maybe the warning at the top of the page could also mention that there are ways to mitigate the safety concerns, and point to #restricting-globals?
Yes, that would be a good idea :-)
So I don't know why the documentation has not been changed, but in my opinion, using a RestrictedUnpickler to restrict the types that can be unpickled is a safe solution. Of course there could be bugs in the library that compromise the system, but there could also be a bug in OpenSSL that shows random memory data to everyone who asks.

enforcing python function parameters types from docstring

Both the epydoc and Sphinx document generators permit the coder to annotate the expected types of any/all function parameters.
My question is: Is there a way (or module) that enforces these types (at run-time) when they are documented in the docstring? This wouldn't be strong typing (compile-time checking), but might more likely be called firm typing (run-time checking), maybe raising a "ValueError", or better still, raising a "SemanticError".
Ideally there would already be something (like a module) similar to the "import antigravity" module as per xkcd, and this "firm_type_check" module would already exist somewhere handy for download.
FYI: The docstring fields for epydoc and Sphinx are as follows:
epydoc:
Functions and Methods parameters:
@param p: ...    # A description of the parameter p for a function or method.
@type p: ...     # The expected type for the parameter p.
@return: ...     # The return value for a function or method.
@rtype: ...      # The type of the return value for a function or method.
@keyword p: ...  # A description of the keyword parameter p.
@raise e: ...    # A description of the circumstances under which a function or method raises exception e.
Sphinx: Inside Python object description directives, reST field lists with these fields are recognized and formatted nicely:
param, parameter, arg, argument, key, keyword: Description of a parameter.
type: Type of a parameter.
raises, raise, except, exception: That (and when) a specific exception is raised.
var, ivar, cvar: Description of a variable.
returns, return: Description of the return value.
rtype: Return type.
The closest I could find was a mention by Guido on mail.python.org of mypy, created by Jukka Lehtosalo (Mypy Examples). CMIIW: mypy cannot be imported as a py3 module.
Similar stackoverflow questions that do not use the docstring per se:
Pythonic Way To Check for A Parameter Type
What's the canonical way to check for type in python?
To my knowledge, nothing of the sort exists, for a few important reasons:
First, docstrings are documentation, just like comments. And just like comments, people will expect them to have no effect on the way your program works. Making your program's behavior depend on its documentation is a major antipattern, and a horrible idea all around.
Second, docstrings aren't guaranteed to be preserved. If you run python with -OO, for example, all docstrings are removed. What then?
Finally, Python 3 introduced optional function annotations, which would serve that purpose much better: http://legacy.python.org/dev/peps/pep-3107/. Python currently does nothing with them (they're documentation), but if I were to write such a module, I'd use those, not docstrings.
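To make that concrete, here is a toy sketch of a runtime checker built on function annotations rather than docstrings; it only handles plain classes used directly as annotations and is not a substitute for a real type checker:

import functools
import inspect

def enforce_annotations(func):
    sig = inspect.signature(func)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = func.__annotations__.get(name)
            # Only check annotations that are plain classes (e.g. str, int).
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError("%s must be %s, got %s"
                                % (name, expected.__name__, type(value).__name__))
        return func(*args, **kwargs)

    return wrapper

@enforce_annotations
def greet(name: str, times: int) -> str:
    return name * times

greet("hi", 3)    # fine
greet("hi", "3")  # raises TypeError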
My honest opinion is this: if you're going to go through the (considerable) trouble of writing a (necessarily half-baked) static type system for Python, the time it will take you would be better spent learning another programming language that supports static typing in a less insane way:
Clojure (http://clojure.org/) is incredibly dynamic and powerful (due to its nature as a Lisp) and supports optional static typing through core.typed (https://github.com/clojure/core.typed). It is geared towards concurrency and networking (it has STM and persistent data structures <3 ), has a great community, and is one of the most elegantly-designed languages I've seen. That said, it runs on the JVM, which is both a good and a bad thing.
Golang (http://golang.org/) feels sort-of Pythonic (or at least, it's attracting a lot of refugees from Python), is statically typed and compiles to native code.
Rust (http://www.rust-lang.org/) is lower-level than that, but it has one of the best type systems I've seen (type inference, pattern matching, traits, generics, zero-sized types...) and enforces memory and resource safety at compile time. It is being developed by Mozilla as a language to write their next browser (Servo) in, so performance and safety are its main goals. You can think of it as a modern take on C++. It compiles to native code, but hasn't hit 1.0 yet and as such, the language itself is still subject to change. Which is why I wouldn't recommend writing production code in it yet.

How to handle "duck typing" in Python?

I usually want to keep my code as generic as possible. I'm currently writing a simple library and being able to use different types with my library feels extra important this time.
One way to go is to force people to subclass an "interface" class. To me, this feels more like Java than Python and using issubclass in each method doesn't sound very tempting either.
My preferred way is to use the object in good faith, but this will raise some AttributeErrors. I could wrap each dangerous call in a try/except block. This, too, seems kind of cumbersome:
def foo(obj):
    ...
    # it should be able to sleep
    try:
        obj.sleep()
    except AttributeError:
        # handle error
        ...

    # it should be able to wag its tail
    try:
        obj.wag_tail()
    except AttributeError:
        # handle this error as well
        ...
Should I just skip the error handling and expect people to only use objects with the required methods? If I do something stupid like [x**2 for x in 1234] I actually get a TypeError and not an AttributeError (ints are not iterable), so there must be some type checking going on somewhere -- what if I want to do the same?
This question will be kind of open ended, but what is the best way to handle the above problem in a clean way? Are there any established best practices? How is the iterable "type checking" above, for example, implemented?
Edit
While AttributeErrors are fine, the TypeErrors raised by native functions usually give more information about how to solve the errors. Take this for example:
>>> ['a', 1].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()
I'd like my library to be as helpful as possible.
I'm not a Python pro, but I believe that unless you can try an alternative for when the parameter doesn't implement a given method, you shouldn't prevent exceptions from being thrown. Let the caller handle these exceptions. Otherwise, you would be hiding problems from the developers.
As I have read in Clean Code, if you want to search for an item in a collection, don't test your parameters with issubclass (of a list) but prefer to call getattr(l, "__contains__"). This will give someone who is using your code a chance to pass a parameter that isn't a list but which has a __contains__ method defined, and things should work equally well.
So, I think that you should code in an abstract, generic way, imposing as few restrictions as you can. For that, you'll have to make the fewest assumptions possible. However, when you face something that you can't handle, raise an exception and let the programmer know what mistake he made!
If your code requires a particular interface, and the user passes an object without that interface, then nine times out of ten, it's inappropriate to catch the exception. Most of the time, an AttributeError is not only reasonable but expected when it comes to interface mismatches.
Occasionally, it may be appropriate to catch an AttributeError for one of two reasons. Either you want some aspect of the interface to be optional, or you want to throw a more specific exception, perhaps a package-specific exception subclass. Certainly you should never prevent an exception from being thrown if you haven't honestly handled the error and any aftermath.
So it seems to me that the answer to this question must be problem- and domain- specific. It's fundamentally a question of whether using a Cow object instead of a Duck object ought to work. If so, and you handle any necessary interface fudging, then that's fine. On the other hand, there's no reason to explicitly check whether someone has passed you a Frog object, unless that will cause a disastrous failure (i.e. something much worse than a stack trace).
That said, it's always a good idea to document your interface -- that's what docstrings (among other things) are for. When you think about it, it's much more efficient to throw a general error for most cases and tell users the right way to do it in the docstring, than to try to foresee every possible error a user might make and create a custom error message.
A final caveat -- it's possible that you're thinking about UI here -- I think that's another story. It's good to check the input that an end user gives you to make sure it isn't malicious or horribly malformed, and provide useful feedback instead of a stack trace. But for libraries or things like that, you really have to trust the programmer using your code to use it intelligently and respectfully, and to understand the errors that Python generates.
If you just want the unimplemented methods to do nothing, you can try something like this, rather than the multi-line try/except construction:
getattr(obj, "sleep", lambda: None)()
However, this isn't necessarily obvious as a function call, so maybe:
hasattr(obj, "sleep") and obj.sleep()
or if you want to be a little more sure before calling something that it can in fact be called:
hasattr(obj, "sleep") and callable(obj.sleep) and obj.sleep()
This "look-before-you-leap" pattern is generally not the preferred way to do it in Python, but it is perfectly readable and fits on a single line.
Another option of course is to abstract the try/except into a separate function.
Good question, and quite open-ended. I believe typical Python style is not to check, either with isinstance or by catching individual exceptions. Certainly, using isinstance is quite bad style, as it defeats the whole point of duck typing (though using isinstance on primitives can be OK -- be sure to check for both int and long for integer inputs, and check for basestring for strings, the base class of str and unicode). If you do check, you should raise a TypeError.
Not checking is generally OK, as it typically raises either a TypeError or AttributeError anyway, which is what you want. (Though it can delay those errors making client code hard to debug).
The reason you see TypeErrors is that primitive code raises it, effectively because it does an isinstance. The for loop is hard-coded to raise a TypeError if something is not iterable.
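If you want to raise the same kind of helpful error from your own library, one sketch (the function name is made up) is to probe the input up front and re-raise with a clearer message:

def process_items(items):
    # Fail early with an informative TypeError, like the built-in for loop does.
    try:
        iterator = iter(items)
    except TypeError:
        raise TypeError("process_items() expects an iterable, got %s"
                        % type(items).__name__) from None
    return [str(item) for item in iterator]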
First of all, the code in your question is not ideal:
try:
    obj.wag_tail()
except AttributeError:
    ...
You don't know whether the AttributeError is from the lookup of wag_tail or whether it happened inside the function. What you are trying to do is:
try:
    f = getattr(obj, 'wag_tail')
except AttributeError:
    ...
else:
    f()
Edit: kindall rightly points out that if you are going to check this, you should also check that f is callable.
In general, this is not Pythonic. Just call and let the exception filter down, and the stack trace is informative enough to fix the problem. I think you should ask yourself whether your rethrown exceptions are useful enough to justify all of this error-checking code.
The case of sorting a list is a great example.
List sorting is very common,
passing unorderable types happens for a significant proportion of those, and
throwing AttributeError in that case is very confusing.
If those three criteria apply to your problem (especially the third), I agree with building pretty exception rethrower.
You have to balance with the fact that throwing these pretty errors is going to make your code harder to read, which statistically means more bugs in your code. It's a question of balancing the pros and the cons.
If you ever need to check for behaviours (like __real__ and __contains__), don't forget to use the Python abstract base classes found in collections, io, and numbers.
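For example, on Python 3 the relevant ABCs live in numbers and collections.abc; a small sketch of using them instead of checking concrete types:

import numbers
from collections.abc import Iterable

def average(values):
    if not isinstance(values, Iterable):
        raise TypeError("values must be iterable")
    values = list(values)
    if not all(isinstance(v, numbers.Number) for v in values):
        raise TypeError("all values must be numbers")
    return sum(values) / len(values)

This accepts any iterable of any numeric type rather than hard-coding list and float.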

How to deal with wrong parameters types in dynamic-languages?

I can't find the answer on SO, but it's very likely that this topic has already been discussed.
I'm trying to write a fairly small program using the Python language. It's my first "real" experience with a dynamic language and I would prefer to do everything the right way. One of the practices that I would like to apply from the beginning is unit testing.
How can I quickly test that the parameters of a method are of the right type? Should I do it?
By right type I mean, for instance, checking that a method that works with float numbers is not called with a string. In this case, consider that the method should obviously accept integers as well, not only floats.
How can I quickly test that the parameters of a method are of the right type?
The quickest way is to do nothing.
Seriously. The dynamic language interpreter (Python in this case) will check far more quickly than any code you could ever write. It will simply raise an exception and that's all you need to do. Nothing.
Should I do it?
Never test for the right type. You can't -- in general -- do it.
Let's say you have a function that requires a "number"
def some_func(x):
    assert isinstance(x, int)
Bad policy. Your function may work for long or float just as well as int.
assert isinstance(x, (int, long, float))
Bad policy. You've still excluded complex. Indeed, you've also excluded decimal.Decimal and fractions.Rational which are also valid numbers.
By "type checking" you're going to exclude valid types. The only thing you can do is assume the types are proper and handle the exception gracefully when someone "misuses" your function or class and provides wrong types.
The most graceful way to handle TypeError?
Do nothing. The program should totally crash.
You shouldn't test for specific types. According to the docs, you should simply use the passed-in object as needed and give the user the opportunity to supply a custom implementation.
Depending on what your function does, it may be appropriate to convert the arguments to the expected type:
def my_func(a):
    a = float(a)
    # ...do stuff...
Another excellent option is to use hasattr() to check for the desired member before using it. That would let you throw a helpful exception, instead of the default AttributeError.
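A short sketch of that hasattr idea (the write requirement here is just an example of one member you might need):

def save_report(dest):
    # Give the caller a clear message instead of a bare AttributeError later on.
    if not hasattr(dest, "write"):
        raise TypeError("dest must be a writable file-like object, got %s"
                        % type(dest).__name__)
    dest.write("report contents\n")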
Unit testing with complete coverage is really the only way to handle any development work which relies on dynamic languages. Clearly it's very beneficial to have strong coverage tests for statically typed languages but in my experience it's even more important when you have dynamic typing.
If you aren't covering all the code that can run in your tests, then really you are asking for trouble. So you want to use a coverage analysis tool in tandem with your unit tests to prove that you are reaching all of your code.
Even that won't guard against all pitfalls – your tests really need to exercise all the erroneous input your program may receive.
