Tests for Basic Python Data Structure Interfaces - python

A fairly small question: does anyone know about a pre-made suite of Python unit tests that just check if a class conforms to one of the standard Python data structure interfaces (e.g., lists, sets, dictionaries, queues, etc). It's not overly hard to write them, but I'd hate to bother doing so if someone has already done this. It seems like very basic functionality that someone has probably done already.
The use case is that I am using a factory pattern to create data structures due to different restrictions related to platforms. As such, I need to be able to test that the resulting created objects still conform to the standard interfaces on the surface. Also, I should note that by "conform" I mean that the tests should check not just that the interface functions exist, but also check that they work (e.g., can set and retrieve a value in a map, for instance). Python 2.7 tests would be preferred.

First, "the standard Python data structure interfaces" are not lists, sets, dictionaries, queues, etc. Those are specific implementations of the interfaces. (And queue isn't even a data structure in the sense you're thinking of—its salient features are that its operations are atomic, and put and get optionally synchronize on a Condition, and so on.)
Anyway, the interfaces are defined in five different not-quite-compatible ways.
The Built-in Types section of the documentation describes what it means to be an iterator type, a sequence type, etc. However, these are not nearly as rigorous as you'd expect for reference documentation (at least if you're used to, say, C++ or Java).
I'm not aware of any tests for such a thing, so I think you'd have to build them from scratch.
The collections module contains Collections Abstract Base Classes that define the interfaces, and provide a way to register "virtual subclasses" via the abc module. So, you can declare "I am a mapping" by inheriting from collections.Mapping, or calling collections.Mapping.register. But that doesn't actually prove that you are a mapping, just that you're claiming to be. (If you inherit from Mapping, it also acts as a mixin that helps you complete the interface by implementing, e.g., __contains__ on top of __getitem__.)
If you want to test the ABC meaning, defuz's answer is very close, and with a little more work I think he or someone else can complete it.
The CPython C API defines an Abstract Objects Layer. While this is not actually authoritative for the language, it's obviously intended that the C-API protocols and the language-level interfaces are supposed to match. And, unlike the latter, the former are rigorously defined. And of course the source code from CPython 2.7, and maybe other implementations like PyPy, may help.
There are tests for this that come with CPython, but really, they're for testing that calling PyMapping_GetItem from C properly calls your mymapping.__getitem__ in Python, which is really at a tangent to what you want to test, so I don't think it will help much.
The actual concrete classes have additional interface on top of the protocols, that you may want to test, but that's harder to describe. In particular, the way the __new__ and __init__ methods work is often important. Implementing the Mapping protocol means someone can construct an empty Foo instance and add items to it with foo[key] = value, but it doesn't mean someone can construct Foo(key=value), or Foo({key: value}) or Foo([(key, value)]).
And for this case, there are existing tests that come with all of the standard Python implementations. CPython comes with a very extensive test suite that includes things like test_dict.py. PyPy runs all the (Python-level) CPython tests, and some extra ones besides.
You will obviously have to modify these tests to run on an arbitrary class instead of one hardcoded into the tests, and you may also have to modify them to handle whichever definition you pick. Plus, they probably test more than you asked for. You just want to know if a class conforms to the protocol, not whether its methods do the right thing, right? But still, I think they're a good starting point.
Finally, the C API defines a Concrete Objects Layer that, although it's not authoritative, matches the previous definition and is more rigorously defined.
Unfortunately, the tests for this one are definitely not going to be very useful to you, because they're checking things like whether PyDict_Check and PyDict_GetItem work on your class, which they will not for any mapping defined in pure Python.
If you do build something complete for any of these definitions, I would strongly suggest putting it on PyPI, and posting about it to python-list, so you get feedback (and bug reports).

There are abstract base classes in standart module collections based on ABC module.
You have to inherit your classes from these classes to be sure that your classes correspond to the standard behavior:
import collections
class MyDict(collections.Mapping):
...
Also, your can test already existed class that does not obviously inherit the abstract class:
class MyPerfectDict(object):
... realization ...
def is_inherit(cls, abstract):
try:
class Test(abstract, cls): pass
test = Test()
except TypeError:
return False
else:
return True
is_inherit(MyPerfectDict, Mapping) # False
is_inherit(dict, Mapping) # True

Related

subclassing dict; dict.update returns incorrrect value - python bug?

I needed to make a class that extended dict and ran into an interesting problem illustrated by the dumb example in the image below.
Why is d.update() ignoring the class's __getitem__?
EDIT: This is in python2.7 which does not appear to contain collections.UserDict
Thinking UserDict.UserDict is the equivalent I tried this, and it gets closer, but still behaves interestingly.
This is an example of the open-closed-principle (the class is open for extension but closed for modification). It is good thing to have because it allows subclassers to extend or override a method without unintentionally triggering behavior changes in others and without breaking the classes's invariants.
We even do this in pure python code as well; for example, inside the pure python ordered dict code, the class local call from __init__() to update() is done using name mangling. This allows a subclasser to override update() without accidentally breaking __init__().
Sometimes, this is inconvenient. It means that a subclasser has to override every method whose behavior they want to change including get(), update(), and others. However, there are offsetting benefits (protection of internal invariants, preventing implementation details from leaking from the abstraction, and allowing users to assume the methods are independent of one another).
This style (chosen by Guido from the outset) is the default for the builtin types (otherwise we would forever be fighting segfaulting invariant violations) and for some pure python classes.
We do document when there is a departure from the default. For example, the cmd module uses the framework design pattern, letting the user define various do_action() methods. Also, some of the http modules do the same, specifically documenting that a user's do_GET() method is called and that is how you attach customized HTTP event handlers.
In the absence of specifically documented method hooks (i.e. those listed above or methods like dict.__missing__(), a subclasser should presume method independence. Otherwise, how are you to know whether __getitem__() calls get() under the hood or vice-versa?
FWIW, this isn't unique to Python. It comes up quite a bit in object oriented programming. Correctly designed classes either document root methods that affect the behavior of other methods or they are presumed to be independent.
There may need to be a FAQ for this, but nothing is broken or wrong here (other than Python having way too many dict variants to chose from). If someone mistakenly assumes or believes that __getitem__() must be called by the other accessor methods, they find out very quickly that assumption is wrong (that is if they run even minimal tests on the code).

abstract classes in python: Enforcing type

My question is related to this question Is enforcing an abstract method implementation unpythonic? . I am using abstract classes in python but I realize that there is nothing that stops the user from implementing a function that takes an argument that is totally different from what was intended for the base class. Therefore, I may have to check the type of the argument. Isn't that unpythonic?
How do people use abstract classes in python?
Short answer: yes it is unpythonic. It might still make sense to have an abstract base class to check your implementations for completeness. But the pythonic way to do it is mostly just duck-typing, i.e. assume that the object you work with satisfies the specification you expect. Just state these expectations explicitly in the documentation.
Actually you typically would not do a lot of these safety checks. Even if you would check types, one might still be able to fake that at runtime. It also allows users of your code to be more flexible as to what kind of objects they pass to your code. Having to implement an ABC every single time clutters their code a lot.

How to document and test interfaces required of formal parameters in Python 2?

To ask my very specific question I find I need quite a long introduction to motivate and explain it -- I promise there's a proper question at the end!
While reading part of a large Python codebase, sometimes one comes across code where the interface required of an argument is not obvious from "nearby" code in the same module or package. As an example:
def make_factory(schema):
entity = schema.get_entity()
...
There might be many "schemas" and "factories" that the code deals with, and "def get_entity()" might be quite common too (or perhaps the function doesn't call any methods on schema, but just passes it to another function). So a quick grep isn't always helpful to find out more about what "schema" is (and the same goes for the return type). Though "duck typing" is a nice feature of Python, sometimes the uncertainty in a reader's mind about the interface of arguments passed in as the "schema" gets in the way of quickly understanding the code (and the same goes for uncertainty about typical concrete classes that implement the interface). Looking at the automated tests can help, but explicit documentation can be better because it's quicker to read. Any such documentation is best when it can itself be tested so that it doesn't get out of date.
Doctests are one possible approach to solving this problem, but that's not what this question is about.
Python 3 has a "parameter annotations" feature (part of the function annotations feature, defined in PEP 3107). The uses to which that feature might be put aren't defined by the language, but it can be used for this purpose. That might look like this:
def make_factory(schema: "xml_schema"):
...
Here, "xml_schema" identifies a Python interface that the argument passed to this function should support. Elsewhere there would be code that defines that interface in terms of attributes, methods & their argument signatures, etc. and code that allows introspection to verify whether particular objects provide an interface (perhaps implemented using something like zope.interface / zope.schema). Note that this doesn't necessarily mean that the interface gets checked every time an argument is passed, nor that static analysis is done. Rather, the motivation of defining the interface is to provide ways to write automated tests that verify that this documentation isn't out of date (they might be fairly generic tests so that you don't have to write a new test for each function that uses the parameters, or you might turn on run-time interface checking but only when you run your unit tests). You can go further and annotate the interface of the return value, which I won't illustrate.
So, the question:
I want to do exactly that, but using Python 2 instead of Python 3. Python 2 doesn't have the function annotations feature. What's the "closest thing" in Python 2? Clearly there is more than one way to do it, but I suspect there is one (relatively) obvious way to do it.
For extra points: name a library that implements the one obvious way.
Take a look at plac that uses annotations to define a command-line interface for a script. On Python 2.x it uses plac.annotations() decorator.
The closest thing is, I believe, an annotation library called PyAnno.
From the project webpage:
"The Pyanno annotations have two functions:
Provide a structured way to document Python code
Perform limited run-time checking "

Why has Python decided against constant references?

Note: I'm not talking about preventing the rebinding of a variable. I'm talking about preventing the modification of the memory that the variable refers to, and of any memory that can be reached from there by following the nested containers.
I have a large data structure, and I want to expose it to other modules, on a read-only basis. The only way to do that in Python is to deep-copy the particular pieces I'd like to expose - prohibitively expensive in my case.
I am sure this is a very common problem, and it seems like a constant reference would be the perfect solution. But I must be missing something. Perhaps constant references are hard to implement in Python. Perhaps they don't quite do what I think they do.
Any insights would be appreciated.
While the answers are helpful, I haven't seen a single reason why const would be either hard to implement or unworkable in Python. I guess "un-Pythonic" would also count as a valid reason, but is it really? Python does do scrambling of private instance variables (starting with __) to avoid accidental bugs, and const doesn't seem to be that different in spirit.
EDIT: I just offered a very modest bounty. I am looking for a bit more detail about why Python ended up without const. I suspect the reason is that it's really hard to implement to work perfectly; I would like to understand why it's so hard.
It's the same as with private methods: as consenting adults authors of code should agree on an interface without need of force. Because really really enforcing the contract is hard, and doing it the half-assed way leads to hackish code in abundance.
Use get-only descriptors, and state clearly in your documentation that these data is meant to be read only. After all, a determined coder could probably find a way to use your code in different ways you thought of anyways.
In PEP 351, Barry Warsaw proposed a protocol for "freezing" any mutable data structure, analogous to the way that frozenset makes an immutable set. Frozen data structures would be hashable and so capable being used as keys in dictionaries.
The proposal was discussed on python-dev, with Raymond Hettinger's criticism the most detailed.
It's not quite what you're after, but it's the closest I can find, and should give you some idea of the thinking of the Python developers on this subject.
There are many design questions about any language, the answer to most of which is "just because". It's pretty clear that constants like this would go against the ideology of Python.
You can make a read-only class attribute, though, using descriptors. It's not trivial, but it's not very hard. The way it works is that you can make properties (things that look like attributes but call a method on access) using the property decorator; if you make a getter but not a setter property then you will get a read-only attribute. The reason for the metaclass programming is that since __init__ receives a fully-formed instance of the class, you actually can't set the attributes to what you want at this stage! Instead, you have to set them on creation of the class, which means you need a metaclass.
Code from this recipe:
# simple read only attributes with meta-class programming
# method factory for an attribute get method
def getmethod(attrname):
def _getmethod(self):
return self.__readonly__[attrname]
return _getmethod
class metaClass(type):
def __new__(cls,classname,bases,classdict):
readonly = classdict.get('__readonly__',{})
for name,default in readonly.items():
classdict[name] = property(getmethod(name))
return type.__new__(cls,classname,bases,classdict)
class ROClass(object):
__metaclass__ = metaClass
__readonly__ = {'a':1,'b':'text'}
if __name__ == '__main__':
def test1():
t = ROClass()
print t.a
print t.b
def test2():
t = ROClass()
t.a = 2
test1()
While one programmer writing code is a consenting adult, two programmers working on the same code seldom are consenting adults. More so if they do not value the beauty of the code but them deadlines or research funds.
For such adults there is some type safety, provided by Enthought's Traits.
You could look into Constant and ReadOnly traits.
For some additional thoughts, there is a similar question posed about Java here:
Why is there no Constant feature in Java?
When asking why Python has decided against constant references, I think it's helpful to think of how they would be implemented in the language. Should Python have some sort of special declaration, const, to create variable references that can't be changed? Why not allow variables to be declared a float/int/whatever then...these would surely help prevent programming bugs as well. While we're at it, adding class and method modifiers like protected/private/public/etc. would help enforce compile-type checking against illegal uses of these classes. ...pretty soon, we've lost the beauty, simplicity, and elegance that is Python, and we're writing code in some sort of bastard child of C++/Java.
Python also currently passes everything by reference. This would be some sort of special pass-by-reference-but-flag-it-to-prevent-modification...a pretty special case (and as the Tao of Python indicates, just "un-Pythonic").
As mentioned before, without actually changing the language, this type of behaviour can be implemented via classes & descriptors. It may not prevent modification from a determined hacker, but we are consenting adults. Python didn't necessarily decide against providing this as an included module ("batteries included") - there was just never enough demand for it.

Module vs object-oriented programming in vba

My first "serious" language was Java, so I have comprehended object-oriented programming in sense that elemental brick of program is a class.
Now I write on VBA and Python. There are module languages and I am feeling persistent discomfort: I don't know how should I decompose program in a modules/classes.
I understand that one module corresponds to one knowledge domain, one module should ba able to test separately...
Should I apprehend module as namespace(c++) only?
I don't do VBA but in python, modules are fundamental. As you say, the can be viewed as namespaces but they are also objects in their own right. They are not classes however, so you cannot inherit from them (at least not directly).
I find that it's a good rule to keep a module concerned with one domain area. The rule that I use for deciding if something is a module level function or a class method is to ask myself if it could meaningfully be used on any objects that satisfy the 'interface' that it's arguments take. If so, then I free it from a class hierarchy and make it a module level function. If its usefulness truly is restricted to a particular class hierarchy, then I make it a method.
If you need it work on all instances of a class hierarchy and you make it a module level function, just remember that all the the subclasses still need to implement the given interface with the given semantics. This is one of the tradeoffs of stepping away from methods: you can no longer make a slight modification and call super. On the other hand, if subclasses are likely to redefine the interface and its semantics, then maybe that particular class hierarchy isn't a very good abstraction and should be rethought.
It is matter of taste. If you use modules your 'program' will be more procedural oriented. If you choose classes it will be more or less object oriented. I'm working with Excel for couple of months and personally I choose classes whenever I can because it is more comfortable to me. If you stop thinking about objects and think of them as Components you can use them with elegance. The main reason why I prefer classes is that you can have it more that one. You can't have two instances of module. It allows me use encapsulation and better code reuse.
For example let's assume that you like to have some kind of logger, to log actions that were done by your program during execution. You can write a module for that. It can have for example a global variable indicating on which particular sheet logging will be done. But consider the following hypothetical situation: your client wants you to include some fancy report generation functionality in your program. You are smart so you figure out that you can use your logging code to prepare them. But you can't do log and report simultaneously by one module. And you can with two instances of logging Component without any changes in their code.
Idioms of languages are different and thats the reason a problem solved in different languages take different approaches.
"C" is all about procedural decomposition.
Main idiom in Java is about "class or Object" decomposition. Functions are not absent, but they become a part of exhibited behavior of these classes.
"Python" provides support for both Class based problem decomposition as well as procedural based.
All of these uses files, packages or modules as concept for organizing large code pieces together. There is nothing that restricts you to have one module for one knowledge domain.
These are decomposition and organizing techniques and can be applied based on the problem at hand.
If you are comfortable with OO, you should be able to use it very well in Python.
VBA also allows the use of classes. Unfortunately, those classes don't support all the features of a full-fleged object oriented language. Especially inheritance is not supported.
But you can work with interfaces, at least up to a certain degree.
I only used modules like "one module = one singleton". My modules contain "static" or even stateless methods. So in my opinion a VBa module is not namespace. More often a bunch of classes and modules would form a "namespace". I often create a new project (DLL, DVB or something similar) for such a "namespace".

Categories