The most pythonic way to implement two constructors - python

Pardon the incompetent style of a Python novice here.
I have a class that takes one parameter to establish its initial data. The initial data can come in one of two ways: either as a list of strings, or as a dictionary with string keys and integer values.
Right now I implement only one version of the constructor, the one that takes a dictionary as its parameter, with {} as the default value. The list-based initialization is implemented as a method, i.e.
myClass = MyClass()
myClass.initList(listVar)
I can surely live with this, but it certainly is not perfect. So I decided to turn here for some Pythonic wisdom: how should such polymorphic constructors be implemented? Should I try and fail to read initData.keys() in order to sniff whether this is a dictionary? Or is sniffing parameter types to implement lousy polymorphism where it's not welcome by design considered non-pythonic?

In an ideal world you'd write one constructor that could take either a list or dict without knowing the difference (i.e. duck typed). Of course, this isn't very realistic since these are pretty different ducks.
Understandably, too, you have a little heartburn about the idea of checking the actual instance types, because it breaks with the idea of duck typing. But, in python 2.6 an interesting module called abc was introduced which allows the definition of "abstract base classes". To be considered an instance of an abstract base class, one doesn't actually have to inherit from it, but rather only has to implement all its abstract methods.
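Not every ABC performs this structural check automatically; the ones that do, like collections.Iterable, implement a __subclasshook__ that inspects a class for the required methods (the builtins work with Sequence and Mapping because they are explicitly registered). A minimal sketch, using the pre-3.3 collections spelling to match the docs linked below:

import collections

class Bag(object):
    # Never inherits from the ABC; it only implements the one abstract method.
    def __iter__(self):
        yield 1
        yield 2

print(isinstance(Bag(), collections.Iterable))  # True, via the subclass hook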
The collections module includes some abstract base classes that would be of interest here, namely collections.Sequence and collections.Mapping. Thus you could write your __init__ functions like:
import collections

def __init__(self, somedata):
    if isinstance(somedata, collections.Sequence):
        ...  # somedata is a list or some other Sequence
    elif isinstance(somedata, collections.Mapping):
        ...  # somedata is a dict or some other Mapping
http://docs.python.org/2/library/collections.html#collections-abstract-base-classes contains the specifics of which methods are provided by each ABC. If you stick to these, then your code can now accept any object which fits one of these abstract base classes. And, as far as taking the builtin dict and list types, you can see that:
>>> import collections
>>> isinstance([], collections.Sequence)
True
>>> isinstance([], collections.Mapping)
False
>>> isinstance({}, collections.Sequence)
False
>>> isinstance({}, collections.Mapping)
True
And, almost by accident, you just made it work for tuple too. You probably didn't care if it was really a list, just that you can read the elements out of it. But, if you had checked isinstance(somedata, list) you would have ruled out tuple. This is what using an ABC buys you.

As @Jan-PhilipGehrcke notes, pythonic can be hard to quantify. To me it means:
easy to read
easy to maintain
simple is better than complex is better than complicated
etcetera, etcetera, and so forth (see the Zen of Python for the complete list, which you get by typing import this in the interpreter)
So, the most pythonic solution depends on what you have to do for each supported initializer, and how many of them you have. I would say if you have only a handful, and each one can be handled by only a few lines of code, then use isinstance and __init__:
class MyClass(object):
    def __init__(self, initializer):
        """
        initialize internal data structures with 'initializer'
        """
        if isinstance(initializer, dict):
            for k, v in initializer.items():
                # do something with k & v
                setattr(self, k, v)
        elif isinstance(initializer, (list, tuple)):
            for item in initializer:
                setattr(self, item, None)
On the other hand, if you have many possible initializers, or if any one of them requires a lot of code to handle, then you'll want to have one classmethod constructor for each possible init type, with the most common usage being in __init__:
class MyClass(object):
    def __init__(self, init_dict={}):
        """
        initialize internal data structures with 'init_dict'
        """
        for k, v in init_dict.items():
            # do something with k & v
            setattr(self, k, v)

    @classmethod
    def from_sequence(cls, init_list):
        """
        initialize internal data structures with 'init_list'
        """
        result = cls()
        for item in init_list:
            setattr(result, item, None)
        return result
This keeps each possible constructor simple, clean, and easy to understand.
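Callers then pick whichever constructor matches the data they have, for example:

mc1 = MyClass({'a': 1, 'b': 2})          # dict initializer, via __init__
mc2 = MyClass.from_sequence(['a', 'b'])  # sequence initializer, via the classmethod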
As a side note: using mutable objects as defaults (as I do in the __init__ above) needs to be done with care. The reason is that default values are evaluated only once, and the resulting object is then reused for every subsequent invocation. This is only a problem when you modify that object in your function, because those modifications will be seen by every subsequent invocation; unless you wanted to create a cache, that's probably not the behavior you were looking for. It is not a problem in my example because I do not modify init_dict, only iterate over it (which is a no-op when the caller hasn't supplied a replacement, since the default is empty).

There is no function overloading in Python. Using if isinstance(...) in your __init__() method would be very simple to read and understand:
class Foo(object):
    def __init__(self, arg):
        if isinstance(arg, dict):
            ...
        elif isinstance(arg, list):
            ...
        else:
            raise Exception("big bang")

You can use *args and **kwargs to do it.
But if you want to know what type the parameter is, you should use type() or isinstance().
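One loose sketch of that idea, treating positional arguments as list-style items and keyword arguments as dict-style pairs (the attribute-setting bodies are just placeholders):

class MyClass(object):
    def __init__(self, *args, **kwargs):
        for item in args:                  # list-style initializers
            setattr(self, item, None)
        for key, value in kwargs.items():  # dict-style initializers
            setattr(self, key, value)

a = MyClass('x', 'y')  # list-style call
b = MyClass(x=1, y=2)  # dict-style call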

Related

How to have a class return a list when list() is called on an instance of it

I am trying to have a list returned when I call list() on a class. What's the best way to do this?
class Test():
    def __init__(self):
        self.data = [1, 2, 3]
    def aslist(self):
        return self.data

a = Test()
list(a)
[1, 2, 3]
I want list(a) to run the aslist function when it is called, and ideally I'd like to implement an asdict that works the same way when dict() is called.
I'd like to be able to do this with dict, int, and all the other type casts.
Unlike many other languages you might be used to (e.g., C++), Python doesn't have any notion of "type casts" or "conversion operators" or anything like that.
Instead, Python types' constructors are generally written to some more generic (duck-typed) protocol.
The first thing to do is to go to the documentation for whichever constructor you care about and see what it wants. Start in Builtin Functions, even if most of them will link you to an entry in Builtin Types.
Many of them will link to an entry for the relevant special method in the Data Model chapter.
For example, int says:
… If x defines __int__(), int(x) returns x.__int__(). If x defines __trunc__(), it returns x.__trunc__() …
You can then follow the link to __int__, although in this case there's not much extra information:
Called to implement the built-in functions complex(), int() and float(). Should return a value of the appropriate type.
So, you want to define an __int__ method, and it should return an int:
class MySpecialZero:
    def __int__(self):
        return 0
The sequence and set types (like list, tuple, set, frozenset) are a bit more complicated. They all want an iterable:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.
This is explained a bit better under the iter function, which may not be the most obvious place to look:
… object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0) …
And under __iter__ in the Data Model:
This method is called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container.
Iterator objects also need to implement this method; they are required to return themselves. For more information on iterator objects, see Iterator Types.
So, for your example, you want to be an object that iterates over the elements of self.data, which means you want an __iter__ method that returns an iterator over those elements. The easiest way to do that is to just call iter on self.data—or, if you want that aslist method for other reasons, maybe call iter on what that method returns:
class Test():
    def __init__(self):
        self.data = [1, 2, 3]
    def aslist(self):
        return self.data
    def __iter__(self):
        return iter(self.aslist())
Notice that, as Edward Minnix explained, Iterator and Iterable are separate things. An Iterable is something that can produce an Iterator when you call its __iter__ method. All Iterators are Iterables (they produce themselves), but many Iterables are not Iterators (Sequences like list, for example).
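With that __iter__ in place, list(a) now does what the question asks, and so does any other constructor that accepts an iterable:

>>> a = Test()
>>> list(a)
[1, 2, 3]
>>> set(a)
{1, 2, 3}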
dict (and OrderedDict, etc.) is also a bit complicated. Check the docs, and you'll see that it wants either a mapping (that is, something like a dict) or an iterable of key-value pairs (those pairs themselves being iterables). In this case, unless you're implementing a full mapping, you probably want the fallback:
class Dictable:
    def __init__(self):
        self.names, self.values = ['a', 'b', 'c'], [1, 2, 3]
    def __iter__(self):
        return zip(self.names, self.values)
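Both conversions then behave as you'd hope, since dict accepts an iterable of key-value pairs and list just collects whatever is iterated:

>>> d = Dictable()
>>> dict(d)
{'a': 1, 'b': 2, 'c': 3}
>>> list(d)
[('a', 1), ('b', 2), ('c', 3)]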
Almost everything else is easy, like int—but notice that str, bytes, and bytearray are sequences.
Meanwhile, if you want your object to be convertible to an int or to a list or to a set, you might want it to also act a lot like one in other ways. If that's the case, look at collections.abc and numbers, which provide helpers that are not only abstract base classes (used if you need to check whether some type meets some protocol), but also mixins (used to help you implement the protocol).
For example, a full Sequence is expected to provide most of the same methods as a tuple—about 7 of them—but if you use the mixin, you only need to define 2 yourself:
import collections.abc

class MySeq(collections.abc.Sequence):
    def __init__(self, iterable):
        self.data = tuple(iterable)
    def __getitem__(self, idx):
        return self.data[idx]
    def __len__(self):
        return len(self.data)
Now you can use a MySeq almost anywhere you could use a tuple—including constructing a list from it, of course.
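For instance, the mixin supplies __contains__, __iter__, index, and count for free:

>>> s = MySeq([10, 20, 30])
>>> list(s)
[10, 20, 30]
>>> 20 in s
True
>>> s.index(30)
2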
For some types, like MutableSequence, the shortcuts help even more—you get 17 methods for the price of 5.
If you want the same object to be list-able and dict-able… well, then you run into a limitation of the design. list wants an iterable. dict wants an iterable of pairs, or a mapping—which is a kind of iterable. So, rather than infinite choices, you only really have two:
Iterate keys and implement __getitem__ with those keys for dict, so list gives a list of those keys.
Iterate key-value pairs for dict, so list gives a list of those key-value pairs.
Obviously if you want to actually act like a Mapping, you only have one choice, the first one.
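A minimal sketch of that first choice (note that CPython's dict constructor detects the mapping case by looking for a keys() method, so one is provided here; the class name KeyView is made up for illustration):

class KeyView:
    def __init__(self):
        self._data = {'a': 1, 'b': 2}
    def keys(self):
        return self._data.keys()
    def __getitem__(self, key):
        return self._data[key]
    def __iter__(self):
        return iter(self._data)

kv = KeyView()
print(dict(kv))  # {'a': 1, 'b': 2} -- via keys() + __getitem__
print(list(kv))  # ['a', 'b']       -- via __iter__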
The fact that the sequence and mapping protocols overlap has been part of Python from the beginning, inherent in the fact that you can use the [] operator on both of them, and has been retained with every major change since, even though it's made other features (like the whole ABC model) more complicated. I don't know if anyone's ever given a reason, but presumably it's similar to the reason for the extended-slicing design. In other words, making dicts and other mappings a lot easier and more readable to use is worth the cost of making them a little more complicated and less flexible to implement.
This can be done with overloading special methods. You will need to define the __iter__ method for your class, making it iterable. This means anything expecting an iterable (like most collections constructors like list, set, etc.) will then work with your object.
class Test:
    ...
    def __iter__(self):
        return iter(self.data)
Note: You will need to wrap the returned object with iter() so that it is an iterator (there is a difference between an iterable and an iterator). A list is iterable (it can be iterated over), but it is not itself an iterator (an iterator supports __next__, raises StopIteration when done, etc.).

Overload Methods in Python (Workarounds)

A simplified version of my problem: I want to write a method in Python that takes one parameter, either a list of strings or my custom object which holds a list of strings, and returns the size of the list. The method is specifically for a user to call, so I want it to be simple for the user (essentially, I don't want two methods doing the exact same thing except for a single line of code, and I don't want to import non-standard libraries).
I realize overloading is not possible in python like it is in Java.
What is a good way to go about this/what is the standard way? The solutions I have thought of are:
Write two different methods.
Write one method with two parameters and defaults, check for defaults move accordingly.
Write one method with one parameter, check what kind of object is passed in, move accordingly (not entirely sure if this type checking is possible)
From a design perspective if statements for each type of object I want to handle does not seem great in the long run, but I don't see any other solutions (besides separate methods)
Thank you for suggestions!
In Python, you can use a single-dispatch function to establish a single method with different implementations based on the argument type (specifically, on the type of the first argument).
from functools import singledispatch

@singledispatch
def process_str_list(str_list):
    raise NotImplementedError

@process_str_list.register(list)
def _(str_list):
    ...  # process raw list of strings

@process_str_list.register(MyStrListClass)
def _(str_list):
    ...  # process my object

To invoke the function, simply call process_str_list with your raw list or object. The type determination and implementation multiplexing takes place internally.
EDIT: Just wanted to add that the PEP that introduced single dispatch says:
It is currently a common anti-pattern for Python code to inspect the types of received arguments, in order to decide what to do with the objects.
Single dispatch is the pythonic way to approach that behavior.
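A self-contained sketch of what that looks like end to end; the bodies return the list's size as in the question, and MyStrListClass (with a strings attribute) is a hypothetical stand-in for the custom object:

from functools import singledispatch

@singledispatch
def process_str_list(str_list):
    raise NotImplementedError("unsupported type")

@process_str_list.register(list)
def _(str_list):
    return len(str_list)

class MyStrListClass:
    def __init__(self, strings):
        self.strings = strings

@process_str_list.register(MyStrListClass)
def _(obj):
    return len(obj.strings)

print(process_str_list(['a', 'b', 'c']))             # 3, list implementation
print(process_str_list(MyStrListClass(['x', 'y'])))  # 2, custom-object implementation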
As alfasin suggested, you can implement a __len__ function for your object;
class A(object):
    def __init__(self, mylist):
        self.mylist = mylist
    def __len__(self):
        return len(self.mylist)
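The builtin len() then works uniformly on either a plain list or the wrapping object:

a = A(['x', 'y', 'z'])
print(len(a))  # 3, via A.__len__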
For more complex distinctions, you can use isinstance at the function level:
def myfunction(obj):
    if isinstance(obj, list):
        ...  # when the object is a list
    elif isinstance(obj, MyClass):
        ...  # when the object is something else
    else:
        raise ValueError('wrong type!')

Custom double star operator for a class?

How does one implement custom double star operator (**) for unpacking, similar to how __iter__ works with single star operator (*)?
For example:
class PlayerManager(object):
    def __init__(self, players=None):
        self.players = players or []

    # Made up method to support ** operator
    def __dict_iter__(self):
        for player in self.players:
            yield get_steamid(player), player

def print_players(**players):
    print(players)

player_manager = PlayerManager([list, of, players])
print_players(**player_manager)
Output:
{
    'STEAM_0:0:02201': <Player object at 0x0000000000>,
    'STEAM_0:0:10232': <Player object at 0x0000000064>,
    'STEAM_0:0:73602': <Player object at 0x0000000128>
}
As @ShadowRanger says, implement Mapping. Here's an example:
from collections.abc import Mapping

class Foo(Mapping):
    def __iter__(self):
        yield "a"
        yield "b"

    def __len__(self):
        return 2

    def __getitem__(self, item):
        return ord(item)

f = Foo()
print(*f)
print(dict(**f))
The program outputs:
a b
{'a': 97, 'b': 98}
Implement the Mapping ABC. Technically, the language docs don't specify which Mapping methods are used, so assuming you only need some subset used by the current implementation is a bad idea. All it says is:
If the syntax **expression appears in the function call, expression must evaluate to a mapping, the contents of which are treated as additional keyword arguments. In the case of a keyword appearing in both expression and as an explicit keyword argument, a TypeError exception is raised.
So if you implement the Mapping ABC, you definitely have the right interfaces, regardless of whether it relies on .items(), direct iteration and __getitem__ calls, etc.
FYI, on checking, the behavior in CPython 3.5 is definitely dependent on how you implement Mapping (if you inherit from dict, it uses an optimized path that directly accesses the dict internals; if you don't, it iterates .keys() and looks up each key as it goes). So yeah, don't cut corners: implement the whole ABC. Thanks to the default implementations inherited from the Mapping ABC and its parents, this can be done with as little as:
class MyMapping(Mapping):
    def __getitem__(self, key):
        ...
    def __iter__(self):
        ...
    def __len__(self):
        ...
The default implementations you inherit may be suboptimal in certain cases (e.g. items and values would do semi-evil stuff involving iteration and look up, where direct accessors might be faster depending on internals), so if you're using it for other purposes, I'd suggest overriding those with optimized versions.
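For example, a hypothetical mapping backed by a list of key-value pairs could hand its pairs back directly instead of letting the inherited items() call __getitem__ once per key (a sketch only; the inherited version returns an ItemsView, which this override trades away for speed):

from collections.abc import Mapping

class PairMapping(Mapping):
    def __init__(self, pairs):
        self._pairs = list(pairs)

    def __getitem__(self, key):
        for k, v in self._pairs:
            if k == key:
                return v
        raise KeyError(key)

    def __iter__(self):
        return (k for k, _ in self._pairs)

    def __len__(self):
        return len(self._pairs)

    # Override the inherited items(): the default would re-do a linear
    # __getitem__ scan for every key, while we already hold the pairs.
    def items(self):
        return list(self._pairs)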

Why doesn't Python have a hybrid getattr + __getitem__ built in?

I have methods that accept dicts or other objects and the names of "fields" to fetch from those objects. If the object is a dict then the method uses __getitem__ to retrieve the named key, or else it uses getattr to retrieve the named attribute. This is pretty common in web templating languages. For example, in a Chameleon template you might have:
<p tal:content="foo.keyname">Stuff goes here</p>
If you pass in foo as a dict like {'keyname':'bar'}, then foo.keyname fetches the 'keyname' key to get 'bar'. If foo is an instance of a class like:
class Foo(object):
    keyname = 'baz'
then foo.keyname fetches the value from the keyname attribute. Chameleon itself implements that function (in the chameleon.py26 module) like this:
def lookup_attr(obj, key):
    try:
        return getattr(obj, key)
    except AttributeError as exc:
        try:
            get = obj.__getitem__
        except AttributeError:
            raise exc
        try:
            return get(key)
        except KeyError:
            raise exc
I've implemented it in my own package like:
try:
    value = obj[attribute]
except (KeyError, TypeError):
    value = getattr(obj, attribute)
The thing is, that's a pretty common pattern. I've seen that method, or one awfully similar to it, in a lot of modules. So why isn't something like it in the core of the language, or at least in one of the core modules? Failing that, is there a definitive way that code should be written?
I sort of half-read your question, wrote the below, and then reread your question and realized I had answered a subtly different question. But I think the below actually still provides an answer after a sort. If you don't think so, pretend instead that you had asked this more general question, which I think includes yours as a sub-question:
"Why doesn't Python provide any built-in way to treat attributes and items as interchangable?"
I've given a fair bit of thought to this question, and I think the answer is very simple. When you create a container type, it's very important to distinguish between attributes and items. Any reasonably well-developed container type will have a number of attributes -- often though not always methods -- that enable it to manage its contents in graceful ways. So for example, a dict has items, values, keys, iterkeys and so on. These attributes are all accessed using . notation. Items, on the other hand, are accessed using [] notation. So there can be no collisions.
What happens when you enable item access using . notation? Suddenly you have overlapping namespaces. How do you handle collisions now? If you subclass a dict and give it this functionality, either you can't use keys like 'items' as a rule, or you have to create some kind of namespace hierarchy. The first option creates a rule that is onerous, hard to follow, and hard to enforce. The second option creates an annoying amount of complexity, without fully resolving the collision problem, since you still have to have an alternative interface to specify whether you want items the item or items the attribute.
Now, for certain kinds of very primitive types, this is acceptable. That's probably why there's namedtuple in the standard library, for example. (But note that namedtuple is subject to these very problems, which is probably why it was implemented as a factory function (prevents inheritance) and uses weird, private method names like _asdict.)
It's also very, very, very easy to create a subclass of object with no (public) attributes and use setattr on it. It's even pretty easy to override __getitem__, __setitem__, and __delitem__ to invoke __getattribute__, __setattr__ and __delattr__, so that item access just becomes syntactic sugar for getattr(), setattr(), etc. (Though that's a bit more questionable since it creates somewhat unexpected behavior.)
But for any kind of well-developed container class that you want to be able to expand and inherit from, adding new, useful attributes, a __getattr__ + __getitem__ hybrid would be, frankly, an enormous PITA.
The closest thing in the python standard library is a namedtuple(), http://docs.python.org/dev/library/collections.html#collections.namedtuple
Foo = namedtuple('Foo', ['key', 'attribute'])
foo = Foo(5, attribute=13)
print foo[1]
print foo.key
Or you can easily define your own type that always actually stores into its dict, but allows the appearance of attribute setting and getting:
class MyDict(dict):
    def __getattr__(self, attr):
        return self[attr]
    def __setattr__(self, attr, value):
        self[attr] = value

d = MyDict()
d.a = 3
d[3] = 'a'
print(d['a'])  # 3
print(d[3])    # 'a'
print(d['b'])  # raises a KeyError
But don't do d.3, because that's a syntax error. There are of course more complicated ways out there of making a hybrid storage type like this; search the web for many examples.
As far as how to check both, the Chameleon way looks thorough. When it comes to "why isn't there a way to do both in the standard library", it's because ambiguity is BAD. Yes, we have duck typing and all other kinds of masquerading in python, and classes are really just dictionaries anyway, but at some point we want different functionality from a container like a dict or list than we want from a class, with its method resolution order, overriding, etc.
You can pretty easily write your own dict subclass that natively behaves this way. A minimal implementation, which I like to call a "pile" of attributes, is like so:
class Pile(dict):
    def __getattr__(self, key):
        # raise AttributeError for missing key here to fulfill API
        if key in self:
            return self[key]
        else:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        self[key] = value
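Usage then looks like this:

p = Pile()
p.x = 1        # goes through __setattr__, stored as p['x']
print(p['x'])  # 1
print(p.x)     # 1, via __getattr__
# p.missing would raise AttributeError, not KeyError, as getattr-based callers expect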
Unfortunately if you need to be able to deal with either dictionaries or attribute-laden objects passed to you, rather than having control of the object from the beginning, this won't help.
In your situation I would probably use something very much like what you have, except break it out into a function so I don't have to repeat it all the time.

Wrapping a Python Object

I'd like to serialize Python objects to and from the plist format (this can be done with plistlib). My idea was to write a class PlistObject which wraps other objects:
def __init__(self, anObject):
    self.theObject = anObject
and provides a "write" method:
def write(self, pathOrFile):
    plistlib.writeToPlist(self.theObject.__dict__, pathOrFile)
Now it would be nice if the PlistObject behaved just like wrapped object itself, meaning that all attributes and methods are somehow "forwarded" to the wrapped object. I realize that the methods __getattr__ and __setattr__ can be used for complex attribute operations:
def __getattr__(self, name):
    return self.theObject.__getattr__(name)
But then of course I run into the problem that the constructor now produces an infinite recursion, since also self.theObject = anObject tries to access the wrapped object.
How can I avoid this? If the whole idea seems like a bad one, tell me too.
Unless I'm missing something, this will work just fine:
def __getattr__(self, name):
    return getattr(self.theObject, name)
Edit: for those thinking that the lookup of self.theObject will result in an infinite recursive call to __getattr__, let me show you:
>>> class Test:
... a = "a"
... def __init__(self):
... self.b = "b"
... def __getattr__(self, name):
... return 'Custom: %s' % name
...
>>> Test.a
'a'
>>> Test().a
'a'
>>> Test().b
'b'
>>> Test().c
'Custom: c'
__getattr__ is only called as a last resort. Since theObject can be found in __dict__, no issues arise.
But then of course I run into the problem that the constructor now produces an infinite recursion, since also self.theObject = anObject tries to access the wrapped object.
That's why the manual suggests that you do this for all "real" attribute accesses.
theobj = object.__getattribute__(self, "theObject")
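Putting the two together, a sketch of a forwarding wrapper that avoids recursion in both directions (assuming theObject is the only "real" attribute the wrapper itself owns):

class Wrapper(object):
    def __init__(self, anObject):
        # Bypass our own __setattr__ so the constructor doesn't recurse.
        object.__setattr__(self, 'theObject', anObject)

    def __getattr__(self, name):
        return getattr(object.__getattribute__(self, 'theObject'), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, 'theObject'), name, value)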
I'm glad to see others have been able to help you with the recursive call to __getattr__. Since you've asked for comments on the general approach of serializing to plist, I just wanted to chime in with a few thoughts.
Python's plist implementation handles basic types only, and provides no extension mechanism for you to instruct it on serializing/deserializing complex types. If you define a custom class, for example, writePlist won't be able to help, as you've discovered since you're passing the instance's __dict__ for serialization.
This has a couple implications:
You won't be able to use this to serialize any objects that contain other objects of non-basic type without converting them to a __dict__, and so on recursively for the entire object graph.
If you roll your own object-graph walker to serialize all non-basic objects that can be reached, you'll have to worry about cycles in the graph, where one object has another in a property, which in turn holds a reference back to the first, etc.
Given that, you may wish to look at pickle instead, as it can handle all of these and more. If you need the plist format for other reasons, and you're sure you can stick to "simple" object dicts, then you may wish to just use a simple function... trying to have the PlistObject mock every possible function in the contained object is an onion with potentially many layers, as you need to handle all the possibilities of the wrapped instance.
Something as simple as this may be more pythonic, and keep the usability of the wrapped object simpler by not wrapping it in the first place:
import plistlib

def to_plist(obj, f_handle):
    plistlib.writePlist(obj.__dict__, f_handle)
I know that doesn't seem very sexy, but it is a lot more maintainable in my opinion than a wrapper given the severe limits of the plist format, and certainly better than artificially forcing all objects in your application to inherit from a common base class when there's nothing in your business domain that actually indicates those disparate objects are related.
