Inheriting from ndarray calls __getitem__ - python

Hi, I'm trying to derive a class from ndarray. I'm sticking to the recipe found in the docs, but I get an error I do not understand when I override the __getitem__() function. I'm sure this is how it is supposed to work, but I do not understand how to do it correctly. My class, which basically adds a "dshape" property, looks like:
class Darray(np.ndarray):
    def __new__(cls, input_array, dshape, *args, **kwargs):
        obj = np.asarray(input_array).view(cls)
        obj.SelObj = SelObj
        obj.dshape = dshape
        return obj
    def __array_finalize__(self, obj):
        if obj is None: return
        self.info = getattr(obj, 'dshape', 'N')
    def __getitem__(self, index):
        return self[index]
When I now try to do:
D = Darray(np.ones((10, 10)), ("T", "N"))
the interpreter fails with a maximum recursion depth error, because it calls __getitem__ over and over again.
Can someone explain to me why, and how one would implement a __getitem__ function?
cheers,
David

can someone explain to me why and how one would implement a getitem function?
For your current code, a __getitem__ isn't needed. Your class works fine (except for the undefined SelObj) when I remove the __getitem__ implementation.
The reason for the maximum recursion depth error is the definition of __getitem__, which uses self[index]: a shorthand notation for self.__getitem__(index). If you must override __getitem__, then make sure you call the superclass implementation of __getitem__:
def __getitem__(self, index):
    return super(Darray, self).__getitem__(index)
As for why you'd do this: there are lots of reasons for overriding this function, e.g. you might associate names with the rows of an array:
class NamedRows(np.ndarray):
    def __new__(cls, rows, *args, **kwargs):
        obj = np.asarray(*args, **kwargs).view(cls)
        obj.__row_name_idx = dict((n, i) for i, n in enumerate(rows))
        return obj
    def __getitem__(self, idx):
        if isinstance(idx, str):  # basestring on Python 2
            idx = self.__row_name_idx[idx]
        return super(NamedRows, self).__getitem__(idx)
Demo:
>>> a = NamedRows(["foo", "bar"], [[1,2,3], [4,5,6]])
>>> a["foo"]
NamedRows([1, 2, 3])

The problem is here:
def __getitem__(self, index):
    return self[index]
foo[index] just calls foo.__getitem__(index). But in your case, that just returns foo[index], which just calls foo.__getitem__(index). Which repeats in an infinite loop until you run out of stack space.
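If you want to defer to your parent class, call its __getitem__ explicitly. Note that subscripting the super object directly, as in super(Darray, self)[index], does not work: implicit special-method lookups such as [] are not forwarded by the super proxy, so it raises a TypeError.
def __getitem__(self, index):
    return super(Darray, self).__getitem__(index)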
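The three spellings can be compared side by side without NumPy. The following is a minimal sketch that uses list as a stand-in base class; the class names and the outcome helper are made up for illustration.

```python
class Plain(list):
    def __getitem__(self, index):
        # Explicit call to the parent implementation: no recursion.
        return super().__getitem__(index)


class Recursive(list):
    def __getitem__(self, index):
        # self[index] is self.__getitem__(index), so this calls itself.
        return self[index]


class Subscripted(list):
    def __getitem__(self, index):
        # Subscripting the super() proxy itself is not supported.
        return super()[index]


def outcome(obj):
    """Return obj[0], or the name of the exception that indexing raised."""
    try:
        return obj[0]
    except (RecursionError, TypeError) as exc:
        return type(exc).__name__


results = [outcome(cls([10, 20])) for cls in (Plain, Recursive, Subscripted)]
print(results)
```

Only the explicit super().__getitem__(index) call returns the element; the other two fail in the ways described above.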

I don't understand why you want to inherit from np.ndarray. You can implement the same idea with plain composition. The following example does the same thing as your code, but more elegantly. Instead of subclassing, I am just treating the numpy array as a member of my special object, which also contains dshape. It simply defines __getitem__() and __setitem__() to behave exactly as if we were subscripting an np.ndarray object.
class Darray:
    def __init__(self, input_array, dshape):
        self.array = np.array(input_array)
        self.dshape = dshape
    def __getitem__(self, item):
        return self.array[item]
    def __setitem__(self, item, val):
        self.array[item] = val
Now you can write further methods to describe the exact behaviour that you want. Whatever dshape was supposed to do to the inherited array, it can now do to the self.array member.
The added benefit of this approach is that there is no headache with recursion depth, __array_finalize__, super(), or any other pitfalls that can occur when subclassing and overriding. For the intended use cases there is usually a simpler way.
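Edit: For comma-separated indices on N-dimensional arrays, such as d[1, 2], Python passes the whole index to __getitem__ as a single tuple, so the method above already handles them. An equivalent spelling that forwards its arguments unchanged:
def __getitem__(self, *args):
    return self.array.__getitem__(*args)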

Related

Python Expert: how to inherit built-in class and override every member function w.r.t. the base-class member function?

It is known that in Python, due to optimization concerns, we cannot add or modify member functions of a built-in class, e.g., adding a sed function to the built-in str class to perform re.sub(). Thus, the only way to achieve this is to subclass, i.e.,
class String(str):
    def __init__(self, value='', **kwargs):
        super().__init__()
    def sed(self, src, tgt):
        return String(re.sub(src, tgt, self))
The problem with this is that after subclassing, member functions return base-class instances instead of instances of the inherited class. For example, I would like to chain String edits String(' A b C d E [!] ').sed(...).lower().sed(...).strip().sed('\[.*\]', '').split() and so on. However, functions such as .lower() and .strip() return a str instead of a String, so I cannot call .sed(...) afterwards. And I do not want to keep casting to String after every function call.
So I did a manual override of every base-class method as follows:
class String(str):
    for func in dir(str):
        if not func.startswith('_'):
            exec(f'{func}=lambda *args, **kwargs: [(String(i) if type(i)==str else i) for i in [str.{func}(*args, **kwargs)]][0]')
    def __init__(self, value='', **kwargs):
        super().__init__()
    def sed(self, src, tgt):
        return String(re.sub(src, tgt, self))
However, not every member function returns a simple str object, e.g., for functions such as .split(), they return a list of str; other functions like .isalpha() or .find() return boolean or integer. In general, I want to add more string-morphing functions and do not want to manually over-ride member functions of each return type in order to return inherited-class objects rather than base-class objects. So is there a more elegant way of doing this? Thanks!
Python's built-in classes are not designed to support that style of inheritance
easily. Also, the whole idea seems flawed to my eye. Even if you do figure out
a way to solve the problem as you've framed it, what's the advantage over good
old functions?
# Special String objects with new methods.
s = String('foo bar')
result = s.sed('...', '...')
# Regular str instances passed to ordinary functions.
s = 'foo bar'
result = sed(s, '...', '...')
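To make the comparison concrete, here is a minimal sketch of the plain-function route applied to the chained edit from the question. The sed helper and the sample text are illustrative only.

```python
import re


def sed(text, src, tgt):
    """Return text with every match of the regex src replaced by tgt."""
    return re.sub(src, tgt, text)


raw = ' A b C d E [!] '
# Built-in str methods chain as usual; sed is simply applied around them.
words = sed(raw.lower().strip(), r'\[.*\]', '').split()
print(words)
```

Every intermediate value is an ordinary str, so no conversion back to a custom class is ever needed.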
That said, here's one way to try. I have not tested it
extensively, it might have a flaw, and I would never use it in real code.
The basic idea is to capture objects returned during low-level
attribute access, and if the object is callable return
a wrapped version of it that will perform the needed
data conversions.
import re
from functools import wraps
class String(str):
    def __getattribute__(self, attr):
        obj = object.__getattribute__(self, attr)
        return wrapped(obj) if callable(obj) else obj
    def __init__(self, value='', **kwargs):
        super().__init__()
    def sed(self, src, tgt):
        return re.sub(src, tgt, self)
def wrapped(func):
    @wraps(func)
    def wrapper(*xs, **kws):
        obj = func(*xs, **kws)
        return convert(obj)
    return wrapper
def convert(obj):
    if isinstance(obj, str):
        return String(obj)
    elif isinstance(obj, list):
        return [convert(x) for x in obj]
    elif isinstance(obj, tuple):
        return tuple(convert(x) for x in obj)
    else:
        return obj
Demo:
s = String('foo bar')
got = s.sed('foo', 'bzz').upper().split()
print(got)
print(type(got))
print(type(got[0]))
Output:
['BZZ', 'BAR']
<class 'list'>
<class '__main__.String'>
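Another option worth considering, offered here as a sketch rather than as part of the answer above, is collections.UserString from the standard library. Its string-returning methods such as lower() and strip() build their result with self.__class__, so a subclass keeps its type through a chain. The String class and sed method below mirror the question's names.

```python
import re
from collections import UserString


class String(UserString):
    def sed(self, src, tgt):
        # self.data holds the underlying plain str.
        return self.__class__(re.sub(src, tgt, self.data))


s = String(' A b C [!] ').lower().strip().sed(r'\[.*\]', '')
words = s.split()  # split() still returns a list of plain str
print(type(s).__name__, repr(s.data), words)
```

The trade-off is that a UserString is not an instance of str, so code that checks isinstance(x, str) will not accept it.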

How can I return self and another variable in a python class method while method chaining?

I understand what I am asking here is probably not the best code design, but the reason for me asking is strictly academic. I am trying to understand how to make this concept work.
Typically, I will return self from a class method so that the following methods can be chained together. My understanding is by returning self, I am simply returning an instance of the class, for the following methods to work on.
But in this case, I am trying to figure out how to return both self and another value from the method. The idea is if I do not want to chain, or I do not call any class attributes, I want to retrieve the data from the method being called.
Consider this example:
class Test(object):
    def __init__(self):
        self.hold = None
    def methoda(self):
        self.hold = 'lol'
        return self, 'lol'
    def newmethod(self):
        self.hold = self.hold * 2
        return self, 2
t = Test()
t.methoda().newmethod()
print(t.hold)
In this case, I will get an AttributeError: 'tuple' object has no attribute 'newmethod' which is to be expected because the methoda method is returning a tuple which does not have any methods or attributes called newmethod.
My question is not about unpacking multiple returns, but more about how can I continue to chain methods when the preceding methods are returning multiple values. I also understand that I can control the methods return with an argument to it, but that is not what I am trying to do.
As mentioned previously, I do realize this is probably a bad question, and I am happy to delete the post if the question doesn't make any sense.
Following the suggestion by @JohnColeman, you can return a special tuple with attribute lookup delegated to your object if it is not a normal tuple attribute. That way it acts like a normal tuple except when you are chaining methods.
You can implement this as follows:
class ChainResult(tuple):
    def __new__(cls, *args):
        return super(ChainResult, cls).__new__(cls, args)
    def __getattribute__(self, name):
        try:
            return getattr(super(), name)
        except AttributeError:
            return getattr(super().__getitem__(0), name)
class Test(object):
    def __init__(self):
        self.hold = None
    def methoda(self):
        self.hold = 'lol'
        return ChainResult(self, 'lol')
    def newmethod(self):
        self.hold = self.hold * 2
        return ChainResult(self, 2)
Testing:
>>> t = Test()
>>> t.methoda().newmethod()
>>> print(t.hold)
lollol
The returned result does indeed act as a tuple:
>>> t, res = t.methoda().newmethod()
>>> print(res)
2
>>> print(isinstance(t.methoda().newmethod(), tuple))
True
You could imagine all sorts of semantics with this, such as forwarding the returned values to the next method in the chain using closure:
class ChainResult(tuple):
    def __new__(cls, *args):
        return super(ChainResult, cls).__new__(cls, args)
    def __getattribute__(self, name):
        try:
            return getattr(super(), name)
        except AttributeError:
            attr = getattr(super().__getitem__(0), name)
            if callable(attr):
                chain_results = super().__getitem__(slice(1, None))
                return lambda *args, **kw: attr(*(chain_results+args), **kw)
            else:
                return attr
For example,
class Test:
    ...
    def methodb(self, *args):
        print(*args)
would produce
>>> t = Test()
>>> t.methoda().methodb('catz')
lol catz
It would be nice if you could make ChainResult invisible. You can almost do it by initializing the tuple base class with the normal results and saving your object in a separate attribute used only for chaining. Then use a class decorator that wraps every method with ChainResult(self, self.method(*args, **kw)). It will work okay for methods that return a tuple, but a single return value will act like a length-1 tuple, so you will need something like obj.method()[0] or result, = obj.method() to work with it. I played a bit with delegating to tuple for a multiple return or to the value itself for a single return; maybe it could be made to work, but it introduces so many ambiguities that I doubt it could work well.

Wrapping homogeneous Python objects

I'm looking for a way to have a collection of homogeneous objects, wrap them in another object, but have the wrapper object have the same API as the original and forward the corresponding API call to its object members.
class OriginalApi:
    def __init__(self):
        self.a = 1
        self.b = "bee"
    def do_something(self, new_a, new_b, put_them_together=None):
        self.a = new_a or self.a
        self.b = new_b or self.b
        if put_them_together is not None:
            self.b = "{}{}".format(self.a, self.b)
        # etc.
class WrappedApi:
    def __init__(self):
        self.example_1 = OriginalApi()
        self.example_2 = OriginalApi()
Some possible solutions that have been considered, but are inadequate:
Rewriting the whole API. Why not? The API is fairly large and expanding, and having to maintain it in multiple spots is not realistic.
Code example:
class WrappedApi:
    def __init__(self):
        self.example_1 = OriginalApi()
        self.example_2 = OriginalApi()
    def do_something(self, new_a, new_b, put_them_together=None):
        self.example_1.do_something(new_a, new_b, put_them_together)
        self.example_2.do_something(new_a, new_b, put_them_together)
Using a list and a for-loop. This changes the API on the object. That said, this is the backup solution in the event I can't find something more elegant. In this case, the WrappedApi class would not exist.
Code example:
wrapped_apis = [OriginalApi(), OriginalApi()]
for wrapped_api in wrapped_apis:
    wrapped_api.do_something(1, 2, True)
I tried using Python Object Wrapper, but I could not see how to have it call multiple sub-objects with the same arguments.
And for anyone curious about the use case, it's actually a collection of several matplotlib axes objects. I don't want to reimplement the entire axes API (it's big), and I don't want to change all the code that makes calls on axes (like plot, step, etc.).
If you're only implementing methods, then a generic __getattr__ can do the trick:
class Wrapper:
    def __init__(self, x):
        self.x = x
    def __getattr__(self, name):
        def f(*args, **kwargs):
            for y in self.x:
                getattr(y, name)(*args, **kwargs)
        return f
For example with x = Wrapper([[], [], []]) after calling x.append(12) all the three list objects will have 12 as last element.
Note that the return value will always be None... an option could be collecting return values and returning them as a list but this of course would "break the API".
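The option of collecting return values could be sketched as follows. The Broadcast name and the _targets attribute are illustrative, and the list return type is exactly the API change mentioned above.

```python
class Broadcast:
    """Forward any method call to every wrapped object and collect the results."""

    def __init__(self, targets):
        self._targets = list(targets)

    def __getattr__(self, name):
        # Only reached for names not found on Broadcast itself.
        def call_all(*args, **kwargs):
            return [getattr(t, name)(*args, **kwargs) for t in self._targets]
        return call_all


lists = [[3, 1], [2]]
b = Broadcast(lists)
b.append(0)          # forwarded to both lists; returns [None, None]
counts = b.count(0)  # one result per wrapped list
print(lists, counts)
```

Callers that expect a single return value would need to be adapted, which is why this only suits APIs whose results are ignored or naturally aggregated.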
I think you have the right idea here:
wrapped_apis = [OriginalApi(), OriginalApi()]
for wrapped_api in wrapped_apis:
    wrapped_api.do_something(1, 2, True)
You can define your wrapper class by inheriting from list and then handle the API calls to its items once it is created.
class WrapperClass(list):
    def __init__(self, api_type):
        self.api_type = api_type
        for func in dir(api_type):
            if callable(getattr(api_type, func)) and not func.startswith("__"):
                # Bind func as a default so each lambda keeps its own method name.
                setattr(self, func, lambda *args, func=func, **kwargs:
                        [getattr(o, func)(*args, **kwargs) for o in self])
w = WrapperClass(OriginalApi)
o1, o2 = OriginalApi(), OriginalApi()  # two distinct objects
w.append(o1)
w.append(o2)
print(w.do_something(1, 2, True))
# [None, None]
print(w[0].b)
# 12
print(w[1].b)
# 12
print(o1.b)
# 12
Here, I'm iterating every method in your API class and creating a method in the wrapper class that applies its arguments to all its list items. It then returns a list comprehension consisting of the results.
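One pitfall when creating functions in a loop like this is that a lambda looks up the loop variable when it is called, not when it is defined. A small sketch of the effect and of the usual default-argument fix (the method names are placeholders):

```python
names = ("plot", "step")

# Each lambda reads `name` when called, after the loop has finished.
late = [lambda: name for name in names]

# A default argument captures the current value at definition time.
early = [lambda name=name: name for name in names]

late_results = [f() for f in late]
early_results = [f() for f in early]
print(late_results, early_results)
```

With an API class that has more than one public method, the late-binding version would dispatch every generated wrapper to the last method name seen by the loop.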
Needless to say, you should probably validate the type of a new object being appended to this WrapperClass, like so:
def append(self, item):
    if not isinstance(item, self.api_type):
        raise TypeError('Wrong API type. Expected {}'.format(self.api_type))
    super(WrapperClass, self).append(item)

Accessing self from outside of a class

I'm attempting to implement a decorator on certain methods in a class so that if the value has NOT been calculated yet, the method will calculate the value, otherwise it will just return the precomputed value, which is stored in an instance defaultdict. I can't seem to figure out how to access the instance defaultdict from inside of a decorator declared outside of the class. Any ideas on how to implement this?
Here are the imports (for a working example):
from collections import defaultdict
from math import sqrt
Here is my decorator:
class CalcOrPass:
    def __init__(self, func):
        self.f = func
    #if the value is already in the instance dict from SimpleData,
    #don't recalculate the values, instead return the value from the dict
    def __call__(self, *args, **kwargs):
        # can't figure out how to access/pass dict_from_SimpleData to here :(
        res = dict_from_SimpleData[self.f.__name__]
        if not res:
            res = self.f(*args, **kwargs)
            dict_from_SimpleData[self.f.__name__] = res
        return res
And here's the SimpleData class with decorated methods:
class SimpleData:
    def __init__(self, data):
        self.data = data
        self.stats = defaultdict() #here's the dict I'm trying to access
    @CalcOrPass
    def mean(self):
        return sum(self.data)/float(len(self.data))
    @CalcOrPass
    def se(self):
        return [i - self.mean() for i in self.data]
    @CalcOrPass
    def variance(self):
        return sum(i**2 for i in self.se()) / float(len(self.data) - 1)
    @CalcOrPass
    def stdev(self):
        return sqrt(self.variance())
So far, I've tried declaring the decorator inside of SimpleData, trying to pass multiple arguments with the decorator(apparently you can't do this), and spinning around in my swivel chair while trying to toss paper airplanes into my scorpion tank. Any help would be appreciated!
The way you define your decorator, the target object information is lost. Use a function wrapper instead:
from functools import wraps
def CalcOrPass(func):
    @wraps(func)
    def result(self, *args, **kwargs):
        res = self.stats[func.__name__]
        if not res:
            res = func(self, *args, **kwargs)
            self.stats[func.__name__] = res
        return res
    return result
wraps is from functools and not strictly necessary here, but very convenient.
Side note: defaultdict takes a factory function argument:
defaultdict(lambda: None)
But since you're testing for the existence of the key anyway, you should prefer a simple dict.
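Putting the two remarks together, here is a minimal runnable sketch that uses a plain dict and a key-existence test. The decorator is renamed calc_or_pass and the se method is folded into variance for brevity; both changes are illustrative choices, not part of the original code.

```python
from functools import wraps
from math import sqrt


def calc_or_pass(func):
    """Cache a method's result in self.stats under the method's name."""
    @wraps(func)
    def wrapper(self):
        name = func.__name__
        if name not in self.stats:  # key test, so falsy results are cached too
            self.stats[name] = func(self)
        return self.stats[name]
    return wrapper


class SimpleData:
    def __init__(self, data):
        self.data = data
        self.stats = {}

    @calc_or_pass
    def mean(self):
        return sum(self.data) / len(self.data)

    @calc_or_pass
    def variance(self):
        m = self.mean()
        return sum((x - m) ** 2 for x in self.data) / (len(self.data) - 1)

    @calc_or_pass
    def stdev(self):
        return sqrt(self.variance())


d = SimpleData([1, 2, 3, 4])
d.stdev()  # fills the cache for stdev, variance and mean
print(sorted(d.stats))
```

Testing for the key instead of the truthiness of the stored value avoids recomputing results such as a mean of 0.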
You can't do what you want when your function is defined, because it is unbound. Here's a way to achieve it in a generic fashion at runtime:
class CalcOrPass(object):
    def __init__(self, func):
        self.f = func
    def __get__(self, obj, type=None): # Cheat.
        return self.__class__(self.f.__get__(obj, type))
    #if the value is already in the instance dict from SimpleData,
    #don't recalculate the values, instead return the value from the dict
    def __call__(self, *args, **kwargs):
        # I'll concede that this doesn't look very pretty.
        # TODO handle KeyError here
        res = self.f.__self__.stats[self.f.__name__]
        if not res:
            res = self.f(*args, **kwargs)
            self.f.__self__.stats[self.f.__name__] = res
        return res
A short explanation:
Our decorator defines __get__ (and is hence said to be a descriptor). Whereas the default behaviour for an attribute access is to get it from the object's dictionary, if the descriptor method is defined, Python will call that instead.
The case with objects is that object.__getattribute__ transforms an access like b.x into type(b).__dict__['x'].__get__(b, type(b))
This way we can access the bound class and its type from the descriptor's parameters.
Then we create a new CalcOrPass object which now decorates (wraps) a bound method instead of the old unbound function.
Note the new style class definition. I'm not sure if this will work with old-style classes, as I haven't tried it; just don't use those. :) This will work for both functions and methods, however.
What happens to the "old" decorated functions is left as an exercise.
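The lookup rule described above can be observed directly. This sketch uses a hypothetical Traced descriptor that records the arguments Python passes to __get__ before handing back a bound method.

```python
class Traced:
    """Non-data descriptor that records how it was looked up."""

    def __init__(self, func):
        self.func = func
        self.lookups = []

    def __get__(self, obj, objtype=None):
        self.lookups.append((obj, objtype))
        if obj is None:
            return self  # accessed on the class itself
        # Plain functions are descriptors too; this yields a bound method.
        return self.func.__get__(obj, objtype)


class B:
    @Traced
    def x(self):
        return 42


b = B()
value = b.x()          # attribute access triggers Traced.__get__(b, B)
desc = B.__dict__["x"]  # reading the class dict bypasses __get__
print(value, desc.lookups == [(b, B)])
```

The recorded pair is the instance and its type, which is the information the decorator needs to reach the instance's stats dictionary.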

A python class that acts like dict

I want to write a custom class that behaves like dict - so, I am inheriting from dict.
My question, though, is: do I need to create a private dict member in my __init__() method? I don't see the point of this, since I already have the dict behavior if I simply inherit from dict.
Can anyone point out why most of the inheritance snippets look like the one below?
class CustomDictOne(dict):
    def __init__(self):
        self._mydict = {}
    # other methods follow
Instead of the simpler...
class CustomDictTwo(dict):
    def __init__(self):
        # initialize my other stuff here ...
    # other methods follow
Actually, I suspect the answer to the question is that users then cannot directly access your dictionary (i.e. they have to use the access methods that you have provided).
However, what about the array access operator []? How would one implement that? So far, I have not seen an example that shows how to override the [] operator.
So if a [] access function is not provided in the custom class, the inherited base methods will be operating on a different dictionary?
I tried the following snippet to test out my understanding of Python inheritance:
class myDict(dict):
    def __init__(self):
        self._dict = {}
    def add(self, id, val):
        self._dict[id] = val
md = myDict()
md.add('id', 123)
print md[id]
I got the following error:
KeyError: <built-in function id>
What is wrong with the code above?
How do I correct the class myDict so that I can write code like this?
md = myDict()
md['id'] = 123
[Edit]
I have edited the code sample above to get rid of the silly error I made before I dashed away from my desk. It was a typo (I should have spotted it from the error message).
class Mapping(dict):
    def __setitem__(self, key, item):
        self.__dict__[key] = item
    def __getitem__(self, key):
        return self.__dict__[key]
    def __repr__(self):
        return repr(self.__dict__)
    def __len__(self):
        return len(self.__dict__)
    def __delitem__(self, key):
        del self.__dict__[key]
    def clear(self):
        return self.__dict__.clear()
    def copy(self):
        return self.__dict__.copy()
    def has_key(self, k):
        return k in self.__dict__
    def update(self, *args, **kwargs):
        return self.__dict__.update(*args, **kwargs)
    def keys(self):
        return self.__dict__.keys()
    def values(self):
        return self.__dict__.values()
    def items(self):
        return self.__dict__.items()
    def pop(self, *args):
        return self.__dict__.pop(*args)
    def __cmp__(self, dict_):
        return cmp(self.__dict__, dict_)
    def __contains__(self, item):
        return item in self.__dict__
    def __iter__(self):
        return iter(self.__dict__)
    def __unicode__(self):
        return unicode(repr(self.__dict__))
o = Mapping()
o.foo = "bar"
o['lumberjack'] = 'foo'
o.update({'a': 'b'}, c=44)
print 'lumberjack' in o
print o
In [187]: run mapping.py
True
{'a': 'b', 'lumberjack': 'foo', 'foo': 'bar', 'c': 44}
Like this:
class CustomDictOne(dict):
    def __init__(self,*arg,**kw):
        super(CustomDictOne, self).__init__(*arg, **kw)
Now you can use the built-in functions, like dict.get() as self.get().
You do not need to wrap a hidden self._dict. Your class already is a dict.
Check the documentation on emulating container types. In your case, the first parameter to add should be self.
UserDict from the Python standard library is designed for this purpose.
Here is an alternative solution:
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__ = self
a = AttrDict()
a.a = 1
a.b = 2
This is my best solution. I have used it many times.
class DictLikeClass:
    ...
    def __getitem__(self, key):
        return getattr(self, key)
    def __setitem__(self, key, value):
        setattr(self, key, value)
    ...
You can use it like:
>>> d = DictLikeClass()
>>> d["key"] = "value"
>>> print(d["key"])
A python class that acts like dict
What's wrong with this?
Can anyone point out why most of the inheritance snippets look like the one below?
class CustomDictOne(dict):
    def __init__(self):
        self._mydict = {}
Presumably there's a good reason to inherit from dict (maybe you're already passing one around and you want a more specific kind of dict) and you have a good reason to instantiate another dict to delegate to (because this will instantiate two dicts per instance of this class.) But doesn't that sound incorrect?
I never run into this use-case myself. I do like the idea of typing dicts where you are using dicts that are type-able. But in that case I like the idea of typed class attributes even moreso - and the whole point of a dict is you can give it keys of any hashable type, and values of any type.
So why do we see snippets like this? I personally think it's an easily made mistake that went uncorrected and thus perpetuated over time.
I would rather see, in these snippets, this, to demonstrate code reuse through inheritance:
class AlternativeOne(dict):
    __slots__ = ()
    def __init__(self):
        super().__init__()
        # other init code here
    # new methods implemented here
or, to demonstrate re-implementing the behavior of dicts, this:
from collections.abc import MutableMapping
class AlternativeTwo(MutableMapping):
    __slots__ = '_mydict'
    def __init__(self):
        self._mydict = {}
        # other init code here
    # dict methods reimplemented and new methods implemented here
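For the second variant, MutableMapping needs only five methods to be reimplemented; get, update, __contains__, items and the rest are derived from them. A minimal sketch follows, in which the LowerKeyDict name and the key-lowercasing behavior are assumptions chosen purely to give the class something to do.

```python
from collections.abc import MutableMapping


class LowerKeyDict(MutableMapping):
    """Mapping that stores string keys in lower case."""

    __slots__ = ('_data',)

    def __init__(self, *args, **kwargs):
        self._data = {}
        self.update(*args, **kwargs)  # inherited; routes through __setitem__

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


m = LowerKeyDict(ID=123)
print(m['id'], 'Id' in m, list(m), m.get('missing'))
```

Because every inherited method is built on the five overridden ones, the custom key handling applies consistently, including in the constructor.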
By request - adding slots to a dict subclass.
Why add slots? A builtin dict instance doesn't have arbitrary attributes:
>>> d = dict()
>>> d.foo = 'bar'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'foo'
If we create a subclass the way most answers here do, we see we don't get the same behavior, because instances will have a __dict__ attribute, causing our dicts to take up potentially twice the space:
class my_dict(dict):
    """my subclass of dict"""
md = my_dict()
md.foo = 'bar'
Since no error is raised by the above, the class doesn't actually act "like dict."
We can make it act like dict by giving it empty slots:
class my_dict(dict):
    __slots__ = ()
md = my_dict()
So now attempting to use arbitrary attributes will fail:
>>> md.foo = 'bar'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'my_dict' object has no attribute 'foo'
And this Python class acts more like a dict.
For more on how and why to use slots, see this Q&A: Usage of __slots__?
I really don't see the right answer to this anywhere.
class MyClass(dict):
    def __init__(self, a_property):
        self[a_property] = a_property
All you really have to do is define your own __init__; that really is all there is to it.
Another example (a little more complex):
class MyClass(dict):
    def __init__(self, planet):
        self[planet] = planet
        info = self.do_something_that_returns_a_dict()
        if info:
            for k, v in info.items():
                self[k] = v
    def do_something_that_returns_a_dict(self):
        return {"mercury": "venus", "mars": "jupiter"}
This last example is handy when you want to embed some kind of logic.
Anyway... in short class GiveYourClassAName(dict) is enough to make your class act like a dict. Any dict operation you do on self will be just like a regular dict.
The problem with this chunk of code:
class myDict(dict):
    def __init__(self):
        self._dict = {}
    def add(id, val):
        self._dict[id] = val
md = myDict()
md.add('id', 123)
...is that your 'add' method (and any method you want to be a member of a class) needs to have an explicit 'self' declared as its first argument, like:
def add(self, id, val):
To implement the operator overloading to access items by key, look in the docs for the magic methods __getitem__ and __setitem__.
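For the snippet from the question, a corrected version can be very small, because a dict subclass inherits __getitem__ and __setitem__ and needs no private dictionary. The sketch below renames the class to MyDict and the parameter to key, both illustrative choices.

```python
class MyDict(dict):
    def add(self, key, val):
        # Store in the dict itself rather than in a separate private dict.
        self[key] = val


md = MyDict()
md.add('id', 123)
md['other'] = 456  # [] assignment is inherited from dict
print(md['id'], md)
```

Note that the lookup uses the string 'id'; the bare name id refers to the built-in function, which is what produced the KeyError in the question.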
Note that because Python uses Duck Typing, there may actually be no reason to derive your custom dict class from the language's dict class -- without knowing more about what you're trying to do (e.g, if you need to pass an instance of this class into some code someplace that will break unless isinstance(MyDict(), dict) == True), you may be better off just implementing the API that makes your class sufficiently dict-like and stopping there.
Don't inherit from Python's built-in dict, ever! For example, the update method won't use your __setitem__, because the built-in methods take shortcuts for optimization. Use UserDict instead.
from collections import UserDict
class MyDict(UserDict):
    def __delitem__(self, key):
        pass
    def __setitem__(self, key, value):
        pass
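The difference is easy to observe. In this sketch, both hypothetical classes upper-case values in __setitem__; only the UserDict version applies that to values arriving through update().

```python
from collections import UserDict


class LoudDict(dict):
    def __setitem__(self, key, value):
        super().__setitem__(key, str(value).upper())


class LoudUserDict(UserDict):
    def __setitem__(self, key, value):
        super().__setitem__(key, str(value).upper())


d = LoudDict()
d['a'] = 'x'
d.update(b='y')  # dict.update bypasses the overridden __setitem__

u = LoudUserDict()
u['a'] = 'x'
u.update(b='y')  # UserDict.update routes through __setitem__

print(dict(d), u.data)
```

A dict subclass that must intercept every write therefore has to override update, setdefault and the constructor as well, which UserDict makes unnecessary.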
