I'm working on an abstraction layer to a database, and I have a super class defined similar to this:
class Test:
    def __init__(self, obj):
        self.obj = obj

    @classmethod
    def find_object(cls, **kwargs):
        # Code to search for the object to put in the parameter, using kwargs.
        return cls(found_object)
I then break down that superclass into subclasses that are more specific to the objects they represent.
class Test_B(Test):
    # Subclass defining a more specific version of Test.
    pass
Now, each separate subclass of Test has predefined search criteria. For example, Test_B needs an object with a = 10, b = 30, c = "Pie".
Which would be more "Pythonic"? Using the find_object method from the super class:
testb = Test_B.find_object(a=10, b=30, c="Pie")
or to override the find_object method so that it expects a, b, and c as parameters:
@classmethod
def find_object(cls, a, b, c):
    return super().find_object(a=a, b=b, c=c)
testb = Test_B.find_object(10, 30, "Pie")
First one. "Explicit is better than implicit" - Zen of Python: line 2
Test.find_object isn't intended to be used directly, so I would name it
@classmethod
def _find_object(cls, **kwargs):
    ...
then have each child class call it to implement its own find_object:
@classmethod
def find_object(cls, a, b, c):
    return super()._find_object(a=a, b=b, c=c)
When using super, it's a good idea to preserve the signature of a method if overriding it, because you can never be certain for which class super will return a proxy.
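A minimal sketch of the situation this guards against (hypothetical classes; in the diamond below, super() called from Left resolves to Right, not Base, so every override must accept what its siblings pass along):

class Base:
    def describe(self, **kwargs):
        return kwargs

class Left(Base):
    def describe(self, **kwargs):
        # In Diamond's MRO, super() here is Right, not Base.
        return super().describe(**kwargs)

class Right(Base):
    def describe(self, **kwargs):
        return super().describe(**kwargs)

class Diamond(Left, Right):
    pass

print(Diamond.__mro__)          # (Diamond, Left, Right, Base, object)
print(Diamond().describe(a=1))  # {'a': 1}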
skilsuper - you're right about
Explicit is better than implicit
However, that doesn't mean the first answer is better - you can still apply the same principle to the second solution: find_object(10, 30, "Pie") is implicit, but nothing stops you from calling find_object(a=10, b=30, c="Pie") instead (and you should).
The first solution is problematic because you might forget an argument (for example, find_object(a=10, b=30)). The first solution will let that slide, while the second will raise a TypeError telling you that an argument is missing.
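To make the failure mode concrete (a minimal sketch with hypothetical classes):

class Loose:
    @classmethod
    def find_object(cls, **kwargs):
        return kwargs

class Strict:
    @classmethod
    def find_object(cls, a, b, c):
        return (a, b, c)

Loose.find_object(a=10, b=30)   # no error; the missing 'c' slides through
Strict.find_object(a=10, b=30)  # TypeError: missing 1 required positional argument: 'c'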
In this class definition, every parameter occurs three times, which seems to violate the DRY (don't repeat yourself) principle:
class Foo:
def __init__(self, a=1, b=2.0, c=(3, 4, 5)):
self.a = int(a)
self.b = float(b)
self.c = list(c)
DRY could be applied like this (Python 3):
class Foo:
def __init__(self, **kwargs):
defaults = dict(a=1, b=2.0, c=[3, 4, 5])
for k, v in defaults.items():
setattr(self, k, type(v)(kwargs[k]) if k in kwargs else v)
# ...detect illegal keywords here...
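One way that last comment might be filled in (a sketch, assuming every accepted keyword appears in defaults):

unknown = set(kwargs) - set(defaults)
if unknown:
    raise TypeError(f'unexpected keyword arguments: {sorted(unknown)}')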
However, this breaks IDE autocomplete (tried Spyder and Elpy) and pylint will complain if I try to access the attributes later on.
Is there a clean way to handle this?
Edit: The example has three parameters, but I find myself dealing with this when there are 15 parameters, where I only rarely need to override the defaults; often with more complicated types, where I would need to do
if not isinstance(kwargs['x'], SomeClass):
raise TypeError('x: must be SomeClass')
self.x = kwargs['x']
for each of them. Moreover, I can't use mutables as default values for keyword arguments.
Principles like DRY are important, but it's important to keep in mind the rationale for such a principle before blindly applying it -- arguably the biggest advantage of DRY code is that you increase the maintainability of the code by only having to modify it in one place and not having to risk the subtle bugs that can occur with code that is modified in one place and not another. DRY can be antithetical to other common principles like YAGNI and KISS, and choosing the correct balance for your application is important.
In particular, DRY often applies to default values, application logic, and other things that could cause bugs if changed in one place and not another. IMO variable names don't fit in the same way, since renaming Foo's instance variable a throughout the code won't actually break anything if the parameter name in the initializer is left unchanged.
With that in mind, we have a simple test for your code. Are these variables likely to change together, or is the initializer for Foo a layer of abstraction that allows a refactoring of the inputs independently of the class's instance variables?
Change Together: I rather like @chepner's answer, and I'd take it one step further. If your class is anything more than a data transfer object, you can use @chepner's solution as a way to logically group related pieces of data (which admittedly could be unnecessary in your situation, and without some context it's difficult to choose an optimal way to introduce such an idea), e.g.
from dataclasses import dataclass

@dataclass
class MyData:
    a: int
    b: float
    c: list

class Foo:
    def __init__(self, my_data):
        self.wrapped = my_data
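Construction then makes the grouping explicit (hypothetical values, just for illustration):

foo = Foo(MyData(a=1, b=2.0, c=[3, 4, 5]))
print(foo.wrapped.a)  # 1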
Change Separately: Then just leave it alone, or KISS as they say.
As a preface, your code
class Foo:
def __init__(self, a=1, b=2.0, c=(3, 4, 5)):
self.a = int(a)
self.b = float(b)
self.c = list(c)
is, as mentioned in several comments, fine as it is. Code is read far more than it is written, and aside from needing to be careful to avoid typos in the names when first defining this, the intent is perfectly clear. (Though see the end of the answer regarding the default value of c.)
If you are using Python 3.7 or later, you can use a data class to reduce the number of references you make to each variable.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Foo:
    a: int = 1
    b: float = 2.0
    c: List[int] = field(default_factory=lambda: [3, 4, 5])
This doesn't prevent you from violating the type hints (Foo("1") will happily set a = "1" instead of a = 1 or raising an error), but it's typically the responsibility of the caller to provide arguments of the correct type. If you really want to enforce this at run time, you can add a __post_init__ method:
def __post_init__(self):
self.a = int(self.a)
self.b = float(self.b)
self.c = list(self.c)
But if you do that, you may as well go back to your original hand-coded __init__ method.
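Either way, the field(default_factory=...) version gives each instance its own list, which a quick check confirms:

foo1 = Foo()
foo2 = Foo(a=7)
print(foo1)              # Foo(a=1, b=2.0, c=[3, 4, 5])
print(foo1.c is foo2.c)  # False -- each instance gets a fresh list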
As an aside, the standard idiom for mutable default arguments is
def __init__(self, a=1, b=2.0, c=None):
...
if c is None:
c = [3, 4, 5]
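The idiom exists because default values are evaluated once, at function definition time, so a mutable default is shared by every call (a quick illustration):

def append_one(items=[]):
    items.append(1)
    return items

print(append_one())  # [1]
print(append_one())  # [1, 1] -- the same list object is reused across calls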
Your approach has two problems:
It requires that list be run for every instantiation, rather than letting the compiler hard-code [3,4,5].
If you were type-hinting the arguments to __init__, your default value doesn't match the intended type. You'd have to write something like
def __init__(self, a: int = 1, b: float = 2.0, c: Union[List[int], Tuple[int, int, int]] = (3, 4, 5)):
A default value of None automatically causes a "promotion" of the type to a corresponding optional type. The following are equivalent:
def __init__(self, a: int = 1, b: float = 2.0, c: List[int] = None):
def __init__(self, a: int = 1, b: float = 2.0, c: Optional[List[int]] = None):
I have a base class A with some heavy attributes (actually large numpy arrays) that are derived from data given to A's __init__() method.
First, I would like to subclass A into a new class B to perform modifications on these attributes with some B's specific methods. As these attributes are quite intensive to obtain, I don't want to instantiate B the same way as A but better use an A instance to initialize a B object. This is a type casting between A and B and I think I should use the __new__() method to return a B object.
Second, before every computation on B's attributes, I must be sure that B's initial state has been restored to the current state of the A instance that was used to create it, without creating a new B object every time - a kind of dynamic linkage...
Here is an example code I wrote:
from copy import deepcopy
import numpy as np

class A(object):
    def __init__(self, data):
        self.data = data

    def generate_derived_attributes(self):
        print("generating derived attributes...")
        self.derived_attributes = self.data.copy()

class B(A):
    def __new__(cls, obj_a):
        assert isinstance(obj_a, A)
        obj = deepcopy(obj_a)
        obj.__class__ = B
        obj._super_cache = obj_a  # This is not a copy... no additional memory required
        return obj

    def __init__(self, obj_a):
        # Without this no-op, A.__init__ would run after __new__ and
        # clobber self.data with the A instance itself.
        pass

    def compute(self):
        # First reset the state (maybe use a decorator?)
        self.reset()
        print("Doing some computations...")

    def reset(self):
        print("\nResetting object to its initial state")
        _super_cache = self._super_cache  # Keep the reference from being destroyed...
        self.__dict__ = deepcopy(self._super_cache.__dict__)
        self._super_cache = _super_cache

if __name__ == '__main__':
    a = A(np.zeros(100000000, dtype=float))
    a.generate_derived_attributes()
    print(a)
    b = B(a)
    print(b)
    b.compute()
    b.compute()
Is this implementation a reasonable way to reach my objective in Python, or are there more Pythonic ways? Could I be more generic? (I know that using __dict__ will not be a good choice in every case, especially when __slots__ is in use...) Do you think that using a decorator around B.compute() would give me more flexibility for using this along with other classes?
In my code I have a class where one method is responsible for filtering some data. To allow customization in descendants, I would like to define the filtering function as a class attribute, as below:
def my_filter_func(x):
return x % 2 == 0
class FilterClass(object):
filter_func = my_filter_func
def filter_data(self, data):
return filter(self.filter_func, data)
class FilterClassDescendant(FilterClass):
filter_func = my_filter_func2
However, such code leads to TypeError, as filter_func receives "self" as first argument.
What is a pythonic way to handle such use cases? Perhaps I should define my "filter_func" as a regular class method?
You could just add it as a plain old attribute?
def my_filter_func(x):
return x % 2 == 0
class FilterClass(object):
def __init__(self):
self.filter_func = my_filter_func
def filter_data(self, data):
return filter(self.filter_func, data)
Alternatively, force it to be a staticmethod:
def my_filter_func(x):
return x % 2 == 0
class FilterClass(object):
filter_func = staticmethod(my_filter_func)
def filter_data(self, data):
return filter(self.filter_func, data)
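Either version can be sanity-checked like this (Python 3, where filter returns an iterator):

f = FilterClass()
print(list(f.filter_data(range(6))))  # [0, 2, 4]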
Python has a lot of magic within. One piece of that magic is the descriptor protocol, which is what turns functions into methods.
When you assign a function to a class (and not to a class's instance), it is stored as a plain function, but attribute lookup treats it specially: looked up on the class you get the function itself, while looked up on an instance you get a bound method. (This applies to functions and anything else implementing the descriptor protocol, not to arbitrary callables. Python 2 wrapped the class-level lookup in an UnboundMethod object; Python 3 dropped that wrapper.)
Under normal conditions, you can call it through the class like a normal function:
def myfunction(a, b):
    return a + b

class A(object):
    a = myfunction

A.a(1, 2)
# returns 3
This will not fail. However, there's a distinct case when you try to call it from an instance:
A().a(1, 2)
This fails because when an instance looks up an attribute that is a function, it gets back a bound method with its __self__ member populated (__self__ and __func__ are the Python 3 names for Python 2's im_self and im_func). The function you intended to call is in __func__. Calling the bound method actually calls __func__ with the value in __self__ inserted as the first argument, so the function receives one argument more than you passed (the first one, which would normally stand for self).
To avoid this magic, Python has two decorators:
If you want the function passed through as-is, use @staticmethod. In this case the function is not turned into a bound method, but you will not be able to access the calling class, except as a global reference.
If you want the same, but need access to the current class (regardless of whether it is called from an instance or from the class), then your function should take a first argument cls (instead of self), which receives a reference to the class, and the decorator to use is @classmethod.
Examples:
class A(object):
a = staticmethod(lambda a, b: a + b)
A.a(1, 2)
A().a(1, 2)
Both will work.
Another example:
def add_print(cls, a, b):
    print(cls.__name__)
    return a + b

class A(object):
    ap = classmethod(add_print)

class B(A):
    pass

A.ap(1, 2)
B.ap(1, 2)
A().ap(1, 2)
B().ap(1, 2)
Check this for yourself and enjoy the magic.
I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __key(self):
return (self.a, self.b, self.c)
def __eq__(self, other):
return self.__key() == other.__key()
def __hash__(self):
return hash(self.__key())
class Bar(Foo):
def __init__(self, a, b, c, d, e):
self.a = a
self.b = b
self.c = c
self.d = d
self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv.e should equal 5, but they do not. Like I mentioned, it looks like the _CaptureEq.__eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the _CaptureEq.__eq__ method is not being called? Hopefully this is not too obscure a question.
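(For reference, the get_equivalent helper from the linked recipe is essentially the following:)

def get_equivalent(container, item, default=None):
    # Get the specific container element matched by: item in container.
    t = _CaptureEq(item)
    if t in container:
        return t.match
    return default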
Brandon's answer is informative, but incorrect. There are actually two problems, one with
the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
'Object wrapper that remembers "other" for successful equality tests.'
def __init__(self, obj):
self.obj = obj
self.match = obj
# If running on Python 3, this will be a new-style class, and
# new-style classes must delegate hash explicitly in order to populate
# the underlying special method slot correctly.
# On Python 2, it will be an old-style class, so the explicit delegation
# isn't needed (__getattr__ will cover it), but it also won't do any harm.
def __hash__(self):
return hash(self.obj)
def __eq__(self, other):
result = (self.obj == other)
if result:
self.match = other
return result
def __getattr__(self, name): # support anything else needed by __contains__
return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __key(self):
return (self.a, self.b, self.c)
def __eq__(self, other):
if not isinstance(other, Foo):
# Don't recognise "other", so let *it* decide if we're equal
return NotImplemented
return self.__key() == other.__key()
def __hash__(self):
return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
print('__eq__: I am', self.d, self.e, 'and he is', other.d, other.e)
return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
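A minimal sketch of that idea, assuming the key is the (a, b, c) tuple from the question (get_or_add is a hypothetical helper name):

def get_or_add(registry, obj):
    # Return the equivalent object already stored under obj's key,
    # inserting obj first if the key is absent.
    key = (obj.a, obj.b, obj.c)
    return registry.setdefault(key, obj)

registry = {}
get_or_add(registry, bar_1)           # inserts bar_1
print(get_or_add(registry, bar_2).d)  # 4 -- the stored bar_1 comes back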
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
return t.match
return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing an __getattr__ will solve this - it won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups - this is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. sets are based on hashtables, and only do an equality test if there's more than one item in a hash bucket. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)
Edit: There was some confusion, but I want to ask a general question about object-oriented design in Python.
Consider a class that lets you map data values to counts or frequencies:
class DataMap(dict):
pass
Now consider a subclass that allows you to construct a histogram from a list of data:
class Histogram(DataMap):
def __init__(self, list_of_values):
# 1. Put appropriate super(...) call here if necessary
# 2. Build the map of values to counts in self
pass
Now consider a class that lets you make a smoothed probability mass table rather than a Histogram.
class ProbabilityMass(DataMap):
pass
What is the best way to allow a ProbabilityMass to be constructed from either a Histogram or a list of values?
I "grew up" programming in C++, and in this case I would use an overloaded constructor. In Python I've thought of doing this with:
The constructor takes multiple arguments (all but one of which should be None)
I define from_Histogram and from_list methods
In the second case (which I believe is better), what is the best way to allow the from_list method to use the shared code from the Histogram constructor? A ProbabilityMass table is nearly identical to a Histogram table, but it is scaled so that the sum of all values is 1.0.
If you have come across a similar problem, please share your expertise!
To start with, if you think you want @staticmethod, you almost always don't. Either the function is not part of the class, in which case it should just be a free function, or it is part of the class but not tied to an instance, in which case it should be a @classmethod. Your named constructor is a good candidate for a @classmethod.
Also note that you should invoke A.__init__ from B via super(), otherwise multiple inheritance can bite you badly.
class A:
    def __init__(self, data):
        self.values_to_counts = {}
        for val in data:
            if val in self.values_to_counts:
                self.values_to_counts[val] += 1
            else:
                self.values_to_counts[val] = 1

    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = cls([])
        self.values_to_counts = values_to_counts
        return self

class B(A):
    def __init__(self, data, parameter=None):
        # The default lets the inherited classmethod's cls([]) call succeed.
        super(B, self).__init__(data)
        self.parameter = parameter

    def print_parameter(self):
        print(self.parameter)
In this case, you don't need a B.from_values_to_counts, it inherits from A, and it will return an instance of B, since that's how it was called.
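For example (a quick check using the classes above; note it relies on parameter having a default so that the inherited cls([]) call succeeds):

b = B.from_values_to_counts({'a': 5, 'b': 2})
print(type(b).__name__)    # B -- cls was B, so a B instance comes back
print(b.values_to_counts)  # {'a': 5, 'b': 2}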
If you need to do more complex initialization in B, you can, using super(), which looks very similar to the way it is used with instances. After all, a classmethod really isn't anything more complex than an instance method whose __self__ attribute (im_self in Python 2) is set to the class itself.
class A:
    def __init__(self, data):
        self.values_to_counts = {}
        for val in data:
            if val in self.values_to_counts:
                self.values_to_counts[val] += 1
            else:
                self.values_to_counts[val] = 1

    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = cls([])
        self.values_to_counts = values_to_counts
        return self

class B(A):
    def __init__(self, data, parameter=None):
        super(B, self).__init__(data)
        self.parameter = parameter

    def print_parameter(self):
        print(self.parameter)
    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = super(B, cls).from_values_to_counts(values_to_counts)
        do_more_initialization(self)  # placeholder for B-specific setup
        return self