Suppose I have the following classes:
class base(object):
    def __init__(self, name):
        self.name = name
        self.last_x = 0.0
    def calc(self, x):
        return x
class A(base):
    def calc(self, x):
        return f_A(x)
class B(base):
    def calc(self, x):
        return f_B(x)
...
Each lettered class is essentially a wrapper around the corresponding function f_A, f_B, and so on. Each instance carries a state variable self.last_x, and the lettered functions are assumed to be state-dependent (i.e. a Markov-chain type of process).
What I would like to do is to define dependency chains between instances of these classes in order to try out different functional convolutions. For example, if we wanted to calculate a chain [a, b] on a numerical input value x we would have to do
a = A('firstnode')
b = B('secondnode')
res = b.calc(a.calc(x))
The goal is to do this with arbitrarily long chains, while also being able to access results from each intermediate calculation. I.e. if the chain is [a, b, c] I would like to make accessible results of [a] and [a, b] as well (which is why I included a name string for each node in my current implementation).
What would be the right way to set up my classes and data structures for this use case?
So far I have a fairly heavy-handed solution involving multiple dictionaries to keep track of things, but it feels inelegant and I think I might be missing out on something obvious.
Unfortunately you're improperly reusing names (thus hiding their previous values). E.g, after:
a = A('firstnode')
calling a.calc will try to call this instance (since the assignment has replaced the fact that previously name a was bound to a function) and fail. Best would be to use more sensible naming. If for some reason that's not practical, you need to bind the function names internally at class definition time:
class A(base):
    def calc(self, x, a=a):
        return a(x)
where the a=a does the trick, and so forth.
Having passed that hurdle, the second one is that you want the last result of each class to be saved, but you don't save it. So change the code to, e.g.:
class A(base):
    def calc(self, x, a=a):
        self.last_result = a(x)
        return self.last_result
Once that is done, performing your desired operation on a list of class instances is the least of your problems. E.g
def doit(instances, x):
    curr = x
    for inst in instances:
        curr = inst.calc(curr)
    return curr
and after this
[inst.last_result for inst in instances]
will give you the intermediate results you're looking for.
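Putting those pieces together, a minimal self-contained sketch might look like the following. The functions f_A and f_B are stand-ins invented for illustration (the question does not define them), and func and run_chain are hypothetical names; run_chain returns the intermediate results keyed by node name.

```python
# Hypothetical stand-ins for the question's f_A and f_B.
def f_A(x):
    return x + 1

def f_B(x):
    return x * 2


class Base(object):
    def __init__(self, name):
        self.name = name
        self.last_result = None

    def func(self, x):
        # Subclasses override this with the wrapped function.
        return x

    def calc(self, x):
        # Save the result so it can be inspected after the chain has run.
        self.last_result = self.func(x)
        return self.last_result


class A(Base):
    def func(self, x):
        return f_A(x)


class B(Base):
    def func(self, x):
        return f_B(x)


def run_chain(instances, x):
    """Feed x through each node in order; return (final, intermediates)."""
    curr = x
    intermediates = {}
    for inst in instances:
        curr = inst.calc(curr)
        intermediates[inst.name] = curr
    return curr, intermediates


a = A('firstnode')
b = B('secondnode')
final, steps = run_chain([a, b], 3)
print(final)          # 8
print(steps)          # {'firstnode': 4, 'secondnode': 8}
print(a.last_result)  # 4
```

Moving the bookkeeping into the base class means each subclass only has to say which function it wraps.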
I would like to define a class that does something like:
class Computer:
    def __init__(self, x):
        # compute first the 'helper' properties
        self.prop1 = self.compute_prop1(x)
        self.prop2 = self.compute_prop2(x)
        # then compute the property that depends on 'helpers'
        self.prop3 = self.compute_prop3(x)
    def compute_prop1(self, x):
        return x
    def compute_prop2(self, x):
        return x*x
    def compute_prop3(self, x):
        return self.prop1 + self.prop2
Then, when I initialize an instance, I get all properties computed in order (first helpers, then everything depending on helpers later):
>>> computer = Computer(3)
>>> computer.__dict__
{'prop1': 3, 'prop2': 9, 'prop3': 12}
However, I think there is a better practice of writing this code, for example using decorators. Could you please give me some hints? Thank you!
Here's your class using properties instead (with an added method for returning each property):
class PropertyComputer:
    def __init__(self, x):
        self._x = x
    @property
    def prop1(self):
        return self._x
    @property
    def prop2(self):
        return self._x * self._x
    @property
    def prop3(self):
        return self.prop1 + self.prop2
    def get_props(self):
        return self.prop1, self.prop2, self.prop3
Design-wise, I believe this is better because:
storing x as an instance variable makes more sense: the point of using objects is to avoid having to pass variables around, especially those that the object itself can keep track of;
the attribute assignment and its corresponding calculation are bundled together in each property-decorated method: we'll never have to think whether the problem is in the init method (where you define the attribute) or in the compute method (where the logic for the attribute's calculation is laid out).
Note that the concept of "first calculate helpers, then the properties depending on them" does not really apply to this code: we only need to evaluate prop3 if/when we actually need it. If we never access it, we never need to compute it.
A "bad" side-effect of using properties, compared to your example, is that these properties are not "stored" anywhere (hence why I added the last method):
c = PropertyComputer(x=2)
c.__dict__ # outputs {'_x': 2}
Also note that, using decorators, the attributes are calculated on-the-fly whenever you access them, instead of just once in the init method. In this manner, property-decorated methods work like methods, but are accessed like attributes (it's the whole point of using them):
c = PropertyComputer(x=2)
c.prop1 # outputs 2
c._x = 10
c.prop1 # outputs 10
As a side note, you can use functools.cached_property to cache the evaluation of one of these properties, in case it's computationally expensive.
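As a rough illustration of that side note, here is a small sketch using functools.cached_property (available from Python 3.8). The class name CachedComputer and the calls counter are made up for the demonstration; note that the cached value is not refreshed when _x changes, so it has to be deleted by hand.

```python
from functools import cached_property  # Python 3.8+


class CachedComputer:
    def __init__(self, x):
        self._x = x
        self.calls = 0  # counts how often prop2 is actually computed

    @cached_property
    def prop2(self):
        self.calls += 1
        return self._x * self._x


c = CachedComputer(3)
print(c.prop2, c.prop2)   # 9 9
print(c.calls)            # 1
print(c.__dict__)         # {'_x': 3, 'calls': 1, 'prop2': 9}

c._x = 10
print(c.prop2)            # 9 (the cached value is not invalidated)
del c.prop2               # drop the cached entry
print(c.prop2)            # 100
```

Unlike a plain property, the cached value does end up in the instance's __dict__ after the first access.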
I think the following would be the easiest way to avoid redundancy
class computer():
    def __init__(self, x):
        self.prop_dict = self.compute_prop_dict(x)
    def compute_prop_dict(self, x):
        prop1 = x
        prop2 = x*x
        return {'prop1': prop1, 'prop2': prop2, 'prop3': prop1 + prop2}
So anything that comes after instantiation can access these helpers via prop_dict.
But, as Brian said in a comment, this ordering is only guaranteed by the language specification as of Python 3.7.
I have a class where I want to initialize an attribute self.listN and an add_to_listN method for each element of a list, e.g. from attrs = ['list1', 'list2'] I want list1 and list2 to be initialized as empty lists and the methods add_to_list1 and add_to_list2 to be created. Each add_to_listN method should take two parameters, say value and unit, and append a tuple (value, unit) to the corresponding listN.
The class should therefore look like this in the end:
class Foo():
    def __init__(self):
        self.list1 = []
        self.list2 = []
    def add_to_list1(self, value, unit):
        self.list1.append((value, unit))
    def add_to_list2(self, value, unit):
        self.list2.append((value, unit))
Leaving aside all the checks and the rest of the class, I came up with this:
class Foo():
    def __init__(self):
        for attr in ['list1', 'list2']:
            setattr(self, attr, [])
            setattr(self, 'add_to_%s' % attr, self._simple_add(attr))
    def _simple_add(self, attr):
        def method(value, unit=None):
            getattr(self, attr).append((value, unit))
        return method
I also checked other solutions such as the ones suggested here and I would like to do it "right", so my questions are:
Are/Should these methods (be) actually classmethods or not?
Is there a cost in creating the methods in __init__, and in this case is there an alternative?
Where is the best place to run the for loop and add these methods? Within the class definition? Out of it?
Is the use of metaclasses recommended in this case?
Update
Although Benjamin Hodgson makes some good points, I'm not asking for a (perhaps better) alternative way to do this but for the best way to use the tools that I mentioned. I'm using a simplified example in order not to focus on the details.
To further clarify my questions: the add_to_listN methods are meant to be additional, not to replace setters/getters (so I still want to be able to do l1 = f.list1 and f.list1 = [] with f = Foo()).
You are making a design error. You could override __getattr__, parse the attribute name, and return a closure which does what you want, but it's strange to dynamically generate methods, and strange code is bad code. There are often situations where you need to do it, but this is not one of them.
Instead of generating n methods which each do the same thing to one of n objects, why not just write one method which is parameterised by n? Something roughly like this:
class Foo:
    def __init__(self):
        self.lists = [
            [],
            []
        ]
    def add(self, row, value):
        self.lists[row].append(value)
Then foo.add1(x) becomes simply foo.add(0, x); foo.add2(x) becomes foo.add(1, x), and so on (list indices are zero-based). There's one method, parameterised along the axis of variation, which serves all cases, rather than a litany of ad-hoc generated methods. It's much simpler.
Don't mix up the data in your system with the names of the data in your system.
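For comparison, the __getattr__ route mentioned at the start of this answer could be sketched roughly as below. The class name DynamicFoo and the prefix handling are illustrative assumptions; the sketch is only meant to make the trade-off concrete, not to recommend the approach.

```python
class DynamicFoo(object):
    _list_names = ('list1', 'list2')

    def __init__(self):
        for name in self._list_names:
            setattr(self, name, [])

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so list1/list2 are unaffected.
        prefix = 'add_to_'
        if attr.startswith(prefix) and attr[len(prefix):] in self._list_names:
            target = getattr(self, attr[len(prefix):])

            def method(value, unit=None):
                target.append((value, unit))
            return method
        raise AttributeError(attr)


f = DynamicFoo()
f.add_to_list1(3, 'm')
f.add_to_list2(5)
print(f.list1)  # [(3, 'm')]
print(f.list2)  # [(5, None)]
```

Every access to f.add_to_list1 builds a fresh closure and the method never shows up in dir(f), which is part of why the single parameterised add method is easier to live with.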
I have started learning python classes some time ago, and there is something that I do not understand when it comes to usage of self.variables inside of a class. I googled, but couldn't find the answer. I am not a programmer, just a python hobbyist.
Here is an example of a simple class, with two ways of defining it:
1) First way:
class Testclass:
    def __init__(self, a,b,c):
        self.a = a
        self.b = b
        self.c = c
    def firstMethod(self):
        self.d = self.a + 1
        self.e = self.b + 2
    def secondMethod(self):
        self.f = self.c + 3
    def addMethod(self):
        return self.d + self.e + self.f
myclass = Testclass(10,20,30)
myclass.firstMethod()
myclass.secondMethod()
addition = myclass.addMethod()
2) Second way:
class Testclass:
    def __init__(self, a,b,c):
        self.a = a
        self.b = b
        self.c = c
    def firstMethod(self):
        d = self.a + 1
        e = self.b + 2
        return d,e
    def secondMethod(self):
        f = self.c + 3
        return f
    def addMethod(self, d, e, f):
        return d+e+f
myclass = Testclass(10,20,30)
d, e = myclass.firstMethod()
f = myclass.secondMethod()
addition = myclass.addMethod(d,e,f)
What confuses me is which of these two is valid?
Is it better to always define the variables we expect to use later as self.variables inside the methods (which would make them global within the class) and then just use them in some other method of that class (the 1st way in the code above)?
Or is it better not to define variables inside methods as self.variables, but simply as regular variables that are returned at the end of the method, and then pass them back into some other method as its arguments (the 2nd way in the code above)?
EDIT: just to make it clear, I do not want to define the self.d, self.e, self.f or d,e,f variables under the init method. I want to define them in some other methods, as shown in the code above.
Sorry for not mentioning that.
Both are valid approaches. Which one is right completely depends on the situation.
E.g.
Where are you 'really' getting the values of a, b, c from?
Do you want/need to use them multiple times?
Do you want/need to use them within other methods of the class?
What does the class represent?
Are a, b and c really 'fixed' attributes of the class, or do they depend on external factors?
In the example you give in the comment below:
Let's say that a,b,c depend on some outer variables (for example a = d+10, b = e+20, c = f+30, where d,e,f are supplied when instantiating a class: myclass = Testclass("hello",d,e,f)). Yes, let's say I want to use a,b,c (or self.a,self.b,self.c) variables within other methods of the class too.
So in that case, the 'right' approach depends mainly on whether you expect a, b, c to change during the life of the class instance. For example, if you have a class where the attributes (a,b,c) will never or rarely change, but you use the derived attributes (d,e,f) heavily, then it makes sense to calculate them once and store them. Here's an example:
class Tiger(object):
    def __init__(self, num_stripes):
        self.num_stripes = num_stripes
        self.num_black_stripes = self.get_black_stripes()
        self.num_orange_stripes = self.get_orange_stripes()
    def get_black_stripes(self):
        return self.num_stripes / 2
    def get_orange_stripes(self):
        return self.num_stripes / 2
big_tiger = Tiger(num_stripes=200)
little_tiger = Tiger(num_stripes=30)
# Now we can do logic without having to keep re-calculating values
if big_tiger.num_black_stripes > little_tiger.num_orange_stripes:
    print("Big tiger has more black stripes than little tiger has orange")
This works well because each individual tiger has a fixed number of stripes. If we change the example to use a class for which instances will change often, then our approach changes too:
class BankAccount(object):
    def __init__(self, customer_name, balance):
        self.customer_name = customer_name
        self.balance = balance
    def get_interest(self):
        return self.balance / 100
my_savings = BankAccount("Tom", 500)
print("I would get %d interest now" % my_savings.get_interest())
# Deposit some money
my_savings.balance += 100
print("I added more money, my interest changed to %d" % my_savings.get_interest())
So in this (somewhat contrived) example, a bank account balance changes frequently - therefore there is no value in storing interest in a self.interest variable - every time balance changes, the interest amount will change too. Therefore it makes sense to calculate it every time we need to use it.
There are a number of more complex approaches you can take to get some benefit from both of these. For example, you can make your program 'know' that interest is linked to balance and then it will temporarily remember the interest value until the balance changes (this is a form of caching - we use more memory but save some CPU/computation).
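One way to sketch that caching idea is with a property setter that throws away the remembered interest whenever the balance changes. The class name CachedBankAccount and the _interest attribute are illustrative assumptions, not part of the example above.

```python
class CachedBankAccount(object):
    def __init__(self, customer_name, balance):
        self.customer_name = customer_name
        self._balance = balance
        self._interest = None  # None means "not computed yet"

    @property
    def balance(self):
        return self._balance

    @balance.setter
    def balance(self, value):
        self._balance = value
        self._interest = None  # balance changed, so the cached interest is stale

    @property
    def interest(self):
        if self._interest is None:
            self._interest = self._balance / 100
        return self._interest


acct = CachedBankAccount("Tom", 500)
print(acct.interest)   # computed once and remembered
acct.balance += 100    # goes through the setter, which clears the cache
print(acct.interest)   # recomputed for the new balance
```

For a calculation this cheap the cache buys nothing; the pattern only pays off when the derived value is expensive to compute.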
Unrelated to original question
A note about how you declare your classes. If you're using Python 2, it's good practice to make your own classes inherit from python's built in object class:
class Testclass(object):
    def __init__(self, printHello):
Ref NewClassVsClassicClass - Python Wiki:
Python 3 uses these new-style classes by default, so you don't need to explicitly inherit from object if using py3.
EDITED:
If you want to preserve the values inside the object after calling addMethod, for example because you want to call addMethod again, then use the first way. If you just want to use some internal values of the class to perform addMethod, use the second way.
You really can't draw any conclusions on this sort of question in the absence of a concrete and meaningful example, because it's going to depend on the facts and circumstances of what you're trying to do.
That being said, in your first example, firstMethod() and secondMethod() are just superfluous. They serve no purpose at all other than to compute values that addMethod() uses. Worse, to make addMethod() function, the user has to first make two inexplicable and apparently unrelated calls to firstMethod() and secondMethod(), which is unquestionably bad design. If those two methods actually did something meaningful it might make sense (but probably doesn't) but in the absence of a real example it's just bad.
You could replace the first example by:
class Testclass:
    def __init__(self, a,b,c):
        self.a = a
        self.b = b
        self.c = c
    def addMethod(self):
        return self.a + self.b + self.c + 6
myclass = Testclass(10,20,30)
addition = myclass.addMethod()
The second example is similar, except firstMethod() and secondMethod() actually do something, since they return values. If there was some reason you'd want these values separately for some reason other than passing them to addMethod(), then again, it might make sense. If there wasn't, then again you could define addMethod() as I just did, and dispense with those two additional functions altogether, and there wouldn't be any difference between the two examples.
But this is all very unsatisfactory in the absence of a concrete example. Right now all we can really say is that it's a slightly silly class.
In general, objects in the OOP sense are conglomerates of data (instance variables) and behavior (methods). If a method doesn't access instance variables - or doesn't need to - then it generally should be a standalone function, and not be in a class at all. Once in a while it will make sense to have a class or static method that doesn't access instance variables, but in general you should err towards preferring standalone functions.
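As a small illustration of that last point, the question's second example could be split so that only the code touching instance state stays on the class. The names add_values and shifted are made up for this sketch.

```python
def add_values(d, e, f):
    # No instance state is needed, so this lives outside the class.
    return d + e + f


class Testclass(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c

    def shifted(self):
        # Uses instance state, so it belongs on the class.
        return self.a + 1, self.b + 2, self.c + 3


obj = Testclass(10, 20, 30)
print(add_values(*obj.shifted()))  # 66
```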
I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1,2,3,4,5)
bar_2 = Bar(1,2,3,10,11)
summary = set((bar_1,))
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv.e should equal 5, but they do not. As I mentioned, it looks like the _CaptureEq.__eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the _CaptureEq.__eq__ method is not being called? Hopefully this is not too obscure a question.
Brandon's answer is informative, but incorrect. There are actually two problems: one with the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
    'Object wrapper that remembers "other" for successful equality tests.'
    def __init__(self, obj):
        self.obj = obj
        self.match = obj
    # If running on Python 3, this will be a new-style class, and
    # new-style classes must delegate hash explicitly in order to populate
    # the underlying special method slot correctly.
    # On Python 2, it will be an old-style class, so the explicit delegation
    # isn't needed (__getattr__ will cover it), but it also won't do any harm.
    def __hash__(self):
        return hash(self.obj)
    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other
        return result
    def __getattr__(self, name):  # support anything else needed by __contains__
        return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Don't recognise "other", so let *it* decide if we're equal
            return NotImplemented
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
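To see why returning NotImplemented matters, here is a small standalone sketch, separate from the recipe itself. Strict and Recorder are hypothetical classes invented for the demonstration; both hash to the same value so that a set is forced to compare them.

```python
class Strict(object):
    def __eq__(self, other):
        if not isinstance(other, Strict):
            return NotImplemented  # "I don't know, ask the other object"
        return True

    def __hash__(self):
        return 0


class Recorder(object):
    """Hypothetical wrapper that records whether its __eq__ was consulted."""
    def __init__(self):
        self.asked = False

    def __eq__(self, other):
        self.asked = True
        return True

    def __hash__(self):
        return 0


s, r = Strict(), Recorder()
print(s == r)    # True: Strict declined, so Python tried r.__eq__(s)
print(r.asked)   # True

r2 = Recorder()
print(r2 in set([s]))  # True: the set asks s first, s declines, r2 decides
print(r2.asked)        # True
```

This is the same hand-off that lets _CaptureEq.__eq__ run during the set lookup once Foo.__eq__ stops answering for types it does not recognise.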
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
    print('__eq__: I am %s %s and he is %s %s' % (self.d, self.e, other.d, other.e))
    return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
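A minimal sketch of that dictionary idea follows. The helper name get_or_insert is made up, and Bar is reduced to a plain class without __eq__ or __hash__ because the key tuple now does that job.

```python
def get_or_insert(table, obj):
    """Return the object already stored under obj's key, else store obj."""
    key = (obj.a, obj.b, obj.c)  # the subset of attributes that defines identity
    return table.setdefault(key, obj)


class Bar(object):
    def __init__(self, a, b, c, d, e):
        self.a, self.b, self.c, self.d, self.e = a, b, c, d, e


summary = {}
bar_1 = Bar(1, 2, 3, 4, 5)
bar_2 = Bar(1, 2, 3, 10, 11)

get_or_insert(summary, bar_1)
bar_equiv = get_or_insert(summary, bar_2)
print(bar_equiv is bar_1)  # True
print(bar_equiv.d)         # 4
print(bar_equiv.e)         # 5
```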
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
    return t.match
return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing an __getattr__ will solve this - it won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups - this is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. sets are based on hashtables, and only do an equality test if there's more than one item in a hash bucket. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert(bar_1 == bar_2)
bar_equiv = get_equivalent(summary, bar_2)
assert(bar_equiv is bar_1)
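The point about special method lookups skipping __getattr__ can be checked with a tiny experiment. Wrapper is a hypothetical new-style class used only for this demonstration, and it uses __len__ rather than __hash__ because the failure is easier to see; the same lookup rule applies to both.

```python
class Wrapper(object):
    def __init__(self, obj):
        self.obj = obj

    def __getattr__(self, name):
        # Delegates ordinary attribute access to the wrapped object.
        return getattr(self.obj, name)


w = Wrapper("hello")
print(w.upper())     # HELLO (explicit lookups are delegated)
print(w.__len__())   # 5 (also an explicit lookup, so it is delegated too)
try:
    len(w)           # implicit lookup goes to the type and skips __getattr__
except TypeError as exc:
    print("len(w) failed: %s" % exc)
```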
Edit: There was some confusion, but I want to ask a general question about object oriented design in Python.
Consider a class that lets you map data values to counts or frequencies:
class DataMap(dict):
    pass
Now consider a subclass that allows you to construct a histogram from a list of data:
class Histogram(DataMap):
    def __init__(self, list_of_values):
        # 1. Put appropriate super(...) call here if necessary
        # 2. Build the map of values to counts in self
        pass
Now consider a class that lets you make a smoothed probability mass table rather than a Histogram.
class ProbabilityMass(DataMap):
    pass
What is the best way to allow a ProbabilityMass to be constructed from either a Histogram or a list of values?
I "grew up" programming in C++, and in this case I would use an overloaded constructor. In Python I've thought of doing this with:
The constructor takes multiple arguments (all but one of these should == None)
I define from_Histogram and from_list methods
In the second case (which I believe is better), what is the best way to allow the from_list method to use the shared code from the Histogram constructor? A ProbabilityMass table is nearly identical to a Histogram table, but it is scaled so that the sum of all values is 1.0.
If you have come across a similar problem, please share your expertise!
To start with, if you think you want @staticmethod, you almost always don't. Either the function is not part of the class, in which case it should just be a free function, or it is part of the class, but not tied to an instance, and it should be a @classmethod. Your named constructor is a good candidate for a @classmethod.
Also note that you should invoke A.__init__ from B via super(), otherwise multiple inheritance can bite you bad.
class A(object):  # new-style class, which super() requires on Python 2
    def __init__(self, data):
        self.values_to_counts = {}
        for val in data:
            if val in self.values_to_counts:
                self.values_to_counts[val] += 1
            else:
                self.values_to_counts[val] = 1
    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = cls([])
        self.values_to_counts = values_to_counts
        return self
class B(A):
    def __init__(self, data, parameter):
        super(B, self).__init__(data)
        self.parameter = parameter
    def print_parameter(self):
        print(self.parameter)
In this case, you don't need a B.from_values_to_counts, it inherits from A, and it will return an instance of B, since that's how it was called.
If you need to do more complex initialization in B, you can, using super(), which looks very similar to the way it would when you use it with instances. After all, a classmethod really isn't anything more complex than an instancemethod where the im_self attribute is assigned to the class itself.
class A(object):  # new-style class, which super() requires on Python 2
    def __init__(self, data):
        self.values_to_counts = {}
        for val in data:
            if val in self.values_to_counts:
                self.values_to_counts[val] += 1
            else:
                self.values_to_counts[val] = 1
    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = cls([])
        self.values_to_counts = values_to_counts
        return self
class B(A):
    def __init__(self, data, parameter):
        super(B, self).__init__(data)
        self.parameter = parameter
    def print_parameter(self):
        print(self.parameter)
    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = super(B, cls).from_values_to_counts(values_to_counts)
        do_more_initialization(self)
        return self
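Applied to the classes in the question, the same idea might look like the sketch below. It assumes Histogram counts with collections.Counter and that ProbabilityMass only rescales the counts (no smoothing is attempted); from_histogram and from_list are the method names proposed in the question.

```python
from collections import Counter


class DataMap(dict):
    pass


class Histogram(DataMap):
    def __init__(self, list_of_values=()):
        # Counter does the value -> count bookkeeping; dict.__init__ copies it.
        super(Histogram, self).__init__(Counter(list_of_values))


class ProbabilityMass(DataMap):
    @classmethod
    def from_histogram(cls, histogram):
        total = float(sum(histogram.values()))
        # Scale the counts so that they sum to 1.0.
        return cls((value, count / total) for value, count in histogram.items())

    @classmethod
    def from_list(cls, list_of_values):
        # Reuse the counting logic in Histogram instead of duplicating it.
        return cls.from_histogram(Histogram(list_of_values))


pmf = ProbabilityMass.from_list(['a', 'b', 'a', 'c'])
print(pmf['a'])            # 0.5
print(sum(pmf.values()))   # 1.0
```

Because both constructors are classmethods that build the result through cls, a subclass of ProbabilityMass would get instances of itself back without overriding anything. An empty input would divide by zero here, so a real implementation would need to decide how to handle that.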