DRY principle in Python __init__ method

In this class definition, every parameter occurs three times, which seems to violate the DRY (don't repeat yourself) principle:
class Foo:
    def __init__(self, a=1, b=2.0, c=(3, 4, 5)):
        self.a = int(a)
        self.b = float(b)
        self.c = list(c)
DRY could be applied like this (Python 3):
class Foo:
    def __init__(self, **kwargs):
        defaults = dict(a=1, b=2.0, c=[3, 4, 5])
        for k, v in defaults.items():
            setattr(self, k, type(v)(kwargs[k]) if k in kwargs else v)
        # ...detect illegal keywords here...
However, this breaks IDE autocomplete (tried Spyder and Elpy) and pylint will complain if I try to access the attributes later on.
Is there a clean way to handle this?
Edit: The example has three parameters, but I find myself dealing with this when there are 15 parameters, where I only rarely need to override the defaults; often with more complicated types, where I would need to do
if not isinstance(kwargs['x'], SomeClass):
    raise TypeError('x: must be SomeClass')
self.x = kwargs['x']
for each of them. Moreover, I can't use mutables as default values for keyword arguments.

Principles like DRY are important, but it's worth keeping the rationale for such a principle in mind before blindly applying it -- arguably the biggest advantage of DRY code is increased maintainability: you only have to modify the code in one place, and you avoid the subtle bugs that can occur when code is modified in one place and not another. DRY can be antithetical to other common principles like YAGNI and KISS, and choosing the correct balance for your application is important.
In particular, DRY often applies to default values, application logic, and other things that could cause bugs if changed in one place and not another. IMO variable names don't fit in the same way, since refactoring the code to rename every occurrence of Foo's instance variable a won't actually break anything if the parameter name in the initializer is left unchanged.
With that in mind, we have a simple test for your code. Are these variables likely to change together, or is the initializer for Foo a layer of abstraction that allows a refactoring of the inputs independently of the class's instance variables?
Change Together: I rather like @chepner's answer, and I'd take it one step further. If your class is anything more than a data transfer object you can use @chepner's solution as a way to logically group related pieces of data (which admittedly could be unnecessary in your situation, and without some context it's difficult to choose an optimal way to introduce such an idea), e.g.
from dataclasses import dataclass, field

@dataclass
class MyData:
    a: int
    b: float
    c: list

class Foo:
    def __init__(self, my_data):
        self.wrapped = my_data
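A caller would then construct the wrapper along these lines (a small illustrative sketch using the names above, not part of the original answer):
data = MyData(a=1, b=2.0, c=[3, 4, 5])
foo = Foo(data)
print(foo.wrapped.a)  # 1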
Change Separately: Then just leave it alone, or KISS as they say.

As a preface, your code
class Foo:
    def __init__(self, a=1, b=2.0, c=(3, 4, 5)):
        self.a = int(a)
        self.b = float(b)
        self.c = list(c)
is, as mentioned in several comments, fine as it is. Code is read far more than it is written, and aside from needing to be careful to avoid typos in the names when first defining this, the intent is perfectly clear. (Though see the end of the answer regarding the default value of c.)
If you are using Python 3.7 or later, you can use a data class to reduce the number of references you make to each variable.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Foo:
    a: int = 1
    b: float = 2.0
    c: List[int] = field(default_factory=lambda: [3, 4, 5])
This doesn't prevent you from violating the type hints (Foo("1") will happily set a = "1" instead of a = 1 or raising an error), but it's typically the responsibility of the caller to provide arguments of the correct type. If you really want to enforce this at run time, you can add a __post_init__ method:
def __post_init__(self):
    self.a = int(self.a)
    self.b = float(self.b)
    self.c = list(self.c)
But if you do that, you may as well go back to your original hand-coded __init__ method.
As an aside, the standard idiom for mutable default arguments is
def __init__(self, a=1, b=2.0, c=None):
    ...
    if c is None:
        c = [3, 4, 5]
Your approach has two problems:
1. It requires that list be run for every instantiation, rather than letting the compiler hard-code [3, 4, 5].
2. If you were type-hinting the arguments to __init__, your default value doesn't match the intended type. You'd have to write something like
def __init__(self, a: int = 1, b: float = 2.0, c: Union[List[int], Tuple[int, int, int]] = (3, 4, 5)):
A default value of None automatically causes a "promotion" of the type to a corresponding optional type. The following are equivalent:
def __init__(self, a: int = 1, b: float = 2.0, c: List[int] = None):
def __init__(self, a: int = 1, b: float = 2.0, c: Optional[List[int]] = None):
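Putting the None-default idiom and the hints together, a minimal sketch of the full initializer (my own illustration, not from the answer) might be:
from typing import List, Optional

class Foo:
    def __init__(self, a: int = 1, b: float = 2.0, c: Optional[List[int]] = None):
        self.a = a
        self.b = b
        # Build a fresh list per instance so callers never share the default.
        self.c = [3, 4, 5] if c is None else list(c)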

how to compare field value with previous one in pydantic validator?

What I tried so far:
from pydantic import BaseModel, validator

class Foo(BaseModel):
    a: int
    b: int
    c: int

    class Config:
        validate_assignment = True

    @validator("b", always=True)
    def validate_b(cls, v, values, field):
        # field - doesn't have the current value
        # values - has values of other fields, but not for 'b'
        if values.get("b") == 0:  # imaginary logic with prev value
            return values.get("b") - 1
        return v

f = Foo(a=1, b=0, c=2)
f.b = 3
assert f.b == -1  # fails
Also looked up property setters but apparently they don't work with pydantic.
Looks like a bug to me, so I made an issue on GitHub: https://github.com/pydantic/pydantic/issues/4888
The way validation is intended to work is stateless. When you create a model instance, validation is run before the instance is even fully initialized.
You mentioned relevant sentence from the documentation about the values parameter:
values: a dict containing the name-to-value mapping of any previously-validated fields
If we ignore assignment for a moment, for your example validator, this means the values for fields that were already validated before b, will be present in that dictionary, which is only the value for a. (Because validators are run in the order fields are defined.) This description is evidently meant for the validators run during initialization, not assignment.
What I would concede is that the documentation leaves way too much room for interpretation as to what should happen when validating value assignment. If we take a look at the source code of BaseModel.__setattr__, we can see the intention very clearly though:
def __setattr__(self, name, value):
    ...
    known_field = self.__fields__.get(name, None)
    if known_field:
        # We want to
        # - make sure validators are called without the current value for this field inside `values`
        # - keep other values (e.g. submodels) untouched (using `BaseModel.dict()` will change them into dicts)
        # - keep the order of the fields
        if not known_field.field_info.allow_mutation:
            raise TypeError(f'"{known_field.name}" has allow_mutation set to False and cannot be assigned')
        dict_without_original_value = {k: v for k, v in self.__dict__.items() if k != name}
        value, error_ = known_field.validate(value, dict_without_original_value, loc=name, cls=self.__class__)
    ...
As you can see, it explicitly states in that comment that values should not contain the current value.
We can observe that this is actually the displayed behavior here:
from pydantic import BaseModel, validator

class Foo(BaseModel):
    a: int
    b: int
    c: int

    class Config:
        validate_assignment = True

    @validator("b")
    def validate_b(cls, v: object, values: dict[str, object]) -> object:
        print(f"{v=}, {values=}")
        return v

if __name__ == "__main__":
    print("initializing...")
    f = Foo(a=1, b=0, c=2)
    print("assigning...")
    f.b = 3
Output:
initializing...
v=0, values={'a': 1}
assigning...
v=3, values={'a': 1, 'c': 2}
Ergo, there is no bug here. This is the intended behavior.
Whether this behavior is justified or sensible may be debatable. If you want to debate this, you can open an issue as a question and ask why it was designed this way and argue for a plausible alternative approach.
Though in my personal opinion, what is stranger in the current implementation is that values contains anything at all during assignment. I would argue this is strange since only the one specific value being assigned is validated. The way I understand the intent behind values, it should only be available during initialization. But that is yet another debate.
What is undoubtedly true is that this behavior of validator methods upon assignment should be explicitly documented. This is also something that may be worth mentioning in the aforementioned issue.
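If you genuinely need the previous value during assignment, one possible workaround (a sketch against pydantic v1, not an officially documented pattern) is to override __setattr__ yourself and read the old value out of the instance's __dict__ before delegating to BaseModel:
from pydantic import BaseModel

class Foo(BaseModel):
    a: int
    b: int
    c: int

    class Config:
        validate_assignment = True

    def __setattr__(self, name, value):
        if name == "b":
            prev = self.__dict__.get("b")  # the current value, before reassignment
            if prev == 0:  # imaginary logic with prev value, as in the question
                value = prev - 1
        super().__setattr__(name, value)  # normal assignment validation still runs

f = Foo(a=1, b=0, c=2)
f.b = 3
assert f.b == -1  # now passes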

how do I insert a long sequence of "return" variables in many diff locations?

I have a function called preset_parser in mylibrary.py that takes argument filename, i.e. preset_parser(filename), and returns a long list of variables, e.g.
def preset_parser(filename):
    # ... defines variable values based on reading the file ...
    return presetsdf, preset_name, preset_description, preset_instructions, preset_additional_notes, preset_placeholder, pre_user_input, post_user_input, prompt, engine, finetune_model, temperature, max_tokens, top_p, fp, pp, stop_sequence, echo_on, preset_pagetype, preset_db, user, organization
So then I call this function from many other programs, where I do this:
from mylibrary import preset_parser

presetsdf, preset_name, preset_description, preset_instructions, preset_additional_notes, preset_placeholder, pre_user_input, post_user_input, prompt, engine, finetune_model, temperature, max_tokens, top_p, fp, pp, stop_sequence, echo_on, preset_pagetype, preset_db, user, organization = preset_parser(filename)
This is redundant and fragile (it breaks if the list of variables changes). What is the better way to do it? I know it must be simple.
The "general" solution to your problem is to make a class.
class ParseResult:
    def __init__(self, a, b, c, d):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
There are several ways in Python to automate this pattern. If your return values have some natural sort of "sequencing", you might use namedtuple or its typed cousin NamedTuple.
from collections import namedtuple

ParseResult = namedtuple("ParseResult", 'a b c d')
or
from typing import NamedTuple

class ParseResult(NamedTuple):
    a: TypeOfA
    b: TypeOfB
    c: TypeOfC
    d: TypeOfD
This creates a richer "tuple-like" construct with named arguments.
If your return values don't make sense as a tuple and don't have any sort of natural notion of sequence, you can use the more general-purpose dataclass decorator.
from dataclasses import dataclass

@dataclass(frozen=True)
class ParseResult:
    a: TypeOfA
    b: TypeOfB
    c: TypeOfC
    d: TypeOfD
Then, in any case, return a value of this new type which has rich names (and possibly types) for its values.
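Applied to the question's parser, a minimal sketch might look like the following (PresetResult, the three fields shown, and the dummy values are illustrative stand-ins, not the full 22 fields):
from typing import NamedTuple

class PresetResult(NamedTuple):
    preset_name: str
    temperature: float
    max_tokens: int
    # ... the remaining fields from the question would go here

def preset_parser(filename):
    # Stand-in for the real file parsing.
    return PresetResult(preset_name="demo", temperature=0.7, max_tokens=256)

result = preset_parser("preset.txt")
print(result.preset_name)  # callers unpack only the fields they need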
You could return a dictionary instead:
def myfunction():
    ...
    mydict = {
        'variable1': value1,
        'variable2': value2,
        'anothervariable': anothervalue,
        'somevariable': somevalue,
    }
    return mydict
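Callers then look values up by name, so adding a new key later doesn't break existing call sites (a small illustrative sketch; 'newvariable' is hypothetical):
result = myfunction()
print(result['variable1'])              # access by name
print(result.get('newvariable', None))  # absent keys can fall back to a default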
I can think of a couple of better ways, but I'm not sure what your limitations are. You could create a class, then return an instance that you initialize in the parser function:
class Preset:
    def __init__(self, **kwargs):
        self.sdf = kwargs.get('sdf')
        self.name = kwargs.get('name')
        self.description = kwargs.get('description')
        # ...etc.
then in your function:
preset = Preset()
preset.sdf = file_data[0]  # however you've parsed your data
preset.name = file_data[1]
or you could make it a dictionary:
preset = {
    'sdf': file_data[0],
    'name': file_data[1],
}
I'm not sure if you have some limitation on what can be returned from your function?

Overwriting Class Methods to define Kwargs - Which is Pythonic?

I'm working on an abstraction layer to a database, and I have a super class defined similar to this:
class Test:
    def __init__(self, obj):
        self.obj = obj

    @classmethod
    def find_object(cls, **kwargs):
        # Code to search for the object to put in the parameter, using kwargs.
        return cls(found_object)
I then break down that superclass into subclasses that are more specific to the objects they represent.
class Test_B(Test):
    # Subclass defining a more specific version of Test.
    ...
Now, each separate subclass of Test has predefined search criteria. For example, Test_B needs an object with a = 10, b = 30, c = "Pie".
Which would be more "Pythonic"? Using the find_object method from the super class:
testb = Test_B.find_object(a=10, b=30, c="Pie")
or to overwrite the find_object method to expect a, b, and c as parameters:
@classmethod
def find_object(cls, a, b, c):
    return super().find_object(a=a, b=b, c=c)

testb = Test_B.find_object(10, 30, "Pie")
First one. "Explicit is better than implicit" - Zen of Python: line 2
Test.find_object isn't intended to be used directly, so I would name it
@classmethod
def _find_object(cls, **kwargs):
    ...
then have each child class call it to implement its own find_object:
@classmethod
def find_object(cls, a, b, c):
    return super()._find_object(a=a, b=b, c=c)
When using super, it's a good idea to preserve the signature of a method if overriding it, because you can never be certain for which class super will return a proxy.
skilsuper - you're right about
Explicit is better than implicit
However, that doesn't mean the first answer is better - you can still apply the same principle to the second solution: find_object(10, 30, "Pie") is implicit, but nothing is stopping you from using find_object(a=10, b=30, c="Pie") (and you should use it).
The first solution is problematic, because you might forget an argument (for example, find_object(a=10, b=30)). In that case, the first solution will let it slide, but the second solution will issue a TypeError saying that you're missing an argument.
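To illustrate with a hypothetical call against the sketches above:
# kwargs-based version: a forgotten argument slips through silently
testb = Test_B.find_object(a=10, b=30)  # no error; c was never supplied

# explicit-signature version: Python catches the mistake immediately
testb = Test_B.find_object(a=10, b=30)
# TypeError: find_object() missing 1 required positional argument: 'c'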

Python classes self.variables

I have started learning python classes some time ago, and there is something that I do not understand when it comes to usage of self.variables inside of a class. I googled, but couldn't find the answer. I am not a programmer, just a python hobbyist.
Here is an example of a simple class, with two ways of defining it:
1) First way:
class Testclass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def firstMethod(self):
        self.d = self.a + 1
        self.e = self.b + 2
    def secondMethod(self):
        self.f = self.c + 3
    def addMethod(self):
        return self.d + self.e + self.f

myclass = Testclass(10, 20, 30)
myclass.firstMethod()
myclass.secondMethod()
addition = myclass.addMethod()
2) Second way:
class Testclass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def firstMethod(self):
        d = self.a + 1
        e = self.b + 2
        return d, e
    def secondMethod(self):
        f = self.c + 3
        return f
    def addMethod(self, d, e, f):
        return d + e + f

myclass = Testclass(10, 20, 30)
d, e = myclass.firstMethod()
f = myclass.secondMethod()
addition = myclass.addMethod(d, e, f)
What confuses me is which of these two is valid?
Is it better to always define the variables inside the methods (the variables we expect to use later) as self.variables (which makes them available throughout the class) and then just call them inside some other method of that class (that would be the 1st way in the upper code)?
Or is it better not to define variables inside methods as self.variables, but simply as regular variables, then return them at the end of the method, and then "reimport" them back into some other method as its arguments (that would be the 2nd way in the upper code)?
EDIT: just to make it clear, I do not want to define the self.d, self.e, self.f or d,e,f variables under the init method. I want to define them at some other methods like showed in the upper code.
Sorry for not mentioning that.
Both are valid approaches. Which one is right completely depends on the situation.
E.g.
- Where you are 'really' getting the values of a, b, c from
- Do you want/need to use them multiple times
- Do you want/need to use them within other methods of the class
- What does the class represent
- Are a, b and c really 'fixed' attributes of the class, or do they depend on external factors?
In the example you give in the comment below:
Let's say that a,b,c depend on some outer variables (for example a = d+10, b = e+20, c = f+30, where d,e,f are supplied when instantiating a class: myclass = Testclass("hello",d,e,f)). Yes, let's say I want to use a,b,c (or self.a,self.b,self.c) variables within other methods of the class too.
So in that case, the 'right' approach depends mainly on whether you expect a, b, c to change during the life of the class instance. For example, if you have a class where the attributes (a, b, c) will never or rarely change, but you use the derived attributes (d, e, f) heavily, then it makes sense to calculate them once and store them. Here's an example:
class Tiger(object):
    def __init__(self, num_stripes):
        self.num_stripes = num_stripes
        self.num_black_stripes = self.get_black_stripes()
        self.num_orange_stripes = self.get_orange_stripes()
    def get_black_stripes(self):
        return self.num_stripes / 2
    def get_orange_stripes(self):
        return self.num_stripes / 2

big_tiger = Tiger(num_stripes=200)
little_tiger = Tiger(num_stripes=30)
# Now we can do logic without having to keep re-calculating values
if big_tiger.num_black_stripes > little_tiger.num_orange_stripes:
    print("Big tiger has more black stripes than little tiger has orange")
This works well because each individual tiger has a fixed number of stripes. If we change the example to use a class whose instances will change often, then our approach changes too:
class BankAccount(object):
    def __init__(self, customer_name, balance):
        self.customer_name = customer_name
        self.balance = balance
    def get_interest(self):
        return self.balance / 100

my_savings = BankAccount("Tom", 500)
print("I would get %d interest now" % my_savings.get_interest())
# Deposit some money
my_savings.balance += 100
print("I added more money, my interest changed to %d" % my_savings.get_interest())
So in this (somewhat contrived) example, a bank account balance changes frequently - therefore there is no value in storing interest in a self.interest variable - every time balance changes, the interest amount will change too. Therefore it makes sense to calculate it every time we need to use it.
There are a number of more complex approaches you can take to get some benefit from both of these. For example, you can make your program 'know' that interest is linked to balance and then it will temporarily remember the interest value until the balance changes (this is a form of caching - we use more memory but save some CPU/computation).
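One common middle ground is a read-only property, which keeps the convenient attribute syntax while recomputing from the current balance on every access (a small sketch of my own, not from the original answer):
class BankAccount(object):
    def __init__(self, customer_name, balance):
        self.customer_name = customer_name
        self.balance = balance

    @property
    def interest(self):
        # Recomputed on each access, so it always reflects the current balance.
        return self.balance / 100

account = BankAccount("Tom", 500)
account.balance += 100
print(account.interest)  # always in sync with balance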
Unrelated to original question
A note about how you declare your classes. If you're using Python 2, it's good practice to make your own classes inherit from python's built in object class:
class Testclass(object):
    def __init__(self, printHello):
Ref NewClassVsClassicClass - Python Wiki:
Python 3 uses new-style classes by default, so you don't need to explicitly inherit from object if using py3.
EDITED:
If you want to preserve the values inside the object after performing addMethod (for example, if you want to call addMethod again), then use the first way. If you just want to use some internal values of the class to perform addMethod, use the second way.
You really can't draw any conclusions on this sort of question in the absence of a concrete and meaningful example, because it's going to depend on the facts and circumstances of what you're trying to do.
That being said, in your first example, firstMethod() and secondMethod() are just superfluous. They serve no purpose at all other than to compute values that addMethod() uses. Worse, to make addMethod() function, the user has to first make two inexplicable and apparently unrelated calls to firstMethod() and secondMethod(), which is unquestionably bad design. If those two methods actually did something meaningful it might make sense (but probably doesn't) but in the absence of a real example it's just bad.
You could replace the first example by:
class Testclass:
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def addMethod(self):
        return self.a + self.b + self.c + 6

myclass = Testclass(10, 20, 30)
addition = myclass.addMethod()
The second example is similar, except firstMethod() and secondMethod() actually do something, since they return values. If there was some reason you'd want those values separately, other than to pass them to addMethod(), then again, it might make sense. If there wasn't, then again you could define addMethod() as I just did, dispense with those two additional functions altogether, and there wouldn't be any difference between the two examples.
But this is all very unsatisfactory in the absence of a concrete example. Right now all we can really say is that it's a slightly silly class.
In general, objects in the OOP sense are conglomerates of data (instance variables) and behavior (methods). If a method doesn't access instance variables - or doesn't need to - then it generally should be a standalone function, and not be in a class at all. Once in a while it will make sense to have a class or static method that doesn't access instance variables, but in general you should err towards preferring standalone functions.

Python - __eq__ method not being called

I have a set of objects, and am interested in getting a specific object from the set. After some research, I decided to use the solution provided here: http://code.activestate.com/recipes/499299/
The problem is that it doesn't appear to be working.
I have two classes defined as such:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())

class Bar(Foo):
    def __init__(self, a, b, c, d, e):
        self.a = a
        self.b = b
        self.c = c
        self.d = d
        self.e = e
Note: equality of these two classes should only be defined on the attributes a, b, c.
The wrapper _CaptureEq in http://code.activestate.com/recipes/499299/ also defines its own __eq__ method. The problem is that this method never gets called (I think). Consider,
bar_1 = Bar(1, 2, 3, 4, 5)
bar_2 = Bar(1, 2, 3, 10, 11)
summary = set((bar_1,))
assert bar_1 == bar_2
bar_equiv = get_equivalent(summary, bar_2)
bar_equiv.d should equal 4 and likewise bar_equiv.e should equal 5, but they do not. Like I mentioned, it looks like the _CaptureEq __eq__ method does not get called when the statement bar_2 in summary is executed.
Is there some reason why the _CaptureEq __eq__ method is not being called? Hopefully this is not too obscure of a question.
Brandon's answer is informative, but incorrect. There are actually two problems, one with
the recipe relying on _CaptureEq being written as an old-style class (so it won't work properly if you try it on Python 3 with a hash-based container), and one with your own Foo.__eq__ definition claiming definitively that the two objects are not equal when it should be saying "I don't know, ask the other object if we're equal".
The recipe problem is trivial to fix: just define __hash__ on the comparison wrapper class:
class _CaptureEq:
    'Object wrapper that remembers "other" for successful equality tests.'
    def __init__(self, obj):
        self.obj = obj
        self.match = obj
    # If running on Python 3, this will be a new-style class, and
    # new-style classes must delegate hash explicitly in order to populate
    # the underlying special method slot correctly.
    # On Python 2, it will be an old-style class, so the explicit delegation
    # isn't needed (__getattr__ will cover it), but it also won't do any harm.
    def __hash__(self):
        return hash(self.obj)
    def __eq__(self, other):
        result = (self.obj == other)
        if result:
            self.match = other
        return result
    def __getattr__(self, name):  # support anything else needed by __contains__
        return getattr(self.obj, name)
The problem with your own __eq__ definition is also easy to fix: return NotImplemented when appropriate so you aren't claiming to provide a definitive answer for comparisons with unknown objects:
class Foo(object):
    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c
    def __key(self):
        return (self.a, self.b, self.c)
    def __eq__(self, other):
        if not isinstance(other, Foo):
            # Don't recognise "other", so let *it* decide if we're equal
            return NotImplemented
        return self.__key() == other.__key()
    def __hash__(self):
        return hash(self.__key())
With those two fixes, you will find that Raymond's get_equivalent recipe works exactly as it should:
>>> from capture_eq import *
>>> bar_1 = Bar(1,2,3,4,5)
>>> bar_2 = Bar(1,2,3,10,11)
>>> summary = set((bar_1,))
>>> assert(bar_1 == bar_2)
>>> bar_equiv = get_equivalent(summary, bar_2)
>>> bar_equiv.d
4
>>> bar_equiv.e
5
Update: Clarified that the explicit __hash__ override is only needed in order to correctly handle the Python 3 case.
The problem is that the set compares two objects the “wrong way around” for this pattern to intercept the call to __eq__(). The recipe from 2006 evidently was written against containers that, when asked if x was present, went through the candidate y values already present in the container doing:
x == y
comparisons, in which case an __eq__() on x could do special actions during the search. But the set object is doing the comparison the other way around:
y == x
for each y in the set. Therefore this pattern might simply not be usable in this form when your data type is a set. You can confirm this by instrumenting Foo.__eq__() like this:
def __eq__(self, other):
    print '__eq__: I am', self.d, self.e, 'and he is', other.d, other.e
    return self.__key() == other.__key()
You will then see a message like:
__eq__: I am 4 5 and he is 10 11
confirming that the equality comparison is posing the equality question to the object already in the set — which is, alas, not the object wrapped with Hettinger's _CaptureEq object.
Update:
And I forgot to suggest a way forward: have you thought about using a dictionary? Since you have an idea here of a key that is a subset of the data inside the object, you might find that splitting out the idea of the key from the idea of the object itself might alleviate the need to attempt this kind of convoluted object interception. Just write a new function that, given an object and your dictionary, computes the key and looks in the dictionary and returns the object already in the dictionary if the key is present else inserts the new object at the key.
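A minimal sketch of that dictionary-based approach (the helper name get_or_insert and the tuple key are my own illustration, reusing the Foo/Bar classes above):
def get_or_insert(registry, obj):
    # The key mirrors Foo.__key(): the attributes that define equivalence.
    key = (obj.a, obj.b, obj.c)
    if key in registry:
        return registry[key]
    registry[key] = obj
    return obj

registry = {}
bar_1 = Bar(1, 2, 3, 4, 5)
bar_2 = Bar(1, 2, 3, 10, 11)
assert get_or_insert(registry, bar_1) is bar_1
assert get_or_insert(registry, bar_2) is bar_1  # the equivalent object already stored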
Update 2: well, look at that — Nick's answer uses a NotImplemented in one direction to force the set to do the comparison in the other direction. Give the guy a few +1's!
There are two issues here. The first is that:
t = _CaptureEq(item)
if t in container:
    return t.match
return default
Doesn't do what you think. In particular, t will never be in container, since _CaptureEq doesn't define __hash__. This becomes more obvious in Python 3, since it will point this out to you rather than providing a default __hash__. The code for _CaptureEq seems to believe that providing an __getattr__ will solve this - it won't, since Python's special method lookups are not guaranteed to go through all the same steps as normal attribute lookups - this is the same reason __hash__ (and various others) need to be defined on a class and can't be monkeypatched onto an instance. So, the most direct way around this is to define _CaptureEq.__hash__ like so:
def __hash__(self):
    return hash(self.obj)
But that still isn't guaranteed to work, because of the second issue: set lookup is not guaranteed to test equality. sets are based on hashtables, and only do an equality test if there's more than one item in a hash bucket. You can't (and don't want to) force items that hash differently into the same bucket, since that's all an implementation detail of set. The easiest way around this issue, and to neatly sidestep the first one, is to use a list instead:
summary = [bar_1]
assert bar_1 == bar_2
bar_equiv = get_equivalent(summary, bar_2)
assert bar_equiv is bar_1
