Most Pythonic way to provide defaults for a class constructor

I am trying to stick to Google's styleguide to strive for consistency from the beginning.
I am currently creating a module, and within this module I have a class. I want to provide some sensible default values for different standard use cases, while still giving the user the flexibility to override any of the defaults. What I currently do is provide a module-scoped "constant" dictionary with the default values for the different use cases, and in my class I give the constructor parameters precedence over the defaults.
Finally, I want to make sure that we end up with valid values for the parameters.
That's what I have done:
MY_DEFAULTS = {"use_case_1": {"x": 1, "y": 2},
               "use_case_2": {"x": 4, "y": 3}}

class MyClass:
    def __init__(self, use_case=None, x=None, y=None):
        self.x = x
        self.y = y
        if use_case:
            if not self.x:
                self.x = MY_DEFAULTS[use_case]["x"]
            if not self.y:
                self.y = MY_DEFAULTS[use_case]["y"]
        assert self.x, "no valid values for 'x' provided"
        assert self.y, "no valid values for 'y' provided"

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)

print(MyClass())                    # AssertionError: no valid values for 'x' provided
print(MyClass("use_case_1"))        # (1, 2)
print(MyClass("use_case_2", y=10))  # (4, 10)
Questions
While this technically works, I was wondering whether it is the most Pythonic way of doing it?
With more and more default values for my class, the code becomes very repetitive; what could I do to simplify that?
assert also seems to me not the best option, as it is rather a debugging statement than a validation check. I was toying with the @property decorator, where I would raise an exception for invalid parameters, but with the current pattern I want to allow x and y to be non-truthy for a short moment in order to implement the precedence properly (that is, I only want to check truthiness at the end of the constructor). Any hints on that?

In general if there is more than one way to reasonably construct your object type, you can provide classmethods for alternate construction (dict.fromkeys is an excellent example of this). Note that this approach is more applicable if your use cases are finite and well defined statically.
class MyClass:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def make_use_case1(cls, x=1, y=2):
        return cls(x, y)

    @classmethod
    def make_use_case2(cls, x=4, y=3):
        return cls(x, y)

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)
If the only variation between the use cases is the default arguments, then re-writing the whole list of positional arguments each time is a lot of overhead. Instead we can write a single classmethod that takes the use case and the optional overrides as keyword arguments.
class MyClass:
    DEFAULTS_PER_USE_CASE = {
        "use_case_1": {"x": 1, "y": 2},
        "use_case_2": {"x": 4, "y": 3},
    }

    @classmethod
    def make_from_use_case(cls, usecase, **overrides):
        # explicit overrides take precedence over the use-case defaults
        args = {**cls.DEFAULTS_PER_USE_CASE[usecase], **overrides}
        return cls(**args)

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)

x = MyClass.make_from_use_case("use_case_1", x=5)
print(x)  # (5, 2)
If you wanted the arguments to be passed positionally, that would be more difficult, but I imagine this will suit your needs.

Python is a very flexible language. If your code runs, there is no technically wrong way of doing things. However, if you want to be "Pythonic", here are a few tips for you. First of all, you should never use AssertionErrors for verifying the presence or value of a parameter. If a parameter is not passed and it should be there, you should raise a TypeError. If the value passed is not acceptable, you should raise a ValueError. Assertions are mainly used for testing.
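For instance, here is a minimal sketch of that convention, assuming for illustration that x and y must be positive numbers:

class MyClass:
    def __init__(self, x=None, y=None):
        # a required value is missing entirely -> TypeError
        if x is None or y is None:
            raise TypeError("both 'x' and 'y' must be provided")
        # the value is present but unacceptable -> ValueError
        if x <= 0 or y <= 0:
            raise ValueError("'x' and 'y' must be positive")
        self.x = x
        self.y = y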
When you want to verify the presence of a value in the parameter a, it is best to check a is not None rather than not a. You can use not a when None, 0, and other falsy values are equally invalid for you; however, when the purpose is to check the presence of a value, 0 and None are not the same.
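A quick illustration of the difference:

x = 0
print(not x)          # True  -- 0 is falsy, so this wrongly reports "missing"
print(x is not None)  # True  -- a value was provided; it just happens to be 0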
Regarding your class, I believe that a nicer way of doing this is unwrapping the values of the dictionary upon class initialization. If you remove use_case from the function signature, you can call your class like this:
MyClass(**MY_DEFAULTS["use_case_1"])
Python will unwrap the values of the nested dictionary and pass them as keyword arguments to your __init__ method. If you do not want the values to be optional, remove the default value and Python will raise a TypeError for you if the parameters provided do not match the function signature.
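A short sketch of that approach, reusing the question's defaults dict:

MY_DEFAULTS = {"use_case_1": {"x": 1, "y": 2},
               "use_case_2": {"x": 4, "y": 3}}

class MyClass:
    def __init__(self, x, y):  # no defaults: both parameters are required
        self.x = x
        self.y = y

MyClass(**MY_DEFAULTS["use_case_1"])  # ok: x=1, y=2
MyClass(x=1)                          # TypeError: missing required argument 'y'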
If you still want your parameters to not be falsy, you should give the possible values a more concrete scope. If the type of x is int and you don't want 0 values, then you should compare x with 0:

def __init__(self, x, y):
    if x == 0 or y == 0:
        raise ValueError("neither 'x' nor 'y' may be 0")
    self.x = x
    self.y = y

Keeping your original interface, you could use **kwargs to read the parameters and, if some are missing, fill in the defaults, but only when the use case matches.
MY_DEFAULTS = {"use_case_1": {"x": 1, "y": 2},
               "use_case_2": {"x": 4, "y": 3}}

class MyClass:
    def __init__(self, use_case=None, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        if use_case:
            for k, v in MY_DEFAULTS[use_case].items():
                if k not in kwargs:
                    setattr(self, k, v)
        unassigned = {'x', 'y'}
        unassigned.difference_update(self.__dict__)
        if unassigned:
            raise TypeError("missing params: {}".format(unassigned))

    def __str__(self):
        return "(%s, %s)" % (self.x, self.y)

print(MyClass("use_case_1"))        # (1, 2)
print(MyClass("use_case_2", y=10))  # (4, 10)
print(MyClass())                    # raises TypeError
executing this:
(1, 2)
(4, 10)
Traceback (most recent call last):
File "<string>", line 566, in run_nodebug
File "C:\Users\T0024260\Documents\module1.py", line 22, in <module>
print(MyClass())
File "C:\Users\T0024260\Documents\module1.py", line 15, in __init__
raise TypeError("missing params: {}".format(unassigned))
TypeError: missing params: {'y', 'x'}
With more and more default values for my class the code becomes very repetitive, what could I do to simplify that?
This solution scales to having many parameters, since nothing has to be repeated per attribute.
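For example, here is a sketch of how the check could scale: the set of required attributes can live on the class instead of being inlined (REQUIRED is a name introduced here purely for illustration; MY_DEFAULTS is the module dict from above):

class MyClass:
    REQUIRED = {'x', 'y'}  # list the required attributes once

    def __init__(self, use_case=None, **kwargs):
        for k, v in kwargs.items():
            setattr(self, k, v)
        if use_case:
            for k, v in MY_DEFAULTS[use_case].items():
                if k not in kwargs:
                    setattr(self, k, v)
        missing = self.REQUIRED - self.__dict__.keys()
        if missing:
            raise TypeError("missing params: {}".format(missing))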

Related

Python method calls in constructor and variable naming conventions inside a class

I am trying to process some data in Python, and I defined a class for a sub-type of that data. You can find a very simplified version of the class definition below.

import numpy as np

class MyDataClass(object):
    def __init__(self, input1, input2, input3):
        """
        input1 and input2 are 1D-arrays
        input3 is a 2D-array
        """
        self._x_value = None      # int
        self._y_value = None      # int
        self.data_array_1 = None  # 2D array
        self.data_array_2 = None  # 1D array
        self.set_data(input1, input2, input3)

    def set_data(self, input1, input2, input3):
        self._x_value, self._y_value = self.get_x_and_y_value(input1, input2)
        self.data_array_1 = self.get_data_array_1(input1)
        self.data_array_2 = self.get_data_array_2(input3)

    @staticmethod
    def get_x_and_y_value(input1, input2):
        # do some stuff
        return x_value, y_value

    def get_data_array_1(self, input1):
        # do some stuff
        return input1[self._x_value:self._y_value + 1]

    def get_data_array_2(self, input3):
        q = self.data_array_1 - input3[self._x_value:self._y_value + 1, :]
        return np.linalg.norm(q, axis=1)
I'm trying to follow the 'Zen of Python' and thereby to write beautiful code. I'm quite skeptical whether the class definition above is good practice or not. While thinking about alternatives I came up with the following questions, on which I would kindly like to get your opinions and suggestions.
Does it make sense to define "get" and "set" methods?
IMHO, as the resulting data will be used several times (in several plots and computation routines), it is more convenient to create and store them once. Hence, I calculate the data arrays once in the constructor.
I do not deal with huge amounts of data, so processing takes no more than a second; however, I cannot estimate the potential implications on RAM if someone were to use the same procedure on huge data.
Should I move the function get_x_and_y_value() out of the class scope and convert the static method into a plain function?
As the method is only called inside the class definition, it seemed better to make it a static method. If I should define it as a function, should I put all the lines relevant to this class inside a script and create a module of it?
The argument names of the function get_x_and_y_value() are the same as in the __init__ method. Should I change them?
It would ease refactoring, but could confuse others who read the code.
In Python, you do not need getter and setter functions. Use properties instead. This is why you can access attributes directly in Python, unlike languages such as Java where you absolutely need getters and setters to protect your attributes.
Consider the following example of a Circle class. Because we can use the @property decorator, we don't need getter and setter functions like other languages do. This is the Pythonic answer, and it should address all of your questions.
class Circle(object):
    def __init__(self, radius):
        self.radius = radius
        self.x = 0
        self.y = 0

    @property
    def diameter(self):
        return self.radius * 2

    @diameter.setter
    def diameter(self, value):
        self.radius = value / 2

    @property
    def xy(self):
        return (self.x, self.y)

    @xy.setter
    def xy(self, xy_pair):
        self.x, self.y = xy_pair
>>> c = Circle(radius=10)
>>> c.radius
10
>>> c.diameter
20
>>> c.diameter = 10
>>> c.radius
5.0
>>> c.xy
(0, 0)
>>> c.xy = (10, 20)
>>> c.x
10
>>> c.y
20

When to store things as part of an instance vs returning them?

I was just wondering when to store things as part of a class instance versus when to use a method to return things. For example, which of the following would be better:
class MClass():
    def __init__(self):
        self.x = self.get_x()
        self.get_y()
        self.z = None
        self.get_z()

    def get_x(self):
        return 2

    def get_y(self):
        self.y = 5 * self.x

    def get_z(self):
        return self.get_x() * self.x
What are the conventions regarding this sort of thing: when should I assign things to self, and when should I return values? Is this essentially a public/private sort of distinction?
You shouldn't return anything from __init__.
Python is not Java. You don't need to include get for everything.
If x is always 2 and y is always 10 and z is always 12, that is a lot of code.
Making some assumptions, I would write that class:
class MClass(object):
    def __init__(self, x):
        self.x = x

    def y(self):
        return self.x * 5

    def z(self):
        return self.x + self.y()
>>> c = MClass(2)
>>> c.x
2
>>> c.y() # note parentheses
10
>>> c.z()
12
This allows x to change later (e.g. c.x = 4) and still give the correct values for y and z.
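For example, continuing with the class above:

c = MClass(2)
c.x = 4
print(c.y())  # 20
print(c.z())  # 24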
You can use the @property decorator:

class MClass():
    def __init__(self):
        self.x = 2

    @property
    def y(self):
        return 5 * self.x

    # plus a setter method to go with it
    @y.setter
    def y(self, value):
        self.x = value / 5

    @property
    def z(self):
        return self.x * self.x
It's a good way of organizing your accessors.
There are no "conventions" regarding this, AFAIK, although there are common practices, which differ from one language to the next.
In python, the general belief is that "everything is public", and there's no reason at all to have a getter method just to return the value of a instance variable. You may, however, need such a method if you need to perform operations on the instance when such variable is accessed.
Your get_y method, for example, only makes sense if you need to recalculate the expression (5 * self.x) every time you access the value. Otherwise, you should simply define the y variable in the instance in __init__ - it's faster (because you don't recalculate the value every time) and it makes your intentions clear (because anyone looking at your code will immediately know that the value does not change)
Finally, some people prefer using properties instead of writing bare get/set methods. There's more info in this question
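A small sketch contrasting the two choices just described (recompute on every access vs. store once in __init__):

class Recomputed:
    def __init__(self, x):
        self.x = x

    @property
    def y(self):
        return 5 * self.x  # re-evaluated on every access; tracks changes to x

class Stored:
    def __init__(self, x):
        self.x = x
        self.y = 5 * x     # computed once; will not track later changes to x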
I read your question as a general Object Oriented development question, rather than a python specific one. As such, the general rule of member data would be to save the data as a member of the class only if it's relevant as part of a particular instance.
As an example, if you have a Screen object which has two dimensions, height and width. Those two should be stored as members. The area associated with a particular instance would return the value associated with a particular instance's height and width.
If there are certain things that seem like they should be calculated on the fly, but might be called over and over again, you can cache them as members as well, but that's really something you should do after you determine that it is a valid trade off (extra member in exchange for faster run time).
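As one way to implement that trade-off (a sketch, assuming Python 3.8+), functools.cached_property computes the value on first access and then stores it on the instance; note that the cached value will not refresh if the inputs change later:

from functools import cached_property

class Screen:
    def __init__(self, height, width):
        self.height = height
        self.width = width

    @cached_property
    def area(self):
        # computed on first access, then cached on the instance
        return self.height * self.width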
get should always do what it says. get_y() and get_z() don't do that.
Better do:
class MClass(object):
    def __init__(self):
        self.x = 2

    @property
    def y(self):
        return 5 * self.x

    @property
    def z(self):
        return self.x * self.x
This makes y and z always depend on the value of x.
You can do
c = MClass()
print c.y, c.z # 10, 4
c.x = 20
print c.y, c.z # 100, 400

How to initialize an instance of a subclass of tuple in Python? [duplicate]

Possible Duplicate:
Subclassing Python tuple with multiple __init__ arguments
I want to define a class which inherits from tuple, and I want to be able to instantiate it using a syntax not supported by tuple. For a simple example, let's say I want to define a class MyTuple which inherits from tuple, and which I can instantiate by passing two values, x and y, to create the (my) tuple (x, y). I've tried the following code:
class MyTuple(tuple):
    def __init__(self, x, y):
        print("debug message")
        super().__init__((x, y))
But when I tried, for example, MyTuple(2, 3) I got an error: TypeError: tuple() takes at most 1 argument (2 given). It seems my __init__ function was not even called (based on the error I got and on the fact my "debug message" was not printed).
So what's the right way to do this?
I'm using Python 3.2.
class MyTuple(tuple):
    def __new__(cls, x, y):
        return tuple.__new__(cls, (x, y))

x = MyTuple(2, 3)
print(x)
# (2, 3)
One of the difficulties of using super is that you do not control which class's method of the same name is going to be called next, so all the classes' methods have to share the same call signature -- at least the same number of items. Since you are changing the number of arguments sent to __new__, you cannot use super here.
Or as Lattyware suggests, you could define a namedtuple,
import collections

MyTuple = collections.namedtuple('MyTuple', 'x y')

p = MyTuple(2, 3)
print(p)
# MyTuple(x=2, y=3)
print(p.x)
# 2
Another approach would be to encapsulate a tuple rather than inherit from it:
>>> class MyTuple(object):
...     count = lambda self, *args: self._tuple.count(*args)
...     index = lambda self, *args: self._tuple.index(*args)
...     __repr__ = lambda self: self._tuple.__repr__()
...     # wrap other methods you need, or define them yourself,
...     # or simply forward all unknown method lookups to _tuple
...     def __init__(self, x, y):
...         self._tuple = x, y
...
>>> x = MyTuple(2, 3)
>>> x
(2, 3)
>>> x.index(3)
1
How practical this is depends on how many capabilities and modifications you need, and whether you need isinstance(MyTuple(2, 3), tuple) to hold.
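Here is a sketch of the "forward all unknown method lookups" variant using __getattr__ (note that special methods such as __len__ or indexing are looked up on the type, so they would still need explicit definitions):

class MyTuple(object):
    def __init__(self, x, y):
        self._tuple = (x, y)

    def __getattr__(self, name):
        # called only when normal lookup fails; delegate to the wrapped tuple
        return getattr(self._tuple, name)

    def __repr__(self):
        return repr(self._tuple)

t = MyTuple(2, 3)
print(t.index(3))  # 1
print(t.count(2))  # 1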

Why is foo(*arg, x) not allowed in Python?

Look at the following example
point = (1, 2)
size = (2, 3)
color = 'red'

class Rect(object):
    def __init__(self, x, y, width, height, color):
        pass
It would be very tempting to call:
Rect(*point, *size, color)
Possible workarounds would be:
Rect(point[0], point[1], size[0], size[1], color)
Rect(*(point + size), color=color)
Rect(*(point + size + (color,)))
But why is Rect(*point, *size, color) not allowed? Is there any semantic ambiguity or general disadvantage you can think of?
EDIT: Specific Questions
Why are multiple *arg expansions not allowed in function calls?
Why are positional arguments not allowed after *arg expansions?
I'm not going to speak to why multiple tuple unpacking isn't part of Python, but I will point out that you're not matching your class to your data in your example.
You have the following code:
point = (1, 2)
size = (2, 3)
color = 'red'

class Rect(object):
    def __init__(self, x, y, width, height, color):
        self.x = x
        self.y = y
        self.width = width
        self.height = height
        self.color = color
but a better way to express your Rect object would be as follows:
class Rect:
    def __init__(self, point, size, color):
        self.point = point
        self.size = size
        self.color = color

r = Rect(point, size, color)
In general, if your data is in tuples, have your constructor take tuples. If your data is in a dict, have your constructor take a dict. If your data is an object, have your constructor take an object, etc.
In general, you want to work with the idioms of the language, rather than try to work around them.
EDIT
Seeing how popular this question is, I'll give you a decorator that allows you to call the constructor however you like.
class Pack(object):
    def __init__(self, *template):
        self.template = template

    def __call__(self, f):
        def pack(*args):
            args = list(args)
            for i, tup in enumerate(self.template):
                if type(tup) != tuple:
                    continue
                for j, typ in enumerate(tup):
                    if type(args[i+j]) != typ:
                        break
                else:
                    args[i:i+j+1] = [tuple(args[i:i+j+1])]
            f(*args)
        return pack
class Rect:
    @Pack(object, (int, int), (int, int), str)
    def __init__(self, point, size, color):
        self.point = point
        self.size = size
        self.color = color
Now you can initialize your object any way you like.
r1 = Rect(point, size, color)
r2 = Rect((1,2), size, color)
r3 = Rect(1, 2, size, color)
r4 = Rect((1, 2), 2, 3, color)
r5 = Rect(1, 2, 2, 3, color)
While I wouldn't recommend using this in practice (it violates the principle that you should have only one way to do it), it does serve to demonstrate that there's usually a way to do anything in Python.
As far as I know, it was a design choice, but there seems to be a logic behind it.
EDIT: the *args notation in a function call was designed so you could pass in a tuple of variables of arbitrary length that could change between calls. In that case, having something like f(*a, *b, c) doesn't make sense as a call: if a changes length, all the elements of b get assigned to the wrong variables, and c isn't in the right place either.
Keeping the language simple, powerful, and standardized is a good thing. Keeping it in sync with what actually goes on in processing the arguments is also a very good thing.
Think about how the language unpacks your function call. If multiple *args were allowed in any order, as in Rect(*point, *size, color), note that all that matters for proper unpacking is that point and size have a total of four elements. So point=(), size=(1, 2, 2, 3), and color='red' would allow Rect(*point, *size, color) to work as a proper call. Essentially, when the language parses *point and *size it treats them as one combined *args tuple, so Rect(*(point + size), color=color) is the more faithful representation.
There never need to be two tuples of arguments passed in the form *args; you can always represent them as one. Since the assignment of parameters depends only on the order within this combined *args list, it makes sense to define it as such.
If you can make function calls like f(*a, *b), the language almost begs to allow you to define functions with multiple *args in the parameter list, and those couldn't be processed. E.g.,
def f(*a, *b):
    return (sum(a), 2 * sum(b))
How would f(1,2,3,4) be processed?
I think this is why, for syntactic concreteness, the language forces function calls and definitions to use the following specific form, like f(a, b, x=1, y=2, *args, **kwargs), which is order-dependent.
Everything there has a specific meaning in a function definition and a function call. a and b are parameters defined without default values; next, x and y are parameters defined with default values (they can be skipped, so they come after the parameters without defaults). Next, *args is populated as a tuple with all the remaining positional arguments from the call that weren't keyword arguments. This comes after the others, as it can change length, and you don't want something whose length varies between calls to affect the assignment of the other variables. At the end, **kwargs takes all the keyword arguments that weren't defined elsewhere. With these concrete definitions you never need multiple *args or **kwargs.
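A quick demonstration of that order-dependent binding:

def f(a, b, x=1, y=2, *args, **kwargs):
    return a, b, x, y, args, kwargs

print(f(10, 20, 30, 40, 50, 60, z=70))
# (10, 20, 30, 40, (50, 60), {'z': 70})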
*point says that you are passing in a whole sequence of items -- something like all the elements in a list, but not as a list.
In this case, you cannot limit how many elements are being passed in. Therefore, there is no way for the interpreter to know which elements of the sequence belong to *point and which to *size.
For example, if you passed the following as input: 2, 5, 3, 4, 17, 87, 4, 0, could you tell me which of those numbers are represented by *point and which by *size? This is the same problem the interpreter would face.
Hope this helps
Python is full of these subtle glitches. For example you can do:
first, second, last = (1, 2, 3)
And you can't do:
first, *others = (1, 2, 3)
But in Python 3 you now can.
Your suggestion probably is going to be suggested in a PEP and integrated or rejected one day. (In fact, PEP 448 later did exactly this: as of Python 3.5, generalized unpacking permits calls like Rect(*point, *size, color).)
Well, in Python 2, you can say:
point = 1, 2
size = 2, 3
color = 'red'

class Rect(object):
    def __init__(self, (x, y), (width, height), color):
        pass
Then you can say:
a_rect = Rect(point, size, color)
taking care that the first two arguments are sequences of len == 2.
NB: This capability has been removed from Python 3.

Possible to use more than one argument on __getitem__?

I am trying to use
__getitem__(self, x, y):
on my Matrix class, but it doesn't seem to work (I still don't know Python very well).
I'm calling it like this:
print matrix[0,0]
Is it possible at all to use more than one argument? Thanks. Maybe I can use only one argument but pass it as a tuple?
__getitem__ only accepts one argument (other than self), so you get passed a tuple.
You can do this:
class matrix:
    def __getitem__(self, pos):
        x, y = pos
        return "fetching %s, %s" % (x, y)

m = matrix()
print m[1, 2]
outputs
fetching 1, 2
See the documentation for object.__getitem__ for more information.
Indeed, when you execute bla[x,y], you're calling type(bla).__getitem__(bla, (x, y)) -- Python automatically forms the tuple for you and passes it on to __getitem__ as the second argument (the first one being its self). There's no good way[1] to express that __getitem__ wants more arguments, but also no need to.
[1] In Python 2.* you can actually give __getitem__ an auto-unpacking signature which will raise ValueError or TypeError when you're indexing with too many or too few indices...:
>>> class X(object):
...     def __getitem__(self, (x, y)): return x, y
...
>>> x = X()
>>> x[23, 45]
(23, 45)
Whether that's "a good way" is moot... it's been deprecated in Python 3, so you can infer that Guido didn't consider it good upon long reflection ;-). Doing your own unpacking (of a single argument in the signature) is no big deal, and it lets you provide clearer errors (and uniform ones, rather than errors of different types for the very similar mistake of indexing such an instance with 1 vs., say, 3 indices ;-).
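Here is a Python 3 sketch of that manual unpacking, raising a uniform TypeError however many indices are given:

class X(object):
    def __getitem__(self, pos):
        try:
            x, y = pos
        except (TypeError, ValueError):
            # one index arrives as a bare value (TypeError on unpacking);
            # three or more arrive as a too-long tuple (ValueError)
            raise TypeError("expected exactly two indices, e.g. obj[a, b]")
        return x, y

x = X()
print(x[23, 45])  # (23, 45)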
No, __getitem__ just takes one argument (in addition to self). In the case of matrix[0, 0], the argument is the tuple (0, 0).
You can directly call __getitem__ instead of using brackets.
Example:
class Foo():
    def __init__(self):
        self.a = [5, 7, 9]

    def __getitem__(self, i, plus_one=False):
        if plus_one:
            i += 1
        return self.a[i]
foo = Foo()
foo[0] # 5
foo.__getitem__(0) # 5
foo.__getitem__(0, True) # 7
I learned today that you can pass a double index to an object that implements __getitem__, as the following snippet illustrates. Note that c[0][0] is really two successive calls: the first [0] returns the inner list, and the second [0] indexes into that list.
class MyClass:
    def __init__(self):
        self.data = [[1]]

    def __getitem__(self, index):
        return self.data[index]

c = MyClass()
print(c[0][0])  # 1
