This is a question regarding the best practice for creating an instance of a class or type from different forms of the same data using Python. Is it better to use a class method, or is it better to use a separate function altogether? Let's say I have a class used to describe the size of a document. (Note: This is simply an example. I want to know the best way to create an instance of the class, not the best way to describe the size of a document.)
class Size(object):
    """
    Utility object used to describe the size of a document.
    """
    BYTE = 8
    KILO = 1024
    def __init__(self, bits):
        self._bits = bits
    @property
    def bits(self):
        return float(self._bits)
    @property
    def bytes(self):
        return self.bits / self.BYTE
    @property
    def kilobits(self):
        return self.bits / self.KILO
    @property
    def kilobytes(self):
        return self.bytes / self.KILO
    @property
    def megabits(self):
        return self.kilobits / self.KILO
    @property
    def megabytes(self):
        return self.kilobytes / self.KILO
My __init__ method takes a size value represented in bits (bits and only bits, and I want to keep it that way), but let's say I have a size value in bytes and I want to create an instance of my class. Is it better to use a class method or is it better to use a separate function altogether?
class Size(object):
    """
    Utility object used to describe the size of a document.
    """
    BYTE = 8
    KILO = 1024
    @classmethod
    def from_bytes(cls, bytes):
        bits = bytes * cls.BYTE
        return cls(bits)
OR
def create_instance_from_bytes(bytes):
    bits = bytes * Size.BYTE
    return Size(bits)
This may not seem like an issue and perhaps both examples are valid, but I think about it every time I need to implement something like this. For a long time I have preferred the class method approach because I like the organisational benefits of tying the class and the factory method together. Also, using a class method preserves the ability to create instances of any subclasses, so it's more object oriented. On the other hand, a friend once said "When in doubt, do what the standard library does" and I have yet to find an example of this in the standard library.
First, most of the time you think you need something like this, you don't; it's a sign that you're trying to treat Python like Java, and the solution is to step back and ask why you need a factory.
Often, the simplest thing to do is to just have a constructor with defaulted/optional/keyword arguments. Even cases that you'd never write that way in Java—even cases where overloaded constructors would feel wrong in C++ or ObjC—may look perfectly natural in Python. For example, size = Size(bytes=20), or size = Size(20, Size.BYTES) look reasonable. For that matter, a Bytes(20) class that inherits from Size and adds absolutely nothing but an __init__ overload looks reasonable. And these are trivial to define:
def __init__(self, *, bits=None, bytes=None, kilobits=None, kilobytes=None):
Or:
BITS, BYTES, KILOBITS, KILOBYTES = 1, 8, 1024, 8192 # or object(), object(), object(), object()
def __init__(self, count, unit=BITS):
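For instance, here is a minimal sketch of how the keyword-only version (and the trivial Bytes subclass mentioned above) might look; the exact argument handling and precedence are my assumptions, not part of the original question:

class Size(object):
    BYTE = 8
    KILO = 1024

    def __init__(self, *, bits=None, bytes=None, kilobits=None, kilobytes=None):
        # A sketch only: assumes exactly one keyword argument is supplied.
        if bits is None:
            if bytes is not None:
                bits = bytes * self.BYTE
            elif kilobits is not None:
                bits = kilobits * self.KILO
            elif kilobytes is not None:
                bits = kilobytes * self.KILO * self.BYTE
        self._bits = bits

class Bytes(Size):
    # Adds nothing but an __init__ overload, as described above.
    def __init__(self, count):
        super().__init__(bits=count * Size.BYTE)

print(Size(bytes=20)._bits)   # 160
print(Bytes(20)._bits)        # 160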
But, sometimes you do need factory functions. So, what do you do then? Well, there are two kinds of things that are often lumped together into "factories".
A @classmethod is the idiomatic way to do an "alternate constructor"; there are examples all over the stdlib: itertools.chain.from_iterable, datetime.datetime.fromordinal, etc.
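Both of those are ordinary classmethods that build an instance from a different representation of the same data, which is exactly the shape of Size.from_bytes:

import datetime
import itertools

# An alternate constructor on datetime: build an instance from a day count
# instead of (year, month, day).
d = datetime.datetime.fromordinal(730000)

# An alternate constructor on chain: build a chain from one iterable of
# iterables instead of several positional iterables.
flat = list(itertools.chain.from_iterable([[1, 2], [3, 4]]))
print(d, flat)   # the second value is [1, 2, 3, 4]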
A function is the idiomatic way to do an "I don't care what the actual class is" factory. Look at, e.g., the built-in open function. Do you know what it returns in 3.3? Do you care? Nope. That's why it's a function, not io.TextIOWrapper.open or whatever.
Your given example seems like a perfectly legitimate use case, and fits pretty clearly into the "alternate constructor" bin (if it doesn't fit into the "constructor with extra arguments" bin).
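To make that concrete, assuming from_bytes is added to the full Size class from the question, the call site reads naturally and, as the question notes, subclasses are preserved (DocumentSize here is a hypothetical subclass):

size = Size.from_bytes(1024)
print(size.kilobits)   # 8.0

class DocumentSize(Size):
    pass

# cls inside from_bytes is the subclass, so the factory keeps working.
print(type(DocumentSize.from_bytes(1024)).__name__)   # DocumentSize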
Related
I have some functions that have implementation details that depend on which type of object is passed to them (specifically, it's to pick the proper method to link Django models to generate QuerySets). Which of the two following options is the more Pythonic way to implement things?
If ladders
def do_something(thing: SuperClass) -> "QuerySet[SomethingElse]":
    if isinstance(thing, SubClassA):
        return thing.property_set.all()
    if isinstance(thing, SubClassB):
        return thing.method()
    if isinstance(thing, SubClassC):
        return a_function(thing)
    if isinstance(thing, SubClassD):
        return SomethingElse.objects.filter(some_property__in=thing.another_property_set.all())
    return SomethingElse.objects.none()
Dictionary
def do_something(thing: SuperClass) -> "QuerySet[SomethingElse]":
    return {
        SubClassA: thing.property_set.all(),
        SubClassB: thing.method(),
        SubClassC: a_function(thing),
        SubClassD: SomethingElse.objects.filter(some_property__in=thing.another_property_set.all()),
    }.get(type(thing), SomethingElse.objects.none())
The dictionary option has less repeated code and fewer lines but the if ladders make PyCharm & MyPy happier (especially with type-checking).
I assume that any performance difference between the two would be negligible unless it's in an inner loop of a frequently-called routine (as in >>1 request/second).
This is exactly the type of problem polymorphism aims to solve, and the "Pythonic" way to solve this problem is to use polymorphism. Following the notion to "encapsulate what varies", I'd recommend creating a base "interface" that all classes implement, then just call a method of the same name on all classes.
I put "interface" in quotation marks, because Python doesn't really have interfaces as they're commonly known in OOP. So, you'll have to make do with using subclasses, and enforcing the method signature manually (i.e. by being careful).
To demonstrate:
class SuperClass:
    # define the method signature here (mostly for documentation purposes)
    def do_something(self):
        pass

class SubClassA(SuperClass):
    # Be careful to override this method with the same signature as shown in
    # SuperClass. (In this case, there aren't any arguments.)
    def do_something(self):
        print("Override A")

class SubClassB(SuperClass):
    def do_something(self):
        print("Override B")

if __name__ == '__main__':
    import random

    a = SubClassA()
    b = SubClassB()
    chosen = random.choice([a, b])
    # We don't have to worry about which subclass was chosen, because they
    # share the same interface. That is, we _know_ there will be a
    # `do_something` method on it that takes no arguments.
    chosen.do_something()
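If it helps to map this back onto the original dispatch function, here is a minimal, non-Django sketch (all names and return values are hypothetical stand-ins) where each subclass owns its own lookup and do_something collapses to a single call:

class SuperClass:
    def related_things(self):
        # analogue of SomethingElse.objects.none()
        return []

class SubClassA(SuperClass):
    def related_things(self):
        # analogue of self.property_set.all()
        return ["from property_set"]

class SubClassB(SuperClass):
    def related_things(self):
        # analogue of self.method()
        return ["from method"]

def do_something(thing: SuperClass):
    # No isinstance ladder and no dict keyed by type: the method lookup
    # does the dispatch.
    return thing.related_things()

print(do_something(SubClassA()))   # ['from property_set']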
I'm working on a code base that has multiple Python modules, each providing specific functionality through a class. The classes are imported elsewhere in the code, and they take a single argument, which is a custom parameters object created from a configuration file.
This works fine in the application, but it's not great for importing the classes on their own to use their functionality elsewhere, because you would have to create a parameters object for each class even if the particular class has a single parameter.
To simplify I have the idea of checking the type of the single argument:
if it's a parameters object, proceed as already implemented
if it's a string, instantiate class in a custom way
class Ruler:
    def __init__(self, parameters):
        if isinstance(parameters, paramsObject):
            self.config = parameters
        elif isinstance(parameters, str):
            self.length = parameters
After this I could handle ruler = Ruler('30cm') without needing to create a parameters object.
The question is: is that good architecture, and are there any principles I'm missing here?
I would say that you have proposed an anti-pattern solution to an anti-pattern problem.
It is somewhat unhelpful that the existing architecture (over)uses paramsObject. After all, named function parameters are there for a reason and this just obfuscates what Ruler really needs in order to instantiate. It isn't much different to having all functions take *args and **kwargs.
Your proposed solution is a sort-of manual function overloading, which Python doesn't have because of the type system. Python has duck typing, which, to paraphrase, says that if it walks like a paramsObject and quacks like a paramsObject then it is a paramsObject.
In other words, the simpler solution would be to work out which values Ruler is looking for in parameters and add only those to a new class:
class RulerParameter:
    def __init__(self, length):
        self.length = length

class Ruler:
    def __init__(self, parameters):
        self.config = parameters

    def get_length(self):
        return self.config.length

my_ruler = Ruler(RulerParameter(30))
print(my_ruler.get_length())
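If the rest of the application still hands you the big paramsObject or a plain value, the "alternate constructor" idea from the first question above can bridge the two worlds. A small sketch, building on the RulerParameter class just shown (the from_length name is mine, not from the original code):

class Ruler:
    def __init__(self, parameters):
        self.config = parameters

    def get_length(self):
        return self.config.length

    @classmethod
    def from_length(cls, length):
        # Assumption: length is all this class needs; wrap it in the small
        # RulerParameter class from above instead of the application-wide object.
        return cls(RulerParameter(length))

standalone_ruler = Ruler.from_length(30)
print(standalone_ruler.get_length())   # 30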
Often when I create classes in Python (and other languages) I struggle to decide which is a better practice: (a) using instance variables in my method functions, or (b) listing the input variables in the function definition.
(a)
class ClassA(object):
    def __init__(self, a):
        self.variable_a = a

    def square_a(self):
        return self.variable_a ** 2
(b)
class ClassB(object):
    def __init__(self, b):
        self.variable_b = b

    def square_b(self, input_var):
        return input_var ** 2
These examples are very simple and obvious, but highlight what I find confusing in regard to which is the better idea. Furthermore, is it taboo to set an instance variable outside of the __init__ method? For example:
class ClassC(object):
    def __init__(self, c):
        self.variable_c = c

    def square_c(self):
        self.square_of_c = self.variable_c ** 2
EDIT: I understand the somewhat-vague nature of this question, but I asked it because it's difficult to know what people expect to see in source code that I write for, say, collaborative projects. If one or more of the examples I gave is an anti-pattern, my thinking was that this question would provide me with helpful insight.
From PEP 20:
There should be one-- and preferably only one --obvious way to do it.
In this example, (b) is not very useful as a member function. It could just as easily be a free function:
def square(input_var):
    return input_var ** 2
This is arguably a better interface as it can be used in any context, not just from an instance of ClassB.
Generally I would go with (a) if I know self.variable_a is the only input the function should need. If I want it to work with anything and it doesn't depend on anything in the class, I would make it a free function. If I want it to work with anything but it does depend on some class state, then make it a member that takes the input as a parameter. As an example, what if ClassA contained both a variable_a and a variable_b? You couldn't use square_a to modify variable_b, which may or may not be desired depending on the actual use case.
Furthermore, is it taboo to set an instance variable outside of the __init__ method?
No, but it's generally a good idea to make sure all members are initialized somewhere around the time of class instantiation. Even if you just initialize your members to None. It is much easier to check if a member variable is None rather than trying to determine whether or not it is defined.
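A minimal sketch of that convention, applied to the ClassC example from the question:

class ClassC(object):
    def __init__(self, c):
        self.variable_c = c
        self.square_of_c = None   # declared up front, filled in later

    def square_c(self):
        self.square_of_c = self.variable_c ** 2

obj = ClassC(3)
if obj.square_of_c is None:   # a simple None check instead of hasattr() gymnastics
    obj.square_c()
print(obj.square_of_c)        # 9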
EDIT: Another few examples:
import logging

# Here a free function makes the most sense because the operation is 'pure',
# i.e. it has no side effects and requires no state besides its input arguments
def square(value):
    return value ** 2

class LoggedCalculator(object):
    def __init__(self, logger):
        self.__logger = logger

    # (a) makes more sense here because it depends on class state and doesn't need to change
    # its behavior by taking in some parameter
    def get_logger(self):
        return self.__logger

    # (b) makes more sense here because we rely on a mixture of class state and some other input
    def square(self, value):
        result = square(value)  # Re-use free function above
        self.__logger.info('{}^2 = {}'.format(value, result))
        return result

calc = LoggedCalculator(logging.getLogger())
calc.square(4)  # This requires an instance of LoggedCalculator
square(2)  # This can be called anywhere, even if you don't have a logger available
Your question is really vague and therefore difficult to answer. All classes seem to have correct syntax.
However, (b) looks more useful to me than (a).
It looks like you just want to return the square of the input variable, so I'd assume you want to calculate the square of different values all the time. If you do that in case (a), you will have to set the variable every time before you call the method.
(b) makes me wonder what you use variable_b for, as it's never really used. But I guess that's due to simplification.
Also, you might consider making square_b a static method, as it doesn't use any object attributes (it never touches self).
class ClassB(object):
    def __init__(self, b):
        self.variable_b = b

    @staticmethod
    def square_b(input_var):
        return input_var ** 2
Variant (c) is also valid in terms of syntax, but some might regard it as bad practice. It's a question of taste really, but many books and websites will advise you to declare instance variables in the __init__ method.
So I have a set of distance functions and respective calculations, e.g. average and comparison,
and I want to be able to iterate over those different distances to compute their values/averages/whatever, and make it easy to add new distances.
Right now, I'm doing that by using nested dictionaries; however, this depends on all the functions existing and working properly, so I was wondering whether there is a design pattern that solves this.
My first idea was a metaclass that defines which functions need to exist and classes that implement these functions. However, then there would be no meaningful instances of those Distance classes.
My second idea then was defining a Distance class and having the functions as attributes of that class, but that seems like bad style.
Example for the second idea:
import numpy as np

class Distance:
    def __init__(self, distf, meanf):
        self.distf = distf
        self.meanf = meanf

    def dist(self, x1, x2):
        return self.distf(x1, x2)

    def mean(self, xs):
        return self.meanf(xs)

d = Distance(lambda x, y: abs(x - y), np.mean)
d.dist(1, 2)  ## returns 1 as expected
d.mean([1, 2])  ## returns 1.5 as expected
This works (and also enforces the existence, and maybe even properties, of the functions), but as stated above it feels like rather bad style. I do not plan to publish this code; it's just about keeping it clean and organized, if that is relevant.
I hope the question is clear; if not, please don't hesitate to comment and I will try to clarify.
EDIT:
- @victor: Everything should be initially set. At runtime only selection should occur.
- @abarnert: Mostly habitual, also to restrict usage (np.mean needs to be called without an axis argument in this example), but that should hopefully not be relevant since I'm not publishing this.
- @juanpa: gonna look into that
It seems that simple inheritance is what you need.
So, you create a base class BaseSpace which is basically an interface:
from abc import ABC

class BaseSpace(ABC):
    @staticmethod
    def dist(x1, x2):
        raise NotImplementedError()

    @staticmethod
    def mean(xs):
        raise NotImplementedError()
Then you just inherit this interface with all different combinations of the functions you need, implementing the methods either inside the class (if you are using them once only) or outside, and just assigning them in the class definition:
import numpy as np

class ExampleSpace(BaseSpace):
    @staticmethod
    def dist(x1, x2):
        return abs(x1 - x2)

    mean = staticmethod(np.mean)
Because of Python's duck typing approach (which also applies to interface definitions), you don't really need the base class defined at all, but it helps to show what is expected of each of your "Space" classes.
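Since the goal was to iterate over the different distances, a short usage sketch might look like this (SquaredSpace is a made-up second space, defined the same way as ExampleSpace):

import numpy as np

class SquaredSpace(BaseSpace):
    @staticmethod
    def dist(x1, x2):
        return (x1 - x2) ** 2

    mean = staticmethod(np.mean)

data = [1, 2, 4]
for space in (ExampleSpace, SquaredSpace):
    # Each "Space" class exposes the same two static methods, so adding a new
    # distance is just adding another subclass to this tuple.
    print(space.__name__, space.dist(1, 4), space.mean(data))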
I have this problem in a fun project where I have to tell a class to execute an operation for a certain time period, with the start and end times specified (along with the interval, of course).
For the sake of argument consider that class A (the class that has the API that I call) has to call classes B (which calls class C and which in turn calls class D) and E (which in turn calls class F and class G).
A -> B -> C -> D
A -> E -> F, G
Now classes B, C and E require the context about the time. I've currently set it up so that class A passes the context to classes B and E, and they in turn pass the context around as needed.
I'm trying to figure out the best way to solve this requirement without passing context around and as much as I hate it, I was considering using the Highlander (or the Singleton) or the Borg pattern (a variant on the Monostate). I just wanted to know what my options were with regard to these patterns.
Option 1
Use a traditional Borg, e.g.:
class Borg:
    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state
I could simply instantiate this class everywhere and have access to the global state that I want.
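For example, a quick demonstration of the shared state (the time attribute name is just illustrative):

a = Borg()
a.time = (0, 10, 'd')   # set the context once, anywhere in the code

b = Borg()
print(b.time)           # (0, 10, 'd') -- every instance shares the same __dict__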
Option 2
Another variant on the monostate:
class InheritableBorg:
    __time = (start, end, interval)

    def __init__(self):
        pass

    @property
    def time(self):
        return self.__time

    @time.setter
    def time(self, time):
        type(self).__time = time
This in theory would allow me to simply extend by doing:
class X(InheritableBorg):
    pass
and to extend I would just do:
class NewInheritableBorg(InheritableBorg):
    __time = (0, 0, 'd')
Then in theory I could leverage multiple inheritance and would be able to get access to multiple Borgs in one go, e.g.:
class X(InheritableBorg1, InheritableBorg2):
    pass
I could even selectively override stuff as well.
Option 3
Use a wrapped nested function as a class decorator/wrapper, if possible. However, I could only use this once and would need to pass the function handle around. This is based off a mix of the proxy/delegation ideas.
This is not concrete in my head but in essence something like:
def timer(time):
    def class_wrapper(class_to_wrap):
        class Wrapper(class_to_wrap):
            def __init__(self, *a, **kw):
                super().__init__(*a, **kw)

            def get_time(self):
                return time
        return Wrapper
    return class_wrapper

@timer(time)
class A_that_needs_time_information:
    pass
I think that might work... BUT I still need to pass the function handle.
Summary
All of these are possible solutions, and I'm leaning towards the multiple inheritance Borg pattern (though the class wrapper is cool).
The regular Borg pattern has to be instantiated so many times that it seems like too much overhead just to store one set of values.
The Borg mixin is instantiated as many times as the class is instantiated. I don't see how it would be any harder to test.
The wrapper is ultra generic and would be relatively easy to test. Theoretically, since it's a function closure, I should be able to mutate stuff, but then it essentially becomes a singleton (which just seems too complicated, in which case I might as well just use the regular singleton pattern). Also, passing the function handle around rather defeats the purpose.
Barring these three ways, are there any other ways of doing this? Any better ways (the Borg doesn't seem easily testable since there's no DI)? Are there any drawbacks that I seem to have missed?
Alternatively, I could just stick with what I have now... passing the time around as required. It satisfies the loose coupling requirement and DI best practices... but it's just so cumbersome. There has to be a better way!
As a general rule of thumb, try to keep things simple. You're considering very complex solutions to a relatively simple problem.
With the information you've given, it seems you could just wrap the request for execution of an operation in an object that also includes the context. Then you'd just be passing around one object between the different classes. For example:
class OperationRequestContext:
    def __init__(self, start_time, end_time):
        self.start_time = start_time
        self.end_time = end_time

class OperationRequest:
    def __init__(self, operation, context):
        self.operation = operation
        self.context = context
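A short sketch of how that single object would then flow through the call chain (collapsing the intermediate classes from the question for brevity; the method names here are hypothetical):

class D:
    def execute(self, request):
        ctx = request.context
        print(f"{request.operation}: {ctx.start_time} -> {ctx.end_time}")

class B:
    def __init__(self):
        self.d = D()

    def handle(self, request):
        # Only the one request object travels down the chain.
        self.d.execute(request)

class A:
    def __init__(self):
        self.b = B()

    def run(self, operation, start_time, end_time):
        request = OperationRequest(operation, OperationRequestContext(start_time, end_time))
        self.b.handle(request)

A().run("aggregate", "2024-01-01", "2024-01-02")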
If there are additional requirements that justify considering more complex solutions, you should specify them.