How to implement a strategy pattern with runtime selection of a method? - python

Context
I'm trying to implement some variant of strategy pattern in Python 2.7.
I want to be able to instantiate a 'my_strategy' base class, but switch between different implementations of a 'score' method at run-time.
I will have many common methods in 'my_strategy' but a bunch of 'score' implementations.
The main illustrates how I want to use it.
Here the scoring implementation is a dummy, of course.
What I tried (i.e. My code so far)
strategy.py:
from algo_one import *
#from algo_two import *
class my_strategy(object):

    def __init__(self, candidate=""):
        self.candidate = candidate
        self.method = 'default'
        self.no = 10
        self._algo = algo_one

    def set_strategy(self, strategy='default'):
        self.strategy = strategy
        if self.strategy == 'algo_one':
            self._algo = algo_one
        elif self.strategy == 'algo_two':
            # self._algo = algo_two
            pass
        else:
            self._algo = None

    def score(self, *args):
        if len(args) > 0:
            self.candidate = args[0]
        self._algo.score(self.candidate)

if __name__ == "__main__":
    s = my_strategy()
    s.strategy = 'algo_one'
    s.candidate = "hello world"
    print s.score()
    print s.score("hi")
    # s.set_method('algo_two')
    # print s.score("hi")
I want to save the selected strategy in some sort of private pointer to the sub-class method.
algo_one.py:
from strategy import my_strategy
class algo_one(my_strategy):

    def score(self, candidate):
        return len(candidate) * self.no
I could have a class-less method, but later I'll need to access public variables of the base class.
algo_two.py:
from strategy import my_strategy
class algo_two(my_strategy):

    def score(self, candidate):
        return len(candidate) * 3
I have an empty __init__.py too.
The errors
1.
in score self._algo.score(self.candidate)
TypeError: unbound method score() must be called with algo_one
instance as first argument (got str instance instead)
2.
If I uncomment the import of the second strategy:
from algo_two import *
I get the following error.
ImportError: cannot import name my_strategy
My guess is that I run into some sort of circular dependency.
3.
from algo_one import *
This is obviously not pretty (unable to detect undefined names), but if I
from algo_one import algo_one
I get
ImportError: cannot import name algo_one
Question
I think the errors are intertwined and that my approach, as a whole, may be flawed. If not just addressing the error, I'm looking for suggestions to improve the design. Or any comment, really. Also I'm open to suggestions regarding the title of this question. Thank you!

You make it much more complicated than it needs to be. Python functions are first-class objects, so the simplest way to implement the strategy pattern in Python is to pass a 'strategy' function to your "context" object (the one that uses the strategy). The nice part is that any callable object (i.e. any object implementing the __call__ method) will work.
def default_score_strategy(scorer):
    return len(scorer.candidate) * 3

def universal_answer_score_strategy(scorer):
    return 42  # definitely the universal answer <g>

class ComplicatedStrategy(object):
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, scorer):
        return len(scorer.candidate) * self.factor

class Scorer(object):
    def __init__(self, candidate="", strategy=default_score_strategy):
        self.candidate = candidate
        self.strategy = strategy

    def score(self):
        return self.strategy(self)

s1 = Scorer("foo")
s2 = Scorer("bar", strategy=universal_answer_score_strategy)
s3 = Scorer("baaz", strategy=ComplicatedStrategy(365))
Note that your strategies don't have to be in the same module as the Scorer class (well, except the default one of course), and the module containing the Scorer class doesn't have to import the strategies' modules - nor know anything about where the strategies are defined:
# main.py
from mylib.scores import Scorer
from myapp.strategies import my_custom_strategy
s = Scorer("yadda", my_custom_strategy)

You don't instantiate your algo object in the __init__ method. Remember, to instantiate a class object, you need to call it:
self._algo = algo_one()
Yes, that's a circular dependency. I don't see however why algo_one and algo_two need to inherit from my_strategy at all. Just make them plain objects, or inherit a base class stored somewhere else. Or, keep them all in the same file - there's no reason to necessarily have classes in separate files in Python.
This is the same problem as 2.
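To make that concrete, here is a minimal sketch along the lines of this answer (my own illustration, not the original poster's files): keep the algorithms as plain classes in one module and instantiate the chosen one in set_strategy.

class algo_one(object):
    def score(self, candidate, no):
        return len(candidate) * no

class algo_two(object):
    def score(self, candidate, no):
        return len(candidate) * 3

class my_strategy(object):
    def __init__(self, candidate=""):
        self.candidate = candidate
        self.no = 10
        self._algo = algo_one()        # note the call: an instance, not the class

    def set_strategy(self, strategy='algo_one'):
        self._algo = algo_one() if strategy == 'algo_one' else algo_two()

    def score(self, *args):
        if args:
            self.candidate = args[0]
        return self._algo.score(self.candidate, self.no)

s = my_strategy()
s.set_strategy('algo_one')
print(s.score("hello world"))   # -> 110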

One of your main problems is that your algorithms try to subclass your base class, which is a huge design flaw (you already noticed that). Use a simple method binding instead, which deals with all the necessary things:
def algo_one(candidate):
    # do stuff
    return "A fluffy unicorn"

def algo_two(candidate):
    # do some other stuff
    return "Awesome rabbits"

# not really necessary, just to make it easier to add new algorithms
STRATEGIES = {"one": algo_one, "two": algo_two}

class Strategy(object):
    def __init__(self):
        ...

    def set_strategy(self, which):
        if which not in STRATEGIES:
            raise ValueError("'%s' is an unknown strategy" % which)
        # compatibility checks about the entries in STRATEGIES omitted here
        self._algo = STRATEGIES[which]

    def score(self, *args):
        # ...
        return self._algo(...)
If you need a more complex approach (this however depends on your requirements), in which everyone knows about each other, split the algorithms and strategy chooser into different classes referencing each other (shortened version below):
class ScoreAlgo(object):
    def __init__(self, parent):
        self._strategy = parent  # if you need a back-reference, just be aware of circular dependencies in the garbage collection

    def __del__(self):
        self._strategy = None  # resolve circular dependency for the GC

    def score(self, candidate):
        return None

class Strategy(object):
    def __init__(self):
        ...

    def set_strategy(self, ...):
        ...
        self._algo = ScoreAlgo(self)

    def score(self, ...):
        return self._algo.score(...)
(If you need a huge variety of algorithms, you should make ScoreAlgo an abstract base class, for which subclasses have to implement the score() method).
You also could use a mixin pattern (which is a bit more formal than the method binding) or several other ways. This however depends on your overall requirements.
EDIT: I just added a return to both def score(): stubs to avoid confusion about why those might not return anything.
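For completeness, a rough sketch of the mixin route mentioned above (my own example with made-up names): the scoring behaviour is chosen at class-definition time by mixing a scoring class into a strategy class.

class StrategyBase(object):
    def __init__(self, candidate=""):
        self.candidate = candidate
        self.no = 10

class AlgoOneMixin(object):
    def score(self, candidate):
        return len(candidate) * self.no

class AlgoTwoMixin(object):
    def score(self, candidate):
        return len(candidate) * 3

class StrategyOne(AlgoOneMixin, StrategyBase):
    pass

class StrategyTwo(AlgoTwoMixin, StrategyBase):
    pass

print(StrategyOne().score("hello world"))   # -> 110
print(StrategyTwo().score("hi"))            # -> 6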

Related

How to annotate a return type as either a class instance or its (unique) subclass instance?

I am writing a Python library used by importing and (optionally) sub-classing some of the 'helper classes' it provides. I fail to come up with a design that lets static analysis tools properly recognise the types that my 'helper classes' methods deal with. Here's an MWE illustrating (one of) the issues I run into:
My lib
from typing import Dict

class Thing:
    def shout(self):
        print(f"{self} says AAAAAAAAAaaaaaaaa")

class ContainerOfThings:
    def __init__(self):
        thing_cls = self._thing_cls = get_unique_subclass(Thing)
        self._things: Dict[str, thing_cls] = {}

    def add_thing(self, id_: str):
        self._things[id_] = self._thing_cls()

    def get_thing(self, id_: str):
        return self._things[id_]

def get_unique_subclass(cls):
    # this works but maybe there's a better way to do this?
    classes = cls.__subclasses__()
    if len(classes) == 0:
        return cls
    elif len(classes) == 1:
        return classes[0]
    elif len(classes) > 1:
        raise RuntimeError(
            "This class should only be subclassed once", cls, classes
        )
What I expect users to do with it
class BetterThing(Thing):
    def be_civilized(self):
        print(f"{self} says howdy!")

container = ContainerOfThings()
container.add_thing("some_id")
thingy = container.get_thing("some_id")
thingy.be_civilized()
thingy.do_something_invalid()  # here I would like mypy to detect that this will not work
This snippet does not alarm static analysis tools, because thingy is detected as Any, but fails at runtime on the last line because do_something_invalid() is not defined. Isn't it possible to give hints that thingy is in fact an instance of BetterThing here?
My attempts so far:
Attempt 1
Annotate ContainerOfThings._things as Dict[str, Thing] instead of Dict[str, thing_cls]
This passes mypy, but pycharm detects thingy as an instance of Thing and thus complains about "Unresolved attribute reference 'be_civilized' for class 'Thing'"
Attempt 2
Annotate ContainerOfThings.get_thing() return value as Thing
Less surprisingly, this triggers errors from both pycharm and mypy about Thing not having the 'be_civilized' attribute.
Attempt 3
Use ThingType = TypeVar("ThingType", bound=Thing) as return value for ContainerOfThings.get_thing()
I believe (?) that this is what TypeVar is intended for, and it works, except for the fact that mypy then requires thingy to be annotated with BetterThing, along with every return value of ContainerOfThings.get_thing(), which will be quite cumbersome with my 'real' library.
Is there an elegant solution for this? Is get_unique_subclass() too dirty a trick to play nice with static analysis? Is there something clever to do with typing_extensions.Protocol that I could not come up with?
Thanks for your suggestions.
Basically you need ContainerOfThings to be generic:
https://mypy.readthedocs.io/en/stable/generics.html#defining-generic-classes
And then I think it would be better for ContainerOfThings to be explicit about the type of thing that it will generate instead of auto-magically locating some sub-class that has been defined.
We can put this together in a way that will satisfy mypy (and I would expect pycharm too, though I haven't tried it)...
from typing import Dict, Generic, Type, TypeVar

class Thing:
    def shout(self):
        print(f"{self} says AAAAAAAAAaaaaaaaa")

T = TypeVar('T', bound=Thing)

class ContainerOfThings(Generic[T]):
    def __init__(self, thing_cls: Type[T]):
        self._thing_cls = thing_cls
        self._things: Dict[str, T] = {}

    def add_thing(self, id_: str):
        self._things[id_] = self._thing_cls()

    def get_thing(self, id_: str) -> T:
        return self._things[id_]

class BetterThing(Thing):
    def be_civilized(self):
        print(f"{self} says howdy!")

container = ContainerOfThings(BetterThing)
container.add_thing("some_id")
thingy = container.get_thing("some_id")
thingy.be_civilized()  # OK
thingy.do_something_invalid()  # error: "BetterThing" has no attribute "do_something_invalid"

Multiple Python Inheritance with abstract class calls __new__ method wrong number of arguments

I have the following situation:
from abc import ABC, abstractmethod

class AbstractSpeaker(ABC):
    @abstractmethod
    def say_hello(self) -> None:
        pass
    """... other abstract and maybe even concrete methods ..."""

class BasicGreeter:
    """implements a basic say_hello method so that Speakers
    don't need to implement it"""

    def __new__(cls):
        r = object.__new__(cls)
        r._language = Language.English  # assume that Language is an appropriate Enum
        return r

    @property
    def language(self) -> Language:
        return self._language

    @language.setter
    def language(self, newLanguage: Language):
        self._language = newLanguage

    def say_hello(self) -> None:
        if self.language is Language.English:
            print("Hello")
        elif self.language is Language.German:
            print("Hallo")
        elif self.language is Language.Italian:
            print("Ciao")
        else:
            print("Hi")

class Person(BasicGreeter, AbstractSpeaker):
    def __init__(self, name):
        self.name = name
    """goes on to implement everything AbstractSpeaker speaker commands but should
    use the say_hello method from BasicGreeter, who should also hold the language
    information.
    """
But when I want to generate a new Person I get a TypeError:
>>> peter=Person('Peter')
Traceback (most recent call last):
  File "<pyshell#19>", line 1, in <module>
    peter=Person('Peter')
TypeError: __new__() takes 1 positional argument but 2 were given
Now I get that it has something to do with the fact that I have implemented __new__ in BasicGreeter: when I rewrite BasicGreeter to use __init__ instead of __new__ and then explicitly call super().__init__() in Person's __init__, it works. But I wanted to know if there is any way to do this with __new__, as having to invoke super().__init__() in every speaker's __init__ is a source of bugs if it is ever forgotten. Any ideas? Can this (a class implementing basic compliance in some areas for an abstract class, so that children of that abstract class don't need to implement the same code) be done? I know that I could always go the three-generation route and make BasicGreeter an abstract class, but I think this is the more elegant way, as it allows BasicGreeter to be reused in other class families. For example, I might have a class Robot with a subclass FriendlyRobot that could also make use of inheriting from BasicGreeter, but a Robot is not necessarily a speaker and vice versa.
The error points at the solution: simply let __new__ take the extra arguments, even if you don't use them:
from abc import ABC, abstractmethod

class Language():
    English = 'english'

class AbstractSpeaker(ABC):
    @abstractmethod
    def say_hello(self) -> None:
        pass
    """... other abstract and maybe even concrete methods ..."""

class BasicGreeter:
    """implements a basic say_hello method so that Speakers don't need to implement it"""

    def __new__(cls, *args, **kwargs):  # Fix to your code
        # print(args, kwargs)
        r = object.__new__(cls)
        r._language = Language.English
        return r

    @property
    def language(self) -> Language:
        return self._language

    @language.setter
    def language(self, newLanguage: Language):
        self._language = newLanguage

    def say_hello(self) -> None:
        if self.language is Language.English:
            print("Hello")
        elif self.language is Language.German:
            print("Hallo")
        elif self.language is Language.Italian:
            print("Ciao")
        else:
            print("Hi")

class Person(BasicGreeter, AbstractSpeaker):
    def __init__(self, name):
        self.name = name
    """goes on to implement everything AbstractSpeaker speaker commands but should
    use the say_hello method from BasicGreeter, who should also hold the language
    information.
    """
This piece of code works. That said, I'm not sure why you need such a structure inside __new__: is it a way of making a singleton?
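A quick sanity check of the fixed version (my own demo; the printed greeting assumes the Language stub above, which only defines English):

peter = Person('Peter')   # no TypeError any more: __new__ now accepts the extra argument
print(peter.name)         # -> Peter
peter.say_hello()         # -> Hello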

Registering classes to factory with classes in different files

I have a factory as shown in the following code:
class ClassFactory:
    registry = {}

    @classmethod
    def register(cls, name):
        def inner_wrapper(wrapped_class):
            if name in cls.registry:
                print(f'Class {name} already exists. Will replace it')
            cls.registry[name] = wrapped_class
            return wrapped_class
        return inner_wrapper

    @classmethod
    def create_type(cls, name):
        exec_class = cls.registry[name]
        type = exec_class()
        return type

@ClassFactory.register('Class 1')
class M1():
    def __init__(self):
        print("Starting Class 1")

@ClassFactory.register('Class 2')
class M2():
    def __init__(self):
        print("Starting Class 2")
This works fine and when I do
if __name__ == '__main__':
    print(ClassFactory.registry.keys())
    foo = ClassFactory.create_type("Class 2")
I get the expected result of:
dict_keys(['Class 1', 'Class 2'])
Starting Class 2
Now the problem is that I want to isolate classes M1 and M2 to their own files m1.py and m2.py, and in the future add other classes using their own files in a plugin manner.
However, simply placing it in their own file
m2.py
from test_ import ClassFactory

@ClassFactory.register('Class 2')
class M2():
    def __init__(self):
        print("Starting Class 2")
gives the result dict_keys(['Class 1']) since it never gets to register the class.
So my question is: How can I ensure that the class is registered when placed in a file different from the factory, without making changes to the factory file whenever I want to add a new class? How to self register in this way? Also, is this decorator way a good way to do this kind of thing, or are there better practices?
Thanks
How can I ensure that the class is registered when placed in a file different from the factory, without making changes to the factory file whenever I want to add a new class?
I'm playing around with a similar problem, and I've found a possible solution. It seems too much of a 'hack' though, so set your critical thinking levels to 'high' when reading my suggestion below :)
As you've mentioned in one of your comments above, the trick is to force the loading of the individual *.py files that contain individual class definitions.
Applying this to your example, this would involve:
Keeping all class implementations in a specific folder, e.g., structuring the files as follows:
.
├─ factory.py        # file with the ClassFactory class
└─ classes/
   ├─ __init__.py
   ├─ m1.py          # file with M1 class
   └─ m2.py          # file with M2 class
Adding the following statement to the end of your factory.py file, which will take care of loading and registering each individual class:
from classes import *
Add a piece of code like the snippet below to your __init__.py within the classes/ folder, so as to dynamically load all classes [1]:
from inspect import isclass
from pkgutil import iter_modules
from pathlib import Path
from importlib import import_module

# iterate through the modules in the current package
package_dir = Path(__file__).resolve().parent
for (_, module_name, _) in iter_modules([package_dir]):
    # import the module and iterate through its attributes
    module = import_module(f"{__name__}.{module_name}")
    for attribute_name in dir(module):
        attribute = getattr(module, attribute_name)
        if isclass(attribute):
            # Add the class to this package's variables
            globals()[attribute_name] = attribute
If I then run your test code, I get the desired result:
# test.py
from factory import ClassFactory

if __name__ == "__main__":
    print(ClassFactory.registry.keys())
    foo = ClassFactory.create_type("Class 2")

$ python test.py
dict_keys(['Class 1', 'Class 2'])
Starting Class 2
Also, is this decorator way a good way to do this kind of thing, or are there better practices?
Unfortunately, I'm not experienced enough to answer this question. However, when searching for answers to this problem, I've come across the following sources that may be helpful to you:
[2] : this presents a method for registering class existence based on Python Metaclasses. As far as I understand, it relies on the registering of subclasses, so I don't know how well it applies to your case. I did not follow this approach, as I've noticed that the new edition of the book suggests the use of another technique (see bullet below).
[3], item 49 : this is the 'current' suggestion for subclass registering, which relies on the definition of the __init_subclass__() function in a base class.
If I had to apply the __init_subclass__() approach to your case, I'd do the following:
Add a Registrable base class to your factory.py (and slightly re-factor ClassFactory), like this:
class Registrable:
    def __init_subclass__(cls, name: str):
        ClassFactory.register(name, cls)

class ClassFactory:
    registry = {}

    @classmethod
    def register(cls, name: str, sub_class: Registrable):
        if name in cls.registry:
            print(f'Class {name} already exists. Will replace it')
        cls.registry[name] = sub_class

    @classmethod
    def create_type(cls, name):
        exec_class = cls.registry[name]
        type = exec_class()
        return type

from classes import *
Slightly modify your concrete classes to inherit from the Registrable base class, e.g.:
from factory import Registrable
class M2(Registrable, name='Class 2'):
    def __init__(self):
        print("Starting Class 2")

Pickle a Dynamically Created Mixin-Class

I am dynamically creating classes using mixins, following the pattern in this SO answer, but it's giving me pickling errors:
https://stackoverflow.com/a/28205308/2056246
(In case anyone is curious why on earth I'd want to do this, it's really useful for machine learning when you might want to test different combinatorial pipelines of operations).
Here's a minimally reproducible example:
from abc import ABCMeta

class AutoMixinMeta(ABCMeta):
    """
    Helps us conditionally include Mixins, which is useful if we want to switch between different
    combinations of models (ex. SBERT with Doc Embedding, RoBERTa with positional embeddings).

        class Sub(metaclass = AutoMixinMeta):
            def __init__(self, name):
                self.name = name
    """

    def __call__(cls, *args, **kwargs):
        try:
            mixin = kwargs.pop('mixin')
            if isinstance(mixin, list):
                mixin_names = list(map(lambda x: x.__name__, mixin))
                mixin_name = '.'.join(mixin_names)
                cls_list = tuple(mixin + [cls])
            else:
                mixin_name = mixin.__name__
                cls_list = tuple([mixin, cls])
            name = "{}With{}".format(cls.__name__, mixin_name)
            cls = type(name, cls_list, dict(cls.__dict__))
        except KeyError:
            pass
        return type.__call__(cls, *args, **kwargs)

class Mixer(metaclass = AutoMixinMeta):
    """ Class to mix different elements in.

        a = Mixer(config=config, mixin=[A, B, C])
    """
    pass

class A():
    pass

class B():
    pass

config = {
    'test_a': True,
    'test_b': True
}

def get_mixins(config):
    mixins = []
    if config['test_a']:
        mixins.append(A)
    if config['test_b']:
        mixins.append(B)
    return mixins

to_mix = get_mixins(config)
c = Mixer(mixin=to_mix)

import pickle
pickle.dump(c, open('tmp/test.pkl', 'wb'))
---------------------------------------------------------------------------
PicklingError Traceback (most recent call last)
<ipython-input-671-a7661c543ef8> in <module>
23
24 import pickle
---> 25 pickle.dump(c, open('tmp/test.pkl', 'wb'))
PicklingError: Can't pickle <class '__main__.MixerWithA.B'>: attribute lookup MixerWithA.B on __main__ failed
I've read several answers focusing on using __reduce__:
https://stackoverflow.com/a/11526524/2056246
https://stackoverflow.com/a/11493777/2056246
Pickling dynamically created classes
But none of them deal with this dynamic creation on the metaclass level.
Has anyone run into this problem before/have any idea how to solve it? I don't know enough about metaclasses unfortunately to understand how to approach this problem.
Not enough time right now for a full answer, with working code -
but since you have the metaclass already, probably this can be made to work
by simply implementing methods used by the pickling protocol on the metaclass itself - __getstate__ and __setstate__. Use the __qualname__ of the base classes as a tuple of strings as the serialized information you will need to re-create the pickled classes later.
(ping me in the comments if you are reading this 48h+ from now and there is no example code yet)
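In the meantime, here is a minimal, untested sketch of the __reduce__ route the question already mentions (my own code, not the metaclass-level __getstate__/__setstate__ idea above): instead of pickling the dynamically created class by reference, store the recipe (the mixin list plus the instance state) and rebuild the object at unpickling time. _rebuild_mixer is a hypothetical module-level helper.

import pickle

def _rebuild_mixer(mixins, state):
    # Re-run the metaclass machinery with the same mixin list, then restore state.
    obj = Mixer(mixin=list(mixins))
    obj.__dict__.update(state)
    return obj

class Mixer(metaclass=AutoMixinMeta):
    def __reduce__(self):
        # For a dynamically mixed instance, __bases__ is (mixin..., Mixer);
        # the mixins themselves (A, B) are module-level classes and pickle fine.
        mixins = [b for b in type(self).__bases__ if b is not Mixer]
        return (_rebuild_mixer, (mixins, self.__dict__))

c = Mixer(mixin=[A, B])
data = pickle.dumps(c)    # pickles the recipe instead of the class MixerWithA.B
c2 = pickle.loads(data)   # rebuilds the mixed class via the metaclass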

Improper use of __new__ to generate class instances?

I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data source (i.e. "nfs://192.168.1.3" or "s3://mybucket/data") etc.
I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:
import os

class FileSystem(object):

    class NoAccess(Exception):
        pass

    def __new__(cls, path):
        if cls is FileSystem:
            if path.upper().startswith('NFS://'):
                return super(FileSystem, cls).__new__(Nfs)
            else:
                return super(FileSystem, cls).__new__(LocalDrive)
        else:
            return super(FileSystem, cls).__new__(cls, path)

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    def __init__(self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')

print type(data1)
print type(data2)
print data2.count_files()
I thought this would be a good use of __new__ but most posts I read about its use discourage it. Is there a more accepted way to approach this problem?
I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".
If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (however see Python 3.6+ Update below). Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.
That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important way in which it's better — which is now adding subclasses doesn't require modifying the __new__() method. This means it's now easily extensible since it effectively supports what you could call virtual constructors.
A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.
# Works in Python 2 and 3.
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')

    @classmethod
    def _get_all_subclasses(cls):
        """ Recursive generator of all class' subclasses. """
        for subclass in cls.__subclasses__():
            yield subclass
            for subclass in subclass._get_all_subclasses():
                yield subclass

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass using path prefix. """
        path_prefix = cls._get_prefix(path)

        for subclass in cls._get_all_subclasses():
            if subclass.prefix == path_prefix:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
        else:  # No subclass with matching prefix found (& no default defined)
            raise FileSystem.Unknown(
                'path "{}" has no known file system prefix'.format(path))

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    prefix = 'nfs'

    def __init__(self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    prefix = None  # Default when no file system prefix is found.

    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                       for filename in os.listdir(self.path))

if __name__ == '__main__':
    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.

    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive
    print(data2.count_files())   # -> <some number>
Python 3.6+ Update
The code above works in both Python 2 and 3.x. However in Python 3.6 a new class method was added to object named __init_subclass__() which makes the finding of subclasses simpler by using it to automatically create a "registry" of them instead of potentially having to check every subclass recursively as the _get_all_subclasses() method is doing in the above.
I got the idea of using __init_subclass__() to do this from the Subclass registration section in the PEP 487 -- Simpler customisation of class creation proposal. Since the method will be inherited by all the base class' subclasses, registration will automatically be done for sub-subclasses, too (as opposed to only to direct subclasses) — it completely eliminates the need for a method like _get_all_subclasses().
# Requires Python 3.6+
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Pattern for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    @classmethod
    def __init_subclass__(cls, /, path_prefix, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = cls._registry.get(path_prefix)
        if subclass:
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise cls.Unknown(
                f'path "{path}" has no known file system prefix')

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem, path_prefix='nfs'):
    def __init__(self, path):
        pass

    def count_files(self):
        pass

class Ufs(Nfs, path_prefix='ufs'):
    def __init__(self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise self.NoAccess(f'Cannot read directory {path!r}')
        self.path = path

    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                       for filename in os.listdir(self.path))

if __name__ == '__main__':
    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.
    data4 = FileSystem('ufs://192.168.1.18')

    print(type(data1))  # -> <class '__main__.Nfs'>
    print(type(data2))  # -> <class '__main__.LocalDrive'>
    print(f'file count: {data2.count_files()}')  # -> file count: <some number>

    try:
        data3 = FileSystem('c:/foobar')  # A non-existent directory.
    except FileSystem.NoAccess as exc:
        print(f'{exc} - FileSystem.NoAccess exception raised as expected')
    else:
        raise RuntimeError("Non-existent path should have raised Exception!")

    try:
        data4 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(f'{exc} - FileSystem.Unknown exception raised as expected')
    else:
        raise RuntimeError("Unregistered path prefix should have raised Exception!")
In my opinion, using __new__ in such a way is really confusing for other people who might read your code. It also requires somewhat hackish code to distinguish between guessing the file system from user input and creating Nfs or LocalDrive directly via their corresponding classes.
Why not make a separate function with this behaviour? It can even be a static method of the FileSystem class:
class FileSystem(object):

    # other code ...

    @staticmethod
    def from_path(path):
        if path.upper().startswith('NFS://'):
            return Nfs(path)
        else:
            return LocalDrive(path)
And you call it like this:
data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
Edit [BLUF]: there is no problem with the answer provided by @martineau; this post is merely a follow-up for completeness, discussing a potential error encountered when using additional keywords in a class definition that are not managed by the metaclass.
I'd like to supply some additional information on the use of __init_subclass__ in conjunction with using __new__ as a factory. The answer that @martineau has posted is very useful and I have implemented an altered version of it in my own programs as I prefer using the class creation sequence over adding a factory method to the namespace; very similar to how pathlib.Path is implemented.
To follow up on a comment trail with @martineau I have taken the following snippet from his answer:
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    @classmethod
    def __init_subclass__(cls, /, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None

    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = FileSystem._registry.get(path_prefix)
        if subclass:
            # Using "object" base class method avoids recursion here.
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise FileSystem.Unknown(
                f'path "{path}" has no known file system prefix')

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem, path_prefix='nfs'):
    def __init__(self, path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                       for filename in os.listdir(self.path))

if __name__ == '__main__':
    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.

    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive
    print(data2.count_files())   # -> <some number>

    try:
        data3 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(str(exc), '- raised as expected')
    else:
        raise RuntimeError(
            "Unregistered path prefix should have raised Exception!")
This answer, as written, works, but I wish to address a few items (potential pitfalls) that others may run into through inexperience or perhaps because of codebase standards their team requires.
Firstly, for the decorator on __init_subclass__, per the PEP:
One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.
Not a problem since it's already implied, and the Zen tells us "explicit is better than implicit"; nevertheless, when abiding by PEPs, there you go (the rationale is explained further there).
In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here:
class Nfs(FileSystem, path_prefix='nfs'): ...
class LocalDrive(FileSystem, path_prefix=None): ...
When browsing through the PEP:
As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.
Why is this (potentially) problematic? Well there are several questions (notably this) describing the problem surrounding additional keyword arguments in a class definition, use of metaclasses (subsequently the metaclass= keyword) and the overridden __init_subclass__. However, that doesn't explain why it works in the currently given solution. The answer: kwargs.pop().
If we look at the following:
# code in CPython 3.7
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    def __init_subclass__(cls, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.

    ...

class Nfs(FileSystem, path_prefix='nfs'): ...
This will still run without issue, but if we remove the kwargs.pop():
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)  # throws TypeError
        cls._registry[path_prefix] = cls  # Add class to registry.
The error thrown is already known and described in the PEP:
In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out it's keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.
What is happening is the path_prefix= keyword is being "popped" off of kwargs, not just accessed, so then **kwargs is now empty and passed up the MRO and thus compliant with the default implementation (receiving no keyword arguments).
To avoid this entirely, I propose not relying on kwargs but instead use that which is already present in the call to __init_subclass__, namely the cls reference:
# code in CPython 3.7
import os
import re

class FileSystem(object):

    class NoAccess(Exception): pass
    class Unknown(Exception): pass

    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[cls._path_prefix] = cls  # Add class to registry.

    ...

class Nfs(FileSystem):
    _path_prefix = 'nfs'
    ...
Adding the prior keyword as a class attribute also extends the use in later methods if one needs to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to supplied keywords in the definition (without some complexity) and this seemed trivial and useful.
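A tiny illustration of that last point (my own snippet; describe() is a hypothetical helper added to the Nfs subclass defined above, other methods omitted):

class Nfs(FileSystem):
    _path_prefix = 'nfs'

    def __init__(self, path):
        self.path = path

    def describe(self):
        # The class attribute used for registration is reusable later.
        return '{} share at {}'.format(self._path_prefix, self.path)

print(FileSystem('nfs://192.168.1.18').describe())   # -> nfs share at nfs://192.168.1.18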
So to @martineau: I apologize for my comments seeming confusing; there was only so much space to type them, and as shown above, the issue was more detailed.
