Add "collection" of attributes directly to top level of a class - python

I am trying to capture (S3) logs in a structured way. I am capturing the access-related elements with this type of tuple:
class _Access(NamedTuple):
time: datetime
ip: str
actor: str
request_id: str
action: str
key: str
request_uri: str
status: int
error_code: str
I then have a class that uses this named tuple as follows (edited just down to relevant code):
class Logs:
def __init__(self, log: str):
raw_logs = match(S3_LOG_REGEX, log)
if raw_logs is None:
raise FormatError(log)
logs = raw_logs.groups()
timestamp = datetime.strptime(logs[2], "%d/%b/%Y:%H:%M:%S %z")
http_status = int(logs[9])
access = _Access(
timestamp,
logs[3],
logs[4],
logs[5],
logs[6],
logs[7],
logs[8],
http_status,
logs[10],
)
self.access = access
The problem is that it is too verbose when I now want to use it:
>>> log_struct = Logs(raw_log)
>>> log_struct.access.action # I don't want to have to add `access`
As I mention above, I'd rather be able to do something like this:
>>> log_struct = Logs(raw_log)
>>> log_struct.action
But I still want to have this clean named tuple called _Access. How can I make everything from access available at the top level?
Specifically, I have this line:
self.access = access
which is giving me that extra "layer" that I don't want. I'd like to be able to "unpack" it somehow, similar to how we can unpack arguments by passing the star in *args. But I'm not sure how I can unpack the tuple in this case.

What you really need for your use case is an alternative constructor for your NamedTuple subclass that parses a log entry string into its respective fields. This can be done by creating a class method that calls the __new__ method with arguments parsed from the input string.
Using just the fields of ip and action as a simplified example:
from typing import NamedTuple
class Logs(NamedTuple):
ip: str
action: str
@classmethod
def parse(cls, log: str) -> 'Logs':
return cls.__new__(cls, *log.split())
log_struct = Logs.parse('192.168.1.1 GET')
print(log_struct)
print(log_struct.ip)
print(log_struct.action)
This outputs:
Logs(ip='192.168.1.1', action='GET')
192.168.1.1
GET
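As a side note, the generated named-tuple machinery offers an equivalent spelling that avoids calling __new__ directly: the _make() class method builds an instance from any iterable of field values. A minimal sketch of the same simplified example, assuming the same whitespace-separated input:
from typing import NamedTuple

class Logs(NamedTuple):
    ip: str
    action: str

    @classmethod
    def parse(cls, log: str) -> 'Logs':
        # _make() builds the tuple from an iterable of field values,
        # exactly like passing them to the generated constructor
        return cls._make(log.split())

print(Logs.parse('192.168.1.1 GET'))  # Logs(ip='192.168.1.1', action='GET')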

I agree with @blhsing and recommend that solution. This assumes that there are no extra attributes that need to be applied to the named tuple (say, storing the raw log value).
If you really need the object to remain composed, another way to support accessing the fields of the _Access instance would be to override the __getattr__ method of Logs (see PEP 562 for the analogous module-level hook):
The __getattr__ function at the module level should accept one argument which is the name of an attribute and return the computed value or raise an AttributeError:

def __getattr__(name: str) -> Any: ...

If an attribute is not found on a module object through the normal lookup (i.e. object.__getattribute__), then __getattr__ is searched in the module __dict__ before raising an AttributeError. If found, it is called with the attribute name and the result is returned. Looking up a name as a module global will bypass module __getattr__. This is intentional, otherwise calling __getattr__ for builtins will significantly harm performance.
E.g.
from typing import NamedTuple, Any
class _Access(NamedTuple):
foo: str
bar: str
class Logs:
def __init__(self, log: str) -> None:
self.log = log
self.access = _Access(*log.split())
def __getattr__(self, name: str) -> Any:
return getattr(self.access, name)
When you request an attribute of Logs which is not present, the lookup falls through to the Logs.access attribute, meaning you can write code like this:
logs = Logs("fizz buzz")
print(f"{logs.log=}, {logs.foo=}, {logs.bar=}")
logs.log='fizz buzz', logs.foo='fizz', logs.bar='buzz'
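One optional refinement (not part of the original suggestion): because __getattr__ is only consulted for failed lookups, the forwarded field names will not show up in dir(logs) or most autocompletion. If that matters, you could also delegate __dir__ by adding a method like this to Logs:
    def __dir__(self):
        # merge the normal attribute list with the named tuple's field names
        return [*super().__dir__(), *self.access._fields]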
Note that this would not preserve the typing information through to the Logs object in most static analyzers and autocompletes. That to me would be a compelling enough reason not to do this, and continue to use the more verbose way of accessing values as you describe in your question.
If you still really need this and want to remain type safe, then I would add properties to the Logs class which fetch from the _Access object.
class Logs:
def __init__(self, log: str) -> None:
self.log = log
self.access = _Access(*log.split())
@property
def foo(self) -> str:
return self.access.foo
@property
def bar(self) -> str:
return self.access.bar
This avoids the type safety issues and, depending on how much code you write using the Logs instances, can still cut down dramatically on other boilerplate.
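With that in place, usage looks the same as in the __getattr__ example, but type checkers and IDEs now see foo and bar as ordinary str properties:
logs = Logs("fizz buzz")
print(f"{logs.foo=}, {logs.bar=}")  # logs.foo='fizz', logs.bar='buzz'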

Related

How to annotate a return type as either a class instance or its (unique) subclass instance?

I am writing a Python library used by importing and (optionally) subclassing some of the 'helper classes' it provides. I fail to come up with a design that would let static analysis tools properly recognise the types that my 'helper classes' methods deal with. Here's an MWE illustrating (one of) the issues I run into:
My lib
from typing import Dict
class Thing:
def shout(self):
print(f"{self} says AAAAAAAAAaaaaaaaa")
class ContainerOfThings:
def __init__(self):
thing_cls = self._thing_cls = get_unique_subclass(Thing)
self._things: Dict[str, thing_cls] = {}
def add_thing(self, id_: str):
self._things[id_] = self._thing_cls()
def get_thing(self, id_: str):
return self._things[id_]
def get_unique_subclass(cls):
# this works but maybe there's a better way to do this?
classes = cls.__subclasses__()
if len(classes) == 0:
return cls
elif len(classes) == 1:
return classes[0]
elif len(classes) > 1:
raise RuntimeError(
"This class should only be subclassed once", cls, classes
)
What I expect users to do with it
class BetterThing(Thing):
def be_civilized(self):
print(f"{self} says howdy!")
container = ContainerOfThings()
container.add_thing("some_id")
thingy = container.get_thing("some_id")
thingy.be_civilized()
thingy.do_something_invalid() # here I would like mypy to detect that this will not work
This snippet does not alarm static analysis tools, because thingy is detected as Any, but fails at runtime on the last line because do_something_invalid() is not defined. Isn't it possible to give hints that thingy is in fact an instance of BetterThing here?
My attempts so far:
Attempt 1
Annotate ContainerOfThings._things as Dict[str, Thing] instead of Dict[str, thing_cls]
This passes mypy, but pycharm detects thingy as an instance of Thing and thus complains about "Unresolved attribute reference 'be_civilized' for class 'Thing'"
Attempt 2
Annotate ContainerOfThings.get_thing() return value as Thing
Less surprisingly, this triggers errors from both pycharm and mypy about Thing not having the 'be_civilized' attribute.
Attempt 3
Use ThingType = TypeVar("ThingType", bound=Thing) as return value for ContainerOfThings.get_thing()
I believe (?) that this is what TypeVar is intended for, and it works, except for the fact that mypy then requires thingy to be annotated with BetterThing, along with every return value of ContainerOfThings.get_thing(), which will be quite cumbersome with my 'real' library.
Is there an elegant solution for this? Is get_unique_subclass() too dirty a trick to play nice with static analysis? Is there something clever to do with typing_extensions.Protocol that I could not come up with?
Thanks for your suggestions.
Basically you need ContainerOfThings to be generic:
https://mypy.readthedocs.io/en/stable/generics.html#defining-generic-classes
And then I think it would be better for ContainerOfThings to be explicit about the type of thing that it will generate instead of auto-magically locating some sub-class that has been defined.
We can put this together in a way that will satisfy mypy (and I would expect pycharm too, though I haven't tried it)...
from typing import Dict, Generic, Type, TypeVar
class Thing:
def shout(self):
print(f"{self} says AAAAAAAAAaaaaaaaa")
T = TypeVar('T', bound=Thing)
class ContainerOfThings(Generic[T]):
def __init__(self, thing_cls: Type[T]):
self._thing_cls = thing_cls
self._things: Dict[str, T] = {}
def add_thing(self, id_: str):
self._things[id_] = self._thing_cls()
def get_thing(self, id_: str) -> T:
return self._things[id_]
class BetterThing(Thing):
def be_civilized(self):
print(f"{self} says howdy!")
container = ContainerOfThings(BetterThing)
container.add_thing("some_id")
thingy = container.get_thing("some_id")
thingy.be_civilized() # OK
thingy.do_something_invalid() # error: "BetterThing" has no attribute "do_something_invalid"
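If you want to double-check what the checker infers, reveal_type() is handy (mypy understands it without an import; at runtime it is available from typing on Python 3.11+ or from typing_extensions). A quick sketch against the code above — note that mypy prints fully qualified names in its actual output:
container = ContainerOfThings(BetterThing)
reveal_type(container)                        # mypy: ContainerOfThings[BetterThing]
reveal_type(container.get_thing("some_id"))   # mypy: BetterThing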

Typing for decorator that wraps attrs.frozen and adds a new field

I am trying to set up a class decorator in Python that acts like attr.frozen but adds an additional field before creation (as well as a few other things). While the code works fine, I'm having trouble getting mypy to realize that the new class has the new field. I've tried to do this through a combination of a custom mypy plugin (exactly as described in attr's documentation) and a Protocol that defines that the new class has the given field. In summary, the code breaks down as follows (all in a single file, although I've broken it up here).
It should be noted I'm running Python 3.7, so I'm using typing_extensions where needed, but I believe this problem persists regardless of version.
First define the Protocol that should inform mypy that the new class has the new field (called added here):
from typing_extensions import Protocol
class Proto(Protocol):
def __init__(self, added: float, *args, **kwargs):
...
@property
def added(self) -> float:
...
Now define the field_transformer function that adds the new field, as per attr's documentation:
from typing import Type, List
import attr
def _field_transformer(cls: type, fields: List[attr.Attribute]) -> List[attr.Attribute]:
return [
# For some reason mypy has trouble with attr.Attribute's signature
# Bonus points if someone can point out a fix that doesn't use type: ignore
attr.Attribute ( # type: ignore
"added", # name
attr.NOTHING, # default
None, # validator
True, # repr
None, # cmp
None, # hash
True, # init
False, # inherited
type=float,
order=float,
),
*fields,
]
Now, finally, set up a class decorator that does what we want:
from functools import wraps
from typing import Callable, TypeVar
_T = TypeVar("_T", bound=Proto)
_C = TypeVar("_C", bound=type)
def transform(_cls: _C = None, **kwargs):
def transform_decorator(cls: _C) -> Callable[[], Type[_T]]:
@wraps(cls)
def wrapper() -> Type[_T]:
if "field_transformer" not in kwargs:
kwargs["field_transformer"] = _field_transformer
return attr.frozen(cls, **kwargs)
return wrapper()
if _cls is None:
return transform_decorator
return transform_decorator(_cls)
And now for the (failing) mypy tests:
@transform
class Test:
other_field: str
# E: Too many arguments for "Test"
# E: Argument 1 to "Test" has incompatible type "float"; expected "str"
t = Test(0.0, "hello, world")
print(t.added) # E: "Test" has no attribute "added"
Ideally I'd like mypy to eliminate all three of these errors. I am frankly not sure whether this is possible; it could be that the dynamic addition of an attribute is just not typeable and we may have to force users of our library to write custom typing stubs when they use the decorator. However, since we always add the same attribute(s) to the generated class, it would be great if there is a solution, even if that means writing a custom mypy plugin that supports this decorator in particular (if that's even possible).

Python lambdas for __bool__, __str__, etc

Often in Python it is helpful to make use of duck typing. For instance, imagine I have an object spam whose prompt attribute controls the prompt text in my application. Normally, I would say something like:
spam.prompt = "fixed"
for a fixed prompt. However, a dynamic prompt can also be achieved: while I can't change the spam class to use a function as the prompt, the underlying spam object calls str() on it, so thanks to duck typing I can create a dynamic prompt like so:
class MyPrompt:
def __str__( self ):
return eggs.get_user_name() + ">"
spam.prompt = MyPrompt()
This principle could be extended to make any attribute dynamic, for instance:
class MyEnabled:
def __bool__( self ):
return eggs.is_logged_in()
spam.enabled = MyEnabled()
Sometimes though, it would be more succinct to have this inline, i.e.
spam.prompt = lambda: eggs.get_user_name() + ">"
spam.enabled = eggs.is_logged_in
These of course don't work, because neither the __str__ of the lambda nor the __bool__ of the function returns the actual value of the call.
I feel like a solution for this should be simple; am I missing something, or do I need to wrap my function in a class every time?
What you want are computed attributes. Python's support for computed attributes is the descriptor protocol, which has a generic implementation as the builtin property type.
Now the trick is that, as documented (cf. the link above), descriptors only work when they are class attributes. Your code snippet is incomplete as it doesn't contain the definition of the spam object, but I assume it's a class instance, so you cannot just do spam.something = property(...), as the descriptor protocol wouldn't then be invoked on property().
The solution here is the good old "strategy" design pattern: use properties (or custom descriptors, but if you only have a couple of such attributes the builtin property will work just fine) that delegate to a "strategy" function:
def default_prompt_strategy(obj):
return "fixed"
def default_enabled_strategy(obj):
return False
class Spam(object):
def __init__(self, prompt_strategy=default_prompt_strategy, enabled_strategy=default_enabled_strategy):
self.prompt = prompt_strategy
self.enabled = enabled_strategy
@property
def prompt(self):
return self._prompt_strategy(self)
@prompt.setter
def prompt(self, value):
if not callable(value):
raise TypeError("PromptStrategy must be a callable")
self._prompt_strategy = value
@property
def enabled(self):
return self._enabled_strategy(self)
@enabled.setter
def enabled(self, value):
if not callable(value):
raise TypeError("EnabledtStrategy must be a callable")
self._enabled_strategy = value
class Eggs(object):
def is_logged_in(self):
return True
def get_user_name(self):
return "DeadParrot"
eggs = Eggs()
spam = Spam(enabled_strategy=lambda obj: eggs.is_logged_in())
spam.prompt = lambda obj: "{}>".format(eggs.get_user_name())
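With those strategies assigned, accessing the attributes now goes through the properties and calls the strategies, for example:
print(spam.prompt)   # -> DeadParrot>
print(spam.enabled)  # -> True

# reverting to a fixed prompt just means supplying a different callable
spam.prompt = lambda obj: "fixed>"
print(spam.prompt)   # -> fixed>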

Typing dict mixin class with Mypy

I'm trying to write a small mixin class to somewhat bridge the Set and MutableMapping types: I want the mapping types to have the ability to receive some objects (bytes), hash them, and store them so they are accessible by that hash.
Here's a working version of mixing this class with standard dict:
from hashlib import blake2b
class HashingMixin:
def add(self, content):
digest = blake2b(content).hexdigest()
self[digest] = content
class HashingDict(dict, HashingMixin):
pass
However I can't figure out how to add type annotations.
From https://github.com/python/mypy/issues/1996 it seems the mixin has to subclass abc.ABC and abc.abstractmethod-define all the methods it expects to call, so here's my shot:
import abc
from hashlib import blake2b
from typing import Dict
class HashingMixin(abc.ABC):
def add(self, content: bytes) -> None:
digest = blake2b(content).hexdigest()
self[digest] = content
@abc.abstractmethod
def __getitem__(self, key: str) -> bytes:
raise NotImplementedError
@abc.abstractmethod
def __setitem__(self, key: str, content: bytes) -> None:
raise NotImplementedError
class HashingDict(Dict[str, bytes], HashingMixin):
pass
Then Mypy complains about the HashingDict definition:
error: Definition of "__getitem__" in base class "dict" is incompatible with definition in base class "HashingMixin"
error: Definition of "__setitem__" in base class "dict" is incompatible with definition in base class "HashingMixin"
error: Definition of "__setitem__" in base class "MutableMapping" is incompatible with definition in base class "HashingMixin"
error: Definition of "__getitem__" in base class "Mapping" is incompatible with definition in base class "HashingMixin"
Revealing types with:
reveal_type(HashingMixin.__getitem__)
reveal_type(HashingDict.__getitem__)
yields:
error: Revealed type is 'def (coup.content.HashingMixin, builtins.str) -> builtins.bytes'
error: Revealed type is 'def (builtins.dict[_KT`1, _VT`2], _KT`1) -> _VT`2'
I don't know what is wrong :(
This appears to be a bug in mypy -- see this TODO in the code mypy uses to analyze the MRO of classes using multiple inheritance. In short, mypy is incorrectly (and completely) ignoring that you've parameterized Dict with concrete values, and is instead analyzing the code as if you were using a bare Dict.
I believe https://github.com/python/mypy/issues/5973 is probably the most relevant issue in the issue tracker: the root cause is the same.
Until that bug is fixed, you can suppress the errors mypy is generating on that line by adding a # type: ignore to whatever line has the errors. So in your case, you could do the following:
import abc
from hashlib import blake2b
from typing import Dict
class HashingMixin(abc.ABC):
def add(self, content: bytes) -> None:
digest = blake2b(content).hexdigest()
self[digest] = content
@abc.abstractmethod
def __getitem__(self, key: str) -> bytes:
raise NotImplementedError
@abc.abstractmethod
def __setitem__(self, key: str, content: bytes) -> None:
raise NotImplementedError
class HashingDict(Dict[str, bytes], HashingMixin): # type: ignore
pass
If you decide to take this approach, I recommend also leaving an additional comment documenting why you're suppressing those errors and running mypy with the --warn-unused-ignores flag.
The former is for the benefit of any future readers of your code; the latter will make mypy report a warning whenever it encounters a # type: ignore that is not actually suppressing any errors and so can safely be deleted.
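If you prefer to make that flag permanent rather than passing it on every run, the equivalent setting can go in your mypy configuration file (mypy.ini or the [mypy] section of setup.cfg), for example:
[mypy]
warn_unused_ignores = True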
(And of course, you can always take a stab at contributing a fix yourself!)

Improper use of __new__ to generate class instances?

I'm creating some classes for dealing with filenames in various types of file shares (nfs, afp, s3, local disk) etc. I get as user input a string that identifies the data source (e.g. "nfs://192.168.1.3" or "s3://mybucket/data"), etc.
I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:
import os
class FileSystem(object):
class NoAccess(Exception):
pass
def __new__(cls,path):
if cls is FileSystem:
if path.upper().startswith('NFS://'):
return super(FileSystem,cls).__new__(Nfs)
else:
return super(FileSystem,cls).__new__(LocalDrive)
else:
return super(FileSystem,cls).__new__(cls,path)
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem):
def __init__ (self,path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem):
def __init__(self,path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')
print type(data1)
print type(data2)
print data2.count_files()
I thought this would be a good use of __new__, but most posts I read about its use discourage it. Is there a more accepted way to approach this problem?
I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".
If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (however see Python 3.6+ Update below). Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.
That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important improvement: adding subclasses no longer requires modifying the __new__() method. This means it's now easily extensible, since it effectively supports what you could call virtual constructors.
A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.
# Works in Python 2 and 3.
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
@classmethod
def _get_all_subclasses(cls):
""" Recursive generator of all class' subclasses. """
for subclass in cls.__subclasses__():
yield subclass
for subclass in subclass._get_all_subclasses():
yield subclass
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass using path prefix. """
path_prefix = cls._get_prefix(path)
for subclass in cls._get_all_subclasses():
if subclass.prefix == path_prefix:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # No subclass with matching prefix found (& no default defined)
raise FileSystem.Unknown(
'path "{}" has no known file system prefix'.format(path))
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem):
prefix = 'nfs'
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem):
prefix = None # Default when no file system prefix is found.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
print(type(data1).__name__) # -> Nfs
print(type(data2).__name__) # -> LocalDrive
print(data2.count_files()) # -> <some number>
Python 3.6+ Update
The code above works in both Python 2 and 3.x. However in Python 3.6 a new class method was added to object named __init_subclass__() which makes the finding of subclasses simpler by using it to automatically create a "registry" of them instead of potentially having to check every subclass recursively as the _get_all_subclasses() method is doing in the above.
I got the idea of using __init_subclass__() to do this from the Subclass registration section in the PEP 487 -- Simpler customisation of class creation proposal. Since the method will be inherited by all the base class' subclasses, registration will automatically be done for sub-subclasses, too (as opposed to only to direct subclasses) — it completely eliminates the need for a method like _get_all_subclasses().
# Requires Python 3.6+ for __init_subclass__ (3.8+ as written, due to the "/" positional-only marker).
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Pattern for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
@classmethod
def __init_subclass__(cls, /, path_prefix, **kwargs):
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass. """
path_prefix = cls._get_prefix(path)
subclass = cls._registry.get(path_prefix)
if subclass:
return object.__new__(subclass)
else: # No subclass with matching prefix found (and no default).
raise cls.Unknown(
f'path "{path}" has no known file system prefix')
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem, path_prefix='nfs'):
def __init__ (self, path):
pass
def count_files(self):
pass
class Ufs(Nfs, path_prefix='ufs'):
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem, path_prefix=None): # Default file system.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise self.NoAccess(f'Cannot read directory {path!r}')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
data4 = FileSystem('ufs://192.168.1.18')
print(type(data1)) # -> <class '__main__.Nfs'>
print(type(data2)) # -> <class '__main__.LocalDrive'>
print(f'file count: {data2.count_files()}') # -> file count: <some number>
try:
data3 = FileSystem('c:/foobar') # A non-existent directory.
except FileSystem.NoAccess as exc:
print(f'{exc} - FileSystem.NoAccess exception raised as expected')
else:
raise RuntimeError("Non-existent path should have raised Exception!")
try:
data4 = FileSystem('foobar://42') # Unregistered path prefix.
except FileSystem.Unknown as exc:
print(f'{exc} - FileSystem.Unknown exception raised as expected')
else:
raise RuntimeError("Unregistered path prefix should have raised Exception!")
In my opinion, using __new__ in such a way is really confusing for other people who might read your code. Also, it requires somewhat hackish code to distinguish between guessing the file system from user input and creating Nfs and LocalDrive directly with their corresponding classes.
Why not make a separate function with this behaviour? It can even be a static method of the FileSystem class:
class FileSystem(object):
# other code ...
@staticmethod
def from_path(path):
if path.upper().startswith('NFS://'):
return Nfs(path)
else:
return LocalDrive(path)
And you call it like this:
data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
Edit [BLUF]: there is no problem with the answer provided by @martineau; this post is merely a follow-up, for completeness, to discuss a potential error encountered when using additional keywords in a class definition that are not managed by the metaclass.
I'd like to supply some additional information on the use of __init_subclass__ in conjunction with using __new__ as a factory. The answer that @martineau has posted is very useful and I have implemented an altered version of it in my own programs, as I prefer using the class creation sequence over adding a factory method to the namespace; very similar to how pathlib.Path is implemented.
To follow up on a comment trail with @martineau, I have taken the following snippet from his answer:
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
@classmethod
def __init_subclass__(cls, /, **kwargs):
path_prefix = kwargs.pop('path_prefix', None)
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.
@classmethod
def _get_prefix(cls, s):
""" Extract any file system prefix at beginning of string s and
return a lowercase version of it or None when there isn't one.
"""
match = cls._PATH_PREFIX_PATTERN.match(s)
return match.group(1).lower() if match else None
def __new__(cls, path):
""" Create instance of appropriate subclass. """
path_prefix = cls._get_prefix(path)
subclass = FileSystem._registry.get(path_prefix)
if subclass:
# Using "object" base class method avoids recursion here.
return object.__new__(subclass)
else: # No subclass with matching prefix found (and no default).
raise FileSystem.Unknown(
f'path "{path}" has no known file system prefix')
def count_files(self):
raise NotImplementedError
class Nfs(FileSystem, path_prefix='nfs'):
def __init__ (self, path):
pass
def count_files(self):
pass
class LocalDrive(FileSystem, path_prefix=None): # Default file system.
def __init__(self, path):
if not os.access(path, os.R_OK):
raise FileSystem.NoAccess('Cannot read directory')
self.path = path
def count_files(self):
return sum(os.path.isfile(os.path.join(self.path, filename))
for filename in os.listdir(self.path))
if __name__ == '__main__':
data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('c:/') # Change as necessary for testing.
print(type(data1).__name__) # -> Nfs
print(type(data2).__name__) # -> LocalDrive
print(data2.count_files()) # -> <some number>
try:
data3 = FileSystem('foobar://42') # Unregistered path prefix.
except FileSystem.Unknown as exc:
print(str(exc), '- raised as expected')
else:
raise RuntimeError(
"Unregistered path prefix should have raised Exception!")
This answer works as written, but I wish to address a few items (potential pitfalls) that others may run into through inexperience or perhaps the codebase standards their team requires.
Firstly, for the decorator on __init_subclass__, per the PEP:
One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.
Not a problem since it's already implied, and the Zen tells us "explicit is better than implicit"; nevertheless, when abiding by the PEP, there you go (and the rationale is explained further there).
In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here:
class Nfs(FileSystem, path_prefix='nfs'): ...
class LocalDrive(FileSystem, path_prefix=None): ...
When browsing through the PEP:
As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.
Why is this (potentially) problematic? Well there are several questions (notably this) describing the problem surrounding additional keyword arguments in a class definition, use of metaclasses (subsequently the metaclass= keyword) and the overridden __init_subclass__. However, that doesn't explain why it works in the currently given solution. The answer: kwargs.pop().
If we look at the following:
# code in CPython 3.7
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
def __init_subclass__(cls, **kwargs):
path_prefix = kwargs.pop('path_prefix', None)
super().__init_subclass__(**kwargs)
cls._registry[path_prefix] = cls # Add class to registry.
...
class Nfs(FileSystem, path_prefix='nfs'): ...
This will still run without issue, but if we remove the kwargs.pop():
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs) # throws TypeError
cls._registry[path_prefix] = cls # Add class to registry.
The error thrown is already known and described in the PEP:
In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out its keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.
What is happening is that the path_prefix= keyword is being "popped" off of kwargs, not just accessed, so **kwargs is empty by the time it is passed up the MRO and is therefore compliant with the default implementation (which receives no keyword arguments).
To avoid this entirely, I propose not relying on kwargs, but instead using what is already present in the call to __init_subclass__, namely the cls reference:
# code in CPython 3.7
import os
import re
class FileSystem(object):
class NoAccess(Exception): pass
class Unknown(Exception): pass
# Regex for matching "xxx://" where x is any non-whitespace character except for ":".
_PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
_registry = {} # Registered subclasses.
def __init_subclass__(cls, **kwargs):
super().__init_subclass__(**kwargs)
cls._registry[cls._path_prefix] = cls # Add class to registry.
...
class Nfs(FileSystem):
_path_prefix = 'nfs'
...
Adding the prior keyword as a class attribute also extends its use to later methods, if one needs to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to keywords supplied in the class definition (without some complexity), and this approach is both trivial and useful.
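For example (a hypothetical convenience method, not in the original code), the base class can use the registered prefix in ordinary methods, and every subclass inherits the behaviour:
    def describe(self):
        # reports the prefix this subclass was registered with
        return f"{type(self).__name__} handles {self._path_prefix!r} paths"

print(FileSystem('nfs://192.168.1.18').describe())  # -> Nfs handles 'nfs' paths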
So to @martineau: I apologize for my comments seeming confusing; there was only so much space to type them, and as shown here, the details needed more room.
