I am trying to use type hinting to specify the API to follow when implementing a connector class (to a broker, in this case).
I want to specify that such class(es) should be context manager(s).
How do I do that?
Let me reword it more clearly: how can I define the Broker class so that it indicates that its concrete implementations, e.g. the Rabbit class, must be context managers?
Is there a practical way? Do I have to specify __enter__ and __exit__ and just inherit from Protocol?
Is it enough to inherit from ContextManager?
By the way, should I use @runtime or @runtime_checkable? (My VSCode linter seems to have problems finding those in typing. I am using Python 3.7.5.)
I know how to do it with ABC's, but I would like to learn how to do it with protocol definitions (which I have used fine already, but they weren't context managers).
I cannot make out how to use the ContextManager type. So far I haven't been able to find good examples from the official docs.
At present I came up with
from typing import Protocol, ContextManager, runtime, Dict, List

@runtime
class Broker(ContextManager):
    """
    Basic interface to a broker.
    It must be a context manager
    """

    def publish(self, data: str) -> None:
        """
        Publish data to the topic/queue
        """
        ...

    def subscribe(self) -> None:
        """
        Subscribe to the topic/queue passed to constructor
        """
        ...

    def read(self) -> str:
        """
        Read data from the topic/queue
        """
        ...
and the implementation is
import pika

@implements(Broker)
class Rabbit:
    def __init__(self,
                 url: str,
                 queue: str = 'default'):
        """
        url: where to connect, i.e. where the broker is
        queue: the topic queue, one only
        """
        # self.url = url
        self.queue = queue
        self.params = pika.URLParameters(url)
        self.params.socket_timeout = 5

    def __enter__(self):
        self.connection = pika.BlockingConnection(self.params)  # Connect to CloudAMQP
        self.channel = self.connection.channel()  # start a channel
        self.channel.queue_declare(queue=self.queue)  # Declare a queue
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.connection.close()

    def publish(self, data: str):
        pass  # TBD

    def subscribe(self):
        pass  # TBD

    def read(self):
        pass  # TBD
Note: the implements decorator works fine (it comes from a previous project); it checks that the class is a subclass of the given protocol.
Short answer -- your Rabbit implementation is actually fine as-is. Just add some type hints to indicate that __enter__ returns an instance of itself and that __exit__ returns None. The types of the __exit__ params don't actually matter too much.
Longer answer:
Whenever I'm not sure what exactly some type is/what some protocol is, it's often helpful to check TypeShed, the collection of type hints for the standard library (and a few 3rd party libraries).
For example, here is the definition of typing.ContextManager. I've copied it below here:
from types import TracebackType

# ...snip...

_T_co = TypeVar('_T_co', covariant=True)  # Any type covariant containers.

# ...snip...

@runtime_checkable
class ContextManager(Protocol[_T_co]):
    def __enter__(self) -> _T_co: ...
    def __exit__(self, __exc_type: Optional[Type[BaseException]],
                 __exc_value: Optional[BaseException],
                 __traceback: Optional[TracebackType]) -> Optional[bool]: ...
From reading this, we know a few things:
This type is a Protocol, which means any type that happens to implement __enter__ and __exit__ following the given signatures above would be a valid subtype of typing.ContextManager without needing to explicitly inherit it.
This type is runtime checkable, which means doing isinstance(my_manager, ContextManager) also works, if you want to do that for whatever reason.
The parameter names of __exit__ are all prefixed with two underscores. This is a convention type checkers use to indicate those arguments are positional only: using keyword arguments on __exit__ won't type check. Practically speaking, that means you can name your own __exit__ parameters whatever you'd like while still being in compliance with the protocol.
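A quick runtime check of the first two points, as a sketch: Plain and NotAManager are hypothetical classes, and neither inherits from ContextManager. At runtime typing.ContextManager is an alias for contextlib.AbstractContextManager, whose __subclasshook__ looks for __enter__ and __exit__; that hook is what makes the isinstance call work.

```python
from typing import ContextManager


class Plain:
    """Implements the protocol without inheriting from ContextManager."""

    def __enter__(self) -> "Plain":
        return self

    # Parameter names are free to choose: the protocol's are positional-only.
    def __exit__(self, exc_type: object, exc: object, tb: object) -> None:
        return None


class NotAManager:
    """Has neither __enter__ nor __exit__."""


print(isinstance(Plain(), ContextManager))        # True
print(isinstance(NotAManager(), ContextManager))  # False
```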
So, putting it together, here is the smallest possible implementation of a ContextManager that still type checks:
from typing import ContextManager

class MyManager:
    def __enter__(self) -> str:
        return "hello"

    def __exit__(self, *args: object) -> None:
        return None

def foo(manager: ContextManager[str]) -> None:
    with manager as x:
        print(x)        # Prints "hello"
        reveal_type(x)  # Revealed type is 'str' (mypy-only helper; remove before running)

# Type checks!
foo(MyManager())

def bar(manager: ContextManager[int]) -> None: ...

# Does not type check, since MyManager's `__enter__` doesn't return an int
bar(MyManager())
One nice little trick is that we can actually get away with a pretty lazy __exit__ signature, if we're not actually planning on using the params. After all, if __exit__ will accept basically anything, there's no type safety issue.
(More formally, PEP 484 compliant type checkers will respect that functions are contravariant with respect to their parameter types).
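To make the parenthetical concrete, here is a small sketch outside of context managers (describe, formatter and only_ints are hypothetical names): a function that accepts any object can stand in wherever a function accepting only int is expected, which is why an `__exit__(self, *args: object)` satisfies the protocol.

```python
from typing import Callable


def describe(value: object) -> str:
    # Accepts *any* argument, so it is also safe to call with an int.
    return repr(value)


# Accepted by mypy: Callable[[object], str] is a subtype of Callable[[int], str],
# because callables are contravariant in their parameter types.
formatter: Callable[[int], str] = describe


def only_ints(value: int) -> str:
    return str(value + 1)


# The reverse assignment would be rejected, since only_ints cannot accept
# arbitrary objects:
# anything: Callable[[object], str] = only_ints  # mypy error

print(formatter(42))  # 42
```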
But of course, you can specify the full types if you want. For example, to take your Rabbit implementation:
# So I don't have to use string forward references
from __future__ import annotations

from typing import Optional, Type
from types import TracebackType

import pika

# ...snip...

@implements(Broker)
class Rabbit:
    def __init__(self,
                 url: str,
                 queue: str = 'default'):
        """
        url: where to connect, i.e. where the broker is
        queue: the topic queue, one only
        """
        # self.url = url
        self.queue = queue
        self.params = pika.URLParameters(url)
        self.params.socket_timeout = 5

    def __enter__(self) -> Rabbit:
        self.connection = pika.BlockingConnection(self.params)  # Connect to CloudAMQP
        self.channel = self.connection.channel()  # start a channel
        self.channel.queue_declare(queue=self.queue)  # Declare a queue
        return self

    def __exit__(self,
                 exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 traceback: Optional[TracebackType],
                 ) -> Optional[bool]:
        self.connection.close()
        return None  # falsy: don't suppress exceptions raised in the with-block

    def publish(self, data: str):
        pass  # TBD

    def subscribe(self):
        pass  # TBD

    def read(self):
        pass  # TBD
To answer the new edited-in questions:
How can I define the Broker class so that it indicates that its concrete implementations, e.g. the Rabbit class, must be context managers?
Is there a practical way? Do I have to specify enter and exit and just inherit from Protocol?
Is it enough to inherit from ContextManager?
There are two ways:
Redefine the __enter__ and __exit__ functions, copying the original definitions from ContextManager.
Make Broker subclass both ContextManager and Protocol.
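If you subclass only ContextManager, Broker becomes an ordinary class rather than a protocol: all you are doing, more or less, is making Broker inherit whatever methods happen to have a default implementation in ContextManager, and implementations would then have to inherit from Broker explicitly to count as subtypes.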
PEP 544: Protocols and structural typing goes into more details about this. The mypy docs on Protocols have a more user-friendly version of this. For example, see the section on defining subprotocols and subclassing protocols.
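A minimal sketch of the second option, assuming Python 3.8+ (on 3.7, take Protocol from typing_extensions instead). InMemoryBroker is a hypothetical stand-in for Rabbit so that the example runs without pika, and round_trip is likewise hypothetical. InMemoryBroker does not inherit from Broker; a type checker accepts it purely structurally.

```python
from types import TracebackType
from typing import ContextManager, List, Optional, Protocol, Type


class Broker(ContextManager["Broker"], Protocol):
    """Interface to a broker. Implementations must also be context managers."""

    def publish(self, data: str) -> None: ...

    def subscribe(self) -> None: ...

    def read(self) -> str: ...


class InMemoryBroker:
    """Hypothetical stand-in for Rabbit. Note: it does NOT inherit from Broker."""

    def __init__(self) -> None:
        self.messages: List[str] = []
        self.is_open = False

    def __enter__(self) -> "InMemoryBroker":
        self.is_open = True
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_value: Optional[BaseException],
        traceback: Optional[TracebackType],
    ) -> None:
        self.is_open = False

    def publish(self, data: str) -> None:
        self.messages.append(data)

    def subscribe(self) -> None:
        pass

    def read(self) -> str:
        return self.messages.pop(0)


def round_trip(broker: Broker, data: str) -> str:
    # A type checker accepts InMemoryBroker here because it matches Broker
    # structurally: publish/subscribe/read plus __enter__/__exit__.
    with broker as b:
        b.publish(data)
        received = b.read()
    return received


print(round_trip(InMemoryBroker(), "hello"))  # hello
```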
By the way, should I use @runtime or @runtime_checkable? (My VSCode linter seems to have problems finding those in typing. I am using Python 3.7.5.)
It should be runtime_checkable.
That said, both Protocol and runtime_checkable were actually added to Python in version 3.8, which is probably why your linter is unhappy.
If you want to use both in older versions of Python, you'll need to pip install typing-extensions, the official backport for typing types.
Once this is installed, you can do from typing_extensions import Protocol, runtime_checkable.
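If the same module has to run on both 3.7 and 3.8+, one common pattern (a sketch; SupportsClose and Resource are hypothetical names) is to choose the import source from sys.version_info, a check that mypy also understands:

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Protocol, runtime_checkable
else:  # Python 3.7 and older: pip install typing-extensions
    from typing_extensions import Protocol, runtime_checkable


@runtime_checkable
class SupportsClose(Protocol):
    """Hypothetical protocol used only for this illustration."""

    def close(self) -> None: ...


class Resource:
    def close(self) -> None:
        pass


print(isinstance(Resource(), SupportsClose))  # True
print(isinstance(42, SupportsClose))          # False
```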
I am trying to set up a class decorator in Python that acts like attr.frozen but adds an additional field before creation (as well as a few other things). While the code works fine, I'm having trouble getting mypy to realize that the new class has the new field. I've tried to do this through a combination of a custom mypy plugin (exactly as described in attr's documentation) and a Protocol that defines that the new class has the given field. In summary, the code breaks down as follows (all in a single file, although I've broken it up here).
It should be noted I'm running Python 3.7, so I'm using typing_extensions where needed, but I believe this problem persists regardless of version.
First define the Protocol that should inform mypy that the new class has the new field (called added here):
from typing_extensions import Protocol

class Proto(Protocol):
    def __init__(self, added: float, *args, **kwargs):
        ...

    @property
    def added(self) -> float:
        ...
Now define the field_transformer function that adds the new field, as per attr's documentation:
from typing import Type, List
import attr

def _field_transformer(cls: type, fields: List[attr.Attribute]) -> List[attr.Attribute]:
    return [
        # For some reason mypy has trouble with attr.Attribute's signature
        # Bonus points if someone can point out a fix that doesn't use type: ignore
        attr.Attribute(  # type: ignore
            "added",  # name
            attr.NOTHING,  # default
            None,  # validator
            True,  # repr
            None,  # cmp
            None,  # hash
            True,  # init
            False,  # inherited
            type=float,
            order=float,
        ),
        *fields,
    ]
Now, finally, set up a class decorator that does what we want:
from functools import wraps
from typing import Callable, TypeVar

_T = TypeVar("_T", bound=Proto)
_C = TypeVar("_C", bound=type)

def transform(_cls: _C = None, **kwargs):
    def transform_decorator(cls: _C) -> Callable[[], Type[_T]]:
        @wraps(cls)
        def wrapper() -> Type[_T]:
            if "field_transformer" not in kwargs:
                kwargs["field_transformer"] = _field_transformer
            return attr.frozen(cls, **kwargs)
        return wrapper()
    if _cls is None:
        return transform_decorator
    return transform_decorator(_cls)
And now for the (failing) mypy tests:
@transform
class Test:
    other_field: str

# E: Too many arguments for "Test"
# E: Argument 1 to "Test" has incompatible type "float"; expected "str"
t = Test(0.0, "hello, world")
print(t.added)  # E: "Test" has no attribute "added"
Ideally I'd like mypy to eliminate all three of these errors. I am frankly not sure whether this is possible; it could be that the dynamic addition of an attribute is just not typeable and we may have to force users of our library to write custom typing stubs when they use the decorator. However, since we always add the same attribute(s) to the generated class, it would be great if there is a solution, even if that means writing a custom mypy plugin that supports this decorator in particular (if that's even possible).
I needed to encapsulate functions related to the single responsibility of parsing and emitting messages to an API endpoint, so I created a class Emitter.
class Emitter:
    def __init__(self, message: str) -> None:
        self.parsed = self.parse_message(message)
        self.emit_message(self.parsed)

    @staticmethod
    def parse_message(msg: str) -> str:
        ...  # parsing code

    @staticmethod
    def emit_message(msg: str) -> None:
        ...  # emitting code
In order to emit a message, I create a short-lived instance of that class with the message passed as an argument to __init__.
Emitter("my message to send")
__init__ itself directly runs all necessary methods to parse and emit message.
Is it correct to use __init__ to directly run the main responsibility of a class? Or should I use different solution like creating function that first instantiates the class and then calls all the necessary methods?
It looks like you're attempting to do the following at once:
initialize a class Emitter with an attribute message
parse the message
emit the message
IMO, a class entity is a good design choice since it allows granular control flow. Given that each step from above is discrete, I'd recommend executing them as such:
# Instantiate (will automatically parse)
e = Emitter('my message')
# Send the message
e.emit_message()
You will need to redesign your class to the following:
class Emitter:
    def __init__(self, message: str) -> None:
        self.message = message
        self.parsed = self.parse_message(message)

    @staticmethod
    def parse_message(msg: str) -> str:
        ...  # parsing code

    # This should be an instance method: it sends the already-parsed message
    def emit_message(self) -> None:
        ...  # emitting code, using self.parsed
Also, your parse_message() method could be converted to a validation method, but that's another subject.
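Putting the pieces together, here is a runnable sketch of the redesigned class plus the helper-function alternative the question asks about. The send parameter (defaulting to print) is a hypothetical stand-in for the real API call, and the parsing is a placeholder; neither comes from the original code.

```python
from typing import Callable, List


class Emitter:
    def __init__(self, message: str, send: Callable[[str], None] = print) -> None:
        # `send` is a hypothetical stand-in for the real API call.
        self.message = message
        self.send = send
        self.parsed = self.parse_message(message)  # parsing still happens here

    @staticmethod
    def parse_message(msg: str) -> str:
        # Placeholder parsing: trim surrounding whitespace.
        return msg.strip()

    def emit_message(self) -> None:
        # Instance method: uses the state prepared by __init__.
        self.send(self.parsed)


def emit(message: str, send: Callable[[str], None] = print) -> None:
    """One-call helper for callers who don't need the Emitter object."""
    Emitter(message, send).emit_message()


outbox: List[str] = []
e = Emitter("  my message  ", send=outbox.append)
e.emit_message()
emit("another message", send=outbox.append)
print(outbox)  # ['my message', 'another message']
```

Keeping construction and sending as separate calls leaves room to inspect or retry a parsed message, while the helper keeps the one-liner ergonomics of the original.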
I'm very new to type checking in Python. I'd like to find a way to use it to check for this common situation:
a class (e.g. my DbQuery class) is instantiated and is in some uninitialized state, e.g. I'm a DB query-er but I haven't connected to a DB yet. You could say (abstractly) the instance is of type 'Unconnected DB Query Connector'
the user calls .connect(), which sets the class instance to connected. I can now think of this class instance as belonging to a new category (protocol?). You could say the instance is of type 'Connected DB Query Connector' now...
the user calls .query(), etc. to use the class. The query method is annotated to express that self in this case must be a 'Connected DB Query Connector'
In an incorrect usage, which I would like to detect automatically: the user instantiates the db connector and then calls query() without calling connect first.
Is there a representation for this with annotations? Can I express that the connect() method has caused 'self' to join a new type? or is that the right way to do it?
Is there some other standard mechanism for expressing this and detecting it in Python or mypy?
I might be able to see how this could be expressed with inheritance maybe... I'm not sure
Thanks in advance!
EDIT:
Here's what I wish I could do:
from typing import Union, Optional, NewType, Protocol, cast

class Connector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"

# This is a version of class 'A' where conn is None and you can't call query()
class NoQuery(Protocol):
    conn: None

# This is a version of class 'A' where conn is initialized. You can query, but you cant call connect()
class CanQuery(Protocol):
    conn: Connector

# This class starts its life as a NoQuery. Should switch personality when connect() is called
class A(NoQuery):
    def __init__(self) -> None:
        self.conn = None

    def query(self: CanQuery, sql: str) -> str:
        return self.conn.run(sql)

    def connect(self: NoQuery, host: str):
        # Attempting to change from 'NoQuery' to 'CanQuery' like this
        # mypy complains: Incompatible types in assignment (expression has type "CanQuery", variable has type "NoQuery")
        self = cast(CanQuery, self)
        self.conn = Connector(host)

a = A()
a.connect('host.domain')
print(a.query('SELECT field FROM table'))

b = A()
# mypy should help me spot this. I'm trying to query an unconnected host. self.conn is None
print(b.query('SELECT oops'))
For me, this is a common scenario (an object that has a few distinct and very meaningful modes of operation). Is there no way to express this in mypy?
You may be able to hack something together by making your A class a generic type, (ab)using Literal enums, and annotating the self parameter, but frankly I don't think that's a good idea.
Mypy in general assumes that calling a method won't change the type of the object it is called on, and circumventing that is probably not possible without resorting to gross hacks and a bunch of casts or # type: ignores.
Instead, the standard convention is to use two classes -- a "connection" object and a "query" object -- along with context managers. This, as a side benefit, would also let you ensure your connections are always closed once you're done using them.
For example:
from __future__ import annotations  # lets Database.connect mention Connection before it is defined

from typing import Iterator
from contextlib import contextmanager

class RawConnector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"

    def close(self) -> None:
        print("Closing connection!")

class Database:
    def __init__(self, host: str) -> None:
        self.host = host

    @contextmanager
    def connect(self) -> Iterator[Connection]:
        conn = RawConnector(self.host)
        try:
            yield Connection(conn)
        finally:
            conn.close()  # runs even if the body of the with-block raises

class Connection:
    def __init__(self, conn: RawConnector) -> None:
        self.conn = conn

    def query(self, sql: str) -> str:
        return self.conn.run(sql)

db = Database("my-host")
with db.connect() as conn:
    conn.query("some sql")
If you really want to combine these two new classes into one, you can by (ab)using literal types, generics, and self annotations and by keeping within the constraint that you can only ever return instances with new personalities.
For example:
# If you are using Python 3.8+, you can import 'Literal' directly from
# typing. But if you need to support older Pythons, you'll need to
# pip-install typing_extensions and import from there.
from __future__ import annotations  # needed for the Connection[...] self-annotations below

from typing import Optional, TypeVar, Generic, cast
from typing_extensions import Literal
from enum import Enum

class RawConnector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"

    def close(self) -> None:
        print("Closing connection!")

class State(Enum):
    Unconnected = 0
    Connected = 1

# Type aliases here for readability. We use an enum and Literal
# types mostly so we can give each of our states a nice name. We
# could have also created an empty 'State' class and created an
# 'Unconnected' and 'Connected' subclasses: all that matters is we
# have one distinct type per state/per "personality".
Unconnected = Literal[State.Unconnected]
Connected = Literal[State.Connected]

T = TypeVar('T', bound=State)

class Connection(Generic[T]):
    def __init__(self: Connection[Unconnected]) -> None:
        self.conn: Optional[RawConnector] = None

    def connect(self: Connection[Unconnected], host: str) -> Connection[Connected]:
        self.conn = RawConnector(host)
        # Important! We *return* the new type!
        return cast(Connection[Connected], self)

    def query(self: Connection[Connected], sql: str) -> str:
        assert self.conn is not None
        return self.conn.run(sql)

c1 = Connection()
c2 = c1.connect("foo")
c2.query("some-sql")

# Does not type check, since types of c1 and c2 do not match declared self types
c1.query("bad")
c2.connect("bad")
Basically, it becomes possible to make a type act more or less as a state machine as long as we stick with returning new instances (even if at runtime, we always return just 'self').
With a little more cleverness/a few more compromises, you might even be able to get rid of the cast whenever you transition from one state to another.
But tbh, I consider this sort of trick to be overkill/probably inappropriate for what you seem to be trying to do. I would personally recommend the two classes + contextmanager approach.
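If you don't need the connection to be closed automatically, the same "each state is its own class" idea also works without a context manager: the transition method simply returns an instance of the next state's class, so no generics or cast are needed. A minimal sketch, with UnconnectedQuery and ConnectedQuery as hypothetical names and a copy of the RawConnector stub:

```python
class RawConnector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"


class UnconnectedQuery:
    """Initial state: there is no query() method, so mypy rejects calling it."""

    def connect(self, host: str) -> "ConnectedQuery":
        # The transition returns an object of a *different* class.
        return ConnectedQuery(RawConnector(host))


class ConnectedQuery:
    """Connected state: only query() is available."""

    def __init__(self, conn: RawConnector) -> None:
        self.conn = conn  # never None, so no Optional/assert is needed

    def query(self, sql: str) -> str:
        return self.conn.run(sql)


q = UnconnectedQuery().connect("host.domain")
print(q.query("SELECT field FROM table"))  # I ran SELECT field FROM table on host.domain
# UnconnectedQuery().query("oops")  # mypy: "UnconnectedQuery" has no attribute "query"
```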
I do not know why, but I get this strange error whenever I try to pass a shared custom class object to a method of another shared object. Python version: 3.6.3
Code:
from multiprocessing.managers import SyncManager

class MyManager(SyncManager): pass

class MyClass: pass

class Wrapper:
    def set(self, ent):
        self.ent = ent

MyManager.register('MyClass', MyClass)
MyManager.register('Wrapper', Wrapper)

if __name__ == '__main__':
    manager = MyManager()
    manager.start()

    try:
        obj = manager.MyClass()
        lst = manager.list([1,2,3])

        collection = manager.Wrapper()
        collection.set(lst)  # executed fine
        collection.set(obj)  # raises error
    except Exception as e:
        raise
Error:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\Program Files\Python363\lib\multiprocessing\managers.py", line 228, in serve_client
    request = recv()
  File "D:\Program Files\Python363\lib\multiprocessing\connection.py", line 251, in recv
    return _ForkingPickler.loads(buf.getbuffer())
  File "D:\Program Files\Python363\lib\multiprocessing\managers.py", line 881, in RebuildProxy
    return func(token, serializer, incref=incref, **kwds)
TypeError: AutoProxy() got an unexpected keyword argument 'manager_owned'
---------------------------------------------------------------------------
What's the problem here?
I ran into this too. As noted, this is a bug in Python's multiprocessing (see issue #30256), and the pull request that corrects this has not yet been merged. That pull request has since been superseded by another PR that makes the same change but also adds a test.
Apart from manually patching your local installation, you have three other options:
you could use the MakeProxyType() callable to specify your proxytype, without relying on the AutoProxy proxy generator,
you could define a custom proxy class,
you can patch the bug with a monkeypatch
I'll describe those options below, after explaining what AutoProxy does:
What's the point of the AutoProxy class
The multiprocessing Manager pattern gives access to shared values by putting the values all in the same, dedicated 'canonical values server' process. All other processes (clients) talk to the server through proxies that then pass messages back and forth with the server.
The server does need to know what methods are acceptable for the type of object, however, so clients can produce a proxy object with the same methods. This is what the AutoProxy object is for. Whenever a client needs a new instance of your registered class, the default proxy the client creates is an AutoProxy, which then asks the server to tell it what methods it can use.
Once it has the method names, it calls MakeProxyType to construct a new class and then creates an instance for that class to return.
All this is deferred until you actually need an instance of the proxied type, so in principle AutoProxy saves a little bit of memory if you are not using certain classes you have registered. It's very little memory, however, and the downside is that this process has to take place in each client process.
These proxy objects use reference counting to track when the server can remove the canonical value. It is that part that is broken in the AutoProxy callable; a new argument is passed to the proxy type to disable reference counting when the proxy object is being created in the server process rather than in a client but the AutoProxy type wasn't updated to support this.
So, how can you fix this? Here are those 3 options:
Use the MakeProxyType() callable
As mentioned, AutoProxy is really just a call (via the server) to get the public methods of the type, and a call to MakeProxyType(). You can just make these calls yourself, when registering.
So, instead of
from multiprocessing.managers import SyncManager
SyncManager.register("YourType", YourType)
use
from multiprocessing.managers import SyncManager, MakeProxyType, public_methods
# arguments: classname, sequence of method names
YourTypeProxy = MakeProxyType("YourType", public_methods(YourType))
SyncManager.register("YourType", YourType, YourTypeProxy)
Feel free to inline the MakeProxyType() call there.
If you were using the exposed argument to SyncManager.register(), you should pass those names to MakeProxyType instead:
# SyncManager.register("YourType", YourType, exposed=("foo", "bar"))
# becomes
YourTypeProxy = MakeProxyType("YourType", ("foo", "bar"))
SyncManager.register("YourType", YourType, YourTypeProxy)
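An end-to-end sketch of this option, assuming a POSIX system where the "fork" start method is available. Counter, CounterManager and demo are hypothetical names standing in for YourType, and a plain BaseManager subclass is used instead of SyncManager to keep the registry small:

```python
import multiprocessing
from multiprocessing.managers import BaseManager, MakeProxyType, public_methods


class Counter:
    """Hypothetical shared type, standing in for YourType."""

    def __init__(self) -> None:
        self._count = 0

    def increment(self) -> int:
        self._count += 1
        return self._count

    def value(self) -> int:
        return self._count


class CounterManager(BaseManager):
    pass


# Build the proxy type up front; AutoProxy is never involved.
CounterProxy = MakeProxyType("CounterProxy", public_methods(Counter))
CounterManager.register("Counter", Counter, CounterProxy)


def demo() -> int:
    # Assumption: the "fork" start method (POSIX only) keeps this sketch
    # self-contained; elsewhere, use the default context and call demo()
    # from an `if __name__ == "__main__":` block.
    ctx = multiprocessing.get_context("fork")
    with CounterManager(ctx=ctx) as manager:
        counter = manager.Counter()  # a CounterProxy talking to the server
        counter.increment()
        counter.increment()
        return counter.value()


print(demo())  # 2
```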
You'd have to do this for all the pre-registered types, too:
from multiprocessing.managers import SyncManager, AutoProxy, MakeProxyType, public_methods

registry = SyncManager._registry
for typeid, (callable, exposed, method_to_typeid, proxytype) in registry.items():
    if proxytype is not AutoProxy:
        continue
    create_method = hasattr(SyncManager, typeid)
    if exposed is None:
        exposed = public_methods(callable)
    SyncManager.register(
        typeid,
        callable=callable,
        exposed=exposed,
        method_to_typeid=method_to_typeid,
        proxytype=MakeProxyType(f"{typeid}Proxy", exposed),
        create_method=create_method,
    )
Create custom proxies
You don't have to rely on multiprocessing to create a proxy for you; you can write your own. The proxy is used in all processes except for the special 'managed values' server process, and it should pass messages back and forth. This is not an option for the already-registered types, of course, but I'm mentioning it here because for your own types it offers opportunities for optimisations.
Note that you should have methods for all interactions that need to go back to the 'canonical' value instance, so you'd need to use properties to handle normal attributes or add __getattr__, __setattr__ and __delattr__ methods as needed.
The advantage is that you can have very fine-grained control over which methods actually need to exchange data with the server process. In my specific example, my proxy class caches information that is immutable (the values never change once the object is created) but is used often. That includes a flag value that controls whether other methods do anything, so the proxy can just check the flag value locally and skip talking to the server process if it isn't set. Something like this:
from multiprocessing.managers import BaseProxy, SyncManager

class FooProxy(BaseProxy):
    # what methods the proxy is allowed to access through calls
    _exposed_ = ("__getattribute__", "expensive_method", "spam")

    @property
    def flag(self):
        try:
            v = self._flag
        except AttributeError:
            # ask for the value from the server, "realvalue.flag"
            # use __getattribute__ because it's an attribute, not a property
            v = self._flag = self._callmethod("__getattribute__", ("flag",))
        return v

    def expensive_method(self, *args, **kwargs):
        if self.flag:  # cached locally!
            return self._callmethod("expensive_method", args, kwargs)

    def spam(self, *args, **kwargs):
        return self._callmethod("spam", args, kwargs)

SyncManager.register("Foo", Foo, FooProxy)  # Foo is your own shared class
Because MakeProxyType() returns a BaseProxy subclass, you can combine that class with a custom subclass, saving yourself having to write any methods that just consist of return self._callmethod(...):
from multiprocessing.managers import MakeProxyType, SyncManager

# a base class with the methods generated for us. The second argument
# doubles as the 'permitted' names, stored as _exposed_. Don't list
# "__getattribute__" here: MakeProxyType would generate a proxy method
# with that name, breaking attribute access on the proxy itself.
FooProxyBase = MakeProxyType(
    "FooProxyBase",
    ("expensive_method", "spam"),
)

class FooProxy(FooProxyBase):
    # also permit remote __getattribute__ calls, used by `flag` below
    _exposed_ = FooProxyBase._exposed_ + ("__getattribute__",)

    @property
    def flag(self):
        try:
            v = self._flag
        except AttributeError:
            # ask for the value from the server, "realvalue.flag"
            # use __getattribute__ because it's an attribute, not a property
            v = self._flag = self._callmethod("__getattribute__", ("flag",))
        return v

    def expensive_method(self, *args, **kwargs):
        if self.flag:  # cached locally!
            return self._callmethod("expensive_method", args, kwargs)

    # spam() is generated by MakeProxyType, so it isn't written out here

SyncManager.register("Foo", Foo, FooProxy)  # Foo is your own shared class
Again, this won't solve the issue with standard types nested inside other proxied values.
Apply a monkeypatch
I use this to fix the AutoProxy callable. It should automatically skip patching when you are running a Python version where the fix has already been applied to the source code:
# Backport of https://github.com/python/cpython/pull/4819
# Improvements to the Manager / proxied shared values code
# broke handling of proxied objects without a custom proxy type,
# as the AutoProxy function was not updated.
#
# This code adds a wrapper to AutoProxy if it is missing the
# new argument.

import logging
from inspect import signature
from functools import wraps
from multiprocessing import managers


logger = logging.getLogger(__name__)
orig_AutoProxy = managers.AutoProxy


@wraps(managers.AutoProxy)
def AutoProxy(*args, incref=True, manager_owned=False, **kwargs):
    # Create the autoproxy without the manager_owned flag, then
    # update the flag on the generated instance. If the manager_owned flag
    # is set, `incref` is disabled, so set it to False here for the same
    # result.
    autoproxy_incref = False if manager_owned else incref
    proxy = orig_AutoProxy(*args, incref=autoproxy_incref, **kwargs)
    proxy._owned_by_manager = manager_owned
    return proxy


def apply():
    if "manager_owned" in signature(managers.AutoProxy).parameters:
        return

    logger.debug("Patching multiprocessing.managers.AutoProxy to add manager_owned")
    managers.AutoProxy = AutoProxy

    # re-register any types already registered to SyncManager without a custom
    # proxy type, as otherwise these would all be using the old unpatched AutoProxy
    SyncManager = managers.SyncManager
    registry = managers.SyncManager._registry
    for typeid, (callable, exposed, method_to_typeid, proxytype) in registry.items():
        if proxytype is not orig_AutoProxy:
            continue
        create_method = hasattr(managers.SyncManager, typeid)
        SyncManager.register(
            typeid,
            callable=callable,
            exposed=exposed,
            method_to_typeid=method_to_typeid,
            create_method=create_method,
        )
Import the above and call the apply() function to fix multiprocessing. Do so before you start the manager server!
Solution editing multiprocessing source code
The original answer by Sergey requires you to edit multiprocessing source code as follows:
Find your multiprocessing package (mine, installed via Anaconda, was in /anaconda3/lib/python3.6/multiprocessing).
Open managers.py
Add the keyword argument manager_owned=True to the AutoProxy function.
Original AutoProxy:
def AutoProxy(token, serializer, manager=None, authkey=None,
              exposed=None, incref=True):
    ...
Edited AutoProxy:
def AutoProxy(token, serializer, manager=None, authkey=None,
              exposed=None, incref=True, manager_owned=True):
    ...
Solution via code, at run time
I have managed to solve the unexpected keyword argument TypeError exception without directly editing the source code of multiprocessing, by instead adding these few lines of code where I use multiprocessing's Managers:
import multiprocessing.managers  # the submodule must be imported explicitly

# Backup original AutoProxy function
backup_autoproxy = multiprocessing.managers.AutoProxy

# Defining a new AutoProxy that handles unwanted key argument 'manager_owned'
def redefined_autoproxy(token, serializer, manager=None, authkey=None,
                        exposed=None, incref=True, manager_owned=True):
    # Calling original AutoProxy without the unwanted key argument
    return backup_autoproxy(token, serializer, manager, authkey,
                            exposed, incref)

# Updating AutoProxy definition in multiprocessing.managers package
multiprocessing.managers.AutoProxy = redefined_autoproxy
I found this temporary solution here.
I've managed to fix it by adding the needed keyword argument to the initializer of AutoProxy in multiprocessing\managers.py. Though, I don't know whether this kwarg is responsible for anything.
I'm creating some classes for dealing with filenames in various types of file shares (NFS, AFP, S3, local disk, etc.). As user input, I get a string that identifies the data source (e.g. "nfs://192.168.1.3" or "s3://mybucket/data").
I'm subclassing the specific filesystems from a base class that has common code. Where I'm confused is in the object creation. What I have is the following:
import os

class FileSystem(object):

    class NoAccess(Exception):
        pass

    def __new__(cls,path):
        if cls is FileSystem:
            if path.upper().startswith('NFS://'):
                return super(FileSystem,cls).__new__(Nfs)
            else:
                return super(FileSystem,cls).__new__(LocalDrive)
        else:
            return super(FileSystem,cls).__new__(cls,path)

    def count_files(self):
        raise NotImplementedError

class Nfs(FileSystem):
    def __init__ (self,path):
        pass

    def count_files(self):
        pass

class LocalDrive(FileSystem):
    def __init__(self,path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path

    def count_files(self):
        return len([x for x in os.listdir(self.path) if os.path.isfile(os.path.join(self.path, x))])

data1 = FileSystem('nfs://192.168.1.18')
data2 = FileSystem('/var/log')

print type(data1)
print type(data2)

print data2.count_files()
I thought this would be a good use of __new__, but most posts I've read about its use discourage it. Is there a more accepted way to approach this problem?
I don't think using __new__() to do what you want is improper. In other words, I disagree with the accepted answer to this question which claims factory functions are always the "best way to do it".
If you really want to avoid using it, then the only options are metaclasses or a separate factory function/method (however see Python 3.6+ Update below). Given the choices available, making the __new__() method one — since it's static by default — is a perfectly sensible approach.
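For comparison with the separate factory option just mentioned, the same subclass search can live in an ordinary classmethod. A minimal Python 3 sketch, assuming hypothetical names from_path and _all_subclasses and trimmed-down Nfs/LocalDrive classes that don't touch the filesystem:

```python
import re
from typing import Iterator, Optional, Type


class FileSystem:
    """Base class; from_path() picks the subclass whose prefix matches."""

    class Unknown(Exception):
        pass

    prefix: Optional[str]  # each concrete subclass sets this
    _PATH_PREFIX_PATTERN = re.compile(r"\s*([^:]+)://")

    def __init__(self, path: str) -> None:
        self.path = path

    @classmethod
    def _all_subclasses(cls) -> Iterator[Type["FileSystem"]]:
        # Recursively yield every subclass, including subclasses of subclasses.
        for sub in cls.__subclasses__():
            yield sub
            yield from sub._all_subclasses()

    @classmethod
    def from_path(cls, path: str) -> "FileSystem":
        match = cls._PATH_PREFIX_PATTERN.match(path)
        prefix = match.group(1).lower() if match else None
        for sub in cls._all_subclasses():
            if sub.prefix == prefix:
                return sub(path)
        raise FileSystem.Unknown(f"path {path!r} has no known file system prefix")


class Nfs(FileSystem):
    prefix = "nfs"


class LocalDrive(FileSystem):
    prefix = None  # default when the path has no "xxx://" prefix


print(type(FileSystem.from_path("nfs://192.168.1.18")).__name__)  # Nfs
print(type(FileSystem.from_path("/var/log")).__name__)            # LocalDrive
```

Because construction goes through an explicit classmethod, FileSystem(...) and Nfs(...) keep their ordinary meaning, which is the main argument usually made for factories over __new__.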
That said, below is what I think is an improved version of your code. I've added a couple of class methods to assist in automatically finding all the subclasses. These support the most important improvement: adding subclasses no longer requires modifying the __new__() method. This makes the code easily extensible, since it effectively supports what you could call virtual constructors.
A similar implementation could also be used to move the creation of instances out of the __new__() method into a separate (static) factory method — so in one sense the technique shown is just a relatively simple way of coding an extensible generic factory function regardless of what name it's given.
# Works in Python 2 and 3.
import os
import re
class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass
    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    @classmethod
    def _get_all_subclasses(cls):
        """ Recursive generator of all class' subclasses. """
        for subclass in cls.__subclasses__():
            yield subclass
            for subclass in subclass._get_all_subclasses():
                yield subclass
    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None
    def __new__(cls, path):
        """ Create instance of appropriate subclass using path prefix. """
        path_prefix = cls._get_prefix(path)
        for subclass in cls._get_all_subclasses():
            if subclass.prefix == path_prefix:
                # Using "object" base class method avoids recursion here.
                return object.__new__(subclass)
        else:  # No subclass with matching prefix found (& no default defined)
            raise FileSystem.Unknown(
                'path "{}" has no known file system prefix'.format(path))
    def count_files(self):
        raise NotImplementedError
class Nfs(FileSystem):
    prefix = 'nfs'
    def __init__ (self, path):
        pass
    def count_files(self):
        pass
class LocalDrive(FileSystem):
    prefix = None  # Default when no file system prefix is found.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path
    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                     for filename in os.listdir(self.path))
if __name__ == '__main__':
    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.
    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive
    print(data2.count_files())  # -> <some number>
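As noted above, the same subclass search can live in a separate factory method instead of __new__(). Here is a minimal sketch of that variant. It assumes Python 3 because it uses `yield from`. The `create()` classmethod, the trimmed-down classes and the extra `Smb` subclass are hypothetical illustrations, not part of the answer:

```python
import re


class FileSystem:
    class Unknown(Exception):
        pass

    # Same prefix pattern as the answer: "xxx://" at the start of the path.
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')

    def __init__(self, path):
        self.path = path

    @classmethod
    def _get_all_subclasses(cls):
        """Recursively yield every (direct and indirect) subclass of cls."""
        for subclass in cls.__subclasses__():
            yield subclass
            yield from subclass._get_all_subclasses()

    @classmethod
    def create(cls, path):
        """Hypothetical factory: instantiate the subclass whose prefix matches."""
        match = cls._PATH_PREFIX_PATTERN.match(path)
        path_prefix = match.group(1).lower() if match else None
        for subclass in cls._get_all_subclasses():
            if subclass.prefix == path_prefix:
                # Ordinary construction: no __new__ override is involved.
                return subclass(path)
        raise cls.Unknown('path {!r} has no known file system prefix'.format(path))


class Nfs(FileSystem):
    prefix = 'nfs'


class LocalDrive(FileSystem):
    prefix = None  # Default when the path has no prefix.


# A subclass added later (possibly in another module); create() is unchanged.
class Smb(FileSystem):
    prefix = 'smb'


print(type(FileSystem.create('nfs://192.168.1.18')).__name__)  # Nfs
print(type(FileSystem.create('/var/log')).__name__)  # LocalDrive
print(type(FileSystem.create('SMB://server/share')).__name__)  # Smb
```

The trade-off against the __new__() version is that `FileSystem(path)` now returns a plain `FileSystem`, so callers have to remember to use the factory.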
Python 3.6+ Update
The code above works in both Python 2 and 3.x. However, Python 3.6 added a new class method to object named __init_subclass__(), which makes finding subclasses simpler. It can be used to build a "registry" of them automatically, instead of potentially having to check every subclass recursively as the _get_all_subclasses() method above does.
I got the idea of using __init_subclass__() to do this from the Subclass registration section of PEP 487 -- Simpler customisation of class creation. The method is inherited by all of the base class' subclasses, so registration is automatically done for sub-subclasses, too (not only for direct subclasses). This completely eliminates the need for a method like _get_all_subclasses().
# Requires Python 3.6+
import os
import re
class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass
    # Pattern for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.
    @classmethod
    def __init_subclass__(cls, path_prefix, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.
    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None
    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = cls._registry.get(path_prefix)
        if subclass:
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise cls.Unknown(
                f'path "{path}" has no known file system prefix')
    def count_files(self):
        raise NotImplementedError
class Nfs(FileSystem, path_prefix='nfs'):
    def __init__ (self, path):
        pass
    def count_files(self):
        pass
class Ufs(Nfs, path_prefix='ufs'):
    def __init__ (self, path):
        pass
    def count_files(self):
        pass
class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise self.NoAccess(f'Cannot read directory {path!r}')
        self.path = path
    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                     for filename in os.listdir(self.path))
if __name__ == '__main__':
    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.
    data4 = FileSystem('ufs://192.168.1.18')
    print(type(data1))  # -> <class '__main__.Nfs'>
    print(type(data2))  # -> <class '__main__.LocalDrive'>
    print(f'file count: {data2.count_files()}')  # -> file count: <some number>
    try:
        data3 = FileSystem('c:/foobar')  # A non-existent directory.
    except FileSystem.NoAccess as exc:
        print(f'{exc} - FileSystem.NoAccess exception raised as expected')
    else:
        raise RuntimeError("Non-existent path should have raised Exception!")
    try:
        data4 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(f'{exc} - FileSystem.Unknown exception raised as expected')
    else:
        raise RuntimeError("Unregistered path prefix should have raised Exception!")
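The point about sub-subclasses can be made concrete. `__subclasses__()` only reports direct subclasses, which is why the earlier version had to recurse. An inherited `__init_subclass__()`, by contrast, fires for every descendant. This is a minimal sketch. The `Base` class and the name-keyed registry are illustrative only:

```python
class Base:
    _registry = {}  # Shared by Base and all of its descendants.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Called for each new class derived from Base, at any depth.
        Base._registry[cls.__name__] = cls


class Nfs(Base):
    pass


class Ufs(Nfs):  # Derives from Base only indirectly.
    pass


print([c.__name__ for c in Base.__subclasses__()])  # ['Nfs']
print(sorted(Base._registry))  # ['Nfs', 'Ufs']
```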
In my opinion, using __new__ in such a way is really confusing for other people who might read your code. It also requires somewhat hackish code to distinguish between guessing the file system from user input and creating Nfs and LocalDrive instances directly through their own classes.
Why not make a separate function with this behaviour? It can even be a static method of FileSystem class:
class FileSystem(object):
    # other code ...
    @staticmethod
    def from_path(path):
        if path.upper().startswith('NFS://'):
            return Nfs(path)
        else:
            return LocalDrive(path)
And you call it like this:
data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
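A self-contained sketch of this factory approach follows. The `Nfs` and `LocalDrive` bodies are simplified stand-ins so that it runs anywhere. In particular, `LocalDrive` skips the `os.access()` check:

```python
class FileSystem(object):
    def __init__(self, path):
        self.path = path

    @staticmethod
    def from_path(path):
        """Pick the concrete subclass from the path prefix."""
        # Nfs and LocalDrive are looked up when from_path() runs,
        # so it does not matter that they are defined further down.
        if path.upper().startswith('NFS://'):
            return Nfs(path)
        else:
            return LocalDrive(path)


class Nfs(FileSystem):
    pass


class LocalDrive(FileSystem):
    pass  # Simplified: the real class checks os.access(path, os.R_OK).


data1 = FileSystem.from_path('nfs://192.168.1.18')
data2 = FileSystem.from_path('/var/log')
print(type(data1).__name__)  # Nfs
print(type(data2).__name__)  # LocalDrive
```

Because `__new__()` is left alone, `Nfs(path)` and `LocalDrive(path)` stay ordinary constructors and no `cls is FileSystem` check is needed.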
Edit [BLUF]: there is no problem with the answer provided by @martineau. This post is merely a follow-up, for completeness, that discusses a potential error encountered when using additional keywords in a class definition that are not handled by the metaclass.
I'd like to supply some additional information on the use of __init_subclass__ in conjunction with using __new__ as a factory. The answer that @martineau posted is very useful, and I have implemented an altered version of it in my own programs. I prefer using the class creation sequence over adding a factory method to the namespace, which is very similar to how pathlib.Path is implemented.
To follow up on a comment trail with @martineau, I have taken the following snippet from his answer:
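The standard library's `pathlib.Path` behaves this way. Its documentation states that instantiating `Path` creates either a `PosixPath` or a `WindowsPath`, depending on the operating system. A quick check:

```python
from pathlib import Path, PosixPath, WindowsPath

p = Path('.')
# Calling the base class returns an instance of an OS-specific subclass.
print(type(p).__name__)  # PosixPath on Linux/macOS, WindowsPath on Windows
print(isinstance(p, Path))  # True
```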
import os
import re
class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass
    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.
    @classmethod
    def __init_subclass__(cls, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.
    @classmethod
    def _get_prefix(cls, s):
        """ Extract any file system prefix at beginning of string s and
            return a lowercase version of it or None when there isn't one.
        """
        match = cls._PATH_PREFIX_PATTERN.match(s)
        return match.group(1).lower() if match else None
    def __new__(cls, path):
        """ Create instance of appropriate subclass. """
        path_prefix = cls._get_prefix(path)
        subclass = FileSystem._registry.get(path_prefix)
        if subclass:
            # Using "object" base class method avoids recursion here.
            return object.__new__(subclass)
        else:  # No subclass with matching prefix found (and no default).
            raise FileSystem.Unknown(
                f'path "{path}" has no known file system prefix')
    def count_files(self):
        raise NotImplementedError
class Nfs(FileSystem, path_prefix='nfs'):
    def __init__ (self, path):
        pass
    def count_files(self):
        pass
class LocalDrive(FileSystem, path_prefix=None):  # Default file system.
    def __init__(self, path):
        if not os.access(path, os.R_OK):
            raise FileSystem.NoAccess('Cannot read directory')
        self.path = path
    def count_files(self):
        return sum(os.path.isfile(os.path.join(self.path, filename))
                     for filename in os.listdir(self.path))
if __name__ == '__main__':
    data1 = FileSystem('nfs://192.168.1.18')
    data2 = FileSystem('c:/')  # Change as necessary for testing.
    print(type(data1).__name__)  # -> Nfs
    print(type(data2).__name__)  # -> LocalDrive
    print(data2.count_files())  # -> <some number>
    try:
        data3 = FileSystem('foobar://42')  # Unregistered path prefix.
    except FileSystem.Unknown as exc:
        print(str(exc), '- raised as expected')
    else:
        raise RuntimeError(
              "Unregistered path prefix should have raised Exception!")
This answer works as written, but I wish to address a few items (potential pitfalls) that others may run into through inexperience, or because of codebase standards their team requires.
Firstly, for the decorator on __init_subclass__, per the PEP:
One could require the explicit use of @classmethod on the __init_subclass__ decorator. It was made implicit since there's no sensible interpretation for leaving it out, and that case would need to be detected anyway in order to give a useful error message.
Including it is not a problem since it's already implied, and the Zen of Python tells us that "explicit is better than implicit". Nevertheless, if you are abiding by the PEP, the decorator can be left out (the rationale is explained further in the PEP).
In my own implementation of a similar solution, subclasses are not defined with an additional keyword argument, as @martineau does here:
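The PEP statement can be checked directly. `__init_subclass__` receives the new subclass as `cls` whether or not it carries an explicit `@classmethod`. The two base classes here are illustrative only:

```python
class Implicit:
    seen = []

    def __init_subclass__(cls, **kwargs):  # No decorator: made a classmethod implicitly.
        super().__init_subclass__(**kwargs)
        Implicit.seen.append(cls.__name__)


class Explicit:
    seen = []

    @classmethod
    def __init_subclass__(cls, **kwargs):  # Explicit decorator: same behaviour.
        super().__init_subclass__(**kwargs)
        Explicit.seen.append(cls.__name__)


class A(Implicit):
    pass


class B(Explicit):
    pass


print(Implicit.seen, Explicit.seen)  # ['A'] ['B']
```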
class Nfs(FileSystem, path_prefix='nfs'): ...
class LocalDrive(FileSystem, path_prefix=None): ...
When browsing through the PEP:
As a second change, the new type.__init__ just ignores keyword arguments. Currently, it insists that no keyword arguments are given. This leads to a (wanted) error if one gives keyword arguments to a class declaration if the metaclass does not process them. Metaclass authors that do want to accept keyword arguments must filter them out by overriding __init__.
Why is this (potentially) problematic? Well, there are several questions (notably this one) describing the problems surrounding additional keyword arguments in a class definition, the use of metaclasses (and consequently the metaclass= keyword), and an overridden __init_subclass__. However, that doesn't explain why it works in the currently given solution. The answer: kwargs.pop().
If we look at the following:
# code in CPython 3.7
import os
import re
class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass
    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.
    def __init_subclass__(cls, **kwargs):
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        cls._registry[path_prefix] = cls  # Add class to registry.
    ...
class Nfs(FileSystem, path_prefix='nfs'): ...
This will still run without issue, but if we remove the kwargs.pop():
def __init_subclass__(cls, **kwargs):
    super().__init_subclass__(**kwargs)  # throws TypeError
    cls._registry[path_prefix] = cls  # Add class to registry.
The error thrown is already known and described in the PEP:
In the new code, it is not __init__ that complains about keyword arguments, but __init_subclass__, whose default implementation takes no arguments. In a classical inheritance scheme using the method resolution order, each __init_subclass__ may take out it's keyword arguments until none are left, which is checked by the default implementation of __init_subclass__.
What is happening is that the path_prefix= keyword is "popped" off kwargs, not just accessed. As a result, **kwargs is empty when it is passed up the MRO, so it complies with the default implementation, which accepts no keyword arguments.
To avoid this entirely, I propose not relying on kwargs at all. Instead, use what is already available in the call to __init_subclass__, namely the cls reference:
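The difference between accessing and popping the keyword can be reproduced in a few lines. The class names in this sketch are illustrative. `Leaky` reads `path_prefix` with `get()`, so the keyword is still forwarded to `object.__init_subclass__()`, which rejects it. `Popping` removes the keyword before calling `super()`. The `Tagged` mixin shows the MRO chain the PEP describes, where each `__init_subclass__` takes out its own keyword:

```python
class Popping:
    registry = {}

    def __init_subclass__(cls, **kwargs):
        # pop() removes the keyword, so nothing is left for object.__init_subclass__().
        path_prefix = kwargs.pop('path_prefix', None)
        super().__init_subclass__(**kwargs)
        Popping.registry[path_prefix] = cls


class Leaky:
    def __init_subclass__(cls, **kwargs):
        # get() only reads the keyword; it is still in kwargs below.
        cls.path_prefix = kwargs.get('path_prefix')
        super().__init_subclass__(**kwargs)  # object.__init_subclass__() raises TypeError


class Tagged:
    def __init_subclass__(cls, tag=None, **kwargs):
        # Takes out its own keyword and forwards the rest along the MRO.
        super().__init_subclass__(**kwargs)
        cls.tag = tag


class Nfs(Popping, path_prefix='nfs'):
    pass


# MRO: Smb -> Tagged -> Popping -> object; each level consumes its own keyword.
class Smb(Tagged, Popping, path_prefix='smb', tag='network'):
    pass


try:
    class Broken(Leaky, path_prefix='nfs'):
        pass
except TypeError as exc:
    print('TypeError:', exc)
```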
# code in CPython 3.7
import os
import re
class FileSystem(object):
    class NoAccess(Exception): pass
    class Unknown(Exception): pass
    # Regex for matching "xxx://" where x is any non-whitespace character except for ":".
    _PATH_PREFIX_PATTERN = re.compile(r'\s*([^:]+)://')
    _registry = {}  # Registered subclasses.
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._registry[cls._path_prefix] = cls  # Add class to registry.
    ...
class Nfs(FileSystem):
    _path_prefix = 'nfs'
    ...
Turning the former keyword argument into a class attribute also makes it usable in later methods, should one need to refer back to the particular prefix used by the subclass (via self._path_prefix). To my knowledge, you cannot refer back to keywords supplied in the class definition without some complexity, so this seemed a simple and useful alternative.
So, @martineau, I apologize if my comments seemed confusing. There was only so much space to type them, and as shown here, the explanation needed more detail.
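Here is a runnable sketch of this class-attribute variant, reduced to the registry itself. The `describe()` method is hypothetical. It only demonstrates reading `self._path_prefix` from an instance:

```python
class FileSystem:
    _registry = {}  # Maps a path prefix to the subclass that handles it.

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # The subclass body has already run, so _path_prefix is defined here.
        cls._registry[cls._path_prefix] = cls

    def describe(self):
        # Hypothetical helper: the prefix is reachable from any instance.
        return '{} handles {!r} paths'.format(type(self).__name__, self._path_prefix)


class Nfs(FileSystem):
    _path_prefix = 'nfs'


class LocalDrive(FileSystem):
    _path_prefix = None  # Default file system.


print(Nfs().describe())  # Nfs handles 'nfs' paths
print(LocalDrive().describe())  # LocalDrive handles None paths
```

One consequence of this design is worth noting (this is an observation, not something stated in the answer). A subclass that omits `_path_prefix` inherits its parent's value, so `class Ufs(Nfs): pass` would replace `Nfs` as the registry entry for `'nfs'`. A direct `FileSystem` subclass without the attribute raises `AttributeError` when the class is created.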