I needed to encapsulate functions related to the single responsibility of parsing and emitting messages to an API endpoint, so I created the class Emitter.
class Emitter:
    def __init__(self, message: str) -> None:
        self.parsed = self.parse_message(message)
        self.emit_message(self.parsed)

    @staticmethod
    def parse_message(msg: str) -> str:
        ...  # parsing code

    @staticmethod
    def emit_message(msg: str) -> None:
        ...  # emitting code
In order to emit a message, I create a short-lived instance of that class, with the message passed as an argument to __init__.
Emitter("my message to send")
__init__ itself directly runs all the necessary methods to parse and emit the message.
Is it correct to use __init__ to directly run the main responsibility of a class? Or should I use a different solution, like creating a function that first instantiates the class and then calls all the necessary methods?
It looks like you're attempting to do the following at once:
initialize a class Emitter with an attribute message
parse the message
emit the message
IMO, a class entity is a good design choice since it allows granular control flow. Given that each step from above is discrete, I'd recommend executing them as such:

# Instantiate (will automatically parse)
e = Emitter('my message')

# Emit the message
e.emit_message()
You will need to redesign your class to the following:

class Emitter:
    def __init__(self, message: str) -> None:
        self.message = message
        self.parsed = self.parse_message(message)

    @staticmethod
    def parse_message(msg: str) -> str:
        ...  # parsing code

    # This should be an instance method
    def emit_message(self) -> None:
        ...  # emitting code
Also, your parse_message() method could be converted to a validation method, but that's another subject.
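Filling in the ellipses with hypothetical bodies (stripping whitespace stands in for real parsing, and print stands in for the API call), the redesign runs like this:

```python
class Emitter:
    """Parses at construction time; emitting is a separate, explicit step."""

    def __init__(self, message: str) -> None:
        self.message = message
        self.parsed = self.parse_message(message)

    @staticmethod
    def parse_message(msg: str) -> str:
        # hypothetical parsing: just strip surrounding whitespace
        return msg.strip()

    def emit_message(self) -> None:
        # hypothetical emitting: print instead of posting to an API
        print(self.parsed)


e = Emitter('  my message  ')  # instantiating parses automatically
e.emit_message()               # prints: my message
```

This keeps construction cheap and side-effect free, while the caller decides when (and whether) the emit actually happens.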
I am trying to capture (S3) logs in a structured way. I am capturing the access-related elements with this type of tuple:
class _Access(NamedTuple):
    time: datetime
    ip: str
    actor: str
    request_id: str
    action: str
    key: str
    request_uri: str
    status: int
    error_code: str
I then have a class that uses this named tuple as follows (edited down to just the relevant code):
class Logs:
    def __init__(self, log: str):
        raw_logs = match(S3_LOG_REGEX, log)
        if raw_logs is None:
            raise FormatError(log)
        logs = raw_logs.groups()
        timestamp = datetime.strptime(logs[2], "%d/%b/%Y:%H:%M:%S %z")
        http_status = int(logs[9])
        access = _Access(
            timestamp,
            logs[3],
            logs[4],
            logs[5],
            logs[6],
            logs[7],
            logs[8],
            http_status,
            logs[10],
        )
        self.access = access
The problem is that it is too verbose when I now want to use it:
>>> log_struct = Logs(raw_log)
>>> log_struct.access.action # I don't want to have to add `access`
As I mention above, I'd rather be able to do something like this:
>>> log_struct = Logs(raw_log)
>>> log_struct.action
But I still want to have this clean named tuple called _Access. How can I make everything from access available at the top level?
Specifically, I have this line:
self.access = access
which is giving me that extra "layer" that I don't want. I'd like to be able to "unpack" it somehow, similar to how we can unpack arguments by passing the star in *args. But I'm not sure how I can unpack the tuple in this case.
What you really need for your use case is an alternative constructor for your NamedTuple subclass that parses a log-entry string into the respective fields. This can be done by creating a class method that calls the __new__ method with arguments parsed from the input string.
Using just the fields of ip and action as a simplified example:
from typing import NamedTuple

class Logs(NamedTuple):
    ip: str
    action: str

    @classmethod
    def parse(cls, log: str) -> 'Logs':
        return cls.__new__(cls, *log.split())

log_struct = Logs.parse('192.168.1.1 GET')
print(log_struct)
print(log_struct.ip)
print(log_struct.action)
This outputs:
Logs(ip='192.168.1.1', action='GET')
192.168.1.1
GET
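The same idea scales to the original fields: the type conversions done in Logs.__init__ (strptime for the timestamp, int for the status) simply move into the class method. A simplified, runnable sketch, assuming a hypothetical whitespace-separated format in place of the elided S3_LOG_REGEX:

```python
from datetime import datetime
from typing import NamedTuple

class Access(NamedTuple):
    time: datetime
    status: int
    key: str

    @classmethod
    def parse(cls, log: str) -> 'Access':
        # hypothetical format "<time> <status> <key>"; the real code would
        # extract groups with S3_LOG_REGEX instead of split()
        raw_time, raw_status, key = log.split(maxsplit=2)
        return cls(
            datetime.strptime(raw_time, "%d/%b/%Y:%H:%M:%S"),
            int(raw_status),
            key,
        )


entry = Access.parse("10/Oct/2000:13:55:36 200 my/object.txt")
print(entry.status)  # 200 -- every field is available at the top level
```

All fields are now one attribute access away, with no intermediate `.access` layer.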
I agree with @blhsing and recommend that solution. This assumes there are no extra attributes that need to be applied to the named tuple (say, storing the raw log value).
If you really need the object to remain composed, another way to support accessing the properties of the _Access class would be to override the __getattr__ method of Logs (PEP 562 describes the analogous module-level hook, quoted below):
The __getattr__ function at the module level should accept one argument which is the name of an attribute and return the computed value or raise an AttributeError:

def __getattr__(name: str) -> Any: ...

If an attribute is not found on a module object through the normal lookup (i.e. object.__getattribute__), then __getattr__ is searched in the module __dict__ before raising an AttributeError. If found, it is called with the attribute name and the result is returned. Looking up a name as a module global will bypass module __getattr__. This is intentional, otherwise calling __getattr__ for builtins will significantly harm performance.
E.g.
from typing import NamedTuple, Any

class _Access(NamedTuple):
    foo: str
    bar: str

class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    def __getattr__(self, name: str) -> Any:
        return getattr(self.access, name)
When you request an attribute of Logs which is not present it will try to access the attribute through the Logs.access attribute. Meaning you can write code like this:
logs = Logs("fizz buzz")
print(f"{logs.log=}, {logs.foo=}, {logs.bar=}")
logs.log='fizz buzz', logs.foo='fizz', logs.bar='buzz'
Note that this would not preserve the typing information through to the Logs object in most static analyzers and autocompletes. That to me would be a compelling enough reason not to do this, and continue to use the more verbose way of accessing values as you describe in your question.
If you still really need this and want to remain type safe, then I would add properties to the Logs class which fetch from the _Access object:
class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    @property
    def foo(self) -> str:
        return self.access.foo

    @property
    def bar(self) -> str:
        return self.access.bar
This avoids the type safety issues and, depending on how much code you write using the Logs instances, can still cut down on other boilerplate dramatically.
I have a Python class that allows a user to register a callback. I am trying to provide a default callback but I'm not sure how to do it.
First attempt:
class MyClass:
    callback = printing_callback

    def register_callback(self, callback):
        self.callback = callback

    def printing_callback(self, message):
        print(f"Message: {message}")

    def notify(self, message):
        self.callback(message)
This gave me an 'unresolved reference' error for printing_callback
Second attempt:
I tried changing the line to callback = self.printing_callback. This gave me an 'unresolved reference' error for self.
Third attempt:
callback = lambda message: print(f"Message: {message}")
which gave me this warning: "PEP 8: E731 do not assign a lambda expression, use a def"
Is there a way to initialize callback to a default?
Update
I found a way to set the default method, and that is not to have printing_callback be an instance method, which makes sense. This appears to compile without warnings:
def printing_callback(message):
    print(f"Message: {message}")

class MyClass:
    callback = printing_callback

    def register_callback(self, callback):
        self.callback = callback

    def notify(self, message):
        self.callback(message)
But now when printing_callback is called, it is called with an extra argument: the MyClass instance that called it.
I can change the signature to printing_callback(myClass, message) and the code works. Are there cleaner ways to do this than just having an extra parameter?
Set the default on initialization.
def printing_callback(message):
    print(f"Message: {message}")

class MyClass:
    def __init__(self, callback=printing_callback):
        self.callback = callback

    def notify(self, message):
        self.callback(message)
As far as I can tell, there's no reason for callback to be a class attribute, so make it an instance attribute, and that avoids it being registered as a method.
If you need to change it later, you can simply change the callback attribute instead of using the setter register_callback():
m = MyClass()
m.notify('Hello!') # -> Message: Hello!
m.callback = print
m.notify('Hi!') # -> Hi!
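If you'd still like the default to live on the class rather than in __init__ (say, so subclasses can override it declaratively), wrapping the function in staticmethod stops Python from binding the instance as an implicit first argument. A sketch:

```python
def printing_callback(message):
    print(f"Message: {message}")

class MyClass:
    # staticmethod prevents the class attribute from becoming a bound method,
    # so no extra instance argument is passed at call time
    callback = staticmethod(printing_callback)

    def register_callback(self, callback):
        self.callback = callback

    def notify(self, message):
        self.callback(message)


m = MyClass()
m.notify('Hello!')  # prints: Message: Hello!
```

Registering a replacement still works the same way, since the instance attribute simply shadows the class-level default.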
In the first attempt, it's obvious that you have no reference to printing_callback: you didn't define it.
In the second: the self parameter is filled in by Python when you call a method on an instance of the class. It points to the newly created object, which is the class instance, of course. Note that it is local to the methods, not available inside the body of your class, so again there is no reference.
The structure you are looking for is:

def printing_callback(self, message):
    print(f"Message: {message}")

class MyClass:
    callback = printing_callback

    def register_callback(self, callback):
        self.callback = callback

    def notify(self, message):
        self.callback(message)

obj = MyClass()
obj.callback("testing")
Note I added an extra parameter self (the actual name doesn't matter). This is because we call that function on an instance, so again Python fills the first argument with a reference to that instance. This is why I named it self.
I am trying to use type hinting to specify the API to follow when implementing a connector class (to a broker, in this case).
I want to specify that such class(es) should be context manager(s)
How do I do that?
Let me reword it more clearly: how can I define the Broker class so that it indicates that its concrete implementations, e.g. the Rabbit class, must be context managers?
Is there a practical way? Do I have to specify __enter__ and __exit__ and just inherit from Protocol?
Is it enough to inherit from ContextManager?
By the way, should I use @runtime or @runtime_checkable? (My VS Code linter seems to have problems finding those in typing. I am using Python 3.7.5.)
I know how to do it with ABC's, but I would like to learn how to do it with protocol definitions (which I have used fine already, but they weren't context managers).
I cannot make out how to use the ContextManager type. So far I haven't been able to find good examples from the official docs.
At present I came up with
from typing import Protocol, ContextManager, runtime, Dict, List

@runtime
class Broker(ContextManager):
    """
    Basic interface to a broker.
    It must be a context manager
    """

    def publish(self, data: str) -> None:
        """
        Publish data to the topic/queue
        """
        ...

    def subscribe(self) -> None:
        """
        Subscribe to the topic/queue passed to constructor
        """
        ...

    def read(self) -> str:
        """
        Read data from the topic/queue
        """
        ...
and the implementation is
@implements(Broker)
class Rabbit:
    def __init__(self,
                 url: str,
                 queue: str = 'default'):
        """
        url: where to connect, i.e. where the broker is
        queue: the topic queue, one only
        """
        # self.url = url
        self.queue = queue
        self.params = pika.URLParameters(url)
        self.params.socket_timeout = 5

    def __enter__(self):
        self.connection = pika.BlockingConnection(self.params)  # Connect to CloudAMQP
        self.channel = self.connection.channel()  # start a channel
        self.channel.queue_declare(queue=self.queue)  # Declare a queue
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.connection.close()

    def publish(self, data: str):
        pass  # TBD

    def subscribe(self):
        pass  # TBD

    def read(self):
        pass  # TBD
Note: the implements decorator works fine (it comes from a previous project); it checks that the class is a subclass of the given protocol.
Short answer -- your Rabbit implementation is actually fine as-is. Just add some type hints to indicate that __enter__ returns an instance of itself and that __exit__ returns None. The types of the __exit__ params don't actually matter too much.
Longer answer:
Whenever I'm not sure what exactly some type is/what some protocol is, it's often helpful to check TypeShed, the collection of type hints for the standard library (and a few 3rd party libraries).
For example, here is the definition of typing.ContextManager. I've copied it below here:
from types import TracebackType
# ...snip...
_T_co = TypeVar('_T_co', covariant=True)  # Any type covariant containers.
# ...snip...

@runtime_checkable
class ContextManager(Protocol[_T_co]):
    def __enter__(self) -> _T_co: ...
    def __exit__(self, __exc_type: Optional[Type[BaseException]],
                 __exc_value: Optional[BaseException],
                 __traceback: Optional[TracebackType]) -> Optional[bool]: ...
From reading this, we know a few things:
This type is a Protocol, which means any type that happens to implement __enter__ and __exit__ following the given signatures above would be a valid subtype of typing.ContextManager without needing to explicitly inherit it.
This type is runtime checkable, which means doing isinstance(my_manager, ContextManager) also works, if you want to do that for whatever reason.
The parameter names of __exit__ are all prefixed with two underscores. This is a convention type checkers use to indicate those arguments are positional only: using keyword arguments on __exit__ won't type check. Practically speaking, that means you can name your own __exit__ parameters whatever you'd like while still being in compliance with the protocol.
So, putting it together, here is the smallest possible implementation of a ContextManager that still type checks:
from typing import ContextManager

class MyManager:
    def __enter__(self) -> str:
        return "hello"

    def __exit__(self, *args: object) -> None:
        return None

def foo(manager: ContextManager[str]) -> None:
    with manager as x:
        print(x)        # Prints "hello"
        reveal_type(x)  # Revealed type is 'str'

# Type checks!
foo(MyManager())

def bar(manager: ContextManager[int]) -> None: ...

# Does not type check, since MyManager's `__enter__` doesn't return an int
bar(MyManager())
One nice little trick is that we can actually get away with a pretty lazy __exit__ signature, if we're not actually planning on using the params. After all, if __exit__ will accept basically anything, there's no type safety issue.
(More formally, PEP 484 compliant type checkers will respect that functions are contravariant with respect to their parameter types).
But of course, you can specify the full types if you want. For example, to take your Rabbit implementation:
# So I don't have to use string forward references
from __future__ import annotations

from types import TracebackType
from typing import Optional, Type

# ...snip...

@implements(Broker)
class Rabbit:
    def __init__(self,
                 url: str,
                 queue: str = 'default'):
        """
        url: where to connect, i.e. where the broker is
        queue: the topic queue, one only
        """
        # self.url = url
        self.queue = queue
        self.params = pika.URLParameters(url)
        self.params.socket_timeout = 5

    def __enter__(self) -> Rabbit:
        self.connection = pika.BlockingConnection(self.params)  # Connect to CloudAMQP
        self.channel = self.connection.channel()  # start a channel
        self.channel.queue_declare(queue=self.queue)  # Declare a queue
        return self

    def __exit__(self,
                 exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 traceback: Optional[TracebackType],
                 ) -> Optional[bool]:
        self.connection.close()

    def publish(self, data: str):
        pass  # TBD

    def subscribe(self):
        pass  # TBD

    def read(self):
        pass  # TBD
To answer the new edited-in questions:
How can I define the Broker class so that it indicates that its concrete implementations, e.g. the Rabbit class, must be context managers?
Is there a practical way? Do I have to specify __enter__ and __exit__ and just inherit from Protocol?
Is it enough to inherit from ContextManager?
There are two ways:
Redefine the __enter__ and __exit__ functions, copying the original definitions from ContextManager.
Make Broker subclass both ContextManager and Protocol.
If you subclass only ContextManager, all you are doing is making Broker just inherit whatever methods happen to have a default implementation in ContextManager, more or less.
PEP 544: Protocols and structural typing goes into more details about this. The mypy docs on Protocols have a more user-friendly version of this. For example, see the section on defining subprotocols and subclassing protocols.
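For example, option 1 could be sketched like this (Python 3.8+; the DummyBroker class is hypothetical and exists only to show the structural check in action):

```python
from types import TracebackType
from typing import Optional, Protocol, Type, runtime_checkable

@runtime_checkable
class Broker(Protocol):
    # __enter__/__exit__ re-declared by hand, so Broker itself stays a pure protocol
    def __enter__(self) -> 'Broker': ...
    def __exit__(self,
                 exc_type: Optional[Type[BaseException]],
                 exc_value: Optional[BaseException],
                 traceback: Optional[TracebackType]) -> Optional[bool]: ...
    def publish(self, data: str) -> None: ...
    def subscribe(self) -> None: ...
    def read(self) -> str: ...


class DummyBroker:
    """Structurally conforms to Broker without inheriting from it."""
    def __enter__(self): return self
    def __exit__(self, *args): return None
    def publish(self, data): pass
    def subscribe(self): pass
    def read(self): return ''


print(isinstance(DummyBroker(), Broker))  # True -- checks method presence only
```

Note that runtime_checkable isinstance checks only verify that the methods exist; they do not check signatures.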
By the way, should I use @runtime or @runtime_checkable? (My VS Code linter seems to have problems finding those in typing. I am using Python 3.7.5.)
It should be runtime_checkable.
That said, both Protocol and runtime_checkable were actually added to Python in version 3.8, which is probably why your linter is unhappy.
If you want to use both in older versions of Python, you'll need to pip install typing-extensions, the official backport for typing types.
Once this is installed, you can do from typing_extensions import Protocol, runtime_checkable.
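A version-guarded import keeps the same code running on both sides of the 3.8 boundary (Greeter below is a hypothetical example protocol, just to show the imports in use):

```python
import sys

if sys.version_info >= (3, 8):
    from typing import Protocol, runtime_checkable
else:
    # pip install typing-extensions
    from typing_extensions import Protocol, runtime_checkable

@runtime_checkable
class Greeter(Protocol):
    def greet(self) -> str: ...
```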
I was wondering if factory class methods break the Liskov substitution principle.
For instance in the following Python code, does the Response.from_request factory class method break it?
import abc

class BaseResponse(abc.ABC):
    @abc.abstractmethod
    def get_headers(self):
        raise NotImplementedError

    @abc.abstractmethod
    def get_body(self):
        raise NotImplementedError

class Response(BaseResponse):
    def __init__(self, headers, body):
        self.__headers = headers
        self.__body = body

    def get_headers(self):
        return self.__headers

    def get_body(self):
        return self.__body

    @classmethod
    def from_request(cls, request, payload):
        headers = request.get_headers()
        headers["meta_data"] = payload["meta_data"]
        body = payload["data"]
        return cls(headers, body)
The substitution principle says that you need to be able to substitute an object with another object of a compatible type (i.e. a subtype), and it must still behave the same. You need to see this from the perspective of a function that type hints for a specific object:
def func(foo: BaseResponse):
    ...
This function expects an argument that behaves like BaseResponse. What does that behave like?
get_headers()
get_body()
These are the only two methods of BaseResponse. As long as the object you pass to func has these two characteristics, it passes the duck test of BaseResponse. If it further implements any additional methods, that's of no concern.
So, no, class methods don't break the LSP.
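To make the duck test concrete, here is a trimmed-down sketch of the classes above: func is typed against BaseResponse and never notices the extra factory class method.

```python
import abc

class BaseResponse(abc.ABC):
    @abc.abstractmethod
    def get_headers(self): ...

    @abc.abstractmethod
    def get_body(self): ...

class Response(BaseResponse):
    def __init__(self, headers, body):
        self.__headers = headers
        self.__body = body

    def get_headers(self):
        return self.__headers

    def get_body(self):
        return self.__body

    @classmethod
    def from_request(cls, request, payload):
        # extra constructor; invisible to code typed against BaseResponse
        headers = request.get_headers()
        headers["meta_data"] = payload["meta_data"]
        return cls(headers, payload["data"])


def func(foo: BaseResponse):
    return foo.get_body()

# Built directly or via the factory, a Response substitutes for BaseResponse:
print(func(Response({}, "hello")))  # prints: hello
```

The factory only adds a way to *obtain* instances; it changes nothing about how instances *behave*, which is all the LSP cares about.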
I'm writing a Python class to wrap/decorate/enhance another class from a package called petl, a framework for ETL (data movement) workflows. Due to design constraints I can't just subclass it; every method call has to be sent through my own class so I can control what kind of objects are being passed back. So in principle this is a proxy class, but I'm having some trouble using the existing answers/recipes out there. This is what my code looks like:
from functools import partial

class PetlTable(object):
    """not really how we construct petl tables, but for illustrative purposes"""
    def hello(name):
        print('Hello, {}!'.format(name))

class DatumTable(object):
    def __init__(self, petl_tbl):
        self.petl_tbl = petl_tbl

    def __getattr__(self, name):
        """this returns a partial referencing the child method"""
        petl_attr = getattr(self.petl_tbl, name, None)
        if petl_attr and callable(petl_attr):
            return partial(self.call_petl_method, func=petl_attr)
        raise NotImplementedError('Not implemented')

    def call_petl_method(self, func, *args, **kwargs):
        func(*args, **kwargs)
Then I try to instantiate a table and call something:
# create a petl table
pt = PetlTable()
# wrap it with our own class
dt = DatumTable(pt)
# try to run the petl method
dt.hello('world')
This gives a TypeError: call_petl_method() got multiple values for argument 'func'.
This only happens with positional arguments; kwargs seem to be fine. I'm pretty sure it has to do with self not being passed in, but I'm not sure what the solution is. Can anyone think of what I'm doing wrong, or a better solution altogether?
This seems to be a common issue with mixing positional and keyword args:
TypeError: got multiple values for argument
To get around it, I took the positional arg func out of call_petl_method and put it in a kwarg that's unlikely to overlap with the kwargs of the child function. A little hacky, but it works.
I ended up writing a Proxy class to do all this generically:
class Proxy(object):
    def __init__(self, child):
        self.child = child

    def __getattr__(self, name):
        child_attr = getattr(self.child, name)
        return partial(self.call_child_method, __child_fn__=child_attr)

    @classmethod
    def call_child_method(cls, *args, **kwargs):
        """
        This calls a method on the child object and wraps the response as an
        object of its own class.

        Takes a kwarg `__child_fn__` which points to a method on the child
        object.

        Note: this can't take any positional args or they get clobbered by the
        keyword args we're trying to pass to the child. See:
        https://stackoverflow.com/questions/21764770/typeerror-got-multiple-values-for-argument
        """
        # get child method
        fn = kwargs.pop('__child_fn__')

        # call the child method
        r = fn(*args, **kwargs)

        # wrap the response as an object of the same class
        r_wrapped = cls(r)
        return r_wrapped
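For illustration, here is a hypothetical end-to-end run of this Proxy, with a stand-in child class whose hello returns a string instead of printing, so the wrapped result is visible:

```python
from functools import partial

class PetlTable:
    """stand-in child; hello returns a value so the wrapping is observable"""
    def hello(self, name):
        return 'Hello, {}!'.format(name)

class Proxy:
    def __init__(self, child):
        self.child = child

    def __getattr__(self, name):
        child_attr = getattr(self.child, name)
        return partial(self.call_child_method, __child_fn__=child_attr)

    @classmethod
    def call_child_method(cls, *args, **kwargs):
        fn = kwargs.pop('__child_fn__')  # child method smuggled in as a kwarg
        return cls(fn(*args, **kwargs))  # re-wrap the result in a Proxy


dt = Proxy(PetlTable())
result = dt.hello('world')    # positional args now pass through cleanly
print(type(result).__name__)  # prints: Proxy
print(result.child)           # prints: Hello, world!
```

Because call_child_method takes no positional parameters of its own, the caller's positional arguments can no longer collide with func, which is what triggered the original TypeError.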
This will also solve the problem. It doesn't use partial at all.
class PetlTable(object):
    """not really how we construct petl tables, but for illustrative purposes"""
    def hello(name):
        print('Hello, {}!'.format(name))

class DatumTable(object):
    def __init__(self, petl_tbl):
        self.petl_tbl = petl_tbl

    def __getattr__(self, name):
        """Looks up the named attribute in the class of the petl_tbl object."""
        petl_attr = self.petl_tbl.__class__.__dict__.get(name, None)
        if petl_attr and callable(petl_attr):
            return petl_attr
        raise NotImplementedError('Not implemented')

if __name__ == '__main__':
    # create a petl table
    pt = PetlTable()

    # wrap it with our own class
    dt = DatumTable(pt)

    # try to run the petl method
    dt.hello('world')  # -> Hello, world!