The Zen of Python says:
“There should be one—and preferably only one—obvious way to do it.”
Let’s say I want to create a class that builds a financial transaction. The class should allow the user to build a transaction and then call a sign() method to sign the transaction in preparation for it to be broadcast via an API call.
The class will have the following parameters:
sender
recipient
amount
signer (private key for signing)
metadata
signed_data
All of these are strings, except for the amount which is an int, and all are required except for the last two: metadata which is an optional parameter, and signed_data which is created when the method sign() is called.
We would like all of the parameters to undergo some kind of validation before the signing happens so we can reject badly formatted transactions by raising an appropriate error for the user.
This seems straightforward using a classic Python class and constructor:
class Transaction:
    def __init__(self, sender, recipient, amount, signer, metadata=None):
        self.sender = sender
        self.recipient = recipient
        self.amount = amount
        self.signer = signer
        if metadata:
            self.metadata = metadata

    def is_valid(self):
        # check that all required parameters exist and are valid and return True,
        # otherwise return False
        ...

    def sign(self):
        if self.is_valid():
            # sign transaction
            self.signed_data = "pretend signature"
        else:
            # raise InvalidTransactionError
            ...
Or with properties:
class Transaction:
    def __init__(self, sender, recipient, amount, signer, metadata=None):
        self._sender = sender
        self._recipient = recipient
        self._amount = amount
        self._signer = signer
        self._signed_data = None
        if metadata:
            self._metadata = metadata

    @property
    def sender(self):
        return self._sender

    @sender.setter
    def sender(self, sender):
        # validate value, raise InvalidParamError if invalid
        self._sender = sender

    @property
    def recipient(self):
        return self._recipient

    @recipient.setter
    def recipient(self, recipient):
        # validate value, raise InvalidParamError if invalid
        self._recipient = recipient

    @property
    def amount(self):
        return self._amount

    @amount.setter
    def amount(self, amount):
        # validate value, raise InvalidParamError if invalid
        self._amount = amount

    @property
    def signer(self):
        return self._signer

    @signer.setter
    def signer(self, signer):
        # validate value, raise InvalidParamError if invalid
        self._signer = signer

    @property
    def metadata(self):
        return self._metadata

    @metadata.setter
    def metadata(self, metadata):
        # validate value, raise InvalidParamError if invalid
        self._metadata = metadata

    @property
    def signed_data(self):
        return self._signed_data

    @signed_data.setter
    def signed_data(self, signed_data):
        # validate value, raise InvalidParamError if invalid
        self._signed_data = signed_data

    def is_valid(self):
        return (self.sender and self.recipient and self.amount and self.signer)

    def sign(self):
        if self.is_valid():
            # sign transaction
            self.signed_data = "pretend signature"
        else:
            # raise InvalidTransactionError
            print("Invalid Transaction!")
We can now validate each value when it's set, so by the time we go to sign we know the parameters are valid, and is_valid() only has to check that all required parameters have been set. This feels a little more Pythonic to me than doing all the validation in a single is_valid() method, but I am unsure whether all the extra boilerplate code is really worth it.
With dataclasses:
@dataclass
class Transaction:
    sender: str
    recipient: str
    amount: int
    signer: str
    metadata: str = None
    signed_data: str = None

    def is_valid(self):
        # check that all parameters exist and are valid and return True,
        # otherwise return False
        ...

    def sign(self):
        if self.is_valid():
            # sign transaction
            self.signed_data = "pretend signature"
        else:
            # raise InvalidTransactionError
            print("Invalid Transaction!")
Comparing this to Approach 1, this is pretty nice. It's concise, clean, and readable, and __init__(), __repr__(), and __eq__() come built in. On the other hand, compared to Approach 2 we're back to validating all the inputs via one massive is_valid() method.
We could try to use properties with dataclasses but that's actually harder than it sounds. According to this blog post it can be done something like this:
@dataclass
class Transaction:
    sender: str
    _sender: str = field(init=False, repr=False)
    recipient: str
    _recipient: str = field(init=False, repr=False)
    . . .

    # properties for all parameters

    def is_valid(self):
        # if all parameters exist, return True,
        # otherwise return False
        ...

    def sign(self):
        if self.is_valid():
            # sign transaction
            self.signed_data = "pretend signature"
        else:
            # raise InvalidTransactionError
            print("Invalid Transaction!")
Is there one and only one obvious way to do this? Are dataclasses recommended for this kind of application?
As a general rule, and not limited to Python, it is a good idea to write code which "fails fast": that is, if something goes wrong at runtime, you want it to be detected and signalled (e.g. by throwing an exception) as early as possible.
Especially in the context of debugging, if the bug is that an invalid value is being set, you want the exception to be thrown at the time the value is set, so that the stack trace includes the method setting the invalid value. If the exception is thrown at the time the value is used, then you can't signal which part of the code caused the invalid value.
Of your three examples, only the second one allows you to follow this principle. It may require more boilerplate code, but writing boilerplate code is easy and doesn't take much time, compared to debugging without a meaningful stack trace.
By the way, if you have setters which do validation, then you should call these setters from your constructor too, otherwise it's possible to create an object with an invalid initial state.
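As a minimal sketch of that point (a single illustrative field and validation rule, chosen here just for demonstration): assigning to the public attribute name in `__init__` routes the initial value through the validating setter, so a bad value fails at construction time.

```python
class Transaction:
    def __init__(self, sender):
        # Assigning to self.sender invokes the property setter below,
        # so validation also runs at construction time.
        self.sender = sender

    @property
    def sender(self):
        return self._sender

    @sender.setter
    def sender(self, value):
        # Illustrative rule: require a non-empty string.
        if not isinstance(value, str) or not value:
            raise ValueError("sender must be a non-empty string")
        self._sender = value
```

With this shape, `Transaction("")` raises immediately, with the constructor in the stack trace, instead of failing later at signing time.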
Given your constraints, I think your dataclass approach can be improved to produce an expressive and idiomatic solution with very strong runtime assertions about the resulting Transaction instances, mostly by leveraging the __post_init__ mechanism:
from dataclasses import dataclass, asdict, field
from typing import Optional

@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount: int
    signer: str
    metadata: Optional[str] = None
    # default=None so asdict() works even if validation fails before signing
    signed_data: Optional[str] = field(init=False, default=None)

    def is_valid(self) -> bool:
        ...  # implement your validity assertion logic

    def __post_init__(self):
        if self.is_valid():
            object.__setattr__(self, "signed_data", "pretend signature")
        else:
            raise ValueError(f"Invalid transaction with parameter list "
                             f"{asdict(self)}.")
This reduces the amount of code you have to maintain and understand to a degree where every written line relates to a meaningful part of your requirements, which is the essence of pythonic code.
Put into words, instances of this Transaction class may specify metadata but don't need to and may not supply their own signed_data, something which was possible in your variant #3. Attributes can't be mutated any more after initialization (enforced by frozen=True), so that an instance that is valid cannot be altered into an invalid state. And most importantly, since the validation is now part of the constructor, it is impossible for an invalid instance to exist. Whenever you are able to refer to a Transaction in runtime, you can be 100% sure that it passed the validity check and would do so again.
Since you based your question on python-zen conformity (referring to Beautiful is better than ugly and Simple is better than complex in particular), I'd say this solution is preferable to the property based one.
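For illustration, here is a trimmed, runnable variant of this idea. The validity check is a stand-in (it just requires every required field to be truthy), and the error message is simplified; the structure (frozen dataclass, `field(init=False)`, validation in `__post_init__`) is the point.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Transaction:
    sender: str
    recipient: str
    amount: int
    signer: str
    metadata: Optional[str] = None
    signed_data: str = field(init=False)

    def is_valid(self) -> bool:
        # Stand-in check: every required field must be truthy.
        return bool(self.sender and self.recipient and self.amount and self.signer)

    def __post_init__(self):
        if self.is_valid():
            # frozen=True forbids normal assignment, so bypass it here.
            object.__setattr__(self, "signed_data", "pretend signature")
        else:
            raise ValueError("invalid transaction parameters")
```

Constructing `Transaction("", "bob", 10, "key")` raises `ValueError`, and any later attempt to mutate a field raises `FrozenInstanceError`, so a reachable instance is always a valid, signed one.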
Related
I am trying to capture (S3) logs in a structured way. I am capturing the access-related elements with this type of tuple:
class _Access(NamedTuple):
    time: datetime
    ip: str
    actor: str
    request_id: str
    action: str
    key: str
    request_uri: str
    status: int
    error_code: str
I then have a class that uses this named tuple as follows (edited just down to relevant code):
class Logs:
    def __init__(self, log: str):
        raw_logs = match(S3_LOG_REGEX, log)
        if raw_logs is None:
            raise FormatError(log)
        logs = raw_logs.groups()
        timestamp = datetime.strptime(logs[2], "%d/%b/%Y:%H:%M:%S %z")
        http_status = int(logs[9])
        access = _Access(
            timestamp,
            logs[3],
            logs[4],
            logs[5],
            logs[6],
            logs[7],
            logs[8],
            http_status,
            logs[10],
        )
        self.access = access
The problem is that it is too verbose when I now want to use it:
>>> log_struct = Logs(raw_log)
>>> log_struct.access.action # I don't want to have to add `access`
As I mention above, I'd rather be able to do something like this:
>>> log_struct = Logs(raw_log)
>>> log_struct.action
But I still want to have this clean named tuple called _Access. How can I make everything from access available at the top level?
Specifically, I have this line:
self.access = access
which is giving me that extra "layer" that I don't want. I'd like to be able to "unpack" it somehow, similar to how we can unpack arguments by passing the star in *args. But I'm not sure how I can unpack the tuple in this case.
What you really need for your use case is an alternative constructor for your NamedTuple subclass that parses a log-entry string into the respective fields. This can be done with a class method that calls the __new__ method with arguments parsed from the input string.
Using just the fields of ip and action as a simplified example:
from typing import NamedTuple

class Logs(NamedTuple):
    ip: str
    action: str

    @classmethod
    def parse(cls, log: str) -> 'Logs':
        return cls.__new__(cls, *log.split())

log_struct = Logs.parse('192.168.1.1 GET')
print(log_struct)
print(log_struct.ip)
print(log_struct.action)
This outputs:
Logs(ip='192.168.1.1', action='GET')
192.168.1.1
GET
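If some fields need type conversion, as the timestamp and status do in the original question, the class method is also a natural place for that. A sketch with a hypothetical `status` field added to the simplified example:

```python
from typing import NamedTuple

class Logs(NamedTuple):
    ip: str
    action: str
    status: int

    @classmethod
    def parse(cls, log: str) -> "Logs":
        ip, action, status = log.split()
        # Convert non-string fields before constructing the tuple.
        return cls(ip, action, int(status))

log_struct = Logs.parse("192.168.1.1 GET 200")
```

Here `log_struct.status` is a real `int`, so comparisons like `log_struct.status >= 400` work without casting at every call site.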
I agree with @blhsing and recommend that solution, assuming there are no extra attributes that need to be applied to the named tuple (say, storing the raw log value).
If you really need the object to remain composed, another way to support accessing the properties of the _Access class would be to override the __getattr__ method [PEP 562] of Logs
The __getattr__ function at the module level should accept one
argument which is the name of an attribute and return the computed
value or raise an AttributeError:
def __getattr__(name: str) -> Any: ...
If an attribute is not found on a module object through the normal
lookup (i.e. object.__getattribute__), then __getattr__ is
searched in the module __dict__ before raising an AttributeError.
If found, it is called with the attribute name and the result is
returned. Looking up a name as a module global will bypass module
__getattr__. This is intentional, otherwise calling __getattr__
for builtins will significantly harm performance.
E.g.
from typing import NamedTuple, Any

class _Access(NamedTuple):
    foo: str
    bar: str

class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    def __getattr__(self, name: str) -> Any:
        return getattr(self.access, name)
When you request an attribute of Logs which is not present it will try to access the attribute through the Logs.access attribute. Meaning you can write code like this:
logs = Logs("fizz buzz")
print(f"{logs.log=}, {logs.foo=}, {logs.bar=}")
logs.log='fizz buzz', logs.foo='fizz', logs.bar='buzz'
Note that this would not preserve the typing information through to the Logs object in most static analyzers and autocompletes. That to me would be a compelling enough reason not to do this, and continue to use the more verbose way of accessing values as you describe in your question.
If you still really need this, and want to remain type safe. Then I would add properties to the Logs class which fetch from the _Access object.
class Logs:
    def __init__(self, log: str) -> None:
        self.log = log
        self.access = _Access(*log.split())

    @property
    def foo(self) -> str:
        return self.access.foo

    @property
    def bar(self) -> str:
        return self.access.bar
This avoids the type safety issues and, depending on how much code you write using Logs instances, can still cut down dramatically on other boilerplate.
I have a filesystem object (call it fs). The object uses a JWT token for authentication, and the authentication is done when the filesystem object is created. Once we have the object, we can call methods like ls to list a directory, etc. The token has an expiration time.
The issue is that when I call fs.ls('/'), the backend does not validate whether the token is still valid. What I want is to intercept every method call on the object and check the token expiration; if the token is about to expire, I will refresh it.
Searching and reading on SO, I came across __getattribute__, but my code is not working as expected: sometimes I get a recursion error and sometimes I get null values.
This code gives recursion error:
class FileSystem(adlfs.FileSystem):
    def __init__(self, *args, **kwargs):
        try:
            self.name, self.token = self._get_token()
            if self.name is not None:
                kwargs["name"] = self.name
            if self.token is not None:
                kwargs["token"] = self.token
                self.exp = self.token.token.expires_on
            super().__init__(*args, **kwargs)
        except Exception as exception:
            print(exception)

    def _get_token(self) -> (str, 'Credential'):
        return name, token

    def __getattribute__(self, attr):
        attribute = super().__getattribute__(attr)
        if callable(attribute):
            curr_time = int(time.time())
            if curr_time > self.token_exp:
                def refresh_token(*args, **kwargs):
                    self.name, self.token = self._get_token()
                    self.token_exp = updateTokenExpiration(self.token)
                    super().updateConnection()
                    return attribute(*args, **kwargs)
                return refresh_token
            else:
                return attribute
        else:
            return attribute
Typically, in __getattribute__() all occurrences of self.something need to be replaced with super().__getattribute__('something') (unless they are targets of an assignment or del).
In your case that can be relaxed for non-callables (as for them your implementation of __getattribute__() just calls super().__getattribute__(...), practically without doing anything more), but for callables it still needs to be adjusted, for example:
# before adjustment:
self.name, self.token = self._get_token()
# after adjustment:
self.name, self.token = super().__getattribute__('_get_token')()
Otherwise your implementation calls itself, so that an infinite recursion occurs.
Replacing simple attribute access with such calls can, however, be tedious if you have many such places in your __getattribute__()...
A possible trick is to use in your definition of __getattribute__() only such attributes/methods of self that are specially named, e.g. with _ga_ at the beginning of their names, and filter out such names from the customized behavior of your __getattribute__(), e.g.:
def __getattribute__(self, name):
    if name.startswith('_ga_'):
        return super().__getattribute__(name)
    # ...here the actual part of your custom implementation,
    # in which you can freely use `self._ga_whatever...`
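A self-contained toy version of that naming trick (the token logic here is simulated with plain integers, and the class names are illustrative, not adlfs API):

```python
import time

class Client:
    def __init__(self):
        self._ga_token_exp = 0  # expired, so the first call forces a refresh

    def _ga_refresh_token(self):
        # Pretend to fetch a new token valid for an hour.
        self._ga_token_exp = int(time.time()) + 3600
        self.refreshed = True

    def ls(self):
        return ["a", "b"]

    def __getattribute__(self, name):
        # _ga_-prefixed names bypass the custom logic entirely,
        # so the self._ga_... lookups below cannot recurse.
        if name.startswith("_ga_"):
            return object.__getattribute__(self, name)
        attribute = object.__getattribute__(self, name)
        if callable(attribute) and int(time.time()) > self._ga_token_exp:
            self._ga_refresh_token()
        return attribute
```

Calling `Client().ls()` triggers exactly one refresh and then returns normally; without the `_ga_` short-circuit, reading `self._ga_token_exp` inside `__getattribute__` would re-enter it forever.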
I am using the Python package named Clint to prettify the inputs necessary in my application. In the package you can access the module validators and use it combined with prompt to properly ask users to input data.
I was looking for a possibility to implement a custom validator in Clint, due to a relatively short list of built-in classes of validators in the module:
[FileValidator, IntegerValidator, OptionValidator, PathValidator, RegexValidator, ValidationError]
So I wrote the code below:
from clint.textui import prompt, validators

class NumberValidator(object):
    message = 'Input is not valid.'

    def __init__(self, message=None):
        if message is not None:
            self.message = message

    def __call__(self, value):
        """
        Validates the input.
        """
        try:
            if int(value) > 10:
                return value
            else:
                raise ValueError()
        except (TypeError, ValueError):
            raise validators.ValidationError(self.message)

answer = prompt.query('Insert range in days:',
                      '365',
                      validators=[NumberValidator("Must be > 10")],
                      batch=False)
print(answer)
It works, but I find the solution a bit messy, because with it I have to create a new class every time I need to perform a different type of validation.
I think it would be better if the class could somehow be made dynamic using decorators, accepting a new function each time it is instantiated, but I am really bad with decorators.
So I am asking for help making a more Pythonic solution to this problem.
I'm not sure if this is the best way, but I may have found a better approach to this issue. In the code below I can create as many custom functions as I want (custom_validation_1, custom_validation_2, custom_validation_3, ...) and then just change the validators parameter in prompt.query:
from clint.textui import prompt, validators

class InputValidator(object):
    message = 'Input is not valid.'

    def __init__(self, fun, message=None, *args):
        if message is not None:
            self.message = message
        self.my_function = fun
        self.my_args = args

    def __call__(self, value):
        """
        Validates the input.
        """
        try:
            return self.my_function(value, *self.my_args)
        except (TypeError, ValueError):
            raise validators.ValidationError(self.message)

def custom_validation_1(value, number):
    if int(value) > int(number):
        return value
    else:
        raise ValueError

answer = prompt.query('Insert range in days:',
                      '365',
                      validators=[InputValidator(custom_validation_1,
                                                 "Must be greater than 10",
                                                 10)],
                      batch=False)
print(answer)
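Since the question specifically asked about decorators: the same idea can be written as a decorator factory that wraps a plain validation function so it raises the error type clint expects. This is a sketch; the ValidationError here is a local stand-in for clint.textui.validators.ValidationError so the example is self-contained, and the function names are invented.

```python
class ValidationError(Exception):
    """Stand-in for clint.textui.validators.ValidationError."""

def validator(message):
    """Wrap a plain function into a callable usable as a clint validator."""
    def decorate(fun):
        def wrapper(value):
            try:
                return fun(value)
            except (TypeError, ValueError):
                # Translate ordinary failures into the validator error type.
                raise ValidationError(message)
        return wrapper
    return decorate

@validator("Must be greater than 10")
def greater_than_ten(value):
    if int(value) > 10:
        return value
    raise ValueError
```

You would then pass `validators=[greater_than_ten]` to `prompt.query`, defining each new rule as a small decorated function instead of a class.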
I'm working with an external service which reports errors by code.
I have the list of error codes and the associated messages. Say, the following categories exist: authentication error, server error.
What is the smartest way to implement these errors in Python so I can always lookup an error by code and get the corresponding exception object?
Here's my straightforward approach:
class AuthError(Exception):
    pass

class ServerError(Exception):
    pass

map = {
    1: AuthError,
    2: ServerError
}

def raise_code(code, message):
    """ Raise an exception by code """
    raise map[code](message)
Would like to see better solutions :)
Your method is correct, except that map should be renamed something else (e.g. ERROR_MAP) so it does not shadow the builtin of the same name.
You might also consider making the function return the exception rather than raising it:
def error(code, message):
    """ Return an exception by code """
    return ERROR_MAP[code](message)

def foo():
    raise error(code, message)
By placing the raise statement inside foo, you'd raise the error closer to where the error occurred and there would be one or two less lines to trace through if the stack trace is printed.
Another approach is to create a polymorphic base class which, being instantiated, actually produces a subclass that has the matching code.
This is implemented by traversing __subclasses__() of the parent class and comparing the error code to the one defined in the class. If found, use that class instead.
Example:
class CodeError(Exception):
    """ Base class """
    code = None  # Error code

    def __new__(cls, code, *args):
        # Pick the appropriate class
        for E in cls.__subclasses__():
            if E.code == code:
                C = E
                break
        else:
            C = cls  # fall back
        return super(CodeError, cls).__new__(C, code, *args)

    def __init__(self, code, message):
        super(CodeError, self).__init__(message)

# Subclasses with error codes
class AuthError(CodeError):
    code = 1

class ServerError(CodeError):
    code = 2

CodeError(1, 'Wrong password')  #-> AuthError
CodeError(2, 'Failed')  #-> ServerError
With this approach, it's trivial to associate error message presets, and even map one class to multiple codes with a dict.
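A sketch of that extension, with invented codes and preset messages: each subclass lists every code it handles in a tuple and supplies a default message, so the caller can omit the message entirely.

```python
class CodeError(Exception):
    codes = ()                    # every error code this class handles
    preset = "Unknown error"      # default message for this class

    def __new__(cls, code, message=None):
        # Pick the subclass whose `codes` contains the given code.
        target = cls
        for sub in cls.__subclasses__():
            if code in sub.codes:
                target = sub
                break
        return super().__new__(target, code, message)

    def __init__(self, code, message=None):
        super().__init__(message or self.preset)
        self.code = code

class AuthError(CodeError):
    codes = (1, 3)                # e.g. bad password and expired session
    preset = "Authentication failed"

class ServerError(CodeError):
    codes = (2,)
    preset = "Server error"
```

`CodeError(3)` then produces an `AuthError` reading "Authentication failed", while an unknown code falls back to the base class.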
I have a relatively large enum wherein each member represents a message type. A client will receive a message containing the integer value associated with the msg type in the enum. For each msg type there will be an individual function callback to handle the msg.
I'd like to make the lookup and dispatching of the callback as quick as possible by using a sparse array (or vector) in which the enum value maps to the index of the callback. Is this possible in Python given arrays can't hold function types?
# pseudo code for 'enum'
class MsgType(object):
    LOGIN, LOGOUT, HEARTBEAT, ... = range(n)

# handler class
class Handler(object):
    def handleMsg(self, msg):
        # dispatch msg to specific handler
        ...

    def __onLogin(self, msg):
        # handle login
        ...

    def __onLogout(self, msg):
        # handle logout
        ...
Update:
I wasn't clear in my terminology. I now understand Python dictionary lookups to be of complexity O(1) which makes them the perfect candidate. Thanks.
class MsgID(int):
    pass

LOGIN = MsgID(0)
LOGOUT = MsgID(1)
HEARTBEAT = MsgID(2)
...  # add all other message identifier numbers

class MsgType(object):
    def __init__(self, id, data):
        self.id = id
        self.data = data

def login_handler(msg):
    ...  # do something here

def logout_handler(msg):
    ...  # do something here

def heartbeat_handler(msg):
    ...  # do something here

msg_func = {
    LOGIN: login_handler,
    LOGOUT: logout_handler,
    HEARTBEAT: heartbeat_handler,
    ...
}

class Handler(object):
    def handleMsg(self, msg):
        try:
            msg_func[msg.id](msg)  # look up function reference in dict, call function
        except KeyError:
            log_error_mesg('message without a handler function: %d' % msg.id)
It's not strictly needed, but I added a subclass of int for message ID. That way you can check to see if the ID value is really an ID value rather than just some random integer.
I assume that each message will have an ID value in it, identifying what sort of message it is, plus some data. The msg_func dictionary uses MsgID values as keys, which map to function references.
You could put all the functions inside a class, but I didn't do that here; they are just functions.
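If you do prefer the handlers inside the class, one common sketch (names here are illustrative) builds the dict of bound methods once in `__init__`, keeping the O(1) dict lookup while letting handlers share instance state:

```python
# Message identifiers (a simplified stand-in for the enum in the question).
LOGIN, LOGOUT, HEARTBEAT = range(3)

class Handler:
    def __init__(self):
        # Map each message ID to a bound method, built once at construction.
        self._dispatch = {
            LOGIN: self._on_login,
            LOGOUT: self._on_logout,
            HEARTBEAT: self._on_heartbeat,
        }

    def handle_msg(self, msg_id, msg):
        func = self._dispatch.get(msg_id)
        if func is None:
            raise ValueError("no handler for message id %d" % msg_id)
        return func(msg)

    def _on_login(self, msg):
        return "login: %s" % msg

    def _on_logout(self, msg):
        return "logout: %s" % msg

    def _on_heartbeat(self, msg):
        return "heartbeat: %s" % msg
```

Because the methods are bound when the dict is created, each handler can read and modify `self` without any extra plumbing.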