Python generic with union - python

I have a Document and Page types, both containing data and metadata parts. They are looking the same:
class Document:
__data: DocumentData
__meta: DocumentMeta
def __init__(self, part: Union[DocumentData, DocumentMeta, None] = None, data: Optional[DocumentData] = None,
meta: Optional[DocumentMeta] = None):
super().__init__()
self.data: Optional[DocumentData] = data
self.meta: Optional[DocumentMeta] = meta
if part is not None:
if type(part) == DocumentData:
data = part
meta = DocumentMeta()
elif type(part) == DocumentMeta:
meta = part
data = DocumentData()
class Page:
__data: PageData
__meta: PageMeta
def __init__(self, part: Union[PageData, PageMeta, None] = None, data: Optional[PageData] = None,
meta: Optional[PageMeta] = None):
super().__init__()
self.data: Optional[PageData] = data
self.meta: Optional[PageMeta] = meta
if part is not None:
if type(part) == PageData:
data = part
meta = PageMeta()
elif type(part) == PageMeta:
meta = part
data = PageData()
I would like now to refactor these 2 types to use a generic type. I did it that way:
from typing import Generic, Optional, TypeVar, Union
DataStruct = TypeVar('DataStruct')
MetaStruct = TypeVar('MetaStruct')
class MetaDataStruct(Generic[DataStruct, MetaStruct]):
__data: DataStruct
__meta: MetaStruct
def __init__(
self,
part: Union[DataStruct, MetaStruct, None] = None,
data: Optional[DataStruct] = None,
meta: Optional[MetaStruct] = None
):
super().__init__()
self.data: Optional[DataStruct] = data
self.meta: Optional[MetaStruct] = meta
if part is not None:
if type(part) == DataStruct:
data = part
meta = MetaStruct()
elif type(part) == MetaStruct:
meta = part
data = DataStruct()
class DocumentData:
pass
class DocumentMeta:
pass
class PageData:
pass
class PageMeta:
pass
class Document(MetaDataStruct[DocumentData, DocumentMeta]):
pass
class Page(MetaDataStruct[PageData, PageMeta]):
pass
Now there's few problems with type checking.
if type(part) == DataStruct: returns False all the time. In runtime a type(part) is one of: DocumentData, DocumentMeta, PageData, PageMeta. I understand that I have to compare type(part) with actual type of DataStruct. What is the right way to resolve the runtime type of DataStruct?
In python hints manual it's written: At runtime, isinstance(x, T) will raise TypeError. In general, isinstance() and issubclass() should not be used with types. I believe the same issue is here.
I can use type(self).orig_bases[0].args[0] to infer DataStruct, but it is conceptually incorrect. It will retrieve the first generic argument instead DataStruct. So, if a MetaDataStruct base class signature will change to class MergedStruct(Struct, Generic[MetaStruct, DataStruct]) (swapped TypeVar arguments), MetaStruct will be retrieved instead DataStruct.
For some reason, when I tried to intialize Document(part=1), it passed. In practice I expected the code to raise TypeError.

Python doesn't do type checking at runtime; you need to use a static analysis tool like mypy. Running mypy over the code you've given shows these errors:
22: error: 'DataStruct' is a type variable and only valid in type context
23: error: Incompatible types in assignment (expression has type "Union[DataStruct, MetaStruct]", variable has type "Optional[DataStruct]")
24: error: 'MetaStruct' is a type variable and only valid in type context
25: error: 'MetaStruct' is a type variable and only valid in type context
26: error: Incompatible types in assignment (expression has type "Union[DataStruct, MetaStruct]", variable has type "Optional[MetaStruct]")
27: error: 'DataStruct' is a type variable and only valid in type context
If you add a line trying to initialize Document(part=1), you won't get a runtime error (there's nothing in your code that would raise an error; your if/elif will just be a no-op), but you will get a typechecking error from mypy that looks like:
54: error: Argument "part" to "Document" has incompatible type "int"; expected "Union[DocumentData, DocumentMeta, None]"
The problem with the type() check you're trying to do (and with an equivalent isinstance) is that a TypeVar has no runtime value, so you can't invoke it as a constructor. See: Instantiate a type that is a TypeVar
One way to fix this is to require that the subclass provide the actual types:
from abc import ABC, abstractclassmethod
from typing import Generic, Optional, Type, TypeVar, Union
DataStruct = TypeVar('DataStruct')
MetaStruct = TypeVar('MetaStruct')
class MetaDataStruct(Generic[DataStruct, MetaStruct], ABC):
#abstractclassmethod
def _data_type(cls) -> Type[DataStruct]:
pass
#abstractclassmethod
def _meta_type(cls) -> Type[MetaStruct]:
pass
def __init__(
self,
part: Union[DataStruct, MetaStruct, None] = None,
data: Optional[DataStruct] = None,
meta: Optional[MetaStruct] = None
):
super().__init__()
self.data: Optional[DataStruct] = data
self.meta: Optional[MetaStruct] = meta
if part is not None:
if isinstance(part, self._data_type()):
data = part
meta = self._meta_type()()
elif isinstance(part, self._meta_type()):
meta = part
data = self._data_type()()
class DocumentData:
pass
class DocumentMeta:
pass
class Document(MetaDataStruct[DocumentData, DocumentMeta]):
#classmethod
def _data_type(cls) -> Type[DocumentData]:
return DocumentData
#classmethod
def _meta_type(cls) -> Type[DocumentMeta]:
return DocumentMeta
The above typechecks correctly (you'll get mypy errors if you don't implement the _data_type and _meta_type methods correctly in the subclass), and is able to use the class methods to call the appropriate constructors at runtime.

Temporarily, I used this solution:
actual_data_struct = getattr(type(self), '__orig_bases__')[0].__args__[0]
actual_meta_struct = getattr(type(self), '__orig_bases__')[0].__args__[1]
if part is not None:
if type(part) == actual_data_struct:
data = part
meta = actual_meta_struct()
elif type(part) == actual_meta_struct:
meta = part
data = actual_data_struct()

Related

Conditional type hint

Im using a pattern that all my adapters returns a Result objects instead of the result itself. Let me explain:
from typing import Generic, Optional, TypeVar
from pydantic import BaseModel
Dto = TypeVar("Dto", bound=BaseModel)
class Result(BaseModel, Generic[Dto]):
error: Optional[Exception]
data: Optional[Dto]
#property
def is_success(self) -> bool:
return bool(self.data) and not self.error
class Config:
arbitrary_types_allowed = True
def adapter_example(input: Any) -> Result[int]:
try:
# some complex stuff here
result = Result[int](data=10)
except SomethingBad as e:
return Result[int](error=e)
The point is, check if error is None do not ensures me that data != None. There is a way to force that at least (and conditionally) one of them are mandatory (or, not Optional)?
Like:
Result[str](data='a') # VALID
Result[str](error=Exception()) # VALID
Result[str](data='', error=Exception()) # VALID
Result[str]() # INVALID
if result.data:
# Here any linter are 100% sure that result.error is None
else:
# Here any linter are 100% sure that result.error != None
ps: Im only using pydantic.BaseModel here because its easier in this implementation. If any sugestions about how to type this class conditionally dont use pydantic is fine to mee

Look up items in list of dataclasses by value

I'm using python to filter data in elasticsearch based on request params provided. I've got a working example, but know it can be improved and am trying to think of a better way. The current code is like this:
#dataclass(frozen=True)
class Filter:
param: str
custom_es_field: Optional[str] = None
is_bool_query: bool = False
is_date_query: bool = False
is_range_query: bool = False
def es_field(self) -> str:
if self.custom_es_field:
field = self.custom_es_field
elif "." in self.param:
field = self.param.replace(".", "__")
else:
field = self.param
return field
filters = [
Filter(param="publication_year", is_range_query=True),
Filter(param="publication_date", is_date_query=True),
Filter(param="venue.issn").
...
]
def filter_records(filter_params, s):
for filter in filters:
# range query
if filter.param in filter_params and filter.is_range_query:
param = filter_params[filter.param]
if "<" in param:
param = param[1:]
validate_range_param(filter, param)
kwargs = {filter.es_field(): {"lte": int(param)}}
s = s.filter("range", **kwargs)
elif filter.param in filter_params and filter.is_bool_query:
....
The thing I think is slow is I am looping through all of the filters in order to use the one that came in as a request variable. I'm tempted to convert this to a dictionary so I can do filter["publication_year"], but I like having the extra methods available via the dataclass. Would love to hear any thoughts.

Pydantic get a fields type hint

I want to store metadata for my ML models in pydantic. Is there a proper way to access a fields type? I know you can do BaseModel.__fields__['my_field'].type_ but I assume there's a better way.
I want to make it so that if a BaseModel fails to instantiate it is very clear what data is required to create this missing fields and which methods to use. Something like this :
from pydantic import BaseModel
import pandas as pd
# basic model
class Metadata(BaseModel):
peaks_per_day: float
class PeaksPerDayType(float):
data_required = pd.Timedelta("180D")
data_type = "foo"
#classmethod
def determine(cls, data):
return cls(data)
# use our custom float
class Metadata(BaseModel):
peaks_per_day: PeaksPerDayType
def get_data(data_type, required_data):
# get enough of the appropriate data type
return [1]
# Initial data we have
metadata_json = {}
try:
metadata = Metadata(**metadata_json)
# peaks per day is missing
except Exception as e:
error_msg = e
missing_fields = error_msg.errors()
missing_fields = [missing_field['loc'][0] for missing_field in missing_fields]
# For each missing field use its type hint to find what data is required to
# determine it and access the method to determine the value
new_data = {}
for missing_field in missing_fields:
req_data = Metadata[missing_field].data_required
data_type = Metadata[missing_field].data_type
data = get_data(data_type=data_type, required_data=req_data)
new_data[missing_field] = Metadata[missing_field].determine(data)
metadata = Metadata(**metadata_json, **new_data)
In the case you dont need to handle nested classes, this should work
from pydantic import BaseModel, ValidationError
import typing
class PeaksPerDayType(float):
data_required = 123.22
data_type = "foo"
#classmethod
def determine(cls, data):
return cls(data)
# use our custom float
class Metadata(BaseModel):
peaks_per_day: PeaksPerDayType
def get_data(data_type, required_data):
# get enough of the appropriate data type
return required_data
metadata_json = {}
try:
Metadata(**metadata_json)
except ValidationError as e:
field_to_type = typing.get_type_hints(Metadata)
missing_fields = []
for error in e.errors():
if error['type']=='value_error.missing':
missing_fields.append(error['loc'][0])
else:
raise
new_data = {}
for field in missing_fields:
type_ = field_to_type[field]
new_data[field] = get_data(type_.data_type, type_.data_required)
print(Metadata(**metadata_json, **new_data))
peaks_per_day=123.22
Im not really sure whats the point of data_type or get_data, but I assume its some internal logic that you want to add

How do I check if a value matches a type in python?

Let's say I have a python function whose single argument is a non-trivial type:
from typing import List, Dict
ArgType = List[Dict[str, int]] # this could be any non-trivial type
def myfun(a: ArgType) -> None:
...
... and then I have a data structure that I have unpacked from a JSON source:
import json
data = json.loads(...)
My question is: How can I check at runtime that data has the correct type to be used as an argument to myfun() before using it as an argument for myfun()?
if not isCorrectType(data, ArgType):
raise TypeError("data is not correct type")
else:
myfun(data)
Validating a type annotation is a non-trivial task. Python does not do it automatically, and writing your own validator is difficult because the typing module doesn't offer much of a useful interface. (In fact the internals of the typing module have changed so much since its introduction in python 3.5 that it's honestly a nightmare to work with.)
Here's a type validator function taken from one of my personal projects (wall of code warning):
import inspect
import typing
__all__ = ['is_instance', 'is_subtype', 'python_type', 'is_generic', 'is_base_generic', 'is_qualified_generic']
if hasattr(typing, '_GenericAlias'):
# python 3.7
def _is_generic(cls):
if isinstance(cls, typing._GenericAlias):
return True
if isinstance(cls, typing._SpecialForm):
return cls not in {typing.Any}
return False
def _is_base_generic(cls):
if isinstance(cls, typing._GenericAlias):
if cls.__origin__ in {typing.Generic, typing._Protocol}:
return False
if isinstance(cls, typing._VariadicGenericAlias):
return True
return len(cls.__parameters__) > 0
if isinstance(cls, typing._SpecialForm):
return cls._name in {'ClassVar', 'Union', 'Optional'}
return False
def _get_base_generic(cls):
# subclasses of Generic will have their _name set to None, but
# their __origin__ will point to the base generic
if cls._name is None:
return cls.__origin__
else:
return getattr(typing, cls._name)
def _get_python_type(cls):
"""
Like `python_type`, but only works with `typing` classes.
"""
return cls.__origin__
def _get_name(cls):
return cls._name
else:
# python <3.7
if hasattr(typing, '_Union'):
# python 3.6
def _is_generic(cls):
if isinstance(cls, (typing.GenericMeta, typing._Union, typing._Optional, typing._ClassVar)):
return True
return False
def _is_base_generic(cls):
if isinstance(cls, (typing.GenericMeta, typing._Union)):
return cls.__args__ in {None, ()}
if isinstance(cls, typing._Optional):
return True
return False
else:
# python 3.5
def _is_generic(cls):
if isinstance(cls, (typing.GenericMeta, typing.UnionMeta, typing.OptionalMeta, typing.CallableMeta, typing.TupleMeta)):
return True
return False
def _is_base_generic(cls):
if isinstance(cls, typing.GenericMeta):
return all(isinstance(arg, typing.TypeVar) for arg in cls.__parameters__)
if isinstance(cls, typing.UnionMeta):
return cls.__union_params__ is None
if isinstance(cls, typing.TupleMeta):
return cls.__tuple_params__ is None
if isinstance(cls, typing.CallableMeta):
return cls.__args__ is None
if isinstance(cls, typing.OptionalMeta):
return True
return False
def _get_base_generic(cls):
try:
return cls.__origin__
except AttributeError:
pass
name = type(cls).__name__
if not name.endswith('Meta'):
raise NotImplementedError("Cannot determine base of {}".format(cls))
name = name[:-4]
return getattr(typing, name)
def _get_python_type(cls):
"""
Like `python_type`, but only works with `typing` classes.
"""
# Many classes actually reference their corresponding abstract base class from the abc module
# instead of their builtin variant (i.e. typing.List references MutableSequence instead of list).
# We're interested in the builtin class (if any), so we'll traverse the MRO and look for it there.
for typ in cls.mro():
if typ.__module__ == 'builtins' and typ is not object:
return typ
try:
return cls.__extra__
except AttributeError:
pass
if is_qualified_generic(cls):
cls = get_base_generic(cls)
if cls is typing.Tuple:
return tuple
raise NotImplementedError("Cannot determine python type of {}".format(cls))
def _get_name(cls):
try:
return cls.__name__
except AttributeError:
return type(cls).__name__[1:]
if hasattr(typing.List, '__args__'):
# python 3.6+
def _get_subtypes(cls):
subtypes = cls.__args__
if get_base_generic(cls) is typing.Callable:
if len(subtypes) != 2 or subtypes[0] is not ...:
subtypes = (subtypes[:-1], subtypes[-1])
return subtypes
else:
# python 3.5
def _get_subtypes(cls):
if isinstance(cls, typing.CallableMeta):
if cls.__args__ is None:
return ()
return cls.__args__, cls.__result__
for name in ['__parameters__', '__union_params__', '__tuple_params__']:
try:
subtypes = getattr(cls, name)
break
except AttributeError:
pass
else:
raise NotImplementedError("Cannot extract subtypes from {}".format(cls))
subtypes = [typ for typ in subtypes if not isinstance(typ, typing.TypeVar)]
return subtypes
def is_generic(cls):
"""
Detects any kind of generic, for example `List` or `List[int]`. This includes "special" types like
Union and Tuple - anything that's subscriptable, basically.
"""
return _is_generic(cls)
def is_base_generic(cls):
"""
Detects generic base classes, for example `List` (but not `List[int]`)
"""
return _is_base_generic(cls)
def is_qualified_generic(cls):
"""
Detects generics with arguments, for example `List[int]` (but not `List`)
"""
return is_generic(cls) and not is_base_generic(cls)
def get_base_generic(cls):
if not is_qualified_generic(cls):
raise TypeError('{} is not a qualified Generic and thus has no base'.format(cls))
return _get_base_generic(cls)
def get_subtypes(cls):
return _get_subtypes(cls)
def _instancecheck_iterable(iterable, type_args):
if len(type_args) != 1:
raise TypeError("Generic iterables must have exactly 1 type argument; found {}".format(type_args))
type_ = type_args[0]
return all(is_instance(val, type_) for val in iterable)
def _instancecheck_mapping(mapping, type_args):
return _instancecheck_itemsview(mapping.items(), type_args)
def _instancecheck_itemsview(itemsview, type_args):
if len(type_args) != 2:
raise TypeError("Generic mappings must have exactly 2 type arguments; found {}".format(type_args))
key_type, value_type = type_args
return all(is_instance(key, key_type) and is_instance(val, value_type) for key, val in itemsview)
def _instancecheck_tuple(tup, type_args):
if len(tup) != len(type_args):
return False
return all(is_instance(val, type_) for val, type_ in zip(tup, type_args))
_ORIGIN_TYPE_CHECKERS = {}
for class_path, check_func in {
# iterables
'typing.Container': _instancecheck_iterable,
'typing.Collection': _instancecheck_iterable,
'typing.AbstractSet': _instancecheck_iterable,
'typing.MutableSet': _instancecheck_iterable,
'typing.Sequence': _instancecheck_iterable,
'typing.MutableSequence': _instancecheck_iterable,
'typing.ByteString': _instancecheck_iterable,
'typing.Deque': _instancecheck_iterable,
'typing.List': _instancecheck_iterable,
'typing.Set': _instancecheck_iterable,
'typing.FrozenSet': _instancecheck_iterable,
'typing.KeysView': _instancecheck_iterable,
'typing.ValuesView': _instancecheck_iterable,
'typing.AsyncIterable': _instancecheck_iterable,
# mappings
'typing.Mapping': _instancecheck_mapping,
'typing.MutableMapping': _instancecheck_mapping,
'typing.MappingView': _instancecheck_mapping,
'typing.ItemsView': _instancecheck_itemsview,
'typing.Dict': _instancecheck_mapping,
'typing.DefaultDict': _instancecheck_mapping,
'typing.Counter': _instancecheck_mapping,
'typing.ChainMap': _instancecheck_mapping,
# other
'typing.Tuple': _instancecheck_tuple,
}.items():
try:
cls = eval(class_path)
except AttributeError:
continue
_ORIGIN_TYPE_CHECKERS[cls] = check_func
def _instancecheck_callable(value, type_):
if not callable(value):
return False
if is_base_generic(type_):
return True
param_types, ret_type = get_subtypes(type_)
sig = inspect.signature(value)
missing_annotations = []
if param_types is not ...:
if len(param_types) != len(sig.parameters):
return False
# FIXME: add support for TypeVars
# if any of the existing annotations don't match the type, we'll return False.
# Then, if any annotations are missing, we'll throw an exception.
for param, expected_type in zip(sig.parameters.values(), param_types):
param_type = param.annotation
if param_type is inspect.Parameter.empty:
missing_annotations.append(param)
continue
if not is_subtype(param_type, expected_type):
return False
if sig.return_annotation is inspect.Signature.empty:
missing_annotations.append('return')
else:
if not is_subtype(sig.return_annotation, ret_type):
return False
if missing_annotations:
raise ValueError("Missing annotations: {}".format(missing_annotations))
return True
def _instancecheck_union(value, type_):
types = get_subtypes(type_)
return any(is_instance(value, typ) for typ in types)
def _instancecheck_type(value, type_):
# if it's not a class, return False
if not isinstance(value, type):
return False
if is_base_generic(type_):
return True
type_args = get_subtypes(type_)
if len(type_args) != 1:
raise TypeError("Type must have exactly 1 type argument; found {}".format(type_args))
return is_subtype(value, type_args[0])
_SPECIAL_INSTANCE_CHECKERS = {
'Union': _instancecheck_union,
'Callable': _instancecheck_callable,
'Type': _instancecheck_type,
'Any': lambda v, t: True,
}
def is_instance(obj, type_):
if type_.__module__ == 'typing':
if is_qualified_generic(type_):
base_generic = get_base_generic(type_)
else:
base_generic = type_
name = _get_name(base_generic)
try:
validator = _SPECIAL_INSTANCE_CHECKERS[name]
except KeyError:
pass
else:
return validator(obj, type_)
if is_base_generic(type_):
python_type = _get_python_type(type_)
return isinstance(obj, python_type)
if is_qualified_generic(type_):
python_type = _get_python_type(type_)
if not isinstance(obj, python_type):
return False
base = get_base_generic(type_)
try:
validator = _ORIGIN_TYPE_CHECKERS[base]
except KeyError:
raise NotImplementedError("Cannot perform isinstance check for type {}".format(type_))
type_args = get_subtypes(type_)
return validator(obj, type_args)
return isinstance(obj, type_)
def is_subtype(sub_type, super_type):
if not is_generic(sub_type):
python_super = python_type(super_type)
return issubclass(sub_type, python_super)
# at this point we know `sub_type` is a generic
python_sub = python_type(sub_type)
python_super = python_type(super_type)
if not issubclass(python_sub, python_super):
return False
# at this point we know that `sub_type`'s base type is a subtype of `super_type`'s base type.
# If `super_type` isn't qualified, then there's nothing more to do.
if not is_generic(super_type) or is_base_generic(super_type):
return True
# at this point we know that `super_type` is a qualified generic... so if `sub_type` isn't
# qualified, it can't be a subtype.
if is_base_generic(sub_type):
return False
# at this point we know that both types are qualified generics, so we just have to
# compare their sub-types.
sub_args = get_subtypes(sub_type)
super_args = get_subtypes(super_type)
return all(is_subtype(sub_arg, super_arg) for sub_arg, super_arg in zip(sub_args, super_args))
def python_type(annotation):
"""
Given a type annotation or a class as input, returns the corresponding python class.
Examples:
::
>>> python_type(typing.Dict)
<class 'dict'>
>>> python_type(typing.List[int])
<class 'list'>
>>> python_type(int)
<class 'int'>
"""
try:
mro = annotation.mro()
except AttributeError:
# if it doesn't have an mro method, it must be a weird typing object
return _get_python_type(annotation)
if Type in mro:
return annotation.python_type
elif annotation.__module__ == 'typing':
return _get_python_type(annotation)
else:
return annotation
Demonstration:
>>> is_instance([{'x': 3}], List[Dict[str, int]])
True
>>> is_instance([{'x': 3}, {'y': 7.5}], List[Dict[str, int]])
False
(As far as I'm aware, this supports all python versions, even the ones <3.5 using the typing module backport.)
It's awkward that there's no built-in function for this but typeguard comes with a convenient check_type() function:
>>> from typeguard import check_type
>>> from typing import List
>>> check_type("foo", [1,2,"3"], List[int])
Traceback (most recent call last):
...
TypeError: type of foo[2] must be int; got str instead
type of foo[2] must be int; got str instead
For more see: https://typeguard.readthedocs.io/en/latest/api.html#typeguard.check_type
First of all, even though I think you are aware but rather for the sake of completeness, the typing library contains types for type hints. These type hints are used by IDE's to check if your code is somewhat sane, and also serves as documentation what types a developer expects.
To check whether a variable is a type of something, we have to use the isinstance function. Amazingly, we can use direct types of the typing library function, eg.
from typing import List
value = []
isinstance(value, List)
However, for nested structures such as List[Dict[str, int]] we cannot use this directly, because you funny enough get a TypeError. What you have to do is:
Check if the initial value is a list
Check if each item of the list is of type dict
Check if each key of each dict is in fact a string and if each value is in fact an int
Unfortunately, for strict checking python is a bit cumbersome. However, do be aware that python makes use of duck typing: if it is like a duck and behaves like a duck, then it definitely is a duck.
The common way to handle this is by making use of the fact that if whatever object you pass to myfun doesn't have the required functionality a corresponding exception will be raised (usually TypeError or AttributeError). So you would do the following:
try:
myfun(data)
except (TypeError, AttributeError) as err:
# Fallback for invalid types here.
You indicate in your question that you would raise a TypeError if the passed object does not have the appropriate structure but Python does this already for you. The critical question is how you would handle this case. You could also move the try / except block into myfun, if appropriate. When it comes to typing in Python you usually rely on duck typing: if the object has the required functionality then you don't care much about what type it is, as long as it serves the purpose.
Consider the following example. We just pass the data into the function and then get the AttributeError for free (which we can then except); no need for manual type checking:
>>> def myfun(data):
... for x in data:
... print(x.items())
...
>>> data = json.loads('[[["a", 1], ["b", 2]], [["c", 3], ["d", 4]]]')
>>> myfun(data)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in myfun
AttributeError: 'list' object has no attribute 'items'
In case you are concerned about the usefulness of the resulting error, you could still except and then re-raise a custom exception (or even change the exception's message):
try:
myfun(data)
except (TypeError, AttributeError) as err:
raise TypeError('Data has incorrect structure') from err
try:
myfun(data)
except (TypeError, AttributeError) as err:
err.args = ('Data has incorrect structure',)
raise
When using third-party code one should always check the documentation for exceptions that will be raised. For example numpy.inner reports that it will raise a ValueError under certain circumstances. When using that function we don't need to perform any checks ourselves but rely on the fact that it will raise the error if needed. When using third-party code for which it is not clear how it will behave in some corner-cases, i.m.o. it is easier and clearer to just hardcode a corresponding type checker (see below) instead of using a generic solution that works for any type. These cases should be rare anyway and leaving a corresponding comment makes your fellow developers aware of the situation.
The typing library is for type-hinting and as such it won't be checking the types at runtime. Sure you could do this manually but it is rather cumbersome:
def type_checker(data):
return (
isinstance(data, list)
and all(isinstance(x, dict) for x in list)
and all(isinstance(k, str) and isinstance(v, int) for x in list for k, v in x.items())
)
This together with an appropriate comment is still an acceptable solution and it is reusable where a similar data structure is expected. The intent is clear and the code is easily verifiable.
You would have to check your nested type structure manually - the type hint's are not enforced.
Checking like this ist best done using ABC (Abstract Meta Classes) - so users can provide their derived classes that support the same accessing as default dict/lists:
import collections.abc
def isCorrectType(data):
if isinstance(data, collections.abc.Collection):
for d in data:
if isinstance(d,collections.abc.MutableMapping):
for key in d:
if isinstance(key,str) and isinstance(d[key],int):
pass
else:
return False
else:
return False
else:
return False
return True
Output:
print ( isCorrectType( [ {"a":2} ] )) # True
print ( isCorrectType( [ {2:2} ] )) # False
print ( isCorrectType( [ {"a":"a"} ] )) # False
print ( isCorrectType( [ {"a":2},1 ] )) # False
Doku:
ABC - abstract meta classes
Related:
What is duck typing?
The other way round would be to follow the "Ask forgiveness not permission" - explain paradigm and simyply use your data in the form you want and try:/except: around if if it does not conform to what you wanted. This fits better with What is duck typing? - and allows (similar to ABC-checking) the consumer to provide you with derived classes from list/dict while it still will work...
If all you want to do is json-parsing, you should just use pydantic.
But, I encountered the same problem where I wanted to check the type of python objects, so I created a simpler solution than in other answers that handles at least complex types with nested lists and dictionaries.
I created a gist with this method at https://gist.github.com/ramraj07/f537bf9f80b4133c65dd76c958d4c461
Some example uses of this method include:
from typing import List, Dict, Union, Type, Optional
check_type('a', str)
check_type({'a': 1}, Dict[str, int])
check_type([{'a': [1.0]}, 'ten'], List[Union[Dict[str, List[float]], str]])
check_type(None, Optional[str])
check_type('abc', Optional[str])
Here's the code below for reference:
import typing
def check_type(obj: typing.Any, type_to_check: typing.Any, _external=True) -> None:
try:
if not hasattr(type_to_check, "_name"):
# base-case
if not isinstance(obj, type_to_check):
raise TypeError
return
# type_to_check is from typing library
type_name = type_to_check._name
if type_to_check is typing.Any:
pass
elif type_name in ("List", "Tuple"):
if (type_name == "List" and not isinstance(obj, list)) or (
type_name == "Tuple" and not isinstance(obj, tuple)
):
raise TypeError
element_type = type_to_check.__args__[0]
for element in obj:
check_type(element, element_type, _external=False)
elif type_name == "Dict":
if not isinstance(obj, dict):
raise TypeError
if len(type_to_check.__args__) != 2:
raise NotImplementedError(
"check_type can only accept Dict typing with separate annotations for key and values"
)
key_type, value_type = type_to_check.__args__
for key, value in obj.items():
check_type(key, key_type, _external=False)
check_type(value, value_type, _external=False)
elif type_name is None and type_to_check.__origin__ is typing.Union:
type_options = type_to_check.__args__
no_option_matched = True
for type_option in type_options:
try:
check_type(obj, type_option, _external=False)
no_option_matched = False
break
except TypeError:
pass
if no_option_matched:
raise TypeError
else:
raise NotImplementedError(
f"check_type method currently does not support checking typing of form '{type_name}'"
)
except TypeError:
if _external:
raise TypeError(
f"Object {repr(obj)} is of type {_construct_type_description(obj)} "
f"when {type_to_check} was expected"
)
raise TypeError()
def _construct_type_description(obj) -> str:
def get_types_in_iterable(iterable) -> str:
types = {_construct_type_description(element) for element in iterable}
return types.pop() if len(types) == 1 else f"Union[{','.join(types)}]"
if isinstance(obj, list):
return f"List[{get_types_in_iterable(obj)}]"
elif isinstance(obj, dict):
key_types = get_types_in_iterable(obj.keys())
val_types = get_types_in_iterable(obj.values())
return f"Dict[{key_types}, {val_types}]"
else:
return type(obj).__name__

Dynamically subclass an Enum base class

I've set up a metaclass and base class pair for creating the line specifications of several different file types I have to parse.
I have decided to go with using enumerations because many of the individual parts of the different lines in the same file often have the same name. Enums make it easy to tell them apart. Additionally, the specification is rigid and there will be no need to add more members, or extend the line specifications later.
The specification classes work as expected. However, I am having some trouble dynamically creating them:
>>> C1 = LineMakerMeta('C1', (LineMakerBase,), dict(a = 0))
AttributeError: 'dict' object has no attribute '_member_names'
Is there a way around this? The example below works just fine:
class A1(LineMakerBase):
Mode = 0, dict(fill=' ', align='>', type='s')
Level = 8, dict(fill=' ', align='>', type='d')
Method = 10, dict(fill=' ', align='>', type='d')
_dummy = 20 # so that Method has a known length
A1.format(**dict(Mode='DESIGN', Level=3, Method=1))
# produces ' DESIGN 3 1'
The metaclass is based on enum.EnumMeta, and looks like this:
import enum
class LineMakerMeta(enum.EnumMeta):
"Metaclass to produce formattable LineMaker child classes."
def _iter_format(cls):
"Iteratively generate formatters for the class members."
for member in cls:
yield member.formatter
def __str__(cls):
"Returns string line with all default values."
return cls.format()
def format(cls, **kwargs):
"Create formatted version of the line populated by the kwargs members."
# build resulting string by iterating through members
result = ''
for member in cls:
# determine value to be injected into member
try:
try:
value = kwargs[member]
except KeyError:
value = kwargs[member.name]
except KeyError:
value = member.default
value_str = member.populate(value)
result = result + value_str
return result
And the base class is as follows:
class LineMakerBase(enum.Enum, metaclass=LineMakerMeta):
"""A base class for creating Enum subclasses used for populating lines of a file.
Usage:
class LineMaker(LineMakerBase):
a = 0, dict(align='>', fill=' ', type='f'), 3.14
b = 10, dict(align='>', fill=' ', type='d'), 1
b = 15, dict(align='>', fill=' ', type='s'), 'foo'
# ^-start ^---spec dictionary ^--default
"""
def __init__(member, start, spec={}, default=None):
member.start = start
member.spec = spec
if default is not None:
member.default = default
else:
# assume value is numerical for all provided types other than 's' (string)
default_or_set_type = member.spec.get('type','s')
default = {'s': ''}.get(default_or_set_type, 0)
member.default = default
#property
def formatter(member):
"""Produces a formatter in form of '{0:<format>}' based on the member.spec
dictionary. The member.spec dictionary makes use of these keys ONLY (see
the string.format docs):
fill align sign width grouping_option precision type"""
try:
# get cached value
return '{{0:{}}}'.format(member._formatter)
except AttributeError:
# add width to format spec if not there
member.spec.setdefault('width', member.length if member.length != 0 else '')
# build formatter using the available parts in the member.spec dictionary
# any missing parts will simply not be present in the formatter
formatter = ''
for part in 'fill align sign width grouping_option precision type'.split():
try:
spec_value = member.spec[part]
except KeyError:
# missing part
continue
else:
# add part
sub_formatter = '{!s}'.format(spec_value)
formatter = formatter + sub_formatter
member._formatter = formatter
return '{{0:{}}}'.format(formatter)
def populate(member, value=None):
"Injects the value into the member's formatter and returns the formatted string."
formatter = member.formatter
if value is not None:
value_str = formatter.format(value)
else:
value_str = formatter.format(member.default)
if len(value_str) > len(member) and len(member) != 0:
raise ValueError(
'Length of object string {} ({}) exceeds available'
' field length for {} ({}).'
.format(value_str, len(value_str), member.name, len(member)))
return value_str
#property
def length(member):
return len(member)
def __len__(member):
"""Returns the length of the member field. The last member has no length.
Length are based on simple subtraction of starting positions."""
# get cached value
try:
return member._length
# calculate member length
except AttributeError:
# compare by member values because member could be an alias
members = list(type(member))
try:
next_index = next(
i+1
for i,m in enumerate(type(member))
if m.value == member.value
)
except StopIteration:
raise TypeError(
'The member value {} was not located in the {}.'
.format(member.value, type(member).__name__)
)
try:
next_member = members[next_index]
except IndexError:
# last member defaults to no length
length = 0
else:
length = next_member.start - member.start
member._length = length
return length
This line:
C1 = enum.EnumMeta('C1', (), dict(a = 0))
fails with exactly the same error message. The __new__ method of EnumMeta expects an instance of enum._EnumDict as its last argument. _EnumDict is a subclass of dict and provides an instance variable named _member_names, which of course a regular dict doesn't have. When you go through the standard mechanism of enum creation, this all happens correctly behind the scenes. That's why your other example works just fine.
This line:
C1 = enum.EnumMeta('C1', (), enum._EnumDict())
runs with no error. Unfortunately, the constructor of _EnumDict is defined as taking no arguments, so you can't initialize it with keywords as you apparently want to do.
In the implementation of enum that's backported to Python3.3, the following block of code appears in the constructor of EnumMeta. You could do something similar in your LineMakerMeta class:
def __new__(metacls, cls, bases, classdict):
if type(classdict) is dict:
original_dict = classdict
classdict = _EnumDict()
for k, v in original_dict.items():
classdict[k] = v
In the official implementation, in Python3.5, the if statement and the subsequent block of code is gone for some reason. Therefore classdict must be an honest-to-god _EnumDict, and I don't see why this was done. In any case the implementation of Enum is extremely complicated and handles a lot of corner cases.
I realize this is not a cut-and-dried answer to your question but I hope it will point you to a solution.
Create your LineMakerBase class, and then use it like so:
C1 = LineMakerBase('C1', dict(a=0))
The metaclass was not meant to be used the way you are trying to use it. Check out this answer for advice on when metaclass subclasses are needed.
Some suggestions for your code:
the double try/except in format seems clearer as:
for member in cls:
if member in kwargs:
value = kwargs[member]
elif member.name in kwargs:
value = kwargs[member.name]
else:
value = member.default
this code:
# compare by member values because member could be an alias
members = list(type(member))
would be clearer with list(member.__class__)
has a false comment: listing an Enum class will never include the aliases (unless you have overridden that part of EnumMeta)
instead of the complicated __len__ code you have now, and as long as you are subclassing EnumMeta you should extend __new__ to automatically calculate the lengths once:
# untested
def __new__(metacls, cls, bases, clsdict):
# let the main EnumMeta code do the heavy lifting
enum_cls = super(LineMakerMeta, metacls).__new__(cls, bases, clsdict)
# go through the members and calculate the lengths
canonical_members = [
member
for name, member in enum_cls.__members__.items()
if name == member.name
]
last_member = None
for next_member in canonical_members:
next_member.length = 0
if last_member is not None:
last_member.length = next_member.start - last_member.start
The simplest way to create Enum subclasses on the fly is using Enum itself:
>>> from enum import Enum
>>> MyEnum = Enum('MyEnum', {'a': 0})
>>> MyEnum
<enum 'MyEnum'>
>>> MyEnum.a
<MyEnum.a: 0>
>>> type(MyEnum)
<class 'enum.EnumMeta'>
As for your custom methods, it might be simpler if you used regular functions, precisely because Enum implementation is so special.

Categories