Pydantic - How to add a field that keeps changing its name? - python

I am working with an API that is returning a response that contains fields like this:
{
"0e933a3c-0daa-4a33-92b5-89d38180a142": someValue
}
Where the field name is a UUID that changes depending on the request (but is not included in the actual request parameters). How do I declare that in a dataclass in Python? It would essentially be str: str, but that would interpret the key as literally "str" instead of a type.

I personally feel the simplest approach would be to create a custom Container dataclass. This would then split the dictionary data up, by first the keys and then individually by the values.
The one benefit of this is that you could then access the list by index value instead of searching by the random uuid itself, which from what I understand is something you won't be doing at all. So for example, you could access the first string value like values[0] if you wanted to.
Here is a sample implementation of this:
from dataclasses import dataclass
#dataclass(init=False, slots=True)
class MyContainer:
ids: list[str]
# can be annotated as `str: str` or however you desire
values: list[str]
def __init__(self, input_data: dict):
self.ids = list(input_data)
self.values = list(input_data.values())
def orig_dict(self):
return dict(zip(self.ids, self.values))
input_dict = {
"0e933a3c-0daa-4a33-92b5-89d38180a142": "test",
"25a82f15-abe9-49e2-b039-1fb608c729e0": "hello",
"f9b7e20d-3d11-4620-9780-4f500fee9d65": "world !!",
}
c = MyContainer(input_dict)
print(c)
assert c.values[0] == 'test'
assert c.values[1] == 'hello'
assert c.values[2] == 'world !!'
assert c.orig_dict() == input_dict
Output:
MyClass(values=['test', 'hello', 'world !!'], ids=['0e933a3c-0daa-4a33-92b5-89d38180a142', '25a82f15-abe9-49e2-b039-1fb608c729e0', 'f9b7e20d-3d11-4620-9780-4f500fee9d65'])

Related

Python dynamic enum

I would like to create dynamic enums in python loaded from a SQL Table. The output of SQL will be a list of tuplets, which with I want to fill the attributes of the enum.
Lets say I receive this list:
lst = [('PROCESS_0', 0, "value", 123, False), ('PROCESS_1',1,"anothervalue", 456, True)]
I now want to fill the values in the enum below:
class Jobs(IntEnum):
def __new__(cls, value: int, label: str, heartbeat: int = 60, heartbeat_required: bool = False):
obj = int.__new__(cls, value)
obj._value_ = value
obj.label = label
obj.heartbeat = heartbeat
obj.heartbeat_required = heartbeat_required
return obj
The first variable in the tuple should be the variable name of the enum, I have solved this with:
locals()['Test'] = (0, '', 789, False)
But this only works for single values, it seems that I can not run a for loop within enum. When using a for loop like this:
for i in lst:
locals()[i[0]] = (i[1], i[2], i[3])
Python sends this error TypeError: Attempted to reuse key: 'i' which propably comes from enums only having constants.
Is there any (possibly elegant) solution for this?
Many thanks in advance!
You need to use _ignore_ = "i". Something like:
class Jobs(IntEnum):
_ignore_ = "i"
def __new__(cls, value, label, heartbeat=60, heartbeat_required=False):
obj = int.__new__(cls, value)
obj._value_ = value
obj.label = label
obj.heartbeat = heartbeat
obj.heartbeat_required = heartbeat_required
return obj
for i in lst:
locals()[i[0]] = i[1:]
Check the example at https://docs.python.org/3/howto/enum.html#timeperiod
Note that the _ignore_ can be avoided in favor of dict comprehension
from datetime import timedelta
class Period(timedelta, Enum):
"different lengths of time"
vars().update({ f"day_{i}": i for i in range(367) })
Then you can access all possible enum values via Period.__members__

Look up items in list of dataclasses by value

I'm using python to filter data in elasticsearch based on request params provided. I've got a working example, but know it can be improved and am trying to think of a better way. The current code is like this:
#dataclass(frozen=True)
class Filter:
param: str
custom_es_field: Optional[str] = None
is_bool_query: bool = False
is_date_query: bool = False
is_range_query: bool = False
def es_field(self) -> str:
if self.custom_es_field:
field = self.custom_es_field
elif "." in self.param:
field = self.param.replace(".", "__")
else:
field = self.param
return field
filters = [
Filter(param="publication_year", is_range_query=True),
Filter(param="publication_date", is_date_query=True),
Filter(param="venue.issn").
...
]
def filter_records(filter_params, s):
for filter in filters:
# range query
if filter.param in filter_params and filter.is_range_query:
param = filter_params[filter.param]
if "<" in param:
param = param[1:]
validate_range_param(filter, param)
kwargs = {filter.es_field(): {"lte": int(param)}}
s = s.filter("range", **kwargs)
elif filter.param in filter_params and filter.is_bool_query:
....
The thing I think is slow is I am looping through all of the filters in order to use the one that came in as a request variable. I'm tempted to convert this to a dictionary so I can do filter["publication_year"], but I like having the extra methods available via the dataclass. Would love to hear any thoughts.

Python Refactor JSON into different JSON Structure

I have a bunch of JSON data that I did mostly by hand. Several thousand lines. I need to refactor it into a totally different format using Python.
An overview of my 'stuff':
Column: The basic 'unit' of my data. Each Column has attributes. Don't worry about the meaning of the attributes, but the attributes need to be retained for each Column if they exist.
Folder: Folders group Columns and other Folders together. The folders currently have no attributes, they (currently) only contain other Folder and Column objects (Object does not necessarily refer to JSON objects here... more of an 'entity')
Universe: Universes group everything into big chunks which, in the larger scope of my project, are unable to interact with each other. That is not important here, but that's what they do.
Some limitations:
Columns cannot contain other Column objects, Folder objects, or Universe objects.
Folders cannot contain Universe objects.
Universes cannot contain other Universe objects.
Currently, I have Columns in this form:
"Column0Name": {
"type": "a type",
"dtype": "data type",
"description": "abcdefg"
}
and I need it to go to:
{
"name": "Column0Name",
"type": "a type",
"dtype": "data type",
"description": "abcdefg"
}
Essentially I need to convert the Column key-value things to an array of things (I am new to JSON, don't know the terminology). I also need each Folder to end up with two new JSON arrays (in addition to the "name": "FolderName" key-value pair). It needs a "folders": [] and "columns": [] to be added. So I have this for folders:
"Folder0Name": {
"Column0Name": {
"type": "a",
"dtype": "b",
"description": "c"
},
"Column1Name": {
"type": "d",
"dtype": "e",
"description": "f"
}
}
and need to go to this:
{
"name": "Folder0Name",
"folders": [],
"columns": [
{"name": "Column0Name", "type": "a", "dtype": "b", "description": "c"},
{"name": "Column1Name", "type": "d", "dtype": "e", "description": "f"}
]
}
The folders will also end up in an array inside its parent Universe. Likewise, each Universe will end up with "name", "folders", and "columns" things. As such:
{
"name": "Universe0",
"folders": [a bunch of folders in a JSON array],
"columns": [occasionally some columns in a JSON array]
}
Bottom line:
I'm going to guess that I need a recursive function to iterate though all the nested dictionaries after I import the JSON data with the json Python module.
I'm thinking some sort of usage of yield might help but I'm not super familiar yet with it.
Would it be easier to update the dicts as I go, or destroy each key-value pairs and construct an entirely new dict as I go?
Here is what I have so far. I'm stuck on getting the generator to return actual dictionaries instead of a generator object.
import json
class AllUniverses:
"""Container to hold all the Universes found in the json file"""
def __init__(self, filename):
self._fn = filename
self.data = {}
self.read_data()
def read_data(self):
with open(self._fn, 'r') as fin:
self.data = json.load(fin)
return self
def universe_key(self):
"""Get the next universe key from the dict of all universes
The key will be used as the name for the universe.
"""
yield from self.data
class Universe:
def __init__(self, json_filename):
self._au = AllUniverses(filename=json_filename)
self.uni_key = self._au.universe_key()
self._universe_data = self._au.data.copy()
self._col_attrs = ['type', 'dtype', 'description', 'aggregation']
self._folders_list = []
self._columns_list = []
self._type = "Universe"
self._name = ""
self.uni = dict()
self.is_folder = False
self.is_column = False
def output(self):
# TODO: Pass this to json.dump?
# TODO: Still need to get the actual folder and column dictionaries
# from the generators
out = {
"name": self._name,
"type": "Universe",
"folder": [f.me for f in self._folders_list],
"columns": [c.me for c in self._columns_list]}
return out
def update_universe(self):
"""Get the next universe"""
universe_k = next(self.uni_key)
self._name = str(universe_k)
self.uni = self._universe_data.pop(universe_k)
return self
def parse_nodes(self):
"""Process all child nodes"""
nodes = [_ for _ in self.uni.keys()]
for k in nodes:
v = self.uni.pop(k)
self._is_column(val=v)
if self.is_column:
fc = Column(data=v, key_name=k)
self._columns_list.append(fc)
else:
fc = Folder(data=v, key_name=k)
self._folders_list.append(fc)
return self
def _is_column(self, val):
"""Determine if val is a Column or Folder object"""
self.is_folder = False
self._column = False
if isinstance(val, dict) and not val:
self.is_folder = True
elif not isinstance(val, dict):
raise TypeError('Cannot handle inputs not of type dict')
elif any([i in val.keys() for i in self._col_attrs]):
self._column = True
else:
self.is_folder = True
return self
def parse_children(self):
for folder in self._folders_list:
assert(isinstance(folder, Folder)), f'bletch idk what happened'
folder.parse_nodes()
class Folder:
def __init__(self, data, key_name):
self._data = data.copy()
self._name = str(key_name)
self._node_keys = [_ for _ in self._data.keys()]
self._folders = []
self._columns = []
self._col_attrs = ['type', 'dtype', 'description', 'aggregation']
#property
def me(self):
# maybe this should force the code to parse all children of this
# Folder? Need to convert the generator into actual dictionaries
return {"name": self._name, "type": "Folder",
"columns": [(c.me for c in self._columns)],
"folders": [(f.me for f in self._folders)]}
def parse_nodes(self):
"""Parse all the children of this Folder
Parse through all the node names. If it is detected to be a Folder
then create a Folder obj. from it and add to the list of Folder
objects. Else create a Column obj. from it and append to the list
of Column obj.
This should be appending dictionaries
"""
for key in self._node_keys:
_folder = False
_column = False
values = self._data.copy()[key]
if isinstance(values, dict) and not values:
_folder = True
elif not isinstance(values, dict):
raise TypeError('Cannot handle inputs not of type dict')
elif any([i in values.keys() for i in self._col_attrs]):
_column = True
else:
_folder = True
if _folder:
f = Folder(data=values, key_name=key)
self._folders.append(f.me)
else:
c = Column(data=values, key_name=key)
self._columns.append(c.me)
return self
class Column:
def __init__(self, data, key_name):
self._data = data.copy()
self._stupid_check()
self._me = {
'name': str(key_name),
'type': 'Column',
'ctype': self._data.pop('type'),
'dtype': self._data.pop('dtype'),
'description': self._data.pop('description'),
'aggregation': self._data.pop('aggregation')}
def __str__(self):
# TODO: pretty sure this isn't correct
return str(self.me)
#property
def me(self):
return self._me
def to_json(self):
# This seems to be working? I think?
return json.dumps(self, default=lambda o: str(self.me)) # o.__dict__)
def _stupid_check(self):
"""If the key isn't in the dictionary, add it"""
keys = [_ for _ in self._data.keys()]
keys_defining_a_column = ['type', 'dtype', 'description', 'aggregation']
for json_key in keys_defining_a_column:
if json_key not in keys:
self._data[json_key] = ""
return self
if __name__ == "__main__":
file = r"dummy_json_data.json"
u = Universe(json_filename=file)
u.update_universe()
u.parse_nodes()
u.parse_children()
print('check me')
And it gives me this:
{
"name":"UniverseName",
"type":"Universe",
"folder":[
{"name":"Folder0Name",
"type":"Folder",
"columns":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB0B0>],
"folders":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB190>]
},
{"name":"Folder2Name",
"type":"Folder",
"columns":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB040>],
"folders":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB120>]
},
{"name":"Folder4Name",
"type":"Folder",
"columns":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB270>],
"folders":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB200>]
},
{"name":"Folder6Name",
"type":"Folder",
"columns":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB2E0>],
"folders":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB350>]
},
{"name":"Folder8Name",
"type":"Folder",
"columns":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB3C0>],
"folders":[<generator object Folder.me.<locals>.<genexpr> at 0x000001ACFBEDB430>]
}
],
"columns":[]
}
If there is an existing tool for this kind of transformation so that I don't have to write Python code, that would be an attractive alternative, too.
Lets create the 3 classes needed to represent Columns, Folders and Unverses. Before starting some topics I wanna talk about, I give a short description of them here, if any of them is new to you I can go deeper:
I will use type annotations to make clear what type each variable is.
I am gonna use __slots__. By telling the Column class that its instances are gonna have a name, ctype, dtype, description and aggragation attributes, each instance of Column will require less memory space. The downside is that it will not accept any other attribute not listed there. This is, it saves memory but looses flexibility. As we are going to have several (maybe hundreds or thousands) of instances, reduced memory footprint seems more important than the flexibility of being able to add any attribute.
Each class will have the standard constructor where every argument has a default value except name, which is mandatory.
Each class will have another constructor called from_old_syntax. It is going to be a class method that receives the string corresponding to the name and a dict corresponding to the data as its arguments and outputs the corresponding instance (Column, Folder or Universe).
Universes are basically the same as Folders with different names (for now) so it will basically inherit it (class Universe(Folder): pass).
from typing import List
class Column:
__slots__ = 'name', 'ctype', 'dtype', 'description', 'aggregation'
def __init__(
self,
name: str,
ctype: str = '',
dtype: str = '',
description: str = '',
aggregation: str = '',
) -> None:
self.name = name
self.ctype = ctype
self.dtype = dtype
self.description = description
self.aggregation = aggregation
#classmethod
def from_old_syntax(cls, name: str, data: dict) -> "Column":
column = cls(name)
for key, value in data.items():
# The old syntax used type for column type but in the new syntax it
# will have another meaning so we use ctype instead
if key == 'type':
key = 'ctype'
try:
setattr(column, key, value)
except AttributeError as e:
raise AttributeError(f"Unexpected key {key} for Column") from e
return column
class Folder:
__slots__ = 'name', 'folders', 'columns'
def __init__(
self,
name: str,
columns: List[Column] = None,
folders: List["Folder"] = None,
) -> None:
self.name = name
if columns is None:
self.columns = []
else:
self.columns = [column for column in columns]
if folders is None:
self.folders = []
else:
self.folders = [folder for folder in folders]
#classmethod
def from_old_syntax(cls, name: str, data: dict) -> "Folder":
columns = [] # type: List[Column]
folders = [] # type: List["Folder"]
for key, value in data.items():
# Determine if it is a Column or a Folder
if 'type' in value and 'dtype' in value:
columns.append(Column.from_old_syntax(key, value))
else:
folders.append(Folder.from_old_syntax(key, value))
return cls(name, columns, folders)
class Universe(Folder):
pass
As you can see the constructors are pretty trivial, assign the arguments to the attributes and done. In the case of Folders (and thus in Universes too), two arguments are lists of columns and folders. The default value is None (in this case we initialize as an empty list) because using mutable variables as default values has some issues so it is good practice to use None as the default value for mutable variables (such as lists).
Column's from_old_syntax class method creates an empty Column with the provided name. Afterwards we iterate over the data dict that was also provided and assign its key value pair to its corresponding attribute. There is a special case where "type" key is converted to "ctype" as "type" is going to be used for a different purpose with the new syntax. The assignation itself is done by setattr(column, key, value). We have included it inside a try ... except ... clause because as we said above, only the items in __slots__ can be used as attributes, so if there is an attribute that you forgot, you will get an exception saying "AttributeError: Unexpected key 'NAME'" and you will only have to add that "NAME" to the __slots__.
Folder's (and thus Unverse's) from_old_syntax class method is even simpler. Create a list of columns and folders, iterate over the data checking if it is a folder or a column and use the appropiate from_old_syntax class method. Then use those two lists and the provided name to return the instance. Notice that Folder.from_old_syntax notation is used to create the folders instead of cls.from_old_syntax because cls may be Universe. However, to create the insdance we do use cls(...) as here we do want to use Universe or Folder.
Now you could do universes = [Universe.from_old_syntax(name, data) for name, data in json.load(f).items()] where f is the file and you will get all your Universes, Folders and Columns in memory. So now we need to encode them back to JSON. For this we are gonna extend the json.JSONEncoder so that it knows how to parse our classes into dictionaries that it can encode normally. To do so, you just need to overwrite the default method, check if the object passed is of our classes and return a dict that will be encoded. If it is not one of our classes we will let the parent default method to take care of it.
import json
# JSON fields with this values will be omitted
EMPTY_VALUES = "", [], {}
class CustomEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, (Column, Folder, Universe)):
# Make a dict with every item in their respective __slots__
data = {
attr: getattr(obj, attr) for attr in obj.__slots__
if getattr(obj, attr) not in EMPTY_VALUES
}
# Add the type fild with the class name
data['type'] = obj.__class__.__name__
return data
# Use the parent class function for any object not handled explicitly
super().default(obj)
Converting the classes to dictionaries is basically taking what is in __slots__ as the key and the attribute's value as the value. We will filter those values that are an empty string, an empty list or an empty dict as we do not need to write them to JSON. We finally add the "type" key to the dict by reading the objects class name (Column, Folder and Universe).
To use it you have to pass the CustomEncoder as the cls argument to json.dump.
So the code will look like this (omitting the class definitions to keep it short):
import json
from typing import List
# JSON fields with this values will be omitted
EMPTY_VALUES = "", [], {}
class Column:
# ...
class Folder:
# ...
class Universe(Folder):
pass
class CustomEncoder(json.JSONEncoder):
# ...
if __name__ == '__main__':
with open('dummy_json_data.json', 'r') as f_in, open('output.json', 'w') as f_out:
universes = [Universe.from_old_syntax(name, data)
for name, data in json.load(f_in).items()]
json.dump(universes, f_out, cls=CustomEncoder, indent=4)

Dynamically subclass an Enum base class

I've set up a metaclass and base class pair for creating the line specifications of several different file types I have to parse.
I have decided to go with using enumerations because many of the individual parts of the different lines in the same file often have the same name. Enums make it easy to tell them apart. Additionally, the specification is rigid and there will be no need to add more members, or extend the line specifications later.
The specification classes work as expected. However, I am having some trouble dynamically creating them:
>>> C1 = LineMakerMeta('C1', (LineMakerBase,), dict(a = 0))
AttributeError: 'dict' object has no attribute '_member_names'
Is there a way around this? The example below works just fine:
class A1(LineMakerBase):
Mode = 0, dict(fill=' ', align='>', type='s')
Level = 8, dict(fill=' ', align='>', type='d')
Method = 10, dict(fill=' ', align='>', type='d')
_dummy = 20 # so that Method has a known length
A1.format(**dict(Mode='DESIGN', Level=3, Method=1))
# produces ' DESIGN 3 1'
The metaclass is based on enum.EnumMeta, and looks like this:
import enum
class LineMakerMeta(enum.EnumMeta):
"Metaclass to produce formattable LineMaker child classes."
def _iter_format(cls):
"Iteratively generate formatters for the class members."
for member in cls:
yield member.formatter
def __str__(cls):
"Returns string line with all default values."
return cls.format()
def format(cls, **kwargs):
"Create formatted version of the line populated by the kwargs members."
# build resulting string by iterating through members
result = ''
for member in cls:
# determine value to be injected into member
try:
try:
value = kwargs[member]
except KeyError:
value = kwargs[member.name]
except KeyError:
value = member.default
value_str = member.populate(value)
result = result + value_str
return result
And the base class is as follows:
class LineMakerBase(enum.Enum, metaclass=LineMakerMeta):
"""A base class for creating Enum subclasses used for populating lines of a file.
Usage:
class LineMaker(LineMakerBase):
a = 0, dict(align='>', fill=' ', type='f'), 3.14
b = 10, dict(align='>', fill=' ', type='d'), 1
b = 15, dict(align='>', fill=' ', type='s'), 'foo'
# ^-start ^---spec dictionary ^--default
"""
def __init__(member, start, spec={}, default=None):
member.start = start
member.spec = spec
if default is not None:
member.default = default
else:
# assume value is numerical for all provided types other than 's' (string)
default_or_set_type = member.spec.get('type','s')
default = {'s': ''}.get(default_or_set_type, 0)
member.default = default
#property
def formatter(member):
"""Produces a formatter in form of '{0:<format>}' based on the member.spec
dictionary. The member.spec dictionary makes use of these keys ONLY (see
the string.format docs):
fill align sign width grouping_option precision type"""
try:
# get cached value
return '{{0:{}}}'.format(member._formatter)
except AttributeError:
# add width to format spec if not there
member.spec.setdefault('width', member.length if member.length != 0 else '')
# build formatter using the available parts in the member.spec dictionary
# any missing parts will simply not be present in the formatter
formatter = ''
for part in 'fill align sign width grouping_option precision type'.split():
try:
spec_value = member.spec[part]
except KeyError:
# missing part
continue
else:
# add part
sub_formatter = '{!s}'.format(spec_value)
formatter = formatter + sub_formatter
member._formatter = formatter
return '{{0:{}}}'.format(formatter)
def populate(member, value=None):
"Injects the value into the member's formatter and returns the formatted string."
formatter = member.formatter
if value is not None:
value_str = formatter.format(value)
else:
value_str = formatter.format(member.default)
if len(value_str) > len(member) and len(member) != 0:
raise ValueError(
'Length of object string {} ({}) exceeds available'
' field length for {} ({}).'
.format(value_str, len(value_str), member.name, len(member)))
return value_str
#property
def length(member):
return len(member)
def __len__(member):
"""Returns the length of the member field. The last member has no length.
Length are based on simple subtraction of starting positions."""
# get cached value
try:
return member._length
# calculate member length
except AttributeError:
# compare by member values because member could be an alias
members = list(type(member))
try:
next_index = next(
i+1
for i,m in enumerate(type(member))
if m.value == member.value
)
except StopIteration:
raise TypeError(
'The member value {} was not located in the {}.'
.format(member.value, type(member).__name__)
)
try:
next_member = members[next_index]
except IndexError:
# last member defaults to no length
length = 0
else:
length = next_member.start - member.start
member._length = length
return length
This line:
C1 = enum.EnumMeta('C1', (), dict(a = 0))
fails with exactly the same error message. The __new__ method of EnumMeta expects an instance of enum._EnumDict as its last argument. _EnumDict is a subclass of dict and provides an instance variable named _member_names, which of course a regular dict doesn't have. When you go through the standard mechanism of enum creation, this all happens correctly behind the scenes. That's why your other example works just fine.
This line:
C1 = enum.EnumMeta('C1', (), enum._EnumDict())
runs with no error. Unfortunately, the constructor of _EnumDict is defined as taking no arguments, so you can't initialize it with keywords as you apparently want to do.
In the implementation of enum that's backported to Python3.3, the following block of code appears in the constructor of EnumMeta. You could do something similar in your LineMakerMeta class:
def __new__(metacls, cls, bases, classdict):
if type(classdict) is dict:
original_dict = classdict
classdict = _EnumDict()
for k, v in original_dict.items():
classdict[k] = v
In the official implementation, in Python3.5, the if statement and the subsequent block of code is gone for some reason. Therefore classdict must be an honest-to-god _EnumDict, and I don't see why this was done. In any case the implementation of Enum is extremely complicated and handles a lot of corner cases.
I realize this is not a cut-and-dried answer to your question but I hope it will point you to a solution.
Create your LineMakerBase class, and then use it like so:
C1 = LineMakerBase('C1', dict(a=0))
The metaclass was not meant to be used the way you are trying to use it. Check out this answer for advice on when metaclass subclasses are needed.
Some suggestions for your code:
the double try/except in format seems clearer as:
for member in cls:
if member in kwargs:
value = kwargs[member]
elif member.name in kwargs:
value = kwargs[member.name]
else:
value = member.default
this code:
# compare by member values because member could be an alias
members = list(type(member))
would be clearer with list(member.__class__)
has a false comment: listing an Enum class will never include the aliases (unless you have overridden that part of EnumMeta)
instead of the complicated __len__ code you have now, and as long as you are subclassing EnumMeta you should extend __new__ to automatically calculate the lengths once:
# untested
def __new__(metacls, cls, bases, clsdict):
# let the main EnumMeta code do the heavy lifting
enum_cls = super(LineMakerMeta, metacls).__new__(cls, bases, clsdict)
# go through the members and calculate the lengths
canonical_members = [
member
for name, member in enum_cls.__members__.items()
if name == member.name
]
last_member = None
for next_member in canonical_members:
next_member.length = 0
if last_member is not None:
last_member.length = next_member.start - last_member.start
The simplest way to create Enum subclasses on the fly is using Enum itself:
>>> from enum import Enum
>>> MyEnum = Enum('MyEnum', {'a': 0})
>>> MyEnum
<enum 'MyEnum'>
>>> MyEnum.a
<MyEnum.a: 0>
>>> type(MyEnum)
<class 'enum.EnumMeta'>
As for your custom methods, it might be simpler if you used regular functions, precisely because Enum implementation is so special.

Dynamically adding nested dictionaries

I want to dynamically add values in a nested dictionary. I am trying to cache similarity score of two words with their part-of-speech-tag.
In short I want to store values as this;
synset_cache[word1][word1_tag][word2][word2_tag] = score
class MyClass(Object):
def __init__(self):
MyClass.synset_cache={} #dict
def set_cache(self,word1, word1_tag, word2, word2_tag, score)
try:
MyClass.synset_cache[word1]
except:
MyClass.synset_cache[word1]={} #create new dict
try:
MyClass.synset_cache[word1][word1_tag]
except:
MyClass.synset_cache[word1][word1_tag]={} #create new dict
try:
MyClass.synset_cache[word1][word1_tag][word2]
except:
MyClass.synset_cache[word1][word1_tag][word2]={} #create new dict
#store the value
MyClass.synset_cache[word1][word1_tag][word2][word2_tag] = score
But I am getting this error.
Type error: list indices must be integers, not unicode
Line number it shows is at MyClass.synset_cache[word1][word1_tag]={} #create new dict.
How can I get this working?
EDIT:
According to the #Robᵩ's comments on his answer; I was assigning a list to this MyClass.synset_cache in another method(note it is at the class-level). So this code part had no errors.
Use dict.setdefault.
This might work:
#UNTESTED
d = MyClass.synset_cache.setdefault(word1, {})
d = d.setdefault(word1_tag, {})
d = d.setdefault(word2, {})
d[word2_tag] = score
Alternatively, you can use this handy recursive defaultdict that springs up new levels of dict automatically. (See: here and here.)
import collections
def tree():
return collections.defaultdict(tree)
class MyClass(Object):
def __init__(self):
MyClass.synset_cache=tree()
def set_cache(self,word1, word1_tag, word2, word2_tag, score)
MyClass.synset_cache[word1][word1_tag][word2][word2_tag] = score
This will be data dependent, as at least for some test data (see below), the code does not produce that error. How are you calling it?
Also, note that as written above, it won't compile due to some syntax errors (i.e. no colon to end the def set_cache line).
Below is some tweaked-to-compile code with some example calling data and how that pretty-prints:
#!/usr/bin/env python
import pprint
class MyClass():
def __init__(self):
MyClass.synset_cache={} #dict
def set_cache(self,word1, word1_tag, word2, word2_tag, score):
try:
MyClass.synset_cache[word1]
except:
MyClass.synset_cache[word1]={} #create new dict
try:
MyClass.synset_cache[word1][word1_tag]
except:
MyClass.synset_cache[word1][word1_tag]={} #create new dict
try:
MyClass.synset_cache[word1][word1_tag][word2]
except:
MyClass.synset_cache[word1][word1_tag][word2]={} #create new dict
#store the value
MyClass.synset_cache[word1][word1_tag][word2][word2_tag] = score
x = MyClass()
x.set_cache('foo', 'foo-tag', 'bar', 'bar-tag', 100)
pp = pprint.PrettyPrinter(indent=4)
pp.pprint(x.synset_cache)
Which outputs:
{ 'foo': { 'foo-tag': { 'bar': { 'bar-tag': 100}}}}
A couple other things of note...
I'd recommend using the in style syntax to check for key presence rather than try-except. It's more compact and more Pythonic.
Also, your main variable, synset_cache, is class-level (i.e. static). Did you mean for that to be the case?

Categories