Default dict with a non-trivial default

I want to create a "default dict" that performs a non-trivial operation on the missing key (like a DB lookup, for example). I've seen some old answers on here, like Using the key in collections.defaultdict, that recommend subclassing collections.defaultdict.
While this makes sense, is there still a reason to use defaultdict at this point? Why not simply subclass dict and override its __missing__ method instead? Does defaultdict provide something else that I'd gain by subclassing it?

What does defaultdict add?
According to the documentation, the only difference between a defaultdict and a built-in dict is:
It overrides one method and adds one writable instance variable.
The one method is the __missing__ method which is called when a key that is not present is accessed.
And the one writable instance variable is the default_factory - a callable with no arguments used by __missing__ to determine the default value to be used with missing keys.
Roughly equivalent to:
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = self.default_factory()
        return self[key]
When to inherit at all?
It is important to make clear that the only reason you would even need to create a subclass is if your default value for missing keys depends on the actual key. If your default factory doesn't need the key, then no matter how complicated the logic is, you can just use defaultdict instead of inheriting from it. If the logic is too much for a lambda, you can of course still write a function and use it:
    from collections import defaultdict

    def calc():
        # very long code
        # calculating a new value that does not depend on the key
        # (maybe a DB request to fetch the latest record...)
        new_value = None  # placeholder for the computed value
        return new_value

    d = defaultdict(calc)
If you actually need the key itself to calculate the default value, then you need to inherit.
When to inherit from defaultdict?
The main advantage appears when you want a dynamic factory (i.e. to change default_factory at runtime): inheriting from defaultdict saves you the bother of implementing that yourself (no need to override __init__...).
But note that this means you have to take the existence of default_factory into account when you override __missing__, as can be seen in this answer.
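As a minimal sketch of that case (the class name KeyedDefaultDict and the example factories are hypothetical, not taken from the linked answer), a defaultdict subclass can pass the missing key to default_factory while still honouring a factory of None and a factory that is swapped at runtime:

```python
from collections import defaultdict


class KeyedDefaultDict(defaultdict):
    """Hypothetical subclass: default_factory receives the missing key."""

    def __missing__(self, key):
        # Keep defaultdict's contract: no factory means a plain KeyError.
        if self.default_factory is None:
            raise KeyError(key)
        # Unlike defaultdict, pass the key to the factory.
        value = self[key] = self.default_factory(key)
        return value


squares = KeyedDefaultDict(lambda key: key * key)
first = squares[4]                           # computed from the key and stored

squares.default_factory = lambda key: -key   # factory changed at runtime
second = squares[5]
```

Because default_factory is still the writable attribute inherited from defaultdict, nothing extra is needed in __init__.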
When to inherit from dict?
When you don't care about dynamically changing the factory and are satisfied with a static one for the lifetime of the dict.
In this case you simply override the __missing__ method and implement the factory with whatever complicated, key-dependent logic you have.
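A minimal sketch of the plain dict variant follows. The class name LookupDict and its _lookup helper are hypothetical; _lookup stands in for the DB lookup mentioned in the question:

```python
class LookupDict(dict):
    """Hypothetical dict subclass with a fixed, key-dependent default."""

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        value = self[key] = self._lookup(key)
        return value

    def _lookup(self, key):
        # Stand-in for an expensive operation such as a DB query.
        return f"record-for-{key}"


cache = LookupDict()
value = cache["alice"]     # computed once, then stored in the dict
absent = cache.get("bob")  # dict.get does not call __missing__
```

Note that only subscript access (d[key]) goes through __missing__; get() and the in operator do not trigger the lookup.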

Related

Override dict class?

I'm trying to override the dict class in a way that is compatible with the standard dict class. How can I get access to the parent dict's item lookup if I override the __getitem__ method?
    class CSJSON(dict):
        def __getitem__(self, Key : str):
            Key = Key + 'zzz' # sample of key modification for app use
            return(super()[Key])
Then I receive error:
'super' object is not subscriptable.
If I use self[Key] then I get infinite recursive call of __getitem__.

You have to invoke __getitem__ explicitly. Subscript syntax like [Key] doesn't work on super() objects, because they don't implement __getitem__ at the class level, which is where [] is looked up when used as syntax:
    class CSJSON(dict):
        def __getitem__(self, Key : str):
            Key = Key + 'zzz' # sample of key modification for app use
            return super().__getitem__(Key)

Depending on your needs, working from collections.UserDict or collections.abc.MutableMapping might be less painful than directly subclassing dict. There are some good discussions of the options elsewhere on this site.
"How I can get access to parent dict attribute if I override __getitem__ method?"
More experienced users here seem to prefer MutableMapping, but UserDict provides a convenient solution to this part of your question by exposing a .data dict you can manipulate as a normal dict.
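As a rough sketch of the UserDict route (the class name SuffixedDict is hypothetical, and the 'zzz' suffix is just the sample key modification from the question), the underlying storage is reachable through self.data, so no super() call is needed inside __getitem__:

```python
from collections import UserDict


class SuffixedDict(UserDict):
    """Hypothetical UserDict subclass that rewrites keys on lookup."""

    def __getitem__(self, key):
        # self.data is the plain dict that UserDict wraps, so indexing it
        # does not re-enter this method.
        return self.data[key + "zzz"]


d = SuffixedDict({"namezzz": 1})
looked_up = d["name"]
```

Only lookup is modified here; stores still use the key exactly as given.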

Pool of hashable objects

I've made a highly recursive, hashable (assumed immutable) data structure. Thus it would be nice to have only one instance of each object (if objectA == objectB, then there is no reason not to have objectA is objectB).
I have tried solving it by defining a custom __new__(). It creates the requested object, then checks if it is in a dictionary (stored as a class variable). The object is added to the dict if necessary and then returned. If it is already in the dict, the version in the dict is returned and the newly created instance passes out of scope.
This solution works, but I have two questions:
1. I have to have a dict where the value at each key is the same object. What I really need is to extract an object from a set when I "show" the set an equal object. Is there a more elegant way of doing this?
2. Is there a builtin/canonical solution to my problem in Python, such as a class I can inherit from?
My current implementation is along these lines:
    class NoDuplicates(object):
        pool = dict()
        def __new__(cls, *args):
            new_instance = object.__new__(cls)
            new_instance.__init__(*args)
            if new_instance in cls.pool:
                return cls.pool[new_instance]
            else:
                cls.pool[new_instance] = new_instance
                return new_instance
I am not a programmer by profession, so I suspect this corresponds to some well known technique or concept. The most similar concepts that come to mind are memoization and singleton.
One subtle problem with the above implementation is that __init__ is always called on the return value from __new__. I made a metaclass to modify this behaviour. But that ended up causing a lot of trouble since NoDuplicates also inherits from dict.

First, I would use a factory instead of overriding __new__. See Python's use of __new__ and __init__?.
Second, you can use tuples of arguments needed to create an object as dictionary keys (if same arguments produce same objects, of course), so you won't need to create an actual (expensive to create) object instance.

Python subclassing: adding properties

I have several classes where I want to add a single property to each class (its md5 hash value) and calculate that hash value when initializing objects of that class, but otherwise maintain everything else about the class. Is there any more elegant way to do that in python than to create a subclass for all the classes where I want to change the initialization and add the property?

You can add properties and override __init__ dynamically:
    def newinit(self, orig):
        orig(self)
        self._md5 = None  # calculate md5 here

    _orig_init = A.__init__  # A is the existing class being extended
    A.__init__ = lambda self: newinit(self, _orig_init)
    A.md5 = property(lambda self: self._md5)
However, this can get quite confusing, even once you use more descriptive names than I did above. So I don't really recommend it.
Cleaner would probably be to simply subclass, possibly using a mixin class if you need to do this for multiple classes. You could also consider creating the subclasses dynamically using type() to cut down on the boilerplate further, but clarity of code would be my first concern.
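A minimal sketch of the mixin approach might look like this. Md5Mixin, md5_source, Point and the Hashed* classes are hypothetical names, and hashing the repr of the instance attributes is only an assumption about what should be hashed:

```python
import hashlib


class Md5Mixin:
    """Hypothetical mixin: computes an md5 digest once __init__ has run."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)  # run the wrapped class's __init__
        self._md5 = hashlib.md5(self.md5_source()).hexdigest()

    def md5_source(self):
        # Assumption: the hash covers the attributes set by __init__.
        return repr(sorted(vars(self).items())).encode("utf-8")

    @property
    def md5(self):
        return self._md5


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


class HashedPoint(Md5Mixin, Point):
    pass


# The same subclass created dynamically with type() to avoid boilerplate.
HashedPoint2 = type("HashedPoint2", (Md5Mixin, Point), {})

p = HashedPoint(1, 2)
q = HashedPoint2(1, 2)
```

The mixin has to come first in the base list so that its __init__ wraps the original one.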

When is the object() built-in useful?

I'm trying to figure out what I would use the object() built-in function for. It takes no arguments, and returns a "featureless object" of the type that is common to all Python classes, and has all the methods that are common to all Python classes.
To quote Jack Skellington, WHAT. IS. THIS?

Even if you do not need to program with it, object serves a purpose: it is the common class from which all other objects are derived. It is the last class listed by the mro (method resolution order) method. We need a name and object for this concept, and object serves this purpose.
Another use for object is to create sentinels.
sentinel = object()
This is often used in multithreaded programming -- passed through queues -- to signal a termination event. We might not want to send None or any other value since the queue handler may need to interpret those values as arguments to be processed. We need some unique value that no other part of the program may generate.
Creating a sentinel this way provides just such a unique object that is sure not to be a normal queue value, and thus can be tested for and used as a signal for some special event. There are other possibilities, such as creating a class, or class instance, or a function, but all those alternatives are bigger, more resource heavy, and not as pithy as object().
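The queue scenario described above can be sketched as follows; the names _STOP, worker and results are hypothetical. Because the sentinel is compared with is, even None can travel through the queue as ordinary data:

```python
import queue
import threading

_STOP = object()  # unique sentinel; no other code can produce this object


def worker(q, results):
    while True:
        item = q.get()
        if item is _STOP:      # identity check, not equality
            break
        results.append(item)   # None is treated as a normal value here


q = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(q, results))
t.start()

for item in (1, None, "a"):
    q.put(item)
q.put(_STOP)                   # signal termination
t.join()
```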

It is most useful when you are overriding attribute access (the dot), especially __setattr__: calling object.__setattr__ directly lets you break the recursion. For example:
    class SomeClass(object):
        def __setattr__(self, name, value):
            if name not in ('attr1', 'attr2', 'attr3', 'attr4'):
                object.__setattr__(self, name, value)
            else:
                do_something_else()
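For a runnable variant of the same idea, here is a small sketch (the Tracked class and its changes list are hypothetical). Writing self.name = value inside __setattr__ would call __setattr__ again, so the actual store is delegated to object.__setattr__:

```python
class Tracked:
    """Hypothetical class that records which attributes were assigned."""

    def __init__(self):
        # Bypass our own __setattr__ while the bookkeeping list is created.
        object.__setattr__(self, "changes", [])

    def __setattr__(self, name, value):
        self.changes.append(name)
        # Delegate the real assignment to avoid infinite recursion.
        object.__setattr__(self, name, value)


t = Tracked()
t.x = 1
t.y = 2
```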

Can I just partially override setattr?

I'm imitating the behavior of the ConfigParser module to write a highly specialized parser that exploits some well-defined structure in the configuration files for a particular application I work with. The files follow the standard INI structure:
[SectionA]
key1=value1
key2=value2
[SectionB]
key3=value3
key4=value4
For my application, the sections are largely irrelevant; there is no overlap between keys from different sections and all the users only remember the key names, never which section they're supposed to go in. As such, I'd like to override __getattr__ and __setattr__ in the MyParser class I'm creating to allow shortcuts like this:
config = MyParser('myfile.cfg')
config.key2 = 'foo'
The __setattr__ method would first try to find a section called key2 and set that to 'foo' if it exists. Assuming there's no such section, it would look inside each section for a key called key2. If the key exists, then it gets set to the new value. If it doesn't exist, the parser would finally raise an AttributeError.
I've built a test implementation of this, but the problem is that I also want a couple straight-up attributes exempt from this behavior. I want config.filename to be a simple string containing the name of the original file and config.content to be the dictionary that holds the dictionaries for each section.
Is there a clean way to set up the filename and content attributes in the constructor so that they are not intercepted by my custom getters and setters? Will Python look for attributes in the object's __dict__ before calling the custom __setattr__?

Pass filename and content to the superclass and let it handle them:
    class MyParser(object):
        def __setattr__(self, k, v):
            if k in ['filename', 'content']:
                # plain attributes: let the superclass store them as usual
                super(MyParser, self).__setattr__(k, v)
            else:
                # mydict.update(mynewattr) # dict handles other attrs
                pass

I think it might be cleaner to present a dictionary-like interface for the contents of the file and leave attribute access for internal purposes. However, that's just my opinion.
To answer your question, __setattr__() is called prior to checking in __dict__, so you can implement it as something like this:
    class MyParser(object):
        specials = ("filename", "content")
        def __setattr__(self, attr, value):
            if attr in MyParser.specials:
                self.__dict__[attr] = value
            else:
                # Implement your special behaviour here
                pass
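Filling in the special behaviour, a more complete sketch could look like the following. It is not the asker's implementation: the constructor here takes an already-parsed content dict instead of reading a file, the section-name shortcut is left out, and the _specials name is an assumption:

```python
class MyParser:
    _specials = ("filename", "content")

    def __init__(self, filename, content):
        # Assumption: content is a dict of section name -> dict of keys.
        self.filename = filename
        self.content = content

    def __setattr__(self, name, value):
        if name in self._specials:
            # Ordinary attributes are stored on the instance as usual.
            object.__setattr__(self, name, value)
            return
        for section in self.content.values():
            if name in section:
                section[name] = value
                return
        raise AttributeError(name)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so filename and content
        # never reach this method once they have been set.
        for section in self.__dict__.get("content", {}).values():
            if name in section:
                return section[name]
        raise AttributeError(name)


config = MyParser(
    "myfile.cfg",
    {
        "SectionA": {"key1": "value1", "key2": "value2"},
        "SectionB": {"key3": "value3", "key4": "value4"},
    },
)
config.key2 = "foo"
```

Unlike __setattr__, __getattr__ is only a fallback, which is why just the setter needs the explicit exemption list.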
