Well, the question is in the title: how do I define a Python dictionary with a fixed (immutable) set of keys but mutable values? I came up with this (in Python 2.x):
class FixedDict(dict):
    """
    A dictionary with a fixed set of keys
    """
    def __init__(self, dictionary):
        dict.__init__(self)
        for key in dictionary.keys():
            dict.__setitem__(self, key, dictionary[key])

    def __setitem__(self, key, item):
        if key not in self:
            raise KeyError("The key '" + key + "' is not defined")
        dict.__setitem__(self, key, item)
but it looks to me (unsurprisingly) rather sloppy. In particular, is this safe or is there the risk of actually changing/adding some keys, since I'm inheriting from dict?
Thanks.
Consider proxying dict instead of subclassing it. That means that only the methods that you define will be allowed, instead of falling back to dict's implementations.
class FixedDict(object):
    def __init__(self, dictionary):
        self._dictionary = dictionary

    def __setitem__(self, key, item):
        if key not in self._dictionary:
            raise KeyError("The key {} is not defined.".format(key))
        self._dictionary[key] = item

    def __getitem__(self, key):
        return self._dictionary[key]
Also, you should use string formatting instead of + to generate the error message, since otherwise it will crash for any key that's not a string.
The problem with direct inheritance from dict is that it's quite hard to comply with the full dict contract (e.g. in your case, the update method won't behave consistently).
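For example (a quick sketch using the FixedDict(dict) subclass from the question), update() goes straight to the built-in implementation and silently grows the "fixed" key set:

d = FixedDict({'a': 1, 'b': 2})
d['a'] = 10        # routed through __setitem__: allowed, 'a' already exists
try:
    d['c'] = 3     # routed through __setitem__: rejected
except KeyError as e:
    print(e)
d.update(c=3)      # dict.update() bypasses __setitem__ entirely
print('c' in d)    # True -- a new key slipped in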
What you want is to extend collections.MutableMapping:
import collections

class FixedDict(collections.MutableMapping):
    def __init__(self, data):
        self.__data = data

    def __len__(self):
        return len(self.__data)

    def __iter__(self):
        return iter(self.__data)

    def __setitem__(self, k, v):
        if k not in self.__data:
            raise KeyError(k)
        self.__data[k] = v

    def __delitem__(self, k):
        raise NotImplementedError

    def __getitem__(self, k):
        return self.__data[k]

    def __contains__(self, k):
        return k in self.__data
Note that the original (wrapped) dict will be modified; if you don't want that to happen, use copy or deepcopy.
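For instance, a small sketch of that: hand FixedDict a copy so the caller's dict is left alone (use copy.deepcopy if the values themselves are mutable containers):

import copy

original = {'a': 1, 'b': 2}
fixed = FixedDict(dict(original))             # shallow copy is enough for flat values
# fixed = FixedDict(copy.deepcopy(original))  # if the values are nested/mutable
fixed['a'] = 99
print(original['a'])                          # still 1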
How you prevent someone from adding new keys depends entirely on why someone might try to add new keys. As the comments state, most dictionary methods that modify the keys don't go through __setitem__, so a .update() call will add new keys just fine.
If you only expect someone to use d[new_key] = v, then your __setitem__ is fine. If they might use other ways to add keys, then you have to put in more work. And of course, they can always use this to do it anyway:
dict.__setitem__(d, new_key, v)
You can't make things truly immutable in Python, you can only stop particular changes.
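If you do want to close some of those other doors, one rough sketch (not exhaustive, and still defeatable via dict.__setitem__) is to re-route the other mutating methods through your __setitem__:

class FixedDict(dict):
    def __init__(self, dictionary):
        dict.__init__(self, dictionary)

    def __setitem__(self, key, item):
        if key not in self:
            raise KeyError("The key %r is not defined" % (key,))
        dict.__setitem__(self, key, item)

    def update(self, *args, **kwargs):
        # Funnel update() through __setitem__ so unknown keys are rejected.
        for k, v in dict(*args, **kwargs).items():
            self[k] = v

    def setdefault(self, key, default=None):
        # setdefault() can also introduce keys; apply the same check.
        if key not in self:
            raise KeyError("The key %r is not defined" % (key,))
        return dict.setdefault(self, key, default)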
Related
I want to make a simple wrapper around the standard Python dictionary, or maybe the defaultdict class where there is a default value.
The change I want to make is very simple: I would like to use as dictionary keys data structures that are not hashable due to the possibility of mutation, but I have a guarantee in my code that I won't ever mutate them anyway.
My approach is to detect whether the key is hashable; if so, proceed as usual. If the key is not hashable, turn it into a string first and then proceed as usual.
You can inherit from dict and override the __setitem__ and __getitem__ methods:
class CustomDict(dict):
    def __setitem__(self, key, value):
        try:
            hash(key)
        except TypeError:
            key = str(key)
        super(CustomDict, self).__setitem__(key, value)

    def __getitem__(self, key):
        try:
            hash(key)
        except TypeError:
            key = str(key)
        return super(CustomDict, self).__getitem__(key)
data = CustomDict()
data["x"] = True
data[dict(foo='bar')] = False
print(data)
>>> {'x': True, "{'foo': 'bar'}": False}
assert data[dict(foo='bar')] == False
Or you can create a custom dict-like object as described here.
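For instance, here is a rough sketch of the same str-fallback idea built on collections.MutableMapping rather than a dict subclass (the class name and the _store attribute are just illustrative):

import collections

class StringifyingDict(collections.MutableMapping):
    """Falls back to str(key) whenever the key is not hashable."""

    def __init__(self, *args, **kwargs):
        self._store = dict()
        self.update(*args, **kwargs)   # update() routes through __setitem__

    def _transform(self, key):
        try:
            hash(key)
            return key
        except TypeError:
            return str(key)

    def __getitem__(self, key):
        return self._store[self._transform(key)]

    def __setitem__(self, key, value):
        self._store[self._transform(key)] = value

    def __delitem__(self, key):
        del self._store[self._transform(key)]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)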
Let's say I have a dictionary like this:
my_dict = {
    'hey': onevalue,
    'hat': twovalue,
    'how': threevalue
}
Is there any way to make a key lookup match even when the lookup string carries a variable part, say a number appended to it?
such as my_dict['hey1'] = onevalue
or my_dict['hey2'] = onevalue
I'm trying to allow for such a suffix rather than stripping the numbers off before referencing the key. Thank you in advance.
There is not. Due to how dicts work (as a hash table), they have no way to accommodate what you want. You will have to create your own type and define the __*item__() methods to implement your logic.
Building on this answer, you can define your own dict-compatible type that strips trailing digits:
import collections

class TransformedDict(collections.MutableMapping):
    """A dictionary which applies an arbitrary key-altering function before accessing the keys"""

    def __init__(self, *args, **kwargs):
        self.store = dict()
        self.update(dict(*args, **kwargs))  # use the free update to set keys

    def __getitem__(self, key):
        return self.store[self.__keytransform__(key)]

    def __setitem__(self, key, value):
        self.store[self.__keytransform__(key)] = value

    def __delitem__(self, key):
        del self.store[self.__keytransform__(key)]

    def __iter__(self):
        return iter(self.store)

    def __len__(self):
        return len(self.store)

    def __keytransform__(self, key):
        # Strip trailing digits; i stops on the last non-digit character,
        # so the slice has to include it.
        for i in xrange(len(key) - 1, -1, -1):
            if not key[i].isdigit():
                break
        return key[:i + 1]
a = TransformedDict()
a['hello'] = 5
print a['hello123']
>>> 5
I am writing a really simple API to test respire, a SPORE client generator for Python.
In WSGI, what would be the best way to keep data around on the server?
I tried to make a RedisDict this way:
import json
from redis import Redis

redis = Redis()

class RedisDict:
    """A redis based dict."""

    def dict(self):
        TODOS = redis.get('TODOS')
        return json.loads(TODOS)

    def keys(self):
        return self.dict().keys()

    def __getitem__(self, key):
        return self.dict()[key]

    def __setitem__(self, key, value):
        obj = self.dict()
        obj[key] = value
        redis.set('TODOS', json.dumps(obj))

    def __delitem__(self, key):
        obj = self.dict()
        del obj[key]
        redis.set('TODOS', json.dumps(obj))

todos = RedisDict()
How can I make dict(todos) return a dict?
Is that enough in a WSGI environment?
Assuming that method dict returns a dictionary, why not just do this:
dict_i_wanted = todos.dict()
If you must support dict_i_wanted = dict(todos) then add this method:
def __iter__(self):
    return self.dict().iteritems()
If you want to make your own "dict-like" class, you need to implement the dictionary protocol. The easiest way is to inherit from collections.Mapping (or collections.MutableMapping if it should be writable) and implement the methods the table marks as abstract. They need to behave the same as the corresponding dict methods; e.g. __iter__ should return an iterator over the keys, just as it does for a plain dict. Together with __getitem__ and __len__, that is enough for dict(todos) to work, because the Mapping mixin then provides keys() and items() for you; there is no magic method that coerces an object to a dict the way you tried with your dict() method.
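A minimal sketch of that, reusing the module-level redis connection and the 'TODOS' key from the question (read-only; switch to MutableMapping and add __setitem__/__delitem__ if you also need writes):

import collections
import json
from redis import Redis

redis = Redis()

class RedisDict(collections.Mapping):
    """Read-only mapping view over the JSON blob stored under 'TODOS'."""

    def _data(self):
        return json.loads(redis.get('TODOS'))

    def __getitem__(self, key):
        return self._data()[key]

    def __iter__(self):
        return iter(self._data())   # iterate over keys, like a plain dict

    def __len__(self):
        return len(self._data())

todos = RedisDict()
plain = dict(todos)   # works: Mapping supplies keys()/items() for dict()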
Is there any equivalent to KeyedCollection in Python, i.e. a set where the elements have (or dynamically generate) their own keys?
i.e. the goal here is to avoid storing the key in two places, and therefore dictionaries are less than ideal (hence the question).
You can simulate that very easily:
class KeyedObject(object):
    def get_key(self):
        raise NotImplementedError("You must subclass this before you can use it.")

class KeyedDict(dict):
    def append(self, obj):
        self[obj.get_key()] = obj
Now you can use a KeyedDict instead of dict with subclasses of KeyedObject (where get_key returns a valid key based on some object property).
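A quick usage sketch (Person is just a made-up KeyedObject subclass):

class Person(KeyedObject):
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def get_key(self):
        return self.name          # the key lives on the object itself

people = KeyedDict()
people.append(Person("alice", 30))
people.append(Person("bob", 25))
print(people["alice"].age)        # 30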
Given your constraints, everyone trying to implement what you're looking for using a dict is barking up the wrong tree. Instead, you should write a list subclass that overrides __getitem__ to provide the behavior you want. I've written it so it tries to get the desired item by index first, then falls back to searching for the item by the key attribute of the contained objects. (This could be a property if the object needs to determine this dynamically.)
There's no way to avoid a linear search if you don't want to duplicate something somewhere; I am sure the C# implementation does exactly the same thing if you don't allow it to use a dictionary to store the keys.
class KeyedCollection(list):
    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return list.__getitem__(self, key)
        for item in self:
            if getattr(item, "key", 0) == key:
                return item
        raise KeyError('item with key `%s` not found' % key)
You would probably also want to override __contains__ in a similar manner so you could say if "key" in kc.... If you want to make it even more like a dict, you could also implement keys() and so on. They will be equally inefficient, but you will have an API like a dict, that also works like a list.
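For example, a rough sketch of those extras as a further subclass (the name is arbitrary), with the same linear-scan behaviour:

class KeyedCollectionWithLookup(KeyedCollection):
    def __contains__(self, key):
        # Supports `if "key" in kc` by scanning the contained objects' keys.
        for item in self:
            if getattr(item, "key", 0) == key:
                return True
        return False

    def keys(self):
        # A dict-like keys(); equally inefficient, as noted above.
        return [getattr(item, "key", None) for item in self]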
@Mehrdad said:
Because semantically, it doesn't make as much sense. When an object
knows its key, it doesn't make sense to put it in a dictionary -- it's
not a key-value pair. It's more of a semantic issue than anything
else.
With this constraint, there is nothing in Python that does what you want. I suggest you use a dict and not worry about this level of detail in the semantics. @Gabi Purcaru's answer shows how you can create an object with the interface you want. Why worry about how it works internally?
It could be that C#'s KeyedCollection is doing the same thing under the covers: asking the object for its key and then storing the key for fast access. In fact, from the docs:
By default, the KeyedCollection(Of TKey, TItem) includes a lookup
dictionary that you can obtain with the Dictionary property. When an
item is added to the KeyedCollection(Of TKey, TItem), the item's key
is extracted once and saved in the lookup dictionary for faster
searches. This behavior is overridden by specifying a dictionary
creation threshold when you create the KeyedCollection(Of TKey,
TItem). The lookup dictionary is created the first time the number of
elements exceeds that threshold. If you specify -1 as the threshold,
the lookup dictionary is never created.
I'm not much of a C#'er, but I think dictionaries are what you need.
http://docs.python.org/tutorial/datastructures.html#dictionaries
http://docs.python.org/tutorial/datastructures.html
Or maybe lists:
http://docs.python.org/library/functions.html#list
Why not simply use a dict? If the key already exists, a reference to the key will be used in the dict; it won't be senselessly duplicated.
class MyExample(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value

m = MyExample("foo", "bar")
d = {}
d[m.key] = m

first_key = d.keys()[0]
first_key is m.key  # returns True
If the key doesn't already exist, a copy of it will be saved, but I don't see that as a problem.
def lame_hash(s):
    h = 0
    for ch in s:
        h ^= ord(ch)
    return h

d = {}
d[lame_hash(m.key)] = m
print d  # key value is 102 which is stored in the dict
lame_hash(m.key) in d  # returns True
I'm not sure if this is what you meant, but this dictionary will create its own keys as you add to it...
class KeyedCollection(dict):
    def __init__(self):
        self.current_key = 0

    def add(self, item):
        self[self.current_key] = item
        self.current_key += 1

abc = KeyedCollection()
abc.add('bob')
abc.add('jane')
>>> abc
{0: 'bob', 1: 'jane'}
How about a set()? The elements can have their own keys.
To go into a little more detail than @Gabi Purcaru's already correct answer, here is a class that does the same as Gabi's but also checks that the given key and value have the correct types (like the TKey and TValue of the .NET KeyedCollection).
import abc
from collections import MutableMapping

class KeyedCollection(MutableMapping):
    """
    Provides the abstract base class for a collection (:class:`MutableMapping`) whose keys are embedded in the values.
    """
    __metaclass__ = abc.ABCMeta

    _dict = None  # type: dict

    def __init__(self, seq={}):
        self._dict = dict(seq)

    @abc.abstractmethod
    def __is_type_key_correct__(self, key):
        """
        Returns: True if the given key has the correct type for keys in this collection.
        """
        pass

    @abc.abstractmethod
    def __is_type_value_correct__(self, value):
        """
        Returns: True if the given value has the correct type for values in this collection.
        """
        pass

    @abc.abstractmethod
    def get_key_for_item(self, value):
        """
        When implemented in a derived class, extracts the key from the specified element.
        Args:
            value: the element from which to extract the key
        Returns: The key of the specified element
        """
        pass

    def __assert_type_key(self, key, arg_name='key'):
        if not self.__is_type_key_correct__(key):
            raise ValueError("{} type is not correct".format(arg_name))

    def __assert_type_value(self, value, arg_name='value'):
        if not self.__is_type_value_correct__(value):
            raise ValueError("{} type is not correct".format(arg_name))

    def add(self, value):
        """
        Adds an object to the KeyedCollection.
        Args:
            value: The object to be added to the KeyedCollection.
        """
        key = self.get_key_for_item(value)
        self._dict[key] = value

    # Implements abstract method __setitem__ from the MutableMapping parent class
    def __setitem__(self, key, value):
        self.__assert_type_key(key)
        self.__assert_type_value(value)
        if self.get_key_for_item(value) != key:
            raise ValueError("provided key does not correspond to the given value")
        self._dict[key] = value

    # Implements abstract method __delitem__ from the MutableMapping parent class
    def __delitem__(self, key):
        self.__assert_type_key(key)
        self._dict.pop(key)

    # Implements abstract method __getitem__ from the MutableMapping parent class (Mapping base class)
    def __getitem__(self, key):
        self.__assert_type_key(key)
        return self._dict[key]

    # Implements abstract method __len__ from the MutableMapping parent class (Sized mixin on Mapping base class)
    def __len__(self):
        return len(self._dict)

    # Implements abstract method __iter__ from the MutableMapping parent class (Iterable mixin on Mapping base class)
    def __iter__(self):
        return iter(self._dict)

    # Overrides __contains__ from the MutableMapping parent class (Container mixin on Mapping base class)
    def __contains__(self, x):
        self.__assert_type_key(x, 'x')
        return x in self._dict
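For illustration, a minimal sketch of a concrete subclass (Todo and the attribute names are made up) showing how the abstract hooks get filled in:

class Todo(object):
    def __init__(self, ident, text):
        self.ident = ident
        self.text = text

class TodoCollection(KeyedCollection):
    def __is_type_key_correct__(self, key):
        return isinstance(key, int)

    def __is_type_value_correct__(self, value):
        return isinstance(value, Todo)

    def get_key_for_item(self, value):
        return value.ident

todos = TodoCollection()
todos.add(Todo(1, "write docs"))
print(todos[1].text)    # write docs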
I have an algorithm in Python which creates measures for pairs of values, where m(v1, v2) == m(v2, v1) (i.e. it is symmetric). I had the idea to write a dictionary of dictionaries where these values are stored in a memory-efficient way, so that they can easily be retrieved with keys in either order. I like to inherit from things, and ideally, I'd love to write a symmetric_dict where s_d[v1][v2] always equals s_d[v2][v1], probably by checking which of the v's is larger according to some kind of ordering relation and then switching them around so that the smaller element is always mentioned first. I.e. when calling s_d[5][2] = 4, the dict of dicts will turn them around so that they are in fact stored as s_d[2][5] = 4, and the same for retrieval of the data.
I'm also very open for a better data structure, but I'd prefer an implementation with "is-a" relationship to something which just uses a dict and preprocesses some function arguments.
You could use a frozenset as the key for your dict:
>>> s_d = {}
>>> s_d[frozenset([5,2])] = 4
>>> s_d[frozenset([2,5])]
4
It would be fairly straightforward to write a subclass of dict that takes iterables as key arguments and turns them into a frozenset when storing values:
class SymDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, frozenset(key))

    def __setitem__(self, key, value):
        dict.__setitem__(self, frozenset(key), value)
Which gives you:
>>> s_d = SymDict()
>>> s_d[5,2] = 4
>>> s_d[2,5]
4
Doing it with nested indexing as shown will be extremely difficult. It's better to use a tuple as the key instead. That way the tuple can be sorted and an encapsulated dict can be accessed for the value.
d[2, 5] = 4
print d[5, 2]
Here's a slightly different approach that looks promising. Although the SymDict class isn't a dict subclass, it mostly behaves like one, and there's only a single private dictionary involved. I think one interesting feature is the fact that it preserves the natural [][] lookup syntax you seemed to want.
class SymDict(object):
    def __init__(self, *args, **kwrds):
        self._mapping = _SubSymDict(*args, **kwrds)

    def __getitem__(self, key1):
        self._mapping.set_key1(key1)
        return self._mapping

    def __setitem__(self, key1, value):
        raise NotImplementedError

    def __str__(self):
        return '_mapping: ' + self._mapping.__str__()

    def __getattr__(self, name):
        return getattr(self._mapping, name)

class _SubSymDict(dict):
    def __init__(self, *args, **kwrds):
        dict.__init__(self, *args, **kwrds)

    def set_key1(self, key1):
        self.key1 = key1

    def __getitem__(self, key2):
        return dict.__getitem__(self, frozenset((self.key1, key2)))

    def __setitem__(self, key2, value):
        dict.__setitem__(self, frozenset((self.key1, key2)), value)
symdict = SymDict()
symdict[2][4] = 24
symdict[4][2] = 42
print 'symdict[2][4]:', symdict[2][4]
# symdict[2][4]: 42
print 'symdict[4][2]:', symdict[4][2]
# symdict[4][2]: 42
print 'symdict:', symdict
# symdict: _mapping: {frozenset([2, 4]): 42}
print symdict.keys()
# [frozenset([2, 4])]
Just as an alternative to Dave Webb's frozenset, why not do a SymDict like the following:
class SymDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key if key[0] < key[1] else (key[1], key[0]))

    def __setitem__(self, key, value):
        dict.__setitem__(self, key if key[0] < key[1] else (key[1], key[0]), value)
From a quick test, this is more than 10% faster for getting and setting items than using a frozenset. Anyway, just another idea. However, it is less adaptable than the frozenset as it is really only set up to be used with tuples of length 2. As far as I can tell from the OP, that doesn't seem to be an issue here.
Improving on Justin Peel's solution, you need to add __delitem__ and __contains__ methods for a few more dictionary operations to work. So, for completeness,
class SymDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key if key[0] < key[1] else (key[1], key[0]))

    def __setitem__(self, key, value):
        dict.__setitem__(self, key if key[0] < key[1] else (key[1], key[0]), value)

    def __delitem__(self, key):
        return dict.__delitem__(self, key if key[0] < key[1] else (key[1], key[0]))

    def __contains__(self, key):
        return dict.__contains__(self, key if key[0] < key[1] else (key[1], key[0]))
So then
>>> s_d = SymDict()
>>> s_d[2,5] = 4
>>> s_d[5,2]
4
>>> (5,2) in s_d
True
>>> del s_d[5,2]
>>> s_d
{}
I'm not sure, though, whether that covers all the bases, but it was good enough for my own code.
An obvious alternative is to use a (v1,v2) tuple as the key into a single standard dict, and insert both (v1,v2) and (v2,v1) into the dictionary, making them refer to the same object on the right-hand side.
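A quick sketch of that double-insertion approach (the sym_insert helper name is just illustrative); the trade-off is two dictionary entries per pair, but lookups stay plain dict lookups:

s_d = {}

def sym_insert(d, v1, v2, value):
    # Store the value under both orderings; both keys reference the same object.
    d[v1, v2] = value
    d[v2, v1] = value

sym_insert(s_d, 5, 2, 4)
print(s_d[2, 5])   # 4
print(s_d[5, 2])   # 4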
I'd extract the key-normalizing function for readability (building on patvarilly's answer):
class SymDict(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, self.symm(key))

    def __setitem__(self, key, value):
        dict.__setitem__(self, self.symm(key), value)

    def __delitem__(self, key):
        return dict.__delitem__(self, self.symm(key))

    def __contains__(self, key):
        return dict.__contains__(self, self.symm(key))

    @staticmethod
    def symm(key):
        return key if key[0] < key[1] else (key[1], key[0])
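Usage is the same as the earlier variants (a quick check, assuming keys are 2-tuples):

s_d = SymDict()
s_d[5, 2] = 4
print(s_d[2, 5])      # 4
print((2, 5) in s_d)  # True
del s_d[5, 2]
print(s_d)            # {}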