Python - changing values of a dictionary when using itervalues()

I'm wondering if it is OK to modify the values of a Python dictionary when the modification doesn't depend on the keys:
# d is some dictionary containing instances of classes I wrote
for v in d.itervalues():
    # modify v; v's type may or may not change
I'm not sure what Python's documentation says about this; could somebody please provide some information?
Thanks!

If you mean constructs like:
d = {1: [1]}
for v in d.itervalues():
    v[0] += 1
then yes, this is completely safe. The dict just stores a reference to the object in question and does not touch it in any way other than storage and retrieval. This is not explicitly documented, but it is implicit in the definition of mapping (of which dict is a subtype):
A mapping object maps hashable values to arbitrary objects.
"Arbitrary" means the object may be mutable.

Related

In Python, why is a tuple hashable but not a list?

Below, when I try to hash a list it gives me an error, but it works with a tuple. I guess it has something to do with immutability. Can someone explain this in detail?
List
x = [1,2,3]
y = {x: 9}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
Tuple
z = (5,6)
y = {z: 89}
print(y)
{(5, 6): 89}
Dicts and other objects use hashes to store and retrieve items really quickly. The mechanics of this all happen "under the covers" - you as the programmer don't need to do anything, and Python handles it all internally. The basic idea is that when you create a dictionary with {key: value}, Python needs to be able to hash whatever you used for key so it can store and look up the value quickly.
Immutable objects, or objects that can't be altered, are hashable. They have a single unique value that never changes, so Python can "hash" that value and use it to look up dictionary values efficiently. Objects that fall into this category include strings, tuples, integers and so on. You may think, "But I can change a string! I just go mystr = mystr + 'foo'," but in fact what this does is create a new string instance and assign it to mystr. It doesn't modify the existing instance. Immutable objects never change, so you can always be sure that when you generate a hash for an immutable object, looking up the object by its hash will always return the same object you started with, and not a modified version.
You can try this for yourself: hash("mystring"), hash(('foo', 'bar')), hash(1)
Mutable objects, or objects that can be modified, aren't hashable. A list can be modified in-place: mylist.append('bar') or mylist.pop(0). You can't safely hash a mutable object because you can't guarantee that the object hasn't changed since you last saw it. You'll find that list, set, and other mutable types don't have a __hash__() method. Because of this, you can't use mutable objects as dictionary keys:
>>> hash([1,2,3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
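If you need list-like data as a key, the usual workaround is to convert it to its immutable counterpart first, e.g. a tuple (or a frozenset in place of a set):
mylist = [1, 2, 3]
d = {tuple(mylist): 'value'}   # tuples are hashable
print(d[(1, 2, 3)])            # value
d2 = {frozenset({1, 2}): 'other'}
print(d2[frozenset({2, 1})])   # other (sets are unordered)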
Eric Duminil's answer provides a great example of the unexpected behaviour that arises from using mutable objects as dictionary keys.
Here are some examples of why it might not be a good idea to allow mutable types as keys. This behaviour might be useful in some cases (e.g. using the state of the object as a key rather than the object itself), but it can also lead to surprising results or bugs.
Python
It's possible to use a numeric list as a key by defining __hash__ on a subclass of list:
class MyList(list):
    def __hash__(self):
        return sum(self)

my_list = MyList([1, 2, 3])
my_dict = {my_list: 'a'}
print(my_dict.get(my_list))
# a
my_list[2] = 4  # __hash__() becomes 7
print(next(iter(my_dict)))
# [1, 2, 4]
print(my_dict.get(my_list))
# None
print(my_dict.get(MyList([1, 2, 3])))
# None
my_list[0] = 0  # __hash__() is 6 again, but for different elements
print(next(iter(my_dict)))
# [0, 2, 4]
print(my_dict.get(my_list))
# 'a'
Ruby
In Ruby, it's allowed to use a list as a key. A Ruby list is called an Array and a dict is a Hash, but the syntax is very similar to Python's:
my_list = [1]
my_hash = { my_list => 'a'}
puts my_hash[my_list]
#=> 'a'
But if this list is modified, the dict doesn't find the corresponding value any more, even if the key is still in the dict:
my_list << 2
puts my_list
#=> [1,2]
puts my_hash.keys.first
#=> [1,2]
puts my_hash[my_list]
#=> nil
It's possible to force the dict to calculate the key hashes again:
my_hash.rehash
puts my_hash[my_list]
#=> 'a'
A hashset calculates the hash of an object and, based on that hash, stores the object in the structure for fast lookup. As a result, by contract, once an object is added to the dictionary, its hash is not allowed to change. Most good hash functions depend on the number of elements and on the elements themselves.
A tuple is immutable, so after construction the values cannot change, and therefore the hash cannot change either (or at least a good implementation should not let the hash change).
A list, on the other hand, is mutable: one can later add/remove/alter elements. As a result, the hash can change, violating the contract.
So all objects that cannot guarantee a hash that remains stable after they are added violate the contract and are thus poor candidates. For a lookup, the dictionary first calculates the hash of the key and uses it to determine the correct bucket. If the key has meanwhile changed, this can result in false negatives: the object is in the dictionary, but it can no longer be retrieved, because the hash is different and so a different bucket will be searched than the one to which the object was originally added.
I would like to add the following aspect, as it's not covered by the other answers.
There's nothing wrong with making mutable objects hashable; it's just ambiguous, and that's why it needs to be defined and implemented consistently by the programmer (not by the programming language).
Note that you can implement the __hash__ method for any custom class, which allows its instances to be stored in contexts where hashable types are required (such as dict keys or sets).
Hash values are usually used to decide whether two objects represent the same thing. So consider the following example. You have a list with two items: l = [1, 2]. Now you add an item: l.append(3). And now you must answer the following question: is it still the same thing? Both "yes" and "no" are valid answers: "yes", it is still the same list, and "no", it no longer has the same content.
So the answer to this question depends on you as the programmer, and it is up to you to implement hash methods for your mutable types accordingly.
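One consistent choice, for example, is to hash only a field you promise never to mutate, and to leave everything else out of __hash__ and __eq__. A sketch (the class and attribute names are made up for illustration):
class Record:
    def __init__(self, key, payload):
        self._key = key         # treated as immutable by convention
        self.payload = payload  # free to change; not part of the hash

    def __eq__(self, other):
        return isinstance(other, Record) and self._key == other._key

    def __hash__(self):
        return hash(self._key)

r = Record('id-1', [1, 2])
d = {r: 'stored'}
r.payload.append(3)   # a mutation that doesn't affect the hash
print(d[r])           # stored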
From the Python glossary:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() method). Hashable objects which compare equal must have the same hash value.
All of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not.
Because a list is mutable, while a tuple is not. When you store a key's hash in, for example, a dict, and the key object then changes, the stored hash value doesn't change with it. The next time you look up the object, the dictionary will try to find it by the old hash value, which is no longer relevant.
To prevent that, Python does not allow you to hash mutable items.

Will dict __getitem__ create a copy of the corresponding object?

I saw the following code here:
d[key] = data   # store data at key (overwrites old data if
                # using an existing key)
data = d[key]   # retrieve a COPY of data at key (raise KeyError
                # if no such key)
I don't understand the meaning of doing so. It says retrieve a COPY of data at key. Does dict lookup (__getitem__, or indexing; which is the proper term?) make a copy of the object?
You're looking at the shelve module's documentation.
shelve.open returns a dictionary-like object, not a dictionary. It does not load all key-value pairs at once, so the comments in the example make sense.
Ordinarily, dict lookup returns the value stored at the key, not a copy of the value. This matters for mutable objects. For instance:
A = dict()
A["a"] = ["Hello", "world"] # Stores a 2-element list in the dict, at key "a"
B = A["a"] # Gets the list that was just stored
B[0] = "Goodbye" # Changes the first element of the list
print(A["a"][0]) # Prints "Goodbye"
In contrast, shelve will return a copy of the value stored with the key, so changing the returned value will not change the shelved value.
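A small sketch of the difference (the filename is arbitrary); note that shelve.open also accepts writeback=True, which caches retrieved objects and writes them back when you sync or close the shelf:
import shelve

with shelve.open('demo_shelf') as d:
    d['key'] = ['Hello', 'world']
    d['key'][0] = 'Goodbye'   # mutates a copy; the stored value is unchanged
    print(d['key'][0])        # Hello
    tmp = d['key']            # to persist a change, reassign explicitly...
    tmp[0] = 'Goodbye'
    d['key'] = tmp
    print(d['key'][0])        # Goodbye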
You are confusing an implementation (i.e. what __getitem__ does for one specific type of object) with a specification (i.e. a prescription for what __getitem__ should do all the time).
__getitem__ just implements syntactic sugar around x[i] - it places no demands on how that is actually done. x[i] could just return the value associated with i in a dictionary. It could return a copy. It could cause way more side effects - i.e. it could cause files to be created/deleted, databases to be connected/disconnected, objects to be created/deleted, etc.
For dict, __getitem__ is defined to return the original object. But you shouldn't assume those semantics will apply for all other objects that implement it - you will be disappointed. When in doubt, you are doing the right thing - check the docs.

Why use dict.keys?

I recently wrote some code that looked something like this:
# dct is a dictionary
if "key" in dct.keys():
However, I later found that I could achieve the same results with:
if "key" in dct:
This discovery got me thinking, and I began to run some tests to see if there could be a scenario where I must use the keys method of a dictionary. My conclusion, however, is no, there is not.
If I want the keys in a list, I can do:
keys_list = list(dct)
If I want to iterate over the keys, I can do:
for key in dct:
...
Lastly, if I want to test if a key is in dct, I can use in as I did above.
Summed up, my question is: am I missing something? Could there ever be a scenario where I must use the keys method? Or is it simply a leftover method from an earlier version of Python that should be ignored?
On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:
>>> for sharedkey in dct1.keys() & dct2.keys(): # intersection of two dictionaries
... print(dct1[sharedkey], dct2[sharedkey])
In Python 2.7, you'd use dct.viewkeys() for that.
In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.
You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
Source: PEP 234, PEP 3106
Python 2's relatively useless dict.keys method exists for historical reasons. Originally, dicts weren't iterable. In fact, there was no such thing as an iterator; iterating over sequences worked by calling __getitem__, the element access method, with increasing integer indices until an IndexError was raised. To iterate over the keys of a dict, you had to call the keys method to get an explicit list of keys and iterate over that.
When iterators went in, dicts became iterable, because it was more convenient, faster, and all around better to say
for key in d:
than
for key in d.keys()
This had the side-effect of making d.keys() utterly superfluous; list(d) and iter(d) now did everything d.keys() did in a cleaner, more general way. They couldn't get rid of keys, though, since so much code already called it.
(At this time, dicts also got a __contains__ method, so you could say key in d instead of d.has_key(key). This was shorter and nicely symmetrical with for key in d; the symmetry is also why iterating over a dict gives the keys instead of (key, value) pairs.)
In Python 3, taking inspiration from the Java Collections Framework, the keys, values, and items methods of dicts were changed. Instead of returning lists, they would return views of the original dict. The key and item views would support set-like operations, and all views would be wrappers around the underlying dict, reflecting any changes to the dict. This made keys useful again.
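The dynamic behaviour is easy to see for yourself:
d = {'a': 1}
ks = d.keys()      # a view, not a snapshot
d['b'] = 2
print(ks)          # dict_keys(['a', 'b']) -- the view reflects the new key
print('b' in ks)   # True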
Assuming you're not using Python 3, list(dct) is equivalent to dct.keys(). Which one you use is a matter of personal preference. I personally think dct.keys() is slightly clearer, but to each their own.
In any case, there isn't a scenario where you "need" to use dct.keys() per se.
In Python 3, dct.keys() returns a "dictionary view object", so if you need to get a hold of an unmaterialized view to the keys (which could be useful for huge dictionaries) outside of a for loop context, you'd need to use dct.keys().
In Python 2, key in dct is much faster than checking key in dct.keys(), because keys() first builds a list of all the keys.
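In Python 3 both are constant-time, though the bare in still skips constructing a view and the extra method call. A rough way to measure it yourself (numbers vary by machine and version):
import timeit

setup = "d = dict.fromkeys(range(10000))"
print(timeit.timeit("5000 in d", setup=setup))
print(timeit.timeit("5000 in d.keys()", setup=setup))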

Python: Does a dict value pointer store its key?

I'm wondering if there is a built-in way to do this... Take this simple code for example:
D = {'one': objectA(), 'two': objectB(), 'three': objectC()}
object_a = D['one']
I believe object_a is just pointing at the objectA() created on the first line and knows nothing about the dictionary D, but my question is: does Python store the key of the dictionary value? Is there a way to get the key 'one' if all you have is the variable object_a (without looping over the dictionary, of course)?
If not, I can store the value 'one' inside objectA(), but I'm just curious if Python already stores that info.
I think no.
Consider the case of adding a single object to a (large) number of different dictionaries. It would become quite expensive for Python to track that for you; it would cost a lot for a feature most code never uses.
The dict mapping is not trivially "reversible" as you describe.
The key must be immutable, so that it can be hashed for lookup and will not suffer spontaneous changes.
The value does not have to be immutable, it is not hashed for quick lookup.
You cannot simply go from value back to key without (1) creating an immutable value and (2) populating some other kind of mapping with the "reversed" value -> key mapping.
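If you need value-to-key lookups often, you can build such a reverse mapping yourself, as long as the values are usable as keys (a sketch; custom class instances are hashable by identity by default):
d = {'one': 'A', 'two': 'B'}
reverse = {v: k for k, v in d.items()}   # values become keys
print(reverse['A'])                      # one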
Is there a way to get the Key 'one' if all you have is the variable object_a (without looping over the dictionary, of course)?
No, Python imposes no such near-useless redundancy on you. If objA is a factory callable:
d = {'zap': objA()}
a = d['zap']
and
b = objA()
just as well as
L = [objA()]
c = L[0]
all result in exactly the same kind of references in a, b and c, to exactly equivalent objects (if that's what objA gives you in the first place), without one bit wasted (neither in said objects nor in any redundant and totally hypothetical auxiliary structure) to record "this is/was a value in list L and/or dict d at this index/key" (or indices/keys, since of course there could be many).
Like others have said, there is no built-in way to do this, since it takes up memory and is not usually needed.
If not, I can store the value 'one' inside objectA(), but I'm just curious if Python already stores that info.
Just wanted to add that it should be pretty easy to add a more general solution which does this automatically. For example:
def MakeDictReversible(d):    # take the dict as a parameter; don't shadow the built-in dict
    for k, v in d.items():    # iteritems() on Python 2
        v.dict_key = k
This function just embeds in every object in the dictionary a member "dict_key", which is the dictionary key under which the object is stored.
Of course, this code can only work once (i.e., run this on two different dictionaries which share an object, and the object's "dict_key" member will be overwritten by the second dictionary).
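Usage would look something like this (Thing is a stand-in for whatever classes the dictionary holds):
class Thing:
    pass

d = {'one': Thing(), 'two': Thing()}
MakeDictReversible(d)
print(d['one'].dict_key)   # one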

Is there a way to set multiple defaults on a Python dict using another dict?

Suppose I've got two dicts in Python:
mydict = { 'a': 0 }
defaults = {
    'a': 5,
    'b': 10,
    'c': 15
}
I want to be able to expand mydict using the default values from defaults, such that 'a' remains the same but 'b' and 'c' are filled in. I know about dict.setdefault() and dict.update(), but each does only half of what I want - with dict.setdefault(), I have to loop over each key in defaults; but with dict.update(), defaults will blow away any pre-existing values in mydict.
Is there some functionality I'm not finding built into Python that can do this? And if not, is there a more Pythonic way of writing a loop to repeatedly call dict.setdefault() than this:
for key in defaults.keys():
    mydict.setdefault(key, defaults[key])
Context: I'm writing up some data in Python that controls how to parse an XML tree. There's a dict for each node (i.e., how to process each node), and I'd rather the data I write up be sparse, but filled in with defaults. The example code is just an example... real code has many more key/value pairs in the default dict.
(I realize this whole question is but a minor quibble, but it's been bothering me, so I was wondering if there was a better way to do this that I am not aware of.)
Couldn't you make mydict a copy of defaults? That way, mydict would have all the correct values to start with:
mydict = defaults.copy()
If you don't mind creating a new dictionary in the process, this will do the trick:
newdict = dict(defaults)
newdict.update(mydict)
Now newdict contains what you need.
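On Python 3.5 and later, the same thing can be written with dict unpacking (later entries win, so mydict overrides defaults):
newdict = {**defaults, **mydict}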
Since Python 3.9, you can do:
mydict = defaults | mydict
I found this solution to be the most elegant in my use case.
You can do this the same way Python's collections.defaultdict works:
class MultiDefaultDict(dict):
    def __init__(self, defaults, **kwargs):
        self.defaults = defaults
        self.update(kwargs)

    def __missing__(self, key):
        return self.defaults[key]
>>> mydict2 = MultiDefaultDict(defaults, a=0)
>>> mydict2['a']
0
>>> mydict2['b']
10
>>> mydict2
{'a': 0}
The other solutions posted so far duplicate all the default values; this one shares them, as requested. You may or may not want to override other dict methods like __contains__(), __iter__(), items(), keys(), values() -- this class as defined here iterates over the non-default items only.
defaults.update(mydict)
Personally I like to subclass the dict object. It works mostly like a dictionary, except that you have to create the object first.
class d_dict(dict):
    'Dictionary object with easy defaults.'
    def __init__(self, defaults={}):
        self.setdefault(defaults)

    def setdefault(self, defaults):
        for key, value in defaults.items():  # iteritems() on Python 2
            if key not in self:
                dict.__setitem__(self, key, value)
This provides the same functionality as the dict type, except that it overrides the setdefault() method to take a dictionary of one or more items. You can set the defaults at creation.
This is just a personal preference. As I understand it, all that dict.setdefault() does is set the items which haven't been set yet. So probably the simplest option is:
new_dict = default_dict.copy()
new_dict.update({'a': 0})
However, if you do this more than once you might make a function out of this. At this point it may just be easier to use a custom dict object, rather than constantly adding defaults to your dictionaries.
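Such a function might look like this (the name with_defaults is arbitrary):
def with_defaults(d, defaults):
    "Return a new dict: d's entries, with defaults filling in the gaps."
    out = defaults.copy()
    out.update(d)
    return out

mydict = with_defaults({'a': 0}, {'a': 5, 'b': 10, 'c': 15})
print(mydict)   # {'a': 0, 'b': 10, 'c': 15}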
