What are dictionary view objects? - python

In python 2.7, we got the dictionary view methods available.
Now, I know the pros and cons of the following:
dict.items() (and values, keys): returns a list, so you can actually store the result, and
dict.iteritems() (and the like): returns an iterator, so you can iterate over each value generated one by one.
What are dict.viewitems() (and the like) for? What are their benefits? How does it work? What is a view after all?
I read that the view is always reflecting the changes from the dictionary. But how does it behave from the perf and memory point of view? What are the pros and cons?

Dictionary views are essentially what their name says: views are simply like a window on the keys and values (or items) of a dictionary. Here is an excerpt from the official documentation for Python 3:
>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.keys()
>>> values = dishes.values()
>>> # view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> keys # No eggs anymore!
dict_keys(['sausage', 'bacon', 'spam'])
>>> values # No eggs value (2) anymore!
dict_values([1, 1, 500])
(The Python 2 equivalent uses dishes.viewkeys() and dishes.viewvalues().)
This example shows the dynamic character of views: the keys view is not a copy of the keys at a given point in time, but rather a simple window that shows you the keys; if they are changed, then what you see through the window does change as well. This feature can be useful in some circumstances (for instance, one can work with a view on the keys in multiple parts of a program instead of recalculating the current list of keys each time they are needed)—note that if the dictionary keys are modified while iterating over the view, how the iterator should behave is not well defined, which can lead to errors.
One advantage is that looking at, say, the keys uses only a small and fixed amount of memory and requires a small and fixed amount of processor time, as there is no creation of a list of keys (Python 2, on the other hand, often unnecessarily creates a new list, as quoted by Rajendran T, which takes memory and time in an amount proportional to the length of the list). To continue the window analogy, if you want to see a landscape behind a wall, you simply make an opening in it (you build a window); copying the keys into a list would correspond to instead painting a copy of the landscape on your wall—the copy takes time, space, and does not update itself.
To summarize, views are simply… views (windows) on your dictionary, which show the contents of the dictionary even after it changes. They offer features that differ from those of lists: a list of keys contains a copy of the dictionary keys at a given point in time, while a view is dynamic and is much faster to obtain, as it does not have to copy any data (keys or values) in order to be created.
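A rough way to check the memory claim above, assuming Python 3 (where keys() already returns a view) and CPython's sys.getsizeof. The dictionary d is made up for illustration, and the sizes printed depend on the interpreter build:

```python
import sys

d = {i: str(i) for i in range(1000)}   # hypothetical sample dictionary

view = d.keys()        # no keys are copied
snapshot = list(d)     # copies all 1000 key references

view_size = sys.getsizeof(view)
list_size = sys.getsizeof(snapshot)
print(view_size, list_size)   # exact numbers vary by interpreter build

d[1000] = "1000"       # the view follows the change, the list does not
print(len(view), len(snapshot))
```

The view stays a small fixed-size object however large the dictionary grows, while the list grows with the number of keys.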

Just from reading the docs I get this impression:
Views are "pseudo-set-like", in that they don't support indexing, so what you can do with them is test for membership and iterate over them (because keys are hashable and unique, the keys and items views are more "set-like" in that they don't contain duplicates).
You can store them and use them multiple times, like the list versions.
Because they reflect the underlying dictionary, any change in the dictionary will change the view, and will almost certainly change the order of iteration. So unlike the list versions, they're not "stable".
Because they reflect the underlying dictionary, they're almost certainly small proxy objects; copying the keys/values/items would require that they watch the original dictionary somehow and copy it multiple times when changes happen, which would be an absurd implementation. So I would expect very little memory overhead, but access to be a little slower than directly to the dictionary.
So I guess the key use case is if you're keeping a dictionary around and repeatedly iterating over its keys/items/values with modifications in between. You could just use a view instead, turning for k, v in mydict.iteritems(): into for k, v in myview:. But if you're just iterating over the dictionary once, I think the iter- versions are still preferable.
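The impressions above can be checked with a small sketch. It assumes Python 3, where items() returns the view, and the inventory dictionary is invented for the example:

```python
inventory = {"eggs": 2, "bacon": 1}
items = inventory.items()      # stored once, reused below

first_pass = sorted(items)
inventory["spam"] = 500        # modify the dict between iterations
second_pass = sorted(items)    # same view object, new contents

has_eggs = ("eggs", 2) in items   # membership test works on the view

try:
    items[0]                   # views are not sequences
    indexable = True
except TypeError:
    indexable = False
```

In Python 2.7 the same sketch should work with inventory.viewitems() in place of inventory.items().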

As you mentioned dict.items() returns a copy of the dictionary’s list of (key, value) pairs which is wasteful and dict.iteritems() returns an iterator over the dictionary’s (key, value) pairs.
Now take the following example to see the difference between an iterator of a dict and a view of a dict:
>>> d = {"x":5, "y":3}
>>> iter = d.iteritems()
>>> del d["x"]
>>> for i in iter: print i
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
Whereas a view simply shows you what's in the dict. It doesn't care if it changed:
>>> d = {"x":5, "y":3}
>>> v = d.viewitems()
>>> v
dict_items([('y', 3), ('x', 5)])
>>> del d["x"]
>>> v
dict_items([('y', 3)])
A view is simply what the dictionary looks like now. After deleting an entry, the list returned by .items() would have been out of date and the iterator returned by .iteritems() would have raised an error.
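For readers on Python 3, where iteritems() and viewitems() no longer exist, roughly the same contrast can be reproduced with iter(d.items()) and d.items(). This is a sketch of the equivalent, not the original poster's code:

```python
d = {"x": 5, "y": 3}
it = iter(d.items())    # iterator: remembers the dict's size at creation
view = d.items()        # view: re-reads the dict every time it is used

del d["x"]

try:
    next(it)
    iterator_error = None
except RuntimeError as exc:
    iterator_error = str(exc)

remaining = list(view)  # the view simply shows what is left
```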

The view methods return a view object rather than a list (unlike .keys(), .items() and .values() in Python 2, which each return a copy as a list), so they are more lightweight and always reflect the current contents of the dictionary.
From Python 3.0 - dict methods return views - why?
The main reason is that for many use cases returning a completely
detached list is unnecessary and wasteful. It would require copying
the entire content (which may or may not be a lot).
If you simply want to iterate over the keys then creating a new list
is not necessary. And if you indeed need it as a separate list (as a
copy) then you can easily create that list from the view.
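The last point of the quote, getting a detached list when one is really needed, might look like this in Python 3 (the dictionary contents are placeholders):

```python
d = {"a": 1, "b": 2}
keys_view = d.keys()
keys_copy = list(keys_view)   # explicit, detached copy of the keys

d["c"] = 3                    # only the view notices this change
```

Here keys_copy keeps the two original keys, while keys_view now shows three.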

Views let you access the underlying data structure without copying it. Besides being dynamic, as opposed to a list created once, one of their most useful applications is membership testing. Say you want to check whether a value is in the dict (be it a key or a value).
Option one is to create a list of the keys using dict.keys(). This works but obviously consumes more memory; if the dict is very large, that is wasteful.
With views you can work on the actual data structure, without an intermediate list.
Let's use examples. I've a dict with 1000 keys of random strings and digits and k is the key I want to look for
large_d = { .. 'NBBDC': '0RMLH', 'E01AS': 'UAZIQ', 'G0SSL': '6117Y', 'LYBZ7': 'VC8JQ' .. }
>>> len(large_d)
1000
# this is one option; It creates the keys() list every time, it's here just for the example
>>> timeit.timeit('k in large_d.keys()', setup='from __main__ import large_d, k', number=1000000)
13.748743600954867
# now let's create the list first; only then check for containment
>>> list_keys = large_d.keys()
>>> timeit.timeit('k in list_keys', setup='from __main__ import large_d, k, list_keys', number=1000000)
8.874809793833492
# this saves us ~5 seconds. Great!
# let's try the views now
>>> timeit.timeit('k in large_d.viewkeys()', setup='from __main__ import large_d, k', number=1000000)
0.08828549011070663
# How about saving another 8.5 seconds?
As you can see, testing membership against a view object gives a huge performance boost while reducing memory overhead at the same time. You should use views when you need to perform set-like operations.
Note: I'm running on Python 2.7
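A comparable measurement can be sketched for Python 3, where keys() already returns a view and the list has to be built explicitly. The data below is made up, the repetition count is kept small, and no timings are quoted because they depend entirely on the machine:

```python
import timeit

large_d = {f"key{i}": i for i in range(1000)}   # made-up data
k = "key999"                                    # last key inserted
list_keys = list(large_d)                       # detached list of keys

checks = {
    "k in large_d": lambda: k in large_d,
    "k in large_d.keys()": lambda: k in large_d.keys(),
    "k in list_keys": lambda: k in list_keys,
}
for label, check in checks.items():
    seconds = timeit.timeit(check, number=10_000)
    print(f"{label:22} {seconds:.4f}s")
```

The first two checks are hash lookups; the third scans the list element by element, so it is the one expected to slow down as the dictionary grows.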

Related

Why does Python enforce change in size during iteration for dict, but not for list? [duplicate]

Let's consider this code which iterates over a list while removing an item each iteration:
x = list(range(5))
for i in x:
    print(i)
    x.pop()
It will print 0, 1, 2. Only the first three elements are printed since the last two elements in the list were removed by the first two iterations.
But if you try something similar on a dict:
y = {i: i for i in range(5)}
for i in y:
    print(i)
    y.pop(i)
It will print 0, then raise RuntimeError: dictionary changed size during iteration, because we are removing a key from the dictionary while iterating over it.
Of course, modifying a list during iteration is bad. But why is a RuntimeError not raised as in the case of dictionary? Is there any good reason for this behaviour?
I think the reason is simple. lists are ordered, dicts (prior to Python 3.6/3.7) and sets are not. So modifying a list as you iterate may not be advised as best practice, but it leads to consistent, reproducible, and guaranteed behaviour.
You could use this, for example let's say you wanted to split a list with an even number of elements in half and reverse the 2nd half:
>>> lst = [0,1,2,3]
>>> lst2 = [lst.pop() for _ in lst]
>>> lst, lst2
([0, 1], [3, 2])
Of course, there are much better and more intuitive ways to perform this operation, but the point is it works.
By contrast, the behaviour for dicts and sets is totally implementation specific since the iteration order may change depending on the hashing.
You get a RuntimeError with collections.OrderedDict, presumably for consistency with the dict behaviour. I don't think any change in the dict behaviour is likely after Python 3.7 (where dicts are guaranteed to maintain insertion order; in CPython 3.6 this was still an implementation detail) since it would break backward compatibility for no real use cases.
Note that collections.deque also raises a RuntimeError in this case, despite being ordered.
It wouldn't have been possible to add such a check to lists without breaking backward compatibility. For dicts, there was no such issue.
In the old, pre-iterators design, for loops worked by calling the sequence element retrieval hook with increasing integer indices until it raised IndexError. (I would say __getitem__, but this was back before type/class unification, so C types didn't have __getitem__.) len isn't even involved in this design, and there is nowhere to check for modification.
When iterators were introduced, the dict iterator had the size change check from the very first commit that introduced iterators to the language. Dicts weren't iterable at all before that, so there was no backward compatibility to break. Lists still went through the old iteration protocol, though.
When list.__iter__ was introduced, it was purely a speed optimization, not intended to be a behavioral change, and adding a modification check would have broken backward compatibility with existing code that relied on the old behavior.
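The old index-based protocol described above still exists as a fallback in current Python, so it can be demonstrated directly. The class below is a hypothetical example that defines only __getitem__ and no __iter__:

```python
class Squares:
    """Iterable only through the legacy __getitem__ protocol (no __iter__)."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if index >= self.n:
            raise IndexError(index)   # this is what ends the loop
        return index * index


# The for machinery calls __getitem__ with 0, 1, 2, ... until IndexError.
squares = [value for value in Squares(4)]
```

Nothing in this protocol ever asks for the length, which is why there is no natural place to detect a size change.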
Dictionary uses insertion order with an additional level of indirection, which causes hiccups when iterating while keys are removed and re-inserted, thereby changing the order and internal pointers of the dictionary.
And this problem is not fixed by iterating d.keys() instead of d, since in Python 3, d.keys() returns a dynamic view of the keys in the dict which results in the same problem. Instead, iterate over list(d), as this will produce a list from the keys of the dictionary that will not change during iteration.
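A minimal sketch of the list(d) approach, reusing the dictionary from the question and removing the even keys as an arbitrary example condition:

```python
y = {i: i for i in range(5)}

for key in list(y):       # list(y) is a snapshot of the keys
    if key % 2 == 0:
        del y[key]        # safe: the loop is not iterating over y itself
```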

Why use dict.keys?

I recently wrote some code that looked something like this:
# dct is a dictionary
if "key" in dct.keys():
However, I later found that I could achieve the same results with:
if "key" in dct:
This discovery got me thinking and I began to run some tests to see if there could be a scenario when I must use the keys method of a dictionary. My conclusion however is no, there is not.
If I want the keys in a list, I can do:
keys_list = list(dct)
If I want to iterate over the keys, I can do:
for key in dct:
    ...
Lastly, if I want to test if a key is in dct, I can use in as I did above.
Summed up, my question is: am I missing something? Could there ever be a scenario where I must use the keys method? Or is it simply a leftover method from an earlier version of Python that should be ignored?
On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:
>>> for sharedkey in dct1.keys() & dct2.keys():  # intersection of two dictionaries
...     print(dct1[sharedkey], dct2[sharedkey])
In Python 2.7, you'd use dct.viewkeys() for that.
In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.
You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
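To make the set operations on key views a little more concrete, here is a short Python 3 sketch. The two dictionaries are invented for the example:

```python
dct1 = {"a": 1, "b": 2, "c": 3}
dct2 = {"b": 20, "c": 30, "d": 40}

shared = dct1.keys() & dct2.keys()      # keys present in both
only_first = dct1.keys() - dct2.keys()  # keys present only in dct1
all_keys = dct1.keys() | dct2.keys()    # union of the keys

# Pair up the values of the shared keys that differ between the two dicts.
changed = {k: (dct1[k], dct2[k]) for k in sorted(shared) if dct1[k] != dct2[k]}
```

Each operation returns an ordinary set, so the result is detached from both dictionaries.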
Source: PEP 234, PEP 3106
Python 2's relatively useless dict.keys method exists for historical reasons. Originally, dicts weren't iterable. In fact, there was no such thing as an iterator; iterating over sequences worked by calling __getitem__, the element access method, with increasing integer indices until an IndexError was raised. To iterate over the keys of a dict, you had to call the keys method to get an explicit list of keys and iterate over that.
When iterators went in, dicts became iterable, because it was more convenient, faster, and all around better to say
for key in d:
than
for key in d.keys()
This had the side-effect of making d.keys() utterly superfluous; list(d) and iter(d) now did everything d.keys() did in a cleaner, more general way. They couldn't get rid of keys, though, since so much code already called it.
(At this time, dicts also got a __contains__ method, so you could say key in d instead of d.has_key(key). This was shorter and nicely symmetrical with for key in d; the symmetry is also why iterating over a dict gives the keys instead of (key, value) pairs.)
In Python 3, taking inspiration from the Java Collections Framework, the keys, values, and items methods of dicts were changed. Instead of returning lists, they would return views of the original dict. The key and item views would support set-like operations, and all views would be wrappers around the underlying dict, reflecting any changes to the dict. This made keys useful again.
Assuming you're not using Python 3, list(dct) is equivalent to dct.keys(). Which one you use is a matter of personal preference. I personally think dct.keys() is slightly clearer, but to each their own.
In any case, there isn't a scenario where you "need" to use dct.keys() per se.
In Python 3, dct.keys() returns a "dictionary view object", so if you need to get a hold of an unmaterialized view to the keys (which could be useful for huge dictionaries) outside of a for loop context, you'd need to use dct.keys().
key in dict
is much faster than checking
key in dict.keys()

What is a flexible, hybrid python collection object?

As a way to get used to python, I am trying to translate some of my code to python from Autohotkey_L.
I am immediately running into tons of choices for collection objects. Can you help me figure out a built in type or a 3rd party contributed type that has as much as possible, the functionality of the AutoHotkey_L object type and its methods.
AutoHotkey_L Objects have features of a python dict, list, and a class instance.
I understand that there are tradeoffs for space and speed, but I am just interested in functionality rather than optimization issues.
Don't write Python as <another-language>. Write Python as Python.
The data structure should be chosen just to have the minimal ability you need to use.
list — an ordered sequence of elements, with 1 flexible end.
collections.deque — an ordered sequence of elements, with 2 flexible ends (e.g. a queue).
set / frozenset — an unordered sequence of unique elements.
collections.Counter — an unordered sequence of non-unique elements.
dict — an unordered key-value relationship.
collections.OrderedDict — an ordered key-value relationship.
bytes / bytearray — a list of bytes.
array.array — a homogeneous list of primitive types.
Looking at the interface of Object,
dict would be the most suitable for finding a value by key
collections.OrderedDict would be the most suitable for the push/pop stuff.
MinIndex / MaxIndex are the exception: they require a sorted key-value relationship (e.g. a red-black tree). There's no such type in the standard library, but there are 3rd party implementations.
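If a third-party sorted container is not an option, one rough standard-library approximation is to keep a sorted list of keys next to a plain dict using bisect. The class and method names below are hypothetical, and insertion is O(n) rather than the O(log n) a real tree would give:

```python
import bisect


class SortedKeyDict:
    """Hypothetical helper: a dict plus a sorted list of its keys."""

    def __init__(self):
        self._data = {}
        self._keys = []            # kept sorted at all times

    def __setitem__(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)   # O(n) insert, keeps order
        self._data[key] = value

    def __getitem__(self, key):
        return self._data[key]

    def __delitem__(self, key):
        del self._data[key]
        self._keys.pop(bisect.bisect_left(self._keys, key))

    def min_key(self):
        return self._keys[0]

    def max_key(self):
        return self._keys[-1]


obj = SortedKeyDict()
for index in (7, 2, 5):
    obj[index] = f"item {index}"
```

For a dictionary that is queried only occasionally, plain min(d) and max(d) on a regular dict may be enough.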
It would be impossible to recommend a particular class without knowing how you intend on using it. If you are using this particular object as an ordered sequence where elements can be repeated, then you should use a list; if you are looking up values by their key, then use a dictionary. You will get very different algorithmic runtime complexity with the different data types. It really does not take that much time to determine when to use which type.... I suggest you give it some further consideration.
If you really can't decide, though, here is a possibility:
class AutoHotKeyObject(object):
    def __init__(self):
        self.list_value = []
        self.dict_value = {}
    def getDict(self):
        return self.dict_value
    def getList(self):
        return self.list_value
With the above, you could use both the list and dictionary features, like so:
obj = AutoHotKeyObject()
obj.getList().append(1)
obj.getList().append(2)
obj.getList().append(3)
print obj.getList() # Prints [1, 2, 3]
obj.getDict()['a'] = 1
obj.getDict()['b'] = 2
print obj.getDict() # Prints {'a':1, 'b':2}

Why do we need tuples in Python (or any immutable data type)?

I've read several python tutorials (Dive Into Python, for one), and the language reference on Python.org - I don't see why the language needs tuples.
Tuples have no methods compared to a list or set, and if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
Immutability?
Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.
In C/C++ if I allocate a pointer and point to some valid memory, I don't care where the address is located as long as it's not null before I use it.
Whenever I reference that variable, I don't need to know if the pointer is still pointing to the original address or not. I just check for null and use it (or not).
In Python, when I allocate a string (or tuple) assign it to x, then modify the string, why do I care if it's the original object? As long as the variable points to my data, that's all that matters.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
x still references the data I want, why does anyone need to care if its id is the same or different?
immutable objects can allow substantial optimization; this is presumably why strings are also immutable in Java, developed quite separately but about the same time as Python, and just about everything is immutable in truly-functional languages.
in Python in particular, only immutables can be hashable (and, therefore, members of sets, or keys in dictionaries). Again, this affords optimization, but far more than just "substantial" (designing decent hash tables storing completely mutable objects is a nightmare -- either you take copies of everything as soon as you hash it, or the nightmare of checking whether the object's hash has changed since you last took a reference to it rears its ugly head).
Example of optimization issue:
$ python -mtimeit '["fee", "fie", "fo", "fum"]'
1000000 loops, best of 3: 0.432 usec per loop
$ python -mtimeit '("fee", "fie", "fo", "fum")'
10000000 loops, best of 3: 0.0563 usec per loop
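The hashability point above is easy to see in practice. In this small sketch the names are placeholders; a tuple works as a dictionary key or set member, while the equivalent list is rejected:

```python
point = (3, 4)
grid = {point: "treasure"}        # tuples of hashables can be keys
visited = {(0, 0), (3, 4)}        # ...and set members

try:
    bad = {[3, 4]: "treasure"}    # lists are unhashable
    list_key_error = None
except TypeError as exc:
    list_key_error = str(exc)
```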
None of the answers above point out the real issue of tuples vs lists, which many new to Python seem to not fully understand.
Tuples and lists serve different purposes. Lists store homogeneous data. You can and should have a list like this:
["Bob", "Joe", "John", "Sam"]
The reason that is a correct use of lists is because those are all homogeneous types of data, specifically, people's names. But take a list like this:
["Billy", "Bob", "Joe", 42]
That list is one person's full name, and their age. That isn't one type of data. The correct way to store that information is either in a tuple, or in an object. Let's say we have a few:
[("Billy", "Bob", "Joe", 42), ("Robert", "", "Smith", 31)]
The immutability and mutability of Tuples and Lists is not the main difference. A list is a list of the same kind of items: files, names, objects. Tuples are a grouping of different types of objects. They have different uses, and many Python coders abuse lists for what tuples are meant for.
Please don't.
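One way to make the "tuple as a record" idea explicit is collections.namedtuple, which keeps tuple behaviour but names each position. The Person type and its field names are assumptions made for this illustration, not something from the answer:

```python
from collections import namedtuple

# Hypothetical record type; the field names are assumptions for illustration.
Person = namedtuple("Person", ["first", "middle", "last", "age"])

people = [                      # a homogeneous list of heterogeneous records
    Person("Billy", "Bob", "Joe", 42),
    Person("Robert", "", "Smith", 31),
]

oldest = max(people, key=lambda person: person.age)
first, middle, last, age = people[1]   # still unpacks like a plain tuple
```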
Edit:
I think this blog post explains why I think this better than I did:
Understanding tuples vs. lists in Python - E-Scribe
if I must convert a tuple to a set or list to be able to sort them, what's the point of using a tuple in the first place?
In this particular case, there probably isn't a point. This is a non-issue, because this isn't one of the cases where you'd consider using a tuple.
As you point out, tuples are immutable. The reasons for having immutable types apply to tuples:
copy efficiency: rather than copying an immutable object, you can alias it (bind a variable to a reference)
comparison efficiency: when you're using copy-by-reference, you can compare two variables by comparing location, rather than content
interning: you need to store at most one copy of any immutable value
there's no need to synchronize access to immutable objects in concurrent code
const correctness: some values shouldn't be allowed to change. This (to me) is the main reason for immutable types.
Note that a particular Python implementation may not make use of all of the above features.
Dictionary keys must be immutable, otherwise changing the properties of a key-object can invalidate invariants of the underlying data structure. Tuples can thus potentially be used as keys. This is a consequence of const correctness.
See also "Introducing tuples", from Dive Into Python.
Sometimes we like to use objects as dictionary keys
For what it's worth, tuples recently (2.6+) grew index() and count() methods
I've always found having two completely separate types for the same basic data structure (arrays) to be an awkward design, but not a real problem in practice. (Every language has its warts, Python included, but this isn't an important one.)
Why does anyone care if a variable lives at a different place in memory than when it was originally allocated? This whole business of immutability in Python seems to be over emphasized.
These are different things. Mutability isn't related to where an object is stored in memory; immutability means that the stuff a name points to can't change.
Python objects can't change location after they're created, mutable or not. (More accurately, the value of id() can't change--same thing, in practice.) The internal storage of mutable objects can change, but that's a hidden implementation detail.
>>> x='hello'
>>> id(x)
1234567
>>> x='good bye'
>>> id(x)
5432167
This isn't modifying ("mutating") the object; it's creating a new object, binding the same name to it, and discarding the old one. Compare to a mutating operation:
>>> a = [1,2,3]
>>> id(a)
3084599212L
>>> a[1] = 5
>>> a
[1, 5, 3]
>>> id(a)
3084599212L
As others have pointed out, this allows using arrays as keys to dictionaries, and other data structures that need immutability.
Note that keys for dictionaries do not have to be completely immutable. Only the part of it used as a key needs to be immutable; for some uses, this is an important distinction. For example, you could have a class representing a user, which compares equality and a hash by the unique username. You could then hang other mutable data on the class--"user is logged in", etc. Since this doesn't affect equality or the hash, it's possible and perfectly valid to use this as a key in a dictionary. This isn't too commonly needed in Python; I just point it out since several people have claimed that keys need to be "immutable", which is only partially correct. I've used this many times with C++ maps and sets, though.
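A sketch of the user example described above might look like the following. The User class and its attribute names are hypothetical, and it relies on the username never being changed after creation:

```python
class User:
    """Hypothetical class: identity is the username, the rest is mutable."""

    def __init__(self, username):
        self.username = username      # treated as fixed after creation
        self.logged_in = False        # free to change at any time

    def __eq__(self, other):
        return isinstance(other, User) and self.username == other.username

    def __hash__(self):
        return hash(self.username)


alice = User("alice")
sessions = {alice: "session-1"}

alice.logged_in = True                # does not affect equality or hash
found = sessions[User("alice")]       # an equal key finds the same entry
```

If username were reassigned while the object is a key, the entry would become unreachable, which is exactly the hazard immutability guards against.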
As gnibbler offered in a comment, Guido had an opinion that is not fully accepted/appreciated: “lists are for homogeneous data, tuples are for heterogeneous data”. Of course, many of the opposers interpreted this as meaning that all elements of a list should be of the same type.
I like to see it differently, not unlike others also have in the past:
blue= 0, 0, 255
alist= ["red", "green", blue]
Note that I consider alist to be homogeneous, even if type(alist[1]) != type(alist[2]).
If I can change the order of the elements and I won't have issues in my code (apart from assumptions, e.g. “it should be sorted”), then a list should be used. If not (like in the tuple blue above), then I should use a tuple.
They are important since they guarantee the caller that the object they pass won't be mutated.
If you do this:
a = [1,1,1]
doWork(a)
The caller has no guarantee of the value of a after the call.
However,
a = (1,1,1)
doWork(a)
Now you as the caller or as a reader of this code know that a is the same.
For this scenario you could always make a copy of the list and pass that, but then you are wasting cycles instead of using a language construct that makes more semantic sense.
you can see here for some discussion on this
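To illustrate the guarantee, here is a sketch with a hypothetical do_work that tries to modify its argument in place. With a list the caller's data changes; with a tuple the attempt fails and the caller's data is untouched:

```python
def do_work(values):
    """Hypothetical callee that tries to modify its argument in place."""
    values.append(0)


a = [1, 1, 1]
do_work(a)              # the caller's list is now [1, 1, 1, 0]

b = (1, 1, 1)
try:
    do_work(b)          # tuples have no append method
    tuple_error = None
except AttributeError as exc:
    tuple_error = str(exc)
```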
Your question (and follow-up comments) focus on whether the id() changes during an assignment. Focusing on this follow-on effect of the difference between immutable object replacement and mutable object modification rather than the difference itself is perhaps not the best approach.
Before we continue, make sure that the behavior demonstrated below is what you expect from Python.
>>> a1 = [1]
>>> a2 = a1
>>> print a2[0]
1
>>> a1[0] = 2
>>> print a2[0]
2
In this case, the contents of a2 was changed, even though only a1 had a new value assigned. Contrast to the following:
>>> a1 = (1,)
>>> a2 = a1
>>> print a2[0]
1
>>> a1 = (2,)
>>> print a2[0]
1
In this latter case, we replaced the entire tuple, rather than updating its contents. With immutable types such as tuples, this is the only behavior allowed.
Why does this matter? Let's say you have a dict:
>>> t1 = (1,2)
>>> d1 = { t1 : 'three' }
>>> print d1
{(1, 2): 'three'}
>>> t1[0] = 0 ## results in a TypeError, as tuples cannot be modified
>>> t1 = (2,3) ## creates a new tuple, does not modify the old one
>>> print d1 ## as seen here, the dict is still intact
{(1, 2): 'three'}
Using a tuple, the dictionary is safe from having its keys changed "out from under it" to items which hash to a different value. This is critical to allow efficient implementation.

Python: Does a dict value pointer store its key?

I'm wondering if there is a built-in way to do this... Take this simple code for example:
D = {'one': objectA(), 'two': objectB(), 'three': objectC()}
object_a = D['one']
I believe object_a is just pointing at the objectA() created on the first line, and knows nothing about the dictionary D, but my question is, does Python store the Key of the dictionary value? Is there a way to get the Key 'one' if all you have is the variable object_a (without looping over the dictionary, of course)?
If not, I can store the value 'one' inside objectA(), but I'm just curious if Python already stores that info.
I think no.
Consider the case of adding a single object to a (large) number of different dictionaries. It would become quite expensive for Python to track that for you, it would cost a lot for a feature not used by most.
The dict mapping is not trivially "reversible" as you describe.
The key must be immutable. It must be immutable so that it can be hashed for lookup and not suffer spontaneous changes.
The value does not have to be immutable, it is not hashed for quick lookup.
You cannot simply go from value back to key without (1) creating an immutable value and (2) populating some other kind of mapping with the "reversed" value -> key mapping.
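When the values do happen to be hashable and unique, the "reversed" mapping mentioned above can be built in one pass. This sketch uses made-up integer values to satisfy both conditions:

```python
forward = {"one": 1, "two": 2, "three": 3}

# Works only because every value is hashable and unique.
reverse = {value: key for key, value in forward.items()}
```

If two keys shared a value, the later one would silently win, so this is a sketch for the simple case only.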
Is there a way to get the Key 'one' if
all you have is the variable object_a
(without looping over the dictionary,
of course)?
No, Python imposes no such near-useless redundancy on you. If objA is a factory callable:
d = {'zap': objA()}
a = d['zap']
and
b = objA()
just as well as
L = [objA()]
c = L[0]
all result in exactly the same kind of references in a, b and c, to exactly equivalent objects (if that's what objA gives you in the first place), without one bit wasted (neither in said objects nor in any redundant and totally hypothetical auxiliary structure) to record "this is/was a value in list L and/or dict d at these index/key" (or indices/keys, since of course there could be many).
Like others have said, there is no built-in way to do this, since it takes up memory and is not usually needed.
If not, I can store the value 'one' inside objectA(), but I'm just curious if Python already stores that info.
Just wanted to add that it should be pretty easy to add a more general solution which does this automatically. For example:
def MakeDictReversible(dct):
    # Tag every value with the key it is stored under (Python 2 syntax).
    for k, v in dct.iteritems():
        v.dict_key = k
This function just embeds every object in the dictionary with a member "dict_key", which is the dictionary key used to store the object.
Of course, this code can only work once (i.e., run this on two different dictionaries which share an object, and the object's "dict_key" member will be overwritten by the second dictionary).
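One possible way around the overwriting problem, offered as a sketch rather than a recommendation, is to keep the reverse information outside the objects in a per-dictionary table keyed by id(). The Thing class stands in for objectA and friends, and key_lookup is a made-up helper name:

```python
class Thing:
    """Placeholder standing in for objectA, objectB, ..."""


def key_lookup(dct):
    """Build an id(value) -> key table for one specific dictionary."""
    return {id(value): key for key, value in dct.items()}


D = {"one": Thing(), "two": Thing(), "three": Thing()}
keys_by_id = key_lookup(D)

object_a = D["one"]
key_of_a = keys_by_id[id(object_a)]
```

The table goes stale as soon as D is modified, and ids are only meaningful while the objects are alive, so it would need rebuilding after any change.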
