What is the complexity of calling dict.keys() in Python 3?

What's the asymptotic complexity of dict.keys() in Python?
I found this website but it does not have the answer. I am using Python 3, but I guess this is not version specific.

In Python 3, dict.keys() returns a view object. Essentially, this is just a window directly onto the dictionary's keys. There is no looping over the hashtable to build a new object, for example. This makes calling it a constant-time, that is O(1), operation.
View objects for dictionaries are implemented starting here; the creation of new view objects uses dictview_new. All that this function does is create the new object that points back at the dictionary and increase reference counts (for garbage tracking).
In Python 2, dict.keys() returns a list object. To create this new list, Python must loop over the hashtable, putting the dictionary's keys into the list. This is implemented as the function dict_keys. The time complexity here is linear with the size of the dictionary, that is O(n), since every slot in the table must be visited.
N.B. dict.viewkeys() in Python 2 does the same as dict.keys() in Python 3.
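A minimal sketch (Python 3) of this behaviour; the dictionary contents are made up for illustration:
d = {'a': 1, 'b': 2}
keys = d.keys()      # O(1): a small view object pointing back at d, no copying
print(keys)          # dict_keys(['a', 'b'])
d['c'] = 3           # mutate the dictionary...
print(keys)          # ...and the existing view reflects it: dict_keys(['a', 'b', 'c'])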

Related

Storing Objects in a List in Python

If an object exists with variable .X
randomData.X
is the created object. If multiple objects are stored in a list and can be accessed via
randomList[3].X
Is there a way to pull all values of X from the list without looping through every object in the list as below:
for x in range(0, 10):
    randomList[x].X
You are probably looking for a list comprehension.
[obj.X for obj in randomList]
This produces a list with all properties X of every object in your list of objects.
Keep in mind that you can't get away from looping over the list. This is just syntactic sugar for the same loop as before.
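For a concrete illustration (the Point class and its X attribute are made up here, standing in for whatever randomList actually holds):
class Point:
    def __init__(self, x):
        self.X = x

randomList = [Point(i) for i in range(10)]
xs = [obj.X for obj in randomList]   # one pass over the list, the same loop in disguise
print(xs)                            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]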
Just in case you're looking for maximum efficiency on larger lists, an alternative to the list comprehension in this case is using map + operator.attrgetter. You can either loop over the map directly:
from operator import attrgetter
for X in map(attrgetter('X'), randomList):
    ...  # use each X here without ever building a list
which involves no temporary list (map lazily pulls items on demand in Python 3), or if you really need the list, just wrap in the list constructor or use list unpacking to run it out eagerly:
Xs = list(map(attrgetter('X'), randomList))
# or
Xs = [*map(attrgetter('X'), randomList)]
For small input lists, this will be slower than the list comprehension (it has a slightly higher setup overhead), but for medium to large inputs, it will be faster (the per-item overhead is slightly lower, as it involves no per-item bytecode execution).
To be clear, it's still going to have to loop over the list. There is no magic way to get the attributes of every item in a list without looping over it; you could go to extreme lengths to make views of the list that seamlessly read the attribute from the underlying list, but if you accessed every element of that view it would be equivalent to the loop in work required.
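Using the same hypothetical Point objects as in the sketch above, a complete version of the attrgetter approach might look like this:
from operator import attrgetter

get_x = attrgetter('X')              # roughly equivalent to lambda obj: obj.X, but implemented in C
Xs = list(map(get_x, randomList))    # still one pass over the list, just driven from C code
print(Xs)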

Python iter() time complexity?

I was looking up an efficient way to retrieve an (any) element from a set in Python and came across this method:
anyElement = next(iter(SET))
What exactly happens when you generate an iterator out of a container such as a set? Does it simply create a pointer to the location of the object in memory and move that pointer whenever next is called? Or does it convert the set to a list then create an iterator out of that?
My main concern is if it were the latter, it seems iter() would be an O(n) operation. At that point it would be better to just pop an item from the set, store the popped item in a variable, then re-insert the popped item back into the set.
Thanks for any information in advance!
sets are iterable, but don't have a .__next__() method, so iter() is calling the .__iter__() method of the set instance, returning an iterator which does have the __next__ method.
As this is a thin wrapper around an O(1) call, it will operate in O(1) time once created.
https://wiki.python.org/moin/TimeComplexity
See also Retrieve an arbitrary key from python3 dict in O(1) time for an extended answer on .__next__()!
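A small sketch of what this looks like in practice (the set contents are arbitrary):
s = {'spam', 'eggs', 'ham'}
it = iter(s)             # calls s.__iter__(); O(1), independent of len(s)
any_element = next(it)   # O(1): whichever element the iterator yields first
print(any_element)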

dict_key object does not support indexing-python 3

I am getting an error saying "dict_key object does not support indexing" at:
return len(G[G.keys()[0]])
I realised it used to work in Python 2.7.x but not in Python 3. How should I change this statement to make it work in Python 3?
In Python 2.x, G.keys() returns a list, but Python 3.x returns a dict_keys object instead. The solution is to wrap G.keys() with a call to list(), to convert it into the correct type:
return len(G[list(G.keys())[0]])
In Python 3, the objects returned by keys, values, and items are dictionary view objects, which don't support indexing.
Try, instead:
len(next(iter(G.values())))
This gets the dictionary view object for the dictionary's values, gets its iterator, grabs the first item from the iterator (the first value in the dictionary), and returns its length.
Unlike other methods that create a new list of the keys or values, it should take approximately the same amount of time no matter the size of the dictionary.
It works in both Python 2 and Python 3 (though to be efficient you'd need to use itervalues or viewvalues on Python 2).
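To make the difference concrete, here is a small sketch with a made-up dictionary G (adjacency-list style, purely for illustration):
G = {'a': ['b', 'c'], 'b': ['a'], 'c': ['a']}

len(G[list(G.keys())[0]])      # builds a full list of keys first: O(n) in the size of G
len(next(iter(G.values())))    # no intermediate list: roughly O(1)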

Why use dict.keys?

I recently wrote some code that looked something like this:
# dct is a dictionary
if "key" in dct.keys():
However, I later found that I could achieve the same results with:
if "key" in dct:
This discovery got me thinking and I began to run some tests to see if there could be a scenario when I must use the keys method of a dictionary. My conclusion however is no, there is not.
If I want the keys in a list, I can do:
keys_list = list(dct)
If I want to iterate over the keys, I can do:
for key in dct:
...
Lastly, if I want to test if a key is in dct, I can use in as I did above.
Summed up, my question is: am I missing something? Could there ever be a scenario where I must use the keys method?...or is it simply a leftover method from an earlier version of Python that should be ignored?
On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:
>>> for sharedkey in dct1.keys() & dct2.keys(): # intersection of two dictionaries
... print(dct1[sharedkey], dct2[sharedkey])
In Python 2.7, you'd use dct.viewkeys() for that.
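A short sketch of the set operations key views support (Python 3 syntax; the dictionaries are made up):
dct1 = {'a': 1, 'b': 2, 'c': 3}
dct2 = {'b': 20, 'c': 30, 'd': 40}

print(dct1.keys() & dct2.keys())   # intersection: {'b', 'c'}
print(dct1.keys() - dct2.keys())   # difference:   {'a'}
print(dct1.keys() | dct2.keys())   # union:        {'a', 'b', 'c', 'd'}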
In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.
You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
Source: PEP 234, PEP 3106
Python 2's relatively useless dict.keys method exists for historical reasons. Originally, dicts weren't iterable. In fact, there was no such thing as an iterator; iterating over sequences worked by calling __getitem__, the element access method, with increasing integer indices until an IndexError was raised. To iterate over the keys of a dict, you had to call the keys method to get an explicit list of keys and iterate over that.
When iterators went in, dicts became iterable, because it was more convenient, faster, and all around better to say
for key in d:
than
for key in d.keys()
This had the side-effect of making d.keys() utterly superfluous; list(d) and iter(d) now did everything d.keys() did in a cleaner, more general way. They couldn't get rid of keys, though, since so much code already called it.
(At this time, dicts also got a __contains__ method, so you could say key in d instead of d.has_key(key). This was shorter and nicely symmetrical with for key in d; the symmetry is also why iterating over a dict gives the keys instead of (key, value) pairs.)
In Python 3, taking inspiration from the Java Collections Framework, the keys, values, and items methods of dicts were changed. Instead of returning lists, they would return views of the original dict. The key and item views would support set-like operations, and all views would be wrappers around the underlying dict, reflecting any changes to the dict. This made keys useful again.
Assuming you're not using Python 3, list(dct) is equivalent to dct.keys(). Which one you use is a matter of personal preference. I personally think dct.keys() is slightly clearer, but to each their own.
In any case, there isn't a scenario where you "need" to use dct.keys() per se.
In Python 3, dct.keys() returns a "dictionary view object", so if you need to get hold of an unmaterialized view of the keys (which could be useful for huge dictionaries) outside of a for loop context, you'd need to use dct.keys().
key in dct
is much faster than checking
key in dct.keys()
on Python 2, where keys() builds a whole list first. (On Python 3 both are fast, but the plain membership test still avoids creating the view object.)

What are dictionary view objects?

In Python 2.7, the dictionary view methods became available.
Now, I know the pro and cons of the following:
dict.items() (and values, keys): returns a list, so you can actually store the result, and
dict.iteritems() (and the like): returns an iterator, so you can iterate over each pair one by one.
What are dict.viewitems() (and the like) for? What are their benefits? How does it work? What is a view after all?
I read that the view is always reflecting the changes from the dictionary. But how does it behave from the perf and memory point of view? What are the pro and cons?
Dictionary views are essentially what their name says: views are simply like a window on the keys and values (or items) of a dictionary. Here is an excerpt from the official documentation for Python 3:
>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.keys()
>>> values = dishes.values()
>>> # view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> keys # No eggs anymore!
dict_keys(['sausage', 'bacon', 'spam'])
>>> values # No eggs value (2) anymore!
dict_values([1, 1, 500])
(The Python 2 equivalent uses dishes.viewkeys() and dishes.viewvalues().)
This example shows the dynamic character of views: the keys view is not a copy of the keys at a given point in time, but rather a simple window that shows you the keys; if they are changed, then what you see through the window does change as well. This feature can be useful in some circumstances (for instance, one can work with a view on the keys in multiple parts of a program instead of recalculating the current list of keys each time they are needed). Note, however, that if the dictionary keys are modified while iterating over the view, how the iterator should behave is not well defined, which can lead to errors.
One advantage is that looking at, say, the keys uses only a small and fixed amount of memory and requires a small and fixed amount of processor time, as there is no creation of a list of keys (Python 2, on the other hand, often unnecessarily creates a new list, as quoted by Rajendran T, which takes memory and time in an amount proportional to the length of the list). To continue the window analogy, if you want to see a landscape behind a wall, you simply make an opening in it (you build a window); copying the keys into a list would correspond to instead painting a copy of the landscape on your wall. The copy takes time, space, and does not update itself.
To summarize, views are simply… views (windows) on your dictionary, which show the contents of the dictionary even after it changes. They offer features that differ from those of lists: a list of keys contains a copy of the dictionary keys at a given point in time, while a view is dynamic and is much faster to obtain, as it does not have to copy any data (keys or values) in order to be created.
Just from reading the docs I get this impression:
Views are "pseudo-set-like", in that they don't support indexing, so what you can do with them is test for membership and iterate over them (because keys are hashable and unique, the keys and items views are more "set-like" in that they don't contain duplicates).
You can store them and use them multiple times, like the list versions.
Because they reflect the underlying dictionary, any change in the dictionary will change the view, and will almost certainly change the order of iteration. So unlike the list versions, they're not "stable".
Because they reflect the underlying dictionary, they're almost certainly small proxy objects; copying the keys/values/items would require that they watch the original dictionary somehow and copy it multiple times when changes happen, which would be an absurd implementation. So I would expect very little memory overhead, but access to be a little slower than directly to the dictionary.
So I guess the key usecase is if you're keeping a dictionary around and repeatedly iterating over its keys/items/values with modifications in between. You could just use a view instead, turning for k, v in mydict.iteritems(): into for k, v in myview:. But if you're just iterating over the dictionary once, I think the iter- versions are still preferable.
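A minimal sketch of that use case, written with Python 3 spelling (where items() is already a view; the dictionary contents are made up):
inventory = {'apples': 3, 'pears': 1}
view = inventory.items()

for name, count in view:       # first pass
    print(name, count)

inventory['plums'] = 5         # modify the dict between iterations...

for name, count in view:       # ...the same view object now sees 'plums' as well
    print(name, count)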
As you mentioned dict.items() returns a copy of the dictionary’s list of (key, value) pairs which is wasteful and dict.iteritems() returns an iterator over the dictionary’s (key, value) pairs.
Now take the following example to see the difference between an iterator over a dict and a view of a dict:
>>> d = {"x":5, "y":3}
>>> it = d.iteritems()
>>> del d["x"]
>>> for i in it: print i
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
Whereas a view simply shows you what's in the dict. It doesn't care if it changed:
>>> d = {"x":5, "y":3}
>>> v = d.viewitems()
>>> v
dict_items([('y', 3), ('x', 5)])
>>> del d["x"]
>>> v
dict_items([('y', 3)])
A view is simply what the dictionary looks like now. After deleting an entry, .items() would have been out of date and .iteritems() would have thrown an error.
The view methods return a view object, not a copy of the list that .keys(), .items() and .values() return in Python 2, so a view is more lightweight, but it reflects the current contents of the dictionary.
From Python 3.0 - dict methods return views - why?
The main reason is that for many use cases returning a completely
detached list is unnecessary and wasteful. It would require copying
the entire content (which may or may not be a lot).
If you simply want to iterate over the keys then creating a new list
is not necessary. And if you indeed need it as a separate list (as a
copy) then you can easily create that list from the view.
Views let you access the underlying data structure without copying it. Besides being dynamic (as opposed to creating a list), one of their most useful uses is in membership tests. Say you want to check if a value is in the dict or not (whether as a key or a value).
Option one is to create a list of the keys using dict.keys(); this works, but it obviously consumes more memory. What if the dict is very large? That would be wasteful.
With views you can iterate over the actual data structure, without an intermediate list.
Let's use an example. I have a dict with 1000 keys made of random strings and digits, and k is the key I want to look for:
large_d = { .. 'NBBDC': '0RMLH', 'E01AS': 'UAZIQ', 'G0SSL': '6117Y', 'LYBZ7': 'VC8JQ' .. }
>>> len(large_d)
1000
# this is one option; it creates the keys() list every time, and is here just for the example
>>> timeit.timeit('k in large_d.keys()', setup='from __main__ import large_d, k', number=1000000)
13.748743600954867
# now let's create the list first; only then check for containment
>>> list_keys = large_d.keys()
>>> timeit.timeit('k in list_keys', setup='from __main__ import large_d, k, list_keys', number=1000000)
8.874809793833492
# this saves us ~5 seconds. Great!
# let's try the views now
>>> timeit.timeit('k in large_d.viewkeys()', setup='from __main__ import large_d, k', number=1000000)
0.08828549011070663
# How about saving another 8.5 seconds?
As you can see, testing membership against the view object gives a huge boost to performance, while reducing memory overhead at the same time. You should use views when you need to perform set-like operations.
Note: I'm running on Python 2.7
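For comparison, on Python 3 dict.keys() already returns a view, so the same membership test never builds a list. A hedged sketch of an equivalent timing (the dictionary and key are placeholders, and the absolute numbers will vary by machine):
import timeit

setup = "large_d = {str(i): i for i in range(1000)}; k = '999'"

print(timeit.timeit('k in large_d', setup=setup, number=1000000))         # plain dict lookup
print(timeit.timeit('k in large_d.keys()', setup=setup, number=1000000))  # keys view: also a hash lookup, tiny extra call overhead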
