Why use dict.keys? - python

I recently wrote some code that looked something like this:
# dct is a dictionary
if "key" in dct.keys():
However, I later found that I could achieve the same results with:
if "key" in dct:
This discovery got me thinking and I began to run some tests to see if there could be a scenario when I must use the keys method of a dictionary. My conclusion however is no, there is not.
If I want the keys in a list, I can do:
keys_list = list(dct)
If I want to iterate over the keys, I can do:
for key in dct:
...
Lastly, if I want to test if a key is in dct, I can use in as I did above.
Summed up, my question is: am I missing something? Could there ever be a scenario where I must use the keys method, or is it simply a leftover method from an earlier version of Python that should be ignored?

On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:
>>> for sharedkey in dct1.keys() & dct2.keys():  # intersection of the two dictionaries' keys
...     print(dct1[sharedkey], dct2[sharedkey])
In Python 2.7, you'd use dct.viewkeys() for that.
In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.
You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
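A minimal Python 3 sketch of the set-like operations that keys views support (items views support them too when the values are hashable). The dictionaries dct1 and dct2 here are made-up sample data.

```python
# Hypothetical sample data for illustration.
dct1 = {"a": 1, "b": 2, "c": 3}
dct2 = {"b": 2, "c": 30, "d": 4}

# Keys views behave like sets, so the usual set operators work on them
# and return ordinary set objects.
shared = dct1.keys() & dct2.keys()       # keys present in both dicts
only_in_1 = dct1.keys() - dct2.keys()    # keys present only in dct1
either = dct1.keys() | dct2.keys()       # keys present in either dict

# Items views are set-like too (values must be hashable).
same_pairs = dct1.items() & dct2.items() # identical (key, value) pairs

print(sorted(shared))      # ['b', 'c']
print(sorted(only_in_1))   # ['a']
print(sorted(either))      # ['a', 'b', 'c', 'd']
print(same_pairs)          # {('b', 2)}

# For plain iteration and membership tests, the dict itself is enough.
print("a" in dct1, list(dct1) == list(dct1.keys()))  # True True
```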

Source: PEP 234, PEP 3106
Python 2's relatively useless dict.keys method exists for historical reasons. Originally, dicts weren't iterable. In fact, there was no such thing as an iterator; iterating over sequences worked by calling __getitem__, the element access method, with increasing integer indices until an IndexError was raised. To iterate over the keys of a dict, you had to call the keys method to get an explicit list of keys and iterate over that.
When iterators went in, dicts became iterable, because it was more convenient, faster, and all around better to say
for key in d:
than
for key in d.keys():
This had the side-effect of making d.keys() utterly superfluous; list(d) and iter(d) now did everything d.keys() did in a cleaner, more general way. They couldn't get rid of keys, though, since so much code already called it.
(At this time, dicts also got a __contains__ method, so you could say key in d instead of d.has_key(key). This was shorter and nicely symmetrical with for key in d; the symmetry is also why iterating over a dict gives the keys instead of (key, value) pairs.)
In Python 3, taking inspiration from the Java Collections Framework, the keys, values, and items methods of dicts were changed. Instead of returning lists, they would return views of the original dict. The key and item views would support set-like operations, and all views would be wrappers around the underlying dict, reflecting any changes to the dict. This made keys useful again.

Assuming you're not using Python 3, list(dct) is equivalent to dct.keys(). Which one you use is a matter of personal preference. I personally think dct.keys() is slightly clearer, but to each their own.
In any case, there isn't a scenario where you "need" to use dct.keys() per se.
In Python 3, dct.keys() returns a "dictionary view object", so if you need to get a hold of an unmaterialized view to the keys (which could be useful for huge dictionaries) outside of a for loop context, you'd need to use dct.keys().
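A short Python 3 sketch of that difference, with made-up dictionary contents: the view keeps reflecting the dict, while list(dct) is a snapshot copied once.

```python
dct = {"x": 1, "y": 2}

keys_view = dct.keys()   # no copy: a live window onto dct
keys_list = list(dct)    # a separate list, copied right now

dct["z"] = 3             # mutate the dictionary afterwards

print(keys_view)         # dict_keys(['x', 'y', 'z'])
print(keys_list)         # ['x', 'y']
print(len(keys_view), "z" in keys_view)  # 3 True
```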

key in dict
is much faster than checking
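key in dict.keys()
(at least in Python 2, where dict.keys() first builds a new list that is then searched linearly; in Python 3 the keys view also does a hash lookup, so the gap is small)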

Related

Why is there no popitem for set in Python?

A set is unordered and unindexed, so there is no concept of the last-entered element, and therefore no popitem. Is this the reasoning for set having no popitem?
If this is valid reasoning, then why does dictionary have popitem? A dictionary is also unordered, like a set.
The corresponding method for sets is pop():
pop()
Remove and return an arbitrary element from the set. Raises KeyError if the set is empty.
Prior to Python 3.7 dicts were unordered and popitem() returned an arbitrary key-value pair. It's only since 3.7 that dicts have been ordered and popitem() defined to return items in LIFO order.
It's called popitem() for dicts because there's already a pop(key) method that removes the item with the specified key.
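To make the three methods concrete, a small sketch assuming Python 3.7 or later (the sample values are made up): set.pop() removes an arbitrary element, dict.pop(key) removes a named key, and dict.popitem() removes the most recently inserted pair.

```python
s = {10, 20, 30}
elem = s.pop()                        # arbitrary element; which one is unspecified
print(elem in {10, 20, 30}, len(s))   # True 2

d = {"a": 1, "b": 2, "c": 3}
print(d.pop("a"))                     # 1: removes the item with the given key
print(d.popitem())                    # ('c', 3): LIFO order since Python 3.7
print(d)                              # {'b': 2}
```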
Python's set has pop(), which returns an arbitrary item. You don't know which one it returns, though.
There is no method to remove the last-entered element from a set because set is unordered in Python.
If you want an easy way to emulate an ordered set using the standard library you can use collections.OrderedDict instead:
from collections import OrderedDict
s = OrderedDict.fromkeys((2,5,2,6,7))
s.popitem()
From the example above, s.keys() would become:
odict_keys([2, 5, 6])
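Since plain dicts keep insertion order from Python 3.7 on, a plain dict.fromkeys can play the same ordered-set role, assuming 3.7+. OrderedDict additionally lets popitem(last=False) remove from the front. A minimal sketch:

```python
from collections import OrderedDict

# Plain dict as an insertion-ordered "set" (Python 3.7+).
s = dict.fromkeys((2, 5, 2, 6, 7))   # duplicates collapse: keys 2, 5, 6, 7
s.popitem()                          # removes the last-inserted key, 7
print(list(s))                       # [2, 5, 6]

# OrderedDict can also pop from the front (FIFO).
od = OrderedDict.fromkeys((2, 5, 2, 6, 7))
print(od.popitem(last=False))        # (2, None)
print(list(od))                      # [5, 6, 7]
```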

conditionals with dicts Python

I was wondering what the correct way is to check a key:value pair of a dict. Let's say I have this dict
dict_ = {
'key1':'val1',
'key2':'val2'
}
I can check a condition like this
if dict_['key1'] == 'val1':
but I feel like there is a more elegant way that takes advantage of the dict data structure.
What you're doing already does take advantage of the data structure, which is why it's "the one obvious way" to do what you want to do. (You can find examples like this all over the tutorial, the reference docs, and the stdlib implementation.)
However, I can see what you're thinking: the dict is in some sense a container of key-value pairs (even if it's only a collections.Container of keys…), so… shouldn't there be some way to just check whether a key-value pair exists?
Up to Python 2.6, there really isn't.* But in 3.0, the items() method returns a special set-like view of the key-value pairs. And 2.7 backported that functionality, under the name viewitems. So:
('key1', 'val1') in d.viewitems()
But I don't think that's really clearer or cleaner; "items" feels like a lower-level way to think of dictionaries than "mappings", which is what both your original code and smci's answer rely on.
It's also less concise, it doesn't work in 2.6 or earlier, many dict-like mapping objects don't support it,** and it's slightly slower on 2.7 to boot, but these are probably less important, and not what you asked about.
* Well, there is, but only by iterating over all of the items with iteritems, or using items to effectively do the same exhaustive search behind your back, neither of which is what you want.
** In fact, in 2.7, it's not actually possible to support it with a pure-Python class…
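For comparison, in Python 3 the viewitems check is spelled with plain items(), which already returns a set-like view. A minimal sketch using the dict from the question:

```python
dict_ = {
    'key1': 'val1',
    'key2': 'val2',
}

# Plain lookup: the idiomatic way (raises KeyError if 'key1' is absent).
print(dict_['key1'] == 'val1')              # True

# Python 3 items() view: tests the (key, value) pair as a whole.
print(('key1', 'val1') in dict_.items())    # True
print(('key1', 'other') in dict_.items())   # False
print(('key3', 'val1') in dict_.items())    # False, and no KeyError
```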
If you want to avoid throwing KeyError if dict doesn't even contain 'key1':
if dict_.get('key1')=='val1':
(However, throwing an exception for missing key is perfectly fine Python idiom.)
Otherwise, @Cyber is correct that it's already fine! (What exactly is the problem?)
In Python 2 there is a has_key method (it was removed in Python 3, where 'key1' in dict_ does the same job):
dict_.has_key('key1')
This returns a boolean, True or False.
Alternatively, you can have the get method return a default value when the key is not present.
dict_.get('key3','Default Value')
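Because has_key exists only in Python 2, here is a minimal Python 3 sketch of the equivalents, reusing the dict from the question:

```python
dict_ = {'key1': 'val1', 'key2': 'val2'}

print('key1' in dict_)                       # True: replaces dict_.has_key('key1')
print(dict_.get('key3', 'Default Value'))    # Default Value
print(dict_.get('key3') == 'val1')           # False: get() returns None, no KeyError
```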

Perl-like autovivification with a default value in Python: how to return a default value from a non-existent arbitrary nesting?

Suppose I want Perl-like autovivification in Python, i.e.:
>>> d = Autovivifier()
>>> d['nested']['key']['value'] = 10
>>> d
{'nested': {'key': {'value': 10}}}
There are a couple of dominant ways to do that:
Use a recursive default dict
Use a __missing__ hook to return the nested structure
OK -- easy.
Now suppose I want to return a default value from a dict with a missing key. Once again, there are a few ways to do that:
For a non-nested path, you can use a __missing__ hook
try/except block wrapping the access to potentially missing key path
Use {}.get(key, default) (does not easily work with a nested dict); i.e., there is no version of autoviv.get(['nested']['key']['no key of this value'], default)
The two goals seem to be in irreconcilable conflict (based on my trying to work this out for the last couple of hours.)
Here is the question:
Suppose I want to have an Autovivifying dict that 1) creates the nested structure for d['arbitrary']['nested']['path']; AND 2) returns a default value from a non-existing arbitrary nesting without wrapping that in try/except?
Here are the issues:
The call d['nested']['key']['no key of this value'] is equivalent to (d['nested'])['key']['no key of this value']. Overriding __getitem__ does not work without returning an object that ALSO overrides __getitem__.
Both methods for creating an Autovivifier will create a dict entry if you test that path for existence, i.e., I do not want if d['p1']['sp2']['etc.']: to create that whole path when the path is only being tested.
How can I provide a dict in Python that will:
Create an access path of the type d['p1']['p2'][etc]=val (autovivification);
NOT create that same path if you test for existence;
Return a default value (like {}.get(key, default)) without wrapping in try/except
I do not need the FULL set of dict operations. Really only d['nested']['key']['value'] = val, and d['nested']['key']['no key of this value'] being equal to a default value. I would prefer that testing d['nested']['key']['no key of this value'] does not create it, but would accept that.
To create a recursive tree of dictionaries, use defaultdict with a trick:
from collections import defaultdict
tree = lambda: defaultdict(tree)
Then you can create your x with x = tree().
above from @BrenBarn -- defaultdict of defaultdict, nested
Don't do this. It could be solved much more easily by just writing a class that has the operations you want, and even in Perl it's not a universally praised feature.
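A short sketch of how that tree recipe behaves, including the drawback the question raises: merely reading a missing path inserts empty dicts along it. The keys are taken from the question.

```python
from collections import defaultdict

# Each missing key produces another tree (a defaultdict of trees).
tree = lambda: defaultdict(tree)

x = tree()
x['nested']['key']['value'] = 10     # autovivification: the path is created

# Reading a missing path also creates it, which the question wants to avoid.
if x['p1']['sp2']['etc.']:           # empty defaultdict, so this is falsy
    pass
print('p1' in x)                     # True: the probe inserted 'p1'
print(x['nested']['key']['value'])   # 10
```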
But, well, it is possible, with a custom autoviv class. You'd need a __getitem__ that returns an empty autoviv dict but doesn't store it. The new autoviv dict would remember the autoviv dict and key that created it, then insert itself into its parent only when a "real" value is stored in it.
Since an empty dict tests as falsey, you could then test for existence Perl-style, without ever actually creating the intermediate dicts.
But I'm not going to write the code out, because I'm pretty sure this is a terrible idea.
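The answer deliberately leaves this unwritten; purely to illustrate the lazy-insertion idea it describes, here is one minimal Python 3 sketch. The class name AutoViv is hypothetical, the "default value" for a missing path is simply an empty (falsy) AutoViv, and only plain item assignment triggers attachment (dict.update and dict.setdefault bypass the overridden __setitem__).

```python
class AutoViv(dict):
    """Hypothetical sketch of a lazily attaching autovivifying dict.

    A missing key returns a new, empty AutoViv that remembers its parent
    and key but is NOT stored; it attaches itself only when a value is
    assigned into it.
    """

    def __init__(self, *args, parent=None, key=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._parent = parent   # dict that handed out this child, if detached
        self._key = key         # key under which to attach to the parent

    def __missing__(self, key):
        # Called by dict.__getitem__ for absent keys; insert nothing.
        return AutoViv(parent=self, key=key)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        # On the first real store, attach to the parent; the parent's own
        # __setitem__ then attaches it in turn, up to the root.
        if self._parent is not None:
            parent, self._parent = self._parent, None
            parent[self._key] = self


d = AutoViv()
d['nested']['key']['value'] = 10
print(d)                    # {'nested': {'key': {'value': 10}}}

# Probing a missing path returns an empty (falsy) AutoViv and stores nothing.
if d['p1']['sp2']['etc.']:
    print('found')
print('p1' in d)            # False
```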
While it does not precisely match the dictionary protocol in Python, you could achieve reasonable results by implementing your own auto-vivification dictionary that uses variable getitem arguments. Something like (2.x):
class ExampleVivifier(object):
    """ Small example class to show how to use varargs in __getitem__. """
    def __getitem__(self, *args):
        print(args)
Example usage would be:
>>> v = ExampleVivifier()
>>> v["nested", "dictionary", "path"]
(('nested', 'dictionary', 'path'),)
You can fill in the blanks to see how you can achieve your desired behaviour here.
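One possible way to fill in those blanks, sketched in Python 3 under stated assumptions: the hypothetical PathDict class below takes the whole path as a tuple key (as in v["nested", "key", "value"]), creates intermediate dicts only on assignment, and returns a configurable default for missing paths without creating anything.

```python
class PathDict(object):
    """Hypothetical sketch: nested dicts addressed by a tuple of keys."""

    def __init__(self, default=None):
        self._root = {}
        self._default = default

    @staticmethod
    def _as_path(path):
        # v["a"] passes a plain key; v["a", "b"] passes a tuple.
        return path if isinstance(path, tuple) else (path,)

    def __setitem__(self, path, value):
        # Autovivify: create intermediate dicts only when storing a value.
        *parents, last = self._as_path(path)
        node = self._root
        for key in parents:
            node = node.setdefault(key, {})
        node[last] = value

    def __getitem__(self, path):
        # Walk the path without creating anything; fall back to the default.
        node = self._root
        for key in self._as_path(path):
            if not isinstance(node, dict) or key not in node:
                return self._default
            node = node[key]
        return node


v = PathDict(default=0)
v["nested", "key", "value"] = 10
print(v["nested", "key", "value"])                 # 10
print(v["nested", "key", "no key of this value"])  # 0
print(v._root)                                     # {'nested': {'key': {'value': 10}}}
```

The trade-off is that this is no longer a dict: the tuple-key syntax replaces chained indexing, which is what lets a lookup see the whole path at once.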

What's the pythonic way to distinguish between a dict and a list of dicts?

So, I'm trying to be a good Python programmer and duck-type wherever I can, but I've got a bit of a problem where my input is either a dict or a list of dicts.
I can't distinguish between them by checking whether they are iterable, because they both are.
My next thought was simply to call list(x) and hope that returned my list intact and gave me my dict as the only item in a list; alas, it just gives me the list of the dict's keys.
I'm now officially out of ideas (short of calling isinstance which is, as we all know, not very pythonic). I just want to end up with a list of dicts, even if my input is a single solitary dict.
Really, there is no obvious pythonic way to do this, because it's an unreasonable input format, and the obvious pythonic way to do it is to fix the input…
But if you can't do that, then yes, you need to write an adapter (as close to the input edge as possible). The best way to do that depends on the actual data. If it really is either a dict, or a list of dicts, and nothing else is possible (e.g., you're calling json.loads on the results from some badly-written service that returns an object or an array of objects), then there's nothing wrong with isinstance.
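A minimal sketch of such an adapter, assuming the input comes from json.loads; the function name as_list_of_dicts and the JSON strings are hypothetical.

```python
import json

def as_list_of_dicts(data):
    # json.loads yields a dict for a JSON object and a list for a JSON array,
    # so a plain isinstance check is enough at this input edge.
    if isinstance(data, dict):
        return [data]
    return data

print(as_list_of_dicts(json.loads('{"a": 1}')))              # [{'a': 1}]
print(as_list_of_dicts(json.loads('[{"a": 1}, {"b": 2}]')))  # [{'a': 1}, {'b': 2}]
```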
If you want to make it a bit more general, you can use the appropriate ABCs. For example:
import collections.abc

def as_list(dict_or_list):  # hypothetical function name
    if isinstance(dict_or_list, collections.abc.Mapping):
        return [dict_or_list]
    else:
        return dict_or_list
But unless you have some good reason to need this generality, you're just hiding the hacky workaround, when you're better off keeping it as visible as possible. If it's, e.g., coming out of json.loads from some remote server, handling a Mapping that isn't a dict is not useful, right?
(If you're using some third-party client library that just returns you "something dict-like" or "something list-like containing dict-like things", then yes, use ABCs. Or, if that library doesn't even support the proper ABCs, you can write code that tries a specific method like keys. But if that's an issue, you'll know the specific details you're working around, and can code and document appropriately.)
Accessing a dict using a non-int key will get you either an item, or a KeyError. It will get you a TypeError with a list. So you can use exception handling:
def list_dicts(dict_or_list):
    try:
        dict_or_list[None]
        return [dict_or_list]  # no error, we have a dict
    except TypeError:
        return dict_or_list  # wrong index type, we have a list
    except Exception:
        return [dict_or_list]  # probably KeyError but catch anything to be safe
This function will give you a list of dicts regardless of whether it got a list or a dict. (If it got a dict, it makes a list of one item out of it.) This should be fairly safe type-wise, too; other dict-like or list-like objects would probably be considered broken if they didn't have similar behavior.
You could check for the presence of an items attribute.
dict has it and list does not.
>>> hasattr({}, 'items')
True
>>> hasattr([], 'items')
False
Here's a complete list of the differences in attribute names between dict and list (in Python 3.3.2).
Attributes on list but not dict:
>>> print('\n'.join(sorted(list(set(dir([])) - set(dir({}))))))
__add__
__iadd__
__imul__
__mul__
__reversed__
__rmul__
append
count
extend
index
insert
remove
reverse
sort
Attributes on dict but not list:
>>> print('\n'.join(sorted(list(set(dir({})) - set(dir([]))))))
fromkeys
get
items
keys
popitem
setdefault
update
values
Maybe I'm being naive, but how about something like
try:
    data.keys()
    print("Probs just a dictionary")
except AttributeError:
    print("List o' dictionaries!")
Can you just go ahead and do whatever you were going to do anyways with the data, and decide whether it's a dict or list when something goes awry?
Don't use the types module (note that types.DictType and types.ListType exist only in Python 2):
import types
d = {}
print type(d) is types.DictType
l = [{},{}]
print type(l) is types.ListType and len(l) and type(l[0]) is types.DictType

What are dictionary view objects?

In Python 2.7, we got the dictionary view methods.
Now, I know the pros and cons of the following:
dict.items() (and values, keys): returns a list, so you can actually store the result, and
dict.iteritems() (and the like): returns an iterator, so you can iterate over each value generated one by one.
What are dict.viewitems() (and the like) for? What are their benefits? How do they work? What is a view, after all?
I read that the view always reflects the changes from the dictionary. But how does it behave from the performance and memory point of view? What are the pros and cons?
Dictionary views are essentially what their name says: views are simply like a window on the keys and values (or items) of a dictionary. Here is an excerpt from the official documentation for Python 3:
>>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500}
>>> keys = dishes.keys()
>>> values = dishes.values()
>>> # view objects are dynamic and reflect dict changes
>>> del dishes['eggs']
>>> keys # No eggs anymore!
dict_keys(['sausage', 'bacon', 'spam'])
>>> values # No eggs value (2) anymore!
dict_values([1, 1, 500])
(The Python 2 equivalent uses dishes.viewkeys() and dishes.viewvalues().)
This example shows the dynamic character of views: the keys view is not a copy of the keys at a given point in time, but rather a simple window that shows you the keys; if they are changed, then what you see through the window does change as well. This feature can be useful in some circumstances (for instance, one can work with a view on the keys in multiple parts of a program instead of recalculating the current list of keys each time they are needed)—note that if the dictionary keys are modified while iterating over the view, how the iterator should behave is not well defined, which can lead to errors.
One advantage is that looking at, say, the keys uses only a small and fixed amount of memory and requires a small and fixed amount of processor time, as there is no creation of a list of keys (Python 2, on the other hand, often unnecessarily creates a new list, as quoted by Rajendran T, which takes memory and time in an amount proportional to the length of the list). To continue the window analogy, if you want to see a landscape behind a wall, you simply make an opening in it (you build a window); copying the keys into a list would correspond to instead painting a copy of the landscape on your wall—the copy takes time, space, and does not update itself.
To summarize, views are simply… views (windows) on your dictionary, which show the contents of the dictionary even after it changes. They offer features that differ from those of lists: a list of keys contains a copy of the dictionary keys at a given point in time, while a view is dynamic and is much faster to obtain, as it does not have to copy any data (keys or values) in order to be created.
Just from reading the docs I get this impression:
Views are "pseudo-set-like", in that they don't support indexing, so what you can do with them is test for membership and iterate over them (because keys are hashable and unique, the keys and items views are more "set-like" in that they don't contain duplicates).
You can store them and use them multiple times, like the list versions.
Because they reflect the underlying dictionary, any change in the dictionary will change the view, and will almost certainly change the order of iteration. So unlike the list versions, they're not "stable".
Because they reflect the underlying dictionary, they're almost certainly small proxy objects; copying the keys/values/items would require that they watch the original dictionary somehow and copy it multiple times when changes happen, which would be an absurd implementation. So I would expect very little memory overhead, but access to be a little slower than directly to the dictionary.
So I guess the key use case is if you're keeping a dictionary around and repeatedly iterating over its keys/items/values with modifications in between. You could just use a view instead, turning for k, v in mydict.iteritems(): into for k, v in myview:. But if you're just iterating over the dictionary once, I think the iter- versions are still preferable.
As you mentioned dict.items() returns a copy of the dictionary’s list of (key, value) pairs which is wasteful and dict.iteritems() returns an iterator over the dictionary’s (key, value) pairs.
Now take the following example to see the difference between an iterator over a dict and a view of a dict
>>> d = {"x":5, "y":3}
>>> iter = d.iteritems()
>>> del d["x"]
>>> for i in iter: print i
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: dictionary changed size during iteration
Whereas a view simply shows you what's in the dict. It doesn't care if it changed:
>>> d = {"x":5, "y":3}
>>> v = d.viewitems()
>>> v
dict_items([('y', 3), ('x', 5)])
>>> del d["x"]
>>> v
dict_items([('y', 3)])
A view simply shows what the dictionary looks like now. After deleting an entry, a list from .items() would have been out of date and .iteritems() would have thrown an error.
The view methods return a view object rather than a list (unlike Python 2's .keys(), .items() and .values(), which return a new list copy), so a view is more lightweight and always reflects the current contents of the dictionary.
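The same comparison in Python 3, as a minimal sketch: iteritems and viewitems no longer exist, so iter(d.items()) plays the role of the iterator and d.items() is the view.

```python
d = {"x": 5, "y": 3}

it = iter(d.items())     # an iterator over the items
v = d.items()            # a view of the items

del d["x"]               # change the dict's size

error = None
try:
    for item in it:
        print(item)
except RuntimeError as exc:
    error = str(exc)

print("iterator:", error)  # iterator: dictionary changed size during iteration
print("view:", v)          # view: dict_items([('y', 3)])
```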
From Python 3.0 - dict methods return views - why?
The main reason is that for many use cases returning a completely
detached list is unnecessary and wasteful. It would require copying
the entire content (which may or may not be a lot).
If you simply want to iterate over the keys then creating a new list
is not necessary. And if you indeed need it as a separate list (as a
copy) then you can easily create that list from the view.
Views let you access the underlying data structure without copying it. Besides being dynamic, as opposed to a freshly created list, one of their most useful uses is in membership tests. Say you want to check whether something is in the dict (either as a key or as a value).
Option one is to create a list of the keys using dict.keys() (in Python 2); this works but obviously consumes more memory, which is wasteful if the dict is very large.
With views you can iterate the actual data-structure, without intermediate list.
Let's use examples. I have a dict with 1000 keys of random strings and digits, and k is the key I want to look for
large_d = { .. 'NBBDC': '0RMLH', 'E01AS': 'UAZIQ', 'G0SSL': '6117Y', 'LYBZ7': 'VC8JQ' .. }
>>> len(large_d)
1000
# this is one option; It creates the keys() list every time, it's here just for the example
>>> timeit.timeit('k in large_d.keys()', setup='from __main__ import large_d, k', number=1000000)
13.748743600954867
# now let's create the list first; only then check for containment
>>> list_keys = large_d.keys()
>>> timeit.timeit('k in list_keys', setup='from __main__ import large_d, k, list_keys', number=1000000)
8.874809793833492
# this saves us ~5 seconds. Great!
# let's try the views now
>>> timeit.timeit('k in large_d.viewkeys()', setup='from __main__ import large_d, k', number=1000000)
0.08828549011070663
# How about saving another 8.5 seconds?
As you can see, testing membership on a view object gives a huge performance boost (it is a hash lookup rather than a linear scan of a list) and avoids the memory overhead of building a list. You should use them when you need to perform set-like operations.
Note: I'm running on Python 2.7
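A rough Python 3 version of the same comparison is sketched below. In Python 3, keys() already returns a view, so the list case has to be built explicitly with list(). The random keys are made-up stand-ins for the elided data, and no timings are shown because they depend on the machine.

```python
import random
import string
import timeit

CHARS = string.ascii_uppercase + string.digits

def rand_str(length=5):
    # Random string of uppercase letters and digits (stand-in data).
    return "".join(random.choice(CHARS) for _ in range(length))

large_d = {rand_str(): rand_str() for _ in range(1000)}
k = list(large_d)[-1]          # last key: worst case for a list scan

list_keys = list(large_d)      # Python 3 spelling of Python 2's keys() list
view_keys = large_d.keys()     # Python 3 keys() is already a view

n = 10_000
for label, stmt in [("list", "k in list_keys"),
                    ("view", "k in view_keys"),
                    ("dict", "k in large_d")]:
    seconds = timeit.timeit(stmt, globals=globals(), number=n)
    print(f"{label}: {seconds:.4f}s for {n} lookups")
```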
