Conditionals with dicts in Python

I was wondering what the correct way is to check a key:value pair of a dict. Let's say I have this dict:
dict_ = {
    'key1': 'val1',
    'key2': 'val2',
}
I can check a condition like this:
if dict_['key1'] == 'val1':
but I feel like there is a more elegant way that takes advantage of the dict data structure.

What you're doing already does take advantage of the data structure, which is why it's "the one obvious way" to do what you want to do. (You can find examples like this all over the tutorial, the reference docs, and the stdlib implementation.)
However, I can see what you're thinking: the dict is in some sense a container of key-value pairs (even if it's only a collections.Container of keys…), so… shouldn't there be some way to just check whether a key-value pair exists?
Up to Python 2.6, there really isn't.* But in 3.0, the items() method returns a special set-like view of the key-value pairs. And 2.7 backported that functionality, under the name viewitems. So:
('key1', 'val1') in d.viewitems()
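For example, on Python 3, where items() itself returns the view (a quick check using the dict from the question):
>>> d = {'key1': 'val1', 'key2': 'val2'}
>>> ('key1', 'val1') in d.items()
True
>>> ('key1', 'val2') in d.items()
False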
But I don't think that's really clearer or cleaner; "items" feels like a lower-level way to think of dictionaries than "mappings", which is what both your original code and smci's answer rely on.
It's also less concise, it doesn't work in 2.6 or earlier, many dict-like mapping objects don't support it,** and it's slightly slower on 2.7 to boot, but these are probably less important, and not what you asked about.
* Well, there is, but only by iterating over all of the items with iteritems, or using items to effectively do the same exhaustive search behind your back, neither of which is what you want.
** In fact, in 2.7, it's not actually possible to support it with a pure-Python class…

If you want to avoid a KeyError when the dict doesn't even contain 'key1':
if dict_.get('key1') == 'val1':
(However, throwing an exception for missing key is perfectly fine Python idiom.)
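To see the difference (a quick REPL sketch):
>>> dict_ = {'key1': 'val1', 'key2': 'val2'}
>>> dict_.get('key3') == 'val1'
False
>>> dict_['key3'] == 'val1'
Traceback (most recent call last):
  ...
KeyError: 'key3'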
Otherwise, @Cyber is correct that it's already fine! (What exactly is the problem?)

There is a has_key method (Python 2 only; it was removed in Python 3):
dict_.has_key('key1')
This returns True or False.
Alternatively, you can have the get method return a default value when the key is not present:
dict_.get('key3', 'Default Value')

Related

Use Python 2 Dict Comparison in Python 3

I'm trying to port some code from Python 2 to Python 3. It's ugly stuff, but I'm trying to get the Python 3 results to match the Python 2 results as closely as possible. I have code similar to this:
import json
# Read a list of json dictionaries by line from file.
objs = []
with open('data.txt') as fptr:
    for line in fptr:
        objs.append(json.loads(line))

# Give the dictionaries a reliable order.
objs = sorted(objs)

# Do something externally visible with each dictionary:
for obj in objs:
    do_stuff(obj)
When I port this code from Python 2 to Python 3, I get an error:
TypeError: unorderable types: dict() < dict()
So I changed the sorted line to this:
objs = sorted(objs, key=id)
But the ordering of the dictionaries still changed between Python 2 and Python 3.
Is there a way to replicate the Python 2 comparison logic in Python 3? Is it simply that id was used before and is not reliable between Python versions?
If you want the same behavior as earlier versions of Python 2.x in both 2.7 (which uses an arbitrary sort order instead) and 3.x (which refuses to sort dicts), Ned Batchelder's answer to a question about how sorting dicts works gets you part of the way there, but not all the way.
First, it gives you an old-style cmp function, not a new-style key function. Fortunately, both 2.7 and 3.x have functools.cmp_to_key to solve that. (You could of course instead rewrite the code as a key function, but that may make it harder to see any differences between the posted code and your code…)
More importantly, it not only doesn't do the same thing in 2.7 and 3.x, it doesn't even work in 2.7 and 3.x. To understand why, look at the code:
def smallest_diff_key(A, B):
    """return the smallest key adiff in A such that A[adiff] != B[adiff]"""
    diff_keys = [k for k in A if A.get(k) != B.get(k)]
    return min(diff_keys)

def dict_cmp(A, B):
    if len(A) != len(B):
        return cmp(len(A), len(B))
    adiff = smallest_diff_key(A, B)
    bdiff = smallest_diff_key(B, A)
    if adiff != bdiff:
        return cmp(adiff, bdiff)
    return cmp(A[adiff], B[bdiff])
Notice that it's calling cmp on the mismatched values.
If the dicts can contain other dicts, that's relying on the fact that cmp(d1, d2) is going to end up calling this function… which is obviously not true in newer Python.
On top of that, in 3.x cmp doesn't even exist anymore.
Also, this relies on the fact that any value can be compared with any other value—you might get back arbitrary results, but you won't get an exception. That was true (except in a few rare cases) in 2.x, but it's not true in 3.x. That may not be a problem for you if you don't want to compare dicts with non-comparable values (e.g., if it's OK for {1: 2} < {1: 'b'} to raise an exception), but otherwise, it is.
And of course if you don't want arbitrary results for dict comparison, do you really want arbitrary results for value comparisons?
The solution to all three problems is simple: you have to replace cmp, instead of calling it. So, something like this:
def mycmp(A, B):
    if isinstance(A, dict) and isinstance(B, dict):
        return dict_cmp(A, B)
    try:
        return (A > B) - (A < B)  # cmp(A, B), spelled so it also works on 3.x
    except TypeError:
        # what goes here depends on how far you want to go for consistency
        raise
If you want the exact rules for comparison of objects of different types that 2.7 used, they're documented, so you can implement them. But if you don't need that much detail, you can write something simpler here (or maybe even just not trap the TypeError, if the exception mentioned above is acceptable).
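For instance, here's a minimal sketch of how the pieces fit together, with dict_cmp rewritten to call mycmp instead of cmp (and assuming you're willing to let TypeError propagate for incomparable values):
import functools

def dict_cmp(A, B):
    if A == B:
        return 0  # equal dicts have no differing key, so bail out early
    if len(A) != len(B):
        return mycmp(len(A), len(B))
    adiff = smallest_diff_key(A, B)
    bdiff = smallest_diff_key(B, A)
    if adiff != bdiff:
        return mycmp(adiff, bdiff)
    return mycmp(A[adiff], B[bdiff])

objs = sorted(objs, key=functools.cmp_to_key(mycmp))
The A == B guard is added here so that equal dicts short-circuit to 0 instead of sending smallest_diff_key hunting for a differing key that doesn't exist.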
The logic in CPython 2.x is somewhat complicated, since the behavior is dictated by dict.__cmp__. A Python implementation can be found here.
However, if you really want a reliable ordering, you'll need to sort on a better key than id. You could use functools.cmp_to_key to transform the comparison function from the linked answer to a key function, but really, it's not a good ordering anyway as it is completely arbitrary.
Your best bet is to sort all of the dictionaries by a field's value (or multiple fields). operator.itemgetter can be used for this purpose quite nicely. Using this as a key function should give you consistent results for any somewhat modern implementation and version of Python.
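For example, assuming every dict has 'name' and 'id' keys (hypothetical field names, for illustration only):
import operator

# Sort by 'name', breaking ties with 'id'.
objs = sorted(objs, key=operator.itemgetter('name', 'id'))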
Is there a way to replicate the Python 2 comparison logic in Python 3? Is it simply that id was used before and is not reliable between Python versions?
id is never "reliable". The id that you get for any given object is a completely arbitrary value; it can be different from one run to the next, even on the same machine and Python version.
Python 2.x doesn't actually document that it sorts by id. All it says is:
Outcomes other than equality are resolved consistently, but are not otherwise defined.
But that just makes the point even better: the order is explicitly defined to be arbitrary (except for being consistent during any given run). Which is exactly the same guarantee you get by sorting with key=id in Python 3.x, whether or not it actually works the same way.*
So you're doing the same thing in 3.x. The fact that the two arbitrary orders are different just means that arbitrary is arbitrary.
If you want some kind of repeatable ordering for a dict based on what it contains, you just have to decide what that order is, and then you can build it. For example, you can sort the items in order then compare them (recursively passing down the same key function in case the items are or contain dicts).**
And, having designed and implemented some kind of sensible, non-arbitrary ordering, it will of course work the same way in 2.7 and 3.x.
* Note that it's not equivalent for identity comparisons, only for ordering comparisons. If you're only using it for sorted, this has the consequence that your sort will no longer be stable. But since it's in arbitrary order anyway, that hardly matters.
** Note that Python 2.x used to use a rule similar to this. From the footnote to the above: "Earlier versions of Python used lexicographic comparison of the sorted (key, value) lists, but this was very expensive for the common case of comparing for equality." So, that tells you that it's a reasonable rule—as long as it's actually the rule you want, and you don't mind the performance cost.
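As a sketch of what that rule might look like (the helper name canonical is made up here, and it assumes that keys, and values in matching positions, are mutually comparable):
def canonical(obj):
    # Recursively turn dicts into sorted lists of (key, value) pairs,
    # which are orderable; leave everything else alone.
    if isinstance(obj, dict):
        return sorted((k, canonical(v)) for k, v in obj.items())
    return obj

objs = sorted(objs, key=canonical)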
If you simply need an order that is consistent across multiple runs of Python on potentially different platforms, but don't care what that order actually is, then a simple solution is to dump the dicts to JSON before sorting them:
import json
def sort_as_json(dicts):
    return sorted(dicts, key=lambda d: json.dumps(d, sort_keys=True))
print(list(sort_as_json([{'foo': 'bar'}, {1: 2}])))
# Prints [{1: 2}, {'foo': 'bar'}]
Obviously this only works if your dicts are JSON-representable, but since you load them from JSON anyway that should be no problem. In your case, you could achieve the same result by simply sorting the file you're loading the objects from before deserializing the JSON.
You can compare .items() (on Python 3, where items() returns a set-like view):
>>> d1 = {"key1": "value1"}
>>> d2 = {"key1": "value1", "key2": "value2"}
>>> d1.items() <= d2.items()
True
But this is not recursive:
>>> d1 = {"key1": "value1", "key2": {"key11": "value11"}}
>>> d2 = {"key1": "value1", "key2": {"key11": "value11", "key12": "value12"}}
>>> d1.items() <= d2.items()
False
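If you need the recursive version, a small helper can do it (a sketch; dict_contains is a made-up name):
def dict_contains(small, big):
    # True if every key of small is in big, with matching values,
    # recursing into nested dicts.
    if isinstance(small, dict) and isinstance(big, dict):
        return all(k in big and dict_contains(v, big[k])
                   for k, v in small.items())
    return small == big
With the nested d1 and d2 above, dict_contains(d1, d2) returns True.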

Why aren't dictionaries callable?

It's a common programming problem to have a list of objects that you want to map into another. This is usually done with a dictionary:
l1 = [1, 2, 3]
d = {1: '_ONE', 2: '_TWO', 3: '_THREE'}
l2 = [d[i] for i in l1]
In many cases, it's natural to see a dictionary as a function that maps keys to values. Then, you'd simply write l2 = map(d, l1) instead of using the list comprehension (or the ugly map(d.__getitem__, l1)).
Of course, since Python is so flexible, you can get this behaviour easily:
class CallableDict(dict):
    def __call__(self, k):
        return self[k]
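For instance (a quick sketch of how that behaves):
>>> d = CallableDict({1: '_ONE', 2: '_TWO', 3: '_THREE'})
>>> list(map(d, [1, 2, 3]))
['_ONE', '_TWO', '_THREE']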
But this obviously doesn't work when accessing dicts you don't create yourself.
Are there good reasons why the base python dictionaries are not callable by default?
Disclaimer: the motivation for this question is mostly Ruby, where dictionaries work this way.
There should be one-- and preferably only one --obvious way to do it. http://legacy.python.org/dev/peps/pep-0020/
If you make dict callable to subscript a dictionary, now you have two ways of doing the same thing:
d[k] # Existing
d(k) # Proposed
Even the case that you bring up, having a callable to pass to map, already has one obvious solution:
map(d.get, it) # Existing
map(d, it) # Proposed
There doesn't seem to be any particular advantage to your proposal, and it duplicates what are already obvious ways.
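For example, with the question's data (note that d.get returns None for missing keys, where d.__getitem__ would raise KeyError):
>>> d = {1: '_ONE', 2: '_TWO', 3: '_THREE'}
>>> list(map(d.get, [1, 2, 3]))
['_ONE', '_TWO', '_THREE']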

OrderedDict comprehensions

Can I extend syntax in python for dict comprehensions for other dicts, like the OrderedDict in collections module or my own types which inherit from dict?
Just rebinding the dict name obviously doesn't work; the {key: value} syntax still gives you a plain old dict for comprehensions and literals.
>>> from collections import OrderedDict
>>> olddict, dict = dict, OrderedDict
>>> {i: i*i for i in range(3)}.__class__
<type 'dict'>
So, if it's possible, how would I go about doing that? It's OK if it only works in CPython. For syntax, I guess I would try it with an O{k: v} prefix like we have on the r'various' u'string' b'objects'.
Note: of course we can use a generator expression instead, but I'm more interested in seeing how hackable Python is in terms of the grammar.
Sorry, not possible. Dict literals and dict comprehensions map to the built-in dict type, in a way that's hardcoded at the C level. That can't be overridden.
You can use this as an alternative, though:
OrderedDict((i, i * i) for i in range(3))
Addendum: as of Python 3.6, all Python dictionaries are ordered. As of 3.7, it's even part of the language spec. If you're using those versions of Python, no need for OrderedDict: the dict comprehension will Just Work (TM).
There is no direct way to change Python's syntax from within the language. A dictionary comprehension (or plain display) is always going to create a dict, and there's nothing you can do about that. If you're using CPython, it's using special bytecodes that generate a dict directly, which ultimately call the PyDict API functions and/or the same underlying functions used by that API. If you're using PyPy, those bytecodes are instead implemented on top of an RPython dict object which in turn is implemented on top of a compiled-and-optimized Python dict. And so on.
There is an indirect way to do it, but you're not going to like it. If you read the docs on the import system, you'll see that it's the importer that searches for cached compiled code or calls the compiler, and the compiler that calls the parser, and so on. In Python 3.3+, almost everything in this chain either is written in pure Python, or has an alternate pure Python implementation, meaning you can fork the code and do your own thing. Which includes parsing source with your own PyParsing code that builds ASTs, or compiling a dict comprehension AST node into your own custom bytecode instead of the default, or post-processing the bytecode, or…
In many cases, an import hook is sufficient; if not, you can always write a custom finder and loader.
If you're not already using Python 3.3 or later, I'd strongly suggest migrating before playing with this stuff. In older versions, it's harder, and less well documented, and you'll ultimately be putting in 10x the effort to learn something that will be obsolete whenever you do migrate.
Anyway, if this approach sounds interesting to you, you might want to take a look at MacroPy. You could borrow some code from it—and, maybe more importantly, learn how some of these features (that have no good examples in the docs) are used.
Or, if you're willing to settle for something less cool, you can just use MacroPy to build an "odict comprehension macro" and use that. (Note that MacroPy currently only works in Python 2.7, not 3.x.) You can't quite get o{…}, but you can get, say, od[{…}], which isn't too bad. Download od.py, realmain.py, and main.py, and run python main.py to see it working. The key is this code, which takes a DictionaryComp AST, converts it to an equivalent GeneratorExpr on key-value Tuples, and wraps it in a Call to collections.OrderedDict:
def od(tree, **kw):
    pair = ast.Tuple(elts=[tree.key, tree.value])
    gx = ast.GeneratorExp(elt=pair, generators=tree.generators)
    odict = ast.Attribute(value=ast.Name(id='collections'),
                          attr='OrderedDict')
    call = ast.Call(func=odict, args=[gx], keywords=[])
    return call
A different alternative is, of course, to modify the Python interpreter.
I would suggest dropping the O{…} syntax idea for your first go, and just making normal dict comprehensions compile to odicts. The good news is, you don't really need to change the grammar (which is beyond hairy…), just any one of:
the bytecodes that dictcomps compile to,
the way the interpreter runs those bytecodes, or
the implementation of the PyDict type
The bad news is that, while all of those are a lot easier than changing the grammar, none of them can be done from an extension module. (Well, you can do the first one by doing basically the same thing you'd do from pure Python… and you can do any of them by hooking the .so/.dll/.dylib to patch in your own functions, but that's the exact same work as hacking on Python, plus the extra work of hooking at runtime.)
If you want to hack on CPython source, the code you want is in Python/compile.c, Python/ceval.c, and Objects/dictobject.c, and the dev guide tells you how to find everything you need. But you might want to consider hacking on PyPy source instead, since it's mostly written in (a subset of) Python rather than C.
As a side note, your attempt wouldn't have worked even if everything were done at the Python language level. olddict, dict = dict, OrderedDict creates a binding named dict in your module's globals, which shadows the name in builtins, but doesn't replace it. You can replace things in builtins (well, Python doesn't guarantee this, but there are implementation/version-specific things-that-happen-to-work for every implementation/version I've tried…), but what you did isn't the way to do it.
Slightly modifying the response of @Max Noel, you can use a list comprehension instead of a generator to create an OrderedDict in an ordered way (which of course is not possible using a dict comprehension).
>>> OrderedDict([(i, i * i) for i in range(5)])
OrderedDict([(0, 0),
             (1, 1),
             (2, 4),
             (3, 9),
             (4, 16)])

What's the pythonic way to distinguish between a dict and a list of dicts?

So, I'm trying to be a good Python programmer and duck-type wherever I can, but I've got a bit of a problem where my input is either a dict or a list of dicts.
I can't distinguish between them being iterable, because they both are.
My next thought was simply to call list(x) and hope that returned my list intact and gave me my dict as the only item in a list; alas, it just gives me the list of the dict's keys.
I'm now officially out of ideas (short of calling isinstance which is, as we all know, not very pythonic). I just want to end up with a list of dicts, even if my input is a single solitary dict.
Really, there is no obvious pythonic way to do this, because it's an unreasonable input format, and the obvious pythonic way to do it is to fix the input…
But if you can't do that, then yes, you need to write an adapter (as close to the input edge as possible). The best way to do that depends on the actual data. If it really is either a dict, or a list of dicts, and nothing else is possible (e.g., you're calling json.loads on the results from some badly-written service that returns an object or an array of objects), then there's nothing wrong with isinstance.
If you want to make it a bit more general, you can use the appropriate ABCs. For example:
import collections.abc

def as_dict_list(dict_or_list):  # wrapper function added so the snippet runs; pick your own name
    if isinstance(dict_or_list, collections.abc.Mapping):
        return [dict_or_list]
    else:
        return dict_or_list
But unless you have some good reason to need this generality, you're just hiding the hacky workaround, when you're better off keeping it as visible as possible. If it's, e.g., coming out of json.loads from some remote server, handling a Mapping that isn't a dict is not useful, right?
(If you're using some third-party client library that just returns you "something dict-like" or "something list-like containing dict-like things", then yes, use ABCs. Or, if that library doesn't even support the proper ABCs, you can write code that tries a specific method like keys. But if that's an issue, you'll know the specific details you're working around, and can code and document appropriately.)
Accessing a dict with a non-int key gets you either an item or a KeyError; doing the same with a list gets you a TypeError. So you can use exception handling:
def list_dicts(dict_or_list):
    try:
        dict_or_list[None]
        return [dict_or_list]   # no error, we have a dict
    except TypeError:
        return dict_or_list     # wrong index type, we have a list
    except Exception:
        return [dict_or_list]   # probably KeyError; catch anything to be safe
This function will give you a list of dicts regardless of whether it got a list or a dict. (If it got a dict, it makes a list of one item out of it.) This should be fairly safe type-wise, too; other dict-like or list-like objects would probably be considered broken if they didn't have similar behavior.
You could check for the presence of an items attribute.
dict has it and list does not.
>>> hasattr({}, 'items')
True
>>> hasattr([], 'items')
False
Here's a complete list of the differences in attribute names between dict and list (in Python 3.3.2).
Attributes on list but not dict:
>>> print('\n'.join(sorted(list(set(dir([])) - set(dir({}))))))
__add__
__iadd__
__imul__
__mul__
__reversed__
__rmul__
append
count
extend
index
insert
remove
reverse
sort
Attributes on dict but not list:
>>> print('\n'.join(sorted(list(set(dir({})) - set(dir([]))))))
fromkeys
get
items
keys
popitem
setdefault
update
values
Maybe I'm being naive, but how about something like:
try:
    data.keys()
    print "Probs just a dictionary"
except AttributeError:
    print "List o' dictionaries!"
Can you just go ahead and do whatever you were going to do anyways with the data, and decide whether it's a dict or list when something goes awry?
Don't use the types module (type(d) is dict is the same check, spelled more directly):
import types
d = {}
print type(d) is types.DictType
l = [{},{}]
print type(l) is types.ListType and len(l) and type(l[0]) is types.DictType

Why use dict.keys?

I recently wrote some code that looked something like this:
# dct is a dictionary
if "key" in dct.keys():
However, I later found that I could achieve the same results with:
if "key" in dct:
This discovery got me thinking and I began to run some tests to see if there could be a scenario when I must use the keys method of a dictionary. My conclusion however is no, there is not.
If I want the keys in a list, I can do:
keys_list = list(dct)
If I want to iterate over the keys, I can do:
for key in dct:
...
Lastly, if I want to test if a key is in dct, I can use in as I did above.
Summed up, my question is: am I missing something? Could there ever be a scenario where I must use the keys method? Or is it simply a leftover from an earlier version of Python that should be ignored?
On Python 3, use dct.keys() to get a dictionary view object, which lets you do set operations on just the keys:
>>> for sharedkey in dct1.keys() & dct2.keys():  # intersection of two dictionaries
...     print(dct1[sharedkey], dct2[sharedkey])
In Python 2.7, you'd use dct.viewkeys() for that.
In Python 2, dct.keys() returns a list, a copy of the keys in the dictionary. This can be passed around as a separate object that can be manipulated in its own right, including removing elements without affecting the dictionary itself; however, you can create the same list with list(dct), which works in both Python 2 and 3.
You indeed don't want any of these for iteration or membership testing; always use for key in dct and key in dct for those, respectively.
Source: PEP 234, PEP 3106
Python 2's relatively useless dict.keys method exists for historical reasons. Originally, dicts weren't iterable. In fact, there was no such thing as an iterator; iterating over sequences worked by calling __getitem__, the element access method, with increasing integer indices until an IndexError was raised. To iterate over the keys of a dict, you had to call the keys method to get an explicit list of keys and iterate over that.
When iterators went in, dicts became iterable, because it was more convenient, faster, and all around better to say
for key in d:
than
for key in d.keys()
This had the side-effect of making d.keys() utterly superfluous; list(d) and iter(d) now did everything d.keys() did in a cleaner, more general way. They couldn't get rid of keys, though, since so much code already called it.
(At this time, dicts also got a __contains__ method, so you could say key in d instead of d.has_key(key). This was shorter and nicely symmetrical with for key in d; the symmetry is also why iterating over a dict gives the keys instead of (key, value) pairs.)
In Python 3, taking inspiration from the Java Collections Framework, the keys, values, and items methods of dicts were changed. Instead of returning lists, they would return views of the original dict. The key and item views would support set-like operations, and all views would be wrappers around the underlying dict, reflecting any changes to the dict. This made keys useful again.
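For example, on Python 3 the view tracks the dict and supports set operations:
>>> d = {'a': 1}
>>> ks = d.keys()
>>> d['b'] = 2
>>> ks
dict_keys(['a', 'b'])
>>> ks & {'b'}
{'b'}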
Assuming you're not using Python 3, list(dct) is equivalent to dct.keys(). Which one you use is a matter of personal preference. I personally think dct.keys() is slightly clearer, but to each their own.
In any case, there isn't a scenario where you "need" to use dct.keys() per se.
In Python 3, dct.keys() returns a "dictionary view object", so if you need to get a hold of an unmaterialized view to the keys (which could be useful for huge dictionaries) outside of a for loop context, you'd need to use dct.keys().
key in dict
is much faster than checking
key in dict.keys()
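If you want to check that on your own machine, timeit makes it easy; on Python 2, something like this (numbers will vary, so none are quoted here):
$ python2 -m timeit -s "d = dict.fromkeys(range(10000))" "5000 in d"
$ python2 -m timeit -s "d = dict.fromkeys(range(10000))" "5000 in d.keys()"
The second spelling has to build a 10000-element list of keys on every run, which is where the difference comes from.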
