How to keep a unique bag of dicts? - python

I use a set when I need to keep a reference list of values which I want to keep unique (and later on, check if something is in that set). This does not work with dict because it is not hashable.
There are quite a few techniques to "uniquify" a list of dict but all of them assume that I have a final list, which I want to reduce to unique elements.
How to do that in a dynamic way? For a set I would just .add() and element and would know that it will be added only if it is unique. Is such a (EDIT: ideally, but not necessarily) built-in mechanism available for a bag of dict (I use the word "bag" because I do not want to limit possible answers to any data container)

You can use a frozen dict which is an immutable implementation of a regular dict.
This approach should allow you to use frozen dicts inside a set.
>>> from frozendict import frozendict
>>> x = [frozendict({'a':2, 'b':3}),frozendict({'b':3, 'a':2})]
>>> set(x)
{<frozendict {'b': 3, 'a': 2}>}
>>> frozendict({'b': 3, 'a': 2}) in set(x)
True
>>> frozendict({'b': 4, 'a': 2}) in set(x)
False
>>> frozendict({'a': 2, 'b': 3}) in set(x)
True

The set classes are implemented using dictionaries. Accordingly, the
requirements for set elements are the same as those for dictionary
keys; namely, that the element defines both eq() and hash().
As a result, sets cannot contain mutable elements such as lists or
dictionaries. However, they can contain immutable collections such as
tuples or instances of ImmutableSet.
So if you only want to use built-ins, you could convert your dictionaries to a tuple of tuples upon entering them to a set, and converting them back to dictionaries when you want to use them.
dict_set = set()
dict_set.add(tuple(a_dict.items()))

For a set I would just .add() and element and would know that it will be added only if it is unique.
Instead of add(), use update() or |= with the dictionary's items. That will meet your goal of adding dynamically and incrementally while "knowing that it will be added only if it is unique.":
>>> d = dict(raymond='red')
>>> e = dict(raymond='blue')
>>> f = dict(raymond='red')
>>> s = set()
>>> s |= d.items()
>>> s |= e.items()
>>> s |= f.items()
>>> s

Related

Function with dictionary works unexpectedly [duplicate]

This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

Return the first item of a dictionar as a dictionary in Python?

Let's assume we have a dictionary like this:
>>> d={'a': 4, 'b': 2, 'c': 1.5}
If I want to select the first item of d, I can simply run the following:
>>> first_item = list(d.items())[0]
('a', 4)
However, I am trying to have first_item return a dict instead of a tuple i.e., {'a': 4}. Thanks for any tips.
Use itertools.islice to avoid creating the entire list, that is unnecessarily wasteful. Here's a helper function:
from itertools import islice
def pluck(mapping, pos):
return dict(islice(mapping.items(), pos, pos + 1))
Note, the above will return an empty dictionary if pos is out of bounds, but you can check that inside pluck and handle that case however you want (IMO it should probably raise an error).
>>> pluck(d, 0)
{'a': 4}
>>> pluck(d, 1)
{'b': 2}
>>> pluck(d, 2)
{'c': 1.5}
>>> pluck(d, 3)
{}
>>> pluck(d, 4)
{}
Note, accessing an element by position in a dict requires traversing the dict. If you need to do this more often, for arbitrary positions, consider using a sequence type like list which can do it in constant time. Although dict objects maintain insertion order, the API doesn't expose any way to manipulate the dict as a sequence, so you are stuck with using iteration.
Dictionary is a collections of key-value pairs. It is not really ordered collection, though since python 3.7 it keeps the order in which keys were added.
Anyway if you really want some "first" element you can get it in this manner:
some_item = next(iter(d.items()))
You should not convert it into a list because it will eat much (O(n)) memory and walk through whole dict as well.
Anyway I'd recommend not to think that dictionary has "first" element. It has keys and values. You can iterate over them in some unknown order (if you do not control how it is created)

How do I sort a list which is a value of dictionaries?

I need to sort a list which is a value of dictionaries by using a function with less computation cost. I would not able to share the original code, so please help me with the following example.
I tried with the standard approach, I parsed the values, used the intermediate list to sort and stored it in the new dictionary which is highly computation intensive. I am trying to streamline it, for that, I am expecting any suggestions or ways to incorporate.
Input
a= {'a':1, 'b': [2,8,4,3], 'c':['c',5,7,'a',6]}
Output
a= {'a':1, 'b': [2,3,4,8], 'c':['a','c',5,6,7]}
You do not need to sort the dict, you need to sort all values that are lists inside your dict. You do not need to create any new objects at all:
a= {'a':1, 'b': [2,8,4,3], 'c':['c',5,7,'a',6]} # changed c and a to be strings
for e in a:
if isinstance(a[e],list):
a[e].sort() # inplace sort the lists
print(a)
Output:
{'a': 1, 'c': [5, 6, 7, 'a', 'c'], 'b': [2, 3, 4, 8]}
This does not create new dicts nor does it create new lists - it simply sorts the list in-place. You can not get much faster/less computational then that unless you have special domainknowledge about your lists that would make programming a specialized in-place-sorter as replacement for list.sort() viable.
On Python 3 (thanks #Matthias Profil) comparison between int ansd str give TypeError - you can "fix" that with some optional computation ( inspired by answers at: python-list-sort-query-when-list-contains-different-element-types ):
def IsString(item):
return isinstance(item,str)
def IsInt(item):
return isinstance(item,int)
a= {'a':1, 'b': [2,8,4,3], 'c':['c',5,7,'a',6]} # changed c and a to be strings
for e in a:
if isinstance(a[e],list):
try:
a[e].sort() # inplace sort the lists
except TypeError:
str_list = sorted(filter(IsString,a[e]))
int_list = sorted(filter(IsInt,a[e]))
a[e] = int_list + str_list # default to numbers before strings
print(a)
In general (if your values are list of comparable items, eg numbers only), you could do something like this
sorted_dict = {key: sorted(value) for key, value in original_dict.items()}
If your values are single numbers/strings, you should change sorted(value) to sorted(value) if isinstance(value, list) else value. (thanks to user #DeepSpace for pointing out).
However, the example you give in not valid, unless a and c refer to integer values.

Appending object to a dictionary by key is replicating over the whole dictionary [duplicate]

This question already has answers here:
How do I initialize a dictionary of empty lists in Python?
(7 answers)
Closed 2 years ago.
I came across this behavior that surprised me in Python 2.6 and 3.2:
>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}
However, dict comprehensions in 3.2 show a more polite demeanor:
>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>>
Why does fromkeys behave like that?
Your Python 2.6 example is equivalent to the following, which may help to clarify:
>>> a = []
>>> xs = dict.fromkeys(range(2), a)
Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.
>>> xs[0] is a and xs[1] is a
True
Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:
xs = dict((i, []) for i in range(2))
In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.
Look at this:
>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}
In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.
As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?
To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.
The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.
Here are some options for creating multiple actual copies of a mutable container:
As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:
d = {k: [] for k in range(2)}
The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.
Use the form of the dict constructor suggested by #Andrew Clark:
d = dict((k, []) for k in range(2))
This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.
Use a collections.defaultdict instead of a regular dict:
d = collections.defaultdict(list)
This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:
for k in range(2):
d[k].append(42)
Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.
Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:
d = {}
for k in range(2):
d.setdefault(k, []).append(42)
The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.

Python map() dictionary values

I'm trying to use map() on the dict_values object returned by the values() function on a dictionary. However, I can't seem to be able to map() over a dict_values:
map(print, h.values())
Out[31]: <builtins.map at 0x1ce1290>
I'm sure there's an easy way to do this. What I'm actually trying to do is create a set() of all the Counter keys in a dictionary of Counters, doing something like this:
# counters is a dict with Counters as values
whole_set = set()
map(lambda x: whole_set.update(set(x)), counters.values())
Is there a better way to do this in Python?
In Python 3, map returns an iterator, not a list. You still have to iterate over it, either by calling list on it explicitly, or by putting it in a for loop. But you shouldn't use map this way anyway. map is really for collecting return values into an iterable or sequence. Since neither print nor set.update returns a value, using map in this case isn't idiomatic.
Your goal is to put all the keys in all the counters in counters into a single set. One way to do that is to use a nested generator expression:
s = set(key for counter in counters.values() for key in counter)
There's also the lovely dict comprehension syntax, which is available in Python 2.7 and higher (thanks Lattyware!) and can generate sets as well as dictionaries:
s = {key for counter in counters.values() for key in counter}
These are both roughly equivalent to the following:
s = set()
for counter in counters.values():
for key in counter:
s.add(key)
You want the set-union of all the values of counters? I.e.,
counters[1].union(counters[2]).union(...).union(counters[n])
? That's just functools.reduce:
import functools
s = functools.reduce(set.union, counters.values())
If counters.values() aren't already sets (e.g., if they're lists), then you should turn them into sets first. You can do it using a dict comprehension using iteritems, which is a little clunky:
>>> counters = {1:[1,2,3], 2:[4], 3:[5,6]}
>>> counters = {k:set(v) for (k,v) in counters.iteritems()}
>>> print counters
{1: set([1, 2, 3]), 2: set([4]), 3: set([5, 6])}
or of course you can do it inline, since you don't care about counters.keys():
>>> counters = {1:[1,2,3], 2:[4], 3:[5,6]}
>>> functools.reduce(set.union, [set(v) for v in counters.values()])
set([1, 2, 3, 4, 5, 6])

Categories