Selecting random values from dictionary - python

Let's say I have this dictionary:
dict = {'a': 100, 'b': 5, 'c': 150, 'd': 60};
I get the key which has greatest value with this code:
most_similar = max(dic.iteritems(), key=operator.itemgetter(1))[0]
it returns 'c'
But I want to select a random key from top 3 greatest values. According to this dictionary top 3 are:
c
a
d
It should randomly select a key from them. How can I do that?

If you want to find the top 3 keys and then get one of the keys randomly, then I would recommend using random.choice and collections.Counter, like this
>>> d = {'a': 100, 'b': 5, 'c': 150, 'd': 60}
>>> from collections import Counter
>>> from random import choice
>>> choice(Counter(d).most_common(3))[0]
'c'
Counter(d).most_common(3) returns the three (key, value) pairs with the largest values in the dictionary passed to it. choice then picks one of those pairs at random, and [0] extracts the key from it.

Get the keys with the three largest values.
>>> import heapq
>>> d = {'a': 100, 'b': 5, 'c': 150, 'd': 60}
>>> largest = heapq.nlargest(3, d, key=d.__getitem__)
>>> largest
['c', 'a', 'd']
Then select one of them randomly:
>>> import random
>>> random.choice(largest)
'c'

Sort the dictionary by descending value, get the first three objects from the resulting list, then use random.choice:
>>> import random
>>> d = {'a': 100, 'b': 5, 'c': 150, 'd': 60}
>>> random.choice(sorted(d, reverse=True, key=d.get)[:3])
'c'
And don't call it dict or you'll mask the built-in.
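The approaches above can be wrapped in a small helper. This is a minimal sketch; the function name random_top_key and the parameter n are made up for illustration and do not come from the answers:

```python
import heapq
import random

def random_top_key(d, n=3):
    """Return a random key among the n keys with the largest values."""
    # nlargest iterates over the keys of d and ranks them by their value
    top_keys = heapq.nlargest(n, d, key=d.get)
    return random.choice(top_keys)

d = {'a': 100, 'b': 5, 'c': 150, 'd': 60}
print(random_top_key(d))  # one of 'c', 'a' or 'd'
```

Passing n makes it easy to widen or narrow the pool of candidate keys.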

Related

How to combine 2 dictionaries and add the values of the duplicate keys together [duplicate]

For example I have two dicts:
Dict A: {'a': 1, 'b': 2, 'c': 3}
Dict B: {'b': 3, 'c': 4, 'd': 5}
I need a pythonic way of 'combining' two dicts such that the result is:
{'a': 1, 'b': 5, 'c': 7, 'd': 5}
That is to say: if a key appears in both dicts, add their values, if it appears in only one dict, keep its value.
Use collections.Counter:
>>> from collections import Counter
>>> A = Counter({'a':1, 'b':2, 'c':3})
>>> B = Counter({'b':3, 'c':4, 'd':5})
>>> A + B
Counter({'c': 7, 'b': 5, 'd': 5, 'a': 1})
Counters are basically a subclass of dict, so you can still do everything else with them you'd normally do with that type, such as iterate over their keys and values.
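As a small illustration of that point (a sketch using the A and B from the question), the sum behaves like any other dict and can be converted back to a plain dict if needed:

```python
from collections import Counter

A = Counter({'a': 1, 'b': 2, 'c': 3})
B = Counter({'b': 3, 'c': 4, 'd': 5})
total = A + B

# A Counter is a dict subclass, so the usual dict operations work
print(isinstance(total, dict))  # True
for key, value in sorted(total.items()):
    print(key, value)

# Convert back to a plain dict if the Counter type is not wanted
plain = dict(total)
print(plain == {'a': 1, 'b': 5, 'c': 7, 'd': 5})  # True
```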
A more generic solution, which works for non-numeric values as well:
a = {'a': 'foo', 'b':'bar', 'c': 'baz'}
b = {'a': 'spam', 'c':'ham', 'x': 'blah'}
# In Python 3, items() returns a view, so convert to lists before concatenating
r = dict(list(a.items()) + list(b.items()) +
         [(k, a[k] + b[k]) for k in set(b) & set(a)])
or even more generic:
import operator
def combine_dicts(a, b, op=operator.add):
    # Later pairs win, so the combined values for shared keys override the originals
    return dict(list(a.items()) + list(b.items()) +
                [(k, op(a[k], b[k])) for k in set(b) & set(a)])
For example:
>>> a = {'a': 2, 'b':3, 'c':4}
>>> b = {'a': 5, 'c':6, 'x':7}
>>> import operator
>>> print(combine_dicts(a, b, operator.mul))
{'a': 10, 'b': 3, 'c': 24, 'x': 7}
>>> A = {'a':1, 'b':2, 'c':3}
>>> B = {'b':3, 'c':4, 'd':5}
>>> c = {x: A.get(x, 0) + B.get(x, 0) for x in set(A).union(B)}
>>> print(c)
{'a': 1, 'c': 7, 'b': 5, 'd': 5}
Intro:
There are the (probably) best solutions. But you have to know them and remember them, and sometimes you have to hope that your Python version isn't too old, or whatever else the issue could be.
Then there are the most 'hacky' solutions. They are great and short, but they can be hard to understand, read, and remember.
There is, though, an alternative: trying to reinvent the wheel.
- Why reinvent the wheel?
- Generally because it's a really good way to learn (and sometimes because the existing tool doesn't do exactly what you want, or not the way you want it), and because it's the easiest way if you don't know or don't remember the perfect tool for your problem.
So, I propose to reinvent the wheel of the Counter class from the collections module (partially at least):
class MyDict(dict):
    def __add__(self, oth):
        r = self.copy()
        try:
            for key, val in oth.items():
                if key in r:
                    r[key] += val  # You can custom it here
                else:
                    r[key] = val
        except AttributeError:  # In case oth isn't a dict
            return NotImplemented  # The convention when a case isn't handled
        return r
a = MyDict({'a':1, 'b':2, 'c':3})
b = MyDict({'b':3, 'c':4, 'd':5})
print(a+b)  # Output {'a':1, 'b': 5, 'c': 7, 'd': 5}
There are probably other ways to implement this, and there are already tools that do it, but it's always nice to see how things basically work.
Summing the Counters is definitely the most Pythonic way to go in such cases, but only if every resulting value is positive. In the example below, c is missing from the result because its value in B was made negative, so the sum for c is -1:
In [1]: from collections import Counter
In [2]: A = Counter({'a':1, 'b':2, 'c':3})
In [3]: B = Counter({'b':3, 'c':-4, 'd':5})
In [4]: A + B
Out[4]: Counter({'d': 5, 'b': 5, 'a': 1})
That's because Counters were primarily designed to work with positive integers to represent running counts (a negative count is meaningless). But to help with other use cases, the Python documentation describes the minimum range and type restrictions as follows:
- The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.
- The most_common() method requires only that the values be orderable.
- For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.
- The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
- The elements() method requires integer counts. It ignores zero and negative counts.
To get around that problem, you can use Counter.update instead of + to get the desired output. It works like dict.update() but adds counts instead of replacing them, and it keeps zero and negative results. Note that it modifies A in place.
In [24]: A.update(B)
In [25]: A
Out[25]: Counter({'d': 5, 'b': 5, 'a': 1, 'c': -1})
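If A must stay untouched, one option is to update a copy instead. A minimal sketch (the name total is just for illustration):

```python
from collections import Counter

A = Counter({'a': 1, 'b': 2, 'c': 3})
B = Counter({'b': 3, 'c': -4, 'd': 5})

total = A.copy()   # Counter.copy() returns a new Counter
total.update(B)    # adds counts in place and keeps negative results

print(total['c'])  # -1
print(A['c'])      # 3, A itself is unchanged
```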
import itertools
myDict = {}
for k in itertools.chain(A.keys(), B.keys()):
    myDict[k] = A.get(k, 0)+B.get(k, 0)
The one with no extra imports!
There is a Pythonic principle called EAFP (Easier to Ask for Forgiveness than Permission). The code below is based on it.
# The A and B dictionaries
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
# The final dictionary. Will contain the final outputs.
newdict = {}
# Make sure every key of A and B get into the final dictionary 'newdict'.
newdict.update(A)
newdict.update(B)
# Iterate through each key of A.
for i in A.keys():
    # If same key exist on B, its values from A and B will add together and
    # get included in the final dictionary 'newdict'.
    try:
        addition = A[i] + B[i]
        newdict[i] = addition
    # If current key does not exist in dictionary B, it will give a KeyError,
    # catch it and continue looping.
    except KeyError:
        continue
EDIT: thanks to jerzyk for his improvement suggestions.
import itertools
import collections
dictA = {'a':1, 'b':2, 'c':3}
dictB = {'b':3, 'c':4, 'd':5}
new_dict = collections.defaultdict(int)
# use dict.iteritems() instead of dict.items() for Python 2
for k, v in itertools.chain(dictA.items(), dictB.items()):
    new_dict[k] += v
print(dict(new_dict))
# OUTPUT
{'a': 1, 'b': 5, 'c': 7, 'd': 5}
OR
Alternatively, you can use Counter, as @Martijn has mentioned above.
For a more generic and extensible way, check mergedict. It uses singledispatch and can merge values based on their types.
Example:
from mergedict import MergeDict
class SumDict(MergeDict):
    @MergeDict.dispatch(int)
    def merge_int(this, other):
        return this + other
d2 = SumDict({'a': 1, 'b': 'one'})
d2.merge({'a':2, 'b': 'two'})
assert d2 == {'a': 3, 'b': 'two'}
From python 3.5: merging and summing
Thanks to @tokeinizer_fsj, who pointed out in a comment that I had not fully understood the question: I thought "add" meant just adding the keys that differ between the two dictionaries, whereas it meant that the values of the common keys should be summed. So I added a loop before the merge, so that the second dictionary contains the sum for each common key. When merging, the values from the last dictionary win, so the merged result contains the sums. Note that the loop modifies b in place. The solution is valid from Python 3.5 onward.
a = {
"a": 1,
"b": 2,
"c": 3
}
b = {
"a": 2,
"b": 3,
"d": 5
}
# Python 3.5
for key in b:
    if key in a:
        b[key] = b[key] + a[key]
c = {**a, **b}
print(c)
>>> c
{'a': 3, 'b': 5, 'c': 3, 'd': 5}
Reusable code
a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 3, 'c': 4, 'd': 5}
def mergsum(a, b):
    for k in b:
        if k in a:
            b[k] = b[k] + a[k]
    c = {**a, **b}
    return c
print(mergsum(a, b))
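Additionally, please note that in the timings below a.update(b) is about 2x faster than a + b. The two are not interchangeable, though: update modifies a in place, while a + b builds a new Counter.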
from collections import Counter
a = Counter({'menu': 20, 'good': 15, 'happy': 10, 'bar': 5})
b = Counter({'menu': 1, 'good': 1, 'bar': 3})
%timeit a + b;
## 100000 loops, best of 3: 8.62 µs per loop
## The slowest run took 4.04 times longer than the fastest. This could mean that an intermediate result is being cached.
%timeit a.update(b)
## 100000 loops, best of 3: 4.51 µs per loop
One line solution is to use dictionary comprehension.
C = { k: A.get(k,0) + B.get(k,0) for k in list(B.keys()) + list(A.keys()) }
def merge_with(f, xs, ys):
    xs = dict(xs)  # work on a copy so the caller's dict is not modified
    for (y, v) in ys.items():
        xs[y] = v if y not in xs else f(xs[y], v)
    return xs
merge_with((lambda x, y: x + y), A, B)
You could easily generalize this:
def merge_dicts(f, *dicts):
    result = {}
    for d in dicts:
        for (k, v) in d.items():
            result[k] = v if k not in result else f(result[k], v)
    return result
Then it can take any number of dicts.
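For instance, with a Python 3 version of the function that returns its result (a sketch; the sample dictionaries are made up for illustration):

```python
import operator

def merge_dicts(f, *dicts):
    """Merge any number of dicts, combining values of shared keys with f."""
    result = {}
    for d in dicts:
        for k, v in d.items():
            result[k] = v if k not in result else f(result[k], v)
    return result

A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
C = {'a': 10}

print(merge_dicts(operator.add, A, B, C))
# {'a': 11, 'b': 5, 'c': 7, 'd': 5}
print(merge_dicts(max, A, B, C))
# {'a': 10, 'b': 3, 'c': 4, 'd': 5}
```

Because f is only called for keys seen before, any two-argument function works, not just addition.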
This is a simple solution for merging two dictionaries whose values support +=. It has to iterate over each merged-in dictionary only once, and it modifies the first dictionary in place:
a = {'a':1, 'b':2, 'c':3}
dicts = [{'b':3, 'c':4, 'd':5},
{'c':9, 'a':9, 'd':9}]
def merge_dicts(merged,mergedfrom):
    for k,v in mergedfrom.items():
        if k in merged:
            merged[k] += v
        else:
            merged[k] = v
    return merged
for dct in dicts:
    a = merge_dicts(a,dct)
print (a)
#{'c': 16, 'b': 5, 'd': 14, 'a': 10}
Here's yet another option using dictionary comprehensions combined with the behavior of dict():
dict3 = dict(dict1, **{ k: v + dict1.get(k, 0) for k, v in dict2.items() })
# {'a': 4, 'b': 2, 'c': 7, 'g': 1}
From https://docs.python.org/3/library/stdtypes.html#dict:
If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument.
The dict comprehension
**{ k: v + dict1.get(k, 0) for k, v in dict2.items() }
handles adding dict1[k] to v. We don't need an explicit if here because the default value for dict1.get can be set to 0 instead.
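One caveat: dict(mapping, **kwargs) passes the second dictionary as keyword arguments, so its keys have to be strings. The following sketch shows the same idea with dictionary unpacking inside a literal, which also accepts non-string keys (the sample data is made up for illustration):

```python
dict1 = {1: 10, 2: 20}
dict2 = {2: 5, 3: 7}

# dict(dict1, **{...}) raises TypeError here because the keys are ints
try:
    dict(dict1, **{k: v + dict1.get(k, 0) for k, v in dict2.items()})
    kwargs_worked = True
except TypeError:
    kwargs_worked = False

# Unpacking inside a dict literal has no such restriction
dict3 = {**dict1, **{k: v + dict1.get(k, 0) for k, v in dict2.items()}}
print(kwargs_worked)  # False
print(dict3)          # {1: 10, 2: 25, 3: 7}
```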
This solution is easy to use: it works like a normal dictionary, but you can add two of them with the + operator.
class SumDict(dict):
    def __add__(self, y):
        return {x: self.get(x, 0) + y.get(x, 0) for x in set(self).union(y)}
A = SumDict({'a': 1, 'c': 2})
B = SumDict({'b': 3, 'c': 4}) # Also works: B = {'b': 3, 'c': 4}
print(A + B) # OUTPUT {'a': 1, 'b': 3, 'c': 6}
The above solutions are great for the scenario where you have a small number of Counters. If you have a big list of them though, something like this is much nicer:
from collections import Counter
A = Counter({'a':1, 'b':2, 'c':3})
B = Counter({'b':3, 'c':4, 'd':5})
C = Counter({'a': 5, 'e':3})
list_of_counts = [A, B, C]
total = sum(list_of_counts, Counter())
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
The above solution is essentially summing the Counters by:
total = Counter()
for count in list_of_counts:
    total += count
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
This does the same thing but I think it always helps to see what it is effectively doing underneath.
What about:
def dict_merge_and_sum( d1, d2 ):
    ret = dict(d1)  # copy, so that d1 itself is not modified
    ret.update({ k:v + d2[k] for k,v in d1.items() if k in d2 })
    ret.update({ k:v for k,v in d2.items() if k not in d1 })
    return ret
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
print( dict_merge_and_sum( A, B ) )
Output:
{'d': 5, 'a': 1, 'c': 7, 'b': 5}
A more conventional way to combine two dicts. Using modules and tools is good, but understanding the logic behind them helps in case you don't remember the tools.
Program to combine two dictionaries, adding the values for common keys (note that it modifies and returns d2):
def combine_dict(d1,d2):
    for key,value in d1.items():
        if key in d2:
            d2[key] += value
        else:
            d2[key] = value
    return d2
combine_dict({'a':1, 'b':2, 'c':3},{'b':3, 'c':4, 'd':5})
output == {'b': 5, 'c': 7, 'd': 5, 'a': 1}
Here's a very general solution. It handles any number of dicts, handles keys that appear in only some of them, and lets you easily use any aggregation function you want:
def aggregate_dicts(dicts, operation=sum):
    """Aggregate a sequence of dictionaries using `operation`."""
    all_keys = set().union(*[el.keys() for el in dicts])
    return {k: operation([dic.get(k, None) for dic in dicts]) for k in all_keys}
example:
dicts_same_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3}]
aggregate_dicts(dicts_same_keys, operation=sum)
#{'x': 3, 'y': 6}
Example with non-identical keys and a generic aggregation (missing keys are passed to the operation as None, so it has to skip them; the default sum would fail here):
dicts_diff_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3, 'c': 4}]
def mean_no_none(l):
    l_no_none = [el for el in l if el is not None]
    return sum(l_no_none) / len(l_no_none)
aggregate_dicts(dicts_diff_keys, operation=mean_no_none)
# {'x': 1.0, 'c': 4.0, 'y': 2.0}
dict1 = {'a':1, 'b':2, 'c':3}
dict2 = {'a':3, 'g':1, 'c':4}
dict3 = {} # will store new values
for x in dict1:
    if x in dict2: #sum values with same key
        dict3[x] = dict1[x] +dict2[x]
    else: #keep the value from dict1
        dict3[x] = dict1[x]
#add the keys that are only in dict2
for x in dict2:
    if x not in dict1:
        dict3[x] = dict2[x]
print(dict3) # {'a': 4, 'b': 2, 'c': 7, 'g': 1}
Merging three dicts a,b,c in a single line without any other modules or libs
If we have the three dicts
a = {"a":9}
b = {"b":7}
c = {'b': 2, 'd': 90}
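Merge all with a single line and return a dict object (in Python 3, items() returns views, which have to be converted to lists before concatenating):
c = dict(list(a.items()) + list(b.items()) + list(c.items()))
Returning
{'a': 9, 'b': 2, 'd': 90}
Note that this does not sum the values of duplicate keys: the value from the last dict wins.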

Histogram of list entries

I have a number of lists as follows:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
I wish to generate a histogram based on the labels, ignoring the numbering, that is, a has 4 entries over all the lists, ba 1 entry, u has 1 entry, and so on. The labels, are file names from a specific folder, before adding the numbers, so it is a finite known list.
How can I perform such a count without a bunch of ugly loops? Can I use unique here, somehow?
You cannot achieve this without a loop, but you can use a list comprehension to fit it into a single line. Something like this:
list1 = ['a_1','a_2','b_17','c_19']
list2 = ['aa_1','a_12','b_15','d_39']
list3 = ['a_1','a_200','ba_1','u_0']
lst = [x.split('_')[0] for x in (list1 + list2 + list3)]
print({x: lst.count(x) for x in lst})
You can use a defaultdict initialized to 0 to count the occurrence and get a nice container with the required information.
So, define the container:
from collections import defaultdict
histo = defaultdict(int)
I'd like to split the operation into functions.
First get the prefix from the string, to be used as key in the dictionary:
def get_prefix(string):
    return string.split('_')[0]
This works like
get_prefix('az_1')
#=> 'az'
Then a function to update the dictionary:
def count_elements(lst):
    for e in lst:
        histo[get_prefix(e)] += 1
Finally you can call this way:
count_elements(list1)
count_elements(list2)
count_elements(list3)
dict(histo)
#=> {'a': 5, 'b': 2, 'c': 1, 'aa': 1, 'd': 1, 'ba': 1, 'u': 1}
Or directly
count_elements(list1 + list2 + list3)
To get the unique count, start from an empty histo (the counts accumulate across calls) and call it using set:
histo.clear()
count_elements(set(list1 + list2 + list3))
dict(histo)
#=> {'ba': 1, 'a': 4, 'aa': 1, 'b': 2, 'u': 1, 'd': 1, 'c': 1}
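The same histogram can also be built with collections.Counter, which is not used in the answers above. A minimal sketch, assuming the label is always the part before the first underscore:

```python
from collections import Counter
from itertools import chain

list1 = ['a_1', 'a_2', 'b_17', 'c_19']
list2 = ['aa_1', 'a_12', 'b_15', 'd_39']
list3 = ['a_1', 'a_200', 'ba_1', 'u_0']

# Count the part before the first underscore of every entry
histo = Counter(s.split('_')[0] for s in chain(list1, list2, list3))
print(histo['a'], histo['b'], histo['ba'])  # 5 2 1

# Count each distinct entry only once ('a_1' appears in two lists)
unique_histo = Counter(s.split('_')[0] for s in set(list1 + list2 + list3))
print(unique_histo['a'])  # 4
```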

Using other dictionary values to define a dictionary value during initialization

Say I have three variables which I want to store in a dictionary such that the third is the sum of the first two. Is there a way to do this in one call when the dictionary is initialized? For example:
myDict = {'a': 1, 'b': 2, 'c': myDict['a'] + myDict['b']}
Assignment expressions (the := operator, available since Python 3.8) allow something like the following, which I guess you could interpret as one call:
>>> md = {**(md := {'a': 2, 'b': 3}), **{'c': md['a'] + md['b']}}
>>> md
{'a': 2, 'b': 3, 'c': 5}
But this is really just a fancy way of forcing a two-liner into a single line, making it less readable and less memory-efficient (because of the intermediate dicts). Also note that the md used on the right-hand side of the = really could be any name.
You could actually be a little more efficient and get rid of one spurious auxiliary dict:
(md := {'a': 2, 'b': 3}).update({'c': md['a'] + md['b']})
You can do:
>>> myDict = {'a': 1, 'b': 2}
>>> myDict["c"] = myDict["a"] + myDict["b"]
>>> myDict
{'a': 1, 'b': 2, 'c': 3}
You cannot do this in one line, because myDict does not exist yet when the value for 'c' is evaluated.
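If the goal is just a single dictionary literal, one option is to bind the two values to names first. A small sketch (the variable names a and b are made up for illustration):

```python
a, b = 1, 2
# The literal can use a and b because they already exist at this point
myDict = {'a': a, 'b': b, 'c': a + b}
print(myDict)  # {'a': 1, 'b': 2, 'c': 3}
```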

Merge two dictionary views in python

How do I merge the views of the items of two dicts in python?
My use case is: I have two private dictionaries in a class. I want the API to treat them as one, and provide an items method as such. The only way I know of is to combine them and then provide a view on the combined dictionary, but for large dictionaries this seems expensive. I'm thinking of something like a.items() + b.items()
Note: I don't care about key clashes.
This is what I'd like to improve:
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    def items(self):
        return {**self._priv1, **self._priv2}.items()
You can use ChainMap:
from collections import ChainMap
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    def items(self):
        return ChainMap(self._priv1, self._priv2).items()
Merging views of dictionaries means nothing, because a view always reflects the content of its underlying dictionary.
So to get a different view, you either edit one of your dictionaries or instantiate a new one (as you did); there is no way around it.
But maybe what you want is itertools.chain, to iterate across multiple iterables. This solution doesn't instantiate a new dictionary. Or, as others have said, collections.ChainMap. I would use chain to iterate and ChainMap to make lookups.
You can use ChainMap:
A ChainMap groups multiple dicts or other mappings together to create a single, updateable view. If no maps are specified, a single empty dictionary is provided so that a new chain always has at least one mapping.
from collections import ChainMap
context = ChainMap(_priv1, _priv2, ...)
Example:
In [3]: _priv1 = {1: 2, 3: 4}
In [4]: _priv2 = {5: 6, 7: 8}
In [5]: from collections import ChainMap
...: context = ChainMap(_priv1, _priv2)
In [6]: context
Out[6]: ChainMap({1: 2, 3: 4}, {5: 6, 7: 8})
In [7]: _priv1.update({9: 10})
In [8]: context
Out[8]: ChainMap({1: 2, 3: 4, 9: 10}, {5: 6, 7: 8})
In [9]: context.get(9)
Out[9]: 10
For your code example, I'd use:
from collections import ChainMap
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    _union_dict = ChainMap(_priv1, _priv2)
    @classmethod
    def items(cls):
        return cls._union_dict.items()
You can use chain to combine two or more views. As stated:
Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted.
That way, the data is not copied.
from itertools import chain
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    def items(self):
        return chain(self._priv1.items(), self._priv2.items())
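The question says key clashes don't matter, but the two tools differ when they occur. A small sketch with made-up data: chain yields every pair from both dictionaries, while ChainMap reports each key once, with the first mapping taking precedence.

```python
from collections import ChainMap
from itertools import chain

priv1 = {'a': 1, 'b': 2}
priv2 = {'b': 20, 'c': 30}

# chain simply walks both views one after the other
chained = list(chain(priv1.items(), priv2.items()))
print(chained)  # [('a', 1), ('b', 2), ('b', 20), ('c', 30)]

# ChainMap looks each key up in priv1 first, then in priv2
merged = ChainMap(priv1, priv2)
print(sorted(merged.items()))  # [('a', 1), ('b', 2), ('c', 30)]

# ChainMap is a live view: later changes to priv1 are visible
priv1['z'] = 0
print(merged['z'])  # 0
```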

Update dictionary during iteration

Is it possible to iterate through a dictionary while updating a separate dictionary? I tried to create a copy of my original dictionary and edit that one but I'm still getting an error
d = {30:3, 54:5, 16:2}
r = d
for k,v in d.items():
    biggest = max(d,key = d.get)
    del(r[biggest])
I need to find the largest value of the dictionary each time it loops through. Then I need to remove that item from the dictionary so I can find the next largest item.
The error I get when I run this code is as follows.
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
for k,v in d.items():
RuntimeError: dictionary changed size during iteration
You are actually iterating over the same dictionary, since r = d does not create a new dictionary; r is just another reference to the same dictionary. You can check the object identities to confirm that:
>>> r = d
>>> r is d
True
Please, see this discussion for more details about object identity:
"is" operator behaves unexpectedly with integers
So, the right thing to do is first create a copy of the dictionary and then alter it:
>>> r = d.copy()
>>> r is d
False
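And the iteration (note that max must search r, the dictionary that shrinks; searching d would return the same key every time, and the second del would raise a KeyError):
for k,v in d.items():
    biggest = max(r,key = r.get)
    del(r[biggest])
So, from your code we need to change two lines:
d = {30:3, 54:5, 16:2}
r = d.copy() # changed: use copy here
for k,v in d.items():
    biggest = max(r,key = r.get) # changed: search the copy
    del(r[biggest])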
Iterate on a copy of the dict:
d = {30:3, 54:5, 16:2}
r = d
for k,v in dict(d).items():
    biggest = max(d,key = d.get)
    del(r[biggest])
As others have pointed out, you cannot change the size of a dictionary while iterating through it. It has also been noted by @user312016 that you can iterate over a copy and modify the original.
I am not sure what the intention is, but this code will sort the items from largest value to smallest so you don't have to find the max on each iteration:
d = {30:3, 54:5, 16:2}
d_ = sorted(d.items(), key=lambda x: x[1], reverse=True)
for k, v in d_:
    print(k, v)
54 5
30 3
16 2
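If each item really has to be removed after it is processed, a while loop over a copy avoids the for loop over a changing dictionary altogether. A minimal sketch (the names remaining and order are made up for illustration):

```python
d = {30: 3, 54: 5, 16: 2}
remaining = d.copy()  # work on a copy so d stays intact

order = []
while remaining:
    biggest = max(remaining, key=remaining.get)      # key with the largest value
    order.append((biggest, remaining.pop(biggest)))  # remove it from the copy

print(order)  # [(54, 5), (30, 3), (16, 2)]
print(d)      # {30: 3, 54: 5, 16: 2}
```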
The problem is that when you write r = d, r is not a new object. It is the same object as d; both names refer to a single dictionary:
>>> x = {'a':1, 'b':2}
>>> y = x
>>> x
{'a': 1, 'b': 2}
>>> y
{'a': 1, 'b': 2}
>>> x is y
True
So if you change one of them, the other will change too:
>>> y['c']=3
>>> y
{'a': 1, 'c': 3, 'b': 2}
>>> x
{'a': 1, 'c': 3, 'b': 2}
Using the id() function you can check whether they refer to the same place in memory:
>>> id(y)
44703816L
>>> id(x)
44703816L
>>>
So, you need to make a copy instead of using =, for example with copy.copy():
>>> import copy
>>> z = copy.copy(x)
>>> z
{'a': 1, 'c': 3, 'b': 2}
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>> z is x
False
>>>
As a result, changing one of them doesn't change the other:
>>> z
{'a': 1, 'c': 3, 'b': 2}
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>> z['d']=4
>>> z
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>>
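Keep in mind that copy.copy() makes a shallow copy: the new dictionary is independent, but mutable values stored inside it are still shared. For flat dictionaries like the ones above this makes no difference. A small sketch with made-up data showing where it does:

```python
import copy

x = {'a': [1, 2]}
shallow = copy.copy(x)
deep = copy.deepcopy(x)

x['a'].append(3)     # mutate the list stored inside x
print(shallow['a'])  # [1, 2, 3], the list is shared
print(deep['a'])     # [1, 2], the list was copied too
```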
