Creating nested dictionary giving range of values for each key - python

I am new to Python and am working with dictionaries.
I am getting 10 dictionaries like this:
d1 = {'a': 13, 'b':4.53, 'c':3243, 'd':0.2323}
d2 = {'a': 12, 'b':4.3, 'c':373, 'd':0.263}
…
dn = {'a': 16, 'b':9.53, 'c':76843, 'd':13}
What I want is to return a nested dictionary containing the min and max of the values for each key, like this:
d_out = {'a': {'min': 12,'max': 16},
'b': {'min': 4.3,'max': 9.53},
'c': {'min': 373,'max': 76843},
'd': {'min': 0.2323,'max': 13},
}

Assuming the following input:
ds = [{'a': 13, 'b':4.53, 'c':3243, 'd':0.2323},
{'a': 12, 'b':4.3, 'c':373, 'd':0.263},
{'a': 16, 'b':9.53, 'c':76843, 'd':13},]
## or
# ds = [d1, d2, d3, ...]
You can build the result with dict and a bit of zip/map help. Note that this relies on every dictionary having the same keys in the same order:
dict(zip(ds[0],
map(lambda x: {'min': min(x), 'max': max(x)},
zip(*(d.values()
for d in ds)))))
output:
{'a': {'min': 12, 'max': 16},
'b': {'min': 4.3, 'max': 9.53},
'c': {'min': 373, 'max': 76843},
'd': {'min': 0.2323, 'max': 13}}
Comparison of this approach with the nice Counter trick from @Dani, timed on 30k elements:
# dictionary comprehension
5.09 ms ± 202 µs per loop
# Counter trick
112 ms ± 2.65 ms per loop
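The zip-based approach above relies on every dictionary having the same keys in the same order. If that cannot be guaranteed, a key-based variant may be safer. The following is only a minimal sketch, not the answerer's code; the helper name min_max_by_key is made up for illustration:

```python
from collections import defaultdict

def min_max_by_key(dicts):
    """Collect the values per key, then take min and max of each group."""
    values = defaultdict(list)
    for d in dicts:
        for key, value in d.items():
            values[key].append(value)
    return {key: {'min': min(vs), 'max': max(vs)} for key, vs in values.items()}

ds = [{'a': 13, 'b': 4.53, 'c': 3243, 'd': 0.2323},
      {'d': 0.263, 'c': 373, 'b': 4.3, 'a': 12},   # different key order on purpose
      {'a': 16, 'b': 9.53, 'c': 76843, 'd': 13}]

d_out = min_max_by_key(ds)
print(d_out)
```

This makes one pass over the data and does not depend on key order, at the cost of storing all values per key.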

One approach is to use collections.Counter and take advantage of the fact that intersection computes the min, and union computes the max:
from collections import Counter
from operator import and_, or_
from functools import reduce
# toy data
dict1 = {'a': 13, 'b': 4.53, 'c': 3243, 'd': 0.2323}
dict2 = {'a': 12, 'b': 4.3, 'c': 373, 'd': 0.263}
dict3 = {'a': 16, 'b': 9.53, 'c': 76843, 'd': 13}
# convert to Counter
d = list(Counter(**di) for di in [dict1, dict2, dict3])
# find intersection i.e min
mi = reduce(and_, d)
# find union i.e max
ma = reduce(or_, d)
# build result dictionary
res = {key: {"min": value, "max": ma[key]} for key, value in mi.items()}
print(res)
Output
{'a': {'min': 12, 'max': 16}, 'b': {'min': 4.3, 'max': 9.53}, 'c': {'min': 373, 'max': 76843}, 'd': {'min': 0.2323, 'max': 13}}
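One caveat worth knowing: Counter's & and | operators discard results that are zero or negative, so this trick is only safe when all values are positive. A small illustration with made-up data:

```python
from collections import Counter

x = Counter({'a': 0, 'b': 3})
y = Counter({'a': 2, 'b': 1})

lowest = x & y    # per-key minimum; 'a' would be 0, so it is dropped
highest = x | y   # per-key maximum

print(dict(lowest))   # {'b': 1}
print(dict(highest))  # {'a': 2, 'b': 3}
```

If the data may contain zeros or negative numbers, a plain min/max over the values per key avoids this behaviour.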


Find All Permutations of a List of Dicts

So I have a list of dicts containing letters and their frequencies.
letter_freq = [
{'a': 10, 'b': 7},
{'d': 15, 'g': 8},
{'a': 12, 'q': 2}
]
I want to find all possible combinations of these dictionaries, as well as the total of their values:
perms = {
'ada': 37, 'adq': 27, 'aga': 30, 'agq': 20, 'bda': 34, 'bdq': 24, 'bga': 27, 'bgq': 17
}
I've looked at itertools.product(), but I don't see how to apply that to this specific use case. My intuition is that the easiest way to implement this is to make a recursive function, but I'm struggling to see how to add the values and the strings for the keys and make it all work.
Also, this list and the dicts can be of any length. Is there a simple way to do this that I haven't found yet? Thank you!
Solutions and Benchmark:
Yes, itertools.product works:
from itertools import product
perms = {
''.join(keys): sum(vals)
for prod in product(*map(dict.items, letter_freq))
for keys, vals in [zip(*prod)]
}
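To see why the inner `for keys, vals in [zip(*prod)]` works, it can help to look at a single element of the product. Each element is a tuple of (key, value) pairs, and zip(*prod) transposes it into one tuple of keys and one tuple of values. A small illustration using the data from the question:

```python
from itertools import product

letter_freq = [
    {'a': 10, 'b': 7},
    {'d': 15, 'g': 8},
    {'a': 12, 'q': 2},
]

# First element of the product: one (key, value) pair per dictionary.
prod = next(product(*map(dict.items, letter_freq)))
print(prod)

# zip(*prod) transposes the pairs into a keys tuple and a values tuple.
keys, vals = zip(*prod)
print(''.join(keys), sum(vals))
```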
Alternatively, build the products for the keys and the values separately, so we don't have to separate them:
perms = {
''.join(keys): sum(vals)
for keys, vals in zip(product(*letter_freq),
product(*map(dict.values, letter_freq)))
}
Or fully separate their constructions (my favorite one):
keys = map(''.join, product(*letter_freq))
vals = map(sum, product(*map(dict.values, letter_freq)))
perms = dict(zip(keys, vals))
Benchmark would be interesting, I suspect my last one will be fastest of these and also faster than Samwise's.
Yet another, inspired by a glance at constantstranger's (but much faster than theirs in some initial benchmark):
items = [('', 0)]
for d in letter_freq:
    items = [(k0+k, v0+v)
             for k, v in d.items()
             for k0, v0 in items]
perms = dict(items)
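For intuition, the loop above extends every partial (string, total) pair by every item of the next dictionary, so the list grows by a factor of len(d) at each step. A small trace using the example data from the question (the sizes list is added here only for illustration):

```python
letter_freq = [
    {'a': 10, 'b': 7},
    {'d': 15, 'g': 8},
    {'a': 12, 'q': 2},
]

items = [('', 0)]
sizes = []
for d in letter_freq:
    # Combine every partial result so far with every item of this dict.
    items = [(k0 + k, v0 + v)
             for k, v in d.items()
             for k0, v0 in items]
    sizes.append(len(items))
    print(len(items), items[:2])

perms = dict(items)
```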
Benchmark:
With your example list of dicts:
6.6 μs perms1
4.5 μs perms2
4.1 μs perms3
4.0 μs perms4
11.0 μs perms_Samwise
12.7 μs perms_constantstranger
With a list of seven dicts with four items each:
15.5 ms perms1
7.6 ms perms2
5.5 ms perms3
4.8 ms perms4
27.2 ms perms_Samwise
42.2 ms perms_constantstranger
Code (Try it online!):
from timeit import repeat
import itertools
from itertools import product

def perms1(letter_freq):
    return {
        ''.join(keys): sum(vals)
        for prod in product(*map(dict.items, letter_freq))
        for keys, vals in [zip(*prod)]
    }

def perms2(letter_freq):
    return {
        ''.join(keys): sum(vals)
        for keys, vals in zip(product(*letter_freq),
                              product(*map(dict.values, letter_freq)))
    }

def perms3(letter_freq):
    keys = map(''.join, product(*letter_freq))
    vals = map(sum, product(*map(dict.values, letter_freq)))
    return dict(zip(keys, vals))

def perms4(letter_freq):
    items = [('', 0)]
    for d in letter_freq:
        items = [(k0+k, v0+v)
                 for k, v in d.items()
                 for k0, v0 in items]
    return dict(items)

def perms_Samwise(letter_freq):
    return {''.join(k for k, _ in p): sum(v for _, v in p) for p in itertools.product(*(d.items() for d in letter_freq))}

def perms_constantstranger(letter_freq):
    stack = [['', 0]]
    [stack.append((stack[i][0] + k, stack[i][1] + v)) for row in letter_freq if (lenStack := len(stack)) for k, v in row.items() for i in range(lenStack)]
    return dict(row for row in stack if len(row[0]) == len(letter_freq))

funcs = perms1, perms2, perms3, perms4, perms_Samwise, perms_constantstranger

letter_freq = [
    {'a': 10, 'b': 7, 'c': 5, 'd': 2},
    {'d': 15, 'g': 8, 'j': 6, 'm': 3},
    {'a': 12, 'q': 2, 'x': 1, 'z': 4},
    {'a': 10, 'b': 7, 'c': 5, 'd': 2},
    {'d': 15, 'g': 8, 'j': 6, 'm': 3},
    {'a': 12, 'q': 2, 'x': 1, 'z': 4},
    {'a': 10, 'b': 7, 'c': 5, 'd': 2},
]

# Check that all functions agree before timing them.
expect = funcs[0](letter_freq)
for func in funcs:
    result = func(letter_freq)
    assert result == expect

for _ in range(3):
    for func in funcs:
        t = min(repeat(lambda: func(letter_freq), number=1))
        print('%5.1f ms ' % (t * 1e3), func.__name__)
    print()
itertools.product is indeed what you want.
>>> letter_freq = [
... {'a': 10, 'b': 7},
... {'d': 15, 'g': 8},
... {'a': 12, 'q': 2}
... ]
>>> import itertools
>>> {''.join(k for k, _ in p): sum(v for _, v in p) for p in itertools.product(*(d.items() for d in letter_freq))}
{'ada': 37, 'adq': 27, 'aga': 30, 'agq': 20, 'bda': 34, 'bdq': 24, 'bga': 27, 'bgq': 17}
If for any reason you wanted to roll your own permutations using comprehensions instead of product() and map(), you could do it this way:
letter_freq = [
{'a': 10, 'b': 7},
{'d': 15, 'g': 8},
{'a': 12, 'q': 2}
]
stack = [['', 0]]
[stack.append((stack[i][0] + k, stack[i][1] + v)) for row in letter_freq if (lenStack := len(stack)) for k, v in row.items() for i in range(lenStack)]
perms = dict(row for row in stack if len(row[0]) == len(letter_freq))
print(perms)
Output:
{'ada': 37, 'bda': 34, 'aga': 30, 'bga': 27, 'adq': 27, 'bdq': 24, 'agq': 20, 'bgq': 17}

How do I return a new dictionary if the keys in one dictionary, match the keys in another dictionary?

Currently, I have a dictionary, with its key representing a zip code, and the values are also a dictionary.
d = { 94111: {'a': 5, 'b': 7, 'd': 7},
95413: {'a': 6, 'd': 4},
84131: {'a': 5, 'b': 15, 'c': 10, 'd': 11},
73173: {'a': 15, 'c': 10, 'd': 15},
80132: {'b': 7, 'c': 7, 'd': 7} }
I also have a second dictionary, which maps each zip code to the state it belongs to.
states = {94111: "TX", 84131: "TX", 95413: "AL", 73173: "AL", 80132: "AL"}
If a zip code in the dictionary states matches one of the keys in d, then its values should be summed per state and put into a new dictionary like the expected output.
Expected Output:
{'TX': {'a': 10, 'b': 22, 'd': 18, 'c': 10}, 'AL': {'a': 21, 'd': 26, 'c': 17, 'b': 7}}
So far this is the direction I am looking to go into but I'm not sure when both the keys match, how to create a dictionary that will look like the expected output.
def zips(d, states):
    result = dict()
    for key, value in d.items():
        for keys, values in states.items():
            if key == keys:
                pass  # not sure how to build the result here

zips(d, states)
Using the collections module, for example:
from collections import defaultdict, Counter
d = { 94111: {'a': 5, 'b': 7, 'd': 7},
95413: {'a': 6, 'd': 4},
84131: {'a': 5, 'b': 15, 'c': 10, 'd': 11},
73173: {'a': 15, 'c': 10, 'd': 15},
80132: {'b': 7, 'c': 7, 'd': 7} }
states = {94111: "TX", 84131: "TX", 95413: "AL", 73173: "AL", 80132: "AL"}
result = defaultdict(Counter)
for k,v in d.items():
    if k in states:
        result[states[k]] += Counter(v)
print(result)
Output:
defaultdict(<class 'collections.Counter'>, {'AL': Counter({'d': 26, 'a': 21, 'c': 17, 'b': 7}),
'TX': Counter({'b': 22, 'd': 18, 'a': 10, 'c': 10})})
You can just use defaultdict and count in a loop:
from collections import defaultdict

expected_output = defaultdict(lambda: defaultdict(int))
for postcode, state in states.items():
    for key, value in d.get(postcode, {}).items():
        expected_output[state][key] += value
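The result of the loop above is a defaultdict of defaultdicts. If plain dictionaries are needed, for example to compare against the expected output, one possible way to convert it is sketched below. The names totals and plain are chosen here for illustration:

```python
from collections import defaultdict

d = {94111: {'a': 5, 'b': 7, 'd': 7},
     95413: {'a': 6, 'd': 4},
     84131: {'a': 5, 'b': 15, 'c': 10, 'd': 11},
     73173: {'a': 15, 'c': 10, 'd': 15},
     80132: {'b': 7, 'c': 7, 'd': 7}}
states = {94111: "TX", 84131: "TX", 95413: "AL", 73173: "AL", 80132: "AL"}

totals = defaultdict(lambda: defaultdict(int))
for postcode, state in states.items():
    for key, value in d.get(postcode, {}).items():
        totals[state][key] += value

# Convert the nested defaultdicts into plain dicts.
plain = {state: dict(counts) for state, counts in totals.items()}
print(plain)
```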
As a complement to Rakesh's answer, here is an answer closer to your code:
res = {v:{} for v in states.values()}
for k,v in states.items():
    if k in d:
        sub_dict = d[k]
        output_dict = res[v]
        for sub_k,sub_v in sub_dict.items():
            output_dict[sub_k] = output_dict.get(sub_k, 0) + sub_v
You can use something like this:
d = { 94111: {'a': 5, 'b': 7, 'd': 7},
95413: {'a': 6, 'd': 4},
84131: {'a': 5, 'b': 15, 'c': 10, 'd': 11},
73173: {'a': 15, 'c': 10, 'd': 15},
80132: {'b': 7, 'c': 7, 'd': 7} }
states = {94111: "TX", 84131: "TX", 95413: "AL", 73173: "AL", 80132: "AL"}
out = {i: 0 for i in states.values()}
for key, value in d.items():
    if key in states:
        if not out[states[key]]:
            # copy, so that the inner dicts of d are not modified later
            out[states[key]] = dict(value)
        else:
            for k, v in value.items():
                if k in out[states[key]]:
                    out[states[key]][k] += v
                else:
                    out[states[key]][k] = v
# out -> {'TX': {'a': 10, 'b': 22, 'd': 18, 'c': 10}, 'AL': {'a': 21, 'd': 26, 'c': 17, 'b': 7}}
You can use the class Counter for counting objects:
from collections import Counter
d = { 94111: {'a': 5, 'b': 7, 'd': 7},
95413: {'a': 6, 'd': 4},
84131: {'a': 5, 'b': 15, 'c': 10, 'd': 11},
73173: {'a': 15, 'c': 10, 'd': 15},
80132: {'b': 7, 'c': 7, 'd': 7} }
states = {94111: "TX", 84131: "TX", 95413: "AL", 73173: "AL", 80132: "AL"}
new_d = {}
for k, v in d.items():
    if k in states:
        new_d.setdefault(states[k], Counter()).update(v)
print(new_d)
# {'TX': Counter({'b': 22, 'd': 18, 'a': 10, 'c': 10}), 'AL': Counter({'d': 26, 'a': 21, 'c': 17, 'b': 7})}
You can convert new_d to the dictionary of dictionaries:
for k, v in new_d.items():
    new_d[k] = dict(v)
print(new_d)
# {'TX': {'a': 10, 'b': 22, 'd': 18, 'c': 10}, 'AL': {'a': 21, 'd': 26, 'c': 17, 'b': 7}}
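You can iterate over the key/value pairs from dict's .items() method in a simple one-liner. Note, however, that this does not sum anything: each state keeps only the dictionary of the last zip code seen for it, so the result differs from the expected output: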
new_dict = {value:d[key] for key, value in states.items()}
Output:
{'AL': {'b': 7, 'c': 7, 'd': 7}, 'TX': {'a': 5, 'b': 15, 'c': 10, 'd': 11}}
You might want to reconsider your choice of dict for how to store your data. If you store your data using pandas, aggregation is a lot easier.
import pandas as pd

df = pd.DataFrame(d).transpose()
df['states'] = pd.Series(states)
df.groupby('states').sum()
>> a b c d
>>states
>>AL 21.0 7.0 17.0 26.0
>>TX 10.0 22.0 10.0 18.0

Filter inner keys from 2 level nested dictionaries

I am looking for the most elegant way to get this:
{'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
From this:
{'i_1': {'a': 33, 'b': 55, 't': 4}, 'i_2': {'a': 9, 'b': 11, 't': 0}}
Each inner dict can have a lot of a, b, ..., z keys.
For now I have this:
In [3]: {k:dict(a=d[k]['a'], t=d[k]['t']) for k in d.keys()}
Out[3]: {'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
but it's not very elegant
You can make your code a little bit more readable by using items instead of keys:
{k: dict(a=v['a'], t=v['t']) for k, v in d.items()}
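If the set of keys to keep may change, the same idea can be written with a nested comprehension. This is only a sketch; the name keep is chosen here for illustration, and it assumes every inner dict contains all of the kept keys:

```python
d = {'i_1': {'a': 33, 'b': 55, 't': 4}, 'i_2': {'a': 9, 'b': 11, 't': 0}}

keep = ('a', 't')  # inner keys to retain
filtered = {k: {ik: v[ik] for ik in keep} for k, v in d.items()}
print(filtered)  # {'i_1': {'a': 33, 't': 4}, 'i_2': {'a': 9, 't': 0}}
```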
Here you go. This function takes a dict in the format you specified and a list of keys to remove from the inner dictionaries:
def remove_inner_keys(data: dict, inner_keys_to_remove: list) -> dict:
    result = dict()
    for outer_key in data.keys():
        partial_result = dict()
        for inner_key in data[outer_key]:
            if inner_key not in inner_keys_to_remove:
                partial_result[inner_key] = data[outer_key][inner_key]
        result[outer_key] = partial_result
    return result
Testing:
data = { 'i_1': { 'a': 33, 'b': 55, 't': 4 }, 'i_2': { 'a': 9, 'b': 11, 't': 0 } }
print(str(remove_inner_keys(data, ["b"])))
output:
{'i_2': {'a': 9, 't': 0}, 'i_1': {'a': 33, 't': 4}}
import copy

def foo(d):
    d_copy = copy.deepcopy(d)
    for key in d_copy:
        print(key, d[key])
        if isinstance(d[key], dict):
            foo(d[key])
        if key == 'b':
            d.pop(key)

Pythonic way to group items in a list [duplicate]

This question already has an answer here:
Group list of dictionaries to list of list of dictionaries with same property value
(1 answer)
Closed 8 years ago.
Consider a list of dicts:
items = [
{'a': 1, 'b': 9, 'c': 8},
{'a': 1, 'b': 5, 'c': 4},
{'a': 2, 'b': 3, 'c': 1},
{'a': 2, 'b': 7, 'c': 9},
{'a': 3, 'b': 8, 'c': 2}
]
Is there a pythonic way to extract and group these items by their a field, such that:
result = {
    1 : [{'b': 9, 'c': 8}, {'b': 5, 'c': 4}],
    2 : [{'b': 3, 'c': 1}, {'b': 7, 'c': 9}],
    3 : [{'b': 8, 'c': 2}]
}
References to any similar Pythonic constructs are appreciated.
Use itertools.groupby:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> {k: list(g) for k, g in groupby(items, itemgetter('a'))}
{1: [{'a': 1, 'c': 8, 'b': 9},
{'a': 1, 'c': 4, 'b': 5}],
2: [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 9, 'b': 7}],
3: [{'a': 3, 'c': 2, 'b': 8}]}
If the items are not in sorted order, you can either sort them first and then use groupby, or you can use collections.OrderedDict (if order matters) or collections.defaultdict to do it in O(N) time:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> for item in items:
...     d.setdefault(item['a'], []).append(item)
...
>>> dict(d.items())
{1: [{'a': 1, 'c': 8, 'b': 9},
{'a': 1, 'c': 4, 'b': 5}],
2: [{'a': 2, 'c': 1, 'b': 3},
{'a': 2, 'c': 9, 'b': 7}],
3: [{'a': 3, 'c': 2, 'b': 8}]}
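To illustrate why unsorted input is a problem for groupby, and how a defaultdict avoids it, here is a small sketch with made-up, deliberately unsorted data. It also drops the grouping key from each item, which is the format the question asked for:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

items = [
    {'a': 1, 'b': 9, 'c': 8},
    {'a': 2, 'b': 3, 'c': 1},
    {'a': 1, 'b': 5, 'c': 4},   # 'a' == 1 again, but not adjacent
]

# groupby only merges adjacent items, so key 1 shows up twice.
group_keys = [k for k, _ in groupby(items, itemgetter('a'))]
print(group_keys)  # [1, 2, 1]

# A defaultdict groups correctly regardless of order, in one pass.
grouped = defaultdict(list)
for item in items:
    rest = {k: v for k, v in item.items() if k != 'a'}
    grouped[item['a']].append(rest)
print(dict(grouped))
```

In a dict comprehension over groupby, the second group for key 1 would silently overwrite the first, which is why sorting or a dictionary-based grouping is needed.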
Update:
I see that you only want the keys that were not used for grouping to be returned; for that you'll need to do something like this:
>>> group_keys = {'a'}
>>> # Python 3: d.keys() supports set operations (Python 2 needs d.viewkeys())
>>> {k: [{key: d[key] for key in d.keys() - group_keys} for d in g]
...  for k, g in groupby(items, itemgetter(*group_keys))}
{1: [{'c': 8, 'b': 9},
{'c': 4, 'b': 5}],
2: [{'c': 1, 'b': 3},
{'c': 9, 'b': 7}],
3: [{'c': 2, 'b': 8}]}
Note: This code assumes that the data is already sorted. If it is not, we have to sort it manually.
from itertools import groupby
print({key: list(grp) for key, grp in groupby(items, key=lambda x: x["a"])})
Output
{1: [{'a': 1, 'b': 9, 'c': 8}, {'a': 1, 'b': 5, 'c': 4}],
2: [{'a': 2, 'b': 3, 'c': 1}, {'a': 2, 'b': 7, 'c': 9}],
3: [{'a': 3, 'b': 8, 'c': 2}]}
To get the result in the same format you asked for,
from itertools import groupby
from operator import itemgetter
a_getter, getter, keys = itemgetter("a"), itemgetter("b", "c"), ("b", "c")
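def recon_dicts(items):
    return dict(zip(keys, getter(items)))
# list(...) is needed on Python 3, where map is lazy and each group must be consumed right away
{key: list(map(recon_dicts, grp)) for key, grp in groupby(items, key=a_getter)}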
Output
{1: [{'c': 8, 'b': 9}, {'c': 4, 'b': 5}],
2: [{'c': 1, 'b': 3}, {'c': 9, 'b': 7}],
3: [{'c': 2, 'b': 8}]}
If the data is not sorted already, you can either use the defaultdict method in this answer, or use the sorted function to sort by a, like this:
{key: list(map(recon_dicts, grp))
 for key, grp in groupby(sorted(items, key=a_getter), key=a_getter)}
References:
operator.itemgetter
itertools.groupby
zip, map, dict, sorted

Get max keys of a list of dictionaries

If I have:
dicts = [{'a': 4,'b': 7,'c': 9},
{'a': 2,'b': 1,'c': 10},
{'a': 11,'b': 3,'c': 2}]
How can I get the maximum keys only, like this:
{'a': 11,'c': 10,'b': 7}
Use collections.Counter() objects instead, or convert your dictionaries:
from collections import Counter
result = Counter()
for d in dicts:
    result |= Counter(d)
or even:
from collections import Counter
from functools import reduce  # a builtin on Python 2, must be imported on Python 3
from operator import or_
result = reduce(or_, map(Counter, dicts), Counter())
Counter objects support finding the maximum per key natively through the | operation; & gives you the minimum.
Demo:
>>> result = Counter()
>>> for d in dicts:
...     result |= Counter(d)
...
>>> result
Counter({'a': 11, 'c': 10, 'b': 7})
or using the reduce() version:
>>> reduce(or_, map(Counter, dicts), Counter())
Counter({'a': 11, 'c': 10, 'b': 7})
>>> dicts = [{'a': 4,'b': 7,'c': 9},
... {'a': 2,'b': 1,'c': 10},
... {'a': 11,'b': 3,'c': 2}]
>>> {letter: max(d[letter] for d in dicts) for letter in dicts[0]}
{'a': 11, 'c': 10, 'b': 7}
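The comprehension above takes its keys from dicts[0], so it assumes every dictionary has the same keys. If some keys may be missing, one option is to take the union of all keys first. A sketch with made-up data (the name all_keys is chosen here for illustration):

```python
dicts = [{'a': 4, 'b': 7},
         {'a': 2, 'c': 10},
         {'b': 3, 'c': 2}]

all_keys = set().union(*dicts)  # every key that appears in any dict
result = {k: max(d[k] for d in dicts if k in d) for k in all_keys}
print(result)
```

Unlike the Counter approach, this also keeps keys whose maximum is zero or negative.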
dicts = [{'a': 4,'b': 7,'c': 9},
{'a': 2,'b': 1,'c': 10},
{'a': 11,'b': 3,'c': 2}]
def get_max(dicts):
    res = {}
    for d in dicts:
        for k in d:
            res[k] = max(res.get(k, float('-inf')), d[k])
    return res
>>> get_max(dicts)
{'a': 11, 'c': 10, 'b': 7}
Something like this should work:
dicts = [{'a': 4,'b': 7,'c': 9},
{'a': 2,'b': 1,'c': 10},
{'a': 11,'b': 3,'c': 2}]
max_keys = {}
for d in dicts:
    for k, v in d.items():
        max_keys.setdefault(k, []).append(v)
for k in max_keys:
    max_keys[k] = max(max_keys[k])
