Adding 700 dictionaries by value : too many values to unpack - python

I have a list of 2 lists, each with 700 dictionaries.
Each dictionary has a word count, and I want to combine them, such that values of same keys will be added.
I tried doing :
combine_dicts = collections.defaultdict(int)
for k, v in itertools.chain(x.iteritems() for x in tuple(dicts[0])):
    combine_dicts[k] += v
dicts[0] and dicts[1] are 2 lists of dictionaries.
But it throws the following error:
ValueError: too many values to unpack.
Is there any better way of doing this?

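You misused chain; you wanted chain.from_iterable, which chains the iterables produced by your generator expression. Plain chain(genexpr) just wraps the generator expression as a no-op, so each item it yields is a whole iteritems() iterator rather than a (key, value) pair, and unpacking that into k, v fails: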
for k, v in itertools.chain.from_iterable(x.iteritems() for x in dicts[0]):
That only gets the first list of dicts though; to get both, we need MOAR CHAINING!:
# Qualifying chain over and over is a pain
from itertools import chain
for k, v in chain.from_iterable(x.iteritems() for x in chain(*dicts)):
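Putting the pieces together, here is a minimal sketch of the whole merge. It assumes Python 3, where dict.items() replaces iteritems(), and the contents of dicts are made-up sample data.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample data: two lists of word-count dictionaries.
dicts = [
    [{"a": 1, "b": 2}, {"a": 3}],
    [{"b": 4, "c": 5}],
]

combine_dicts = defaultdict(int)
# chain(*dicts) walks every dictionary in both lists;
# chain.from_iterable then flattens each dictionary's (key, value) pairs.
for k, v in chain.from_iterable(x.items() for x in chain(*dicts)):
    combine_dicts[k] += v

print(dict(combine_dicts))  # {'a': 4, 'b': 6, 'c': 5}
```

Nothing here builds an intermediate list, so the pairs are consumed one at a time.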

combine_dicts = defaultdict(int)
for i in range(0,2):
    for d in dicts[i]:
        for k,v in d.iteritems():
            combine_dicts[k] += v
This visits each dictionary exactly once and builds no intermediate lists, so the only extra memory is the combined dictionary itself.
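Because the values are counts, collections.Counter is another option. This is a swapped-in alternative, not what either answer above uses. A minimal sketch, assuming Python 3 and made-up sample data:

```python
from collections import Counter

# Hypothetical sample data: two lists of word-count dictionaries.
dicts = [
    [{"a": 1, "b": 2}, {"a": 3}],
    [{"b": 4, "c": 5}],
]

combined = Counter()
for dict_list in dicts:
    for d in dict_list:
        # Counter.update adds the counts to the existing ones
        # instead of overwriting them, unlike dict.update.
        combined.update(d)

print(dict(combined))  # {'a': 4, 'b': 6, 'c': 5}
```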

Related

dict comprehension discards elements

from itertools import groupby
from operator import itemgetter
d = [{'k': 'v1'}]
r = ((k, v) for k, v in groupby(d, key=itemgetter('k')))
for k, v in r:
    print(k, list(v)) # v1 [{'k': 'v1'}]
print('---')
r = {k: v for k, v in groupby(d, key=itemgetter('k'))}
for k, v in r.items():
    print(k, list(v)) # v1 []
Seems like some quirk, or am I missing something?
This is a documented part of itertools.groupby:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list
In other words, you need to access the group before getting the next item in the iterator -- in this case, from inside the dict comprehension. To use it in a dict comprehension you need to make the list in the comprehension:
from itertools import groupby
from operator import itemgetter
d = [{'k': 'v1'}]
r = {k: list(v) for k, v in groupby(d, key=itemgetter('k'))}
for k, v in r.items():
    print(k, v) # v1 [{'k': 'v1'}]
In your first example, because you are using a generator expression, you don't actually start iterating the groupby iterator until you start the for loop. However, you would have the same issue if you used a non-lazy list comprehension instead of a generator (i.e. r = [(k, v) for k, v in groupby(d, key=itemgetter('k'))]).
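To make the eager case concrete, here is a small sketch comparing a list comprehension that keeps the raw group iterators with one that copies each group into a list right away. The three-item sample data is made up, and the empty groups shown are what CPython 3.7 and later produce.

```python
from itertools import groupby
from operator import itemgetter

d = [{'k': 'v1'}, {'k': 'v1'}, {'k': 'v2'}]

# Eager: the comprehension advances groupby to the end before
# any group is read, so every stored group iterator is already spent.
eager = [(k, v) for k, v in groupby(d, key=itemgetter('k'))]
eager_result = [(k, list(v)) for k, v in eager]

# Each group is copied into a list before groupby moves on.
stored_result = [(k, list(v)) for k, v in groupby(d, key=itemgetter('k'))]

print(eager_result)   # [('v1', []), ('v2', [])]
print(stored_result)  # [('v1', [{'k': 'v1'}, {'k': 'v1'}]), ('v2', [{'k': 'v2'}])]
```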
Why does it work this way?
Preserving lazy iteration is the motivating idea behind itertools. Because it is dealing with (possibly large, or infinite) iterators, it never wants to store any values in memory. It just calls next() on the underlying iterator and does something with that value. Once you've called next() you can't go back to earlier values (without storing them, which itertools doesn't want to do).
With groupby it's easier to see with an example. Here is a simple generator that makes alternating ranges of positive and negative numbers and a groupby iterator that groups them:
def make_groups():
    i = 1
    while True:
        for n in range(1, 10):
            print("yielding: ", n*i)
            yield n * i
        i *= -1
g = make_groups()
grouper = groupby(g, key=lambda x: x>0)
make_groups prints a line each time next() is called, before yielding the value, to help show what's happening. When we call next() on grouper, this results in a next() call to g and gets our first group and value:
> k, gr = next(grouper)
yielding: 1
Now each next() call on gr results in a next() call to the underlying g as you can see from the print:
> next(gr)
1 # already have this value from the initial next(grouper)
> next(gr)
yielding: 2 # gets the next value and clicks the underlying generator to the next yield
2
Now look what happens if we call next() on grouper to get the next group:
> next(grouper)
yielding: 3
yielding: 4
yielding: 5
yielding: 6
yielding: 7
yielding: 8
yielding: 9
yielding: -1
groupby iterated through the generator until it hit a value that changed the key. Those values have already been yielded by g, so we can no longer get the next value of gr (i.e. 3) unless we had somehow stored all those values or teed off the underlying generator g into two independent iterators. Neither of these is a good default (especially since the point of itertools is to avoid storing values), so it is left up to you; but you need to store the values before something causes next(grouper) to be called and advances the generator past the values you wanted.
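As a sketch of storing the values in time, the loop below copies each group into a list before asking groupby for the next one. It uses a shortened, print-free variant of make_groups (ranges of 3 instead of 9), and islice limits the infinite generator to three groups; both changes are assumptions made for the example.

```python
from itertools import groupby, islice

def make_groups():
    # Shortened variant of the generator above: 1, 2, 3, -1, -2, -3, 1, ...
    i = 1
    while True:
        for n in range(1, 4):
            yield n * i
        i *= -1

stored = []
grouper = groupby(make_groups(), key=lambda x: x > 0)
for is_positive, group in islice(grouper, 3):
    # list(group) drains the group before the loop advances grouper.
    stored.append((is_positive, list(group)))

print(stored)
# [(True, [1, 2, 3]), (False, [-1, -2, -3]), (True, [1, 2, 3])]
```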

Splitting a dictionary by key suffixes

I have a dictionary like so
d = {"key_a":1, "anotherkey_a":2, "key_b":3, "anotherkey_b":4}
So the values and key names are not important here. The key (no pun intended) thing is that related keys share the same suffix; in my example above these are _a and _b.
These suffixes are not known beforehand (they are not always _a and _b, for example), and there is an unknown number of different suffixes.
What I would like to do, is to extract out related keys into their own dictionaries, and have all generated dictionaries in a list.
The output from above would be
output = [{"key_a":1, "anotherkey_a":2},{"key_b":3, "anotherkey_b":4}]
My current approach is to first get all the suffixes, and then generate the sub-dicts one at a time and append to the new list
output = list()
# Generate a set of suffixes
suffixes = set([k.split("_")[-1] for k in d.keys()])
# Create the subdict and append to output
for suffix in suffixes:
    output.append({k:v for k,v in d.items() if k.endswith(suffix)})
This works (and is not prohibitively slow or anything), but I am simply wondering if there is a more elegant way to do it with a list or dict comprehension? Just out of interest...
Make your output a defaultdict rather than a list, with suffixes as keys:
from collections import defaultdict
output = defaultdict(dict)
for k, v in d.items():
    prefix, suffix = k.rsplit('_', 1)
    output[suffix][k] = v
This will split your dict in a single pass and result in something like:
output = {"a" : {"key_a":1, "anotherkey_a":2}, "b": {"key_b":3, "anotherkey_b":4}}
and if you insist on converting it to a list, you can simply use:
output = list(output.values())
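Here is a self-contained sketch of the single-pass approach wrapped in a function. The name split_by_suffix and the sep parameter are made up for the example, and str.rpartition is used instead of rsplit so that a key without a separator does not raise on unpacking.

```python
from collections import defaultdict

def split_by_suffix(d, sep="_"):
    """Group the items of d by the text after the last sep in each key."""
    groups = defaultdict(dict)
    for key, value in d.items():
        # rpartition always returns three parts; for a key without sep,
        # the third part is the whole key, so that key forms its own group.
        suffix = key.rpartition(sep)[2]
        groups[suffix][key] = value
    return list(groups.values())

d = {"key_a": 1, "anotherkey_a": 2, "key_b": 3, "anotherkey_b": 4}
output = split_by_suffix(d)
print(output)
# [{'key_a': 1, 'anotherkey_a': 2}, {'key_b': 3, 'anotherkey_b': 4}]
```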
You could condense the lines
output = list()
for suffix in suffixes:
    output.append({k:v for k,v in d.items() if k.endswith(suffix)})
to a list comprehension, like this
[{k:v for k,v in d.items() if k.endswith(suffix)} for suffix in suffixes]
Whether it is more elegant is probably in the eyes of the beholder.
The approach suggested by @Błotosmętek will probably be faster though, given a large dictionary, since it makes a single pass over the dictionary instead of one pass per suffix.
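If a comprehension is the goal, one more option is to sort the items by suffix and group them with itertools.groupby. This is an alternative sketch, not something the answers above propose; suffix_of is a hypothetical helper, and the sort makes it O(n log n) rather than a single pass.

```python
from itertools import groupby

d = {"key_a": 1, "anotherkey_a": 2, "key_b": 3, "anotherkey_b": 4}

def suffix_of(item):
    # item is a (key, value) pair; the suffix is the text after the last "_".
    return item[0].rsplit("_", 1)[-1]

# groupby only merges adjacent items, so the items are sorted by suffix first.
# dict(group) consumes each group before groupby advances to the next one.
output = [
    dict(group)
    for _, group in groupby(sorted(d.items(), key=suffix_of), key=suffix_of)
]

print(output)
# [{'key_a': 1, 'anotherkey_a': 2}, {'key_b': 3, 'anotherkey_b': 4}]
```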
def sub_dictionary_by_suffix(dictionary, suffix):
    sub_dictionary = {k: v for k, v in dictionary.items() if k.endswith(suffix)}
    return sub_dictionary
I hope it helps

Use A Single Line To Loop Two Dictionaries To Create List

I have looked at several similar threads but have been unable to find a workable solution. I am looping over two dictionaries, trying to create a single list from the values of one based on the keys of the other. I have it working with for loops, but I would like to use a single line if possible. My code with for loops:
for k, v in dict1.items():
    for value in dict2[k]:
        temp.append(value)
After the first pass through the loop, the temp list is, as expected:
[16,18,20,22,24,26]
I then use min to get the min value of the list. Now I want to condense the for loop to a one liner. I have put together
temp=[dict2.values() for k in dict1.keys() if k in dict2.keys()]
When executed, instead of temp being a single list for the key k that exists in dict1, I get a list of lists containing all the values from dict2.
[[16,18,20,22,24,26], [12,16,18,20,22,24], [16,18,22,26,30,32]]
It seems to be ignoring the if statement. I know my dict1 has only 1 key in this situation, and I know that key exists in dict2. Is my one-liner wrong?
Input Values for dictionaries:
dict1={'table1':[16,18,20,22,24,26]}
dict2={'table1':[16,18,20,22,24,26],'table2': [12,16,18,20,22,24], 'table3': [16,18,22,26,30,32]}
You can iterate through one dictionary checking for matching keys and create a list of lists. Use chain.from_iterable to flatten list and call min():
from itertools import chain
dict1 = {'table1': [16,18,20,22,24,26]}
dict2 = {'table1': [16,18,20,22,24,26], 'table2': [12,16,18,20,22,24], 'table3': [16,18,22,26,30,32]}
temp = [dict2[k] for k in dict1 if k in dict2]
print(min(chain.from_iterable(temp)))
# 16
The reason why your list comprehension does not work:
It looks like dict2 has 3 key-value pairs, and the values are [16,18,20,22,24,26], [12,16,18,20,22,24] and [16,18,22,26,30,32]. What you're doing in your list comprehension translates to
for k in dict1.keys():
    if k in dict2.keys():
        temp.append(dict2.values())
So if dict1 has, let's say, 3 keys, this for loop will repeat 3 times. Because, as you said in a comment above, only one key is shared between dict1 and dict2, the if statement is True only once, so dict2.values(), i.e. all of dict2's value lists, is appended to temp once. What you want to do, if I got that right, is to append the items INSIDE one of the values of dict2, namely the one assigned to the key that the two dicts share. Your idea was pretty close; you just have to index dict2 by k and loop over that list. As a one-liner (the for clauses run left to right, so k must be bound before dict2[k] is used), it would look like this:
temp = [x for k in dict1.keys() if k in dict2.keys() for x in dict2[k]]
or, differently:
temp = [dict2[k] for k in set(dict1.keys()).intersection(set(dict2.keys()))]
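In a comprehension with several for clauses, the clauses run left to right exactly like nested loops, so the clause that binds k has to come before any clause that uses dict2[k]. A minimal sketch with the input values from the question, showing the nested loops and the equivalent flat one-liner side by side (temp_loop is a name made up for the comparison):

```python
dict1 = {'table1': [16, 18, 20, 22, 24, 26]}
dict2 = {
    'table1': [16, 18, 20, 22, 24, 26],
    'table2': [12, 16, 18, 20, 22, 24],
    'table3': [16, 18, 22, 26, 30, 32],
}

# Nested-loop form.
temp_loop = []
for k in dict1:
    if k in dict2:
        for x in dict2[k]:
            temp_loop.append(x)

# Same clauses in the same order, written as one comprehension.
temp = [x for k in dict1 if k in dict2 for x in dict2[k]]

print(temp)       # [16, 18, 20, 22, 24, 26]
print(min(temp))  # 16
```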
You can use the operator itemgetter():
from operator import itemgetter
from itertools import chain
dict1 = {'table1': [16,18,20,22,24,26], 'table2': [12,16,18,20,22,24]}
dict2 = {'table1': [16,18,20,22,24,26], 'table2': [12,16,18,20,22,24], 'table3': [16,18,22,26,30,32]}
common_keys = set(dict1).intersection(dict2)
sublists = itemgetter(*common_keys)(dict2)
if len(common_keys) == 1:
    max_val = max(sublists)
else:
    max_val = max(chain.from_iterable(sublists))
print(max_val)
# 26
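The len(common_keys) == 1 branch is needed because itemgetter changes its return shape: called with one key it returns the item itself, and with several keys it returns a tuple of items. A short sketch with trimmed-down sample data (the names one and two are made up):

```python
from operator import itemgetter

dict2 = {'table1': [16, 18], 'table2': [12, 16]}

# One key: the value itself comes back, not a 1-tuple.
one = itemgetter('table1')(dict2)
# Several keys: a tuple of values comes back.
two = itemgetter('table1', 'table2')(dict2)

print(one)  # [16, 18]
print(two)  # ([16, 18], [12, 16])
```

Note that itemgetter needs at least one argument, so the answer's code would raise a TypeError if the two dictionaries shared no keys at all.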

Efficiently filtering a dictionary in-place

We have a dictionary d1 and a condition cond. We want d1 to contain only the values that satisfy the condition cond. One way to do it is:
d1 = {k:v for k,v in d1.items() if cond(v)}
But, this creates a new dictionary, which may be very memory-inefficient if d1 is large.
Another option is:
for k,v in d1.items():
    if not cond(v):
        d1.pop(k)
But, this modifies the dictionary while it is iterated upon, and generates an error: "RuntimeError: dictionary changed size during iteration".
What is the correct way in Python 3 to filter a dictionary in-place?
If only a few keys have values that fail the condition, you can first collect those keys and then prune the dictionary:
for k in [k for k,v in d1.items() if not cond(v)]:
    del d1[k]
If the list [k for k,v in d1.items() if not cond(v)] would be too large, you can process the dictionary in rounds: collect failing keys until their count reaches a threshold, delete them, and repeat until no key fails the condition:
from itertools import islice
def prune(d, cond, chunk_size = 1000):
    # Delete at most chunk_size failing keys per round, until none remain.
    change = True
    while change:
        change = False
        keys = list(islice((k for k,v in d.items() if not cond(v)), chunk_size))
        for k in keys:
            change = True
            del d[k]
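A runnable usage sketch of the chunked idea. The helper name keep_only and its keep parameter are made up for this example; it deletes every entry whose value fails the predicate, and the tiny chunk_size is only there to force several rounds.

```python
from itertools import islice

def keep_only(d, keep, chunk_size=1000):
    """Delete, in place, every entry of d whose value fails keep(value)."""
    while True:
        # The key list is fully built before any deletion happens,
        # so the dictionary is never resized while it is being iterated.
        doomed = list(islice((k for k, v in d.items() if not keep(v)), chunk_size))
        if not doomed:
            break
        for k in doomed:
            del d[k]

d1 = {n: n for n in range(10)}
same_object = d1
keep_only(d1, lambda v: v % 2 == 0, chunk_size=3)

print(d1)                 # {0: 0, 2: 2, 4: 4, 6: 6, 8: 8}
print(d1 is same_object)  # True
```

Each round rescans the dictionary from the start, so a smaller chunk_size trades extra passes for a smaller temporary key list.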

Returning unique elements from values in a dictionary

I have a dictionary like this :
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
How would you return a new dictionary with the elements that are not contained in the list under the highest key?
In this case :
d2 = {'v02':['elem_D'],'v01':["elem_E"]}
Thank you,
I prefer to do differences with the builtin data type designed for it: sets.
It is also preferable to write loops rather than elaborate comprehensions. One-liners are clever, but code that you can return to later and still understand is even better.
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
last = None
d2 = {}
for key in sorted(d.keys()):
    if last:
        if set(d[last]) - set(d[key]):
            d2[last] = sorted(set(d[last]) - set(d[key]))
    last = key
print(d2)
{'v01': ['elem_E'], 'v02': ['elem_D']}
from collections import defaultdict
myNewDict = defaultdict(list)
# sorted(d) works on both Python 2 and 3; d.keys().sort() is Python 2 only
all_keys = sorted(d)
max_value = all_keys[-1]
for key in d:
    if key != max_value:
        for value in d[key]:
            if value not in d[max_value]:
                myNewDict[key].append(value)
You can get fancier with set operations by taking the set difference between the values in d[max_value] and each of the other keys but first I think you should get comfortable working with dictionaries and lists.
defaultdict(<type 'list'>, {'v01': ['elem_E'], 'v02': ['elem_D']})
One reason not to use sets is that the solution does not generalize well, because sets can only hold hashable objects. If your values are lists of lists, the members (sublists) are not hashable, so you can't use set operations.
Depending on your python version, you may be able to get this done with only one line, using dict comprehension:
>>> d2 = {k:[v for v in values if not v in d.get(max(d.keys()))] for k, values in d.items()}
>>> d2
{'v01': ['elem_E'], 'v02': ['elem_D'], 'v03': []}
This builds a copy of dict d in which every list is stripped of the items stored under the max key. The resulting dict looks more or less like what you are going for.
If you don't want the empty list at key v03, filter the result with another dict comprehension:
>>> {k:v for k,v in d2.items() if len(v) > 0}
{'v01': ['elem_E'], 'v02': ['elem_D']}
EDIT:
In case your original dict has a very large keyset [or said operation is required frequently], you might also want to replace the expression d.get(max(d.keys())) with a list assigned to a variable beforehand; as written, the expression is re-evaluated for every element, and the timing below suggests it is not pre-computed. On my machine this roughly halves the run time: the following runs 100,000 times in 1.5 secs, whereas the unsubstituted expression takes more than 3 seconds.
>>> bl = d.get(max(d.keys()))
>>> d2 = {k:v for k,v in {k:[v for v in values if not v in bl] for k, values in d.items()}.items() if len(v) > 0}
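The same hoisting idea written as a plain loop, as a sketch for Python 3. Converting the reference list to a set is an extra step not in the answer above and assumes the elements are hashable; max_key, reference and unique are names made up for the example.

```python
d = {
    'v03': ["elem_A", "elem_B", "elem_C"],
    'v02': ["elem_A", "elem_D", "elem_C"],
    'v01': ["elem_A", "elem_E"],
}

max_key = max(d)
# Computed once, outside the loop; a set makes each membership
# test O(1) on average. Assumes the elements are hashable.
reference = set(d[max_key])

d2 = {}
for key, values in d.items():
    if key == max_key:
        continue
    # Keep the original order of the elements that survive.
    unique = [v for v in values if v not in reference]
    if unique:
        d2[key] = unique

print(d2)  # {'v02': ['elem_D'], 'v01': ['elem_E']}
```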
