I would like to make a union of all dictionary values, which in this case are sets. I only get the expected result if there are exactly two dictionaries in the input list.
Two dictionaries in the input list produce the expected result:
>>> reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}])
set([1, 2, 3, 4])
Three dictionaries in the input list produce a TypeError.
Expected result: set([1, 2, 3, 4, 5, 6])
>>> reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
Traceback (most recent call last):
File "<input>", line 1, in <module>
reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
File "<input>", line 1, in <lambda>
reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
TypeError: 'set' object has no attribute '__getitem__'
One dictionary in the input list produces a dictionary instead of a set.
Expected result: set([1, 2])
>>> reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}])
{'a': set([1, 2])}
An empty input list also produces a different TypeError.
Expected result: set([])
>>> reduce((lambda x, y: x['a'] | y['a']), [])
Traceback (most recent call last):
File "<input>", line 1, in <module>
reduce((lambda x, y: x['a'] | y['a']), [])
TypeError: reduce() of empty sequence with no initial value
I need help with understanding what I'm doing wrong and why these results are produced.
TLDR:
The reduce(function, iterable) call repeatedly applies function to the previous result and the next element of the iterable. That means the return type of function must be a valid input type!
In your case, function expects dicts but produces a set. Since it is not possible to call x['y'] on a set, a TypeError is raised.
When iterable has only two elements, function is applied only once and only to these elements. The problem of the return type of function not being a valid input type is thus never encountered.
You must first map from dict to set, then reduce the sets.
from functools import reduce  # a builtin on Python 2; in functools on Python 3

reduce(lambda x, y: x | y,                                                    # merge via reduce
       map(lambda x: x['a'], [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]))  # convert via map
Why reduce fails in some cases
Calling reduce(function, iterable) performs the equivalent of this code:
def reduce(function, iterable, start=None):
    iterator = iter(iterable)
    result = next(iterator) if start is None else start  # 1.
    for element in iterator:
        result = function(result, element)               # 2.
    return result
This leads to several cases:
1. iterable has one element and start is not set
   - result is the first element of iterable (1.)
   - function is never called; its return and input types are inconsequential
2. iterable has two elements and start is not set
   - result is the first element of iterable (1.)
   - function is called on the first element and the next element (2.)
   - function never receives its own result; its return type is meaningless
3. iterable has more than two elements and start is not set
   - result is the first element of iterable (1.)
   - function is called on the first element and the next element (2.)
   - function is then called on the previous result and the next element (2.)
   - function receives its own result; its return type and input type must match
4. iterable is empty or not empty and start is set
   - same as the cases above, as if start were the first element of iterable
5. iterable is empty and start is not set
   - result cannot be set and a TypeError is raised (1.)
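To make these cases concrete, here is a quick illustration with a type-compatible function (int plus int gives int; the add name is mine):

from functools import reduce  # a builtin on Python 2

add = lambda x, y: x + y
reduce(add, [1])        # 1         -- case 1: function is never called
reduce(add, [1, 2])     # 3         -- case 2: one call, add(1, 2)
reduce(add, [1, 2, 3])  # 6         -- case 3: add(add(1, 2), 3)
reduce(add, [], 0)      # 0         -- case 4: start stands in for the first element
reduce(add, [])         # TypeError -- case 5: nothing to start from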
In your case, that is:
Two dictionaries is case 2 and works as expected.
Three dictionaries is case 3 and chokes on the incompatible input and return types.
An empty input list is case 5 and fails on the missing input - as expected.
How to do it instead
map/reduce
Your reduce is actually doing two things at once: it converts/extracts each element individually, then merges all the results. That is a classical map/reduce task: one operation for each element, one operation across all elements.
You can directly split this up into two separate operations with the map and reduce builtins:
sets = map(lambda x: x['a'], [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
result = reduce(lambda x, y: x | y, sets)
Of course, you can also nest the two expressions directly.
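For example, nested directly (same data as above):

result = reduce(lambda x, y: x | y,
                map(lambda x: x['a'], [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]))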
comprehension/reduce
The map portion can be expressed using a comprehension.
sets = (x['a'] for x in [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
result = reduce(lambda x, y: x | y, sets)
comprehension/assignment
In Python 3.8+, you can use an assignment expression in place of the reduce as well.
result = set()
[result := result | x['a'] for x in [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]]
# result == {1, 2, 3, 4, 5, 6}
Use a for loop
Just, you know, write it out.
result = set()
for element in [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]:
    result |= element['a']
The output of the function passed to reduce must be of the same type as the items in the iterator, so that it can keep aggregating the item values with the same function.
In your case, the output of lambda x, y: x['a'] | y['a'] is a set such as {1, 2, 3, 4}. When reduce tries to aggregate the third item {'a': {5, 6}} with {1, 2, 3, 4}, it fails, because the lambda treats both x and y as dicts and tries to subscript each with the key 'a', which a set does not support.
As for the TypeError: reduce() of empty sequence with no initial value exception, you just need to provide reduce with an initial value as the third argument. In your case that should be an empty set, set() (note that {} creates an empty dict, not an empty set), but you first need to drop the idea of passing it a list of dicts and pass it a list of sets instead.
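Putting both fixes together, a minimal sketch:

from functools import reduce  # a builtin on Python 2

dicts = [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]
sets = [d['a'] for d in dicts]                    # first map the dicts to sets
result = reduce(lambda x, y: x | y, sets, set())  # set() covers the empty case
# result == {1, 2, 3, 4, 5, 6}; with dicts == [], result == set()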
reduce works iteratively, it will apply a reducing aggregation across items of a sequence. For example, given elements i, j and k, together with function foo, it will process foo(foo(i, j), k).
In your example, foo(i, j) works fine, giving a set, but the outer call fails because the result, being a set, does not have the key 'a'. The subscript syntax [] calls __getitem__ behind the scenes, which is why the error mentions that method.
What can you do about it?
A trivial hack is to have your function output a dictionary, and then access its only value directly. This ensures that your function always outputs a dictionary with the key 'a'.
reduce((lambda x, y: {'a': x['a'] | y['a']}),
       [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])['a']
# {1, 2, 3, 4, 5, 6}
More readable, you can define a named function:
def foo(x, y):
    return {'a': x['a'] | y['a']}
L = [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]
reduce(foo, L)['a']
Related
I need to append the values from one dictionary (N) to another (M) - pseudocode below
if x in M:
    M[x] = M[x] + N[x]
else:
    M[x] = N[x]
Doing this for every key in N seems quite untidy coding.
What would be the most efficient way to achieve this?
Of course you should already be iterating over your keys as x, but a single-line solution is:
M.update({key:((M[key] + value) if key in M else value) for key, value in N.items()})
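For instance, with list values (sample data of my own):

M = {'a': [1], 'b': [2]}
N = {'a': [3], 'c': [4]}
M.update({key: ((M[key] + value) if key in M else value) for key, value in N.items()})
# M == {'a': [1, 3], 'b': [2], 'c': [4]}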
I'm not entirely sure what your x is (I'm guessing the keys of both M and N); if so, this might work:
M = {key: M.get(key, 0) + N.get(key, 0) for key in set((*M, *N))}
for the example:
M = {'a': 1, 'c': 3, 'e': 5}
N = {'a': 2, 'b': 4, 'c': 6, 'd': 8}
you get:
print(M) # {'a': 3, 'e': 5, 'd': 8, 'c': 9, 'b': 4}
or please clarify what the desired output for the given example would be.
When you say "append", I assume that means that the values in your dicts are lists. However the techniques below can easily be adapted if they're simple objects like integers or strings.
Python doesn't provide any built-in methods for handling dicts of lists, and the most efficient way to do this depends on the data, because some ways work best when there is a high proportion of shared keys, while other ways are better when there aren't many shared keys.
If the proportion of shared keys isn't too high, this code is reasonably efficient:
m = {1: [1, 2], 2: [3]}
n = {1: [4], 3: [5]}

for k, v in n.items():
    m.setdefault(k, []).extend(v)

print(m)
output
{1: [1, 2, 4], 2: [3], 3: [5]}
You can make this slightly faster by caching the .setdefault method lookup (note that the empty list must be created fresh on each call; caching a single list object would wrongly share it between missing keys):
def merge(m, n):
    setdef = m.setdefault
    for k, v in n.items():
        setdef(k, []).extend(v)
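Used like the loop above:

m = {1: [1, 2], 2: [3]}
merge(m, {1: [4], 3: [5]})
# m == {1: [1, 2, 4], 2: [3], 3: [5]}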
If you expect a high proportion of shared keys, then it's better to perform set operations on the keys (in Python 3, dict.keys() returns a set-like view object, which is extremely efficient to construct), and handle the shared keys separately from the keys unique to N.
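A minimal sketch of that approach, assuming Python 3 and list values (the merge_shared name and the copying of n's lists are my choices):

def merge_shared(m, n):
    for k in m.keys() & n.keys():  # shared keys: set intersection on key views
        m[k].extend(n[k])
    for k in n.keys() - m.keys():  # keys unique to n
        m[k] = list(n[k])          # copy so m and n don't share list objects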
When I execute the following:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
c = list(b)
c.extend(a.values())
print c
It prints out [2, 4, 6, 1, 5, 3] as I expected, but when I try to do the list copy and extension in one line:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
d = list(b).extend(a.values())
print d
It prints None. Why are these different?
Because list.extend() does not return the modified list; it operates on the list in place.
I guess you want to reuse d. If you want to create a new list d that holds the extended result, in one line, try:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
d = list(b) + list(a.values())
print d
Two points to note:
in Python 3, dictionary.values() returns a view object, not a plain list
values / keys in a plain Python dictionary are arbitrarily ordered
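For example, under Python 3 (my own quick check; dicts keep insertion order only on 3.7+):

>>> a = {'a': 1, 'b': 3, 'c': 5}
>>> a.values()
dict_values([1, 3, 5])
>>> list(a.values())
[1, 3, 5]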
What you assign to the variable d is the return value of list(b).extend(...). This method does not return anything; it extends the existing mutable object in place. The list created by list(b) is modified, but since you did not keep a reference to it, the statement has no visible effect.
You don't even need to make the variable d. extend() will extend the list in place so just use this:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
b.extend(a.values())
print b
This will give you the desired output: [2, 4, 6, 5, 3, 1]
EDIT:
According to the OP, b should remain unchanged. All you need to do is make a copy of b and then extend that copy. You can do that like this:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
c = b[:]
c.extend(a.values())
print c
Long-hand, this is how it would look:
class TestClass(object):
    def f(num):
        """In general, a complicated function."""
        return num

    self.a = f(1)
    self.b = f(2)
    self.c = f(3)
    self.d = f(4)
    self.e = f(5)
I'm thinking dictionary methods could help, but how?
As you said, you'd be better off using a dictionary. As a more Pythonic approach, you can use a dictionary comprehension, and enumerate to create a sequence of keys based on each item's index:
>>> my_dict = {'a{}'.format(i): f(j) for i, j in enumerate([3, 4, 5, 1, 2])}
>>> my_dict
{'a1': 4, 'a0': 3, 'a3': 1, 'a2': 5, 'a4': 2}
And to access each value you can use simple indexing:
>>> my_dict['a3']
1
Also, if you want to use custom names for your keys, you can use the zip function to pair the variable names with the values, then use it within a dict comprehension:
>>> var_names=['a','b','c','d','e']
>>> values=[1,2,3,4,5]
>>>
>>> my_dict = {i:f(j) for i,j in zip(var_names,values)}
>>> my_dict
{'a': 1, 'c': 3, 'b': 2, 'e': 5, 'd': 4}
You're going in the wrong direction - if you want to assign several references based on the same function, you should be storing them in a data structure like a list instead of in discrete, manually-entered variables. You can unpack them like that later if you want, but you should start with a data structure. It then becomes easier to map() each value in an iterable to this function, and then turn it into a list.
def f(num):
    """In general, a complicated function."""
    return num

my_numbers = list(map(f, range(1, 6)))
Your numbers were a tidy range this time so I just used a range() object, but you can use any iterable you like, such as [1, 2, 3] or (4, 2, 3).
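If you then really want discrete names, you can unpack afterwards:

a, b, c, d, e = my_numbers  # only if you genuinely need separate variables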
I have a list
In [4]: a = [1, 2, 3, 3, 2, 4]
from which I would like to remove duplicates via a comprehension using a sentinel list (see below why):
In [8]: [x if x not in seen else seen.append(x) for x in a]
Out[8]: [1, 2, 3, 3, 2, 4]
It seems that seen is not taken into account (neither updated nor checked). Why is that?
As for the reason why using a convoluted method: The list I have is of the form
[{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
and I want to remove duplicates based on the value of a specific key (b in the case above), leaving [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}] (I do not care which dict is removed). The idea would be to build a sentinel list with the values of b and keep only the dicts whose b is not equal to any element in that sentinel list.
Since x is not in seen, you are never adding it to seen either; the else branch is not executed when x not in seen is true.
However, you are using a conditional expression: it always produces a value, either x or the result of seen.append() (which is None), so you are not filtering, you are mapping here.
If you wanted to filter, move the test to an if section after the for loop:
seen = set()
[x for x in a if not (x in seen or seen.add(x))]
Since you were using seen.append() I presume you were using a list; I switched you to a set() instead, as membership tests are way faster using a set.
So x is excluded only if x in seen is true (we have already seen it) or if seen.add(x) returned a true value; since seen.add() always returns None, that second test never excludes anything, it merely records x as seen. Yes, this works, if only a little convoluted.
Demo:
>>> a = [1, 2, 3, 3, 2, 4]
>>> seen = set()
>>> [x for x in a if not (x in seen or seen.add(x))]
[1, 2, 3, 4]
>>> seen
set([1, 2, 3, 4])
Applying this to your specific problem:
>>> a = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> seen = set()
>>> [entry for entry in a if not (entry['b'] in seen or seen.add(entry['b']))]
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
You never execute the else part of the conditional expression, because it only runs when x is already in seen, and nothing is ever added to seen. You could do this:
[seen.append(x) or x for x in lst if x not in seen]
This works because or returns its last operand when needed: seen.append(x) always returns None (falsy), so the or moves on and yields x, updating seen as a side effect.
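A quick demo on the list from the question:

>>> lst = [1, 2, 3, 3, 2, 4]
>>> seen = []
>>> [seen.append(x) or x for x in lst if x not in seen]
[1, 2, 3, 4]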
Maybe you can use the fact that dict keys form a set for this. In a dict comprehension the last write for a key wins, so iterating over reversed(lst) keeps the first item with each 'b' value (drop reversed to keep the last item instead):
>>> lst = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> filtered = {item['b']: item for item in reversed(lst)}
>>> filtered.values()
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
This uses 'b' as the key to map a value to, so only a single element can be mapped to a value of 'b', which effectively creates a set over 'b'.
note: this will return the values in arbitrary order. To fix that nicely, for big datasets, I'd create another mapping, of each object to its index in the original list (O(n)), and use that mapping as the sort key for the final result (O(n*log(n))). That's beyond the scope of this answer.
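For the curious, a rough sketch of that idea (my own code; id() is used as the mapping key because dicts are not hashable):

lst = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
order = {id(item): i for i, item in enumerate(lst)}  # object -> original index, O(n)
filtered = {item['b']: item for item in reversed(lst)}
result = sorted(filtered.values(), key=lambda item: order[id(item)])  # O(n*log(n))
# [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]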
I'm always queasy making use of operator precedence as execution flow control. I feel that the below is marginally more explicit and palatable, although it does carry the additional cost of tuple creation.
b_values = set()
[(item, b_values.add(item['b']))[0] for item in original_list
if item['b'] not in b_values]
But really when you're maintaining/updating some sort of state, I think the best format is the simple for-loop:
output_list = []
b_values = set()

for item in original_list:
    if item['b'] not in b_values:
        output_list.append(item)
        b_values.add(item['b'])
I'm trying to wrap my brain around this but it's not flexible enough.
In my Python script I have a dictionary of dictionaries of lists. (Actually it gets a little deeper but that level is not involved in this question.) I want to flatten all this into one long list, throwing away all the dictionary keys.
Thus I want to transform
{1: {'a': [1, 2, 3], 'b': [0]},
2: {'c': [4, 5, 1], 'd': [3, 8]}}
to
[1, 2, 3, 0, 4, 5, 1, 3, 8]
I could probably set up a map-reduce to iterate over items of the outer dictionary to build a sublist from each subdictionary and then concatenate all the sublists together.
But that seems inefficient for large data sets, because of the intermediate data structures (sublists) that will get thrown away. Is there a way to do it in one pass?
Barring that, I would be happy to accept a two-level implementation that works... my map-reduce is rusty!
Update:
For those who are interested, below is the code I ended up using.
Note that although I asked above for a list as output, what I really needed was a sorted list; i.e. the output of the flattening could be any iterable that can be sorted.
def genSessions(d):
    """Given the ipDict, return an iterator that provides all the sessions,
    one by one, converted to tuples."""
    for uaDict in d.itervalues():
        for sessions in uaDict.itervalues():
            for session in sessions:
                yield tuple(session)
...
# Flatten dict of dicts of lists of sessions into a list of sessions.
# Sort that list by start time
sessionsByStartTime = sorted(genSessions(ipDict), key=operator.itemgetter(0))
# Then make another copy sorted by end time.
sessionsByEndTime = sorted(sessionsByStartTime, key=operator.itemgetter(1))
Thanks again to all who helped.
[Update: replaced nthGetter() with operator.itemgetter(), thanks to @intuited.]
I hope you realize that any order you see in a dict is accidental -- it's there only because, when shown on screen, some order has to be picked, but there's absolutely no guarantee.
Net of ordering issues among the various sublists getting catenated,
[x for d in thedict.itervalues()
   for alist in d.itervalues()
   for x in alist]
does what you want without any inefficiency or intermediate lists.
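For instance, with the dictionary from the question (group order may vary, since dict ordering is accidental):

>>> thedict = {1: {'a': [1, 2, 3], 'b': [0]},
...            2: {'c': [4, 5, 1], 'd': [3, 8]}}
>>> [x for d in thedict.itervalues() for alist in d.itervalues() for x in alist]
[1, 2, 3, 0, 4, 5, 1, 3, 8]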
edit: re-read the original question and reworked answer to assume that all non-dictionaries are lists to be flattened.
In cases where you're not sure how far down the dictionaries go, you would want to use a recursive function. @Arrieta has already posted a function that recursively builds a list of non-dictionary values.
This one is a generator that yields successive non-dictionary values in the dictionary tree:
def flatten(d):
    """Recursively flatten dictionary values in `d`.

    >>> hat = {'cat': ['images/cat-in-the-hat.png'],
    ...        'fish': {'colours': {'red': [0xFF0000], 'blue': [0x0000FF]},
    ...                 'numbers': {'one': [1], 'two': [2]}},
    ...        'food': {'eggs': {'green': [0x00FF00]},
    ...                 'ham': ['lean', 'medium', 'fat']}}
    >>> set_of_values = set(flatten(hat))
    >>> sorted(set_of_values)
    [1, 2, 255, 65280, 16711680, 'fat', 'images/cat-in-the-hat.png', 'lean', 'medium']
    """
    try:
        for v in d.itervalues():
            for nested_v in flatten(v):
                yield nested_v
    except AttributeError:
        for list_v in d:
            yield list_v
The doctest passes the resulting iterator to the set function. This is likely to be what you want, since, as Mr. Martelli points out, there's no intrinsic order to the values of a dictionary, and therefore no reason to keep track of the order in which they were found.
You may want to keep track of the number of occurrences of each value; this information will be lost if you pass the iterator to set. If you want to track that, just pass the result of flatten(hat) to some other function instead of set. Under Python 2.7, that other function could be collections.Counter. For compatibility with less-evolved pythons, you can write your own function or (with some loss of efficiency) combine sorted with itertools.groupby.
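For example, under Python 2.7, with the dictionary from the question (which contains duplicate values; the display order of equal counts may vary):

>>> from collections import Counter
>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 1], 'd': [3, 8]}}
>>> Counter(flatten(d))
Counter({1: 2, 3: 2, 0: 1, 2: 1, 4: 1, 5: 1, 8: 1})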
A recursive function may work:
def flat(d, out=[]):
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)  # recurse into the nested dict, not d itself
        else:
            out += val
If you try it with:
>>> d = {1: {'a': [1, 2, 3], 'b': [0]}, 2: {'c': [4, 5, 6], 'd': [3, 8]}}
>>> out = []
>>> flat(d, out)
>>> print out
[1, 2, 3, 0, 4, 5, 6, 3, 8]
Notice that dictionaries have no order, so the list is in random order.
You can also return out (after the loop) and avoid passing the list argument:
def flat(d, out=None):
    if out is None:  # avoid sharing a mutable default list between calls
        out = []
    for val in d.values():
        if isinstance(val, dict):
            flat(val, out)
        else:
            out += val
    return out
call as:
my_list = flat(d)