How to use a sentinel list in a comprehension? - Python

I have a list
In [4]: a = [1, 2, 3, 3, 2, 4]
from which I would like to remove duplicates via a comprehension using a sentinel list (see below why):
In [8]: [x if x not in seen else seen.append(x) for x in a]
Out[8]: [1, 2, 3, 3, 2, 4]
It seems that seen is not taken into account (neither updated nor checked). Why is that?
As for the reason why using a convoluted method: The list I have is of the form
[{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
and I want to remove duplicates based on the value of a specific key (b in the case above), leaving [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}] (I do not care which dict is removed). The idea would be to build a sentinel list with the values of b already seen and keep only the dicts whose b is not equal to any element in that sentinel list.

Since x is not in seen, you never add it to seen either; the else branch is not executed when x not in seen is true.
However, you are using a conditional expression, which always produces a value: either x or the result of seen.append(x) (which is None). So you are not filtering, you are mapping here.
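A quick illustration of that mapping behaviour (the pre-seeded seen list is mine, just for demonstration; it is not part of the question):
>>> a = [1, 2, 3, 3, 2, 4]
>>> seen = [2, 3]   # pretend 2 and 3 had already been seen
>>> [x if x not in seen else seen.append(x) for x in a]
[1, None, None, None, None, 4]
The duplicates are mapped to None rather than removed.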
If you want to filter, move the test into an if clause after the for clause:
seen = set()
[x for x in a if not (x in seen or seen.add(x))]
Since you were using seen.append() I presume you were using a list; I switched you to a set() instead, as membership tests are way faster using a set.
So x is excluded only if x in seen is true (we have already seen it); otherwise seen.add(x) is executed, which adds x and returns None (which is not true), so the not still lets x through. Yes, this works, if a little convoluted.
Demo:
>>> a = [1, 2, 3, 3, 2, 4]
>>> seen = set()
>>> [x for x in a if not (x in seen or seen.add(x))]
[1, 2, 3, 4]
>>> seen
set([1, 2, 3, 4])
Applying this to your specific problem:
>>> a = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> seen = set()
>>> [entry for entry in a if not (entry['b'] in seen or seen.add(entry['b']))]
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]

You never execute the else part of the conditional expression, because seen is never updated the first time an element is matched. You could do this instead:
[seen.append(x) or x for x in lst if x not in seen]
This way the or first executes the update via seen.append(x) (which always returns None, so the or keeps looking for a truthy value) and then returns the last value, x.
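A quick demo of that variant with the question's flat list (a sketch; seen is a plain list here):
>>> lst = [1, 2, 3, 3, 2, 4]
>>> seen = []
>>> [seen.append(x) or x for x in lst if x not in seen]
[1, 2, 3, 4]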
Maybe you can use the fact that dict keys form a set for this. By default the last item with a given 'b' would win; iterating over reversed(lst) keeps the first one instead (the first item is prioritized here):
>>> lst = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
>>> filtered = {item['b']: item for item in reversed(lst)}
>>> filtered.values()
[{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]
This uses 'b' as the key to map a value to, so only a single element can be mapped to any given value of 'b', which effectively creates a set over 'b'.
Note: this returns the values in an arbitrary order. To fix that nicely for big datasets, I'd create another mapping from each object to its index in the original list (O(n)) and use that mapping as the sort key for the final result (O(n*log(n))).
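A rough sketch of that ordering fix (my own illustration, assuming Python 3; id() is used as a lookup key only because the dicts themselves are not hashable):
lst = [{'a': 3, 'b': 4}, {'a': 10, 'b': 4}, {'a': 5, 'b': 5}]
filtered = {item['b']: item for item in reversed(lst)}
order = {id(item): i for i, item in enumerate(lst)}                    # O(n)
result = sorted(filtered.values(), key=lambda item: order[id(item)])  # O(n*log(n))
print(result)  # [{'a': 3, 'b': 4}, {'a': 5, 'b': 5}]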

I'm always queasy about using operator short-circuiting as execution-flow control. I feel that the code below is marginally more explicit and palatable, although it does carry the additional cost of tuple creation.
b_values = set()
[(item, b_values.add(item['b']))[0] for item in original_list
 if item['b'] not in b_values]
But really when you're maintaining/updating some sort of state, I think the best format is the simple for-loop:
output_list = []
b_values = set()
for item in original_list:
    if item['b'] not in b_values:
        output_list.append(item)
        b_values.add(item['b'])

Related

Append elements in the value field of a dictionary using comprehensions

I have a list of elements, let's say:
y = [1, 3, 1, 5, 1]
And I would like to create a dictionary where:
Keys: the elements in y
Values: a list of the elements that appear immediately before each occurrence of the key in y
I attempted the following comprehension.
a={elem:y[i] for i, elem in enumerate(y[1:])}
However, since the value field in the dictionary is not a list, it only keeps the element preceding the last occurrence of each key.
In other words, for this example I get the following:
{3: 1, 1: 5, 5: 3}
Is there a way to do this using comprehensions?
Note: I forgot to add the desired result.
{3: [1], 1: [3,5], 5: [1]}
Your keys are duplicated, so you cannot create a dictionary with them (you'll lose the first elements).
So comprehensions are difficult to use here (and inefficient, as the other comprehension-based answers show) because of the accumulation effect that you need.
I suggest using collections.defaultdict(list) instead and a good old loop:
import collections
y = [1, 3, 1, 5, 1]
d = collections.defaultdict(list)
for i, x in enumerate(y[1:]):
    d[x].append(y[i])  # i is the index of the previous element in y
print(d)
result:
defaultdict(<class 'list'>, {1: [3, 5], 3: [1], 5: [1]})
Use enumerate and set operations.
{value: set(y[:i]) - {value} for i, value in enumerate(y)}
Out: {1: {3, 5}, 3: {1}, 5: {1, 3}}
It's a bit ugly and inefficient because it works out a new answer each time it encounters 1, but the result is right because the final time it does so is at the final occurrence of 1.
Just for the fun of it. Here's a comprehension.
a = {y[i]: [y[x-1] for x in range(1, len(y)) if y[x] == y[i]] for i in range(1, len(y))}
# {3: [1], 1: [3, 5], 5: [1]}
Just note that it's too long and inefficient to be used in any practical program. Using collections.defaultdict, as Jean-François Fabre suggested in his answer, is the proper way.

Reduction of a union of dictionary values produces unexpected results

I would like to make a union of all dictionary values, which in this case are sets. I only get the expected result if there are exactly two dictionaries in the input list.
Two dictionaries in the input list produces the expected result:
>>> reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}])
set([1, 2, 3, 4])
Three dictionaries in the input list produces a TypeError.
Expected result: set([1, 2, 3, 4, 5, 6])
>>> reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
Traceback (most recent call last):
File "<input>", line 1, in <module>
reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
File "<input>", line 1, in <lambda>
reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
TypeError: 'set' object has no attribute '__getitem__'
One dictionary in the input list produces a dictionary instead of a set.
Expected result: set([1, 2])
>>> reduce((lambda x, y: x['a'] | y['a']), [{'a': {1, 2}}])
{'a': set([1, 2])}
An empty input list also produces a different TypeError.
Expected result: set([])
>>> reduce((lambda x, y: x['a'] | y['a']), [])
Traceback (most recent call last):
File "<input>", line 1, in <module>
reduce((lambda x, y: x['a'] | y['a']), [])
TypeError: reduce() of empty sequence with no initial value
I need help with understanding what I'm doing wrong and why these results are produced.
TLDR:
The reduce(function, iterable) call repeatedly applies function to the previous result and the next element of the iterable. That means the return type of function must be a valid input type!
In your case, function expects dicts but produces a set. Since it is not possible to call x['y'] on a set, a TypeError is raised.
When iterable has only two elements, function is applied only once and only to these elements. The problem of the return type of function not being a valid input type is thus never encountered.
You must first map from dict to set, then reduce the sets.
reduce(lambda x, y: x | y, map(lambda x: x['a'], [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]))
# merge via reduce ^ convert via map ^
Why reduce fails in some cases
Calling reduce(function, iterable) performs the equivalent of this code:
def reduce(function, iterable, start=None):
    iterable = iter(iterable)
    result = next(iterable) if start is None else start  # 1.
    for element in iterable:
        result = function(result, element)  # 2.
    return result
This leads to several cases:
1. iterable has one element and start is not set
   - result is the first element of iterable (1.)
   - function is never called; its return and input types are inconsequential
2. iterable has two elements and start is not set
   - result is the first element of iterable (1.)
   - function is called on the first element and the next element (2.)
   - function never receives its own result; its return type is meaningless
3. iterable has more than two elements and start is not set
   - result is the first element of iterable (1.)
   - function is called on the first element and the next element (2.)
   - function is then called on the previous result and the next element (2.)
   - function receives its own result; its return type and input type must match
4. iterable is empty or not empty and start is set
   - same as the cases above, with start acting as the first element of iterable
5. iterable is empty and start is not set
   - result cannot be set and a TypeError is raised (1.)
In your case, that is:
- Two dictionaries is case 2 and works as expected.
- Three dictionaries is case 3 and chokes on the incompatible input and return types.
- An empty input list is case 5 and fails on the missing input, as expected.
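As an aside (a minimal sketch of mine, not from the question): supplying an initial value, as in case 4, also covers the empty list, provided the function's first argument and return value are both of the initial value's type:
>>> from functools import reduce
>>> reduce(lambda x, y: x | y['a'], [{'a': {1, 2}}, {'a': {3, 4}}], set())
{1, 2, 3, 4}
>>> reduce(lambda x, y: x | y['a'], [], set())
set()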
How to do it instead
map/reduce
Your reduce is actually doing two things at once: it converts/extracts each element individually, then merges all the results. That is a classical map/reduce task: one operation for each element, one across all elements.
You can directly split this up into two separate operations with the map and reduce builtins:
sets = map(lambda x: x['a'], [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
result = reduce(lambda x, y: x | y, sets)
Of course, you can also nest the two expressions directly.
comprehension/reduce
The map portion can be expressed using a comprehension.
sets = (x['a'] for x in [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])
result = reduce(lambda x, y: x | y, sets)
comprehension/assignment
In Python 3.8+, you can use an assignment expression in place of the reduce as well.
result = set()
[result := result | x['a'] for x in [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]]
# result is now {1, 2, 3, 4, 5, 6}; the list of intermediate sets is discarded
Use a for loop
Just, you know, write it out.
result = set()
for element in [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]:
    result |= element['a']
The output of the function passed to reduce must be of the same type as the items in the iterable, so that reduce can keep aggregating the items with that same function.
In your case, the output of lambda x, y: x['a'] | y['a'] is a set {1, 2, 3, 4}, so when reduce tries to aggregate the third item {'a': {5, 6}} with {1, 2, 3, 4}, it fails because the lambda function treats both x and y as dicts and tries to get items of each by key 'a', which a set does not have.
As for the TypeError: reduce() of empty sequence with no initial value exception, you just need to provide reduce with an initial value as the third argument, which in your case should be an empty set (set(), not {}, which is an empty dict). But first drop the idea of passing it a list of dicts and instead pass it a list of sets.
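A minimal sketch putting both suggestions together (functools.reduce assumed for Python 3): extract the sets first, then reduce them with set() as the initial value.
from functools import reduce

dicts = [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]
sets = [d['a'] for d in dicts]                    # a list of sets instead of dicts
result = reduce(lambda x, y: x | y, sets, set())  # set() as the initial value
print(result)  # {1, 2, 3, 4, 5, 6}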
reduce works iteratively: it applies a reducing aggregation across the items of a sequence. For example, given elements i, j and k, together with function foo, it will compute foo(foo(i, j), k).
In your example, foo(i, j) works fine, giving a set, but the outer call fails because the result, being a set, does not have the key 'a'. The syntax [] in the background calls __getitem__, which is why you see an error relating to this method.
What can you do about it?
A trivial hack is to have your function output a dictionary, and then access its only value directly. This ensures that your function always outputs a dictionary with the key 'a'.
reduce((lambda x, y: {'a': x['a'] | y['a']}),
       [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}])['a']
# {1, 2, 3, 4, 5, 6}
More readable, you can define a named function:
def foo(x, y):
    return {'a': x['a'] | y['a']}
L = [{'a': {1, 2}}, {'a': {3, 4}}, {'a': {5, 6}}]
reduce(foo, L)['a']

Merging dictionaries

I need to append the values from one dictionary (N) to another (M) - pseudocode below
if x in M:
    M[x] = M[x] + N[x]
else:
    M[x] = N[x]
Doing this for every key in N seems quite untidy coding.
What would be the most efficient way to achieve this?
Of course you would already be iterating over your keys as "x", but a single-line solution is:
M.update({key:((M[key] + value) if key in M else value) for key, value in N.items()})
Not entirely sure what your x is (I'm guessing the keys of both M and N), but this might work:
M = {key: M.get(key, 0) + N.get(key, 0) for key in set((*M, *N))}
for the example:
M = {'a': 1, 'c': 3, 'e': 5}
N = {'a': 2, 'b': 4, 'c': 6, 'd': 8}
you get:
print(M) # {'a': 3, 'e': 5, 'd': 8, 'c': 9, 'b': 4}
or please clarify what the desired output for the given example would be.
When you say "append", I assume that means that the values in your dicts are lists. However the techniques below can easily be adapted if they're simple objects like integers or strings.
Python doesn't provide any built-in methods for handling dicts of lists, and the most efficient way to do this depends on the data, because some approaches work best when there is a high proportion of shared keys, while others are better when there aren't many shared keys.
If the proportion of shared keys isn't too high, this code is reasonably efficient:
m = {1:[1, 2], 2:[3]}
n = {1:[4], 3:[5]}
for k, v in n.items():
    m.setdefault(k, []).extend(v)
print(m)
output
{1: [1, 2, 4], 2: [3], 3: [5]}
You can make this slightly faster by caching the .setdefault method lookup (note that the default must still be a fresh [] on each call, otherwise missing keys would end up sharing a single list object):
def merge(m, n):
    setdef = m.setdefault
    for k, v in n.items():
        setdef(k, []).extend(v)
If you expect a high proportion of shared keys, then it's better to perform set operations on the keys (in Python 3, dict.keys() returns a set-like view object, which is extremely efficient to construct), and handle the shared keys separately from the keys unique to N.
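A sketch of that keys-as-sets approach (my illustration, reusing the m/n example above and assuming Python 3 key views):
m = {1: [1, 2], 2: [3]}
n = {1: [4], 3: [5]}

shared = m.keys() & n.keys()    # keys present in both dicts
only_n = n.keys() - m.keys()    # keys unique to n

for k in shared:
    m[k].extend(n[k])
for k in only_n:
    m[k] = list(n[k])           # copy, so m doesn't share list objects with n

print(m)  # {1: [1, 2, 4], 2: [3], 3: [5]}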

Extending list copy in one line

When I execute the following:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
c = list(b)
c.extend(a.values())
print c
It prints out [2, 4, 6, 1, 5, 3] as I expected, but when I try to do the list copy and extension in one line:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
d = list(b).extend(a.values())
print d
It prints None. Why are these different?
Because list.extend() does not return the modified list; it operates on the list in place.
I guess you want d to hold the result. If you want to create a new list d holding the extended result in one line, try:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
d = list(b) + list(a.values())
print d
Two points to note:
- in Python 3, dict.values() returns a view object, not a plain list
- the values/keys of a plain Python dict are not guaranteed to be in any particular order (before Python 3.7)
What you assign to the d variable is the return value of the list(b).extend(...) call. This method does not return anything (it returns None); it extends the existing mutable object in place. The temporary list(b) is modified, but since you did not keep a reference to it anywhere, it is simply discarded.
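A quick illustration of that (my own example values):
>>> b = [2, 4, 6]
>>> temp = list(b)
>>> result = temp.extend([1, 3, 5])
>>> print(result)   # extend() mutates temp and returns None
None
>>> temp
[2, 4, 6, 1, 3, 5]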
You don't even need to make the variable d; extend() will extend the list in place, so just use this:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
b.extend(a.values())
print b
This will give you the desired output: [2, 4, 6, 5, 3, 1]
EDIT:
According to the OP he wanted b to remain unchanged. All you need to do is make a copy of b and then extend that. You can do that like this:
a = {'a': 1, 'b':3, 'c': 5}
b = [2,4,6]
c = b[:]
c.extend(a.values())
print c

Python 2.7.9 dictionary find and delete

Python 2.7.9 dictionary question:
I have a dictionary in Python that contains lists that have been appended to previously, and these lists are parallel, so their elements map to each other by position, e.g. 1 => 10.2, 2 => 10.33.
How may I find a single value within the dictionary and delete it?
E.g. find 'a'=2 and delete 'a' and corresponding 'b' value:
myDictBefore = {'a': [1, 2, 3], 'b': [10.2, 10.33, 10.05]}
myDictAfter = {'a': [1, 3], 'b': [10.2, 10.05]}
I suspect I should find the value in 'a', get its index, and then delete myDict['a'][index] and myDict['b'][index], though I'm unsure how to do this.
How about:
idx = myDictBefore['a'].index(2)
myDictBefore['a'].pop(idx)
myDictBefore['b'].pop(idx)
If this comes up more often, you might as well write a general function for it:
def removeRow(dct, col, val):
    '''Remove a "row" from a table-like dictionary containing lists,
    where the value of that row in the given column is equal to the
    given value.'''
    idx = dct[col].index(val)
    for key in dct:
        dct[key].pop(idx)
which you could then use like this:
removeRow(myDictBefore, 'a', 2)
You could define a function that does it.
def remove(d, x):
    index = d['a'].index(x)  # will raise ValueError if x is not in the 'a' list
    del d['a'][index]
    del d['b'][index]
myDict = {'a': [1, 2, 3], 'b': [10.2, 10.33, 10.05]}
remove(myDict, 2)
print(myDict) # --> {'a': [1, 3], 'b': [10.2, 10.05]}
