I have a list of tuples where each tuple has two items: the first is a dictionary and the second a string.
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x4': 1, 'y4': 2}, 'str1'),
]
I want to remove duplicate data from the list based on the second item of each tuple. I wrote this code, but I want to improve it:
flag = False
items = []
for index, item in enumerate(all_values):
    for j in range(0, index):
        if all_values[j][1] == all_values[index][1]:
            flag = True
    if not flag:
        items.append(item)
    flag = False
And get this:
items = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3')
]
Any help?
BTW, I tried to remove the duplicates using list(set(all_values)), but I got the error unhashable type: 'dict'.
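For what it's worth, the unhashable type: 'dict' error itself can be worked around by freezing each dict into a frozenset of its items. Note this only removes tuples that are identical in full (not by the second item alone) and loses the original order; a small sketch with a hypothetical exact duplicate added:

```python
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x1': 1, 'y1': 2}, 'str1'),  # exact duplicate of the first tuple
]

# frozenset(d.items()) is hashable, so the tuples can go into a set
frozen = {(frozenset(d.items()), s) for d, s in all_values}
unique = [(dict(fs), s) for fs, s in frozen]
print(len(unique))  # 3
```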
Use another list ('strings') to collect the second string items of the tuples. Thus, you will have a clear way to check if a current list item is a duplicate.
In the code below I added one duplicate list item (with 'str2' value) for demonstration purpose.
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3')
]
strings = []
result = []
for value in all_values:
    if value[1] not in strings:
        strings.append(value[1])
        result.append(value)
The new non-duplicated list will be in 'result'.
You could use the following code:
items = []
for item in all_values:
    if next((i for i in items if i[1] == item[1]), None) is None:
        items.append(item)
items = []
[items.append(item) for item in all_values if item[1] not in [x[1] for x in items]]
print(items)
If you're not concerned with the ordering, use a dict:
formattedValues = {}
# iterate with reversed() if you want the first duplicate to be kept,
# without reversed() if you want the last duplicate
for v in reversed(all_values):
    formattedValues[v[1]] = v
If ordering is a concern, use OrderedDict:
from collections import OrderedDict
formattedValues = OrderedDict()
for v in reversed(all_values):
    formattedValues[v[1]] = v
The object-oriented approach is not shorter, but it's more intuitive as well as readable/maintainable (IMHO).
Start by creating an object that wraps the tuple and provides __hash__() and __eq__() methods, which set will use later on to check the uniqueness of the objects.
The __repr__() method is declared for debugging purposes:
class Tup(object):
    def __init__(self, t):
        self.t = t
    def __eq__(self, other):
        return self.t[1] == other.t[1]
    def __hash__(self):
        return hash(self.t[1])
    def __repr__(self):
        return str(self.t)
# now you can declare:
all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3'),
    ({'x3': 1, 'y3': 2}, 'str3')
]
# create your objects and put them in a list
all_vals = [Tup(x) for x in all_values]
print(all_vals)  # [({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3'), ({'x3': 1, 'y3': 2}, 'str3')]
# and use the built-in set for uniqueness
print(set(all_vals))  # {({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3')} (set order may vary)
An alternative, shorter version for those who feel that size matters ;)
res = []
for a in all_values:
    if a[1] not in (x[1] for x in res):
        res.append(a)
print(res)
The itertools Recipes include a function unique_everseen that returns an iterator over the unique items of the passed-in iterable, according to the passed-in key. If you want a list as a result, just pass the result to list(); but if there's a lot of data, it is better to iterate lazily if you can, to save memory.
from itertools import filterfalse  # on Python 2: ifilterfalse
from operator import itemgetter

all_values = [
    ({'x1': 1, 'y1': 2}, 'str1'),
    ({'x2': 1, 'y2': 2}, 'str2'),
    ({'x5': 8, 'ab': 7}, 'str2'),
    ({'x3': 1, 'y3': 2}, 'str3')]

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

print(list(unique_everseen(all_values, itemgetter(1))))
Output
[({'x1': 1, 'y1': 2}, 'str1'), ({'x2': 1, 'y2': 2}, 'str2'), ({'x3': 1, 'y3': 2}, 'str3')]
Related
So I've looked into some similar questions but I couldn't find an answer, suitable for my problem. I have a nested dictionary and I need to sort the keys of the whole dictionary by the sum of the values of the nested dictionaries, but in reversed order:
So from:
dict = {'Sarah': {'apple': 1, 'pear': 2}, 'John': {'tomato': 5, 'cucumber': 5}, 'Dany': {'carrot': 1}}
I need to get to:
dict = {'Dany': {'carrot': 1}, 'Sarah': {'apple': 1, 'pear': 2}, 'John': {'tomato': 5, 'cucumber': 5}}
I figured I could maybe do this with dict(sorted(), key, reverse=True), but I am unable to formulate the key correctly because I don't understand how to access the inner values of the nested dictionaries.
I'd be grateful for your help!
If you have a sufficiently recent Python version (3.7+), where insertion order of keys is preserved, you can do:
dct = {
    "Sarah": {"apple": 1, "pear": 2},
    "John": {"tomato": 5, "cucumber": 5},
    "Dany": {"carrot": 1},
}
dct = dict(sorted(dct.items(), key=lambda k: sum(k[1].values())))
print(dct)
Prints:
{'Dany': {'carrot': 1}, 'Sarah': {'apple': 1, 'pear': 2}, 'John': {'tomato': 5, 'cucumber': 5}}
If not, you can use collections.OrderedDict:
from collections import OrderedDict
dct = OrderedDict(sorted(dct.items(), key=lambda k: sum(k[1].values())))
print(dct)
Try:
sorted(dict, key=lambda x: sum(dict[x].values()))
Note that this returns only the sorted keys; use {k: dict[k] for k in sorted(dict, key=lambda x: sum(dict[x].values()))} to rebuild the dictionary.
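Since the question mentions reversed order: if descending-by-sum is what is actually wanted, sorted() accepts reverse=True; a sketch with the same data:

```python
dct = {
    "Sarah": {"apple": 1, "pear": 2},
    "John": {"tomato": 5, "cucumber": 5},
    "Dany": {"carrot": 1},
}

# largest sum first: John (10), Sarah (3), Dany (1)
desc = dict(sorted(dct.items(), key=lambda kv: sum(kv[1].values()), reverse=True))
print(desc)
```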
So let's say I have a dictionary:
{"a": {"b": 1,
       "c": 2,
       "d": {"e": 3,
             "f": 4,
             }
       },
 "g": {"h": 5,
       "i": 6
       }
 }
I've been trying to find a way to map this dictionary into a list of "paths" to each end-value, as follows:
[
{"a": {"b": 1}},
{"a": {"c": 2}},
{"a": {"d": {"e": 3}}},
{"a": {"d": {"f": 4}}},
{"g": {"h": 5}},
{"g": {"i": 6}}
]
Where each list entry is a single-key nested dictionary. I assume this can be done using some form of depth-first walk with recursion, but I'm not very familiar with programming recursive functions, and also don't know if that's even the best approach.
Any input would be appreciated.
I agree with you that a recursion is a good approach. Here is what I would do. Using a generator (i.e. yield-ing values) has the advantage that we don't have to have a variable gathering the individual result items.
The variable _revpath contains the "path" of dict keys leading to a value, but in reversed order, because at the end of the recursion we want to create a nested dict from inner to outer dict.
test = {"a": {"b": 1,
"c": 2,
"d": {"e": 3,
"f": 4,
}
},
"g": {"h": 5,
"i": 6
}
}
def walkdict(d, _revpath=[]):
    if isinstance(d, dict):
        for key, value in d.items():
            yield from walkdict(value, [key] + _revpath)
    else:
        for key in _revpath:
            d = {key: d}
        yield d

print(list(walkdict(test)))
Use recursion to build the list of nodes you pass through:
def paths(values, parents=None):
    results = []
    parents = parents or []
    for k, v in values.items():
        if isinstance(v, int):
            results.append(keys_to_nested_dict([*parents, k], v))
        else:
            results.extend(paths(v, [*parents, k]))
    return results
Then, when you reach a leaf (an int), transform that list of nodes into a nested dict:
def keys_to_nested_dict(keys, value):
    result = {}
    tmp = result
    for k in keys[:-1]:
        tmp[k] = {}
        tmp = tmp[k]
    tmp[keys[-1]] = value
    return result
print(keys_to_nested_dict(['a', 'b', 'c'], 1)) # {'a': {'b': {'c': 1}}}
x = {"a": {"b": 1, "c": 2, "d": {"e": 3, "f": 4, }},
"g": {"h": 5, "i": 6}}
print(paths(x))
# [{'a': {'b': 1}}, {'a': {'c': 2}}, {'a': {'d': {'e': 3}}}, {'a': {'d': {'f': 4}}}, {'g': {'h': 5}}, {'g': {'i': 6}}]
Not exactly the same output, but a little tweak can save you a lot of effort and complexity:
sample = {"a": {"b": 1,"c": 2,"d": {"e": 3,"f": 4}},"g": {"h": 5,"i": 6}}
# note: nested_to_record is a private pandas helper and may move between versions
from pandas.io.json._normalize import nested_to_record
flat = nested_to_record(sample, sep='_')
Output
{'a_b': 1, 'a_c': 2, 'a_d_e': 3, 'a_d_f': 4, 'g_h': 5, 'g_i': 6}
Now, whenever you want the possible paths, just iterate over the keys of this dictionary; split('_') gives you the entire path along with the associated value.
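For example, the flattened keys can be split back into the list of single-key nested dicts shown in the question (a sketch; it assumes no dictionary key itself contains the '_' separator):

```python
flat = {'a_b': 1, 'a_c': 2, 'a_d_e': 3, 'a_d_f': 4, 'g_h': 5, 'g_i': 6}

paths = []
for compound_key, value in flat.items():
    nested = value
    for k in reversed(compound_key.split('_')):  # rebuild from inner to outer
        nested = {k: nested}
    paths.append(nested)

print(paths)  # [{'a': {'b': 1}}, {'a': {'c': 2}}, {'a': {'d': {'e': 3}}}, ...]
```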
I have a python list of dictionaries like this
test_list = [{'id': 0, 'A':True, 'B':123},
{'id':76, 'A':True, 'B':73},
{'id':5, 'A':False, 'B':223},
{'id':5, 'A':False, 'B':223},
{'id':85, 'A':True, 'B':4},
{'id':81, 'A':False, 'B':76},
{'id':76, 'A':True, 'B':73}]
And I want to make this list unique
Using a simple set(test_list) gives you TypeError: unhashable type: 'dict'.
My answer below works fine for me, but I am looking for a better and shorter one:
unique_ids = list(set([x['id'] for x in test_list]))
d = {}
for item in test_list:
    d[item['id']] = item
new_d = []
for x in unique_ids:
    new_d.append(d[x])
Create a new dictionary with keys as ids and the values as the entire dictionary and access only the values:
new_d = list({v["id"]: v for v in test_list}.values())
>>> new_d
[{'id': 0, 'A': True, 'B': 123},
{'id': 76, 'A': True, 'B': 73},
{'id': 5, 'A': False, 'B': 223},
{'id': 85, 'A': True, 'B': 4},
{'id': 81, 'A': False, 'B': 76}]
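One thing to note: the comprehension keeps the last dict seen for each id. If the first occurrence should win instead, build the dict over the reversed list and flip the result back; a sketch with made-up data where it matters:

```python
test_list = [{'id': 0, 'A': True, 'B': 123},
             {'id': 76, 'A': True, 'B': 73},
             {'id': 76, 'A': False, 'B': 99}]  # same id, different payload

# iterating in reverse makes the first occurrence overwrite the later ones
first_wins = list({v["id"]: v for v in reversed(test_list)}.values())[::-1]
print(first_wins)  # [{'id': 0, 'A': True, 'B': 123}, {'id': 76, 'A': True, 'B': 73}]
```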
I would like to check the intersection of two dictionaries. If I do this, I get exactly what I expected:
dict1 = {'x':1, 'y':2, 'z':3}
dict2 = {'x':1, 'y':2, 'z':4}
set(dict1.items()).intersection(dict2.items())
>> {('x', 1), ('y', 2)}
However, if the values within the dictionary are unhashable, I get an error.
dict1 = {'x':{1,2}, 'y':{2,3}, 'z':3}
dict2 = {'x':{1,3}, 'y':{2,4}, 'z':4}
TypeError Traceback (most recent call
last)
<ipython-input-56-33fdb931ef54> in <module>
----> 1 set(dict1.items()).intersection(dict2.items())
TypeError: unhashable type: 'set'
Of course, I get the same error for lists and other unhashable values.
Is there a workaround or an existing class I can use to check the intersection of unhashable dictionary values?
You could create a makeHashable function to apply to dictionary items for comparison purposes, and use it to build a set that you can then check in a list comprehension:
dict1 = {'x':{1,2}, 'y':{2,3}, 'z':3}
dict2 = {'x':{1,3}, 'y':{3,2}, 'z':4}
def makeHashable(x):
    if isinstance(x, (list, tuple)): return tuple(map(makeHashable, x))
    if isinstance(x, set): return makeHashable(sorted(x))
    if isinstance(x, dict): return tuple(map(makeHashable, x.items()))
    return x

dict1Set = set(map(makeHashable, dict1.items()))
intersect = [kv for kv in dict2.items() if makeHashable(kv) in dict1Set]
output:
print(intersect)
# [('y', {2, 3})]
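One caveat: sorted() on a set of mixed types raises TypeError in Python 3, so freezing sets with frozenset instead of sorting them is a more robust variant of the same idea (a sketch):

```python
def make_hashable(x):
    if isinstance(x, (list, tuple)):
        return tuple(make_hashable(v) for v in x)
    if isinstance(x, set):
        return frozenset(make_hashable(v) for v in x)  # no ordering needed
    if isinstance(x, dict):
        return frozenset((k, make_hashable(v)) for k, v in x.items())
    return x

dict1 = {'x': {1, 2}, 'y': {2, 3}, 'z': 3}
dict2 = {'x': {1, 3}, 'y': {3, 2}, 'z': 4}

dict1_set = {make_hashable(kv) for kv in dict1.items()}
intersect = [kv for kv in dict2.items() if make_hashable(kv) in dict1_set]
print(intersect)  # [('y', {2, 3})]
```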
Maybe try:
#!/usr/local/cpython-3.8/bin/python3
def intersection1(dict1, dict2):
    intersection = set(dict1.items()).intersection(dict2.items())
    return intersection

def intersection2(dict1, dict2):
    result = {}
    for key1 in dict1:
        if key1 in dict2 and dict1[key1] == dict2[key1]:
            result[key1] = dict1[key1]
    return result

def main():
    dict1 = {'x': 1, 'y': 2, 'z': 3}
    dict2 = {'x': 1, 'y': 2, 'z': 4}
    print(intersection2(dict1, dict2))
    print(intersection1(dict1, dict2))
    # >> {('x', 1), ('y', 2)}
    dict3 = {'x': [1, 2], 'y': [2, 3], 'z': [3, 4]}
    dict4 = {'x': [1, 2], 'y': [2, 3], 'z': [4, 5]}
    print(intersection2(dict3, dict4))
    # intersection1 would raise TypeError here: list values are unhashable
    # print(intersection1(dict3, dict4))

main()
You of course cannot put an unhashable type in a set, so I've done the next best thing with intersection2()
You can serialize the dict values before performing set intersection, and deserialize the values in the resulting set. The following example uses pickle for serialization:
import pickle
{k: pickle.loads(v) for k, v in set.intersection(
*({(k, pickle.dumps(v)) for k, v in i} for i in map(dict.items, (dict1, dict2))))}
so that given:
dict1 = {'x': {1, 2}, 'y': {2, 3}, 'z': 3}
dict2 = {'x': {2, 1}, 'y': {2, 4}, 'z': 4}
the expression would return:
{'x': {1, 2}}
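Unrolled into named steps, the same pickle round-trip reads as follows. (Note that pickle.dumps is only a reliable equality proxy when equal values serialize to identical bytes, which holds for these small sets but is not guaranteed in general.)

```python
import pickle

dict1 = {'x': {1, 2}, 'y': {2, 3}, 'z': 3}
dict2 = {'x': {2, 1}, 'y': {2, 4}, 'z': 4}

# serialize each value so the (key, bytes) pairs become hashable
frozen1 = {(k, pickle.dumps(v)) for k, v in dict1.items()}
frozen2 = {(k, pickle.dumps(v)) for k, v in dict2.items()}

# intersect, then deserialize the surviving values
common = {k: pickle.loads(v) for k, v in frozen1 & frozen2}
print(common)  # {'x': {1, 2}}
```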
I want to write a function that takes the following inputs:
list (a list of maps)
request_keys (a list of strings)
operation (add, subtract, multiply, concat)
The code would look through the list for maps having the same values for all keys except those given in request_keys. Upon finding two maps whose values for those shared keys match, the code would apply the operation (add, subtract, multiply, concat) to the two maps and combine them into one map, which replaces the original two.
I have written the following piece of code to do this. It only implements the add operation, but it can be extended to the other operations:
In [83]: list
Out[83]:
[{'a': 2, 'b': 3, 'c': 10},
{'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 4, 'c': 4},
{'a': 2, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 3}]
In [84]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:def func(list, request_keys):
:    new_list = []
:    found_indexes = []
:    for i in range(0, len(list)):
:        new_item = list[i]
:        if i in found_indexes:
:            continue
:        for j in range(0, len(list)):
:            if i != j and {k: v for k, v in list[i].iteritems() if k not in request_keys} == {k: v for k, v in list[j].iteritems() if k not in request_keys}:
:                found_indexes.append(j)
:                for request_key in request_keys:
:                    new_item[request_key] += list[j][request_key]
:        new_list.append(new_item)
:    return new_list
:--
In [85]: func(list,['c'])
Out[85]: [{'a': 2, 'b': 3, 'c': 18}, {'a': 2, 'b': 4, 'c': 4}]
In [86]:
What I want to know is: is there a faster, more memory-efficient, cleaner and more Pythonic way of doing the same?
Thank you
You manually generate all the combinations and then compare each of those combinations. This is pretty wasteful. Instead, I suggest grouping the dictionaries in another dictionary by their matching keys, then adding the "same" dictionaries. Also, you forgot the operator parameter.
import collections, operator, functools

def func(lst, request_keys, op=operator.add):
    matching_dicts = collections.defaultdict(list)
    for d in lst:
        key = tuple(sorted((k, d[k]) for k in d if k not in request_keys))
        matching_dicts[key].append(d)
    for group in matching_dicts.values():
        merged = dict(group[0])
        merged.update({key: functools.reduce(op, (g[key] for g in group))
                       for key in request_keys})
        yield merged
What this does: first, it creates a dictionary mapping the key-value pairs that have to be equal for two dictionaries to match to all the dictionaries that have those pairs. Then it iterates over those groups, using one dict of each group as a prototype and updating it with the sum (or product, or whatever, depending on the operator) of all the dicts in that group for the request_keys.
Note that this returns a generator. If you want a list, just call it like list(func(...)), or accumulate the merged dicts in a list and return that list.
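Run against the question's data (the generator repeated here so the snippet is standalone), this also shows swapping in a different operator:

```python
import collections, functools, operator

def func(lst, request_keys, op=operator.add):
    # group dicts by their non-request key-value pairs
    matching_dicts = collections.defaultdict(list)
    for d in lst:
        key = tuple(sorted((k, d[k]) for k in d if k not in request_keys))
        matching_dicts[key].append(d)
    for group in matching_dicts.values():
        merged = dict(group[0])
        merged.update({k: functools.reduce(op, (g[k] for g in group))
                       for k in request_keys})
        yield merged

lst = [{'a': 2, 'b': 3, 'c': 10},
       {'a': 2, 'b': 3, 'c': 3},
       {'a': 2, 'b': 4, 'c': 4},
       {'a': 2, 'b': 3, 'c': 2},
       {'a': 2, 'b': 3, 'c': 3}]

print(list(func(lst, ['c'])))                # [{'a': 2, 'b': 3, 'c': 18}, {'a': 2, 'b': 4, 'c': 4}]
print(list(func(lst, ['c'], operator.mul)))  # same groups, 'c' values multiplied instead
```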
from itertools import groupby
from operator import itemgetter

def mergeDic(inputData, request_keys):
    keys = inputData[0].keys()
    comparedKeys = [item for item in keys if item not in request_keys]
    grouper = itemgetter(*comparedKeys)
    result = []
    for key, grp in groupby(sorted(inputData, key=grouper), grouper):
        grp = list(grp)  # materialize: the group iterator can only be consumed once
        temp_dict = dict(zip(comparedKeys, key))
        for request_key in request_keys:
            temp_dict[request_key] = sum(item[request_key] for item in grp)
        result.append(temp_dict)
    return result
inputData = [{'a': 2, 'b': 3, 'c': 10},
{'a': 2, 'b': 3, 'c': 3},
{'a': 2, 'b': 4, 'c': 4},
{'a': 2, 'b': 3, 'c': 2},
{'a': 2, 'b': 3, 'c': 3}]
from pprint import pprint
pprint(mergeDic(inputData,['c']))