intersection of two or more lists of dicts - python

I have
l1 = [{"value": 1, "label": "One"}, {"value": 2, "label": "Two"}]
l2 = [{"value": 1, "label": "One"}, {"value": 2, "label": "Two"}]
l3 = [{"value": 1, "label": "One"}, {"value": 3, "label": "Three"}]
l4 = [{"value": 4, "label": "Four"}]
and I need something like this:
def foo(*lists):
    ...
that returns:
foo(l1, l2) -> [{"value": 1, "label": "One"}, {"value": 2, "label": "Two"}]
foo(l2, l3) -> [{"value": 1, "label": "One"}]
foo(l1, l2, l3) -> [{"value": 1, "label": "One"}]
foo(l1, l2, l3, l4) -> []
Edit (sorry I truncated part of the question):
The order in the output list doesn't matter.
I tried to use sets, but the dicts inside the lists are unhashable.
So I tried to convert the dicts to frozendicts or tuples, but the key order in the input dicts should not be significant:
{"value": 1, "label": "One"} == {"label": "One", "value": 1}
l5 = [{"value": 1, "label": "One"}]
l6 = [{"label": "One", "value": 1}]
foo(l5, l6) -> [{"value": 1, "label": "One"}]
Thanks so much.

You can convert each list of dicts to a set of tuples of dict items, use functools.reduce to apply set.intersection across all the sets, and then convert the resulting set of tuples back to a list of dicts by mapping it to the dict constructor:
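from functools import reduce
def intersection(*lists):
    # tuple(d.items()) depends on key insertion order; use frozenset(d.items())
    # instead if dicts with the same items in a different key order should match.
    return list(map(dict, reduce(set.intersection, ({tuple(d.items()) for d in l} for l in lists))))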
so that with your sample input:
print(intersection(l1, l2))
print(intersection(l2, l3))
print(intersection(l1, l2, l3))
print(intersection(l1, l2, l3, l4))
would output:
[{'value': 1, 'label': 'One'}, {'value': 2, 'label': 'Two'}]
[{'value': 1, 'label': 'One'}]
[{'value': 1, 'label': 'One'}]
[]
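Since the question asks that key order inside each dict not matter, a variant that represents each dict as a frozenset of its items may be closer to what is wanted. This is only a minimal sketch: it assumes every dict value is hashable, and the name intersection_any_order is made up for illustration.

```python
from functools import reduce


def intersection_any_order(*lists):
    # Hypothetical helper: represent each dict as a frozenset of its
    # (key, value) pairs so that key order does not affect equality.
    # Assumes every value in every dict is hashable.
    if not lists:
        return []
    sets = ({frozenset(d.items()) for d in lst} for lst in lists)
    common = reduce(set.intersection, sets)
    return [dict(items) for items in common]


l5 = [{"value": 1, "label": "One"}]
l6 = [{"label": "One", "value": 1}]
print(intersection_any_order(l5, l6) == [{"value": 1, "label": "One"}])  # True
```

The order of the returned list, and the key order inside each returned dict, is arbitrary here, which the question says is acceptable.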

Related

Parsing a pandas dataframe into a nested list object

Does anyone have a neat way of packing a dataframe including some columns which indicate hierarchy into a nested array?
Say I have the following data frame:
from pandas import DataFrame
df = DataFrame(
    {
        "var1": [1, 2, 3, 4, 9],
        "var2": [5, 6, 7, 8, 9],
        "group_1": [1, 1, 1, 1, 2],
        "group_2": [None, 1, 2, 1, None],
        "group_3": [None, None, None, 1, None],
    }
)
   var1  var2  group_1  group_2  group_3
0     1     5        1      NaN      NaN
1     2     6        1      1.0      NaN
2     3     7        1      2.0      NaN
3     4     8        1      1.0      1.0
4     9     9        2      NaN      NaN
The group_ columns show that the records on the 2nd and 3rd rows are children of the one on the first row. The 4th row is a child of the 2nd row, and the last row of the table has no children. I am looking to derive something like the following:
[
    {
        "var1": 1,
        "var2": 5,
        "children": [
            {
                "var1": 2,
                "var2": 6,
                "children": [{"var1": 4, "var2": 8, "children": []}],
            },
            {"var1": 3, "var2": 7, "children": []},
        ],
    },
    {"var1": 9, "var2": 9, "children": []},
]
You could try if the following recursive .groupby over the group_n columns works for you:
import pandas as pd
def nest_it(df, level=1):
    record = {"var1": None, "var2": None, "children": []}
    for key, gdf in df.groupby(f"group_{level}", dropna=False):
        if pd.isna(key):
            record["var1"], record["var2"] = map(int, gdf.iloc[0, 0:2])
        elif level == 3:
            var1, var2 = map(int, gdf.iloc[0, 0:2])
            record["children"].append({"var1": var1, "var2": var2, "children": []})
        else:
            record["children"].append(nest_it(gdf, level=level + 1))
    return record
result = nest_it(df)["children"]
While iterating over the (key, group) tuples from a (nested) df.groupby("group_n"), three things can happen:
The key is NaN, i.e. it's time to record the vars, and there aren't any more children.
The level is 3, i.e. the last group column is reached, so it's also time to wrap up, but this time as a child.
Otherwise (recursion): put the recursively retrieved children in the respective list.
Remark: I've only initialized the record dicts up front to get the item order as in your expected output.
Result for the sample:
[{'var1': 1,
  'var2': 5,
  'children': [{'var1': 2,
                'var2': 6,
                'children': [{'var1': 4, 'var2': 8, 'children': []}]},
               {'var1': 3, 'var2': 7, 'children': []}]},
 {'var1': 9, 'var2': 9, 'children': []}]

list of dicts - change key values from one to many to many to one

I have a list of dicts, say list1, like below:
[
    {'subId': 0, 'mainIds': [0]},
    {'subId': 3, 'mainIds': [0, 3, 4, 5], 'parameter': 'off', 'Info': 'true'}
]
I need to convert it to the format below:
[
    {'mainId': 0, 'subIds': [0, 3]},
    {'mainId': 3, 'subIds': [3]},
    {'mainId': 4, 'subIds': [3]},
    {'mainId': 5, 'subIds': [3]}
]
What I have tried so far:
finalRes = []
for i in list1:
    subId = i['subId']
    for j in i['mainIds']:
        res = {}
        res['mainId'] = j
        res['subIds'] = []
        res['subIds'].append(subId)
        finalRes.append(res)
This gives something close to the required output (shown below), but not quite. I need help getting the output mentioned above. Is there a common name for this kind of operation (something like one-to-many to many-to-one)?
[
    {'mainId': 0, 'subIds': [0]},
    {'mainId': 0, 'subIds': [3]},
    {'mainId': 3, 'subIds': [3]},
    {'mainId': 4, 'subIds': [3]},
    {'mainId': 5, 'subIds': [3]}
]
This kind of join can be implemented easily with defaultdict:
from collections import defaultdict
subs_by_main_id = defaultdict(list)
for entry in list1:
    sub_id = entry['subId']
    for main_id in entry['mainIds']:
        subs_by_main_id[main_id].append(sub_id)
result = [{'mainId': main_id, 'subIds': sub_ids}
          for main_id, sub_ids in subs_by_main_id.items()]
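To see the defaultdict approach end to end, here is a self-contained sketch applied to the sample list1 from the question. The wrapper name invert_mapping is hypothetical and not part of the original answer.

```python
from collections import defaultdict


def invert_mapping(entries):
    # Hypothetical wrapper around the defaultdict approach:
    # collect, for every main id, the sub ids that list it.
    subs_by_main_id = defaultdict(list)
    for entry in entries:
        for main_id in entry["mainIds"]:
            subs_by_main_id[main_id].append(entry["subId"])
    return [
        {"mainId": main_id, "subIds": sub_ids}
        for main_id, sub_ids in subs_by_main_id.items()
    ]


list1 = [
    {"subId": 0, "mainIds": [0]},
    {"subId": 3, "mainIds": [0, 3, 4, 5], "parameter": "off", "Info": "true"},
]
result = invert_mapping(list1)
print(result)
```

Extra keys such as 'parameter' and 'Info' are simply ignored, and main ids appear in the order they are first seen.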
Here's a solution using comprehensions and itertools.chain. Start by converting the lists to sets, for fast membership tests; then build the result directly. It is not as efficient as the defaultdict solution.
from itertools import chain
# data is the input list of dicts (list1 in the question)
sets = {d['subId']: set(d['mainIds']) for d in data}
result = [
    {'mainId': i, 'subIds': [j for j, v in sets.items() if i in v]}
    for i in set(chain.from_iterable(sets.values()))
]

merge two lists of dictionaries without ids in Python

I have two lists of dicts like this:
list1 = [
    {'doc': 1, 'pos_ini': 5, 'pos_fin': 10},
    {'doc': 1, 'pos_ini': 7, 'pos_fin': 12},
    {'doc': 2, 'pos_ini': 5, 'pos_fin': 10},
    {'doc': 7, 'pos_ini': 5, 'pos_fin': 10},  # only in list1
]
list2 = [
    {'doc': 1, 'pos_ini': 5, 'pos_fin': 10},
    {'doc': 1, 'pos_ini': 6, 'pos_fin': 7},  # only in list2
    {'doc': 1, 'pos_ini': 7, 'pos_fin': 12},
    {'doc': 2, 'pos_ini': 5, 'pos_fin': 10},
    {'doc': 2, 'pos_ini': 25, 'pos_fin': 30},  # only in list2
]
list2 has two elements that list1 does not have and list1 has one element that list2 does not have.
I need a list_result with all the elements merged :
list_result = [
    {'doc': 1, 'pos_ini': 5, 'pos_fin': 10},
    {'doc': 1, 'pos_ini': 6, 'pos_fin': 7},  # from list2
    {'doc': 1, 'pos_ini': 7, 'pos_fin': 12},
    {'doc': 2, 'pos_ini': 5, 'pos_fin': 10},
    {'doc': 2, 'pos_ini': 25, 'pos_fin': 30},  # from list2
    {'doc': 7, 'pos_ini': 5, 'pos_fin': 10},  # from list1
]
What's the best way to do that in Python? Thanks!
Python has a builtin set type that is well suited to this. The catch is that set elements must be hashable, so you must convert each dict to a tuple of its sorted items:
[dict(items) for items in set(tuple(sorted(d.items())) for d in (list1 + list2))]
You can use a frozenset() of each dictionary's items() as a hashable key in a dictionary, then simply take the values:
list({frozenset(x.items()): x for x in list1 + list2}.values())
Or using map() applied to a set comprehension:
list(map(dict, {frozenset(x.items()) for x in list1 + list2}))
Or even using just a list comprehension:
[dict(d) for d in {frozenset(x.items()) for x in list1 + list2}]
Which will give an unordered result:
[{'doc': 1, 'pos_fin': 10, 'pos_ini': 5},
 {'doc': 1, 'pos_fin': 12, 'pos_ini': 7},
 {'doc': 2, 'pos_fin': 10, 'pos_ini': 5},
 {'doc': 7, 'pos_fin': 10, 'pos_ini': 5},
 {'doc': 1, 'pos_fin': 7, 'pos_ini': 6},
 {'doc': 2, 'pos_fin': 30, 'pos_ini': 25}]
Note: If order is needed, you can use a collections.OrderedDict() instead here:
from collections import OrderedDict
list(OrderedDict((frozenset(x.items()), x) for x in list1 + list2).values())
Which gives this ordered result:
[{'doc': 1, 'pos_fin': 10, 'pos_ini': 5},
 {'doc': 1, 'pos_fin': 12, 'pos_ini': 7},
 {'doc': 2, 'pos_fin': 10, 'pos_ini': 5},
 {'doc': 7, 'pos_fin': 10, 'pos_ini': 5},
 {'doc': 1, 'pos_fin': 7, 'pos_ini': 6},
 {'doc': 2, 'pos_fin': 30, 'pos_ini': 25}]
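On Python 3.7 and later, a plain dict also preserves insertion order, so the same order-preserving merge can likely be written without OrderedDict. The following is a minimal sketch under that assumption; the helper name merge_unique is made up for illustration, and all dict values are assumed to be hashable.

```python
def merge_unique(*lists):
    # Hypothetical helper: a plain dict keeps insertion order on
    # Python 3.7+, so the first occurrence of each dict wins.
    # Assumes all dict values are hashable.
    seen = {}
    for lst in lists:
        for d in lst:
            seen.setdefault(frozenset(d.items()), d)
    return list(seen.values())


list1 = [
    {"doc": 1, "pos_ini": 5, "pos_fin": 10},
    {"doc": 1, "pos_ini": 7, "pos_fin": 12},
    {"doc": 2, "pos_ini": 5, "pos_fin": 10},
    {"doc": 7, "pos_ini": 5, "pos_fin": 10},
]
list2 = [
    {"doc": 1, "pos_ini": 5, "pos_fin": 10},
    {"doc": 1, "pos_ini": 6, "pos_fin": 7},
    {"doc": 1, "pos_ini": 7, "pos_fin": 12},
    {"doc": 2, "pos_ini": 5, "pos_fin": 10},
    {"doc": 2, "pos_ini": 25, "pos_fin": 30},
]
merged = merge_unique(list1, list2)
print(len(merged))  # 6
```

The result keeps the four elements of list1 in their original order, followed by the two elements that appear only in list2.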
You can build a set from these dictionaries, but since dicts are not hashable, each one must first be converted into a hashable object such as a tuple:
unique_list = set(tuple(dictionary.items()) for dictionary in list1 + list2)
and then can be converted back to dictionaries and list format again:
l = []
for item in unique_list:
    l.append(dict(item))
Something like above should work.

Get a list of values from a list of dictionaries?

I have a list of dictionaries, and I need to get a list of the values for a given key (all the dictionaries have that same key).
For example, I have:
l = [ { "key": 1, "Val1": 'val1 from element 1', "Val2": 'val2 from element 1' },
      { "key": 2, "Val1": 'val1 from element 2', "Val2": 'val2 from element 2' },
      { "key": 3, "Val1": 'val1 from element 3', "Val2": 'val2 from element 3' } ]
I need to get 1, 2, 3.
Of course, I can get it with:
v = []
for i in l:
    v.append(i['key'])
But I would like a nicer way to do so.
Using a simple list comprehension (if you're sure every dictionary has the key):
In [10]: [d['key'] for d in l]
Out[10]: [1, 2, 3]
Otherwise you'll need to check for existence first:
In [11]: [d['key'] for d in l if 'key' in d]
Out[11]: [1, 2, 3]
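The difference between the two comprehensions only shows up when some dict lacks the key. A small sketch with made-up sample data (the records list below is not from the question) illustrates both the filtering form and a dict.get variant that keeps one slot per dict:

```python
records = [{"key": 1}, {"other": 2}, {"key": 3}]  # made-up sample data

# Skip dicts that lack the key entirely.
present = [d["key"] for d in records if "key" in d]

# Or keep one slot per dict, with None where the key is missing.
with_default = [d.get("key") for d in records]

print(present)       # [1, 3]
print(with_default)  # [1, None, 3]
```

Which form fits better depends on whether the output needs to stay aligned with the input list.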

Traverse a dictionary recursively in Python?

What is the best way to traverse a dictionary recursively?
Can I do it with a lambda and/or a list comprehension?
I have:
[
    {
        "id": 1,
        "children": [
            {
                "id": 2,
                "children": []
            }
        ]
    },
    {
        "id": 3,
        "children": []
    },
    {
        "id": 4,
        "children": [
            {
                "id": 5,
                "children": [
                    {
                        "id": 6,
                        "children": [
                            {
                                "id": 7,
                                "children": []
                            }
                        ]
                    }
                ]
            }
        ]
    }
]
I want:
[1,2,3,4,5,6,7]
You can traverse your dictionaries recursively with this generic generator function:
def rec(current_object):
    if isinstance(current_object, dict):
        yield current_object["id"]
        for item in rec(current_object["children"]):
            yield item
    elif isinstance(current_object, list):
        for items in current_object:
            for item in rec(items):
                yield item
print(list(rec(data)))
# [1, 2, 3, 4, 5, 6, 7]
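On Python 3.3 and later, the nested for/yield loops in the generator above can probably be shortened with yield from. The following is a sketch of that rewrite, not the original answerer's code; the name iter_ids is made up, and data is a compact copy of the structure from the question.

```python
def iter_ids(node):
    # Hypothetical Python 3 rewrite of the generator above:
    # a dict yields its own id and then delegates to its children,
    # a list delegates to each of its items in order.
    if isinstance(node, dict):
        yield node["id"]
        yield from iter_ids(node["children"])
    elif isinstance(node, list):
        for item in node:
            yield from iter_ids(item)


data = [
    {"id": 1, "children": [{"id": 2, "children": []}]},
    {"id": 3, "children": []},
    {"id": 4, "children": [
        {"id": 5, "children": [
            {"id": 6, "children": [{"id": 7, "children": []}]},
        ]},
    ]},
]
print(list(iter_ids(data)))  # [1, 2, 3, 4, 5, 6, 7]
```

Because it is a generator, the ids are produced lazily in depth-first order and nothing is stored in a module-level list.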
The easiest way to do this will be with a recursive function:
recursive_function = lambda x: [x['id']] + [item for child in x['children'] for item in recursive_function(child)]
result = [item for topnode in whatever_your_list_is_called for item in recursive_function(topnode)]
My solution:
results = []
def function(lst):
    for item in lst:
        results.append(item.get('id'))
        function(item.get('children'))
function(l)
print(results)
# [1, 2, 3, 4, 5, 6, 7]
The dicter library can be useful. You can easily flatten or traverse the dictionary paths.
pip install dicter
import dicter as dt
# Example dict:
d = {'level_a': 1, 'level_b': {'a': 'hello world'}, 'level_c': 3, 'level_d': {'a': 1, 'b': 2, 'c': {'e': 10}}, 'level_e': 2}
# Walk through dict to get all paths
paths = dt.path(d)
print(paths)
# [[['level_a'], 1],
# [['level_c'], 3],
# [['level_e'], 2],
# [['level_b', 'a'], 'hello world'],
# [['level_d', 'a'], 1],
# [['level_d', 'b'], 2],
# [['level_d', 'c', 'e'], 10]]
The first column is the key path and the second column holds the values. In your case, you can take the last element of each key path in the first column.
