Merging two list of dicts with different keys effectively

I've got two lists:
lst1 = [{"name": "Hanna", "age":3},
{"name": "Kris", "age": 18},
{"name":"Dom", "age": 15},
{"name":"Tom", "age": 5}]
and the second one contains some of the same names, stored under a different key:
lst2 = [{"username": "Kris", "Town": "Big City"},
{"username":"Dom", "Town": "NYC"}]
I would like to merge them with result:
lst = [{"name": "Hanna", "age":3},
{"name": "Kris", "age": 18, "Town": "Big City"},
{"name":"Dom", "age": 15, "Town": "NYC"},
{"name":"Tom", "age": 5}]
The easiest way is to go one by one (for each element of lst1, check whether it exists in lst2), but for big lists this is quite inefficient (my lists have a few hundred elements each). What is the most efficient way to achieve this?

To avoid scanning the other list again and again, you can build a name index first.
lst1 = [{"name": "Hanna", "age":3},
{"name": "Kris", "age": 18},
{"name":"Dom", "age": 15},
{"name":"Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
{"username":"Dom", "Town": "NYC"}]
name_index = {dic['username']: idx for idx, dic in enumerate(lst2) if dic.get('username')}
for dic in lst1:
    name = dic.get('name')
    if name in name_index:
        dic.update(lst2[name_index[name]])  # update in place to save time
        dic.pop('username')  # drop the join key copied over from lst2
print(lst1)
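The same index idea can be sketched as a function that leaves both inputs untouched. This is only an illustration: the function name merge_by_name and its left_key/right_key parameters are made up here, and the index maps each username straight to its dict instead of to a list position.

```python
def merge_by_name(people, extras, left_key="name", right_key="username"):
    """Return new dicts: each person plus the fields of a matching extra, if any."""
    # One pass over extras: map each join value directly to its dict.
    extras_by_name = {e[right_key]: e for e in extras if right_key in e}
    merged = []
    for person in people:
        row = dict(person)  # shallow copy so the input list is not mutated
        match = extras_by_name.get(person.get(left_key))
        if match is not None:
            # Copy every field except the join key itself.
            row.update({k: v for k, v in match.items() if k != right_key})
        merged.append(row)
    return merged


lst1 = [{"name": "Hanna", "age": 3},
        {"name": "Kris", "age": 18},
        {"name": "Dom", "age": 15},
        {"name": "Tom", "age": 5}]
lst2 = [{"username": "Kris", "Town": "Big City"},
        {"username": "Dom", "Town": "NYC"}]

lst = merge_by_name(lst1, lst2)
print(lst)
```

Building the index costs one pass over lst2 and each lookup is a dictionary access, so the total work grows roughly with len(lst1) + len(lst2) rather than with their product.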

One way to do this much more efficiently than scanning lists is to build an intermediate dictionary from lst1 keyed by name, so that each lookup hits a dictionary instead of a list.
d1 = {elem['name']: dict(elem) for elem in lst1}  # copy each dict so lst1 is untouched
for elem in lst2:
    # assumes every username in lst2 also appears as a name in lst1
    d1[elem['username']].update({k: v for k, v in elem.items() if k != 'username'})
lst = list(d1.values())
Output:
[{'name': 'Hanna', 'age': 3}, {'name': 'Kris', 'age': 18, 'Town': 'Big City'}, {'name': 'Dom', 'age': 15, 'Town': 'NYC'}, {'name': 'Tom', 'age': 5}]

Use the zip function to pair the two lists. Both lists have to be sorted by the same criterion, here the name and username values, because the update only happens when those values match; that is why sorted is called with the key parameter.
lst2 also needs an extra step: it is repeated with lst2 * abs(len(lst1) - len(lst2)) so that the sorted copy is as long as lst1. The loop then walks the zip object once. Note that the pairing is purely positional: it lines up for this sample data, but a matching name that lands at a different sorted position in the two lists would be missed, so the dictionary-index approaches above are more robust.
for d1, d2 in zip(sorted(lst1, key=lambda d1: d1['name']),
                  sorted(lst2 * abs(len(lst1) - len(lst2)), key=lambda d2: d2['username'])):
    if d1['name'] == d2['username']:
        d1.update(d2)
        # delete the username key that came from lst2
        del d1['username']
print(lst1)
Output:
[{'name': 'Hanna', 'age': 3}, {'name': 'Kris', 'age': 18, 'Town': 'Big City'}, {'name': 'Dom', 'age': 15, 'Town': 'NYC'}, {'name': 'Tom', 'age': 5}]

Related

How to ignore a single/multiple keys of all the dictionaries while looping over a list of dictionaries?

I am looping over a list of dictionaries and I have to drop/ignore either one or more keys of the each dictionary in the list and write it to a MongoDB. What is the efficient pythonic way of doing this ?
Example:
employees = [
{'name': "Tom", 'age': 10, 'salary': 10000, 'floor': 10},
{'name': "Mark", 'age': 5, 'salary': 12000, 'floor': 11},
{'name': "Pam", 'age': 7, 'salary': 9500, 'floor': 9}
]
Let's say I want to drop key = 'floor' or keys = ['floor', 'salary'].
Currently I am using del d['floor'] inside the loop to delete the key and my_collection.insert_one() to write the dictionary into my MongoDB.
My code:
for d in employees:
    del d['floor']
    my_collection.insert_one(d)

The solution you proposed is about as efficient as it gets, since you have no control over what happens inside the insert_one method.
If you have more keys, just loop over them:
ignored_keys = ['floor', 'salary']
for d in employees:
    for k in ignored_keys:
        del d[k]
    my_collection.insert_one(d)

Let's say you want to drop keys = ['floor', 'salary']. You can build a filtered copy of each dictionary instead of deleting from the original:
exclude_keys = {'salary', 'floor'}
for d in employees:
    my_collection.insert_one({k: v for k, v in d.items() if k not in exclude_keys})
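The filtering step can be pulled into a small helper, sketched below. The name without_keys is hypothetical, and the database call is left out so the snippet runs on its own; each cleaned dict could then be passed to my_collection.insert_one as in the question.

```python
def without_keys(doc, ignored):
    """Return a copy of doc that omits every key listed in ignored."""
    ignored = set(ignored)  # set membership tests are O(1)
    return {k: v for k, v in doc.items() if k not in ignored}


employees = [
    {'name': "Tom", 'age': 10, 'salary': 10000, 'floor': 10},
    {'name': "Mark", 'age': 5, 'salary': 12000, 'floor': 11},
    {'name': "Pam", 'age': 7, 'salary': 9500, 'floor': 9},
]

docs = [without_keys(d, ['floor', 'salary']) for d in employees]
print(docs)
```

Unlike del d[k], this does not raise a KeyError when a dictionary lacks one of the ignored keys, and the original employees list stays intact.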

Python creating dictionary from list and tuple

When I iterate over a dictionary like so:
dict2={
'Joe':('Caucasian','Male', 35, 7.5),
'Kevin':('Black','Male', 55, 9.5),
# more tuples here like the ones above
}
The data is bigger but it doesn't matter here.
What I am trying to accomplish is to create a new dictionary with the information from the tuples. Like so:
dict_i_want = {
    "Name": "Joe",
    "Ethiniticy": "Caucasian",
    "Gender": "Male",
    "Voter_age": 35,
    "Score": 7.5
}
Here is my code:
dict_i_want = {}
for k, v in dict2.items():
    dict_i_want["Name"] = k
    dict_i_want['Ethiniticy'] = v[0]
    dict_i_want['Gender'] = v[1]
    dict_i_want['Voter_age'] = v[2]
    dict_i_want['Score'] = v[3]
But when I do
print(dict_i_want)
{'Name': 'Kevin', 'Ethiniticy': 'Black', 'Gender': 'Male', 'Voter_age': 55, 'Score': 9.5}
The result is just the last tuple that I have in dict2, not all the tuples.
What am I doing wrong, given that I have the loop?
PS: I don't want to use any modules or import anything here. No built-in function like zip() or something like that. I want to hard code the solution

@ForceBru answered your question: your best bet is a list of dictionaries, unless you want to create a dictionary of dictionaries with a unique key for each sub-dictionary. Going with the list approach, you could do something like this:
Example:
from pprint import pprint
dict2 = {
    'Joe': ('Caucasian', 'Male', 35, 7.5),
    'Kevin': ('Black', 'Male', 55, 9.5),
}
dicts_i_want = [
    {"name": name, "ethnicity": ethnicity, "gender": gender, "voter_age": voter_age, "score": score}
    for name, (ethnicity, gender, voter_age, score) in dict2.items()
]
pprint(dicts_i_want)
Output:
[{'ethnicity': 'Caucasian',
  'gender': 'Male',
  'name': 'Joe',
  'score': 7.5,
  'voter_age': 35},
 {'ethnicity': 'Black',
  'gender': 'Male',
  'name': 'Kevin',
  'score': 9.5,
  'voter_age': 55}]

Dict keys have to be unique. Your loop writes to the same keys of the same dict on every iteration, so each pass overwrites the previous one. That is simply how dicts work.
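A minimal sketch of the fix that stays close to the question's loop and uses no imports or zip(): create a fresh dict on each iteration and append it to a list. The names entry and dicts_i_want are chosen here for illustration; the key spellings are kept exactly as in the question.

```python
dict2 = {
    'Joe': ('Caucasian', 'Male', 35, 7.5),
    'Kevin': ('Black', 'Male', 55, 9.5),
}

dicts_i_want = []
for k, v in dict2.items():
    entry = {}  # a new dict per person, so earlier results are not overwritten
    entry["Name"] = k
    entry["Ethiniticy"] = v[0]  # spelling kept from the question
    entry["Gender"] = v[1]
    entry["Voter_age"] = v[2]
    entry["Score"] = v[3]
    dicts_i_want.append(entry)

print(dicts_i_want)
```

The only change from the original loop is where the dict is created: inside the loop instead of once before it.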

How to eliminate second item of a nested dictionary and skip the empty nested lists?

I am having issues eliminating the second item of the lists nested inside a list of dictionaries. I think it may be because there are a couple of empty lists, so the indexing does not work. How can I delete the second item of each nested list pair but also skip the empty lists?
In the end, the nested list should be flattened as it does not have a second pair anymore.
The list looks something like this:
list_dict = [{"name": "Ken", "bla": [["abc", "ABC"],["def", "DEF"]]},
{"name": "Bob", "bla": []}, #skip the empty list
{"name": "Cher", "bla":[["abc", "ABC"]]}]
Desired output:
wanted = [{"name": "Ken", "bla": ["abc", "def"]},
{"name": "Bob", "bla": []},
{"name": "Cher", "bla":["abc"]}]
My code:
for d in list_dict:
    for l in list(d["bla"]):
        if l is None:
            continue #use continue to ignore the empty lists
        d["bla"].remove(l[1]) #remove second item of every nested list pair (gives error).

You can use [:1] to get only the first item of each list (it works with zero-length lists too):
list_dict = [{"name": "Ken", "bla": [["abc", "ABC"],["def", "DEF"]]},
{"name": "Bob", "bla": []}, #skip the empty list
{"name": "Cher", "bla":[["abc", "ABC"]]}]
for i in list_dict:
    i['bla'] = [ll for l in [l[:1] for l in i['bla']] for ll in l]
print(list_dict)
Prints:
[{'name': 'Ken', 'bla': ['abc', 'def']},
{'name': 'Bob', 'bla': []},
{'name': 'Cher', 'bla': ['abc']}]

You can flatten lists using itertools.chain.from_iterable.
Example:
In [24]: l = [["abc", "ABC"],["def", "DEF"]]
In [25]: list(itertools.chain.from_iterable(l))
Out[25]: ['abc', 'ABC', 'def', 'DEF']
After you flatten your list, you can slice it to keep every other element, starting with the first:
In [26]: flattened = list(itertools.chain.from_iterable(l))
In [27]: flattened[::2]
Out[27]: ['abc', 'def']
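Applied to the whole list_dict from the question, the flatten-then-slice idea could look like the sketch below. It assumes every inner list holds exactly two items; with pairs of other lengths the [::2] slice would pick the wrong elements.

```python
from itertools import chain

list_dict = [{"name": "Ken", "bla": [["abc", "ABC"], ["def", "DEF"]]},
             {"name": "Bob", "bla": []},
             {"name": "Cher", "bla": [["abc", "ABC"]]}]

for d in list_dict:
    flattened = list(chain.from_iterable(d["bla"]))  # [] stays [] for empty lists
    d["bla"] = flattened[::2]  # keep the first item of each two-item pair

print(list_dict)
```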

You can use chain.from_iterable like this example:
from itertools import chain
list_dict = [{'name': 'Ken', 'bla': [['abc', 'ABC'], ['def', 'DEF']]},
             {'name': 'Bob', 'bla': []},
             {'name': 'Cher', 'bla': [['abc', 'ABC']]}]
out = []
for k in list_dict:
    tmp = {}
    for key, value in k.items():
        # Only flatten list values; strings are iterable too, so test for list explicitly
        if not isinstance(value, list):
            tmp[key] = value
        else:
            # Flatten the pairs, then keep every other element (the first of each pair)
            tmp[key] = list(chain.from_iterable(value))[::2]
    out.append(tmp)
print(out)
Output:
[{'name': 'Ken', 'bla': ['abc', 'def']},
 {'name': 'Bob', 'bla': []},
 {'name': 'Cher', 'bla': ['abc']}]

This should work, even if a given d["bla"] is empty:
for d in list_dict:
    d["bla"][:] = (v[0] for v in d["bla"])
To avoid looking up d["bla"] twice, you can extract it once per entry:
for d in list_dict:
    l = d["bla"]
    l[:] = (v[0] for v in l)
This changes list_dict to:
[{'name': 'Ken', 'bla': ['abc', 'def']},
{'name': 'Bob', 'bla': []},
{'name': 'Cher', 'bla': ['abc']}]
Note that this solution does not rebind d["bla"] to a new list. Slice assignment replaces the contents of the existing list objects in place, which also keeps the code concise.
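The difference between slice assignment and rebinding can be made visible with a second reference to the same list. This is a small sketch; the names alias and the same_after_* variables are made up for the demonstration.

```python
d = {"name": "Ken", "bla": [["abc", "ABC"], ["def", "DEF"]]}
alias = d["bla"]  # a second reference to the same list object

# Slice assignment: the list object stays the same, only its contents change.
d["bla"][:] = (v[0] for v in d["bla"])
same_after_slice = alias is d["bla"]
contents_after_slice = list(alias)

# Rebinding: the key now points at a brand new list; alias keeps the old one.
d["bla"] = [v.upper() for v in d["bla"]]
same_after_rebind = alias is d["bla"]

print(same_after_slice, contents_after_slice, same_after_rebind)
```

This matters only when other code holds a reference to the inner lists; if nothing else does, a plain list comprehension that rebinds the key gives the same visible result.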

Given:
list_dict = [{"name": "Ken", "bla": [["abc", "ABC"],["def", "DEF"]]},
{"name": "Bob", "bla": []}, #skip the empty list
{"name": "Cher", "bla":[["abc", "ABC"]]}]
desired_dict = [{"name": "Ken", "bla": ["abc", "def"]},
{"name": "Bob", "bla": []},
{"name": "Cher", "bla":["abc"]}]
You can simply recreate d['bla'] to be what you want. Empty lists need no special handling: there is nothing to iterate over, so the entry stays an empty list:
for d in list_dict:
    d['bla'] = [sl[0] for sl in d['bla']]
>>> list_dict==desired_dict
True

compare and complete lists with each other

I have a tricky task. I want to compare an arbitrary number of lists held in a list of lists, where each inner list contains dictionaries. The dictionaries should be compared on their 'name' key: if a name is already present in a list, nothing happens; if it is missing, the whole dictionary should be copied into every list that lacks it, with the 'balance' value set to 0.
For example let's assume we have list of lists like this :
list_of_lists=[[{'name': u'Profit','balance': 10},{'name': u'Income','balance': 30},{'name': u'NotIncome','balance': 15}],[{'name': u'Profit','balance': 20},{'name': u'Income','balance': 10}]]
So the result should be :
list_of_lists=[[{'name': u'Profit','balance': 10},{'name': u'Income','balance': 30},{'name': u'NotIncome','balance': 15}],[{'name': u'Profit','balance': 20},{'name': u'Income','balance': 10},{'name': u'NotIncome','balance': 0}]]
Here is my code, but I can't get it to work with two or more lists (I don't know how many lists there will be; maybe 2, 3, 4, etc.):
for line in lines:
    for d1, d2 in zip(line[0], line[1]):
        for key, value in d1.items():
            if value != d2[key]:
                print key, value, d2[key]

You could first create a set containing all the names and then iterate the sublists one by one adding the missing dicts:
import pprint
l = [
    [
        {'name': u'Profit','balance': 10},
        {'name': u'Income','balance': 30},
        {'name': u'NotIncome','balance': 15}
    ],
    [
        {'name': u'Profit','balance': 20},
        {'name': u'Income','balance': 10}
    ],
    []
]
all_names = {d['name'] for x in l for d in x}
for sub_list in l:
    for name in (all_names - {d['name'] for d in sub_list}):
        sub_list.append({'name': name, 'balance': 0})
pprint.pprint(l)
Output:
[[{'balance': 10, 'name': u'Profit'},
  {'balance': 30, 'name': u'Income'},
  {'balance': 15, 'name': u'NotIncome'}],
 [{'balance': 20, 'name': u'Profit'},
  {'balance': 10, 'name': u'Income'},
  {'balance': 0, 'name': u'NotIncome'}],
 [{'balance': 0, 'name': u'Profit'},
  {'balance': 0, 'name': u'Income'},
  {'balance': 0, 'name': u'NotIncome'}]]
That said, you should consider converting the sublists to dicts whose keys are names and whose values are balances, which makes the processing easier.
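The dict-based representation suggested here might be sketched as follows. The variable names balances and completed are illustrative, and the sketch assumes names are unique within each sublist.

```python
list_of_lists = [
    [{'name': 'Profit', 'balance': 10},
     {'name': 'Income', 'balance': 30},
     {'name': 'NotIncome', 'balance': 15}],
    [{'name': 'Profit', 'balance': 20},
     {'name': 'Income', 'balance': 10}],
    [],
]

# One {name: balance} dict per sublist.
balances = [{d['name']: d['balance'] for d in sub} for sub in list_of_lists]

# Every name that occurs anywhere (iterating a dict yields its keys).
all_names = set().union(*balances)

# Fill the gaps with 0; sorted() just makes the key order predictable.
completed = [{name: b.get(name, 0) for name in sorted(all_names)} for b in balances]
print(completed)
```

With this shape, a missing name is handled by a single b.get(name, 0) call instead of a search through a list of dicts.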

generate list from values of certain field in list of objects

How would I generate a list of values of a certain field of objects in a list?
Given the list of objects:
[ {name: "Joe", group: 1}, {name: "Kirk", group: 2}, {name: "Bob", group: 1}]
I want to generate list of the name field values:
["Joe", "Kirk", "Bob"]
The built-in filter() function seems to come close, but it will return the entire objects themselves.
I'd like a clean, one line solution such as:
filterLikeFunc(function(obj){return obj.name}, mylist)
Sorry, I know that's c syntax.

Just replace the filter built-in function with the map built-in function.
Use the dict get method to read the name key: unlike indexing, it does not raise a KeyError when the key is missing (it returns None instead).
data = [{'name': "Joe", 'group': 1}, {'name': "Kirk", 'group': 2}, {'name': "Bob", 'group': 1}]
In Python 2:
print map(lambda x: x.get('name'), data)
In Python 3.x:
print(list(map(lambda x: x.get('name'), data)))
Results:
['Joe', 'Kirk', 'Bob']
Using a list comprehension:
print([each.get('name') for each in data])

Using a list comprehension approach you get:
objects = [{'group': 1, 'name': 'Joe'}, {'group': 2, 'name': 'Kirk'}, {'group': 1, 'name': 'Bob'}]
names = [i["name"] for i in objects]
For a good intro to list comprehensions, see https://docs.python.org/2/tutorial/datastructures.html

Just iterate over your list of dicts and collect each name value in a list.
x = [ {'name': "Joe", 'group': 1}, {'name': "Kirk", 'group': 2}, {'name': "Bob", 'group': 1}]
y = [d['name'] for d in x]
print(y)
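Another option, closest in shape to the map-style call the question asks for, is operator.itemgetter from the standard library. A small sketch, assuming every dict has a name key (itemgetter raises KeyError otherwise):

```python
from operator import itemgetter

data = [{'name': "Joe", 'group': 1}, {'name': "Kirk", 'group': 2}, {'name': "Bob", 'group': 1}]

# itemgetter('name') builds a callable equivalent to lambda d: d['name'].
names = list(map(itemgetter('name'), data))
print(names)
```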
