merge two lists of dictionaries in python?

I have two lists of dictionaries:
old = [{'a':'1','b':'2'},{'a':'2','b':'3'},{'a':'3','b':'4'},{'a':'4','b':'5'}]
new = [{'a':'1','b':'100'},{'a':'2','b':'100'},{'a':'5','b':'6'}]
How can I merge two lists of dictionaries to get:
update = [{'a':'1','b':'2,100'},{'a':'2','b':'3,100'},{'a':'3','b':'4'},{'a':'4','b':'5'},{'a':'5','b':'6'}]
The idea is: if a new 'a' is not in old, add that item; if a new 'a' is already in old, append its 'b' to the old 'b'; and if an old 'a' is not in new, keep it unchanged.

If 'a' is the real key here and b is the value, imo it would be easier to convert the list of dicts into one dict, process the merge and then convert it back. This way you can use the standard functions.
Convert into one dict where a is the key:
def intoRealDict(listOfDicts):
    values = []
    for item in listOfDicts:
        values.append((item.get('a'), item.get('b')))
    return dict(values)  # dict constructor: [('3', '4')] -> {'3': '4'}
Convert into the data structure again:
def intoTheDataStructure(realDict):
    res = []
    for i in realDict:
        res.append(dict([('a', i), ('b', realDict[i])]))
    return res
Easy Merge of your two lists:
def merge(l1, l2):
    d1, d2 = intoRealDict(l1), intoRealDict(l2)
    for i in d2:
        if i in d1:
            # extend "b" (no space after the comma, to match the requested '2,100')
            d1[i] = d1[i] + "," + d2[i]
        else:
            # append value of key i
            d1[i] = d2[i]
    return intoTheDataStructure(d1)
Working Code with bad performance
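For comparison, the same merge can be sketched in a single pass over both lists with one intermediate dictionary. This is only a sketch under the question's assumptions: every item has exactly the keys 'a' and 'b', the values are strings, and 'a' does not repeat within a list. The name merge_by_key is made up for illustration.

```python
def merge_by_key(old, new):
    """Merge two lists of {'a': ..., 'b': ...} dicts, joining 'b' values on equal 'a'."""
    merged = {}  # maps each 'a' value to its (possibly joined) 'b' value
    for item in old + new:
        key, value = item['a'], item['b']
        if key in merged:
            merged[key] = merged[key] + ',' + value  # key seen before: extend 'b'
        else:
            merged[key] = value  # first time this key appears
    # dicts keep insertion order (Python 3.7+), so keys from old come first
    return [{'a': key, 'b': value} for key, value in merged.items()]


old = [{'a': '1', 'b': '2'}, {'a': '2', 'b': '3'}, {'a': '3', 'b': '4'}, {'a': '4', 'b': '5'}]
new = [{'a': '1', 'b': '100'}, {'a': '2', 'b': '100'}, {'a': '5', 'b': '6'}]
update = merge_by_key(old, new)
```

With the lists from the question, update should match the requested result, including the untouched '3' and '4' entries and the newly added '5'.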

Related

Is there a better/faster/cleaner way to count duplicates of a list of class objects based on their attributes?

Basically the title. I'm trying to store information about duplicate objects in a list of objects, but I'm having a hard time finding anything related to this. I've devised this for now, but I'm not sure if this is the best way for what I want to do :
from dataclasses import dataclass

@dataclass
class People:
    name: str = None
    age: int = None
    # Functions to check for duplicates (based on names)
    def __eq__(self, other):
        return (self.name == other.name)
    def __hash__(self):
        return hash(('name', self.name))

objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]
duplicates, temp = [], {}
for (i, object) in enumerate(objects):
    if (object.name not in temp):
        temp[object.name] = {'count': 1,
                             'indices': [i]}
    else:
        temp[object.name]['count'] += 1
        temp[object.name]['indices'] += [i]
for t in temp:
    if (temp[t]['count'] > 1):
        print(f"Found duplicates of {t}")
        for i in temp[t]['indices']:
            duplicates.append(objects[i])
Edit : The People class is simple as an example. I thought about making it a dict, but that would be more complicated than keeping track of a list of objects. I'm looking to make a new list of duplicates by name only, while keeping every other attribute/value as the original object
Use collections.Counter.
from collections import Counter
...
counts = Counter(objects)
duplicates = [o for o, c in counts.items() if c > 1]
If you want lists of objects matching certain criteria (e.g. all those with the same name), that's not really the same thing as getting a list of duplicates, but it's also very simple:
from collections import defaultdict
...
people_by_name = defaultdict(list)
for p in objects:
    people_by_name[p.name].append(p)
If you want to narrow that dictionary to only lists with more than one element, you can use a comprehension very similar to the one you'd use with the Counter:
people_by_name = {k: v for k, v in people_by_name.items() if len(v) > 1}
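Put together with the data from the question, the grouping approach might look like the sketch below. It assumes a plain dataclass without the custom __eq__ and __hash__, since grouping by p.name does not need them, and it flattens the groups back into a single duplicates list, which is what the question's edit asks for.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class People:
    name: str = None
    age: int = None


objects = [People("General", 12), People("Kenobi", 11), People("General", 15)]

# Group every object under its name; the original objects are kept as they are
people_by_name = defaultdict(list)
for p in objects:
    people_by_name[p.name].append(p)

# Flatten only the groups that contain more than one object
duplicates = [p for group in people_by_name.values() if len(group) > 1 for p in group]
```

Each entry in duplicates is the original object, so attributes other than the name (here, age) are preserved.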

iteratively appending N items to list gives last item N times instead

I am accessing a list of dictionary items list_of_dict = [{'ka':'1a', 'kb':'1b', 'kc':'1c'},{'ka':'2a'},{'ka':'3a', 'kb':'3b', 'kc':'3c'}], and trying to conditionally append each entry to another list article_list using a function add_entries.
My desired output
article_list = [{x:1a, y:1b, z:1c}, {x:2a}, {x:3a, y:3b, z:3c}]
My code:
def add_entries(list_of_dict):
    keys = ['x','y','z']
    #defining a temporary dictionary here
    my_dict = dict.fromkeys(keys,0)
    entry_keys = ['ka','kb','kc']
    my_list = []
    for item in list_of_dict:
        # conditionally append the entries into the temporary dictionary maintaining desired key names
        my_dict.update({a: item[b] for a,b in zip(keys, entry_keys) if b in item})
        my_list.append(my_dict)
    return my_list

if __name__ == "__main__":
    list_of_dict = [{'ka':'1a', 'kb':'1b', 'kc':'1c'},{'ka':'2a'},{'ka':'3a', 'kb':'3b', 'kc':'3c'}]
    article_list = []
    returned_list = add_entries(list_of_dict)
    article_list.extend(returned_list)
My output
article_list = [{x:3a, y:3b, z:3c}, {x:3a, y:3b, z:3c}, {x:3a, y:3b, z:3c}]
What's wrong
my_list.append(my_dict) appends a reference to the my_dict object to my_list. Therefore, at the end of the for loop in your example, my_list is a list of references to the exact same dictionary in memory.
You can see this for yourself using the function id. id, at least in CPython, basically gives you the memory address of an object. If you do
article_list.extend(returned_list)
print([id(d) for d in article_list])
You'll get a list of identical memory addresses. On my machine I get
[139920570625792, 139920570625792, 139920570625792]
The consequence is that updating the dictionary affects 'all of the dictionaries' in your list. (quotes because really there are not multiple dictionaries in your list, just many times the exact same one). So in the end, only the last update operation is visible to you.
A good discussion on references and objects in Python can be found in this answer https://stackoverflow.com/a/30340596/8791491
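The aliasing described above can be reproduced in a few lines. This is a minimal sketch with made-up names (shared, aliases, copies); it contrasts appending the same dictionary repeatedly with appending a fresh copy each time.

```python
shared = {'x': 0}
aliases = []
for value in ('1a', '2a', '3a'):
    shared['x'] = value      # mutates the one and only dictionary
    aliases.append(shared)   # appends another reference to it, not a copy

# Every element of aliases is the very same object
all_same_object = all(d is aliases[0] for d in aliases)

copies = []
for value in ('1a', '2a', '3a'):
    shared['x'] = value
    copies.append(dict(shared))  # dict(...) creates a new, independent shallow copy
```

Here aliases ends up showing the last value three times, while copies keeps one value per iteration.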
The fix
Moving the definition of my_dict inside the for loop means that you get a new, separate dictionary for each element of my_list. Now the update operation won't affect the other dictionaries in the list, and my_list is a list of references to several different dictionaries.
def add_entries(list_of_dict):
    keys = ['x','y','z']
    entry_keys = ['ka','kb','kc']
    my_list = []
    for item in list_of_dict:
        #defining a new dictionary here
        my_dict = dict.fromkeys(keys,0)
        # conditionally append the entries into the temporary dictionary maintaining desired key names
        my_dict.update({a: item[b] for a,b in zip(keys, entry_keys) if b in item})
        my_list.append(my_dict)
    return my_list
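Note that dict.fromkeys(keys, 0) pre-fills every key, so with this fix the second entry comes out as {'x': '2a', 'y': 0, 'z': 0} rather than the {x:2a} shown in the desired output. If the placeholders are not wanted, one possible variant builds each dictionary from only the keys that are present. This is a sketch, and the name add_entries_sparse is hypothetical.

```python
def add_entries_sparse(list_of_dict):
    keys = ['x', 'y', 'z']
    entry_keys = ['ka', 'kb', 'kc']
    # One fresh dictionary per item, containing only the keys the item actually has
    return [
        {new: item[old] for new, old in zip(keys, entry_keys) if old in item}
        for item in list_of_dict
    ]


list_of_dict = [{'ka': '1a', 'kb': '1b', 'kc': '1c'}, {'ka': '2a'}, {'ka': '3a', 'kb': '3b', 'kc': '3c'}]
article_list = add_entries_sparse(list_of_dict)
```

Because the comprehension creates a new dictionary on every iteration, the shared-reference problem cannot occur here either.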
You can use this.
keys=['x','y','z']
res=[{k1:d[k]} for d in list_of_dict for k,k1 in zip(d,keys)]
output
[{'x': '1a'},
{'y': '1b'},
{'z': '1c'},
{'x': '2a'},
{'x': '3a'},
{'y': '3b'},
{'z': '3c'}]
Try this:
list_new_d=[]
for d in list_of_dict:
    new_d={}
    for k,v in d.items():
        if k == 'ka':
            new_d['x'] = v
        if k == 'kb':
            new_d['y'] = v
        if k == 'kc':
            new_d['z'] = v
    list_new_d.append(new_d)

How to remove dict keys (letters) that appear in another str?

Suppose I have this dictionary:
x = {'a':2, 'b':5, 'g':7, 'a':3, 'h':8}
And this input string:
y = 'agb'
I want to delete the keys of x that appear in y, such as, if my input is as above, output should be:
{'h':8, 'a':3}
My current code is here:
def x_remove(x,word):
    x1 = x.copy() # copy the input dict
    for i in word: # iterate all the letters in str
        if i in x1.keys():
            del x1[i]
    return x1
But when the code runs, it removes every key that matches a letter in word. What I want is that, even if several keys match a letter in word, only one of them is deleted, not all of them.
Where am I going wrong? I may have found it already, but please explain how I can do this without using del.
You're close, but try this instead:
def x_remove(input_dict, word):
    output_dict = input_dict.copy()
    for letter in word:
        if letter in output_dict:
            del output_dict[letter]
    return output_dict
For example:
In [10]: x_remove({'a': 1, 'b': 2, 'c':3}, 'ac')
Out[10]: {'b': 2}
One problem was your indentation. Indentation matters in Python, and is used the way { and } and ; are in other languages. Another is the way you were checking whether each letter was in the dictionary; you want if letter in output_dict, since in on a dict searches its keys.
It's also easier to see what's going on when you use descriptive variable names.
We can also skip the del entirely and make this more Pythonic, using a dict comprehension:
def x_remove(input_dict, word):
    # .items() is required: iterating a dict directly yields only its keys
    return {key: value for key, value in input_dict.items() if key not in word}
This builds a new dictionary without the removed keys (a shallow copy, so the values themselves are not copied) and returns it, leaving the input untouched. It also avoids copying entries only to delete them again.
As stated in the comments, all keys in a dictionary are unique. There can only ever be one key named 'a' or 'b'.
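This is easy to check with the dictionary from the question. The short sketch below only inspects the literal; the names number_of_keys and value_of_a are made up for illustration.

```python
x = {'a': 2, 'b': 5, 'g': 7, 'a': 3, 'h': 8}  # 'a' is written twice in the literal

number_of_keys = len(x)  # the second 'a' replaces the first instead of adding a key
value_of_a = x['a']      # the value written last is the one that is kept
```

So by the time any function receives x, the entry 'a': 2 no longer exists, which is why a plain dict cannot express "delete only one of the two 'a' keys".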
Dictionary must have unique keys. You may use list of tuples for your data instead.
x = [('a',2), ('b',5), ('g',7), ('a',3), ('h',8)]
Following code then deletes the desired entries:
for letter in y:
    idx = 0
    for item in x.copy():
        if item[0] == letter:
            del x[idx]
            break
        idx += 1
Result:
>>> x
[('a', 3), ('h', 8)]
You can also implement like
def remove_(x,y):
    for i in y:
        try:
            del x[i]
        except KeyError:  # letter is not a key of x
            pass
    return x
Inputs x = {'a': 1, 'b': 2, 'c':3} and y = 'ac'.
Output
{'b': 2}

Finding if there are distinct elements in a python dictionary

I have a python dictionary containing n key-value pairs, out of which n-1 values are identical and 1 is not. I need to find the key of the distinct element.
For example: consider a python list [{a:1},{b:1},{c:2},{d:1}]. I need to get 'c' as the output.
I can use a for loop to compare consecutive elements and then use two more for loops to compare those elements with the other elements. But is there a more efficient way to go about it or perhaps a built-in function which I am unaware of?
If you have a dictionary, you can cycle through its keys and return the first key whose value differs from the values of both of the next two keys (wrapping around at the end).
Here's an example:
def find_different(d):
    k = d.keys()
    for i in xrange(0, len(k)):
        if d[k[i]] != d[k[(i+1)%len(k)]] and d[k[i]] != d[k[(i+2)%len(k)]]:
            return k[i]
>>> mydict = {'a':1, 'b':1, 'c':2, 'd':1}
>>> find_different(mydict)
'c'
Otherwise, if what you have is a list of single-key dictionaries, then you can do it quite nicely mapping your list with a function which "extracts" the values from your elements, then check each one using the same logic.
Here's another working example:
def find_different(l):
    mask = map(lambda x: x[x.keys()[0]], l)
    for i in xrange(0, len(l)):
        if mask[i] != mask[(i+1)%len(l)] and mask[i] != mask[(i+2)%len(l)]:
            return l[i].keys()[0]
>>> mylist = [{'a':1},{'b':1},{'c':2},{'d':1}]
>>> find_different(mylist)
'c'
NOTE: these solutions do not work in Python 3 as the map function doesn't return a list and neither does the .keys() method of dictionaries.
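For Python 3, one possible alternative is to count the values with collections.Counter and return the key whose value occurs exactly once. This is a different technique from the neighbour comparison above, not a port of it. It assumes the values are hashable, and the helper name find_different_in_list is made up for illustration.

```python
from collections import Counter


def find_different(d):
    """Return the key whose value occurs exactly once in d."""
    counts = Counter(d.values())  # how often each value occurs
    for key, value in d.items():
        if counts[value] == 1:
            return key
    return None  # no value is unique


def find_different_in_list(dicts):
    """Same idea for a list of single-key dictionaries."""
    merged = {}
    for single in dicts:
        merged.update(single)  # assumes the keys do not repeat across the list
    return find_different(merged)


result_dict = find_different({'a': 1, 'b': 1, 'c': 2, 'd': 1})
result_list = find_different_in_list([{'a': 1}, {'b': 1}, {'c': 2}, {'d': 1}])
```

Both calls should return 'c' for the example data. The function makes two passes over the values, so it stays linear in the number of entries.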
Assuming that your "list of pairs" (actually list of dictionaries, sigh) cannot be changed:
from collections import defaultdict
def get_pair(d):
    return (d.keys()[0], d.values()[0])

def extract_unique(l):
    d = defaultdict(list)
    for key, value in map(get_pair, l):
        d[value].append(key)
    # Python 2 only: tuple-unpacking lambdas and indexable filter()/keys() results
    return filter(lambda (v,l): len(l) == 1, d.items())[0][1]
If you already have your dictionary, then you make a list of all of the keys: key_list = list(yourDic). Using that list, you can then loop through your dictionary. This is easier if you know one of the values, but below I assume that you do not.
yourDic = {'a':1, 'b':4, 'c':1, 'd':1, }
key_list = list(yourDic) # .keys() is not indexable in Python 3
previous_value = yourDic[key_list[0]] # Making it so loop gets past first test
count = 0
for key in key_list:
    test_value = yourDic[key]
    if (test_value != previous_value) and count == 1:
        # The first two values differ, so a third value decides which one is the odd one out
        if yourDic[key_list[2]] == test_value:
            print(key_list[0])
        else:
            print(key)
        break
    elif (test_value != previous_value):
        print(key)
        break
    else:
        previous_value = test_value
    count += 1
So, once you find the value that is different, it will print the key. If you want it to print the value, too, you just need to print the value stored under that key as well.

How to unite a sequence of dicts in Python

I want to unite two dicts at a time, but there are 10 dicts in a list, so how can I unite them two by two without duplication?
I wrote something like this:
d_a1 = dict(list(dicts[0].items()) + list(dicts[1].items()))
d_b1 = dict(list(dicts[2].items()) + list(dicts[3].items()))
d_b2 = dict(list(dicts[4].items()) + list(dicts[5].items()))
d_b3 = dict(list(dicts[6].items()) + list(dicts[7].items()))
d_b4 = dict(list(dicts[8].items()) + list(dicts[9].items()))
You could use:
for d1, d2 in zip(dicts[::2], dicts[1::2]):
    new_dict = dict(d1, **d2)
This pairs up the dictionaries and creates a new dictionary based on the two input dictionaries.
Further bringing this down to a loop with some iteration magic:
paired = [dict(d1, **d2) for d1, d2 in zip(*[iter(dicts)]*2)]
which produces a list of paired dictionaries.
This creates a list of all paired dictionaries:
it = iter(dicts)
paired_dicts = [dict(x, **next(it)) for x in it]
I would use:
import collections
newDicts = collections.deque((d1.update(d2) for d1,d2 in zip(*[iter(listOfDicts)]*2)), maxlen=0)
This should edit your dictionaries in place. Thus, each dictionary at an even-numbered index in your list of dictionaries ends up containing the union of itself and the dictionary at the very next index. newDicts itself is just an empty deque that is used to consume the generator.
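If the original dictionaries should stay unchanged, a non-mutating variant can be sketched with dictionary unpacking (Python 3.5+). The name merge_pairs is made up for illustration. Unlike dict(d1, **d2), the {**d1, **d2} form also accepts non-string keys in the second dictionary on Python 3.

```python
def merge_pairs(dicts):
    """Merge dicts[0] with dicts[1], dicts[2] with dicts[3], and so on."""
    # {**d1, **d2} builds a new dict; on duplicate keys the value from d2 wins
    return [{**d1, **d2} for d1, d2 in zip(dicts[::2], dicts[1::2])]


dicts = [{'a': 1}, {'b': 2}, {'a': 3}, {'a': 4, 1: 'x'}, {'c': 5}]
paired = merge_pairs(dicts)
```

With an odd number of dictionaries, zip stops at the shorter slice, so the last unpaired dictionary is simply left out of the result.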
