Python: Merge 2 dictionaries without overwriting

If a and bb are two dictionaries:
a = {'UK':'http://www.uk.com', 'COM':['http://www.uk.com','http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com', 'COM':['http://www.Australia.com', 'http://www.rafaelnadal.com','http://www.rogerfederer.com']}
I want to merge (union) them to get:
{'Australia': ['http://www.australia.com'], 'COM': ['http://www.uk.com', 'http://www.michaeljackson.com', 'http://www.Australia.com', 'http://www.rafaelnadal.com', 'http://www.rogerfederer.com'], 'UK': ['http://www.uk.com']}
How can I do this in Python without overwriting or replacing any value?

Use a defaultdict:
from collections import defaultdict
d = defaultdict(list)
for dd in (a, bb):
    for k, v in dd.items():
        # Added this check to make extending work for cases where
        # the value is a string.
        v = (v,) if isinstance(v, basestring) else v  # basestring is just str in py3k.
        d[k].extend(v)
(but this is pretty much what I told you in my earlier answer)
This now works if your input dictionaries look like
{'Australia':['http://www.australia.com']}
or like:
{'Australia':'http://www.australia.com'}
However, I would advise against the latter form. In general, I think it's a good idea to keep all the keys/values of a dictionary looking the same (at least if you want to treat all the items the same as in this question). That means that if one value is a list, it's a good idea for all of them to be a list.
If you really insist on keeping things this way:
d = {}
for dd in (a, bb):
    for k, v in dd.items():
        if not isinstance(v, list):
            v = [v]
        try:
            d[k].extend(v)
        except KeyError:  # no key, no problem, just add it to the dict.
            d[k] = list(v)  # copy, so later extends don't modify the input list
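For readers on Python 3, where basestring no longer exists, the same idea can be sketched as a small function. The name merge_lists is made up for this illustration, and the sketch assumes every value is either a string or an iterable of items. It builds fresh lists, so neither input dictionary is modified.

```python
from collections import defaultdict

def merge_lists(*dicts):
    """Union the values of several dicts into lists, key by key."""
    merged = defaultdict(list)
    for source in dicts:
        for key, value in source.items():
            # Treat a bare string as a one-element list; any other
            # value is assumed to be an iterable of items.
            if isinstance(value, str):
                value = [value]
            merged[key].extend(value)
    return dict(merged)

a = {'UK': 'http://www.uk.com',
     'COM': ['http://www.uk.com', 'http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com',
      'COM': ['http://www.Australia.com', 'http://www.rafaelnadal.com',
              'http://www.rogerfederer.com']}

result = merge_lists(a, bb)
```

Because the values are extended into new lists, a['COM'] still holds its original two URLs afterwards.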

Related

Splitting a dictionary by key suffixes

I have a dictionary like so
d = {"key_a":1, "anotherkey_a":2, "key_b":3, "anotherkey_b":4}
So the values and key names are not important here. The key (no pun intended) thing is that related keys share the same suffix; in my example above, these are _a and _b.
These suffixes are not known beforehand (they are not always _a and _b, for example), and there is an unknown number of different suffixes.
What I would like to do is extract related keys into their own dictionaries, and collect all the generated dictionaries in a list.
The output from above would be
output = [{"key_a":1, "anotherkey_a":2},{"key_b":3, "anotherkey_b":4}]
My current approach is to first get all the suffixes, and then generate the sub-dicts one at a time and append to the new list
output = list()
# Generate a set of suffixes
suffixes = set([k.split("_")[-1] for k in d.keys()])
# Create the subdict and append to output
for suffix in suffixes:
    output.append({k:v for k,v in d.items() if k.endswith(suffix)})
This works (and is not prohibitively slow or anything), but I am simply wondering if there is a more elegant way to do it with a list or dict comprehension. Just out of interest...
Make your output a defaultdict rather than a list, with suffixes as keys:
from collections import defaultdict
output = defaultdict(dict)
for k, v in d.items():
    _, suffix = k.rsplit('_', 1)
    output[suffix][k] = v
This will split your dict in a single pass and result in something like:
output = {"a" : {"key_a":1, "anotherkey_a":2}, "b": {"key_b":3, "anotherkey_b":4}}
and if you insist on converting it to a list, you can simply use:
output = list(output.values())
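Put together as a self-contained Python 3 sketch, the single-pass grouping might look like the following. Using rpartition instead of rsplit is a choice made for this illustration: it never raises, and a key without an underscore is simply grouped under its own name.

```python
from collections import defaultdict

d = {"key_a": 1, "anotherkey_a": 2, "key_b": 3, "anotherkey_b": 4}

groups = defaultdict(dict)
for key, value in d.items():
    # rpartition splits on the last "_"; a key without "_" ends up
    # grouped under the whole key (an assumption of this sketch).
    _, _, suffix = key.rpartition("_")
    groups[suffix][key] = value

output = list(groups.values())
```

Grouping on the exact suffix also sidesteps a pitfall of endswith(): a key such as x_ba ends with "a" but belongs to the suffix ba.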
You could condense the lines
output = list()
for suffix in suffixes:
    output.append({k:v for k,v in d.items() if k.endswith(suffix)})
to a list comprehension, like this
[{k:v for k,v in d.items() if k.endswith(suffix)} for suffix in suffixes]
Whether it is more elegant is probably in the eyes of the beholder.
The approach suggested by @Błotosmętek will probably be faster, though, given a large dictionary, since it loops over the items only once.
def sub_dictionary_by_suffix(dictionary, suffix):
    # Keep only the items whose key ends with the given suffix.
    sub_dictionary = {k: v for k, v in dictionary.items() if k.endswith(suffix)}
    return sub_dictionary
I hope it helps.

Updating dictionaries without data loss

If I have two dictionaries that look like this:
a = {"fruit":["orange", "lemon"], "vegetable":["carrot", "tomato"]}
b = {"fruit":["banana", "lime"]}
Is there a way I can update dictionary 'a' so that I don't overwrite the previous data, but simply append it so that my result would look like this?
a = {"fruit":["orange", "lemon", "banana", "lime"], "vegetable": ["carrot", "tomato"]}
I know there is something similar, but unfortunately it overwrites the values, which is not what I am looking to do:
a.update(b)
# a is now {"fruit":["banana", "lime"], "vegetable":["carrot","tomato"]}, again, not what I want.
No way without a loop:
for k, v in b.items():
    a[k].extend(v)
This assumes that a[k] actually exists. If you want to add it in the case where it is missing:
for k, v in b.items():
    try:
        a[k].extend(v)
    except KeyError:
        a[k] = v
You could use a defaultdict, but you have to iterate over the items.
from collections import defaultdict
a = defaultdict(list)
You could also define a helper method (but be careful not to call it with a normal dict, some type check may be appropriate):
def update(a, b):
    for k, v in b.items():
        a[k].extend(v)
The other option is to extend dict and override the update method to do it there.
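A minimal sketch of that subclassing option, in Python 3. The class name AppendingDict is hypothetical, and the sketch assumes every value passed to update() is an iterable of items.

```python
class AppendingDict(dict):
    """Hypothetical dict subclass whose update() extends list values."""

    def update(self, other=(), **kwargs):
        # Accept the same inputs as dict.update: a mapping, an iterable
        # of pairs, and/or keyword arguments.
        for key, values in dict(other, **kwargs).items():
            # setdefault creates a fresh list for a missing key, so the
            # source lists are never shared with this dict.
            self.setdefault(key, []).extend(values)

a = AppendingDict({"fruit": ["orange", "lemon"],
                   "vegetable": ["carrot", "tomato"]})
b = {"fruit": ["banana", "lime"], "nut": ["almond"]}
a.update(b)
```

Only update() is changed here; plain item assignment such as a["fruit"] = [...] still replaces the list, so this is a convenience rather than a guarantee against overwriting.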
You can also do a simple while loop:
accesos = {'carlos pinto': 23849284}
while True:
    nueva_persona = input("Name?: ")
    nueva_clave = input("Password?: ")
    accesos[nueva_persona] = nueva_clave
    print(accesos)

Returning unique elements from values in a dictionary

I have a dictionary like this:
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
How would you return a new dictionary with the elements that are not contained in the list stored under the highest key?
In this case:
d2 = {'v02':['elem_D'],'v01':["elem_E"]}
Thank you,
I prefer to do differences with the builtin data type designed for it: sets.
It is also preferable to write loops rather than elaborate comprehensions. One-liners are clever, but understandable code that you can return to and understand is even better.
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
last = None
d2 = {}
for key in sorted(d.keys()):
    if last:
        if set(d[last]) - set(d[key]):
            d2[last] = sorted(set(d[last]) - set(d[key]))
    last = key
print d2
{'v01': ['elem_E'], 'v02': ['elem_D']}
from collections import defaultdict
myNewDict = defaultdict(list)
all_keys = sorted(d.keys())  # works on both Python 2 and Python 3
max_value = all_keys[-1]
for key in d:
    if key != max_value:
        for value in d[key]:
            if value not in d[max_value]:
                myNewDict[key].append(value)
You can get fancier with set operations by taking the set difference between the values in d[max_value] and each of the other keys but first I think you should get comfortable working with dictionaries and lists.
defaultdict(<type 'list'>, {'v01': ['elem_E'], 'v02': ['elem_D']})
One reason not to use sets is that the solution does not generalize well, because sets can only contain hashable objects. If your values are lists of lists, the members (sublists) are not hashable, so you can't use a set operation.
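The hashability limit is easy to see in a short Python 3 sketch. The nested sample data below is invented for illustration only.

```python
d = {
    "v02": [["x", 1], ["y", 2]],
    "v01": [["x", 1], ["z", 3]],
}

try:
    set(d["v01"]) - set(d["v02"])
    set_worked = True
except TypeError:
    # Lists are unhashable, so they cannot be set members.
    set_worked = False

# A list comprehension only needs ==, so it still works, at the cost
# of a linear scan of d["v02"] for every element.
difference = [item for item in d["v01"] if item not in d["v02"]]
```

If the sublists can be converted to tuples, the set-based approach becomes available again, since tuples of hashable items are themselves hashable.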
Depending on your python version, you may be able to get this done with only one line, using dict comprehension:
>>> d2 = {k:[v for v in values if not v in d.get(max(d.keys()))] for k, values in d.items()}
>>> d2
{'v01': ['elem_E'], 'v02': ['elem_D'], 'v03': []}
This builds a copy of dict d in which every list has been stripped of all items stored under the max key. The resulting dict looks more or less like what you are going for.
If you don't want the empty list at key v03, filter the result with another dict comprehension:
>>> {k:v for k,v in d2.items() if len(v) > 0}
{'v01': ['elem_E'], 'v02': ['elem_D']}
EDIT:
In case your original dict has a very large key set (or this operation is required frequently), you might also want to replace the expression d.get(max(d.keys())) with a previously assigned list variable for performance (though I am not sure whether it gets pre-computed anyway). This roughly halves the running time: the following runs 100,000 times in 1.5 seconds on my machine, whereas the unsubstituted expression takes more than 3 seconds.
>>> bl = d.get(max(d.keys()))
>>> d2 = {k:v for k,v in {k:[v for v in values if not v in bl] for k, values in d.items()}.items() if len(v) > 0}
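The same hoisting idea can be written as a plain loop in Python 3. This is only a sketch: the function name not_in_highest is made up, and converting the baseline list to a set assumes the elements are hashable. No timing claims are made for it.

```python
def not_in_highest(d):
    """Return, per key, the elements missing from the highest key's list."""
    top = max(d)              # highest key, e.g. 'v03'
    baseline = set(d[top])    # looked up once; assumes hashable elements
    result = {}
    for key, values in d.items():
        extra = [v for v in values if v not in baseline]
        # Skip the highest key itself and any key with nothing left.
        if key != top and extra:
            result[key] = extra
    return result

d = {'v03': ["elem_A", "elem_B", "elem_C"],
     'v02': ["elem_A", "elem_D", "elem_C"],
     'v01': ["elem_A", "elem_E"]}
d2 = not_in_highest(d)
```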

Python: Understanding loops

I'm super new to Python, I think this isn't a problem with my syntax, but with my understanding...(and I'm sure there's an easier way to do this, but right now I really just want some help with what is wrong with my understanding of loops)
Considering some code that goes roughly like...
for k, v in dict1.iteritems():
    if v not in dict2.keys():
        print "adding %s to dict2" % v
        dict2[v] = "whatever"
My loop cycles through the "if" for every single key in dict1, I can tell because of the print statement. It's as though the for loop uses the original definition of dict2 each time, and doesn't consider whatever happened in the last iteration.
I had expected that once I went through the for loop once, with a unique value from dict1, any duplicate values from dict1 would skip the if step of the loop because that value was already added to dict2 in a previous iteration. Is that incorrect?
Thanks so much!
Joe
More context: here is what I actually have (it's the first thing I've ever written, so maybe it would be helpful to me if you critiqued the whole thing!). I have a file listing employees and their designated "work unit" (read "team" for "work unit" if it helps), and I figured out how to import that into a dictionary. Now I want to turn that into a dictionary with "work units" as keys and an associated employee as the value. For now it doesn't matter which employee; I am just trying to figure out how to get a dictionary containing one key for each work unit. What I have so far:
sheet = workbook.sheet_by_index(0)
r = sheet.nrows
i = 1
employees = {}
'''Importing employees into a employees dictionary'''
while i < r:
    hrid = sheet.row_values(i,0,1)
    name = sheet.row_values(i,1,2)
    wuid = sheet.row_values(i,2,3)
    wuname = sheet.row_values(i,3,4)
    wum = sheet.row_values(i,4,5)
    parentwuid = sheet.row_values(i,5,6)
    employees[str(i)] = hrid, name, wuid, wuname, wum, parentwuid
    i += 1
'''here's where I create workunits dictionary and try to begin to populate'''
workunits = {}
for k, v in employees.iteritems():
    if v[2] not in workunits.keys():
        print "Adding to %s to the dictionary" % (v[2])
        workunits[str(v[2])] = v[1]
Solution: OK, finally got there...it's just because I hadn't called str() on v[2] in my if statement. Thanks all!
You're checking to see if v (a value) is in dict2's keys, but then adding it as a key. Is that what you want it to do?
If maybe you meant to copy elements over this might be what you meant to do:
if k not in dict2.keys():
    print "adding %s to dict2" % v
    dict2[k] = v
This question is more for codereview than for SO, but
for k, v in dict1.iteritems():  # iterates over (key, value) tuples from dict1
    if v not in dict2.keys():  # builds a list of dict2's keys on every pass, then scans the whole list for v
        print "adding %s to dict2" % v
        dict2[v] = "whatever"
you can simplify your code (and improve its performance) like this:
for k, v in dict1.iteritems():  # iterates over (key, value) tuples from dict1
    if v not in dict2:  # just check if v is already a key in dict2
        print "adding %s to dict2" % v
        dict2[v] = "whatever"
or even
dict2 = {v:"whatever" for v in dict1.values()}
You mention in your comment "I want to dict2 to contain a key for every unique value in dict1".
There's a compact syntax for getting the result you want.
d_1 = {1: 2, 3: 4, 5: 6}
d_2 = {v: "whatever" for v in d_1.itervalues()}
However, this doesn't address your concern about duplicates.
What you could do is make a set of the values in d_1 (no duplicates) and then create d_2 from that:
values_1 = set(d_1.itervalues())
d_2 = {v: "whatever" for v in values_1}
Another option is to use the fromkeys method, but to my eye this isn't as clear as the dictionary comprehension.
d_2 = {}.fromkeys(set(d_1.itervalues()))
Unless you have reason to believe that processing duplicates is slowing down your code unacceptably, I'd say you should use the most direct method to express what you want.
For your application of converting the employee_to_team dictionary to a team_to_employee dictionary, you could then do:
team_to_employee = {v: k for k, v in employee_to_team.iteritems()}
This works because you don't care which employee gets represented, and this method will just overwrite the entry each time a duplicate is encountered.
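The asker's own fix (calling str() in the membership test as well as in the assignment) can be illustrated with a small Python 3 sketch. The sample rows and the counters are invented for this example; the point is only that the integer 10 and the string "10" are different keys.

```python
employees = {
    "1": ("Ann", 10),   # made-up sample rows: (name, work unit id)
    "2": ("Bob", 10),
    "3": ("Cy", 20),
}

# Buggy version: tests the int 10 but stores the string "10", so the
# membership test never finds a match and every row gets "added".
buggy_hits = 0
workunits = {}
for name, wuid in employees.values():
    if wuid not in workunits:
        buggy_hits += 1
        workunits[str(wuid)] = name

# Fixed version: the same key form is used for the test and the store.
fixed_hits = 0
workunits = {}
for name, wuid in employees.values():
    key = str(wuid)
    if key not in workunits:
        fixed_hits += 1
        workunits[key] = name
```

So the loop does see the changes made in earlier iterations; the check only appeared to ignore them because it was looking for a key of a different type.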

OrderedDictionary.popitem() unable to iterate over all values?

I am trying to iterate over an ordered dictionary in last-in, first-out order.
While everything works fine for a standard dictionary, the first solution for the OrderedDict behaves strangely. It seems that popitem() returns one key/value pair (but somehow sequentially, since I can't replace kv_pair with two variables), and then the iteration is finished. I see no easy way to proceed to the next key/value pair.
While I found two working alternatives (shown below), both of them lack the elegance of the normal dictionary approach.
From what I found in the online help it is impossible to decide, but I assume I have the wrong expectations. Is there a more elegant approach?
from collections import OrderedDict
normaldict = {"0": "a0.csf", "1":"b1.csf", "2":"c2.csf"}
for k, v in normaldict.iteritems():
    print k,":",v
d = OrderedDict()
d["0"] = "a0.csf"
d["1"] = "b1.csf"
d["2"] = "c2.csf"
print d, "****"
for kv_pair in d.popitem():
    print kv_pair
print "++++"
for k in reversed(d.keys()):
    print k, d[k]
print "%%%%"
while len(d) > 0:
    k, v = d.popitem()
    print k, v
dict.popitem() is not the same thing as dict.iteritems(); it removes one pair from the dictionary as a tuple, and you are looping over that pair.
The most efficient method is to use a while loop instead; no need to call len(), just test against the dictionary itself, an empty dictionary is considered false:
while d:
    key, value = d.popitem()
    print key, value
The alternative is to use reversed():
for key, value in reversed(d.items()):
    print key, value
but that requires the whole dictionary to be copied into a list first.
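On Python 3.5 and later, the views of an OrderedDict support reversed() directly, so no intermediate list is needed. The following Python 3 sketch contrasts that non-destructive walk with the destructive popitem() loop; the variable names seen and popped are only for illustration.

```python
from collections import OrderedDict

d = OrderedDict([("0", "a0.csf"), ("1", "b1.csf"), ("2", "c2.csf")])

# Non-destructive: walk the items view backwards; d is left intact.
seen = [(key, value) for key, value in reversed(d.items())]

# Destructive: popitem() removes the most recently added pair each time,
# so d is empty once the loop finishes.
popped = []
while d:
    popped.append(d.popitem())
```

Both approaches visit the pairs in the same last-in, first-out order; the choice depends on whether the dictionary is still needed afterwards.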
However, if all you were looking for is a LIFO queue (a stack), use collections.deque() instead:
from collections import deque
d = deque(["a0.csf", "b1.csf", "c2.csf"])
while d:
    item = d.pop()
or use deque.reverse().
d.popitem() will return only one tuple (k, v), so your for loop iterates over the two elements of that one tuple and then ends.
You can try:
while d:
    k, v = d.popitem()
