How to merge multiple dictionaries - python

I have main_dict.
main_dict={'name1':{'key1':'value1', 'key2':'value2'}, 'name2':{'key1':'value3', 'key2':'value8'} ... }
I have two other dictionaries which bring some more data to be added to main_dict, like:
age_dict = [{'age':'age_value1', 'name': 'name1'}, {'age':'age_value1', 'name': 'name2'}]
gender_dict = [{'gender':'gen_value1', 'name': 'name1'}, {'gender':'gen_value2', 'name': 'name2'}]
Now I would like to loop over and merge these dictionaries: for each matching name, take the values from the age and gender dictionaries, create the keys 'age' and 'gender', and add them to main_dict.
For now I have done this, but I think Django can help to do this in a simpler way:
for user in age_dict:
    for key, value in main_dict.iteritems():
        if key == user['name']:
            value['age'] = user['age']
for user in gender_dict:
    for key, value in main_dict.iteritems():
        if key == user['name']:
            value['gender'] = user['gender']
EDIT: Modified age_dict and gender_dict.

General hint: if you are doing something like
for key, val in some_dict.iteritems():
    if key == some_value:
        do_something(val)
you are most likely doing it wrong, because you are not using the dictionary for its very purpose: accessing elements by their keys. Instead, do
do_something(some_dict[some_value])
and use exceptions if you can't be sure that some_dict[some_value] exists.
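For instance, a minimal sketch of the exception-based direct access (do_something, some_dict and some_value are just the placeholders from the hint above):
try:
    do_something(some_dict[some_value])
except KeyError:
    pass  # some_value is not a key of some_dict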
You don't have to iterate over dictionaries to find the appropriate key. Just access it directly; that's what dictionaries are for:
main_dict={'name1':{'key1':'value1', 'key2':'value2'}, 'name2':{'key1':'value3', 'key2':'value8'}}
age_dicts = [{'age':'age_value1', 'name': 'name1'}, {'age':'age_value1', 'name': 'name2'}]
gender_dicts = [{'gender':'gen_value1', 'name': 'name1'}, {'gender':'gen_value2', 'name': 'name2'}]
for dct in age_dicts:
    main_dict[dct['name']]['age'] = dct['age']
for dct in gender_dicts:
    main_dict[dct['name']]['gender'] = dct['gender']
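For the sample data above, those two loops should leave main_dict looking roughly like this (a sketch of the expected result, not captured output):
# {'name1': {'key1': 'value1', 'key2': 'value2', 'age': 'age_value1', 'gender': 'gen_value1'},
#  'name2': {'key1': 'value3', 'key2': 'value8', 'age': 'age_value1', 'gender': 'gen_value2'}}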
Specific answer to the pre-edit case:
age_dict= {'name1':'age_value1', 'name2':'age_value2'}
gender_dict= {'name1':'gen_value1', 'name2':'gen_value2'}
If you are sure that gender_dict and age_dict provide values for each name, it's as easy as
for name, dct in main_dict.iteritems():
    dct['age'] = age_dict[name]
    dct['gender'] = gender_dict[name]
If there are names without entries in the other dictionaries, you can use exceptions:
for name, dct in main_dict.iteritems():
    try:
        dct['age'] = age_dict[name]
    except KeyError:  # no such name in age_dict
        pass
    try:
        dct['gender'] = gender_dict[name]
    except KeyError:  # no such name in gender_dict
        pass
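Another sketch of the same idea without exceptions, using dict.get() and skipping names that are missing (this assumes age_dict and gender_dict never legitimately map a name to None):
for name, dct in main_dict.iteritems():
    age = age_dict.get(name)
    if age is not None:
        dct['age'] = age
    gender = gender_dict.get(name)
    if gender is not None:
        dct['gender'] = gender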

The setdefault method of dict looks up a key, and returns the value if found. If not found, it returns a default, and also assigns that default to the key.
super_dict = {}
for d in dicts:
    for k, v in d.iteritems():
        super_dict.setdefault(k, []).append(v)
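For instance, with two small input dictionaries (hypothetical sample data, just to show the shape of the result):
dicts = [{'a': 1, 'b': 2}, {'a': 3}]
# after running the loop above:
# super_dict == {'a': [1, 3], 'b': [2]}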
Also, you might consider using a defaultdict. This just automates setdefault by calling a function to return a default value when a key isn't found.
import collections
super_dict = collections.defaultdict(list)
for d in dicts:
    for k, v in d.iteritems():
        super_dict[k].append(v)
Also, as Sven Marnach astutely observed, you seem to want no duplication of values in your lists. In that case, set gets you what you want:
import collections
super_dict = collections.defaultdict(set)
for d in dicts:
    for k, v in d.iteritems():
        super_dict[k].add(v)

So you want to use age_dict and gender_dict to enrich the values for the keys in main_dict. Since Python guarantees average dict lookup to be constant time, you are constrained only by the number of keys in main_dict and can do the enrichment in O(n), where n is the size of the dictionary:
for user_name, user_info in main_dict.items():
    if user_name in gender_dict:
        user_info['gender'] = gender_dict[user_name]
    if user_name in age_dict:
        user_info['age'] = age_dict[user_name]
And a fancy function doing this in a generic way:
def enrich(target, **complements):
    for user_name, user_info in target.items():
        for complement_key, complemented_users in complements.items():
            if user_name in complemented_users:
                user_info[complement_key] = complemented_users[user_name]
enrich(main_dict, age=age_dict, gender=gender_dict)
Even though you see two nested loops, in practice the number of users in main_dict dominates the number of complementary dictionaries, so the work stays effectively linear in the number of users.
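Because the complements are plain keyword arguments, new attribute sources can be bolted on without touching the function; for example (city_dict is a hypothetical extra source, sketch only):
city_dict = {'name1': 'city_value1'}   # hypothetical complementary dictionary
enrich(main_dict, age=age_dict, gender=gender_dict, city=city_dict)
# main_dict['name1'] should now also contain 'city': 'city_value1'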

Related

LEFT JOIN dictionaries in python based on value

#Input
dict_1 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.250"}}
dict_2 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.252"}}
#Mapper can be modified as required
mapper = {"10.10.210.250":"black","192.168.2.1":"black"}
I am getting each dict in a loop; in each iteration I need to check the dict against the mapper and add a flag based on a match between dict_1's orig_h value and a key in mapper (e.g. 10.10.210.250). I have the flexibility to define the mapper however I need.
So the desired result would be:
dict_1 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.250", "class":"black"}}
dict_2 will remain unchanged since there is no matching value in mapper.
This is kinda what I want, but it works only if orig_h is an int
import collections
result = collections.defaultdict(dict)
for d in dict_1:
    result[d[int('orig_h')]].update(d)
for d in mapper:
    result[d[int('orig_h')]].update(d)
Not much explaining to be done; if the ip is in the mapper dictionary (if mapper has a key which is that ip) then set the desired attribute of the dict to the value of the key in the mapper dict ('black' here).
def update_dict(dic, mapper):
    ip = dic['conn']['orig_h']
    if ip in mapper:
        dic['conn']['class'] = mapper[ip]
which works exactly as desired:
>>> update_dict(dict_1, mapper)
>>> dict_1
{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.250', 'class': 'black'}}
>>> update_dict(dict_2, mapper)
>>> dict_2
{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.252'}}
Extracting the conn value for simplicity:
conn_data = dict_1['conn']
conn_data['class'] = mapper[conn_data['orig_h']]
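Note that this shorter form raises KeyError when the address has no entry in mapper; a guarded sketch that leaves unmatched dicts untouched could look like this:
conn_data = dict_2['conn']
cls = mapper.get(conn_data['orig_h'])
if cls is not None:
    conn_data['class'] = cls   # only set when the address is mapped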
A two-liner: pair each dict with its class value if its 'orig_h' is among the mapper dictionary's keys (keeping it only when it matches), then build a new dict inside a list comprehension to add 'class' to each dictionary's 'conn' value.
l=[(i,mapper[i['conn']['orig_h']]) for i in (dict_1,dict_2) if i['conn']['orig_h'] in mapper]
print([{'conn':dict(a['conn'],**{'class':b})} for a,b in l])
BTW, this answer picks out the matching dictionaries automatically.
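For the sample dict_1, dict_2 and mapper above, the printed list should contain only an enriched copy of dict_1's data, roughly as follows (sketch; key order may differ, and note this builds new dicts rather than updating dict_1 in place):
# [{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.250', 'class': 'black'}}]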

Extracting value from dictionary based on matching term from a list

Using Python 2.7, I am trying to extract a value from a dictionary, where the key is from a list / set of possible values (and assuming that if any key is present in the dictionary, only one of the possible keys is present). For example, I want to grab the "name" value from a dictionary, where the key could be name, display_name, displayName, or displayname. I could do a bunch of if-elif statements, but hoping for something more elegant. From these SO answers, I believe I could also do something like:
keys = ['name', 'display_name', 'displayName', 'displayname']
data = {'name': 'Foo',
        'baz': 'bim'}
if any(_key in data for _key in keys):
    filtered = {k: v for k, v in data.iteritems() if k in keys}
    display_name = filtered[filtered.keys()[0]]
While that is definitely better than if-elif statements, is there a one-liner way to do this? (fill in the ????):
keys = ['name', 'display_name', 'displayName', 'displayname']
data = {'name': 'Foo',
        'baz': 'bim'}
if any(_key in data for _key in keys):
    display_name = ????
I have a bunch of these pairings to find, so that's why I'm looking for simpler solutions.
Provided you know only one key will match:
In [89]: keys = ['name', 'display_name', 'displayName', 'displayname']
data = {'name': 'Foo', 'baz': 'bim'}
data[set(keys).intersection(data.keys()).pop()]
Out[89]: 'Foo'
Edit: To explain. pop() for a set returns an arbitrary element, so if you know the intersection of the two sets can only be one element, it will always return what you're looking for.
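Another one-liner sketch for the same lookup, using next() over a generator expression (the None default for the no-match case is an assumption, adjust as needed):
display_name = next((data[k] for k in keys if k in data), None)
# 'Foo' for the sample data; None if none of the keys are present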

Python json parser allow duplicate keys

I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for the second one, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test":{
    "entry":{
        "Type":"Something"
    },
    "entry":{
        "Type":"Something_Else"
    }
}, ...
The default JSON parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys of the several dictionaries in the same order they appear in the file; that's why I am using an OrderedDict. That works fine, so if there is any way to extend it to handle the duplicate entries I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test":{
    {
        "Type":"Something"
    }
}
The json.load() function raises an exception when it reaches that line in the file. The only way I worked around this was to remove the inner braces manually.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder

def parse_object_pairs(pairs):
    return pairs

data = """
{"foo": {"baz": 42}, "foo": 7}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj
Output:
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder

def make_unique(key, dct):
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct

data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj
Output:
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict; I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
import collections

def dict_raise_on_duplicates(ordered_pairs):
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d
The only thing remaining is to automatically get rid of the double braces and I am done :D Thanks again.
If you would prefer to convert those duplicated keys into an array, instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d
And then you just use:
result = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
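A rough usage sketch of the array-conversion hook on a small duplicated document (hypothetical sample string, shape of the result sketched in the comment):
import json
merged = json.loads('{"a": 1, "a": 2, "b": 3}',
                    object_pairs_hook=dict_raise_on_duplicates)
# merged == {'a': [1, 2], 'b': 3}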

Initializing a dictionary in python with a key value and no corresponding values

I was wondering if there is a way to initialize a dictionary in Python with keys but no corresponding values until I set them. Such as:
Definition = {'apple': , 'ball': }
and then later I can set them:
Definition[key] = something
I only want to initialize the keys; I don't know the corresponding values until I have to set them later. Basically, I know what keys I want and will add the values as they are found. Thanks.
Use the fromkeys function to initialize a dictionary with any default value. In your case, you will initialize with None since you don't have a default value in mind.
empty_dict = dict.fromkeys(['apple','ball'])
this will initialize empty_dict as:
empty_dict = {'apple': None, 'ball': None}
As an alternative, if you wanted to initialize the dictionary with some default value other than None, you can do:
default_value = 'xyz'
nonempty_dict = dict.fromkeys(['apple','ball'],default_value)
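A caveat worth knowing with fromkeys: every key gets the very same default object, which matters when the default is mutable (a small sketch):
shared = dict.fromkeys(['apple', 'ball'], [])
shared['apple'].append(1)
# shared == {'apple': [1], 'ball': [1]}  -- both keys point to the one shared list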
You could initialize them to None.
You could use a defaultdict. It will let you set dictionary values without worrying about whether the key already exists. If you access a key that has not been initialized yet, it will return a value you specify (in the example below it will return None):
from collections import defaultdict
your_dict = defaultdict(lambda : None)
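A small usage sketch; note that merely reading a missing key also inserts it into a defaultdict:
value = your_dict['apple']     # returns None, and 'apple' is now stored in your_dict
your_dict['ball'] = 'something'  # set a real value later, as planned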
It would be good to know what your purpose is, why you want to initialize the keys in the first place. I am not sure you need to do that at all.
1) If you want to count the number of occurrences of keys, you can just do:
Definition = {}
# ...
Definition[key] = Definition.get(key, 0) + 1
2) If you want to get None (or some other value) later for keys that you did not encounter, again you can just use the get() method:
Definition.get(key) # returns None if key not stored
Definition.get(key, default_other_than_none)
3) For all other purposes, you can just use a list of the expected keys, and check if the keys found later match those.
For example, if you only want to store values for those keys:
expected_keys = ['apple', 'banana']
# ...
if key_found in expected_keys:
    Definition[key_found] = value
Or if you want to make sure all expected keys were found:
assert(all(key in Definition for key in expected_keys))
You can initialize the values as empty strings and fill them in later as they are found.
dictionary = {'one':'','two':''}
dictionary['one']=1
dictionary['two']=2
A comprehension could also be convenient in this case:
# from a list
keys = ["k1", "k2"]
d = {k:None for k in keys}
# or from another dict
d1 = {"k1" : 1, "k2" : 2}
d2 = {k:None for k in d1.keys()}
d2
# {'k1': None, 'k2': None}
q = input("Apple")
w = input("Ball")
Definition = {'apple': q, 'ball': w}
Based on the clarifying comment by @user2989027, I think a good solution is the following:
definition = ['apple', 'ball']
data = {'orange':1, 'pear':2, 'apple':3, 'ball':4}
my_data = {}
for k in definition:
    try:
        my_data[k] = data[k]
    except KeyError:
        pass
print my_data
I tried not to do anything fancy here. I set up my data and an empty dictionary, then loop through a list of strings that represent potential keys in my data dictionary. I copy each value from data to my_data, allowing for the case where data may not have the key that I want.
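An equivalent sketch as a dict comprehension, which behaves the same for the sample data above:
my_data = {k: data[k] for k in definition if k in data}
# {'apple': 3, 'ball': 4}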
