LEFT JOIN dictionaries in python based on value

#Input
dict_1 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.250"}}
dict_2 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.252"}}
#Mapper can be modified as required
mapper = {"10.10.210.250":"black","192.168.2.1":"black"}
I receive the dicts one at a time in a loop. In each iteration I need to check the dict against the mapper and add a flag when its orig_h value (dict_1["conn"]["orig_h"]) matches a key in mapper, such as "10.10.210.250". I am free to define the mapper however I need.
So the desired result would be:
dict_1 = {"conn": {"ts":15,"uid":"ABC","orig_h":"10.10.210.250", "class":"black"}}
dict_2 will remain unchanged since there is no matching value in mapper.
This is roughly what I want, but it only works if orig_h is an int:
import collections
result = collections.defaultdict(dict)
for d in dict_1:
    result[d[int('orig_h')]].update(d)
for d in mapper:
    result[d[int('orig_h')]].update(d)

There is not much to explain: if the IP is a key in the mapper dictionary, set the desired attribute of the dict to the mapper's value for that key ('black' here).
def update_dict(dic, mapper):
    ip = dic['conn']['orig_h']
    if ip in mapper:
        dic['conn']['class'] = mapper[ip]
which works exactly as desired:
>>> update_dict(dict_1, mapper)
>>> dict_1
{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.250', 'class': 'black'}}
>>> update_dict(dict_2, mapper)
>>> dict_2
{'conn': {'ts': 15, 'uid': 'ABC', 'orig_h': '10.10.210.252'}}

Extracting the conn value for simplicity (this assumes orig_h is a key in mapper):
conn_data = dict_1['conn']
conn_data['class'] = mapper[conn_data['orig_h']]
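The two-line version above raises a KeyError when the address is not in mapper, as is the case for dict_2. A minimal sketch of a guarded variant follows; add_class and its field and flag parameters are hypothetical names chosen for illustration, not part of the original answer.

```python
def add_class(record, mapper, field="orig_h", flag="class"):
    """Attach the mapper's label to record['conn'] when the address is known."""
    conn_data = record["conn"]
    label = mapper.get(conn_data[field])  # None when the address is unmapped
    if label is not None:
        conn_data[flag] = label
    return record


mapper = {"10.10.210.250": "black", "192.168.2.1": "black"}
dict_1 = {"conn": {"ts": 15, "uid": "ABC", "orig_h": "10.10.210.250"}}
dict_2 = {"conn": {"ts": 15, "uid": "ABC", "orig_h": "10.10.210.252"}}

add_class(dict_1, mapper)  # gains "class": "black"
add_class(dict_2, mapper)  # left unchanged, no exception
```

Using mapper.get keeps the lookup to a single step; it does assume that None is never a legitimate label in the mapper.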

A two-liner: the first list comprehension keeps each dictionary whose 'orig_h' is a key in mapper, paired with its class; the second builds a new dictionary for each pair, with 'class' added under the 'conn' key.
l = [(i, mapper[i['conn']['orig_h']]) for i in (dict_1, dict_2) if i['conn']['orig_h'] in mapper]
print([{'conn': dict(a['conn'], **{'class': b})} for a, b in l])
BTW, this answer selects the matching dictionaries automatically.
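The comprehension above keeps only the matching dictionaries, which is closer to an inner join. The sketch below is one way to keep the unmatched records as well, which is what a left join would do, while building new dictionaries instead of mutating the inputs. The name left_join and its parameters are assumptions made for this example, and the `{**record, ...}` syntax needs Python 3.5 or later.

```python
def left_join(records, mapper, field="orig_h", flag="class"):
    """Return new records; matched ones gain `flag`, unmatched are copied as-is."""
    joined = []
    for record in records:
        conn = dict(record["conn"])  # shallow copy so the input is not mutated
        if conn[field] in mapper:
            conn[flag] = mapper[conn[field]]
        joined.append({**record, "conn": conn})
    return joined


mapper = {"10.10.210.250": "black", "192.168.2.1": "black"}
records = [
    {"conn": {"ts": 15, "uid": "ABC", "orig_h": "10.10.210.250"}},
    {"conn": {"ts": 15, "uid": "ABC", "orig_h": "10.10.210.252"}},
]
result = left_join(records, mapper)
```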

Related

Understanding the use of defaultdict in Python [duplicate]

I am starting to learn Python and have run across a piece of code that I'm hoping one of you can help me understand.
from collections import defaultdict
dd_dict = defaultdict(dict)
dd_dict["Joel"]["City"] = "Seattle"
result:
{ "Joel" : { "City" : "Seattle"}}
The part I am having a problem with is the third line. Could someone please explain to me what is happening here?

The third line inserts a dictionary inside a dictionary. By passing dict as the default factory of the defaultdict, you are telling Python to initialize every new dd_dict value with an empty dict. The above code is equivalent to
dd_dict["Joel"] = {}
dd_dict["Joel"]["City"] = "Seattle"
If you didn't use a defaultdict and left out the first of these two lines, the second line would raise a KeyError. So defaultdicts are a way of avoiding such errors by supplying a default value for missing keys.
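For comparison, here is a small sketch of what a plain dict does with the same statement, together with the setdefault spelling that avoids the error without a defaultdict. The variable names are illustrative only.

```python
plain = {}
try:
    plain["Joel"]["City"] = "Seattle"  # plain["Joel"] is looked up first
except KeyError as exc:
    missing_key = exc.args[0]          # the key that was not found

# setdefault inserts {} for a missing key and returns the stored value
plain.setdefault("Joel", {})["City"] = "Seattle"
```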

From the documentation of defaultdict:
If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned.
Since "Joel" doesn't exist as a key yet, the dd_dict["Joel"] part creates an empty dictionary as the value for the key "Joel". The following part, ["City"] = "Seattle", is just like adding a normal key-value pair to a dictionary, in this case the dd_dict["Joel"] dictionary.

The first argument provides the initial value for the default_factory
attribute; it defaults to None. If default_factory is not None, it is
called without arguments to provide a default value for the given key,
this value is inserted in the dictionary for the key, and returned.
dd_dict = defaultdict(dict)
dd_dict["Joel"]["City"] = "Seattle"
In your case, when you evaluate dd_dict["Joel"], there is no such key in dd_dict. A plain dict would raise a KeyError here; defaultdict instead implements the __missing__(key) hook, which calls default_factory without arguments, stores the result under the key, and returns it.
So dd_dict["Joel"] gives you an empty dict {}, and you then add the item ["City"] = "Seattle" to that dict, something like:
{}["City"] = "Seattle"
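To make the __missing__ hook concrete, the sketch below imitates defaultdict(dict) with a hand-written dict subclass. AutoDict is a hypothetical class written for this illustration; it is not how defaultdict is actually implemented.

```python
class AutoDict(dict):
    """Hypothetical dict subclass that mimics defaultdict(dict)."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is absent.
        value = self[key] = {}  # store the default so later lookups find it
        return value


dd = AutoDict()
dd["Joel"]["City"] = "Seattle"  # dd["Joel"] triggers __missing__ once
```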

When a key is looked up and is missing, dict.__getitem__ calls the __missing__ method if the class defines one.
A regular dict does not define it, so a KeyError is raised.
For a defaultdict, the factory you passed as a parameter is called, and its result is stored under the key and returned.
If you made a defaultdict(list), and tried to access a missing key, you would get a list back.
Example:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['missing']
[]

When you access a key of a defaultdict that does not exist, you get whatever the function you supplied returns.
In your case you supplied dict, therefore you get a new empty dictionary:
>>> dict()
{}
>>> from collections import defaultdict
>>> dd_dict = defaultdict(dict)
>>> dd_dict['Joel']
{}
Now you add your key-value pair to this dictionary:
>>> dd_dict["Joel"]["City"] = "Seattle"
>>> dd_dict
defaultdict(<class 'dict'>, {'Joel': {'City': 'Seattle'}})

defaultdict(dict) returns a dictionary object that will return an empty dictionary value if you index into it with a key that doesn't yet exist:
>>> from collections import defaultdict
>>> dd_dict = defaultdict(dict)
>>> dd_dict
defaultdict(<class 'dict'>, {})
>>> dd_dict["Joel"]
{}
>>> dd_dict["anything"]
{}
>>> dd_dict[99]
{}
So the third line creates a key-value pair ("Joel", {}) in dd_dict, then sets the ("City", "Seattle") key-value pair on the empty dictionary.
It's equivalent to:
>>> dd_dict = defaultdict(dict)
>>> dd_dict["Joel"] = {}
>>> dd_dict
defaultdict(<class 'dict'>, {'Joel': {}})
>>> dd_dict["Joel"]["City"] = "Seattle"
>>> dd_dict
defaultdict(<class 'dict'>, {'Joel': {'City': 'Seattle'}})
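One consequence worth spelling out: merely indexing a missing key stores it, whereas a membership test or get() does not. A short sketch, where "Anna" is a made-up key:

```python
from collections import defaultdict

dd_dict = defaultdict(dict)
dd_dict["Joel"]                # indexing a missing key inserts it
has_anna = "Anna" in dd_dict   # a membership test does not insert
anna = dd_dict.get("Anna")     # neither does get(); it returns None
keys_after = list(dd_dict)     # only "Joel" was stored
```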

Python: add to a dictionary in a list

I have the following dictionary (It is for creating json),
temp = {'logs':[]}
I want to append dictionaries, but I only get one key:val pair at a time.
what I tried:
temp['logs'].append({key:val})
This does as expected and appends the dict to the array.
But now I want to add a key/val pair to this dictionary, how can I do this?
I've tried using append/extend but that just adds a new dictionary to the list.

But now I want to add a key/val pair to this dictionary
You can index the list and update that dictionary:
temp['logs'][0].update({'new_key': 'new_value'})
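If the dictionary to extend is always the one appended most recently, index -1 avoids tracking positions. The sketch below uses made-up keys and shows both plain item assignment for a single pair and update() for several pairs.

```python
temp = {"logs": []}
temp["logs"].append({"key": "val"})

last_entry = temp["logs"][-1]        # most recently appended dictionary
last_entry["new_key"] = "new_value"  # single pair: plain item assignment
last_entry.update(a=1, b=2)          # several pairs at once
```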

You can use this statement to replace the dictionary at index 0:
>>> temp['logs'][0]={'no':'val'}
>>> temp
{'logs': [{'no': 'val'}]}
And this one to add values:
>>> temp['logs'][0].update({'yes':'val'})
>>> temp
{'logs': [{'no': 'val', 'yes': 'val'}]}

The key must be unique every time you add one (if it is for JSON).
Also, assigning with = replaces your old dictionary.
What I did when I was stuck on this once:
user = {}
name,password,id1 = [],[],[]
user1 = session.query(User).all()
for i in user1:
    name = i.name
    password = i.password
    id1 = i.id
    user.update({id1: {
        "name": name,
        "password": password,
    }})
This related question might be helpful to you:
How to convert List of JSON frames to JSON frame

Note that adding a dictionary (or any object) to a list only stores a reference, not a copy.
You can therefore do this:
>>> temp = {'logs': []}
>>> log_entry = {'key1': 'val1'}
>>> temp['logs'].append(log_entry)
>>> temp
{'logs': [{'key1': 'val1'}]}
>>> log_entry['key2'] = 'val2'
>>> temp
{'logs': [{'key1': 'val1', 'key2': 'val2'}]}
However, you might be able to circumvent the whole issue by using a dict comprehension (only in Python >= 2.7), where my_generator yields (key, value) pairs:
>>> temp = {'logs': [{key: value for key, value in my_generator}]}

Try this example:
temp = {
'logs':[]
}
for log in errors['logs']:
    temp['logs'].append(log)
Your log data would then be a list of multiple dictionaries.

How to merge multiple dictionaries

I have main_dict.
main_dict={'name1':{'key1':'value1', 'key2':'value2'}, 'name2':{'key1':'value3', 'key2':'value8'} ... }
I have two other dictionaries that bring more data to be added to main_dict, like:
age_dict = [{'age':'age_value1', 'name': 'name1'}, {'age':'age_value1', 'name': 'name2'}]
gender_dict = [{'gender':'gen_value1', 'name': 'name1'}, {'gender':'gen_value2', 'name': 'name2'}]
Now I would like to loop over these and merge them so that, for each matching name, the values from the age and gender dictionaries are added to main_dict under the keys 'age' and 'gender'.
For now I have done this, but I think Django may offer a simpler way:
for user in age_dict:
    for key, value in main_dict.items():
        if key == user['name']:
            value['age'] = user['age']
for user in gender_dict:
    for key, value in main_dict.items():
        if key == user['name']:
            value['gender'] = user['gender']
EDIT: Modified age_dict and gender_dict.

General hint: if you are doing something like
for key, val in some_dict.items():
    if key == some_value:
        do_something(val)
you are most likely doing it wrong, because you are not using the dictionary's very purpose: accessing elements by their keys. Instead, do
do_something(some_dict[key])
and use exceptions if you can't be sure that some_dict[key] exists.
You don't have to iterate over dictionaries to find the appropriate key. Just access it directly; that's what dictionaries are for:
main_dict = {'name1':{'key1':'value1', 'key2':'value2'}, 'name2':{'key1':'value3', 'key2':'value8'}}
age_dicts = [{'age':'age_value1', 'name': 'name1'}, {'age':'age_value1', 'name': 'name2'}]
gender_dicts = [{'gender':'gen_value1', 'name': 'name1'}, {'gender':'gen_value2', 'name': 'name2'}]
for dct in age_dicts:
    main_dict[dct['name']]['age'] = dct['age']
for dct in gender_dicts:
    main_dict[dct['name']]['gender'] = dct['gender']
Specific answer to the pre-edit case:
age_dict= {'name1':'age_value1', 'name2':'age_value2'}
gender_dict= {'name1':'gen_value1', 'name2':'gen_value2'}
If you are sure that gender_dict and age_dict provide values for each name, it's as easy as
for name, dct in main_dict.items():
    dct['age'] = age_dict[name]
    dct['gender'] = gender_dict[name]
If there are names without entries in the other dictionaries, you can use exceptions:
for name, dct in main_dict.items():
    try:
        dct['age'] = age_dict[name]
    except KeyError:  # no such name in age_dict
        pass
    try:
        dct['gender'] = gender_dict[name]
    except KeyError:  # no such name in gender_dict
        pass

The setdefault method of dict looks up a key, and returns the value if found. If not found, it returns a default, and also assigns that default to the key.
super_dict = {}
for d in dicts:
    for k, v in d.items():
        super_dict.setdefault(k, []).append(v)
Also, you might consider using a defaultdict. This just automates setdefault by calling a function to return a default value when a key isn't found.
import collections
super_dict = collections.defaultdict(list)
for d in dicts:
    for k, v in d.items():
        super_dict[k].append(v)
Also, as Sven Marnach astutely observed, you seem to want no duplication of values in your lists. In that case, set gets you what you want:
import collections
super_dict = collections.defaultdict(set)
for d in dicts:
    for k, v in d.items():
        super_dict[k].add(v)
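The snippets above leave dicts undefined. A runnable sketch with made-up input shows the difference between collecting values into lists and into sets; the names as_lists and as_sets are chosen for this example only.

```python
from collections import defaultdict

dicts = [{"a": 1, "b": 2}, {"a": 3, "c": 4}, {"a": 1}]  # made-up input

as_lists = defaultdict(list)  # keeps every value, duplicates included
as_sets = defaultdict(set)    # keeps each distinct value once
for d in dicts:
    for k, v in d.items():
        as_lists[k].append(v)
        as_sets[k].add(v)
```

Note that sets require hashable values and do not preserve insertion order.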

So you want to use age_dict and gender_dict to enrich the values for the keys in main_dict. Since Python dict lookups take constant time on average, the cost is bounded by the number of keys in main_dict, and the enrichment runs in O(n), where n is the size of that dictionary:
for user_name, user_info in main_dict.items():
    if user_name in gender_dict:
        user_info['gender'] = gender_dict[user_name]
    if user_name in age_dict:
        user_info['age'] = age_dict[user_name]
And a fancy function doing this in a generic way:
def enrich(target, **complements):
    for user_name, user_info in target.items():
        for complement_key, complemented_users in complements.items():
            if user_name in complemented_users:
                user_info[complement_key] = complemented_users[user_name]
enrich(main_dict, age=age_dict, gender=gender_dict)
Even though there are two nested loops, the number of users in main_dict will most likely dominate the number of complementary dictionaries.
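A self-contained run of the generic function, using shortened sample data in the name-keyed shape this answer assumes. The sample values are made up; name2 deliberately has no age entry, to show that missing complements are skipped.

```python
def enrich(target, **complements):
    """Copy each complement's value into the matching user's record."""
    for user_name, user_info in target.items():
        for complement_key, complemented_users in complements.items():
            if user_name in complemented_users:
                user_info[complement_key] = complemented_users[user_name]


main_dict = {"name1": {"key1": "value1"}, "name2": {"key1": "value3"}}
age_dict = {"name1": "age_value1"}  # no entry for name2
gender_dict = {"name1": "gen_value1", "name2": "gen_value2"}

enrich(main_dict, age=age_dict, gender=gender_dict)
```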

Initializing a dictionary in python with a key value and no corresponding values

I was wondering if there was a way to initialize a dictionary in python with keys but no corresponding values until I set them. Such as:
Definition = {'apple': , 'ball': }
and then later I can set them:
Definition[key] = something
I only want to initialize the keys; I don't know the corresponding values until I have to set them later. Basically, I know which keys I want and will add the values as they are found. Thanks.

Use the fromkeys function to initialize a dictionary with any default value. In your case, you will initialize with None since you don't have a default value in mind.
empty_dict = dict.fromkeys(['apple','ball'])
this will initialize empty_dict as:
empty_dict = {'apple': None, 'ball': None}
As an alternative, if you wanted to initialize the dictionary with some default value other than None, you can do:
default_value = 'xyz'
nonempty_dict = dict.fromkeys(['apple','ball'],default_value)
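One caveat when the default is mutable: fromkeys stores the same object under every key, so a list passed as the default is shared. A sketch of the pitfall and of a comprehension that creates a separate list per key (variable names are illustrative):

```python
shared = dict.fromkeys(["apple", "ball"], [])
shared["apple"].append(1)    # mutates the one list both keys point to

separate = {key: [] for key in ["apple", "ball"]}
separate["apple"].append(1)  # only 'apple' changes
```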

You could initialize them to None.

You could use a defaultdict. It lets you set dictionary values without worrying about whether the key already exists. If you access a key that has not been initialized yet, it returns a value you specify (None in the example below).
from collections import defaultdict
your_dict = defaultdict(lambda : None)

It would be good to know what your purpose is, why you want to initialize the keys in the first place. I am not sure you need to do that at all.
1) If you want to count the number of occurrences of keys, you can just do:
Definition = {}
# ...
Definition[key] = Definition.get(key, 0) + 1
2) If you want to get None (or some other value) later for keys that you did not encounter, again you can just use the get() method:
Definition.get(key) # returns None if key not stored
Definition.get(key, default_other_than_none)
3) For all other purposes, you can just use a list of the expected keys, and check if the keys found later match those.
For example, if you only want to store values for those keys:
expected_keys = ['apple', 'banana']
# ...
if key_found in expected_keys:
    Definition[key_found] = value
Or if you want to make sure all expected keys were found:
assert(all(key in Definition for key in expected_keys))

You can initialize the values as empty strings and fill them in later as they are found.
dictionary = {'one':'','two':''}
dictionary['one']=1
dictionary['two']=2

Comprehension could be also convenient in this case:
# from a list
keys = ["k1", "k2"]
d = {k:None for k in keys}
# or from another dict
d1 = {"k1" : 1, "k2" : 2}
d2 = {k:None for k in d1.keys()}
d2
# {'k1': None, 'k2': None}

q = input("Apple")
w = input("Ball")
Definition = {'apple': q, 'ball': w}

Based on the clarifying comment by @user2989027, I think a good solution is the following:
definition = ['apple', 'ball']
data = {'orange':1, 'pear':2, 'apple':3, 'ball':4}
my_data = {}
for k in definition:
    try:
        my_data[k] = data[k]
    except KeyError:
        pass
print(my_data)
I tried not to do anything fancy here. I set up my data and an empty dictionary. I then loop through a list of strings that represent potential keys in my data dictionary. I copy each value from data to my_data, but allow for the case where data does not have the key that I want.

Multiple keys per value

Is it possible to assign multiple keys per value in a Python dictionary? One possible solution is to assign the value to each key:
dict = {'k1':'v1', 'k2':'v1', 'k3':'v1', 'k4':'v2'}
but this is not memory efficient since my data file is > 2 GB. Otherwise you could make a dictionary of dictionary keys:
key_dict = {'k1':'k1', 'k2':'k1', 'k3':'k1', 'k4':'k4'}
dict = {'k1':'v1', 'k4':'v2'}
main_key = key_dict['k2']
value = dict[main_key]
This is also very time and effort consuming because I have to go through the whole dictionary/file twice. Is there any other easy, built-in Python solution?
Note: my dictionary values are not simple strings (as 'v1', 'v2' in the question) but rather complex objects (they contain other dictionaries, lists, etc., and it is not possible to pickle them).
Note: the question seems similar to How can I use both a key and an index for the same dictionary value?
But I am not looking for an ordered/indexed dictionary; I am looking for other efficient solutions (if any) other than the two mentioned in this question.

What type are the values?
dict = {'k1':MyClass(1), 'k2':MyClass(1)}
will give duplicate value objects, but
v1 = MyClass(1)
dict = {'k1':v1, 'k2':v1}
results in both keys referring to the same actual object.
In the original question your values are strings: even though you're declaring the same string twice, I think they'll be interned to the same object in that case.
NB: if you're not sure whether you've ended up with duplicates, you can find out like so:
if dict['k1'] is dict['k2']:
    print("good: k1 and k2 refer to the same instance")
else:
    print("bad: k1 and k2 refer to different instances")
(The is check is thanks to J.F.Sebastian, replacing id().)

Check out this - it's an implementation of exactly what you're asking: multi_key_dict(ionary)
https://pypi.python.org/pypi/multi_key_dict
(sources at https://github.com/formiaczek/python_data_structures/tree/master/multi_key_dict)
(on Unix platforms it possibly comes as a package and you can try to install it with something like:
sudo apt-get install python-multi-key-dict
for Debian, or an equivalent for your distribution)
You can use different types for keys but also keys of the same type. Also you can iterate over items using key types of your choice, e.g.:
m = multi_key_dict()
m['aa', 12] = 12
m['bb', 1] = 'cc and 1'
m['cc', 13] = 'something else'
print m['aa'] # will print '12'
print m[12] # will also print '12'
# but also:
for key, value in m.iteritems(int):
    print key, ':', value
# will print:
# 1 : cc and 1
# 12 : 12
# 13 : something else
# and iterating by string keys:
for key, value in m.iteritems(str):
    print key, ':', value
# will print:
# aa : 12
# cc : something else
# bb : cc and 1
m[12] = 20 # now update the value
print m[12] # will print '20' (updated value)
print m['aa'] # will also print '20' (it maps to the same element)
There is no limit to the number of keys, so code like:
m['a', 3, 5, 'bb', 33] = 'something'
is valid, and any of the keys can be used to refer to the value created this way (to read, write, or delete it).
Edit: From version 2.0 it should also work with Python 3.

Using python 2.7/3 you can combine a tuple, value pair with dictionary comprehension.
keys_values = ( (('k1','k2'), 0), (('k3','k4','k5'), 1) )
d = { key : value for keys, value in keys_values for key in keys }
You can also update the dictionary similarly.
keys_values = ( (('k1',), int), (('k3','k4','k6'), int) )
d.update({ key : value for keys, value in keys_values for key in keys })
I don't think this really gets to the heart of your question but in light of the title, I think this belongs here.

The most straightforward way to do this is to construct your dictionary using the dict.fromkeys() method. It takes a sequence of keys and a value as inputs and then assigns the value to each key.
Your code would be (using the name d rather than dict, so the built-in type is not shadowed):
d = dict.fromkeys(['k1', 'k2', 'k3'], 'v1')
d.update(dict.fromkeys(['k4'], 'v2'))
And the output is:
print(d)
{'k1': 'v1', 'k2': 'v1', 'k3': 'v1', 'k4': 'v2'}

You can build an auxiliary dictionary of objects that were already created from the parsed data. The key would be the parsed data, the value would be your constructed object -- say the string value should be converted to some specific object. This way you can control when to construct the new object:
existing = {}  # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    obj = existing.setdefault(v, MyClass(v))  # could be made more efficient
    result[k] = obj
Then all the duplicate value objects in the result dictionary will be represented by a single object of the MyClass class. After building the result, the existing auxiliary dictionary can be deleted.
Here dict.setdefault() is elegant and brief, but you should test whether the more verbose solution below is more efficient. The reason is that in the above example MyClass(v) is always created and then thrown away if its duplicate already exists:
existing = {}  # auxiliary dictionary for making the duplicates shared
result = {}
for k, v in parsed_data_generator():
    if v in existing:
        obj = existing[v]
    else:
        obj = MyClass(v)
        existing[v] = obj
    result[k] = obj
This technique can be used also when v is not converted to anything special. For example, if v is a string, both key and value in the auxiliary dictionary will be of the same value. However, the existence of the dictionary ensures that the object will be shared (which is not always ensured by Python).
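A runnable sketch of the sharing technique described above. MyClass and parsed_data_generator are hypothetical stand-ins here, since the original leaves both undefined, and the parsed pairs are made up.

```python
class MyClass:
    """Stand-in for the expensive value type (hypothetical)."""

    def __init__(self, raw):
        self.raw = raw


def parsed_data_generator():
    # Made-up parsed (key, raw value) pairs.
    yield from [("k1", "v1"), ("k2", "v1"), ("k3", "v1"), ("k4", "v2")]


existing = {}  # raw value -> the single shared object built from it
result = {}
for k, v in parsed_data_generator():
    if v not in existing:
        existing[v] = MyClass(v)  # built once per distinct raw value
    result[k] = existing[v]
```

Four keys end up referring to only two MyClass instances; the raw value must be hashable for it to serve as a key in existing.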

I was able to achieve similar functionality using pandas MultiIndex, although in my case the values are scalars:
>>> import numpy
>>> import pandas
>>> keys = [numpy.array(['a', 'b', 'c']), numpy.array([1, 2, 3])]
>>> df = pandas.DataFrame(['val1', 'val2', 'val3'], index=keys)
>>> df.index.names = ['str', 'int']
>>> df.xs('b', axis=0, level='str')
        0
int
2    val2
>>> df.xs(3, axis=0, level='int')
        0
str
c    val3

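I'm surprised no one has mentioned using tuples as dictionary keys. This works, with the caveat that the whole tuple is a single key, so the value cannot be looked up by 'k1' alone:
my_dictionary = {}
my_dictionary[('k1', 'k2', 'k3')] = 'v1'
my_dictionary[('k4',)] = 'v2'  # note the trailing comma: ('k4') is just the string 'k4'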
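A short sketch of how a tuple key behaves in practice: the tuple as a whole is the key, and its members are not keys on their own. Expanding the tuple with a comprehension is one way to get per-member lookup for the same value. The variable names are illustrative.

```python
my_dictionary = {("k1", "k2", "k3"): "v1"}

found_by_tuple = my_dictionary[("k1", "k2", "k3")]  # the whole tuple is the key
found_by_part = "k1" in my_dictionary               # a member alone is not a key

# Expanding the tuple gives each member its own entry for the same value.
expanded = {k: v for keys, v in my_dictionary.items() for k in keys}
```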
