how to safely append to a dictionary of dictionaries - python

I know there's a similar question:
How to append to a dictionary of dictionaries
but the answers aren't working for me. My problem is as follows. If I need to add a new key: value pair to a Python dictionary, then
my_dict[key] = value
is always safe (as long as key is hashable), whether or not my_dict already contains key.
However, if I want my_dict to be a dictionary of dictionaries, then
my_dict[keyA][keyB] = value
doesn't work, unless I already initialized my_dict[keyA] as an empty dictionary. So what I'm doing right now is:
class dict_of_dict():
    def __init__(self):
        self.ddict = {}
    def update(self, keyA, keyB, value):
        if not(keyA in self.ddict.keys()):
            self.ddict[keyA] = {}
        self.ddict[keyA][keyB] = value
a = dict_of_dict()
a.update(0, 3, "foobar")
a.ddict
This works, but I feel like it's overkill. Is there a more compact/Pythonic but still readable solution?
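For what it's worth, here is a minimal sketch of two standard-library options that avoid the wrapper class: dict.setdefault and collections.defaultdict(dict). Neither is taken from the linked question, and the variable names plain and nested are only illustrative.

```python
from collections import defaultdict

# Option 1: a plain dict plus setdefault, which creates the inner dict on demand.
plain = {}
plain.setdefault(0, {})[3] = "foobar"

# Option 2: defaultdict(dict) builds the inner dict the first time a key is missing.
nested = defaultdict(dict)
nested[0][3] = "foobar"
nested[0][4] = "baz"

print(plain)
print(dict(nested))
```

One thing to keep in mind with defaultdict: merely reading nested[some_key] for a missing key also creates an empty inner dict, which may or may not be what you want.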

Related

How to reset value of multiple dictionaries elegantly in python

I am working on code that pulls data from a database and, depending on the type of table, stores the data in dictionaries for later use.
The code handles around 20-30 different tables, so there are 20-30 dictionaries and a few lists, which I have defined as class variables for later use in the code.
For example:
class ImplVars(object):
    # dictionary capturing data from Asset-Feed table
    general_feed_dict = {}
    ports_feed_dict = {}
    vulns_feed_dict = {}
    app_list = []
    ...
I want to clear these dictionaries before I add data to them.
The easiest or most common way is to call clear(), but that is repetitive because I would have to write it for each dict.
Another option I am exploring is the dir() function, but it returns the variable names as strings.
Is there an elegant way to fetch all these class variables and clear them?
You can use introspection as you suggest:
for d in filter(dict.__instancecheck__, ImplVars.__dict__.values()):
    d.clear()
Or less cryptic, covering lists and dicts:
for obj in ImplVars.__dict__.values():
    if isinstance(obj, (list, dict)):
        obj.clear()
But I would recommend you choose a bit of a different data structure so you can be more explicit:
class ImplVars(object):
    data_dicts = {
        "general_feed_dict": {},
        "ports_feed_dict": {},
        "vulns_feed_dict": {},
    }
Now you can explicitly loop over ImplVars.data_dicts.values() and still have other class variables that you may not want to clear.
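A minimal sketch of that loop, assuming the data_dicts layout above. The table_count attribute and the sample entry are hypothetical, added only to show that unrelated class variables are left alone.

```python
class ImplVars:
    # Containers that should be reset together live in one mapping.
    data_dicts = {
        "general_feed_dict": {},
        "ports_feed_dict": {},
        "vulns_feed_dict": {},
    }
    # Hypothetical attribute that must survive a reset.
    table_count = 3

# Put some sample data in one of the dictionaries.
ImplVars.data_dicts["ports_feed_dict"][80] = "http"

# Clear every registered dictionary in place; the outer keys stay.
for feed in ImplVars.data_dicts.values():
    feed.clear()

print(ImplVars.data_dicts)
print(ImplVars.table_count)
```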
code:
a_dict = {1:2}
b_dict = {2:4}
c_list = [3,6]
vars_copy = vars().copy()
for variable, value in vars_copy.items():
    if variable.endswith("_dict"):
        vars()[variable] = {}
    elif variable.endswith("_list"):
        vars()[variable] = []
print(a_dict)
print(b_dict)
print(c_list)
result:
{}
{}
[]
Perhaps the simplest approach is to build a list of the dictionaries and lists you want to clear and then loop over it, clearing each one.
d = [general_feed_dict, ports_feed_dict, vulns_feed_dict, app_list]
for element in d:
    element.clear()
You could also use list comprehension for that.

Multi-level defaultdict with variable depth and with list and int type

I am trying to create a multi-level dict with variable depth and with list and int type.
Data structure is like below
A
--B1
-----C1=1
-----C2=[1]
--B2=[3]
D
--E
----F
------G=4
In the data structure above, the last value can be an int or a list.
If the data structure held only ints, I could easily build it with the code below:
from collections import defaultdict
f = lambda: defaultdict(f)
d = f()
d['A']['B1']['C1'] = 1
But as the last value has both list and int, it becomes a bit problematic for me.
Now we can insert data in a list using two ways.
d['A']['B1']['C2']= [1]
d['A']['B1']['C2'].append([2])
But when I use only the append method, it raises an error.
The error is:
AttributeError: 'collections.defaultdict' object has no attribute 'append'
So is there any way to use only the append method for a list?
There's no way you can use your current defaultdict-based structure to make d['A']['B1']['C2'].append(1) work properly if the 'C2' key doesn't already exist, since the data structure can't tell that the unknown key should correspond to a list rather than another layer of dictionary. It doesn't know what method you're going to call on the value it returns, so it can't know it shouldn't return a dictionary (like it did when it first looked up 'A' and 'B1').
This isn't an issue for bare integers, since for those you're assigning directly to a new key (and all the earlier levels are dictionaries). When you're assigning, the data structure isn't creating the value, you are, so you can use any type you want.
Now, if your keys are distinctive in some way, so that given a key like 'C2' you can know for sure that it should correspond to a list, you may have a chance. You can write your own dict subclass, defining a __missing__ method to handle lookups of keys that don't exist yet in your own special way:
class Tree(dict):
    def __missing__(self, key):
        if key_corresponds_to_list(key): # magic from somewhere
            result = self[key] = []
        else:
            result = self[key] = Tree()
        return result
    # you might also want a custom __repr__
Here's an example run with a magic key function that makes any even-length key default to a list, while an odd-length key defaults to a dict:
> def key_corresponds_to_list(key):
      return len(key) % 2 == 0
> t = Tree()
> t["A"]["B"]["C2"].append(1) # the default value for C2 is a list because it's even length
> t
{'A': {'B': {'C2': [1]}}}
> t["A"]["B"]["C10"]["D"] = 2 # C10's another layer of dict, since its length is odd
> t
{'A': {'B': {'C10': {'D': 2}, 'C2': [1]}}} # it didn't matter what length D was though
You probably won't actually want to use a global function to control the class like this; I just did that as an example. If you go with this approach, I'd suggest putting the logic directly into the __missing__ method (or maybe passing a function as a parameter, like defaultdict does with its factory function).
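The last suggestion, passing the function in as a parameter, might look roughly like the sketch below. The constructor argument is_list_key is a hypothetical name, and the even-length rule is the same arbitrary example rule as above, not a recommendation.

```python
class Tree(dict):
    """Nested dict whose missing keys default to a list or to a subtree."""

    def __init__(self, is_list_key):
        super().__init__()
        # Hypothetical predicate: returns True when a key should hold a list.
        self._is_list_key = is_list_key

    def __missing__(self, key):
        if self._is_list_key(key):
            value = []
        else:
            # Subtrees share the same predicate as their parent.
            value = Tree(self._is_list_key)
        self[key] = value
        return value


# Same toy rule as in the example run: even-length keys hold lists.
t = Tree(lambda key: len(key) % 2 == 0)
t["A"]["B"]["C2"].append(1)
t["A"]["B"]["C10"]["D"] = 2
print(t)
```

Because each subtree receives the predicate from its parent, the rule applies at every depth without relying on a global function.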

Prevent evaluating default function in dictionary.get or dictionary.setdefault for existing keys

I'd like to keep track of key-value pairs I've processed already in a dictionary (or something else if it's better), where key is some input and value is the return output of some complex function/calculation. The main purpose is to prevent doing the same process over again if I wish to get the value for a key that has been seen before. I've tried using setdefault and get to solve this problem, but the function I call ends up getting executed regardless if the key exists in the dictionary.
Sample code:
def complex_function(some_key):
    """
    Complex calculations using some_key
    """
    return some_value
# Get my_key's value in my_dict. If my_key has not been seen yet,
# calculate its value and set it to my_dict[my_key]
my_value = my_dict.setdefault(my_key, complex_function(my_key))
complex_function ends up being called regardless of whether my_key is in my_dict. I've also tried using my_dict.get(my_key, complex_function(my_key)) with the same result. For now, this is my fixed solution:
if my_key not in my_dict:
    my_dict[my_key] = complex_function(my_key)
my_value = my_dict[my_key]
Here are my questions. First, is using a dictionary for this purpose the right approach? Second, am I using setdefault correctly? And third, is my current fix a good solution to the problem? (I end up calling my_dict[my_key] twice if my_key doesn't exist)
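To illustrate why setdefault and get behave this way: Python evaluates argument expressions before making the call, so complex_function(my_key) has already run by the time setdefault looks at the dictionary. A small sketch, where the calls list and the doubling function are stand-ins invented for the demonstration:

```python
calls = []

def complex_function(key):
    # Stand-in for an expensive computation; records every call.
    calls.append(key)
    return key * 2

my_dict = {"a": "cached"}

# The argument is evaluated before setdefault runs, so complex_function
# is called even though "a" is already present.
value = my_dict.setdefault("a", complex_function("a"))

# The explicit membership test only calls the function on a miss.
if "b" not in my_dict:
    my_dict["b"] = complex_function("b")
if "a" not in my_dict:
    my_dict["a"] = complex_function("a")

print(value)
print(calls)
```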
So I went ahead and took Vincent's suggestion of using a decorator.
Here's what the new fix looks like:
import functools
@functools.lru_cache(maxsize=16)
def complex_function(some_input):
    """
    Complex calculations using some_input
    """
    return some_value
my_value = complex_function(some_input)
From what I understand so far, lru_cache uses a dictionary to cache the results. The key in this dictionary refers to argument(s) to the decorated function (some_input) and the value refers to the return value of the decorated function (some_value). So, if the function gets called with an argument that's previously been passed before, it would simply return the value referenced in the decorator's dictionary instead of running the function. If the argument hasn't been seen, the function proceeds as normal, and in addition, the decorator creates a new key-value pair in its dictionary.
I set the maxsize to 16 for now as I don't expect some_input to represent more than 10 unique values. One thing to note is that the arguments for the decorated function are required to be hashable (which in practice usually means immutable), as it uses the arguments as keys for its dictionary.
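A small runnable sketch of the caching behaviour described above. The squaring body and the call_count counter are placeholders for the real calculation, used only to make the effect visible.

```python
import functools

call_count = 0

@functools.lru_cache(maxsize=16)
def complex_function(some_input):
    # Placeholder for the expensive calculation; counts real executions.
    global call_count
    call_count += 1
    return some_input ** 2

first = complex_function(3)   # computed
second = complex_function(3)  # served from the cache
info = complex_function.cache_info()

print(first, second, call_count)
print(info)
```

cache_info() reports hits and misses, which is a convenient way to check that repeated inputs really are skipping the function body.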
original_dict = {"a" : "apple", "b" : "banana", "c" : "cat"}
keys = original_dict.keys()
new_dict = {}
For every key that you access, run the following command:
new_dict[key] = value
To check whether you have already accessed a key, run the following code:
# if new_key has not been accessed yet
if new_key not in new_dict:
    # read the value of new_key from original_dict and write it to new_dict
    new_dict[new_key] = original_dict[new_key]
I hope this helps
Your current solution is fine. You are creating slightly more work, but significantly reducing the computational workload when the key is already present.
However, defaultdict is almost what you need here. By modifying it a little bit we can make it work exactly as you want.
from collections import defaultdict
class DefaultKeyDict(defaultdict):
    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        self[key] = value = self.default_factory(key)
        return value
d = DefaultKeyDict(lambda key: key * 2)
assert d[1] == 2
print(d)

Unique constant reference

Let's take as an example the following code :
ALL = "everything"
my_dict = {"random":"values"}
def get_values(keys):
    if keys is None:
        return {}
    if keys is ALL:
        return my_dict
    if not hasattr(keys, '__iter__'):
        keys = [keys]
    return {key: my_dict[key] for key in keys}
The function get_values returns a dict with the given key, or keys if the parameter is an iterable, an empty dictionary if the parameter is None or the whole dictionary if the parameter is the constant ALL.
The problem arises when you want to return a key called "everything". Python might use the same reference for ALL and the parameter (since both are the same immutable string), which would make the keys is ALL expression True. The function would then return the whole dict, which is not the intended behavior.
It would be possible to assign ALL to an instance object of a class defined specifically for that purpose, or to use the type method to generate an object inline, which would make ALL a unique reference. Both solutions seem a little overkill though.
I could also use a flag in the function declaration (i.e. def get_values(keys, all=False)), but then I can always derive the value of one parameter from the other (if all is True, then keys is None; if keys is not None, then all is False), so it seems overly verbose.
What is your opinion on the previously mentioned techniques, and do you see other possible ways of fixing this?
Don't use a value that could be (without extreme effort) a valid key as the sentinel.
ALL = object()
However, it seems much simpler to define the function to take a (possibly empty) sequence of keys.
def get_values(keys=None):
    if keys is None:
        keys = []
    rv = {}
    for key in keys:
        # Keep in mind, this is a reference to
        # an object in my_dict, not a copy. Also,
        # you may want to handle keys not found in my_dict:
        # ignore them, or set rv[key] to None?
        rv[key] = my_dict[key]
    return rv
d1 = get_values() # Empty dict
d2 = get_values([]) # Explicitly empty dict
d3 = get_values(["foo", "bar"]) # (Sub)set of values
d4 = get_values(my_dict) # A copy of my_dict
In the last case, we take advantage of the fact that get_values can take any iterable, and an iterator over a dict iterates over its keys.
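If you would rather keep the original interface, the ALL = object() sentinel could be used along these lines. This is only a sketch: the extra "everything" entry, the shallow copy returned for ALL, and the explicit string check are assumptions added for the demonstration.

```python
ALL = object()  # unique sentinel: no string key can ever be identical to it

my_dict = {"random": "values", "everything": 42}

def get_values(keys=None):
    if keys is None:
        return {}
    if keys is ALL:
        # Shallow copy (an assumption; the question returns my_dict itself).
        return dict(my_dict)
    # Treat a single string, or any non-iterable, as one key.
    if isinstance(keys, str) or not hasattr(keys, "__iter__"):
        keys = [keys]
    return {key: my_dict[key] for key in keys}

print(get_values("everything"))  # only the key named "everything"
print(get_values(ALL))           # the whole mapping
```

Since object() instances are only equal and identical to themselves, the keys is ALL test can no longer be triggered by passing a string.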

Unwanted general class dictionaries updating instead of single class dictionary updating python

I tried to read something on the topic but I cannot figure out a possible solution.
I have a dictionary of this type:
from collections import defaultdict
class flux(object):
    def __init__(self, count_flux=0, ip_c_dict=defaultdict(int), ip_s_dict=defaultdict(int), conn_dict=defaultdict(int)):
        self.count_flux = count_flux
        self.ip_c_dict = ip_c_dict if ip_c_dict is not None else {}
        self.ip_s_dict = ip_s_dict if ip_s_dict is not None else {}
        self.conn_dict = conn_dict if conn_dict is not None else {}
Every time I try to update the dictionary in this way:
dictionary[key].ip_c_dict[some_string]+=1
not only is the dictionary of the current key updated, but so are all the others. And of course it happens with all three dictionaries in the class: ip_c_dict=defaultdict(int), ip_s_dict=defaultdict(int), conn_dict=defaultdict(int).
How can I fix it?
I said in that answer that you shouldn't put dicts in the default arguments, because then the dicts end up shared between all the instances. The defaultdict(int) in the default argument is evaluated only once (when the function is defined), and every later call to the method uses that same dict as the default.
So put back ip_c_dict=None in the argument list, and below put
self.ip_c_dict = ip_c_dict if ip_c_dict is not None else defaultdict(int)
That way a new defaultdict(int) is created each time, if the ip_c_dict argument is None.
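Applied to all three arguments, the fix might look like the sketch below. The class is renamed Flux and the IP string is made up for the demonstration; otherwise it follows the pattern described above.

```python
from collections import defaultdict

class Flux:
    def __init__(self, count_flux=0, ip_c_dict=None, ip_s_dict=None, conn_dict=None):
        self.count_flux = count_flux
        # A fresh defaultdict is built per instance whenever no dict is passed in.
        self.ip_c_dict = ip_c_dict if ip_c_dict is not None else defaultdict(int)
        self.ip_s_dict = ip_s_dict if ip_s_dict is not None else defaultdict(int)
        self.conn_dict = conn_dict if conn_dict is not None else defaultdict(int)

first = Flux()
second = Flux()
first.ip_c_dict["10.0.0.1"] += 1

print(dict(first.ip_c_dict))   # only the first instance was updated
print(dict(second.ip_c_dict))  # the second instance is untouched
```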
